Closest Pair, Part 2

February 13, 2015

In a previous exercise we studied the brute-force method for finding the closest pair of points in a set of points by forming all pairs, computing the distance between each of them, and choosing the smallest. That algorithm has time complexity O(n²). Today we will look at a divide-and-conquer algorithm that has time complexity O(n log n.

The divide-and-conquer algorithm sorts the pairs along their x-coordinates, splits the list of pairs in two, recursively finds the closest pair in the two halves, then compares all points for the closest pair that crosses the dividing line between the two sets of points, taking the minimum of the three possibilities. The third possibility is the tricky one. It won’t do to consider all possible pairs. Instead, we consider only those points less than d distance from the dividing line, where d is the minimum of the distances of the two recursive calls. It can be proved, though we won’t do so here, that the third step takes only linear time, so the entire algorithm is O(n log n).

Your task is to write a program that computes the closest pair of points in an input set using the divide-and-conquer algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

2 Responses to “Closest Pair, Part 2”

matthew said
February 13, 2015 at 9:25 AM
I was jumping the gun a bit with my solution to Part I.

Do you use the fact that the ys list is sorted?

Mike said

February 15, 2015 at 11:10 PM

from collections import namedtuple
from itertools import combinations, product

class Point(namedtuple("Point_", "x y")):
    def mag(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5
    def __sub__(self, other):
        return Point(self.x - other.x, self.y - other.y)

def closest_pair2(points):
    def cp(ps):
        if len(ps) <= 3:
            d, pr = min(((p - q).mag(), (p, q)) for p,q in combinations(ps, 2))

        else:
            m = len(ps) // 2
            mid = ps[m]

            d, pr = min(cp(ps[:m]), cp(ps[m:]))

            for p in ps[m-1::-1]:
                if p.x < mid.x - d: break
                
                for q in ps[m:]:
                    if q.x > p.x + d: break
                    
                    if p.y - d <= q.y <= p.y + d:
                        td = (p - q).mag()
                        if td < d:
                            d = td
                            pr = p, q

        return d, pr

    return cp(sorted(points))

Takes about 4.5 seconds to find closest pair in a set of 1000000 points–much faster than brute-force method.
Example:

>>> from random import randrange
>>> from timeit import timeit
>>> def random_point(a, b=None):
    return Point(randrange(a,b), randrange(a,b))

>>> points = set()
>>> while len(points) < 1000000:
    points.add(random_point(5000000))

>>> timeit(stmt="print(closest_pair2(points))", setup="from __main__ import points, closest_pair2", number=1)
(6.4031242374328485, (Point(x=185209, y=328746), Point(x=185214, y=328750)))
4.447064363699084
>>>

Programming Praxis