Closest Pair, Part 2
February 13, 2015
We begin by noting that a point can be represented as an imaginary number, with its x-coordinate in the real part and its y-coordinate in the imaginary part, so that the distance between two points is just the magnitude of their difference. Thus:
> (define p1 (make-rectangular 0 0))
> (define p2 (make-rectangular 1 1))
> (real-part p1)
0
> (imag-part p2)
1
> (magnitude (- p1 p2))
1.4142135623730951
Then we can generate a random set of points as in the previous exercise:
(define max-coord 1000000)
(define infinity (* max-coord max-coord))
(define (rand-points n)
(define (r) (randint max-coord))
(do ((n n (- n 1))
(ps (list) (cons (make-rectangular (r) (r)) ps)))
((zero? n) ps)))
Our first step is to rewrite the brute-force algorithm using the new representation:
(define (brute-force-closest-pair ps)
(let ((min-dist infinity) (min-pair (list)))
(do ((ps ps (cdr ps))) ((null? (cdr ps)))
(do ((qs (cdr ps) (cdr qs))) ((null? qs))
(let ((d (magnitude (- (car ps) (car qs)))))
(when (< d min-dist)
(set! min-dist d)
(set! min-pair (cons (car ps) (list (car qs))))))))
(values min-dist min-pair)))
The closest-pair function takes a list of points, then sorts them on their x-coordinates and again on their y-coordinates and passes the two lists of points to an auxiliary function:
(define (closest-pair ps)
(closest-pair-aux
(sort (lambda (a b) (< (real-part a) (real-part b))) ps)
(sort (lambda (a b) (< (imag-part a) (imag-part b))) ps)))
Now things get hard. If the list of points is small, we call the brute-force algorithm since it will be faster than all the recursive calls of the divide-and-conquer algorithm. Otherwise, we split the list of points in two on its x-coordinate, recursively find the closest pairs in the two pieces, then find the closest crossover pair:
(define (closest-pair-aux xs ys)
(if (< (length xs) 10) (brute-force-closest-pair xs)
(let* ((len (length xs)) (mid (quotient len 2))
(x-left (take mid xs)) (x-right (drop mid xs))
(y-left (list-of p (p in ys)
(< (real-part p) (real-part (car x-right))))))
(y-right (list-of p (p in ys)
(>= (real-part p) (real-part (car x-right))))))
(let-values (((left-dist left-pair) (closest-pair-aux x-left y-left))
((right-dist right-pair) (closest-pair-aux x-right y-right)))
(let ((min-dist (if (< left-dist right-dist) left-dist right-dist))
(min-pair (if (< left-dist right-dist) left-pair right-pair)))
(let ((close-points (list-of p (p in ys)
(< (magnitude (- p (car x-right))) min-dist))))
(if (< (length close-points) 2) (values min-dist min-pair)
(let-values (((close-dist close-pair)
(brute-force-closest-pair close-points)))
(if (< min-dist close-dist)
(values min-dist min-pair)
(values close-dist close-pair))))))))))
Let's look at that closely. The input lists are split on the midpoint: x-left and x-right are the left and right halves of the input list sorted on x-coordinate, and y-left and y-right are the left and right halves of the input list sorted on y-coordinate, with (real-part (car x-right)) as the dividing x-coordinate. Then left-dist, left-pair and right-dist, right-pair are the minimums of the left and right halves, calculated recursively, and min-dist, min-pair is the minimum of the two halves.
The remaining code calculates the minimum of the set of crossover pairs: close-points is the set of points within min-dist of the dividing x-coordinate, the minimum of that set of points is calculated using the brute-force algorithm, and replaces the previous minimum if it is smaller.
Here are some examples:
> (define ps (rand-points 1000))
> (time (call-with-values
(lambda () (brute-force-closest-pair ps))
(lambda (min-dist min-pair)
(display min-dist) (display " ")
(display min-pair) (newline))))
1088.9173522356966 (667333+162783i 666808+163737i)
cpu time: 1727 real time: 1846 gc time: 390
> (time (call-with-values
(lambda () (closest-pair ps))
(lambda (min-dist min-pair)
(display min-dist) (display " ")
(display min-pair) (newline))))
1088.9173522356966 (666808+163737i 667333+162783i)
cpu time: 46 real time: 49 gc time: 12
The results of the two algorithms are the same, even though the points appear in reverse order. The divide-and-conquer algorithm is very much faster than the brute-force algorithm.
We used many functions from the Standard Prelude. It seems that codepad.org has lost the ability to save pastes, so the code is assembled on the next page.
I was jumping the gun a bit with my solution to Part I.
Do you use the fact that the ys list is sorted?
from collections import namedtuple from itertools import combinations, product class Point(namedtuple("Point_", "x y")): def mag(self): return (self.x ** 2 + self.y ** 2) ** 0.5 def __sub__(self, other): return Point(self.x - other.x, self.y - other.y) def closest_pair2(points): def cp(ps): if len(ps) <= 3: d, pr = min(((p - q).mag(), (p, q)) for p,q in combinations(ps, 2)) else: m = len(ps) // 2 mid = ps[m] d, pr = min(cp(ps[:m]), cp(ps[m:])) for p in ps[m-1::-1]: if p.x < mid.x - d: break for q in ps[m:]: if q.x > p.x + d: break if p.y - d <= q.y <= p.y + d: td = (p - q).mag() if td < d: d = td pr = p, q return d, pr return cp(sorted(points))Takes about 4.5 seconds to find closest pair in a set of 1000000 points–much faster than brute-force method.
Example:
>>> from random import randrange >>> from timeit import timeit >>> def random_point(a, b=None): return Point(randrange(a,b), randrange(a,b)) >>> points = set() >>> while len(points) < 1000000: points.add(random_point(5000000)) >>> timeit(stmt="print(closest_pair2(points))", setup="from __main__ import points, closest_pair2", number=1) (6.4031242374328485, (Point(x=185209, y=328746), Point(x=185214, y=328750))) 4.447064363699084 >>>