Closest Pair, Part 2
February 13, 2015
We begin by noting that a point can be represented as an imaginary number, with its x-coordinate in the real part and its y-coordinate in the imaginary part, so that the distance between two points is just the magnitude of their difference. Thus:
> (define p1 (make-rectangular 0 0))
> (define p2 (make-rectangular 1 1))
> (real-part p1)
0
> (imag-part p2)
1
> (magnitude (- p1 p2))
1.4142135623730951
Then we can generate a random set of points as in the previous exercise:
(define max-coord 1000000) (define infinity (* max-coord max-coord)) (define (rand-points n) (define (r) (randint max-coord)) (do ((n n (- n 1)) (ps (list) (cons (make-rectangular (r) (r)) ps))) ((zero? n) ps)))
Our first step is to rewrite the brute-force algorithm using the new representation:
(define (brute-force-closest-pair ps) (let ((min-dist infinity) (min-pair (list))) (do ((ps ps (cdr ps))) ((null? (cdr ps))) (do ((qs (cdr ps) (cdr qs))) ((null? qs)) (let ((d (magnitude (- (car ps) (car qs))))) (when (< d min-dist) (set! min-dist d) (set! min-pair (cons (car ps) (list (car qs)))))))) (values min-dist min-pair)))
The closest-pair
function takes a list of points, then sorts them on their x-coordinates and again on their y-coordinates and passes the two lists of points to an auxiliary function:
(define (closest-pair ps) (closest-pair-aux (sort (lambda (a b) (< (real-part a) (real-part b))) ps) (sort (lambda (a b) (< (imag-part a) (imag-part b))) ps)))
Now things get hard. If the list of points is small, we call the brute-force algorithm since it will be faster than all the recursive calls of the divide-and-conquer algorithm. Otherwise, we split the list of points in two on its x-coordinate, recursively find the closest pairs in the two pieces, then find the closest crossover pair:
(define (closest-pair-aux xs ys) (if (< (length xs) 10) (brute-force-closest-pair xs) (let* ((len (length xs)) (mid (quotient len 2)) (x-left (take mid xs)) (x-right (drop mid xs)) (y-left (list-of p (p in ys) (< (real-part p) (real-part (car x-right)))))) (y-right (list-of p (p in ys) (>= (real-part p) (real-part (car x-right)))))) (let-values (((left-dist left-pair) (closest-pair-aux x-left y-left)) ((right-dist right-pair) (closest-pair-aux x-right y-right))) (let ((min-dist (if (< left-dist right-dist) left-dist right-dist)) (min-pair (if (< left-dist right-dist) left-pair right-pair))) (let ((close-points (list-of p (p in ys) (< (magnitude (- p (car x-right))) min-dist)))) (if (< (length close-points) 2) (values min-dist min-pair) (let-values (((close-dist close-pair) (brute-force-closest-pair close-points))) (if (< min-dist close-dist) (values min-dist min-pair) (values close-dist close-pair))))))))))
Let's look at that closely. The input lists are split on the midpoint: x-left and x-right are the left and right halves of the input list sorted on x-coordinate, and y-left and y-right are the left and right halves of the input list sorted on y-coordinate, with (real-part (car x-right)) as the dividing x-coordinate. Then left-dist, left-pair and right-dist, right-pair are the minimums of the left and right halves, calculated recursively, and min-dist, min-pair is the minimum of the two halves.
The remaining code calculates the minimum of the set of crossover pairs: close-points is the set of points within min-dist of the dividing x-coordinate, the minimum of that set of points is calculated using the brute-force algorithm, and replaces the previous minimum if it is smaller.
Here are some examples:
> (define ps (rand-points 1000)) > (time (call-with-values (lambda () (brute-force-closest-pair ps)) (lambda (min-dist min-pair) (display min-dist) (display " ") (display min-pair) (newline)))) 1088.9173522356966 (667333+162783i 666808+163737i) cpu time: 1727 real time: 1846 gc time: 390 > (time (call-with-values (lambda () (closest-pair ps)) (lambda (min-dist min-pair) (display min-dist) (display " ") (display min-pair) (newline)))) 1088.9173522356966 (666808+163737i 667333+162783i) cpu time: 46 real time: 49 gc time: 12
The results of the two algorithms are the same, even though the points appear in reverse order. The divide-and-conquer algorithm is very much faster than the brute-force algorithm.
We used many functions from the Standard Prelude. It seems that codepad.org
has lost the ability to save pastes, so the code is assembled on the next page.
I was jumping the gun a bit with my solution to Part I.
Do you use the fact that the ys list is sorted?
Takes about 4.5 seconds to find closest pair in a set of 1000000 points–much faster than brute-force method.
Example: