Closest Pair, Part 2

February 13, 2015

We begin by noting that a point can be represented as an imaginary number, with its x-coordinate in the real part and its y-coordinate in the imaginary part, so that the distance between two points is just the magnitude of their difference. Thus:

> (define p1 (make-rectangular 0 0))
> (define p2 (make-rectangular 1 1))
> (real-part p1)
0
> (imag-part p2)
1
> (magnitude (- p1 p2))
1.4142135623730951

Then we can generate a random set of points as in the previous exercise:

(define max-coord 1000000)
(define infinity (* max-coord max-coord))

(define (rand-points n)
  (define (r) (randint max-coord))
  (do ((n n (- n 1))
       (ps (list) (cons (make-rectangular (r) (r)) ps)))
      ((zero? n) ps)))

Our first step is to rewrite the brute-force algorithm using the new representation:

(define (brute-force-closest-pair ps)
  (let ((min-dist infinity) (min-pair (list)))
    (do ((ps ps (cdr ps))) ((null? (cdr ps)))
      (do ((qs (cdr ps) (cdr qs))) ((null? qs))
        (let ((d (magnitude (- (car ps) (car qs)))))
          (when (< d min-dist)
            (set! min-dist d)
            (set! min-pair (cons (car ps) (list (car qs))))))))
    (values min-dist min-pair)))

The closest-pair function takes a list of points, then sorts them on their x-coordinates and again on their y-coordinates and passes the two lists of points to an auxiliary function:

(define (closest-pair ps)
  (closest-pair-aux
    (sort (lambda (a b) (< (real-part a) (real-part b))) ps)
    (sort (lambda (a b) (< (imag-part a) (imag-part b))) ps)))

Now things get hard. If the list of points is small, we call the brute-force algorithm since it will be faster than all the recursive calls of the divide-and-conquer algorithm. Otherwise, we split the list of points in two on its x-coordinate, recursively find the closest pairs in the two pieces, then find the closest crossover pair:

(define (closest-pair-aux xs ys)
  (if (< (length xs) 10) (brute-force-closest-pair xs)
    (let* ((len (length xs)) (mid (quotient len 2))
           (x-left (take mid xs)) (x-right (drop mid xs))
           (y-left (list-of p (p in ys)
             (< (real-part p) (real-part (car x-right))))))
           (y-right (list-of p (p in ys)
             (>= (real-part p) (real-part (car x-right))))))
      (let-values (((left-dist left-pair) (closest-pair-aux x-left y-left))
                   ((right-dist right-pair) (closest-pair-aux x-right y-right)))
        (let ((min-dist (if (< left-dist right-dist) left-dist right-dist))
              (min-pair (if (< left-dist right-dist) left-pair right-pair)))
          (let ((close-points (list-of p (p in ys)
                                (< (magnitude (- p (car x-right))) min-dist))))
            (if (< (length close-points) 2) (values min-dist min-pair)
              (let-values (((close-dist close-pair)
                            (brute-force-closest-pair close-points)))
                (if (< min-dist close-dist)
                    (values min-dist min-pair)
                    (values close-dist close-pair))))))))))

Let's look at that closely. The input lists are split on the midpoint: x-left and x-right are the left and right halves of the input list sorted on x-coordinate, and y-left and y-right are the left and right halves of the input list sorted on y-coordinate, with (real-part (car x-right)) as the dividing x-coordinate. Then left-dist, left-pair and right-dist, right-pair are the minimums of the left and right halves, calculated recursively, and min-dist, min-pair is the minimum of the two halves.

The remaining code calculates the minimum of the set of crossover pairs: close-points is the set of points within min-dist of the dividing x-coordinate, the minimum of that set of points is calculated using the brute-force algorithm, and replaces the previous minimum if it is smaller.

Here are some examples:

> (define ps (rand-points 1000))
> (time (call-with-values
    (lambda () (brute-force-closest-pair ps))
    (lambda (min-dist min-pair)
      (display min-dist) (display " ")
      (display min-pair) (newline))))
1088.9173522356966 (667333+162783i 666808+163737i)
cpu time: 1727 real time: 1846 gc time: 390
> (time (call-with-values
    (lambda () (closest-pair ps))
    (lambda (min-dist min-pair)
      (display min-dist) (display " ")
      (display min-pair) (newline))))
1088.9173522356966 (666808+163737i 667333+162783i)
cpu time: 46 real time: 49 gc time: 12

The results of the two algorithms are the same, even though the points appear in reverse order. The divide-and-conquer algorithm is very much faster than the brute-force algorithm.

We used many functions from the Standard Prelude. It seems that codepad.org has lost the ability to save pastes, so the code is assembled on the next page.

Pages: 1 2 3

2 Responses to “Closest Pair, Part 2”

  1. matthew said

    I was jumping the gun a bit with my solution to Part I.

    Do you use the fact that the ys list is sorted?

  2. Mike said
    from collections import namedtuple
    from itertools import combinations, product
    
    class Point(namedtuple("Point_", "x y")):
        def mag(self):
            return (self.x ** 2 + self.y ** 2) ** 0.5
        def __sub__(self, other):
            return Point(self.x - other.x, self.y - other.y)
    
    def closest_pair2(points):
        def cp(ps):
            if len(ps) <= 3:
                d, pr = min(((p - q).mag(), (p, q)) for p,q in combinations(ps, 2))
    
            else:
                m = len(ps) // 2
                mid = ps[m]
    
                d, pr = min(cp(ps[:m]), cp(ps[m:]))
    
                for p in ps[m-1::-1]:
                    if p.x < mid.x - d: break
                    
                    for q in ps[m:]:
                        if q.x > p.x + d: break
                        
                        if p.y - d <= q.y <= p.y + d:
                            td = (p - q).mag()
                            if td < d:
                                d = td
                                pr = p, q
    
            return d, pr
    
        return cp(sorted(points))
    

    Takes about 4.5 seconds to find closest pair in a set of 1000000 points–much faster than brute-force method.
    Example:

    >>> from random import randrange
    >>> from timeit import timeit
    >>> def random_point(a, b=None):
        return Point(randrange(a,b), randrange(a,b))
    
    >>> points = set()
    >>> while len(points) < 1000000:
        points.add(random_point(5000000))
    
    >>> timeit(stmt="print(closest_pair2(points))", setup="from __main__ import points, closest_pair2", number=1)
    (6.4031242374328485, (Point(x=185209, y=328746), Point(x=185214, y=328750)))
    4.447064363699084
    >>> 
    
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s