Multi-Way Merge

March 29, 2016

Let’s start by creating some lists to use in testing:

(define list1 '(2 4 6 8 10 12 14 16 18 20))
(define list2 '(3 6 9 12 15 18 21 24 27 30))
(define list3 '(5 10 15 20 25 30 35 40 45))
(define list4 '(7 14 21 28 35 42 49 56 63))
(define list5 '(11 22 33 44 55 66 77 88 99))

The first thing we need is a function that merges two sorted lists, using a user-definable comparison operator; there’s one in the Standard Prelude, but we prefer to write our own:

(define (merge2 lt? xs ys) ; merge 2 lists
  (let loop ((xs xs) (ys ys) (zs (list)))
    (cond ((and (null? xs) (null? ys))
            (reverse zs))
          ((null? xs)
            (loop xs (cdr ys) (cons (car ys) zs)))
          ((null? ys)
            (loop (cdr xs) ys (cons (car xs) zs)))
          ((lt? (car xs) (car ys))
            (loop (cdr xs) ys (cons (car xs) zs)))
          ((lt? (car ys) (car xs))
            (loop xs (cdr ys) (cons (car ys) zs)))
          (else
            (loop (cdr xs) ys (cons (car xs) zs))))))
> (merge2 < list1 list2)
(2 3 4 6 6 8 9 10 12 12 14 15 16 18 18 20 21 24 27 30)

Now we can use that function to merge multiple lists; we use the notation xs or zs to indicate a list and xss or zss to indicate a list of lists:

(define (k-merge1 lt? . xss) ; brute force 
  (let loop ((xss xss) (zs (list)))
    (if (null? xss) zs
      (loop (cdr xss) (merge2 lt? (car xss) zs)))))
> (k-merge1 < list1 list2 list3 list4 list5)
((2 3 4 5 6 6 7 8 9 10 10 11 12 12 14 14 15 15 16 18 18 20
  20 21 21 22 24 25 27 28 30 30 33 35 35 40 42 44 45 49 55 56
  63 66 77 88 99))

The tournament version works in two passes: the first pass merges adjacent input lists pairwise, then a recursive pass (with apply) merges the now-larger-but-half-as-many merged pairs. The recursion stops when only a single list remains:

(define (k-merge2 lt? . xss) ; tournament of pairs
  (if (null? xss) xss (if (null? (cdr xss)) (car xss)
    (let loop ((xss xss) (zss (list)))
      (cond ((null? xss)
              (apply k-merge2 lt? zss))
            ((null? (cdr xss))
              (apply k-merge2 lt? (cons (car xss) zss)))
            (else (loop (cddr xss)
                        (cons (merge2 lt? (car xss) (cadr xss))
                              zss))))))))
> (k-merge2 < list1 list2 list3 list4 list5)
((2 3 4 5 6 6 7 8 9 10 10 11 12 12 14 14 15 15 16 18 18 20
  20 21 21 22 24 25 27 28 30 30 33 35 35 40 42 44 45 49 55 56
  63 66 77 88 99))

The priority queue version of the program first creates a heap of k elements, then repeatedly extracts the smallest element from the heap and reheaps the remaining lists: For the priority queue we use pairing heaps from a previous exercise:

(define (k-merge3 lt? . xss) ; priority queue
  (define (less? xs ys) (lt? (car xs) (car ys)))
  (let loop ((xss xss) (pq pq-empty))
    (if (and (pair? xss) (pair? (car xss)))
        (loop (cdr xss) (pq-insert less? (car xss) pq))
        (let loop ((pq pq) (zs (list)))
          (if (pq-empty? pq) (reverse zs)
            (let ((z (car (pq-first pq)))
                  (xs (cdr (pq-first pq))))
              (if (pair? xs)
                  (loop (pq-insert less? xs (pq-rest less? pq))
                        (cons z zs))
                  (loop (pq-rest less? pq) (cons z zs)))))))))
> (k-merge3 < list1 list2 list3 list4 list5)
((2 3 4 5 6 6 7 8 9 10 10 11 12 12 14 14 15 15 16 18 18 20
  20 21 21 22 24 25 27 28 30 30 33 35 35 40 42 44 45 49 55 56
  63 66 77 88 99))

Which version do I prefer? I have always used the priority queue method, but after implementing the tournament of pairs, I think I prefer it because it is easy to get right (mine worked the first time I ran it) and, unless there are a large number of large lists, it should be just about as fast as the more complicated priority queue method (mine did not work right the first time I ran it).

You can run the program at http://ideone.com/o2iUf8.

Pages: 1 2

2 Responses to “Multi-Way Merge”

  1. John Cowan said

    Not directly on point, but Olin Shivers’s list merge sorts for SRFI 32 definitely repay study. Both are natural (preserve order) and stable. The destructive version doesn’t allocate any new pairs, it just set-cdr’s everything into shape.

  2. matthew said

    Nice exercise in linked-list bashing. Here’s the tournament sort but arranged to recurse down each half of the list before merging the two results, comes to much the same thing, but maybe has better locality. C++ but only just:

    struct Node {
      Node(int n_,Node *next_): n(n_), next(next_) {}
      int n; Node *next;
    };
    
    // a is an array of n Node * lists.
    Node *merge(Node **a, int n) {
      if (n == 1) return *a;
      Node *p = merge(a,n/2), *q = merge(a+n/2,n-n/2);
      Node *r = NULL, **t = &r;
      while(p && q) {
        if (p->n <= q->n) {
          *t = p; t = &p->next; p = *t;
        } else {
          *t = q; t = &q->next; q = *t;
        }
      }
      if (p) *t = p;
      else if (q) *t = q;
      return r;
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: