List Intersection And Union
November 16, 2012
Our first algorithm is quadratic: O(mn) where the two lists have lengths m and n, and O(n2) assuming that m and n are similar. The idea is to compare each integer in one list to all the integers in the other list; for intersection, keep all the integers that are in both lists, and for union, keep all the integers in one list plus all the integers in the other list that aren’t in the first list:
(define (intersect1 xs ys)
(let loop ((ys ys) (zs (list)))
(cond ((null? ys) zs)
((member (car ys) xs)
(loop (cdr ys) (cons (car ys) zs)))
(else (loop (cdr ys) zs)))))
(define (union1 xs ys)
(let loop ((ys ys) (zs xs))
(cond ((null? ys) zs)
((member (car ys) xs)
(loop (cdr ys) zs))
(else (loop (cdr ys) (cons (car ys) zs))))))
This is quadratic, even though it looks linear, because member
takes linear time to scan the entire list.
Our second algorithm is O(m log n); we sort each of the input lists, then run through the lists in order:
(define (intersect2 xs ys)
(let loop ((xs (sort < xs)) (ys (sort < ys)) (zs (list)))
(cond ((or (null? xs) (null? ys)) zs)
((< (car xs) (car ys)) (loop (cdr xs) ys zs))
((< (car ys) (car xs)) (loop xs (cdr ys) zs))
(else (loop (cdr xs) (cdr ys) (cons (car xs) zs))))))
(define (union2 xs ys)
(let loop ((xs (sort < xs)) (ys (sort < ys)) (zs (list)))
(cond ((null? xs) (append ys zs))
((null? ys) (append xs zs))
((< (car xs) (car ys))
(loop (cdr xs) ys (cons (car xs) zs)))
((< (car ys) (car xs))
(loop xs (cdr ys) (cons (car ys) zs)))
(else (loop (cdr xs) (cdr ys) (cons (car xs) zs))))))
Our third algorithm takes O(m+n) time, which is linear. It is similar to our first algorithm, except that instead of using member
to scan the xs list we enter all the elements of xs in a hash table which can be accessed in constant time:
(define (intersect3 xs ys)
(let ((h (make-hash identity = #f 9973)))
(do ((xs xs (cdr xs))) ((null? xs))
(h 'insert (car xs) (car xs)))
(let loop ((ys ys) (zs (list)))
(cond ((null? ys) zs)
((h 'lookup (car ys))
(loop (cdr ys) (cons (car ys) zs)))
(else (loop (cdr ys) zs))))))
(define (union3 xs ys)
(let ((h (make-hash identity = #f 9973)))
(do ((xs xs (cdr xs))) ((null? xs))
(h 'insert (car xs) (car xs)))
(let loop ((ys ys) (zs xs))
(cond ((null? ys) zs)
((h 'lookup (car ys))
(loop (cdr ys) zs))
(else (loop (cdr ys) (cons (car ys) zs)))))))
Here are some examples:
> (define a (list 4 7 12 6 17 5 13))
> (define b (list 7 19 4 11 13 2 15))
> (intersect1 a b)
(13 4 7)
> (union1 a b)
(15 2 11 19 4 7 12 6 17 5 13)
> (intersect2 a b)
(13 7 4)
> (union2 a b)
(19 17 15 13 12 11 7 6 5 4 2)
> (intersect3 a b)
(13 4 7)
> (union3 a b)
(15 2 11 19 4 7 12 6 17 5 13)
We did some timing tests to show that the various implementations achieve their expected time bounds. To begin, here is a function that returns a random list of one tenth of the non-negative integers less than n
(define (rand-list n)
(take (quotient n 10) (shuffle (range n))))
Then we perform the timings and throw away the results; the expression (if #f #f)
is the standard Scheme idiom for generating a void
value:
(define (time-test intersect union n)
(let ((a (rand-list n)) (b (rand-list n)))
(time (begin (intersect a b) (union a b) (if #f #f)))))
Here are our results, with all times in milliseconds:
n 100000 200000 400000 800000
------ ------ ------ ------ ------
first 452 1997 7130 28579
second 16 31 63 125
third 31 78 187 421
The first algorithm is clearly quadratic; when the number of items doubles, time quadruples. The second and third algorithms show clearly that our sort
function is very much better than our hash tables.
We used hash tables and the identity
function from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/79FY5BVP.
[…] today’s Programming Praxis exercise, our goal is to write union and intersection functions for lists in […]
My Haskell version (see http://bonsaicode.wordpress.com/2012/11/16/programming-praxis-list-intersection-and-union/ for a version with comments):
A Python version. With the results below. The full code is here.
Coeff: e.g. for inter1 timings are 8.78e-5*n^2
Ratio: How much worse fits alternative model (indicated by alt)
inter1 O(n2) coeff: 8.78e-05 ratio: 704 alt: O(nlogn) Loop over a and b and check if there are doubles
inter2 O(n2) coeff: 3.12e-05 ratio: 1958 alt: O(nlogn) Loop over a and check if present in b
inter3 O(n2) coeff: 3.47e-05 ratio: 510 alt: O(nlogn) Output a after removing all elements in b
inter4 O(nlogn) coeff: 1.03e-05 ratio: 34 alt: O(n) Make heaps of a and b. Copy to output elements that are in a and b
inter99 O(n) coeff: 2.71e-07 ratio: 5 alt: O(nlogn) Make set of a and b. Output list of intersection.
union1 O(n2) coeff: 5.83e-05 ratio: 7656 alt: O(nlogn) Loop over the elements of a+b
union2 O(n2) coeff: 3.06e-05 ratio: 1225 alt: O(nlogn) Output a + Loop over b and check if present in a
union4 O(nlogn) coeff: 1.09e-05 ratio: 3 alt: O(n) Make heaps of a and b. Copy to output elements that are in a or b
union99 O(n) coeff: 2.94e-07 ratio: 4 alt: O(nlogn) Make set of a and b. Output list of union.
union99a O(nlogn) coeff: 2.30e-07 ratio: 1 alt: O(n) Output list of set of a + b.
[…] Pages: 1 2 […]
[…] three different list algorithms three times, each with a different runtime complexity. From their first post last week we have list intersection and union and from a newer post yesterday we have the […]
Here’s my try in Racket and Python (including the list difference from the next post): List algorithms and efficiency
And just the source code: python, racket
I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.
Here are my tries in Scheme. I have not run time tests but simply reasoned about the performance.
[…] to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments […]