List Difference
November 20, 2012
These are all similar to the intersection
and union
functions of the prior exercise. Here is the quadratic version of the function, which uses member
to look up items in a list:
(define (difference1 xs ys)
(let loop ((xs xs) (zs (list)))
(cond ((null? xs) zs)
((member (car xs) ys)
(loop (cdr xs) zs))
(else (loop (cdr xs) (cons (car xs) zs))))))
The O(n log n) version performs an n log n sort followed by a linear-time “merge”-like operation:
(define (difference2 xs ys)
(let loop ((xs (sort < xs)) (ys (sort < ys)) (zs (list)))
(cond ((null? xs) zs)
((null? ys) (append xs zs))
((< (car xs) (car ys))
(loop (cdr xs) ys (cons (car xs) zs)))
((< (car ys) (car xs))
(loop xs (cdr ys) zs))
(else (loop (cdr xs) (cdr ys) zs)))))
The linear version of the function is the same as the quadratic version, but uses a hash table instead of a list:
(define (difference3 xs ys)
(let ((h (make-hash identity = #f 9973)))
(do ((ys ys (cdr ys))) ((null? ys))
(h 'insert (car ys) (car ys)))
(let loop ((xs xs) (zs (list)))
(cond ((null? xs) zs)
((h 'lookup (car xs))
(loop (cdr xs) zs))
(else (loop (cdr xs) (cons (car xs) zs)))))))
The timing test is similar to the prior exercise:
(define (time-test difference n)
(let ((a (rand-list n)) (b (rand-list n)))
(time (begin (difference a b) (difference b a) (if #f #f)))))
Our results are also similar to the prior exercise. As before, sorting is fast, and our hash tables aren’t:
n 100000 200000 400000 800000
------ ------ ------ ------ ------
first 452 1794 7114 28626
second 16 31 63 125
third 47 94 203 515
We used make-hash
, sort
and identity
from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/4FQ7oMpV.
A Python version. After last weeks exercise, this was not too difficult. A full version can be found here.
[…] Pages: 1 2 […]
Here’s my try in Racket and Python (including the list intersection and union from the previous post): List algorithms and efficiency
And just the source code: python, racket
I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.
Hi,
the algorithm complexity is measured in “list operations”, I gather. For example, sets in Python are implemented as (open-addressed) hashes. This
may make their instantiation much more complex than the creation of a linked list.
Interesting problem, though.
Pedro.
Wouldn’t the instantiation of the data structure be a one time cost and thus factored out of the big-O runtime? Or is there a cost to growing the hash as more and more items are added to it?
I did timing graphs for the three which matched up pretty well, but they were based on the Racket versions rather than Python. Perhaps I should try that.