List Difference

November 20, 2012

These are all similar to the intersection and union functions of the prior exercise. Here is the quadratic version of the function, which uses member to look up items in a list:

(define (difference1 xs ys) (let loop ((xs xs) (zs (list))) (cond ((null? xs) zs) ((member (car xs) ys) (loop (cdr xs) zs)) (else (loop (cdr xs) (cons (car xs) zs))))))
The O(n log n) version performs an n log n sort followed by a linear-time “merge”-like operation:

(define (difference2 xs ys) (let loop ((xs (sort < xs)) (ys (sort < ys)) (zs (list))) (cond ((null? xs) zs) ((null? ys) (append xs zs)) ((< (car xs) (car ys)) (loop (cdr xs) ys (cons (car xs) zs))) ((< (car ys) (car xs)) (loop xs (cdr ys) zs)) (else (loop (cdr xs) (cdr ys) zs)))))

The linear version of the function is the same as the quadratic version, but uses a hash table instead of a list:

(define (difference3 xs ys) (let ((h (make-hash identity = #f 9973))) (do ((ys ys (cdr ys))) ((null? ys)) (h 'insert (car ys) (car ys))) (let loop ((xs xs) (zs (list))) (cond ((null? xs) zs) ((h 'lookup (car xs)) (loop (cdr xs) zs)) (else (loop (cdr xs) (cons (car xs) zs)))))))

The timing test is similar to the prior exercise:

(define (time-test difference n) (let ((a (rand-list n)) (b (rand-list n))) (time (begin (difference a b) (difference b a) (if #f #f)))))

Our results are also similar to the prior exercise. As before, sorting is fast, and our hash tables aren’t:

n 100000 200000 400000 800000 ------ ------ ------ ------ ------ first 452 1794 7114 28626 second 16 31 63 125 third 47 94 203 515

We used make-hash, sort and identity from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/4FQ7oMpV.

Posted by programmingpraxis

Filed in Exercises

5 Comments »

5 Responses to “List Difference”

Paul said

November 20, 2012 at 9:54 AM

A Python version. After last weeks exercise, this was not too difficult. A full version can be found here.

def diff1(a, b):
    """Loop over a and check if ai not in b"""
    return [ai for ai in a if ai not in b]
    
def diff4(a, b):    
    """Make heaps of a and b. Copy to output elements that are i na but not in b"""
    ha = list(a)
    hb = list(b)
    res = []
    heapq.heapify(ha)
    heapq.heapify(hb)
    while ha and hb:
        if ha[0] < hb[0]:
            res.append(heapq.heappop(ha))
        elif ha[0] > hb[0]:
            heapq.heappop(hb)
        else:
            heapq.heappop(ha)
            heapq.heappop(hb)
    return res + ha # ha can contain a tail
    
def diff6(a, b):
    """Make set of b and loop over a and check that ai not in set(b)"""
    setb = set(b)
    return [ai for ai in a if ai not in setb]
    
def diff99(a, b):
    """Make sets of a and b and subtract"""
    return list(set(a) - set(b))
    
"""
diff1    O(n2)    coeff: 4.01e-06  ratio:    27   alt: O(n3)    Loop over a and check if ai not in b
diff4    O(nlogn) coeff: 6.06e-07  ratio:     5   alt: O(n)     Make heaps of a and b. Copy to output elements that are i na but not in b
diff6    O(n)     coeff: 1.38e-08  ratio:     4   alt: O(nlogn) Make set of b and loop over a and check that ai not in set(b)
diff99   O(nlogn) coeff: 2.07e-08  ratio:     2   alt: O(n)     Make sets of a and b and subtract
"""

List Difference | My Blog said
November 20, 2012 at 8:03 PM
[…] Pages: 1 2 […]
JP said
November 28, 2012 at 5:07 AM
Here’s my try in Racket and Python (including the list intersection and union from the previous post): List algorithms and efficiency

And just the source code: python, racket

I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.
Pedro said
December 11, 2012 at 6:11 PM
Hi,
the algorithm complexity is measured in “list operations”, I gather. For example, sets in Python are implemented as (open-addressed) hashes. This
may make their instantiation much more complex than the creation of a linked list.
Interesting problem, though.

Pedro.
JP said
December 11, 2012 at 6:48 PM
Wouldn’t the instantiation of the data structure be a one time cost and thus factored out of the big-O runtime? Or is there a cost to growing the hash as more and more items are added to it?

I did timing graphs for the three which matched up pretty well, but they were based on the Racket versions rather than Python. Perhaps I should try that.

Programming Praxis

List Difference

November 20, 2012

5 Responses to “List Difference”

Leave a comment

Categories

Archives

Archives

Programming Praxis

List Difference

November 20, 2012

Share this:

Related

5 Responses to “List Difference”

Leave a comment

Categories

Archives

Archives