List Difference

November 20, 2012

I receive several emails from my readers each week, and I try to answer every one of them (except those that begin “I really admire your blog. You are obviously a great programmer. It has nothing to do with your recent exercise, but I am a student just learning to program, and I need help with my Java program that doesn’t work. Can you look at this program and tell me what is wrong? And they attach 400 lines of code without even bothering to tell me what is the task, or providing sample input and output, or even their name). I love to hear from my readers; your emails sustain me, and keep me writhing.

I had an email from a reader yesterday asking why I didn’t assign list differencing as a task when I assigned list intersection and union. The answer is: because I forgot!

So today we look at list differencing. Given two lists, their difference is the items in the first list that are not in the second list. For our sample lists A containing 4, 7, 12, 6, 17, 5, and 13 and B containing 7, 19, 4, 11, 13, 2, and 15, the difference A − B is 12, 6, 17, and 5 and the difference B − A is 19, 11, 2, and 15.

Your task is to write three list differencing functions, one each with time complexities O(n2), O(n log n), and O(n), and perform timing tests to confirm the expected time complexities. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution in the comments below.

Pages: 1 2

5 Responses to “List Difference”

  1. Paul said

    A Python version. After last weeks exercise, this was not too difficult. A full version can be found here.

    def diff1(a, b):
        """Loop over a and check if ai not in b"""
        return [ai for ai in a if ai not in b]
        
    def diff4(a, b):    
        """Make heaps of a and b. Copy to output elements that are i na but not in b"""
        ha = list(a)
        hb = list(b)
        res = []
        heapq.heapify(ha)
        heapq.heapify(hb)
        while ha and hb:
            if ha[0] < hb[0]:
                res.append(heapq.heappop(ha))
            elif ha[0] > hb[0]:
                heapq.heappop(hb)
            else:
                heapq.heappop(ha)
                heapq.heappop(hb)
        return res + ha # ha can contain a tail
        
    def diff6(a, b):
        """Make set of b and loop over a and check that ai not in set(b)"""
        setb = set(b)
        return [ai for ai in a if ai not in setb]
        
    def diff99(a, b):
        """Make sets of a and b and subtract"""
        return list(set(a) - set(b))
        
    """
    diff1    O(n2)    coeff: 4.01e-06  ratio:    27   alt: O(n3)    Loop over a and check if ai not in b
    diff4    O(nlogn) coeff: 6.06e-07  ratio:     5   alt: O(n)     Make heaps of a and b. Copy to output elements that are i na but not in b
    diff6    O(n)     coeff: 1.38e-08  ratio:     4   alt: O(nlogn) Make set of b and loop over a and check that ai not in set(b)
    diff99   O(nlogn) coeff: 2.07e-08  ratio:     2   alt: O(n)     Make sets of a and b and subtract
    """
    
  2. JP said

    Here’s my try in Racket and Python (including the list intersection and union from the previous post): List algorithms and efficiency

    And just the source code: python, racket

    I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.

  3. Pedro said

    Hi,
    the algorithm complexity is measured in “list operations”, I gather. For example, sets in Python are implemented as (open-addressed) hashes. This
    may make their instantiation much more complex than the creation of a linked list.
    Interesting problem, though.

    Pedro.

  4. JP said

    Wouldn’t the instantiation of the data structure be a one time cost and thus factored out of the big-O runtime? Or is there a cost to growing the hash as more and more items are added to it?

    I did timing graphs for the three which matched up pretty well, but they were based on the Racket versions rather than Python. Perhaps I should try that.

Leave a comment