List Difference
November 20, 2012
I receive several emails from my readers each week, and I try to answer every one of them (except those that begin “I really admire your blog. You are obviously a great programmer. It has nothing to do with your recent exercise, but I am a student just learning to program, and I need help with my Java program that doesn’t work. Can you look at this program and tell me what is wrong? And they attach 400 lines of code without even bothering to tell me what is the task, or providing sample input and output, or even their name). I love to hear from my readers; your emails sustain me, and keep me writhing.
I had an email from a reader yesterday asking why I didn’t assign list differencing as a task when I assigned list intersection and union. The answer is: because I forgot!
So today we look at list differencing. Given two lists, their difference is the items in the first list that are not in the second list. For our sample lists A containing 4, 7, 12, 6, 17, 5, and 13 and B containing 7, 19, 4, 11, 13, 2, and 15, the difference A − B is 12, 6, 17, and 5 and the difference B − A is 19, 11, 2, and 15.
Your task is to write three list differencing functions, one each with time complexities O(n2), O(n log n), and O(n), and perform timing tests to confirm the expected time complexities. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution in the comments below.
A Python version. After last weeks exercise, this was not too difficult. A full version can be found here.
[…] Pages: 1 2 […]
Here’s my try in Racket and Python (including the list intersection and union from the previous post): List algorithms and efficiency
And just the source code: python, racket
I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.
Hi,
the algorithm complexity is measured in “list operations”, I gather. For example, sets in Python are implemented as (open-addressed) hashes. This
may make their instantiation much more complex than the creation of a linked list.
Interesting problem, though.
Pedro.
Wouldn’t the instantiation of the data structure be a one time cost and thus factored out of the big-O runtime? Or is there a cost to growing the hash as more and more items are added to it?
I did timing graphs for the three which matched up pretty well, but they were based on the Racket versions rather than Python. Perhaps I should try that.