List Intersection And Union
November 16, 2012
You are given two linked lists of integers with no duplicates. The intersection of the two lists is a list of integers that are in both lists, and the union of the two lists is a list of integers that are in either list. For instance, if list A contains the integers 4, 7, 12, 6, 17, 5 and 13 and list B contains the integers 7, 19, 4, 11, 13, 2, and 15 their intersection is the list 4, 7, and 13 and their union is the list 4, 7, 12, 6, 17, 5, 13, 19, 11, 2 and 15.
Your task is to write as many different versions of the functions that perform intersection and union as you can think of; you should have versions with time complexities of O(n2), O(n log n) and O(n), and you should perform timing tests to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Pages: 1 2
[...] today’s Programming Praxis exercise, our goal is to write union and intersection functions for lists in [...]
My Haskell version (see http://bonsaicode.wordpress.com/2012/11/16/programming-praxis-list-intersection-and-union/ for a version with comments):
import qualified Data.IntSet as I import qualified Data.Set as S import System.CPUTime import Text.Printf genUnion, genIntersect :: ([a] -> b) -> (a -> b -> Bool) -> [a] -> [a] -> [a] genUnion load get xs ys = xs ++ filter (not . flip get (load xs)) ys genIntersect load get xs ys = filter (`get` (load ys)) xs union_n2, intersect_n2 :: Eq a => [a] -> [a] -> [a] union_n2 = genUnion id elem intersect_n2 = genIntersect id elem union_nlogn, intersect_nlogn :: Ord a => [a] -> [a] -> [a] union_nlogn = genUnion S.fromList S.member intersect_nlogn = genIntersect S.fromList S.member union_n, intersect_n :: [Int] -> [Int] -> [Int] union_n = genUnion I.fromList I.member intersect_n = genIntersect I.fromList I.member time :: a -> IO Integer time f = do start <- getCPUTime _ <- return $! f end <- getCPUTime return (div (end - start) (10^9)) benchmark :: ([Int] -> [Int] -> [Int]) -> [Int] -> IO () benchmark f ns = putStrLn . (printf "%7d" =<<) =<< mapM (\n -> time (length $ f [1..n] [1..n])) ns main :: IO () main = do let a = [4,7,12,6,17,5,13] let b = [7,19,4,11,13,2,15] let testUnion f = print $ f a b == [4,7,12,6,17,5,13,19,11,2,15] let testIsect f = print $ f a b == [4,7,13] testUnion union_n2 testIsect intersect_n2 testUnion union_nlogn testIsect intersect_nlogn testUnion union_n testIsect intersect_n let ns = iterate (*2) 10000 benchmark union_n2 $ take 4 ns benchmark union_nlogn $ take 8 ns benchmark union_n $ take 8 nsA Python version. With the results below. The full code is here.
def inter1(a, b): """Loop over a and b and check if there are doubles""" return [ai for ai in a for bi in b if ai == bi] def union1(a, b): """Loop over the elements of a+b""" res = [] for x in a + b: if x not in res: res.append(x) return res def inter2(a, b): """Loop over a and check if present in b""" return [ai for ai in a if ai in b] def union2(a, b): """Output a + Loop over b and check if present in a""" return a + [bi for bi in b if bi not in a] def inter3(a, b): """Output a after removing all elements in b""" res = list(a) for ai in a: if ai not in b: res.remove(ai) return res def inter4(a, b): """Make heaps of a and b. Copy to output elements that are in a and b""" ha = list(a) hb = list(b) res = [] heapq.heapify(ha) heapq.heapify(hb) while ha and hb: if ha[0] < hb[0]: heapq.heappop(ha) elif ha[0] > hb[0]: heapq.heappop(hb) else: res.append(heapq.heappop(ha)) heapq.heappop(hb) return res def union4(a, b): """Make heaps of a and b. Copy to output elements that are in a or b""" ha = list(a) hb = list(b) res = [] heapq.heapify(ha) heapq.heapify(hb) while ha and hb: if ha[0] < hb[0]: res.append(heapq.heappop(ha)) elif ha[0] > hb[0]: res.append(heapq.heappop(hb)) else: res.append(heapq.heappop(ha)) heapq.heappop(hb) return res + ha + hb # either ha or hb can contain a tail def inter99(a, b): """Make set of a and b. Output list of intersection.""" return list(set(a) & set(b)) def union99(a, b): """Make set of a and b. Output list of union.""" return list(set(a) | set(b)) def union99a(a, b): """Output list of set of a + b.""" return list(set(a + b))Coeff: e.g. for inter1 timings are 8.78e-5*n^2
Ratio: How much worse fits alternative model (indicated by alt)
inter1 O(n2) coeff: 8.78e-05 ratio: 704 alt: O(nlogn) Loop over a and b and check if there are doubles
inter2 O(n2) coeff: 3.12e-05 ratio: 1958 alt: O(nlogn) Loop over a and check if present in b
inter3 O(n2) coeff: 3.47e-05 ratio: 510 alt: O(nlogn) Output a after removing all elements in b
inter4 O(nlogn) coeff: 1.03e-05 ratio: 34 alt: O(n) Make heaps of a and b. Copy to output elements that are in a and b
inter99 O(n) coeff: 2.71e-07 ratio: 5 alt: O(nlogn) Make set of a and b. Output list of intersection.
union1 O(n2) coeff: 5.83e-05 ratio: 7656 alt: O(nlogn) Loop over the elements of a+b
union2 O(n2) coeff: 3.06e-05 ratio: 1225 alt: O(nlogn) Output a + Loop over b and check if present in a
union4 O(nlogn) coeff: 1.09e-05 ratio: 3 alt: O(n) Make heaps of a and b. Copy to output elements that are in a or b
union99 O(n) coeff: 2.94e-07 ratio: 4 alt: O(nlogn) Make set of a and b. Output list of union.
union99a O(nlogn) coeff: 2.30e-07 ratio: 1 alt: O(n) Output list of set of a + b.
[...] Pages: 1 2 [...]
[...] three different list algorithms three times, each with a different runtime complexity. From their first post last week we have list intersection and union and from a newer post yesterday we have the [...]
Here’s my try in Racket and Python (including the list difference from the next post): List algorithms and efficiency
And just the source code: python, racket
I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.