## List Intersection And Union

### November 16, 2012

Our first algorithm is quadratic: O(mn) where the two lists have lengths m and n, and O(n2) assuming that m and n are similar. The idea is to compare each integer in one list to all the integers in the other list; for intersection, keep all the integers that are in both lists, and for union, keep all the integers in one list plus all the integers in the other list that aren’t in the first list:

```(define (intersect1 xs ys)   (let loop ((ys ys) (zs (list)))     (cond ((null? ys) zs)           ((member (car ys) xs)             (loop (cdr ys) (cons (car ys) zs)))           (else (loop (cdr ys) zs)))))```

```(define (union1 xs ys)   (let loop ((ys ys) (zs xs))     (cond ((null? ys) zs)           ((member (car ys) xs)             (loop (cdr ys) zs))           (else (loop (cdr ys) (cons (car ys) zs))))))```

This is quadratic, even though it looks linear, because `member` takes linear time to scan the entire list.

Our second algorithm is O(m log n); we sort each of the input lists, then run through the lists in order:

```(define (intersect2 xs ys)   (let loop ((xs (sort < xs)) (ys (sort < ys)) (zs (list)))     (cond ((or (null? xs) (null? ys)) zs)           ((< (car xs) (car ys)) (loop (cdr xs) ys zs))           ((< (car ys) (car xs)) (loop xs (cdr ys) zs))           (else (loop (cdr xs) (cdr ys) (cons (car xs) zs))))))```

```(define (union2 xs ys)   (let loop ((xs (sort < xs)) (ys (sort < ys)) (zs (list)))     (cond ((null? xs) (append ys zs))           ((null? ys) (append xs zs))           ((< (car xs) (car ys))             (loop (cdr xs) ys (cons (car xs) zs)))           ((< (car ys) (car xs))             (loop xs (cdr ys) (cons (car ys) zs)))           (else (loop (cdr xs) (cdr ys) (cons (car xs) zs))))))```

Our third algorithm takes O(m+n) time, which is linear. It is similar to our first algorithm, except that instead of using `member` to scan the xs list we enter all the elements of xs in a hash table which can be accessed in constant time:

```(define (intersect3 xs ys)   (let ((h (make-hash identity = #f 9973)))     (do ((xs xs (cdr xs))) ((null? xs))       (h 'insert (car xs) (car xs)))     (let loop ((ys ys) (zs (list)))       (cond ((null? ys) zs)             ((h 'lookup (car ys))               (loop (cdr ys) (cons (car ys) zs)))             (else (loop (cdr ys) zs))))))```

```(define (union3 xs ys)   (let ((h (make-hash identity = #f 9973)))     (do ((xs xs (cdr xs))) ((null? xs))       (h 'insert (car xs) (car xs)))     (let loop ((ys ys) (zs xs))       (cond ((null? ys) zs)             ((h 'lookup (car ys))               (loop (cdr ys) zs))             (else (loop (cdr ys) (cons (car ys) zs)))))))```

Here are some examples:

```> (define a (list 4 7 12 6 17 5 13)) > (define b (list 7 19 4 11 13 2 15)) > (intersect1 a b) (13 4 7) > (union1 a b) (15 2 11 19 4 7 12 6 17 5 13) > (intersect2 a b) (13 7 4) > (union2 a b) (19 17 15 13 12 11 7 6 5 4 2) > (intersect3 a b) (13 4 7) > (union3 a b) (15 2 11 19 4 7 12 6 17 5 13)```

We did some timing tests to show that the various implementations achieve their expected time bounds. To begin, here is a function that returns a random list of one tenth of the non-negative integers less than n

```(define (rand-list n)   (take (quotient n 10) (shuffle (range n))))```

Then we perform the timings and throw away the results; the expression `(if #f #f)` is the standard Scheme idiom for generating a `void` value:

```(define (time-test intersect union n)   (let ((a (rand-list n)) (b (rand-list n)))     (time (begin (intersect a b) (union a b) (if #f #f)))))```

Here are our results, with all times in milliseconds:

```n       100000  200000  400000  800000 ------  ------  ------  ------  ------ first      452    1997    7130   28579 second      16      31      63     125 third       31      78     187     421```

The first algorithm is clearly quadratic; when the number of items doubles, time quadruples. The second and third algorithms show clearly that our `sort` function is very much better than our hash tables.

We used hash tables and the `identity` function from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/79FY5BVP.

Pages: 1 2

### 8 Responses to “List Intersection And Union”

1. […] today’s Programming Praxis exercise, our goal is to write union and intersection functions for lists in […]

```import qualified Data.IntSet as I
import qualified Data.Set as S
import System.CPUTime
import Text.Printf

genUnion, genIntersect :: ([a] -> b) -> (a -> b -> Bool) -> [a] -> [a] -> [a]
genUnion load get xs ys = xs ++ filter (not . flip get (load xs)) ys

union_n2, intersect_n2 :: Eq a => [a] -> [a] -> [a]
union_n2        = genUnion     id elem
intersect_n2    = genIntersect id elem

union_nlogn, intersect_nlogn :: Ord a => [a] -> [a] -> [a]
union_nlogn     = genUnion     S.fromList S.member
intersect_nlogn = genIntersect S.fromList S.member

union_n, intersect_n :: [Int] -> [Int] -> [Int]
union_n         = genUnion     I.fromList I.member
intersect_n     = genIntersect I.fromList I.member

time :: a -> IO Integer
time f = do start <- getCPUTime
_     <- return \$! f
end   <- getCPUTime
return (div (end - start) (10^9))

benchmark :: ([Int] -> [Int] -> [Int]) -> [Int] -> IO ()
benchmark f ns = putStrLn . (printf "%7d" =<<) =<<
mapM (\n -> time (length \$ f [1..n] [1..n])) ns

main :: IO ()
main = do let a = [4,7,12,6,17,5,13]
let b = [7,19,4,11,13,2,15]
let testUnion f = print \$ f a b == [4,7,12,6,17,5,13,19,11,2,15]
let testIsect f = print \$ f a b == [4,7,13]
testUnion union_n2
testIsect intersect_n2
testUnion union_nlogn
testIsect intersect_nlogn
testUnion union_n
testIsect intersect_n

let ns = iterate (*2) 10000
benchmark union_n2    \$ take 4 ns
benchmark union_nlogn \$ take 8 ns
benchmark union_n     \$ take 8 ns
```
3. Paul said

A Python version. With the results below. The full code is here.

```def inter1(a, b):
"""Loop over a and b and check if there are doubles"""
return [ai for ai in a for bi in b if ai == bi]

def union1(a, b):
"""Loop over the elements of a+b"""
res = []
for x in a + b:
if x not in res:
res.append(x)
return res

def inter2(a, b):
"""Loop over a and check if present in b"""
return [ai for ai in a if ai in b]

def union2(a, b):
"""Output a + Loop over b and check if present in a"""
return a + [bi for bi in b if bi not in a]

def inter3(a, b):
"""Output a after removing all elements in b"""
res = list(a)
for ai in a:
if ai not in b: res.remove(ai)
return res

def inter4(a, b):
"""Make heaps of a and b. Copy to output elements that are in a and b"""
ha = list(a)
hb = list(b)
res = []
heapq.heapify(ha)
heapq.heapify(hb)
while ha and hb:
if ha[0] < hb[0]:
heapq.heappop(ha)
elif ha[0] > hb[0]:
heapq.heappop(hb)
else:
res.append(heapq.heappop(ha))
heapq.heappop(hb)
return res

def union4(a, b):
"""Make heaps of a and b. Copy to output elements that are in a or b"""
ha = list(a)
hb = list(b)
res = []
heapq.heapify(ha)
heapq.heapify(hb)
while ha and hb:
if ha[0] < hb[0]:
res.append(heapq.heappop(ha))
elif ha[0] > hb[0]:
res.append(heapq.heappop(hb))
else:
res.append(heapq.heappop(ha))
heapq.heappop(hb)
return res + ha + hb # either ha or hb can contain a tail

def inter99(a, b):
"""Make set of a and b. Output list of intersection."""
return list(set(a) & set(b))

def union99(a, b):
"""Make set of a and b. Output list of union."""
return list(set(a) | set(b))

def union99a(a, b):
"""Output list of set of a + b."""
return list(set(a + b))
```

Coeff: e.g. for inter1 timings are 8.78e-5*n^2
Ratio: How much worse fits alternative model (indicated by alt)
inter1 O(n2) coeff: 8.78e-05 ratio: 704 alt: O(nlogn) Loop over a and b and check if there are doubles
inter2 O(n2) coeff: 3.12e-05 ratio: 1958 alt: O(nlogn) Loop over a and check if present in b
inter3 O(n2) coeff: 3.47e-05 ratio: 510 alt: O(nlogn) Output a after removing all elements in b
inter4 O(nlogn) coeff: 1.03e-05 ratio: 34 alt: O(n) Make heaps of a and b. Copy to output elements that are in a and b
inter99 O(n) coeff: 2.71e-07 ratio: 5 alt: O(nlogn) Make set of a and b. Output list of intersection.
union1 O(n2) coeff: 5.83e-05 ratio: 7656 alt: O(nlogn) Loop over the elements of a+b
union2 O(n2) coeff: 3.06e-05 ratio: 1225 alt: O(nlogn) Output a + Loop over b and check if present in a
union4 O(nlogn) coeff: 1.09e-05 ratio: 3 alt: O(n) Make heaps of a and b. Copy to output elements that are in a or b
union99 O(n) coeff: 2.94e-07 ratio: 4 alt: O(nlogn) Make set of a and b. Output list of union.
union99a O(nlogn) coeff: 2.30e-07 ratio: 1 alt: O(n) Output list of set of a + b.

4. […] Pages: 1 2 […]

5. […] three different list algorithms three times, each with a different runtime complexity. From their first post last week we have list intersection and union and from a newer post yesterday we have the […]

6. JP said

Here’s my try in Racket and Python (including the list difference from the next post): List algorithms and efficiency

And just the source code: python, racket

I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.

7. Christophe said

Here are my tries in Scheme. I have not run time tests but simply reasoned about the performance.

```
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Union and intersection of two lists.                                     ;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(#%require (only racket printf))

(define list1 '(4  7 12 6 17 5 13))
(define list2 '(7 19 4 11 13 2 15))

(define correct-union '(4 7 12 6 17 5 13 19 11 2 15))
(define correct-inter '(4 7 13))
; Naive version. O(n²)
(define (naive-union-inter l1 l2)
(let ((union l1)
(intersection '()))
(let eat-lists
((first l1)
(second l2))
; Union O(n²)
(if (not (member (car second) union))
(set! union (cons (car second) union)))
; Intersection
(if (and (member (car first) l2) (not (member (car first) intersection)))
(set! intersection (cons (car first) intersection)))

(if (and (member (car second) l1) (not (member (car second) intersection)))
(set! intersection (cons (car second) intersection)))

(if (not (or (eq? '() (cdr first)) (eq? '() (cdr second))))
(eat-lists (cdr first) (cdr second))))
; Test solution
(display (equal? (sort union <) (sort correct-union <)))(newline)
(display (equal? (sort intersection <) (sort correct-inter <)))(newline)))
(naive-union-inter list1 list2)

; A little less naive.
; If we assume that the sort method is in O(n log(n))
; we can state that the total time complexity for this
; algorithm is:
;O(n log(n)) + O(n log(n)) + O(n) == O(n log(n))
(define (less-naive-union l1 l2)
(let ((sl1  (sort l1 <))
(sl2  (sort l2 <))
(union        '())
(intersection '()))
(let eat-lists
((fst sl1)
(snd sl2))
(if (not (and (eq? '() fst) (eq? '() snd)))
(cond
((eq? '() fst)
(set! union (append snd union)))
((eq? '() snd)
(set! union (append fst union)))
((equal? (car fst) (car snd))
(set! union (cons (car fst) union))
(set! intersection (cons (car fst) intersection))
(eat-lists (cdr fst) (cdr snd)))
((< (car fst) (car snd))
(set! union (cons (car fst) union))
(eat-lists (cdr fst) snd))
((> (car fst) (car snd))
(set! union (cons (car snd) union))
(eat-lists fst (cdr snd))))))

(printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
(printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort intersection <) (sort correct-inter <) (equal? (sort intersection <) (sort correct-inter <)))
))

(less-naive-union list1 list2)

; Map a function over a vector.
; Function needs to parameters:
; The value and the index of the value.
(define (vector-map! proc v)
(let map
((index 0))
(vector-set! v index (proc (vector-ref v index) index))
(if (< index (- (vector-length v) 1))
(map (+ 1 index)))))

; Less naive O(n).
; This solution is actually in O(n) but not in place.
(define (fast-union l1 l2)
(let ((max     -1)
(min +inf.0)
(vec    '())
(union  '())
(inter  '()))
; Determine maximum and minmum. O(n)
(let walk-lists
((first-l  l1)
(second-l l2))
(if (> (car first-l) max)
(set! max (car first-l)))
(if (< (car first-l) min)
(set! min (car first-l)))
(if (> (car second-l) max)
(set! max (car second-l)))
(if (< (car second-l) min)
(set! min (car second-l)))
(if (not (or (eq? '() (cdr first-l))
(eq? '() (cdr second-l))))
(walk-lists (cdr first-l) (cdr second-l))))
; Create an array of this size. O(n)
(set! vec (make-vector (+ 1 (- max min)) 0))
(printf "Vector: ~a~n" vec)
; Loop over each list and mark if it has the element. O(n)
(let ((proc (lambda (x) (vector-set! vec (- x min) (+ 1 (vector-ref vec (- x min)))))))
(for-each proc l1)
(for-each proc l2))
; Create the union. O(n)
(vector-map! (lambda (val idx)
(if (> val 0)
(set! union (cons (+ min idx) union)))
val) vec)
; Create the intersection O(n)
(vector-map! (lambda (val idx)
(if (equal? 2 val)
(set! inter (cons (+ min idx) inter)))
val) vec)
; Test solution
(printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
(printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort inter <) (sort correct-inter <) (equal? (sort inter <) (sort correct-inter <)))
))
(fast-union list1 list2)
```
8. […] to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments […]