List Intersection And Union

November 16, 2012

You are given two linked lists of integers with no duplicates. The intersection of the two lists is a list of integers that are in both lists, and the union of the two lists is a list of integers that are in either list. For instance, if list A contains the integers 4, 7, 12, 6, 17, 5 and 13 and list B contains the integers 7, 19, 4, 11, 13, 2, and 15 their intersection is the list 4, 7, and 13 and their union is the list 4, 7, 12, 6, 17, 5, 13, 19, 11, 2 and 15.

Your task is to write as many different versions of the functions that perform intersection and union as you can think of; you should have versions with time complexities of O(n2), O(n log n) and O(n), and you should perform timing tests to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

8 Responses to “List Intersection And Union”

  1. […] today’s Programming Praxis exercise, our goal is to write union and intersection functions for lists in […]

  2. My Haskell version (see http://bonsaicode.wordpress.com/2012/11/16/programming-praxis-list-intersection-and-union/ for a version with comments):

    import qualified Data.IntSet as I
    import qualified Data.Set as S
    import System.CPUTime
    import Text.Printf
    
    genUnion, genIntersect :: ([a] -> b) -> (a -> b -> Bool) -> [a] -> [a] -> [a]
    genUnion load get xs ys = xs ++ filter (not . flip get (load xs)) ys
    genIntersect load get xs ys = filter (`get` (load ys)) xs
    
    union_n2, intersect_n2 :: Eq a => [a] -> [a] -> [a]
    union_n2        = genUnion     id elem
    intersect_n2    = genIntersect id elem
    
    union_nlogn, intersect_nlogn :: Ord a => [a] -> [a] -> [a]
    union_nlogn     = genUnion     S.fromList S.member
    intersect_nlogn = genIntersect S.fromList S.member
    
    union_n, intersect_n :: [Int] -> [Int] -> [Int]
    union_n         = genUnion     I.fromList I.member
    intersect_n     = genIntersect I.fromList I.member
    
    time :: a -> IO Integer
    time f = do start <- getCPUTime
                _     <- return $! f
                end   <- getCPUTime
                return (div (end - start) (10^9))
    
    benchmark :: ([Int] -> [Int] -> [Int]) -> [Int] -> IO ()
    benchmark f ns = putStrLn . (printf "%7d" =<<) =<<
                     mapM (\n -> time (length $ f [1..n] [1..n])) ns
    
    main :: IO ()
    main = do let a = [4,7,12,6,17,5,13]
              let b = [7,19,4,11,13,2,15]
              let testUnion f = print $ f a b == [4,7,12,6,17,5,13,19,11,2,15]
              let testIsect f = print $ f a b == [4,7,13]
              testUnion union_n2
              testIsect intersect_n2
              testUnion union_nlogn
              testIsect intersect_nlogn
              testUnion union_n
              testIsect intersect_n
    
              let ns = iterate (*2) 10000
              benchmark union_n2    $ take 4 ns
              benchmark union_nlogn $ take 8 ns
              benchmark union_n     $ take 8 ns
    
  3. Paul said

    A Python version. With the results below. The full code is here.

    def inter1(a, b):
        """Loop over a and b and check if there are doubles"""
        return [ai for ai in a for bi in b if ai == bi]
    
    def union1(a, b):
        """Loop over the elements of a+b"""
        res = []
        for x in a + b:
            if x not in res:
                res.append(x)
        return res
    
    def inter2(a, b):
        """Loop over a and check if present in b"""
        return [ai for ai in a if ai in b]
    
    def union2(a, b):
        """Output a + Loop over b and check if present in a"""
        return a + [bi for bi in b if bi not in a]
    
    def inter3(a, b):
        """Output a after removing all elements in b"""
        res = list(a)
        for ai in a:
            if ai not in b: res.remove(ai)
        return res
    
    def inter4(a, b):
        """Make heaps of a and b. Copy to output elements that are in a and b"""
        ha = list(a)
        hb = list(b)
        res = []
        heapq.heapify(ha)
        heapq.heapify(hb)
        while ha and hb:
            if ha[0] < hb[0]:
                heapq.heappop(ha)
            elif ha[0] > hb[0]:
                heapq.heappop(hb)
            else:
                res.append(heapq.heappop(ha))
                heapq.heappop(hb)
        return res
                
    def union4(a, b):    
        """Make heaps of a and b. Copy to output elements that are in a or b"""
        ha = list(a)
        hb = list(b)
        res = []
        heapq.heapify(ha)
        heapq.heapify(hb)
        while ha and hb:
            if ha[0] < hb[0]:
                res.append(heapq.heappop(ha))
            elif ha[0] > hb[0]:
                res.append(heapq.heappop(hb))
            else:
                res.append(heapq.heappop(ha))
                heapq.heappop(hb)
        return res + ha + hb # either ha or hb can contain a tail
    
    def inter99(a, b):
        """Make set of a and b. Output list of intersection."""
        return list(set(a) & set(b))
    
    def union99(a, b):
        """Make set of a and b. Output list of union."""
        return list(set(a) | set(b))
    
    def union99a(a, b):
        """Output list of set of a + b."""
        return list(set(a + b))
    

    Coeff: e.g. for inter1 timings are 8.78e-5*n^2
    Ratio: How much worse fits alternative model (indicated by alt)
    inter1 O(n2) coeff: 8.78e-05 ratio: 704 alt: O(nlogn) Loop over a and b and check if there are doubles
    inter2 O(n2) coeff: 3.12e-05 ratio: 1958 alt: O(nlogn) Loop over a and check if present in b
    inter3 O(n2) coeff: 3.47e-05 ratio: 510 alt: O(nlogn) Output a after removing all elements in b
    inter4 O(nlogn) coeff: 1.03e-05 ratio: 34 alt: O(n) Make heaps of a and b. Copy to output elements that are in a and b
    inter99 O(n) coeff: 2.71e-07 ratio: 5 alt: O(nlogn) Make set of a and b. Output list of intersection.
    union1 O(n2) coeff: 5.83e-05 ratio: 7656 alt: O(nlogn) Loop over the elements of a+b
    union2 O(n2) coeff: 3.06e-05 ratio: 1225 alt: O(nlogn) Output a + Loop over b and check if present in a
    union4 O(nlogn) coeff: 1.09e-05 ratio: 3 alt: O(n) Make heaps of a and b. Copy to output elements that are in a or b
    union99 O(n) coeff: 2.94e-07 ratio: 4 alt: O(nlogn) Make set of a and b. Output list of union.
    union99a O(nlogn) coeff: 2.30e-07 ratio: 1 alt: O(n) Output list of set of a + b.

  4. […] three different list algorithms three times, each with a different runtime complexity. From their first post last week we have list intersection and union and from a newer post yesterday we have the […]

  5. JP said

    Here’s my try in Racket and Python (including the list difference from the next post): List algorithms and efficiency

    And just the source code: python, racket

    I went ahead and graphed the timings as well, to show that they were actually O(n2), O(n log n), and O(n). I was actually a little surprised at how well that turned out.

  6. Christophe said

    Here are my tries in Scheme. I have not run time tests but simply reasoned about the performance.

    
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;; Union and intersection of two lists.                                     ;;;;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    (#%require (only racket printf))
    
    
    (define list1 '(4  7 12 6 17 5 13))
    (define list2 '(7 19 4 11 13 2 15))
    
    
    (define correct-union '(4 7 12 6 17 5 13 19 11 2 15))
    (define correct-inter '(4 7 13))
    ; Naive version. O(n²)
    (define (naive-union-inter l1 l2)
      (let ((union l1)
            (intersection '()))
        (let eat-lists
          ((first l1)
           (second l2))
          ; Union O(n²)
          (if (not (member (car second) union))
              (set! union (cons (car second) union)))
          ; Intersection
          (if (and (member (car first) l2) (not (member (car first) intersection)))
              (set! intersection (cons (car first) intersection)))
          
          (if (and (member (car second) l1) (not (member (car second) intersection)))
              (set! intersection (cons (car second) intersection)))
          
          (if (not (or (eq? '() (cdr first)) (eq? '() (cdr second))))
              (eat-lists (cdr first) (cdr second))))
        ; Test solution
        (display (equal? (sort union <) (sort correct-union <)))(newline)
        (display (equal? (sort intersection <) (sort correct-inter <)))(newline)))
    (naive-union-inter list1 list2)
    
    
    
    ; A little less naive. 
    ; If we assume that the sort method is in O(n log(n))
    ; we can state that the total time complexity for this
    ; algorithm is:
    ;O(n log(n)) + O(n log(n)) + O(n) == O(n log(n))
    (define (less-naive-union l1 l2)
      (let ((sl1  (sort l1 <))
            (sl2  (sort l2 <))
            (union        '())
            (intersection '()))
        (let eat-lists
          ((fst sl1)
           (snd sl2))
          (if (not (and (eq? '() fst) (eq? '() snd)))
          (cond
            ((eq? '() fst)
             (set! union (append snd union)))
            ((eq? '() snd)
             (set! union (append fst union)))
            ((equal? (car fst) (car snd))
             (set! union (cons (car fst) union))
             (set! intersection (cons (car fst) intersection))
             (eat-lists (cdr fst) (cdr snd)))
            ((< (car fst) (car snd))
             (set! union (cons (car fst) union))
             (eat-lists (cdr fst) snd))
            ((> (car fst) (car snd))
             (set! union (cons (car snd) union))
             (eat-lists fst (cdr snd))))))
        
        (printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
        (printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort intersection <) (sort correct-inter <) (equal? (sort intersection <) (sort correct-inter <)))
        ))
    
    (less-naive-union list1 list2)
    
    ; Map a function over a vector.
    ; Function needs to parameters:
    ; The value and the index of the value.
    (define (vector-map! proc v)
      (let map
        ((index 0))
        (vector-set! v index (proc (vector-ref v index) index))
        (if (< index (- (vector-length v) 1))
            (map (+ 1 index)))))
    
    
    ; Less naive O(n).
    ; This solution is actually in O(n) but not in place.
    (define (fast-union l1 l2)
      (let ((max     -1)
            (min +inf.0)
            (vec    '())
            (union  '())
            (inter  '()))
        ; Determine maximum and minmum. O(n)
        (let walk-lists
          ((first-l  l1)
           (second-l l2))
          (if (> (car first-l) max) 
              (set! max (car first-l)))
          (if (< (car first-l) min)
              (set! min (car first-l)))
          (if (> (car second-l) max) 
              (set! max (car second-l)))
          (if (< (car second-l) min) 
              (set! min (car second-l)))
          (if (not (or (eq? '() (cdr first-l)) 
                       (eq? '() (cdr second-l))))
              (walk-lists (cdr first-l) (cdr second-l))))
        ; Create an array of this size. O(n)
        (set! vec (make-vector (+ 1 (- max min)) 0))
        (printf "Vector: ~a~n" vec)
        ; Loop over each list and mark if it has the element. O(n)
        (let ((proc (lambda (x) (vector-set! vec (- x min) (+ 1 (vector-ref vec (- x min)))))))
          (for-each proc l1)
          (for-each proc l2))
        ; Create the union. O(n)
        (vector-map! (lambda (val idx)
                       (if (> val 0)
                           (set! union (cons (+ min idx) union)))
                       val) vec)
        ; Create the intersection O(n)
        (vector-map! (lambda (val idx)
                       (if (equal? 2 val)
                           (set! inter (cons (+ min idx) inter)))
                       val) vec)
        ; Test solution
        (printf "Union: ~a~nExpected Union: ~a~nCorrect?: ~a~n" (sort union <) (sort correct-union <) (equal? (sort union <) (sort correct-union <)))
        (printf "Intersection : ~a~nExpected Intersection: ~a~nCorrect?: ~a~n" (sort inter <) (sort correct-inter <) (equal? (sort inter <) (sort correct-inter <)))
        ))
    (fast-union list1 list2)
    
  7. […] to show that the various time complexities are achieved. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments […]

Leave a comment