Career Cup

December 6, 2016

Regular readers know that I sometimes find inspiration for these exercises at Career Cup:

Given two sorted arrays, efficiently find the median of the combined array.

Your task is to write the indicated program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisements

Pages: 1 2

8 Responses to “Career Cup”

  1. Colin said

    I’m still working through it, but isn’t there a log^2 n solution? I think it would have a form like this:

    determine the target index (n/2), then

    find-value-at-merged-index: (
    if either array is empty, pull from the target index in the other array
    otherwise: (
    pull the median of the larger array;
    determine its index in the “merged” array by binary searching in the smaller array to determine how many elements are smaller than it; (O log n)
    if that location is:
    the median index: we’re done
    greater than the median index: “discard” the upper half of the larger array and all greater elements in the smaller array, then recur (O log n times)
    less than the median index: “discard” the lower half of the larger array and all lesser elements in the smaller array, decrement target index by that number, then recur (O log n times)
    )
    )

  2. Paul said

    I am not sure if it is is O(n), but I have code working (randomly tested it against brute force) like this: determine “pseudo medians” as ar[n/2-1] for both arrays (n is array length). If the “median” of the smaller array is smaller, than throw away the lower half of the smaller array and the same number of higher elements of the larger array, otherwise throw away the other half and the lowest elements form the larger array. Then repeat this process until the smaller array has a lenght smaller or equal to 2 and handle all cases with even and odd number of elements in larger array. The code works, but is rather messy. After cleanup I will post.

  3. Paul said

    I tried to post this code on ideone, but somehow it won’t run. Apart from brute force there are 2 functions. median_s implements the idea sketched above and uses slices for every step (recursion). That makes the running time no better than O(n). median_s is basically the same, but is using pointers at every step. No copies of the data is made. Therefore the work in every step is the same, making this run in about O(ln). Timings are below

    from time import clock
    from random import randrange
    
    def median(arr):
        'only used for array lengths <= 4, array should be sorted'
        n = len(arr)
        if n == 0:
            return None
        arr = sorted(arr)
        half, odd = divmod(n, 2)
        return arr[half] if odd else (arr[half] + arr[half-1]) / 2
    
    def median_s(ar1, ar2):
        """D&C method using slices """
        n, m = len(ar1), len(ar2)
        m2 = m // 2
        if n > m:
            return median_s(ar2, ar1)
        if n == 0:
            if m == 0:
                return None
            return ar2[m2] if m & 1 else (ar2[m2] + ar2[m2-1]) / 2
        a = ar1[0]
        if n == 1:
            if m == 1:
                return (ar1[0] + ar2[0]) / 2
            if m & 1:
                med = median([a, ar2[m2-1], ar2[m2+1]])
                return (med + ar2[m2]) / 2
            return median([a, ar2[m2], ar2[m2-1]])
        if n == 2:
            if m == 2:
                return median(ar1 + ar2)
            if m & 1:
                return median([ar2[m2], max(ar1[0], ar2[m2-1]),
                               min(ar1[1], ar2[m2+1])])
            return median(
                [ar2[m2], ar2[m2 - 1], max(ar1[0], ar2[m2 - 2]),
                 min(ar1[1], ar2[m2 + 1])])
        idx1, idx2 = (n - 1) // 2, (m - 1) // 2
        if ar1[idx1] <= ar2[idx2]:
            return median_s(ar1[idx1:], ar2[:-idx1])  # slices
        else:
            return median_s(ar1[:-idx1], ar2[idx1:])  # slices
    
    def median_p(ar1, ar2):
        """D&C method using pointers"""
        n, m = len(ar1), len(ar2)
        if n > m:
            return median_p(ar2, ar1)
        return median2a(ar1, 0, n, ar2, 0, m)
    
    def median2a(ar1, b1, e1, ar2, b2, e2):
        n, m = e1 - b1, e2 - b2
        m2 = m // 2
        if n == 0:
            if m == 0:
                return None
            return ar2[b2+m2] if m & 1 else (ar2[b2+m2] + ar2[b2+m2-1]) / 2
        a = ar1[b1+0]
        if n == 1:
            if m == 1:
                return (ar1[b1+0] + ar2[b2+0]) / 2
            if m & 1:
                med = median([a, ar2[b2+m2-1], ar2[b2+m2+1]])
                return (med + ar2[b2+m2]) / 2
            return median([a, ar2[b2+m2], ar2[b2+m2-1]])
        if n == 2:
            if m == 2:
                return median(ar1 + ar2)
            if m & 1:
                return median([ar2[b2+m2], max(ar1[b1+0], ar2[b2+m2-1]),
                               min(ar1[b1+1], ar2[b2+m2+1])])
            return median(
                [ar2[b2+m2], ar2[b2+m2 - 1], max(ar1[b1+0], ar2[b2+m2 - 2]),
                 min(ar1[b1+1], ar2[b2+m2 + 1])])
        idx1, idx2 = (n - 1) // 2, (m - 1) // 2
        if ar1[b1+idx1] <= ar2[b2+idx2]:
            return median2a(ar1, b1 + idx1, e1, ar2, b2, e2 - idx1)
        else:
            return median2a(ar1, b1, e1 - idx1, ar2, b2 + idx1, e2)
    
    def brute_force(ar1, ar2):
        """brute force"""
        return median(ar1+ar2)
    
    # random testing against brute force
    for case in range(1000):
        n = randrange(0, 41)
        m = n + randrange(max(0, n-2), n+3)
        ar1 = sorted([randrange(1, 30) for _ in range(n)])
        ar2 = sorted([randrange(1, 30) for _ in range(m)])
        mbf = brute_force(ar1, ar2)
        med = median_p(ar1, ar2)
        # med = median_s(ar1, ar2)
        assert med == mbf, "{} {} {} {}".format(med, mbf, ar1, ar2)
    
    def benchmark(method, ar1s, ar2s):
        t0 = clock()
        for case in range(N):
            method(ar1s[case], ar2s[case])
        print("{:25s} time: {:6.2f} ".format(method.__doc__, 1000*(clock() - t0)))
    
    N = 100
    print("times in ms")
    for M in [100, 200, 400, 800, 1600]:
        print(M)
        ar1s = []
        ar2s = []
        for case in range(N):
            n = randrange(0, M)
            m = n + randrange(max(0, n-2), n+3)
            ar1s.append(sorted([randrange(1, 30) for _ in range(n)]))
            ar2s.append(sorted([randrange(1, 30) for _ in range(m)]))
        benchmark(brute_force, ar1s, ar2s)
        benchmark(median_s, ar1s, ar2s)
        benchmark(median_p, ar1s, ar2s)
    
    times in ms
    100
    brute force               time:   1.44
    D&C method using slices   time:   1.90
    D&C method using pointers time:   1.40
    200
    brute force               time:   2.13
    D&C method using slices   time:   2.27
    D&C method using pointers time:   1.54
    400
    brute force               time:   3.58
    D&C method using slices   time:   3.18
    D&C method using pointers time:   1.76
    800
    brute force               time:   6.20
    D&C method using slices   time:   4.77
    D&C method using pointers time:   1.99
    1600
    brute force               time:  10.32
    D&C method using slices   time:   7.55
    D&C method using pointers time:   2.17
    
  4. Colin said

    Not exhaustively tested, but worked on a few simple cases:

    ; given two sorted seq-ables, return a lazy sequence of their merge
    ; not actually used below, but would be good for generative testing
    (defn merge [a b]
      (if-let [(fa & ra) a]
        (if-let [(fb & rb) b]
          (lazy-seq (if (< fa fb)
            (cons fa (merge ra b))
            (cons fb (merge a rb))))
          a)
        b))
    
    
    ; find the first i such that (not (f (v i)), or (count v) if all true.
    ; assumes a single true -> false transition
    (defn bsearch [v f]
      (loop [lo 0, hi (count v)] ; lo is inclusive, hi is exclusive
        (if (= lo hi)
          lo
          (let [mid (quot (+ lo hi) 2)]
            (if (f (v mid))
              (recur (inc mid) hi)
              (recur lo mid))))))
    
    ; given sorted vectors a and b, and 0 <= i < (+ (count a) (count b)),
    ; return the "i'th" element of the result of merging a and b
    ; (assert (= (find-merged-value-at-index i a b) (nth (merge a b) i)))
    (defn find-merged-value-at-index [i a b]
      (if (> (count a) (count b))
        (recur i b a)
        (let [mb (quot (count b) 2)
              bmb (b mb)
              na<bmb (bsearch a #(< % bmb))
              loc-bmb (+ na<bmb mb)]
          (cond
            (< i loc-bmb) (recur i (subvec a 0 na<bmb) (subvec b 0 mb))
            (> i loc-bmb) (recur (- i (inc loc-bmb)) (subvec a na<bmb) (subvec b (inc mb)))
            :otherwise bmb)))) ; =
    
    ; given sorted numeric vectors a and b, return the median of their combination
    (defn merged-median [a b]
      (when (or (seq a) (seq b))
        (let [n (+ (count a) (count b))
              n2 (quot n 2)
              m1 (find-merged-value-at-index n2 a b)]
          (if (odd? n)
            m1
            (/ (+ m1 (find-merged-value-at-index (dec n2) a b)) 2))))) ; yuck
    
  5. matthew said

    Here’s some Javascript. The idea is that we try to find an index i (the final value of “end”), such that the first i elements of the first array and the first k-i elements of the second array, (where k is half the length of the combined arrays) are all less than or equal to the remaining elements in the two arrays. This turns out to be the smallest value of i where the first element of the second segment of the first array is greater or equal to the last element of the first segment of the second array, taking due regard for empty segments, and we can find that with a binary search. Some use is made of out of bounds array accesses in Javascript returning “undefined”:

    function median2(a,b) {
        const check = (a,b) => a == undefined ||
              b == undefined || a <= b;
        const alen = a.length, blen = b.length
        const k = (alen+blen)>>1
        let start = Math.max(0,k-blen)
        let end = Math.min(k,alen)
        while(start < end) {
            const mid = start + ((end-start)>>1);
            if (check(b[k-mid-1],a[mid])) end = mid;
            else start = mid+1;
        }
        const n1 = a[end], n2 = b[k-end]
        if (n1 == undefined) return n2
        else if (n2 == undefined) return n1
        else return Math.min(n1,n2)
    }
    
    const lists = [[],[1000],[1,2],[0,0,0,0],[1,1,1],[2,2,2,2],
                   [0,3,6,9],[2,3,5,6,8,9],
                   [1,2,3,4,5,6,7,8,9,10]]
    for (let a of lists) {
        for (let b of lists) {
            const m1 = median2(a,b)
            const c = a.concat(b).sort((x,y)=>x-y)
            const m2 = c[c.length>>1]
            console.log(m1,m2,a,b)
            console.assert(m1 === m2)
        }
    }
    
  6. Paul said

    @programmingpraxis and @ matthew: the definition of median (see e.g. wikipedia) is that for an odd number of samples the middle value is chosen and for an even number of samples the average of the two “mid” samples is calculated. This makes it a little more complicated to calculate it. Also the median then needs not be one of the array samples.
    @matthew: nice method. It works fast (, but it assumes that the median is always given by element arr[n//2]).

  7. matthew said

    @Paul: thanks & a fair point.For even numbers then, would need to find the maximum of the two first segments and average that with the minimum of the last two segments.

    Also, it occurs to me that the logic could be simplified somewhat if a was always the shorted of the two arrays, swapping if necessary.

  8. matthew said

    Here’s an updated version that averages the central elements for even length arrays, and also makes the first array be the smaller – simplifies setting up the initial bounds for the binary search & maybe makes the whole thing clearer:

    const check = (m,n) => m == undefined || n == undefined || m <= n
    const aux = (m,n,f) => m == undefined ? n : n == undefined ? m : f(m,n)
    
    function median2(a,b) {
        if (a.length > b.length) [a,b] = [b,a]
        const alen = a.length, blen = b.length, tlen = alen+blen
        if (tlen == 0) return undefined
        const k = tlen>>1
        let start = 0, end = alen
        while(start < end) {
            const mid = start + ((end-start)>>1)
            if (check(b[k-mid-1],a[mid])) end = mid
            else start = mid+1;
        }
        const res = aux(a[end], b[k-end], Math.min)
        if (tlen%2) return res
        else return (res + aux(a[end-1],b[k-end-1],Math.max))/2
    }
    
    function median2test(a,b) {
        const c = a.concat(b).sort((x,y) => x-y)
        if (c.length == 0) return undefined
        const mid = c.length>>1
        const m = c[mid]
        if (c.length%2) return m
        else return (m + c[mid-1])/2
    } 
    
    const lists = [[],[1000],[1,2],[0,0,0,0],[1,1,1],[2,2,2,2],
                   [0,3,6,9],[2,3,5,6,8,9],
                   [1,2,3,4,5,6,7,8,9,10]]
    for (let a of lists) {
        for (let b of lists) {
            const m1 = median2(a,b)
            const m2 = median2test(a,b)
            console.log(m1,m2,a,b)
            console.assert(m1 === m2)
        }
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: