Career Cup

December 6, 2016

The obvious solution is to merge the two arrays, then take the median. Since the two arrays are already sorted, the merge takes O(n), and the median is then O(1):

(define (med1 lt? xs ys)
  (let* ((zs (merge lt? xs ys)) (n (length zs)))
    (list-ref zs (quotient n 2))))

> (med1 < '(1 3 5 7 9) '(2 4 6 8))
5

We uses lists instead of arrays, so the median is O(n) instead of O(1). If you don’t like that, here we perform the merge ourselves, stopping when we find the median:

(define (med2 lt? xs ys)
  (let ((n (quotient (+ (length xs) (length ys) 1) 2)))
    (let loop ((n (- n 1)) (xs xs) (ys ys))
      (cond ((zero? n) (car xs))
            ((lt? (car xs) (car ys))
              (loop (- n 1) (cdr xs) ys))
            ((lt? (car ys) (car xs))
              (loop (- n 1) xs (cdr ys)))
            (else (loop (- n 1) (cdr xs) ys))))))

> (med2 < '(1 3 5 7 9) '(2 4 6 8))
5

The solutions at Career Cup suggest that an O(log n) solution is possible, but I’m not convinced. It seems to me that all of the binary search solutions can be defeated by the two arrays [1,1,1,1,1] and [3,3,3,3], for which the median of the combined array is 1.

You can run the program at http://ideone.com/rHWd8o.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “Career Cup”

Colin said
December 6, 2016 at 8:32 PM
I’m still working through it, but isn’t there a log^2 n solution? I think it would have a form like this:

determine the target index (n/2), then

find-value-at-merged-index: (
if either array is empty, pull from the target index in the other array
otherwise: (
pull the median of the larger array;
determine its index in the “merged” array by binary searching in the smaller array to determine how many elements are smaller than it; (O log n)
if that location is:
the median index: we’re done
greater than the median index: “discard” the upper half of the larger array and all greater elements in the smaller array, then recur (O log n times)
less than the median index: “discard” the lower half of the larger array and all lesser elements in the smaller array, decrement target index by that number, then recur (O log n times)
)
)
Paul said
December 6, 2016 at 9:55 PM
I am not sure if it is is O(n), but I have code working (randomly tested it against brute force) like this: determine “pseudo medians” as ar[n/2-1] for both arrays (n is array length). If the “median” of the smaller array is smaller, than throw away the lower half of the smaller array and the same number of higher elements of the larger array, otherwise throw away the other half and the lowest elements form the larger array. Then repeat this process until the smaller array has a lenght smaller or equal to 2 and handle all cases with even and odd number of elements in larger array. The code works, but is rather messy. After cleanup I will post.

Paul said

December 7, 2016 at 8:54 AM

I tried to post this code on ideone, but somehow it won’t run. Apart from brute force there are 2 functions. median_s implements the idea sketched above and uses slices for every step (recursion). That makes the running time no better than O(n). median_s is basically the same, but is using pointers at every step. No copies of the data is made. Therefore the work in every step is the same, making this run in about O(ln). Timings are below

from time import clock
from random import randrange

def median(arr):
    'only used for array lengths <= 4, array should be sorted'
    n = len(arr)
    if n == 0:
        return None
    arr = sorted(arr)
    half, odd = divmod(n, 2)
    return arr[half] if odd else (arr[half] + arr[half-1]) / 2

def median_s(ar1, ar2):
    """D&C method using slices """
    n, m = len(ar1), len(ar2)
    m2 = m // 2
    if n > m:
        return median_s(ar2, ar1)
    if n == 0:
        if m == 0:
            return None
        return ar2[m2] if m & 1 else (ar2[m2] + ar2[m2-1]) / 2
    a = ar1[0]
    if n == 1:
        if m == 1:
            return (ar1[0] + ar2[0]) / 2
        if m & 1:
            med = median([a, ar2[m2-1], ar2[m2+1]])
            return (med + ar2[m2]) / 2
        return median([a, ar2[m2], ar2[m2-1]])
    if n == 2:
        if m == 2:
            return median(ar1 + ar2)
        if m & 1:
            return median([ar2[m2], max(ar1[0], ar2[m2-1]),
                           min(ar1[1], ar2[m2+1])])
        return median(
            [ar2[m2], ar2[m2 - 1], max(ar1[0], ar2[m2 - 2]),
             min(ar1[1], ar2[m2 + 1])])
    idx1, idx2 = (n - 1) // 2, (m - 1) // 2
    if ar1[idx1] <= ar2[idx2]:
        return median_s(ar1[idx1:], ar2[:-idx1])  # slices
    else:
        return median_s(ar1[:-idx1], ar2[idx1:])  # slices

def median_p(ar1, ar2):
    """D&C method using pointers"""
    n, m = len(ar1), len(ar2)
    if n > m:
        return median_p(ar2, ar1)
    return median2a(ar1, 0, n, ar2, 0, m)

def median2a(ar1, b1, e1, ar2, b2, e2):
    n, m = e1 - b1, e2 - b2
    m2 = m // 2
    if n == 0:
        if m == 0:
            return None
        return ar2[b2+m2] if m & 1 else (ar2[b2+m2] + ar2[b2+m2-1]) / 2
    a = ar1[b1+0]
    if n == 1:
        if m == 1:
            return (ar1[b1+0] + ar2[b2+0]) / 2
        if m & 1:
            med = median([a, ar2[b2+m2-1], ar2[b2+m2+1]])
            return (med + ar2[b2+m2]) / 2
        return median([a, ar2[b2+m2], ar2[b2+m2-1]])
    if n == 2:
        if m == 2:
            return median(ar1 + ar2)
        if m & 1:
            return median([ar2[b2+m2], max(ar1[b1+0], ar2[b2+m2-1]),
                           min(ar1[b1+1], ar2[b2+m2+1])])
        return median(
            [ar2[b2+m2], ar2[b2+m2 - 1], max(ar1[b1+0], ar2[b2+m2 - 2]),
             min(ar1[b1+1], ar2[b2+m2 + 1])])
    idx1, idx2 = (n - 1) // 2, (m - 1) // 2
    if ar1[b1+idx1] <= ar2[b2+idx2]:
        return median2a(ar1, b1 + idx1, e1, ar2, b2, e2 - idx1)
    else:
        return median2a(ar1, b1, e1 - idx1, ar2, b2 + idx1, e2)

def brute_force(ar1, ar2):
    """brute force"""
    return median(ar1+ar2)

# random testing against brute force
for case in range(1000):
    n = randrange(0, 41)
    m = n + randrange(max(0, n-2), n+3)
    ar1 = sorted([randrange(1, 30) for _ in range(n)])
    ar2 = sorted([randrange(1, 30) for _ in range(m)])
    mbf = brute_force(ar1, ar2)
    med = median_p(ar1, ar2)
    # med = median_s(ar1, ar2)
    assert med == mbf, "{} {} {} {}".format(med, mbf, ar1, ar2)

def benchmark(method, ar1s, ar2s):
    t0 = clock()
    for case in range(N):
        method(ar1s[case], ar2s[case])
    print("{:25s} time: {:6.2f} ".format(method.__doc__, 1000*(clock() - t0)))

N = 100
print("times in ms")
for M in [100, 200, 400, 800, 1600]:
    print(M)
    ar1s = []
    ar2s = []
    for case in range(N):
        n = randrange(0, M)
        m = n + randrange(max(0, n-2), n+3)
        ar1s.append(sorted([randrange(1, 30) for _ in range(n)]))
        ar2s.append(sorted([randrange(1, 30) for _ in range(m)]))
    benchmark(brute_force, ar1s, ar2s)
    benchmark(median_s, ar1s, ar2s)
    benchmark(median_p, ar1s, ar2s)

times in ms
100
brute force               time:   1.44
D&C method using slices   time:   1.90
D&C method using pointers time:   1.40
200
brute force               time:   2.13
D&C method using slices   time:   2.27
D&C method using pointers time:   1.54
400
brute force               time:   3.58
D&C method using slices   time:   3.18
D&C method using pointers time:   1.76
800
brute force               time:   6.20
D&C method using slices   time:   4.77
D&C method using pointers time:   1.99
1600
brute force               time:  10.32
D&C method using slices   time:   7.55
D&C method using pointers time:   2.17

Colin said

December 7, 2016 at 9:48 PM

Not exhaustively tested, but worked on a few simple cases:

; given two sorted seq-ables, return a lazy sequence of their merge
; not actually used below, but would be good for generative testing
(defn merge [a b]
  (if-let [(fa & ra) a]
    (if-let [(fb & rb) b]
      (lazy-seq (if (< fa fb)
        (cons fa (merge ra b))
        (cons fb (merge a rb))))
      a)
    b))


; find the first i such that (not (f (v i)), or (count v) if all true.
; assumes a single true -> false transition
(defn bsearch [v f]
  (loop [lo 0, hi (count v)] ; lo is inclusive, hi is exclusive
    (if (= lo hi)
      lo
      (let [mid (quot (+ lo hi) 2)]
        (if (f (v mid))
          (recur (inc mid) hi)
          (recur lo mid))))))

; given sorted vectors a and b, and 0 <= i < (+ (count a) (count b)),
; return the "i'th" element of the result of merging a and b
; (assert (= (find-merged-value-at-index i a b) (nth (merge a b) i)))
(defn find-merged-value-at-index [i a b]
  (if (> (count a) (count b))
    (recur i b a)
    (let [mb (quot (count b) 2)
          bmb (b mb)
          na<bmb (bsearch a #(< % bmb))
          loc-bmb (+ na<bmb mb)]
      (cond
        (< i loc-bmb) (recur i (subvec a 0 na<bmb) (subvec b 0 mb))
        (> i loc-bmb) (recur (- i (inc loc-bmb)) (subvec a na<bmb) (subvec b (inc mb)))
        :otherwise bmb)))) ; =

; given sorted numeric vectors a and b, return the median of their combination
(defn merged-median [a b]
  (when (or (seq a) (seq b))
    (let [n (+ (count a) (count b))
          n2 (quot n 2)
          m1 (find-merged-value-at-index n2 a b)]
      (if (odd? n)
        m1
        (/ (+ m1 (find-merged-value-at-index (dec n2) a b)) 2))))) ; yuck

matthew said

December 7, 2016 at 10:14 PM

Here’s some Javascript. The idea is that we try to find an index i (the final value of “end”), such that the first i elements of the first array and the first k-i elements of the second array, (where k is half the length of the combined arrays) are all less than or equal to the remaining elements in the two arrays. This turns out to be the smallest value of i where the first element of the second segment of the first array is greater or equal to the last element of the first segment of the second array, taking due regard for empty segments, and we can find that with a binary search. Some use is made of out of bounds array accesses in Javascript returning “undefined”:

function median2(a,b) {
    const check = (a,b) => a == undefined ||
          b == undefined || a <= b;
    const alen = a.length, blen = b.length
    const k = (alen+blen)>>1
    let start = Math.max(0,k-blen)
    let end = Math.min(k,alen)
    while(start < end) {
        const mid = start + ((end-start)>>1);
        if (check(b[k-mid-1],a[mid])) end = mid;
        else start = mid+1;
    }
    const n1 = a[end], n2 = b[k-end]
    if (n1 == undefined) return n2
    else if (n2 == undefined) return n1
    else return Math.min(n1,n2)
}

const lists = [[],[1000],[1,2],[0,0,0,0],[1,1,1],[2,2,2,2],
               [0,3,6,9],[2,3,5,6,8,9],
               [1,2,3,4,5,6,7,8,9,10]]
for (let a of lists) {
    for (let b of lists) {
        const m1 = median2(a,b)
        const c = a.concat(b).sort((x,y)=>x-y)
        const m2 = c[c.length>>1]
        console.log(m1,m2,a,b)
        console.assert(m1 === m2)
    }
}

Paul said
December 8, 2016 at 8:38 AM
@programmingpraxis and @ matthew: the definition of median (see e.g. wikipedia) is that for an odd number of samples the middle value is chosen and for an even number of samples the average of the two “mid” samples is calculated. This makes it a little more complicated to calculate it. Also the median then needs not be one of the array samples.
@matthew: nice method. It works fast (, but it assumes that the median is always given by element arr[n//2]).
matthew said
December 8, 2016 at 8:56 AM
@Paul: thanks & a fair point.For even numbers then, would need to find the maximum of the two first segments and average that with the minimum of the last two segments.

Also, it occurs to me that the logic could be simplified somewhat if a was always the shorted of the two arrays, swapping if necessary.

matthew said

December 8, 2016 at 11:18 PM

Here’s an updated version that averages the central elements for even length arrays, and also makes the first array be the smaller – simplifies setting up the initial bounds for the binary search & maybe makes the whole thing clearer:

const check = (m,n) => m == undefined || n == undefined || m <= n
const aux = (m,n,f) => m == undefined ? n : n == undefined ? m : f(m,n)

function median2(a,b) {
    if (a.length > b.length) [a,b] = [b,a]
    const alen = a.length, blen = b.length, tlen = alen+blen
    if (tlen == 0) return undefined
    const k = tlen>>1
    let start = 0, end = alen
    while(start < end) {
        const mid = start + ((end-start)>>1)
        if (check(b[k-mid-1],a[mid])) end = mid
        else start = mid+1;
    }
    const res = aux(a[end], b[k-end], Math.min)
    if (tlen%2) return res
    else return (res + aux(a[end-1],b[k-end-1],Math.max))/2
}

function median2test(a,b) {
    const c = a.concat(b).sort((x,y) => x-y)
    if (c.length == 0) return undefined
    const mid = c.length>>1
    const m = c[mid]
    if (c.length%2) return m
    else return (m + c[mid-1])/2
} 

const lists = [[],[1000],[1,2],[0,0,0,0],[1,1,1],[2,2,2,2],
               [0,3,6,9],[2,3,5,6,8,9],
               [1,2,3,4,5,6,7,8,9,10]]
for (let a of lists) {
    for (let b of lists) {
        const m1 = median2(a,b)
        const m2 = median2test(a,b)
        console.log(m1,m2,a,b)
        console.assert(m1 === m2)
    }
}

Programming Praxis

Career Cup

December 6, 2016

8 Responses to “Career Cup”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Career Cup

December 6, 2016

Share this:

Related

8 Responses to “Career Cup”

Leave a comment

Categories

Archives

Archives