Career Cup
December 6, 2016
The obvious solution is to merge the two arrays, then take the median. Since the two arrays are already sorted, the merge takes O(n), and the median is then O(1):
(define (med1 lt? xs ys) (let* ((zs (merge lt? xs ys)) (n (length zs))) (list-ref zs (quotient n 2)))) > (med1 < '(1 3 5 7 9) '(2 4 6 8)) 5
We uses lists instead of arrays, so the median is O(n) instead of O(1). If you don’t like that, here we perform the merge ourselves, stopping when we find the median:
(define (med2 lt? xs ys) (let ((n (quotient (+ (length xs) (length ys) 1) 2))) (let loop ((n (- n 1)) (xs xs) (ys ys)) (cond ((zero? n) (car xs)) ((lt? (car xs) (car ys)) (loop (- n 1) (cdr xs) ys)) ((lt? (car ys) (car xs)) (loop (- n 1) xs (cdr ys))) (else (loop (- n 1) (cdr xs) ys)))))) > (med2 < '(1 3 5 7 9) '(2 4 6 8)) 5
The solutions at Career Cup suggest that an O(log n) solution is possible, but I’m not convinced. It seems to me that all of the binary search solutions can be defeated by the two arrays [1,1,1,1,1] and [3,3,3,3], for which the median of the combined array is 1.
You can run the program at http://ideone.com/rHWd8o.
I’m still working through it, but isn’t there a log^2 n solution? I think it would have a form like this:
determine the target index (n/2), then
find-value-at-merged-index: (
if either array is empty, pull from the target index in the other array
otherwise: (
pull the median of the larger array;
determine its index in the “merged” array by binary searching in the smaller array to determine how many elements are smaller than it; (O log n)
if that location is:
the median index: we’re done
greater than the median index: “discard” the upper half of the larger array and all greater elements in the smaller array, then recur (O log n times)
less than the median index: “discard” the lower half of the larger array and all lesser elements in the smaller array, decrement target index by that number, then recur (O log n times)
)
)
I am not sure if it is is O(n), but I have code working (randomly tested it against brute force) like this: determine “pseudo medians” as ar[n/2-1] for both arrays (n is array length). If the “median” of the smaller array is smaller, than throw away the lower half of the smaller array and the same number of higher elements of the larger array, otherwise throw away the other half and the lowest elements form the larger array. Then repeat this process until the smaller array has a lenght smaller or equal to 2 and handle all cases with even and odd number of elements in larger array. The code works, but is rather messy. After cleanup I will post.
I tried to post this code on ideone, but somehow it won’t run. Apart from brute force there are 2 functions. median_s implements the idea sketched above and uses slices for every step (recursion). That makes the running time no better than O(n). median_s is basically the same, but is using pointers at every step. No copies of the data is made. Therefore the work in every step is the same, making this run in about O(ln). Timings are below
Not exhaustively tested, but worked on a few simple cases:
Here’s some Javascript. The idea is that we try to find an index i (the final value of “end”), such that the first i elements of the first array and the first k-i elements of the second array, (where k is half the length of the combined arrays) are all less than or equal to the remaining elements in the two arrays. This turns out to be the smallest value of i where the first element of the second segment of the first array is greater or equal to the last element of the first segment of the second array, taking due regard for empty segments, and we can find that with a binary search. Some use is made of out of bounds array accesses in Javascript returning “undefined”:
@programmingpraxis and @ matthew: the definition of median (see e.g. wikipedia) is that for an odd number of samples the middle value is chosen and for an even number of samples the average of the two “mid” samples is calculated. This makes it a little more complicated to calculate it. Also the median then needs not be one of the array samples.
@matthew: nice method. It works fast (, but it assumes that the median is always given by element arr[n//2]).
@Paul: thanks & a fair point.For even numbers then, would need to find the maximum of the two first segments and average that with the minimum of the last two segments.
Also, it occurs to me that the logic could be simplified somewhat if a was always the shorted of the two arrays, swapping if necessary.
Here’s an updated version that averages the central elements for even length arrays, and also makes the first array be the smaller – simplifies setting up the initial bounds for the binary search & maybe makes the whole thing clearer: