## Career Cup

### December 6, 2016

Regular readers know that I sometimes find inspiration for these exercises at Career Cup:

Given two sorted arrays, efficiently find the median of the combined array.

Your task is to write the indicated program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

I’m still working through it, but isn’t there a log^2 n solution? I think it would have a form like this:

determine the target index (n/2), then

find-value-at-merged-index: (

if either array is empty, pull from the target index in the other array

otherwise: (

pull the median of the larger array;

determine its index in the “merged” array by binary searching in the smaller array to determine how many elements are smaller than it; (O log n)

if that location is:

the median index: we’re done

greater than the median index: “discard” the upper half of the larger array and all greater elements in the smaller array, then recur (O log n times)

less than the median index: “discard” the lower half of the larger array and all lesser elements in the smaller array, decrement target index by that number, then recur (O log n times)

)

)

I am not sure if it is is O(n), but I have code working (randomly tested it against brute force) like this: determine “pseudo medians” as ar[n/2-1] for both arrays (n is array length). If the “median” of the smaller array is smaller, than throw away the lower half of the smaller array and the same number of higher elements of the larger array, otherwise throw away the other half and the lowest elements form the larger array. Then repeat this process until the smaller array has a lenght smaller or equal to 2 and handle all cases with even and odd number of elements in larger array. The code works, but is rather messy. After cleanup I will post.

I tried to post this code on ideone, but somehow it won’t run. Apart from brute force there are 2 functions. median_s implements the idea sketched above and uses slices for every step (recursion). That makes the running time no better than O(n). median_s is basically the same, but is using pointers at every step. No copies of the data is made. Therefore the work in every step is the same, making this run in about O(ln). Timings are below

Not exhaustively tested, but worked on a few simple cases:

Here’s some Javascript. The idea is that we try to find an index i (the final value of “end”), such that the first i elements of the first array and the first k-i elements of the second array, (where k is half the length of the combined arrays) are all less than or equal to the remaining elements in the two arrays. This turns out to be the smallest value of i where the first element of the second segment of the first array is greater or equal to the last element of the first segment of the second array, taking due regard for empty segments, and we can find that with a binary search. Some use is made of out of bounds array accesses in Javascript returning “undefined”:

@programmingpraxis and @ matthew: the definition of median (see e.g. wikipedia) is that for an odd number of samples the middle value is chosen and for an even number of samples the average of the two “mid” samples is calculated. This makes it a little more complicated to calculate it. Also the median then needs not be one of the array samples.

@matthew: nice method. It works fast (, but it assumes that the median is always given by element arr[n//2]).

@Paul: thanks & a fair point.For even numbers then, would need to find the maximum of the two first segments and average that with the minimum of the last two segments.

Also, it occurs to me that the logic could be simplified somewhat if a was always the shorted of the two arrays, swapping if necessary.

Here’s an updated version that averages the central elements for even length arrays, and also makes the first array be the smaller – simplifies setting up the initial bounds for the binary search & maybe makes the whole thing clearer: