Merge Sort
November 10, 2009
It is awkward to sort arrays by merge sort, since it requires two arrays, an input array and an output array; merge sort is used far more often to sort linked lists, as in the Standard Prelude. We begin with a function that merges two sub-arrays, left
through middle
and middle + 1
through right
, from an input array into the positions left
through right
in an output array:
function merge(A,left,middle,right, B,i,j,k) {
i = left; j = middle + 1
while (i <= middle && j <= right)
if (A[i] <= A[j]) B[++k] = A[i++]
else B[++k] = A[j++]
while (i <= middle)
B[++k] = A[i++]
while (j <= right)
B[++k] = A[j++]
for (k = left; k <= right; k++) {
A[k] = B[k]; delete B[k] } }
Here i
indexes the first sub-array, j
indexes the second sub-array, and k
indexes the output array; we take advantage of Awk’s implicit initialization for k = 0
. The first while
loop runs until one of the sequences ends, the second and third while
loops pick up the remainders of the runs, and the fourth while
loop copies the output array back to the input array. Notice that we broke our convention, using less-than-or-equal instead of less-than to compare array elements; doing so allows our sort to be stable, with equal elements appearing in the output in the same order as the input.
Given merge
, the actual sort
function splits the array in two, sorts the two halves recursively, then merges them together:
function mergesort(A,left,right, middle) {
if (left < right) {
middle = int((left + right) / 2)
mergesort(A, left, middle)
mergesort(A, middle + 1, right)
merge(A, left, middle, right) } }
function sort(A,n) { mergesort(A, 1, n) }
If you like, you can move the merge inline and eliminate recursion; l1
and l2
index the lower bounds of the two sub-arrays, u1
and u2
index the upper bounds of the two sub-arrays, k
indexes the output array, and g
is the gap between the two sub-arrays, which doubles at each pass:
function sort(A,n, l1,l2,u1,u2,k,g,B) {
for (g = 1; g < n; g *= 2) {
k = 0
for (l1 = 1; l1 + g <= n; l1 = u2 + 1) {
l2 = l1 + g; u1 = l2 - 1
u2 = l2 + g - 1; if (u2 > n) u2 = n
while (l1 <= u1 && l2 <= u2)
if (A[l1] <= A[l2])
B[++k] = A[l1++]
else B[++k] = A[l2++]
while (l1 <= u1)
B[++k] = A[l1++]
while (l2 <= u2)
B[++k] = A[l2++] }
while (k < n)
B[++k] = A[l1++]
for (k = 1; k <= n; k++)
A[k] = B[k] } }
All of the code, including the test suite of the prior exercise, is collected at http://programmingpraxis.codepad.org/Wk6rsu1Z, but you can't run it because codepad doesn't provide an Awk interpreter.
An idiotically cumbersome Java mergesort implementation. The private mergesort returns a pointer to the fully sorted (unsegmented) array.
My implementation in C