A Median-Of-Three Killer Sequence
November 8, 2016
Here is our median-of-three killer:
(define (m3-killer n) (if (odd? n) (error 'm3-killer "length must be even") (let* ((k (/ n 2)) (ary (make-vector n #f))) (do ((i 1 (+ i 1))) ((< k i) ary) (when (odd? i) (vector-set! ary (- i 1) i) (vector-set! ary i (+ k i))) (vector-set! ary (+ k i -1) (+ i i))))))
A timing test shows that it takes less than a second to sort a million random integers, but sorting the median-of-three killer takes 6.5 seconds:
> (time (time-quicksort 10 1000000)) (time (time-quicksort 10 ...)) 554 collections 9298 ms elapsed cpu time, including 78 ms collecting 9312 ms elapsed real time, including 37 ms collecting 2328076520 bytes allocated, including 2335573896 bytes reclaimed > (time (begin (sort < (m3-killer 1000000)) 'done)) (time (begin (sort < ...) ...)) 197 collections 6490 ms elapsed cpu time, including 32 ms collecting 6500 ms elapsed real time, including 28 ms collecting 833222352 bytes allocated, including 828448656 bytes reclaimed done
Of course, the comparison gets worse as the arrays increase in size.
You can run the program at http://ideone.com/FmC65N.
I hope this is leading up to introsort, which has the interesting property that its performance is bounded below by n log n, and yet it is not deterministic like mergesort.
Something odd there – the “killer” sequence should result in quadratic performance & a massive slowdown rather than just a factor of 6.
I think the reason is that your partition functions don’t position the pivot element between the two partitions as the killer sequence expects.
Here’s another version (this is based on Knuth’s description of Sedgewick’s modification of Dijkstra’s redevelopment of Hoare’s original, from “Structured programming with go to statements”, which has a very interesting discussion of partitioning, amongst much else – Hoare’s original paper is worth a read too – he points out, for example, that not moving the pivot to its final position makes proving termination harder (as we might end up with one empty partition and one the same size as the original array – I don’t think this can happen with the suggested algorithm, but it’s not obvious).
Note that the pivot (which is moved to the end of the array) and the first element (which is less that or equal to the pivot) act as sentinels for the first iteration through the loop – the median calculation sets this up nicely, so we still don’t need to check array bounds when moving the pointers.
Here’s a solution in C99.
Based on the killer sequences in the problem, It seems that the problem assumes that median-of-three is calculated from the first element, the middle element, and the last element in an array. My quicksort from the corresponding previous problem calculated median from the first three elements in an array. I’ve modified my earlier code to calculate medians in the same way assumed by this problem.
Compiler optimizations were disabled. I had to increase the stack size to prevent stack overflows.
There are two sections to the output. The first shows example median-of-three killer sequences, generated by init_median_of_three_killer. The second section shows runtimes for sorting random arrays versus killer arrays. Each experiment was conducted with 10 separate sorts, and the time reported is the aggregate time for all 10 sorts. Rows correspond to various array sizes.
A random array sorts faster than a killer array with the same size. However, sorting either type appears to be O(n^2). For both array types, doubling array size roughly quadruples the time to sort, suggesting O(n^2). I understand why this is the case for killer arrays, but am not sure why the algorithm scales this way for random arrays. Do others experience similar behavior? Is there a reason to expect this? It seems to indicate a bug in my code, but I couldn’t find one (although I didn’t test thoroughly).
Output:
> “It seems to indicate a bug in my code, but I couldn’t find one (although I didn’t test thoroughly).”
I found the bug. My declaration for the results array had a problem.
Before: double results[max_order+1][2];
After: double results[2][max_order+1];
The corrected time comparisons are below. As expected, the updated output no longer suggests that sorting random arrays is O(n^2), as opposed to my earlier problematic results.
[…] sorting algorithm that we have been working up to in three previous exercises is introspective sort, or introsort, invented by David Musser in 1997 for the C++ Standard Library. […]