## Introspective Sort

### November 11, 2016

The sorting algorithm that we have been working up to in three previous exercises is introspective sort, or introsort, invented by David Musser in 1997 for the C++ Standard Library. Introsort is basically quicksort, with median-of-three partitioning and a switch to insertion sort when the partitions get small, but with a twist. The problem with quicksort is that, for some sequences, most of the recursive calls don’t significantly reduce the size of the data to be sorted, causing a quadratic worst case. Introsort fixes that by switching to heapsort if the depth of recursion gets too large; since heapsort has guaranteed O(*n* log *n*) behavior, so does introsort. The changeover from quicksort to heapsort occurs once the recursion depth reaches *k* * floor(log(length(*A*))), where *k* is a tuning parameter, frequently set to 2, that can be used to adjust performance of the sorting algorithm.

Your task is to implement introsort. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
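As a starting point, here is a minimal sketch in C of the structure described above. It is one possible reading of the description, not the suggested solution: the cutoff of 16, the base-2 logarithm in the depth limit, the Hoare-style partition, the restriction to `int` arrays, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CUTOFF 16 /* assumed size at which we fall back to insertion sort */
#define K 2       /* tuning parameter k from the text */

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Insertion sort on a[lo..hi], inclusive. */
static void insertion_sort(int *a, long lo, long hi) {
    for (long i = lo + 1; i <= hi; i++) {
        int x = a[i];
        long j = i - 1;
        while (j >= lo && a[j] > x) { a[j + 1] = a[j]; j--; }
        a[j + 1] = x;
    }
}

/* Restore the max-heap property below index i in base[0..n-1]. */
static void sift_down(int *base, long i, long n) {
    for (;;) {
        long c = 2 * i + 1;
        if (c >= n) return;
        if (c + 1 < n && base[c + 1] > base[c]) c++;
        if (base[i] >= base[c]) return;
        swap(&base[i], &base[c]);
        i = c;
    }
}

/* Heapsort on a[lo..hi], inclusive. */
static void heap_sort(int *a, long lo, long hi) {
    int *base = a + lo;
    long n = hi - lo + 1;
    for (long i = n / 2 - 1; i >= 0; i--) sift_down(base, i, n);
    for (long end = n - 1; end > 0; end--) {
        swap(&base[0], &base[end]);
        sift_down(base, 0, end);
    }
}

/* Order a[lo], a[mid], a[hi] and return the median value as the pivot. */
static int median_of_three(int *a, long lo, long hi) {
    long mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap(&a[mid], &a[lo]);
    if (a[hi] < a[lo]) swap(&a[hi], &a[lo]);
    if (a[hi] < a[mid]) swap(&a[hi], &a[mid]);
    return a[mid];
}

/* Quicksort loop that gives up and calls heapsort when depth reaches 0. */
static void intro_loop(int *a, long lo, long hi, int depth) {
    while (hi - lo + 1 > CUTOFF) {
        if (depth == 0) { heap_sort(a, lo, hi); return; }
        depth--;
        int pivot = median_of_three(a, lo, hi);
        long i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) { swap(&a[i], &a[j]); i++; j--; }
        }
        intro_loop(a, lo, j, depth); /* recurse on the left part */
        lo = i;                      /* iterate on the right part */
    }
    insertion_sort(a, lo, hi);       /* small partition: finish directly */
}

/* floor(log2(n)) for n >= 1, by repeated halving. */
static int floor_log2(size_t n) {
    int r = 0;
    while (n > 1) { n >>= 1; r++; }
    return r;
}

/* Sort a[0..n-1] in place; the depth limit is computed once, up front. */
void introsort(int *a, size_t n) {
    if (n < 2) return;
    assert(a != NULL);
    intro_loop(a, 0, (long)n - 1, K * floor_log2(n));
}
```

Calling `introsort(a, n)` sorts an `int` array in place. The depth budget is passed down and decremented at each level, so the switch to heapsort depends only on how deep the current partition sits, not on its size.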


Is that a natural logarithm?

The `log` function in Scheme, that I used in my program, is a natural logarithm to base *e*. Theoretically, the logarithm should be to base 2, since you are calculating the depth of recursion assuming a perfect split into two equal-size sub-arrays at each recursive call. In practice, you probably want to try many different values of *k* to find the optimum value for your circumstances; a value close to 1 means that you will be making many calls to heapsort, which is naturally slower than quicksort, but a value far from 1 means that you are continuing to make non-productive recursive calls rather than switching to heapsort.

Fair enough, though this means that with *k*=2, we are doing heapsort quite a lot, even with random input (so I’m surprised that introsort seems to be faster, though that might just be noise).

I went back and looked at Musser’s paper. He uses 2 * floor(log2 n), but suggests testing to determine an empirically good value that produces good results with your environment. I’ve done a little bit of experimenting, but intend to do more.
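For a concrete sense of how much the base matters, take *n* = 1,000,000 and *k* = 2 (a worked example of my own, not a figure from the paper):

$$
\begin{aligned}
2\,\lfloor \log_2 10^6 \rfloor &= 2 \cdot 19 = 38 \qquad (2^{19} = 524288 \le 10^6 < 2^{20} = 1048576),\\
2\,\lfloor \ln 10^6 \rfloor &= 2 \cdot \lfloor 13.8155\ldots \rfloor = 2 \cdot 13 = 26.
\end{aligned}
$$

Since ln *n* = ln 2 · log2 *n* ≈ 0.693 · log2 *n*, the natural-log limit is roughly 30% lower than the base-2 limit for the same *k*, so the switch to heapsort is triggered earlier.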

Here’s a solution in C99.

The program output is included at the bottom of this post. It shows runtimes for various scenarios. Each experiment was conducted with 10 separate sorts, and the time reported is the aggregate time for all 10 sorts. Rows correspond to various array sizes.
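The measurement scheme described above can be sketched roughly as follows. This is not the harness from the posted solution: the names `sort_fn`, `stdlib_sort`, and `time_sorts` are hypothetical, `qsort` stands in for the sort under test, and sorting a fresh copy of the input on every repetition (with the copy excluded from the timing) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every sort being measured. */
typedef void (*sort_fn)(int *a, size_t n);

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Stand-in for the sort under test; replace with quicksort, heapsort, or introsort. */
void stdlib_sort(int *a, size_t n) { qsort(a, n, sizeof *a, cmp_int); }

/* Total CPU seconds spent in `reps` sorts of fresh copies of `input`.
   Returns -1.0 if the working buffer cannot be allocated. */
double time_sorts(sort_fn sort, const int *input, size_t n, int reps) {
    assert(sort != NULL && input != NULL && n > 0 && reps > 0);
    int *work = malloc(n * sizeof *work);
    if (work == NULL) return -1.0;
    double total = 0.0;
    for (int r = 0; r < reps; r++) {
        memcpy(work, input, n * sizeof *work); /* restore unsorted input; not timed */
        clock_t start = clock();
        sort(work, n);
        total += (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    free(work);
    return total;
}
```

A call such as `time_sorts(stdlib_sort, data, n, 10)` would then give one cell of the table: the aggregate time for 10 sorts of the same array at one size.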

Column 1: array size

Column 2: Random array quicksort

Column 3: Random array heapsort

Column 4: Random array introsort

Column 5: Killer array quicksort

Column 6: Killer array heapsort

Column 7: Killer array introsort

The killer arrays were generated using the ‘Median-Of-Three Killer Sequence’ procedure from an earlier problem.

For all experiments, quicksort includes the optimizations from earlier problems: 1) inline swap, 2) early cutoff to insertion sort, and 3) median-of-three pivot selection. The same optimizations were also used for introsort.

I increased the stack size to prevent stack overflows. Compiler optimizations were disabled.

This updated main function includes column numbers in the output.

Output:

@Daniel: good stuff, but you want to be calculating the depth limit k*log(n) at the start and not at each recursive call.

@matthew, Thanks!

Here’s the updated code along with updated output.

Output: