Introspective Sort

November 11, 2016

We reuse most of the insertion sort and quicksort code from the previous exercises, including the partitioning algorithm. Introsort looks like this:

(define k 2)

(define (introsort lt? ary lo hi depth)
  (when (< cutoff (- hi lo))
    (if (zero? depth) (heapsort lt? ary lo hi)
      (call-with-values
        (lambda () (partition lt? ary lo hi))
        (lambda (p ary)
          (cond ((< (- p lo) (- hi p))
                  (introsort lt? ary lo (+ p 1) (- depth 1))
                  (introsort lt? ary (+ p 1) hi (- depth 1)))
          (else (introsort lt? ary (+ p 1) hi (- depth 1))
                (introsort lt? ary lo (+ p 1) (- depth 1))))))))
  ary)

Then the complete sorting algorithm is this:

(define (sort lt? ary)
  (let* ((len (vector-length ary))
         (depth (* k (round (log len)))))
    (introsort lt? ary 0 len)
    (insert-sort lt? ary 0 len)))

Timing shows that introsort is faster than our fastest quicksort, because the bad partitions that sometimes occur with quicksort are handled by heapsort:

> (time (time-quicksort 1000000 10))
(time (time-quicksort 1000000 ...))
23 collections
5.413136451s elapsed cpu time, including 0.069669265s collecting
5.477370882s elapsed real time, including 0.071198931s collecting
194865504 bytes allocated, including 190784144 bytes reclaimed

The improvement is small, from 5.448 seconds in the previous exercise to 5.413 seconds, but in addition to the improved time this program offers a guaranteed O(n log n) time complexity, with no niggling worries about a quadratic blowup, and it is particularly elegant. Well done, David Musser!

You can run the program at http://ideone.com/N1J8W8, where you will see the complete code, including heapsort.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “Introspective Sort”

matthew said
November 11, 2016 at 3:16 PM
Is that a natural logarithm?
programmingpraxis said
November 11, 2016 at 4:05 PM
The log function in Scheme, that I used in my program, is a natural logarithm to base e. Theoretically, the logarithm should be to base 2, since you are calculating the depth of recursion assuming a perfect split into two equal-size sub-arrays at each recursive call. In practice, you probably want to try many different values of k to find the optimum value for your circumstances; a value close to 1 means that you will be making many calls to heapsort, which is naturally slower than quicksort, but a value far from 1 means that you are continuing to make non-productive recursive calls rather than switching to heapsort.
matthew said
November 12, 2016 at 11:04 AM
Fair enough, though this means that with k=2, we are doing heapsort quite a lot, even with random input (so I’m surprised that introsort seems to be faster, though that might just be noise).
programmingpraxis said
November 12, 2016 at 4:26 PM
I went back and looked at Musser’s paper. He uses 2 * floor(log2 n), but suggests testing to determine an empirically good value that produces good results with your environment. I’ve done a little bit of experimenting, but intend to do more.

Daniel said

November 13, 2016 at 9:23 PM

Here’s a solution in C99.

The program output is included at the bottom of this post. It shows runtimes for various scenarios. Each experiment was conducted with 10 separate sorts, and the time reported is the aggregate time for all 10 sorts. Rows correspond to various array sizes.

Column 1: array size
Column 2: Random array quicksort
Column 3: Random array heapsort
Column 4: Random array introsort
Column 5: Killer array quicksort
Column 6: Killer array heapsort
Column 7: Killer array introsort

The killer arrays were generated using the ‘Median-Of-Three Killer Sequence’ procedure from an earlier problem.

For all experiments, quicksort includes the optimizations from earlier problems, 1) inline swap, 2) early cutoff to insertion sort, and 3) median-of-three pivot selection. These same optimizations were also used for introsort.

I increased the stack size to prevent stack overflows. Compiler optimizations were disabled.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static inline void swap(int* a, int* b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

// reheapify downward from specified subroot
static void reheapify(int* heap, int subroot, size_t n) {
	while (1) {
		int left = 2*subroot+1;
		if (left >= n) break;
		int right = 2*subroot+2;
		int candidate = left;
		if (right < n && heap[right] > heap[left]) candidate = right;
		if (heap[subroot] >= heap[candidate]) break;
		swap(&heap[subroot], &heap[candidate]);
		subroot = candidate;
	}
}

void heapsort(int* array, size_t n) {
	if (n <= 1) return;
	// heapify with Floyd's method, O(n)
	for (int i = (n-2)/2; i >= 0; --i) {
		reheapify(array, i, n);
	}
	// heapsort procedure:
	//   1) swap root (max) with the last element
	//   2) exclude last element from heap
	//   3) reheapify new root downward
	for (int i = n-1; i > 0; --i) {
		swap(&array[0], &array[i]);
		reheapify(array, 0, i);
	}
}

void insertionsort(int* array, size_t n) {
	for (int i = 1; i < n; ++i) {
		for (int j = i; j > 0 && array[j-1] > array[j]; --j) {
			swap(&array[j], &array[j-1]);
		}
	}
}

static int partition(int* array, size_t n) {
	int pivot = array[0];
	// median-of-three
	if (n >= 3) {
		int sample[3];
		sample[0] = array[0];
		sample[1] = array[n/2];
		sample[2] = array[n-1];
		insertionsort(sample, 3);
		pivot = sample[1];
	}
	int i = -1;
	int j = n;
	while (1) {
		do ++i; while (array[i] < pivot);
		do --j; while (array[j] > pivot);
		if (i >= j) return j;
		swap(&array[i], &array[j]);
	}
}

void quicksort(int* array, size_t n) {
	if (n < 2) return;
	// early cutoff to insertion sort
	if (n < 22) {
		insertionsort(array, n);
		return;
	}
	int p = partition(array, n);
	quicksort(array, p + 1);
	quicksort(&array[p + 1], n - p - 1);
}

// floor(log(n)). Assumes n > 0.
static int ilog2(int n) {
	int output = 0;
	while (n >>= 1) ++output;
	return output;
}

static void introsort_internal(int* array, size_t n, size_t depth) {
	if (n < 2) return;
	// early cutoff to insertion sort
	if (n < 22) {
		insertionsort(array, n);
		return;
	}
	// early cutoff to heapsort
	if (depth > 2 * ilog2(n)) {
		heapsort(array, n);
		return;
	}
	int p = partition(array, n);
	introsort_internal(array, p + 1, depth + 1);
	introsort_internal(&array[p + 1], n - p - 1, depth + 1);
}

void introsort(int* array, size_t n) {
	introsort_internal(array, n, 0);
}

static void init_random_array(int* array, size_t n) {
	for (int i = 0; i < n; ++i) {
		array[i] = rand();
	}
}

static void init_median_of_three_killer(int* array, size_t n) {
	// if n is odd, set the last element to n-1, and proceed
	// with n decremented by 1
	if (n % 2 != 0) {
		array[n-1] = n;
		--n;
	}
	size_t m = n/2;
	for (int i = 0; i < m; ++i) {
		// first half of array (even indices)
		if (i % 2 == 0) array[i] = i + 1;
		// first half of array (odd indices)
		else array[i] = m + i + (m % 2 != 0 ? 1 : 0);
		// second half of array
		array[m + i] = (i+1) * 2;
	}
}

double time_sort_algorithm(
		void (*sort_fn)(int*, size_t),
		void (*init_fn)(int*, size_t),
		size_t n,
		int loops) {
	double elapsed = 0;
	int* array = malloc(n*sizeof(int));
	for (int i = 0; i < loops; ++i) {
		init_fn(array, n);
		clock_t tic = clock();
		sort_fn(array, n);
		clock_t toc = clock();
		elapsed += (double)(toc-tic) / CLOCKS_PER_SEC;
	}
	free(array);
	return elapsed;
}

int main(void) {
	srand(time(NULL));
	int min_order = 10;
	int max_order = 18;
	int loops = 10;

	for (int order = min_order; order <= max_order; ++order) {
		int n = 1 << order;
		printf("%*d", 6, n);

		double time;

		// Random Array quicksort
		time = time_sort_algorithm(quicksort, init_random_array, n, loops);
		printf(" %0.3f", time);

		// Random Array heapsort
		time = time_sort_algorithm(heapsort, init_random_array, n, loops);
		printf(" %0.3f", time);

		// Random Array introsort
		time = time_sort_algorithm(introsort, init_random_array, n, loops);
		printf(" %0.3f", time);

		// Killer Array quicksort
		time = time_sort_algorithm(quicksort, init_median_of_three_killer, n, loops);
		printf(" %7.3f", time);

		// Killer Array heapsort
		time = time_sort_algorithm(heapsort, init_median_of_three_killer, n, loops);
		printf(" %0.3f", time);

		// Killer Array introsort
		time = time_sort_algorithm(introsort, init_median_of_three_killer, n, loops);
		printf(" %0.3f", time);

		printf("\n");
	}

	return 0;
}

  1024 0.000 0.000 0.000   0.000 0.000 0.000
  2048 0.016 0.000 0.000   0.031 0.000 0.000
  4096 0.016 0.000 0.000   0.078 0.015 0.016
  8192 0.000 0.015 0.000   0.344 0.016 0.015
 16384 0.032 0.031 0.031   1.234 0.032 0.047
 32768 0.062 0.078 0.047   4.938 0.093 0.078
 65536 0.094 0.203 0.126  20.765 0.171 0.235
131072 0.234 0.485 0.296  86.234 0.360 0.469
262144 0.516 0.890 0.610 349.516 0.718 1.000

Daniel said

November 13, 2016 at 9:40 PM

This updated main function includes column numbers in the output.

int main(void) {
	srand(time(NULL));
	int min_order = 10;
	int max_order = 18;
	int loops = 10;

	printf("1: Array Size\n");
	printf("2: Random Array Quicksort\n");
	printf("3: Random Array Heapsort\n");
	printf("4: Random Array Introsort\n");
	printf("5: Killer Array Quicksort\n");
	printf("6: Killer Array Heapsort\n");
	printf("7: Killer Array Introsort\n\n");
	printf("     1     2     3     4       5     6     7\n");

	for (int order = min_order; order <= max_order; ++order) {
		int n = 1 << order;
		// Array Size
		printf("%*d", 6, n);

		double time;

		// Random Array Quicksort
		time = time_sort_algorithm(quicksort, init_random_array, n, loops);
		printf(" %0.3f", time);

		// Random Array Heapsort
		time = time_sort_algorithm(heapsort, init_random_array, n, loops);
		printf(" %0.3f", time);

		// Random Array Introsort
		time = time_sort_algorithm(introsort, init_random_array, n, loops);
		printf(" %0.3f", time);

		// Killer Array Quicksort
		time = time_sort_algorithm(quicksort, init_median_of_three_killer, n, loops);
		printf(" %7.3f", time);

		// Killer Array Heapsort
		time = time_sort_algorithm(heapsort, init_median_of_three_killer, n, loops);
		printf(" %0.3f", time);

		// Killer Array Introsort
		time = time_sort_algorithm(introsort, init_median_of_three_killer, n, loops);
		printf(" %0.3f", time);

		printf("\n");
	}

	return 0;
}

Output:

1: Array Size
2: Random Array Quicksort
3: Random Array Heapsort
4: Random Array Introsort
5: Killer Array Quicksort
6: Killer Array Heapsort
7: Killer Array Introsort

     1     2     3     4       5     6     7
  1024 0.015 0.000 0.000   0.000 0.016 0.000
  2048 0.000 0.000 0.015   0.016 0.000 0.016
  4096 0.000 0.015 0.000   0.094 0.016 0.000
  8192 0.015 0.016 0.016   0.359 0.016 0.015
 16384 0.031 0.047 0.032   1.453 0.031 0.062
 32768 0.063 0.094 0.062   5.813 0.078 0.109
 65536 0.125 0.235 0.156  23.047 0.156 0.234
131072 0.250 0.469 0.281  88.672 0.328 0.453
262144 0.470 0.937 0.609 354.890 0.797 0.984

matthew said
November 14, 2016 at 7:53 AM
@Daniel: good stuff, but you want to be calculating the depth limit k*log(n) at the start and not at each recursive call.

Daniel said

November 15, 2016 at 2:22 AM

@matthew, Thanks!

Here’s the updated code along with updated output.

static void introsort_internal(int* array, size_t n, int countdown) {
	if (n < 2) return;
	// early cutoff to insertion sort
	if (n < 22) {
		insertionsort(array, n);
		return;
	}
	// early cutoff to heapsort
	if (countdown < 0) {
		heapsort(array, n);
		return;
	}
	int p = partition(array, n);
	introsort_internal(array, p + 1, countdown - 1);
	introsort_internal(&array[p + 1], n - p - 1, countdown - 1);
}

void introsort(int* array, size_t n) {
	if (n < 2) return;
	introsort_internal(array, n, 2 * ilog2(n));
}

Output:

1: Array Size
2: Random Array Quicksort
3: Random Array Heapsort
4: Random Array Introsort
5: Killer Array Quicksort
6: Killer Array Heapsort
7: Killer Array Introsort

     1     2     3     4       5     6     7
  1024 0.000 0.015 0.000   0.000 0.000 0.000
  2048 0.000 0.000 0.000   0.016 0.000 0.015
  4096 0.000 0.000 0.016   0.047 0.015 0.000
  8192 0.016 0.015 0.016   0.297 0.016 0.015
 16384 0.016 0.062 0.016   1.172 0.062 0.063
 32768 0.062 0.094 0.047   4.953 0.063 0.093
 65536 0.125 0.173 0.078  20.625 0.141 0.203
131072 0.233 0.422 0.188  84.719 0.359 0.406
262144 0.470 0.906 0.500 338.078 0.766 0.937

Programming Praxis