External Sorting
November 20, 2015
Sorting the lines of a file that fits in memory is a simple matter of loading the file and calling the sort function. When the file doesn’t fit in memory, however, it must be sorted in pieces, then combined into a single output, which is harder.
Your task is to write a program that sorts the lines of a file that is too large to fit into memory into ascending ascii order. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Don’t have time to do actual implementation, but could use memory mapped file with suitable cache-oblivious sorting algorithm (https://en.wikipedia.org/wiki/Funnelsort seems to be state of the art, or at least, some kind of n-way mergesort).
See Knuth on replacement selection, where you notice if the incoming line happens to be greater than the last line output to the current temp file, and if so write it out too. This is a huge win on almost-sorted files, a very common case.