Permuted Index
December 22, 2009
In 1972, David Parnas proposed the construction of a permuted index, also known as a keyword-in-context or kwic index, as an exercise in program design. For example, the three sentences
All's well that ends well.
Nature abhors a vacuum.
Every man has a price.
produce the permuted index:
Nature abhors a vacuum. All's well that ends well. All's well that ends well. Every man has a price. Every man has a price. Every man has a price. Nature abhors a vacuum. Every man has a price. All's well that ends well. Nature abhors a vacuum. All's well that ends well. All's well that ends well.
Parnas proposed a three-step algorithm: rotate, sort, unrotate. The rotate step takes a sentence and produces all its rotations. The sort step sorts the rotations by their back half. The unrotate step produces neatly-formatted output. Somewhere, in one of the three steps, rotations that produce back halves starting with words on a stop list are discarded; note that there is no output for “a vacuum” or “a price.”
Your task is to implement a program that produces permuted indexes using Parnas’ three-step algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.