## A Statisticle Speling Korrecter

### December 29, 2009

Many word processors and search engines have a feature that suggests alternate spellings of words that appear to be misspelled; for instance, if you search Google for “speling” it replies “Did you mean: spelling?”

Peter Norvig showed how Google’s spelling corrector works in an article on his web site. Norvig’s system works in two phases. In the training phase, it reads a text, counting the number of occurrences of each word. Then, when asked to suggest a correction, it considers similar words, based on their edit distance from the target word, and chooses the one that occurred most frequently in the training text.

The edit distance between two words is the number of deletions (remove one letter), transpositions (swap two adjacent letters), alterations (change one letter to another), and insertions (add one letter) that would be required to convert one word to the other. For instance, Tuesday can be converted into Thursday by inserting an h after the initial T and changing the e to an r, so there is an edit distance of two between them.

Norvig chose a simple rule for selecting a correction: If the input word is in the training corpus, it’s spelled correctly, so return it unchanged. Otherwise, from the set of words an edit distance of one from the input word, select the word that appears most frequently in the training corpus. If neither the input word nor any words that are an edit distance of one from the input word appear in the training corpus, generate the set of words an edit distance of two from the input word and select the word that appears most frequently in the training corpus. And if that still fails, return the original input word back to the user, unchanged.

Your task is to write a spelling corrector similar to Norvig’s. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Merry Christmas

### December 25, 2009

[ Programming Praxis wishes a very merry Christmas to all my readers and their families. There will be no exercise today. Exercises will return on their normal schedule next Tuesday. ]

*In those days a decree went out from Caesar Augustus that the whole world should be enrolled. This was the first enrollment, when Quirinius was governor of Syria. So all went to be enrolled, each to his own town.*

*And Joseph too went up from Galilee from the town of Nazareth to Judea, to the city of David that is called Bethlehem, because he was of the house and family of David, to be enrolled with Mary, his betrothed, who was with child. While they were there, the time came for her to have her child, and she gave birth to her firstborn son. She wrapped him in swaddling clothes and laid him in a manger, because there was no room for them in the inn.*

*Now there were shepherds in that region living in the fields and keeping the night watch over their flock. The angel of the Lord appeared to them and the glory of the Lord shone around them, and they were struck with great fear. The angel said to them, “Do not be afraid; for behold, I proclaim to you good news of great joy that will be for all the people. For today in the city of David a savior has been born for you who is Messiah and Lord. And this will be a sign for you: you will find an infant wrapped in swaddling clothes and lying in a manger.”*

*And suddenly there was a multitude of the heavenly host with the angel, praising God and saying: “Glory to God in the highest and on earth peace to those on whom his favor rests.”*

*When the angels went away from them to heaven, the shepherds said to one another, “Let us go, then, to Bethlehem to see this thing that has taken place, which the Lord has made known to us.” So they went in haste and found Mary and Joseph, and the infant lying in the manger. When they saw this, they made known the message that had been told them about this child. All who heard it were amazed by what had been told them by the shepherds.*

*And Mary kept all these things, reflecting on them in her heart. Then the shepherds returned, glorifying and praising God for all they had heard and seen, just as it had been told to them.* — Luke 2:1-20

## Permuted Index

### December 22, 2009

In 1972, David Parnas proposed the construction of a permuted index, also known as a keyword-in-context or kwic index, as an exercise in program design. For example, the three sentences

` All's well that ends well.`

Nature abhors a vacuum.

Every man has a price.

produce the permuted index:

Nature abhors a vacuum. All's well that ends well. All's well that ends well. Every man has a price. Every man has a price. Every man has a price. Nature abhors a vacuum. Every man has a price. All's well that ends well. Nature abhors a vacuum. All's well that ends well. All's well that ends well.

Parnas proposed a three-step algorithm: rotate, sort, unrotate. The rotate step takes a sentence and produces all its rotations. The sort step sorts the rotations by their back half. The unrotate step produces neatly-formatted output. Somewhere, in one of the three steps, rotations that produce back halves starting with words on a *stop list* are discarded; note that there is no output for “a vacuum” or “a price.”

Your task is to implement a program that produces permuted indexes using Parnas’ three-step algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

## Calculating Logarithms

### December 18, 2009

Before computers, arithmetic was done by hand, and one of the primary tools of the practicing arithmetician was a table of logarithms, which was printed in a large book and used to perform multiplication, division and exponentiation. But how to calculate the logarithms?

Leonhard Euler, the Swiss mathematician, explained the algorithm in his 1748 book *Introductio in analysin infinitorum*. The basic idea is that, given two numbers *a* and *b*, *a* < *n* < *b*, the geometric mean of *a* and *b* is equal to the arithmetic mean of their logarithms. Thus, to calculate the logarithm of a given number *n*, find two powers of the logarithmic base that bound *n*, calculate the geometric mean (square root of the product) of the two numbers and the arithmetic mean (half of the sum) of the two numbers, and recur on the the smaller interval between one of the two numbers and the geometric mean that bounds *n*, continuing until the desired accuracy is reached.

But how did Euler calculate the square roots? Sir Isaac Newton, the English physicist, described a method based on derivatives. Given an initial estimate *x* of the square root of *n*, a better estimate is given by . Again, this calculation is iterated until the desired accuracy is reached.

Your task is to write a function to compute square roots using Newton’s method, then use that function to compute logarithms using Euler’s method. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

## Affine-Shift Cipher

### December 15, 2009

The affine-shift cipher is simple but cryptographically weak. It works by mapping the twenty-six letters of the alphabet onto the integers 0 through 25, then applying the function (*ax* + *b*) mod 26 to each character in turn, finally mapping back from integers to letters. The key is formed by the numbers *a* and *b*. Decryption is the inverse operation, a^{-1}(*x* − *b*) mod 26, where a^{-1} is the modular inverse of *a* modulo *m*; this means that *a* and *m* must be coprime. For instance, the plain-text PROGRAMMINGPRAXIS is enciphered with *a*=5 and *b*=8 as FPAMPIQQWVMFPITWU, and deciphered with 21 as the modular inverse.

Your task is to write functions to encipher and decipher messages using the affine-shift cipher. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

## Selection

### December 11, 2009

In statistics, the *median* is the smallest item that is greater than half the values in a set. Statisticians also define statistics such as quartiles, deciles and percentiles that show the value of an item relative to the other items in a set. A function known as `select`

allows us to find the *k*th smallest item in a set; that is, the smallest item for which *k* of the items in the set are smaller.

A simple algorithm for `select`

has us first partition the set into two subsets that are respectively smaller and larger than some randomly-chosen item from the set. If the smaller-than set has more than *k* elements, it must contain the desired element, so search recursively for the desired element in the smaller-than set; otherwise, subtract the size of the smaller-than set *k*, then search recursively for the desired element in the larger-than set.

Your task is to write a function that implements the selection algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

## Word Count

### December 8, 2009

The unix `wc`

command counts the lines, words and characters in a file; the Unix V7 man page is shown below:

`NAME`

`wc -- word count`

`SYNOPSIS`

`wc [ -lwc ] [ name ... ]`

`DESCRIPTION`

`Wc counts lines, words and characters in the named files,`

or in the standard input if no name appears. A word is a

maximal string of characters delimited by spaces, tabs or

newlines.

`If the optional argument is present, just the specified`

counts (lines, words, or characters) are selected by the

letters l, w or c.

Your task is to implement the unix `wc`

command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Autokey

### December 4, 2009

An autokey cipher uses the plain-text of the message being sent to form part of the key. There are many kinds of autokey ciphers. The one we will examine is classic, invented by Blaise de Vigenère in 1586, over four hundred years ago.

De Vigenère’s autokey begins with a keyword, then appends to the keyword the plain-text of the message being sent. Each letter of the plain-text is then enciphered by indexing through the alphabet according to the corresponding letter of the key:

`plain-text A C O L L E C T I O N O F E T U D E S`

key P R A X I S A C O L L E C T I O N O F

cipher-text P T O I T W C V W Z Y S H X B I Q S X

Decryption is the inverse operation.

Your task is to write functions that encipher and decipher a message as shown above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Wolves And Rabbits

### December 1, 2009

A nearby meadow supports a large population of rabbits; the rabbits eat the grass that grows abundantly in the sunny meadow, and reproduce quickly, as rabbits do. The meadow also supports a population of wolves that eat the rabbits. As the population of rabbits grows, so does the population of wolves, until there are so many wolves that they overeat the rabbits, whereupon the wolf population begins to diminish. But once the wolf population diminishes, the rabbit population is able to begin growing again, and of course as it does so does the wolf population, in a cycle that never ends. The graph at right shows the cycles, with the predator’s cycle offset and trailing the prey’s cycles.

Ecologists who study such prey-predator populations have determined that there is never a stable equilibrium, but that the populations are cyclical. In the 1920s, an American demographer named Alfred Lotka and an Italian physicist name Vito Volterra, working independently, discovered the differential equations that govern the size of the prey and predator populations:

dR/dt = R_{g} · R – R_{d} · R · W

dW/dt = W_{g} · R · W – W_{d} · W

Where:

R = number of rabbits at beginning of time period

W = number of wolves at beginning of time period

R_{g} = growth rate of rabbits

R_{d} = death rate of rabbits

W_{g} = growth rate of wolves

W_{d} = death rate of wolves

dR/dt = change in rabbit population during current time period

dW/dt = change in wolf population during current time period

Your task is to write a function that models the wolf and rabbit populations; your function should take the initial populations, growth rates, and death rates of wolves and rabbits and calculate the populations at each time step. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.