The Playfair Cipher
July 3, 2009
One of the most famous of the classical ciphers was the Playfair cipher, invented by Sir Charles Wheatstone in 1854 but popularized by Lord Lyon Playfair. The cipher was used by the British during the Boer War and World War I, and was still in use as a field-grade cipher as late as World War II, probably because it is simple to teach, fast to use, requires no special equipment, and was reasonably secure by the standards of the day; by the time enemy cryptanalysts could break the message, its contents were useless to them.
Playfair uses a 5-square block containing the pass-phrase; the letter J is mapped to I to make the 26-letter alphabet fit the 5-square block. The pass-phrase is first filled in to the 5-square block, then the remaining letters of the alphabet complete the 5-square block, with duplicates removed. For instance, the pass-phrase PLAYFAIR leads to this 5-square block:
P L A Y F
I R B C D
E G H K M
N O Q S T
U V W X Z
Some variants of Playfair omit Q instead of J, or start the key from the bottom-right corner and work backwards; we’ll stick with the standard method.
The message to be encrypted is split into digrams, with J mapped to I. In addition, no digram may use the same letter twice, so an X is inserted if needed (some Playfair variants use Z or Q instead of X). Likewise, if the last digram has only a single letter, an X is added to the end of the message. For instance, the plain-text PROGRAMMING PRAXIS is split as:
PR OG RA MX MI NG PR AX IS
Then each digram is enciphered separately according to the following rules:
- If the two letters of the digram are in the same row, they are replaced pairwise by the letters to their immediate right, wrapping around to the left of the row if needed.
- If the two letters of the digram are in the same column, they are replaced pairwise by the letters immediately below, wrapping around to the top of the column if needed.
- Otherwise, the first letter of the digram is replaced by the letter in the same row as the first letter of the digram and the same column as the second letter of the digram, and the second letter of the digram is replaced by the letter in the same row as the second letter of the digram and the same column as the first letter of the digram.
For instance, the PR digram is enciphered as LI, because P and R are in different rows and columns, and the OG digram is enciphered as VO, since O and G are in the same column. The complete encipherment is shown below:
PR OG RA MX MI NG PR AX IS
LI VO BL KZ ED OE LI YW CN
The coded message is thus LIVOBLKZEDOELIYWCN. Decryption is simply the inverse of encryption.
Playfair is rather more secure than simpler substitution ciphers because it works with digrams rather than single letters, rendering simple frequency analysis moot. Frequency analysis on digrams is possible, but requires considerably more cipher-text than frequency analysis on single letters because Playfair admits 600 digrams whereas simple substitution admits only 26 letters. Manual cryptanalysis looks at commonly-reversed digrams such as REceivER and DEcodED; if some plain-text is known (such as a standard message header), it is easy to recover at least a partial key.
The most famous use of the Playfair cipher came in 1943. David Kahn writes in The Codebreakers:
Perhaps the most famous cipher of 1943 involved the future president of the U.S., J. F. Kennedy, Jr. On 2 August 1943, Australian Coastwatcher Lieutenant Arthur Reginald Evans of the Royal Australian Naval Volunteer Reserve saw a pinpoint of flame on the dark waters of Blackett Strait from his jungle ridge on Kolombangara Island, one of the Solomons. He did not know that the Japanese destroyer Amagiri had rammed and sliced in half an American patrol boat PT-109, under the command of Lieutenant John F. Kennedy, United States Naval Reserve. Evans received the following message at 0930 on the morning of the 2 of August 1943:
KXJEY UREBE ZWEHE WRYTU HEYFS
KREHE GOYFI WTTTU OLKSY CAJPO
BOTEI ZONTX BYBWT GONEY CUZWR
GDSON SXBOU YWRHE BAAHY USEDQThe translation:
PT BOAT ONE OWE NINE LOST IN ACTION IN BLACKETT
STRAIT TWO MILES SW MERESU COVE X CREW OF TWELVE
X REQUEST ANY INFORMATION.The coastwatchers regularly used the Playfair system. Evans deciphered it with the key ROYAL NEW ZEALAND NAVY and learned of Kennedy’s fate. […] About ten hours later, at 10:00 p.m. Kennedy and his crew was rescued.
Your task is to write functions that encipher and decipher messages using the Playfair cipher. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Steve Yegge’s Phone-Screen Coding Exercises
June 30, 2009
Steve Yegge is a programmer and popular blogger. One of his blog entries proposes these seven phone-screen coding exercises:
- Write a function to reverse a string.
- Write a function to compute the Nth fibonacci number.
- Print out the grade-school multiplication table up to 12 x 12.
- Write a function that sums up integers from a text file, one per line.
- Write a function to print the odd numbers from 1 to 99.
- Find the largest int value in an int array.
- Format an RGB value (three 1-byte numbers) as a 6-digit hexadecimal string.
Your task is to write solutions to stevey’s phone-screen exercises. When you are finished, you are welcome to read a suggested solution, or to post your solution or discuss the exercise in the comments below.
Treaps
June 26, 2009
Dictionaries are a common data type, which we have used in several exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams). Hash tables are often used as the underlying implementation for dictionaries, and in a previous exercise we developed ternary search tries as another implementation for dictionaries, which are useful when the keys are strings. Today’s exercise examines treaps, which are yet another method for implementing dictionaries.
A treap is essentially a binary search tree, represented as either an empty node, called nil, or a node with a key and a value and pointers to two child nodes, the left child and the right child. Binary search trees obey the rule that all nodes in the left child have keys that are less than the key of the current node, and all nodes in the right child have keys that are greater than the key of the current node. One node is distinguished as the root of the tree; it is the only node visible outside the tree. Obviously, a single set of keys could be represented by many different binary search trees, depending on which key is chosen as the root and the choices made at each level under the root.
The speed of a search depends on the depth at which the key is located in the tree. Trees that have more balance (fewer nodes with nil children) are faster to search than trees that have less balance (more nodes with nil children) because the overall depth of the tree is less. The two extreme cases occur when a tree is perfectly balanced (no internal nil nodes), when a search takes time logarithmic in the size of the tree, and when a tree is perfectly imbalanced (a long chain of nodes each with one nil child), when a search takes time linear in the size of the tree.
Treaps use an ingenious method to assure that the tree is approximately balanced at all times. Each node, in addition to the key, value, and two child nodes, contains a numeric priority, which is randomly assigned at the time the node is created and never changed. Of the many possible trees that could represent a particular set of keys, the one that is chosen obeys the heap-order property, with the priority of each node being greater than the priorities of all of its child nodes. Assuming all keys and priorities are distinct, there will be exactly one tree that satisfies both the tree-order and heap-order properties; it is the same tree that would be created if keys were dynmically inserted into a normal binary search tree in order of the priority field. The genius of this method is that the priorities are independent of the keys, making it extremely unlikely that highly-imbalanced trees could be created, thus keeping search times approximately logarithmic.
As nodes are dynamically inserted into and deleted from the tree, it is sometimes necessary to restructure the tree to maintain both the tree-order and the heap-order properties. The restructurings, known as rotations, come in two varieties, left and right:

[ This image, by Ramasamy, is used under the terms of the GNU Free Documentation License or the Creative Commons Attribution Share-Alike License. ]
In both trees shown above, all the nodes in sub-tree A have keys less than p, all the nodes in sub-tree B have keys between p and q, and all the nodes in sub-tree C have keys greater than q. Transforming the tree so that the root moves from p to q is called a left rotation, and transforming the tree so that the root moves from q to p is called a right rotation; in both cases the root node moves down the tree toward the leaves. Note that both rotations preserve the tree-order property. It is the goal of the insert and delete algorithms to use rotations to maintain the heap-order property.
Your task is to implement the lookup, insert, update, delete and enlist operations on treaps. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.
The Mod Out System
June 23, 2009
Over at The Daily WTF, Alex Papadimoulis catalogs some insanely bad programming practices. The Mod Out System is one of the best … worst … absolutely sickening … no boss, I would never do such a thing … well, you get the point.
The hero of the story, Gary, must write a function to expand a list of ranges like 1-6,9,13-19 into a list without ranges like 1,2,3,4,5,6,9,13,14,15,16,17,18,19.
Your task is to write that function. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Monte Carlo Factorization
June 19, 2009
We have previously examined two methods of integer factorization, trial division using wheels and Fermat’s method of the differences of squares. In this exercise we examine a probabilistic method of integer factorization that takes only a small, fixed amount of memory and tends to be very fast. This method was developed by John Pollard in 1975 and is commonly called the Pollard rho algorithm.
The algorithm is based on the observation that, for any two random integers x and y that are congruent modulo p, the greatest common divisor of their difference and n, the integer being factored, will be 1 if p and n are coprime (have no factors in common) but will be between 1 and n if p is a factor of n. By the birthday paradox, p will be found with reasonably large probability after trying about random pairs.
Pollard’s algorithm uses a function modulo n to generate a pseudo-random sequence. Two copies of the sequence are run, one twice as fast as the other, and their values are saved as x and y. At each step, we calculate gcd(x–y,n). If the greatest common divisor is one, we loop, since the two values are coprime. If the greatest common divisor is n, then the values of the two sequences have become equal and Pollard’s algorithm fails, since the sequences have fallen into a cycle, which is detected by Floyd’s tortoise-and-hare cycle-finding algorithm; that’s why we have two copies of the sequence, one (the “hare”) running twice as fast as the other (the “tortoise”). But if the greatest common divisor is between 1 and n, we have found a factor of n.
Failure doesn’t mean failure. It just means that the particular pseudo-random sequence that we chose doesn’t lead to success. Our response to failure is to try another sequence. We use the function x² + c (mod n), where c is initially 1. If Pollard’s algorithm fails, we increase c to 2, then 3, and so on. If we keep increasing c, we will eventually find a factor, though it may take a long time if n is large.
Your task is to implement Pollard’s factorization algorithm. You can test it by calculating the factors of the 98th Mersenne number, 298 – 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Who Owns The Zebra?
June 16, 2009
This famous puzzle was first published by Life magazine on December 17, 1962. It has been variously attributed to both Albert Einstein and Lewis Carroll, but the true author is not known. There are several versions; this is the original from Life:
1 There are five houses.
2 The Englishman lives in the red house.
3 The Spaniard owns the dog.
4 Coffee is drunk in the green house.
5 The Ukrainian drinks tea.
6 The green house is immediately to the right of the ivory house.
7 The Old Gold smoker owns snails.
8 Kools are smoked in the yellow house.
9 Milk is drunk in the middle house.
10 The Norwegian lives in the first house.
11 The man who smokes Chesterfields lives in the house next to the man with the fox.
12 Kools are smoked in the house next to the house where the horse is kept.
13 The Lucky Strike smoker drinks orange juice.
14 The Japanese smokes Parliaments.
15 The Norwegian lives next to the blue house.
In the interest of clarity, it must be added that each of the five houses is painted a different color, and their inhabitants are of different national extractions, own different pets, drink different beverages and smoke different brands of American cigarettes.
Your task is to write a program to solve the puzzle and determine: Who drinks water? Who owns the zebra? When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Feynman’s Puzzle
June 12, 2009
Richard Feynman was an American physicist; he worked on the Manhattan Project that developed the atomic bomb, won the Nobel Prize in 1965 for his work in the development of quantum electrodynamics, and famously demonstrated the inelasticity of the rubber o-rings on the space shuttle Challenger using a clamp, a rubber band, and a glass of ice water. Feynman proposed this puzzle:

Your task is to solve Feynman’s Puzzle. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercises in the comments below.
Longest Common Subsequence
June 9, 2009
Finding the longest common subsequence of two sequences is a classic computer science problem with an equally classic solution that dates to the folklore of computing. The longest common subsequence is the longest set of elements that appear in order (not necessarily contiguous) in two sequences; for instance, the longest common subsequence of PROGRAMMING and PRAXIS is PRAI:
P R O G R A M M I N G
| | | |
P R A X I S
The classic solution uses dynamic programming. A matrix is prepared with one sequence in the rows and the other sequence in the columns, giving in each cell the minimum edit distance between the two, where the edit distance is the least number of adds, deletes and changes that can convert one sequence to the other. For instance:
P R O G R A M M I N G
0 0 0 0 0 0 0 0 0 0 0 0
P 0 1 1 1 1 1 1 1 1 1 1 1
R 0 1 2 2 2 2 2 2 2 2 2 2
A 0 1 2 2 2 2 3 3 3 3 3 3
X 0 1 2 2 2 2 3 3 3 3 3 3
I 0 1 2 2 2 2 3 3 3 4 4 4
S 0 1 2 2 2 2 3 3 3 4 4 4
If we represent the rows as Xi and the columns as Yj, the matrix can be filled top-to-bottom, left-to-right using the formula:
{ 0, if i=0 or j=0
{
LCS(i,j) = { LCS(i-1,j-1) + 1, if X(i) = Y(j)
{
{ max(LCS(i-1,j), LCS(i,j-1)), otherwise
Intuitively, the two sequences are scanned in parallel. If the current two cells are identical, the length of the longest common subsequence increases by one; otherwise, there are two possibilities to consider recursively, after deleting the current cell from either one input sequence or the other, and the length of the longest common subsequence is simply the greater of the two possibilities.
Once the matrix of longest common subsequence lengths has been calculated, the longest common subsequence itself can be recovered by noting each point where the length “bumps” to the next lower value along the diagonal, starting at the lower right-hand corner: from 4 to 3 when both sequences are at I, from 3 to 2 when both sequences are at A, from 2 to 1 when both sequences are at R, and from 1 to 0 when both sequences are at P.
Note that the longest common sequence is not necessarily unique. For instance, given the two sequences ABC and BAC, there are two longest common subsequences, AC and BC. Either answer is correct.
Your task is to write a function that takes two sequences and returns their longest common subsequence. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Ternary Search Tries
June 5, 2009
Several of our exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams) have used hash tables based on string keys. When the keys are strings, a useful alternative to hash tables is a data structure called a ternary search trie, which can be faster than a hash table because it takes advantage of the fact that keys are strings, and also provides in-order access to the keys. Ternary search tries were introduced by Jon Bentley and Robert Sedgewick at the 1997 SODA conference.
As the name implies, ternary search tries are a cross between binary search trees and radix tries. Ternary search tries consist of recursive nodes and leaves, just like binary search trees and radix tries. As the name implies, a node of a ternary search trie has three children, one for lesser children, one for greater children, and a third for equal children. Searches proceed character-by-character, as with a trie. At each level of the trie, the search compares the current character of the search string with the character stored in the node. If the search character is less, the search continues at the less-than child, and if the search character is greater, the search continues at the greater-than child. However, when the two characters are equal, the search continues recursively at the equal-to child, proceeding to the next character in the search string.
The picture below, taken from the Bentley/Sedgewick SODA paper, shows a ternary search trie containing twelve two-character words:

Your task is to implement the lookup, insert, update, delete and enlist operations on ternary search tries. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.
Pig Latin
June 2, 2009
Pig Latin is an English-language word game, usually played by children, in which words are mutated systematically. A word that begins with a vowel (a, e, i, o, or u) has the syllable “way” added to the end; for instance, art becomes art-way and eagle becomes eagle-way. A word that begins with one or more consonants has the initial consonants stripped, moved to the end of the word, then the syllable “ay” is added; for instance, start becomes art-stay and door becomes oor-day. A hyphen is added as shown above as an aid to prevent ambiguity; otherwise, a word like aspray could be the translation of spray (ay-spray) or prays (ays-pray). Even so, some words remain ambiguous; art-way could be translated as either art or wart.
Your task is to write functions that translate English to Pig Latin and Pig Latin to English. When you are finished, you may read or run a suggested solution, or post your solution or discuss the exercise in the comments below.