Rail-Fence Cipher

March 31, 2009

The rail-fence cipher is a transposition cipher that rearranges the characters of a clear-text to form the cipher-text. The clear-text is arranged in up-and-down waves like the tops of the pickets on a rail fence; the cipher key is the height of the fence. For instance, the encipherment of “PROGRAMMING PRAXIS” with a key of 4 is shown below, using an underscore to make the space character visible:

P R O G R A M M I N G _ P R A X I S
P           M           P            = P M P
  R       A   M       _   R       S  = R A M _ R S
    O   R       I   G       A   I    = O R I G A I
      G           N           X      = G N X

The cipher-text is read at the right of the pickets: “PMPRAM RSORIGAIGNX”.

Your task is to write functions that encipher and decipher texts using the rail-fence method. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Advertisement

Pages: 1 2

A turing machine is a hypothetical computer, invented in 1936 by the British mathematician Alan Turing, that can evaluate any computable function. Though hypothetical, turing machines play a large role in computer science. Many variants of turing machines have been devised; the purpose of this exercise is to build a simulator for a particular type of turing machine.

The hardware of our turing machine consists of a read/write head and an infinite tape divided into cells that passes across the read/write head. Each cell on the tape contains a symbol, which we take as an ascii character; there is a special blank symbol that is present in all cells which are not yet written. The read/write head can see one cell at a time, read its contents, and write a new symbol (possibly the blank symbol) to the cell.

The software of our turing machine consists of a table encoding a transition function and a state register that maintains the current state. In our machine, states are numbered with non-negative integers; the machine always starts in state zero, and halts if it enters a negative-numbered state. The program being executed by the turing machine is encoded in the transition function, which takes two inputs, the current state and the symbol in the current cell, and then writes a new symbol (or the same symbol, or the special symbol blank) in the current cell, then either moves the read/write head one cell to the left or one cell to the right or remains in the same place, then resets the state to a new state. The transition function is thus encoded in a table of 5-tuples q1 s1 s2 d q2 where q1 is the current state, s1 is the symbol currently under the read/write head, s2 is the symbol to be written, d is the direction to move the tape, and q2 is the next state to enter.

As an example, consider the tape _ _ _ [1] 1 1 + 1 1 1 1 1 _ _ _. This is our notation for a tape that contains, in the middle of an infinite number of blanks, the nine symbols consisting of three one’s, a plus sign, and five one’s, with the read/write head positioned over the first one; the tape can be considered to model the problem of adding the two numbers three and five. The program

    0 1 1 R 0
    0 + 1 R 1
    1 1 1 R 1
    1 _ _ L 2
    2 1 _ L -1

adds the two numbers and writes _ _ _ 1 1 1 1 1 1 1 [1] _ _ _ as its output.

Write a function that takes a turing-machine program and a tape and simulates the action of a turing machine, writing the final tape as output.

Pages: 1 2

Binary Search

March 23, 2009

Children who are learning arithmetic sometimes play a number-guessing game: “I’m thinking of a number between 1 and 100. Can you guess it?” “Is the number less than 50?” “Yes.” “Is the number less than 25?” “No.” And so on, halving the interval at each step until only one number is left. This technique is known colloquially as the binary chop.

Your first task is to write a function that takes a target number and an array of numbers in non-decreasing order and returns either the position of the number in the array, or -1 to indicate the target number is not in the array. For instance, bsearch(32, [13 19 24 29 32 37 43]) should return 4, since 32 is the fourth element of the array (counting from zero).

Beware that this exercise is harder than it looks. Jon Bentley, in his book Programming Pearls, reports that 90% of professional programmers cannot write a proper implementation of binary search in two hours, and Donald Knuth, in the second volume of his book The Art of Computer Programming, reports that though the first binary search was published in 1946, the first bug-free binary search wasn’t published until 1962.

Thus, your second task is to write a suitable test program that shows the accuracy of your binary search function.

Pages: 1 2

Dodgson’s Doublets

March 20, 2009

Charles Dodgson was an English author, mathematician and logician of the nineteenth century; better known by his pseudonym Lewis Carroll, his most famous works are Alice’s Adventures in Wonderland and the sequel Through the Looking Glass. In 1879, Carroll published the Doublets word game in the Vanity Fair magazine:

The rules of the Puzzle are simple enough. Two words are proposed, of the same length; and the Puzzle consists in linking these together by interposing other words, each of which shall differ from the next word in one letter only. That is to say, one letter may be changed in one of the given words, then one letter in the word so obtained, and so on, till we arrive at the other given word. The letters must not be interchanged among themselves, but each must keep to its own place. As an example, the word ‘head’ may be changed into ‘tail’ by interposing the words ‘heal, teal, tell, tall’. I call the given words ‘a Doublet’, the interposed words ‘Links’, and the entire series ‘a Chain’, of which I here append an example:

H E A D
h e a l
t e a l
t e l l
t a l l
T A I L

It is, perhaps, needless to state that it is de rigueur that the links should be English words, such as might be used in good society.

Write a program that takes two words and finds a chain between them. What is the chain that connects black to white?

Pages: 1 2

Comma-Separated Values

March 17, 2009

The comma-separated values file format is commonly used to exchange tabular information, such as spreadsheets, between different programs and operating systems. Though common, the csv format has never been formally defined, and there are variations. One definition is at http://www.rfc-editor.org/rfc/rfc4180.txt.

We will define a csv file as containing ascii text, with records terminated by a line-termination sequence and fields containing zero or more characters separated by commas; a line-termination sequence is a carriage-return character, or a line-feed character, or both characters in either order. Leading and trailing space in unquoted fields is preserved. Fields may be surrounded by double-quote characters (ascii \042) may contain any characters; such fields may contain newlines, literal commas (ascii \054), and double-quote characters represented as two successive double-quotes. The field separator is normally a comma, but may be changed to an arbitrary character (often a semi-colon) in those European countries that use a comma instead of a decimal point.

The examples below show some of the kinky situations that may arise; for readability, the three-character sequence TAB represents a single tab character (ascii \011):

1,abc,def ghi,jkl,unquoted character strings
2,"abc","def ghi","jkl",quoted character strings
3,123,456,789,numbers
4, abc,def , ghi ,strings with whitespace
5, "abc","def" , "ghi" ,quoted strings with whitespace
6, 123,456 , 789 ,numbers with whitespace
7,TAB123,456TAB,TAB789TAB,numbers with tabs for whitespace
8, -123, +456, 1E3,more numbers with whitespace
9,123 456,123"456, 123 456 ,strange numbers
10,abc",de"f,g"hi,embedded quotes
11,"abc""","de""f","g""hi",quoted embedded quotes
12,"","" "",""x"",doubled quotes
13,"abc"def,abc"def","abc" "def",strange quotes
14,,"", ,empty fields
15,abc,"def
ghi",jkl,embedded newline
16,abc,"def",789,multiple types of fields

Each of those records has five fields, shown below with the vertical bar character as a field separator:

1|abc|def ghi|jkl|unquoted character strings
2|abc|def ghi|jkl|quoted character strings
3|123|456|789|numbers
4| abc|def | ghi |strings with whitespace
5| "abc"|def | "ghi" |quoted strings with whitespace
6| 123|456 | 789 |numbers with whitespace
7|TAB123|456TAB|TAB789TAB|numbers with tabs for whitespace
8| -123| +456| 1E3|more numbers with whitespace
9|123 456|123"456| 123 456 |strange numbers
10|abc"|de"f|g"hi|embedded quotes
11|abc"|de"f|g"hi|quoted embedded quotes
12|| ""|x""|doubled quotes
13|abcdef|abc"def"|abc "def"|strange quotes
14||| |empty fields
15|abc|def
ghi|jkl|embedded newline
16|abc|def|789|multiple types of fields

Write a function that retrieves records from a csv file. Be sure to provide a way to handle user-specified field-separators.

Pages: 1 2

Friday, the Thirteenth

March 13, 2009

Today is Friday, March 13, 2009. How many Fridays will occur on the thirteenth of the month in the next ten years, not counting today?

Pages: 1 2

Word Frequencies

March 10, 2009

Write a program that takes a filename and a parameter n and prints the n most common words in the file, and the count of their occurrences, in descending order.

Pages: 1 2

Roman Numerals

March 6, 2009

Roman numerals are a number-notation system developed in classical Rome, chiefly used today to indicate the year in which a motion picture was made, or the sequence number of a Super Bowl.

Roman numerals use letters of the alphabet to indicate numerical value, according to the following code:

    I 1    V 5    X 10    L 50    C 100    D 500    M 1000

For example, the number 1732 is represented by Roman numerals as MDCCXXXII, and the number 1956 is represented by Roman numerals as MDCCCCLVI. Letter symbols are normally written from the largest symbol to the smallest, left to right, so the numeric values are additive. However, in order to conserve space, it is permissible to replace four of the same symbol written all in a row in a subtractive manner to the left of a higher-value symbol, so that 1956 may also be represented as MCMLVI, where the CM symbol, with C before M, indicates that C is subtracted from M, and thus indicates the numeric value 900. Wikipedia and MathWorld explain the common usage of Roman numerals.

Your task is to write a function that takes two roman numerals (character strings) as input and returns their sum as a roman numeral as output. Be sure that input can be given in either the additive or subtractive forms of Roman numerals; give output using the subtractive form. What is add-roman("CCCLXIX", "CDXLVIII")?

Pages: 1 2

Creation

March 3, 2009

A coded message appears on the next page. The cipher-text was created by XORing the characters of a password, repeated as necessary, with the characters of the plain-text. The numbers are the ascii ordinals of the cipher-text characters; the cipher-text is given in that form in order to bypass the limitations of some browsers. The password consists of less than twenty alphanumeric characters (upper-case letters, lower-case letters, and digits). The plain-text consists of ascii text (printable characters, plus spaces and newlines) in English.

Determine the password and read the coded message.

Pages: 1 2 3