## Data Encryption Standard: Part 1

### August 31, 2010

The Data Encryption Standard has been one of the most successful ciphers in history, and is still in use today, especially in its Triple DES variant. The Data Encryption Standard is officially described by FIPS 46-3, though if you are not fond of reading algorithm descriptions written by government lawyers there are many other descriptions available on the internet.

DES is a block cipher, operating on 64 bits at a time. Here is an example:

`PT P r o g P r a x`

HEX 50 72 6F 67 50 72 61 78

KEY 01 23 45 67 89 AB CD EF

CT CC 99 EA 46 B1 6E 28 90

There is more than one way to encrypt a message longer than 64 bits; we will examine them in a later exercise.

Your task is to write the code to encipher and decipher a single 64-bit block using the Data Encryption Standard. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Chinese Remainders

### August 27, 2010

In ancient China, two thousand years ago, a general wanted to count his troops. He first had them line up in ranks of eleven, and there were ten troops left over in the last rank. Then he had his troops line up in ranks of twelve, and there were four left over in the last rank. Finally he had them line up in ranks of thirteen, and there were twelve troops remaining in the last rank.

Your task is to determine how many troops the general had under his command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Daniel Shanks’ Square Form Factorization Algorithm

### August 24, 2010

In the mid-1970s, Daniel Shanks exploited the binary quadratic forms introduced by Legendre in the late eighteenth century and perfected by Gauss in the early nineteenth century to create a method of factoring integers. Binary quadratic forms are expressions of the form *Ax*^{2} + *Bxy* + *Cy*^{2} with *A*, *B* and *C* integers, normally represented as the triple (*A*, *B*, *C*). Binary quadratic forms are often called square forms.

We won’t discuss the mathematics of square forms, but we do need to know how to reduce a square form. A discriminant delta is calculated as Δ = *b*^{2} − 4*ac*. A single reduction step is called a rho reduction, and is given by ρ(*a*, *b*, *c*) = (*c*, *r*, (*r*^{2}−Δ)/4*c*) where *r* is that unique integer *b* mod 2*c* such that −|*c*| < *r* <= |*c*|, if √Δ < |*c*|, or √Δ − 2|*c*| < *r* < √Δ, if |*c*| < √Δ.

A square form is in *reduced* form if |√Δ−2|*a*|| < *b* < √Δ; this reduction is accomplished by continually applying ρ until the reduced form is achieved.

Shanks’ method uses two loops. First, an initial square form is calculated, based on the number to be factored, which must be odd, composite, and not a perfect square; the details of that initialization are tedious, and appear in the solution on the next page. Then the square form is repeatedly reduced (this is the “forward” loop) until it is in *proper* form, which is described below. Then the reduced inverse square root of the square form calculated in the forward loop is reduced in a “backward” loop until a factor is identified when two successive square forms (*A*, *B*, *C*=*c*^{2}) have the same *B*; the factor is then *c*, if *c* is odd, or *c*/2, if *c* is even. The reduced inverse square root of (*A*, *B*, *C*=*c*^{2}) is initially (−*c*, *B*, −*cA*), which is then put into reduced form as described above.

Shanks defined a square form as proper using a queue of values derived in a rather complicated way. We will use a simpler method known as a *sufficient list*. Let *L* be the integer square root of the square root of the discriminant. If a square form (*A*, *B*, *C*) is encountered with |*C*| < *L* with *C* even, or |*C*| < *L*/2 with *C* odd, then *C* is placed on the sufficient list. Then a square form (*A*, *B*, *C*=*c*^{2}) is proper if *c* is not on the sufficient list.

Your task is to write a function to factor integers using Shanks’ square form algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Marriage Sort

### August 20, 2010

Today’s exercise is about a relatively new sorting algorithm. We start with an article *Optimizing Your Wife* by Kevin Brown, which proposes that the best way for a man to find a wife is to decide how many women he is willing to date before he chooses a wife, we’ll call that *N*, determine which of the first √*N* women is “best,” according to whatever matters to him, and then choose the next woman after the first √*N* that is better than any of the first √*N* women. For instance, to find the marriageable woman in a batch of a hundred, date ten of them, then marry the next one that is better than any of those ten. You may not find the optimal woman, but you’ll be close.

Eric Burnett turned Brown’s idea into a sorting algorithm. First, sample the first √*N* values at the beginning of an array, then swap any of the remaining values that are better than the greatest value of the sample to the end of the array, swap the greatest value of the sample just before those at the end, then recur on the smaller array before those greatest values. Finish the sort by performing insertion sort on the entire array; that will be quick, since most values are near their final positions.

Burnett’s algorithm requires three pointers: the current location of the end of the sample, the current location of the end of the array still under consideration, and a pointer that sweeps through the array. The time complexity is *O*(*n*^{1.5}), which is similar to other sorting methods like shell sort and comb sort that have a first stage that nearly sorts the input followed by insertion sort to clean up the rest.

Your task is to write a function that sorts an array using marriage sort. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Cut

### August 17, 2010

Unix V7 provided a utility called `cut`

that reads its input a line at a time and selectively copies portions of each input line to standard output. Portions are selected either by character position or by character-delimited field. `Cut`

is invoked as `cut -c`

*list* [*file* …] or `cut -f`

*list* [`-d`

*char*] [*file* …].

Character mode, invoked with the `-c`

option, retains in the output those character positions that are mentioned in the *list*, which may contain column numbers, or ranges of column numbers separated by a dash, all separated by commas; counting starts from one. Field mode, invoked with the `-f`

option, specifies a list of fields in a similar manner to character mode; fields are delimited by tab characters unless the field delimiter is changed by the `-d`

option.

For example, the command `cut -f1,3 -d: /etc/passwd`

prints user names and userid numbers from the password file.

Your task is to write a program to implement `cut`

. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## E

### August 13, 2010

This puzzle made the rounds of the programmer mathematician blogs a while ago:

Given a random number generator that returns a real number between zero and one, how many random numbers must be selected, on average, to make their accumulated total greater than one?

Your task is to write a program that answers the question above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Literate Programming

### August 10, 2010

Literate programming is a style of programming invented by Donald Knuth that merges documentation and code in a single document, with code presented in an order that is conducive to the reader. Chunks of code can be written in any order; a program called `tangle`

restructures the chunks into the order required by the compiler. Here is a short but complete example of a literate program, which you may recognize as the *second* program, after “hello world,” from K&R:

`This program prints a fahrenheit/celsius conversion table.`

<<*>>=

<< include standard headers >>

<< the main program >>The only standard header required is stdio.h, which includes the printf function used by the program.

<< include standard headers >>=

#include <stdio.h>The main program defines some variables, initializes them, then loops through the table printing output lines.

<< the main program >>=

main()

{

<< declare variables >>

<< initialize variables >>

<< loop through the table >>

}Variables fahr and celsius hold the current temperatures. Variables lower, upper and step control the loop.

<< declare variables >>=

int fahr, celsius;

int lower, upper, step;The loop control variables are initialized so that the table prints fahrenheit values from 0 to 300 in steps of 20, along with the corresponding celsius values.

<< initialize variables >>=

lower = 0;

upper = 300;

step = 20;Temperatures are printed by a loop.

<< loop through the table >>=

fahr = lower;

while (fahr <= upper) {

<< calculate celsius and print one line >>

fahr = fahr + step;

}The celsius equivalent of a fahrenheit temperature is computed by the traditional formula C = 9/5 * (F-32). The two temperatures are printed separated by a tab character, each pair on a single line.

`<< calculate celsius and print one line >>=`

celsius = 5 * (fahr-32) / 9;

printf("%d\t%d\n", fahr, celsius);

This is a simple-minded literate programming system, and the form of the input file is correspondingly simple. Code chunks are introduced by a line beginning with double less-than signs and ending with double greater-than signs and an equals sign; there may be no white space at the beginning or end of the line. Code chunks are referenced on any line within another code chunk by surrounding the name of the chunk, which must exactly match the name given on the definition line, with double less-than and greater-than signs; there may be only one reference per line. A code chunk ends at the first blank line following its beginning, or at the end of the file, whichever comes sooner.

The `tangle`

program collects all the code chunks, then performs depth-first search through the call-tree graph beginning with the top-level “*” chunk. `Tangle`

is careful to preserve the formatting of the original, in case the programmer needs to look at its output. `Tangle`

produces the following output from the example program shown above:

`#include`

main()

{

int fahr, celsius;

int lower, upper, step;

lower = 0;

upper = 300;

step = 20;

fahr = lower;

while (fahr <= upper) {

celsius = 5 * (fahr-32) / 9;

printf("%d\t%d\n", fahr, celsius);

fahr = fahr + step;

}

}

This program is simple-minded for exposition, and doesn’t do justice to the literate programming concept. You’ll see a better example in the solution.

Your task is to write a program that tangles an input file and creates a program output. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Two Powering Predicates

### August 6, 2010

In our study of prime numbers, we have sometimes been lax in specifying the limitations of particular factoring methods; for instance, elliptic curve factorization only works where the number being factored is co-prime to six. Two conditions that arise from time to time are that the number must not be a perfect square and that the number may not be an integer power of a prime number. In today’s exercise we will write predicates to identify such numbers.

The usual test for whether a number is a perfect square is to find the integer square root by Newton’s method and then test if the square of that number is the original number. A better algorithm exploits a theorem of number theory which states that a number is a square if and only if it is a quadratic residue modulo every prime not dividing it. Henri Cohen, in his book *A Course in Computational Algebraic Number Theory*, describes the algorithm:

The following computations are to be done and stored once and for all.

1. [Fill 11] For

k= 0 to 10 setq11[k] ← 0. Then fork= 0 to 5 setq11[k^{2}mod 11] ← 1.2. [Fill 63] For

k= 0 to 62 setq63[k] ← 0. Then fork= 0 to 31 setq63[k^{2}mod 63] ← 1.3. [Fill 64] For

k= 0 to 63 setq64[k] ← 0. Then fork= 0 to 31 setq63[k^{2}mod 64] ← 1.4. [Fill 65] For

k= 0 to 64 setq65[k] ← 0. Then fork= 0 to 32 setq63[k^{2}mod 65] ← 1.

Then the algorithm is:

Given a positive integer

n, this algorithm determines whethernis a square or not, and if it is, outputs the square root ofn.1. [Test 64] Set

t←nmod 64 (using if possible only an`and`

statement). Ifq64[t] = 0,nis not a square and terminate the algorithm. Otherwise, setr=nmod 45045.2. [Test 63] If

q63[rmod 63] = 0,nis not a square and terminate the algorithm.3. [Test 65] If

q65[rmod 65] = 0,nis not a square and terminate the algorithm.4. [Test 11] If

q11[rmod 11] = 0,nis not a square and terminate the algorithm.5. [Compute square root] Compute

q← ⌊ √n⌋ using Newton’s method. Ifn≠q^{2},nis not a square and terminate the algorithm. Otherwise,nis a square, outputqand terminate the algorithm.

Our second predicate is the prime-power test, which determines, for a given *n*, if there exist two numbers *p* and *k* such that *p ^{k}* =

*n*, with

*p*prime. Stephen Wolfram’s

*Mathematica*program implements the prime-power test as

`PrimePowerQ`

, which returns either `True`

or `False`

. According to the manual,The algorithm for

`PrimePowerQ`

involves first computing the least prime factorpofnand then attempting division bynuntil either 1 is obtained, in which casenis a prime power, or until division is no longer possible, in which casenis not a prime power.

(Note: they probably meant “attempting division by *p*.”) Wolfram gives the example `PrimePowerQ[12167]`

, which is `True`

, since 23^{3} = 12167. That algorithm will take a while, as factoring is a non-trivial problem.

Cohen determines if *n* is a prime power by first assuming that *n* = *p ^{k}*, where

*p*is prime. Then Fermat’s Little Theorem gives

*p*| gcd(

*a*−

^{n}*a*,

*n*). If that fails,

*n*is not a prime power. Here is Cohen’s algorithm:

Given a positive integer

n> 1, this algorithm tests whether or notnis of the formpwith^{k}pprime, and if it is, outputs the primep.1. [Case

neven] Ifnis even, setp← 2 and go to Step 4. Otherwise, setq←n.2. [Apply Rabin-Miller] By using Algorithm 8.2.2 show that either

qis a probable prime or exhibit a witnessato the compositeness ofq. Ifqis a probable prime, setp←qand go to Step 4.3. [Compute GCD] Set

d← (a−^{q}a,q). Ifd= 1 ord=q, thennis not a prime power and terminate the algorithm. Otherwise setq←dand go to Step 2.4. [Final test] (Here

pis a divisor ofnwhich is almost certainly prime.) Using a primality test prove thatpis prime. If it is not (an exceedingly rare occurrence), setq←pand go to Step 2. Otherwise, by dividingnbyprepeatedly, check whethernis a power ofpor not. If it is not,nis not a prime power, otherwise outputp. Terminate the algorithm.We have been a little sloppy in this algorithm. For example in Step 4, instead of repeatedly dividing by

pwe could use a binary search analogous to the binary powering algorithm. We leave this as an exercise for the reader.

Cohen’s Algorithm 8.2.2 refers to the search for a witness to the compositeness of a number which we used in the exercise on the Miller-Rabin primality checker.

These two beautiful algorithms show the power and elegance of number theory. Cohen’s book is a fine example of the blend of mathematics and programming, and does an excellent job of explaining algorithms in a way that makes them easy to implement; most math textbooks aren’t so good.

Your task is to implement Cohen’s two powering predicates. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## Carl Hewitt’s Same-Fringe Problem

### August 3, 2010

Long ago, Carl Hewitt created the *same-fringe* problem as a demonstration of the simplest problem that requires concurrency to implement efficiently: Given two binary trees, determine if they have the same leaves in the same order, regardless of their internal structure. A solution that simply flattens both trees into lists and compares them element-by-element is unacceptable, as it requires space to store the intermediate lists and time to compute them even if a difference arises early in the computation.

Your task is to write a function that tests if two trees have the same fringe. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.