Literate Programming
August 10, 2010
Literate programming is a style of programming invented by Donald Knuth that merges documentation and code in a single document, with code presented in an order that is conducive to the reader. Chunks of code can be written in any order; a program called tangle restructures the chunks into the order required by the compiler. Here is a short but complete example of a literate program, which you may recognize as the second program, after “hello world,” from K&R:
This program prints a fahrenheit/celsius conversion table.

<<*>>=
<< include standard headers >>
<< the main program >>

The only standard header required is stdio.h, which declares the printf function used by the program.

<< include standard headers >>=
#include <stdio.h>

The main program defines some variables, initializes them, then loops through the table printing output lines.

<< the main program >>=
main()
{
<< declare variables >>
<< initialize variables >>
<< loop through the table >>
}

Variables fahr and celsius hold the current temperatures. Variables lower, upper and step control the loop.

<< declare variables >>=
int fahr, celsius;
int lower, upper, step;

The loop control variables are initialized so that the table prints fahrenheit values from 0 to 300 in steps of 20, along with the corresponding celsius values.

<< initialize variables >>=
lower = 0;
upper = 300;
step = 20;

Temperatures are printed by a loop.

<< loop through the table >>=
fahr = lower;
while (fahr <= upper) {
<< calculate celsius and print one line >>
fahr = fahr + step;
}

The celsius equivalent of a fahrenheit temperature is computed by the traditional formula C = 5/9 * (F-32). The two temperatures are printed separated by a tab character, each pair on a single line.

<< calculate celsius and print one line >>=
celsius = 5 * (fahr-32) / 9;
printf("%d\t%d\n", fahr, celsius);

This is a simple-minded literate programming system, and the form of the input file is correspondingly simple. Code chunks are introduced by a line beginning with double less-than signs and ending with double greater-than signs and an equals sign; there may be no white space at the beginning or end of the line. Code chunks are referenced on any line within another code chunk by surrounding the name of the chunk, which must exactly match the name given on the definition line, with double less-than and greater-than signs; there may be only one reference per line. A code chunk ends at the first blank line following its beginning, or at the end of the file, whichever comes sooner.
The tangle program collects all the code chunks, then performs depth-first search through the call-tree graph beginning with the top-level “*” chunk. Tangle is careful to preserve the formatting of the original, in case the programmer needs to look at its output. Tangle produces the following output from the example program shown above:
#include <stdio.h>
main()
{
int fahr, celsius;
int lower, upper, step;
lower = 0;
upper = 300;
step = 20;
fahr = lower;
while (fahr <= upper) {
celsius = 5 * (fahr-32) / 9;
printf("%d\t%d\n", fahr, celsius);
fahr = fahr + step;
}
}
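A minimal sketch of such a tangle program, in Python, might look like the following. It is not the suggested solution; the function names read_chunks, expand and tangle are made up for illustration. It assumes that a reference occupies a whole line (apart from leading white space, which is carried over to the expanded lines), that repeated definitions of the same chunk name are appended to one another, and that the chunk graph has no cycles and no missing chunks.

```python
import re

# A definition line is <<name>>= with nothing before or after it.
DEFINITION = re.compile(r"^<<(.+)>>=$")
# Assumption: a reference is alone on its line, possibly indented.
REFERENCE = re.compile(r"^(\s*)<<(.+)>>\s*$")

def read_chunks(text):
    """Collect the code chunks of a literate source, keyed by chunk name."""
    chunks = {}
    current = None                      # lines of the chunk being read, if any
    for line in text.splitlines():
        m = DEFINITION.match(line)
        if m:
            # Assumption: a repeated definition extends the earlier one.
            current = chunks.setdefault(m.group(1), [])
        elif not line.strip():
            current = None              # a blank line ends the chunk
        elif current is not None:
            current.append(line)
        # anything else is prose and is ignored
    return chunks

def expand(chunks, name, indent=""):
    """Depth-first expansion of the named chunk into output lines."""
    out = []
    for line in chunks[name]:
        m = REFERENCE.match(line)
        if m:
            out.extend(expand(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line)
    return out

def tangle(text):
    """Return the program text reachable from the top-level * chunk."""
    return "\n".join(expand(read_chunks(text), "*")) + "\n"
```

Chunk names are compared exactly as written between the angle brackets, including any spaces, so the reference and the definition must agree character for character.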
This program is simple-minded for exposition, and doesn’t do justice to the literate programming concept. You’ll see a better example in the solution.
Your task is to write a program that tangles an input file and creates a program output. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Two Powering Predicates
August 6, 2010
In our study of prime numbers, we have sometimes been lax in specifying the limitations of particular factoring methods; for instance, elliptic curve factorization only works where the number being factored is co-prime to six. Two conditions that arise from time to time are that the number must not be a perfect square and that the number may not be an integer power of a prime number. In today’s exercise we will write predicates to identify such numbers.
The usual test for whether a number is a perfect square is to find the integer square root by Newton’s method and then test if the square of that number is the original number. A better algorithm exploits a theorem of number theory which states that a number is a square if and only if it is a quadratic residue modulo every prime not dividing it. Henri Cohen, in his book A Course in Computational Algebraic Number Theory, describes the algorithm:
The following computations are to be done and stored once and for all.
1. [Fill 11] For k = 0 to 10 set q11[k] ← 0. Then for k = 0 to 5 set q11[k^2 mod 11] ← 1.
2. [Fill 63] For k = 0 to 62 set q63[k] ← 0. Then for k = 0 to 31 set q63[k^2 mod 63] ← 1.
3. [Fill 64] For k = 0 to 63 set q64[k] ← 0. Then for k = 0 to 31 set q64[k^2 mod 64] ← 1.
4. [Fill 65] For k = 0 to 64 set q65[k] ← 0. Then for k = 0 to 32 set q65[k^2 mod 65] ← 1.
Then the algorithm is:
Given a positive integer n, this algorithm determines whether n is a square or not, and if it is, outputs the square root of n.
1. [Test 64] Set t ← n mod 64 (using if possible only an and statement). If q64[t] = 0, n is not a square and terminate the algorithm. Otherwise, set r = n mod 45045.
2. [Test 63] If q63[r mod 63] = 0, n is not a square and terminate the algorithm.
3. [Test 65] If q65[r mod 65] = 0, n is not a square and terminate the algorithm.
4. [Test 11] If q11[r mod 11] = 0, n is not a square and terminate the algorithm.
5. [Compute square root] Compute q ← ⌊ √ n ⌋ using Newton’s method. If n ≠ q^2, n is not a square and terminate the algorithm. Otherwise, n is a square, output q and terminate the algorithm.
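The steps above can be sketched in Python as follows. This is an illustration rather than the suggested solution: the names _residues and is_square are invented here, the function returns None instead of "terminating" when n is not a square, and Python's built-in math.isqrt (Python 3.8 and later) stands in for the Newton iteration of Step 5.

```python
from math import isqrt

def _residues(m):
    """Table t with t[r] true exactly when r is a square modulo m."""
    table = [False] * m
    # Squares of k and m - k agree modulo m, so half the range suffices.
    # (For m = 64 this runs one step past Cohen's bound, which is harmless.)
    for k in range(m // 2 + 1):
        table[k * k % m] = True
    return table

# Computed and stored once and for all.
Q11, Q63, Q64, Q65 = (_residues(m) for m in (11, 63, 64, 65))

def is_square(n):
    """Return the square root of positive n if n is a square, else None."""
    if not Q64[n & 63]:          # n mod 64 using only a bitwise and
        return None
    r = n % 45045                # 45045 = 63 * 65 * 11
    if not Q63[r % 63]:
        return None
    if not Q65[r % 65]:
        return None
    if not Q11[r % 11]:
        return None
    q = isqrt(n)                 # assumption: isqrt replaces Newton's method
    return q if q * q == n else None
```

The four table lookups reject the great majority of non-squares cheaply, so the comparatively expensive square root is computed only for the few candidates that survive.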
Our second predicate is the prime-power test, which determines, for a given n, if there exist two numbers p and k such that p^k = n, with p prime. Stephen Wolfram’s Mathematica program implements the prime-power test as PrimePowerQ, which returns either True or False. According to the manual,
The algorithm for PrimePowerQ involves first computing the least prime factor p of n and then attempting division by n until either 1 is obtained, in which case n is a prime power, or until division is no longer possible, in which case n is not a prime power.
(Note: they probably meant “attempting division by p.”) Wolfram gives the example PrimePowerQ[12167], which is True, since 23^3 = 12167. That algorithm will take a while, as factoring is a non-trivial problem.
Cohen determines if n is a prime power by first assuming that n = p^k, where p is prime. Then Fermat’s Little Theorem gives p | gcd(a^n − a, n). If that fails, n is not a prime power. Here is Cohen’s algorithm:
Given a positive integer n > 1, this algorithm tests whether or not n is of the form p^k with p prime, and if it is, outputs the prime p.
1. [Case n even] If n is even, set p ← 2 and go to Step 4. Otherwise, set q ← n.
2. [Apply Rabin-Miller] By using Algorithm 8.2.2 show that either q is a probable prime or exhibit a witness a to the compositeness of q. If q is a probable prime, set p ← q and go to Step 4.
3. [Compute GCD] Set d ← (a^q − a, q). If d = 1 or d = q, then n is not a prime power and terminate the algorithm. Otherwise set q ← d and go to Step 2.
4. [Final test] (Here p is a divisor of n which is almost certainly prime.) Using a primality test prove that p is prime. If it is not (an exceedingly rare occurrence), set q ← p and go to Step 2. Otherwise, by dividing n by p repeatedly, check whether n is a power of p or not. If it is not, n is not a prime power, otherwise output p. Terminate the algorithm.
We have been a little sloppy in this algorithm. For example in Step 4, instead of repeatedly dividing by p we could use a binary search analogous to the binary powering algorithm. We leave this as an exercise for the reader.
Cohen’s Algorithm 8.2.2 refers to the search for a witness to the compositeness of a number which we used in the exercise on the Miller-Rabin primality checker.
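Here is one way the prime-power algorithm might be sketched in Python. The names BASES, find_witness and prime_power are invented for this example. Two simplifications are assumptions of the sketch rather than part of Cohen's description: the Miller-Rabin step tries a fixed list of small bases instead of random ones, and a number that passes all of them is simply accepted as prime, so the "prove that p is prime" part of Step 4 is skipped.

```python
from math import gcd

# Assumption: a fixed list of bases instead of randomly chosen ones.
BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def find_witness(q):
    """For odd q > 2, return a base that proves q composite under the
    Miller-Rabin test, or None if q passes for every base in BASES."""
    d, s = q - 1, 0
    while d % 2 == 0:            # write q - 1 as 2^s * d with d odd
        d //= 2
        s += 1
    for a in BASES:
        a %= q
        if a == 0:               # base is a multiple of q; it says nothing
            continue
        x = pow(a, d, q)
        if x == 1 or x == q - 1:
            continue
        for _ in range(s - 1):
            x = x * x % q
            if x == q - 1:
                break
        else:
            return a             # no square reached -1: a is a witness
    return None

def prime_power(n):
    """Return p if n > 1 is a power of the (probable) prime p, else None."""
    if n % 2 == 0:               # Step 1
        p = 2
    else:
        q = n
        while True:
            a = find_witness(q)  # Step 2
            if a is None:
                p = q
                break
            d = gcd(pow(a, q, q) - a, q)   # Step 3
            if d == 1 or d == q:
                return None
            q = d
    while n % p == 0:            # Step 4, without the primality proof
        n //= p
    return p if n == 1 else None
```

With these definitions, prime_power(12167) follows the path described in the text: 2 is a witness for 12167, the gcd is 23, and 23 passes the Miller-Rabin step, so 23 is returned.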
These two beautiful algorithms show the power and elegance of number theory. Cohen’s book is a fine example of the blend of mathematics and programming, and does an excellent job of explaining algorithms in a way that makes them easy to implement; most math textbooks aren’t so good.
Your task is to implement Cohen’s two powering predicates. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Carl Hewitt’s Same-Fringe Problem
August 3, 2010
Long ago, Carl Hewitt created the same-fringe problem as a demonstration of the simplest problem that requires concurrency to implement efficiently: Given two binary trees, determine if they have the same leaves in the same order, regardless of their internal structure. A solution that simply flattens both trees into lists and compares them element-by-element is unacceptable, as it requires space to store the intermediate lists and time to compute them even if a difference arises early in the computation.
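To make the requirement concrete, here is a small Python sketch that uses generators in place of true concurrency, which is one common way to get the same lazy behavior. It is not necessarily how the suggested solution works. The representation is an assumption made for the example: an internal node is a tuple of subtrees, and anything that is not a tuple is a leaf.

```python
from itertools import zip_longest

def fringe(tree):
    """Lazily yield the leaves of a tree from left to right."""
    if isinstance(tree, tuple):          # assumption: tuples are internal nodes
        for child in tree:
            yield from fringe(child)
    else:
        yield tree

def same_fringe(t1, t2):
    """True if both trees have the same leaves in the same order."""
    missing = object()                   # marks the end of the shorter fringe
    pairs = zip_longest(fringe(t1), fringe(t2), fillvalue=missing)
    # all() stops at the first mismatch, so no more leaves are generated.
    return all(a == b for a, b in pairs)
```

Because both fringes are produced one leaf at a time, neither tree is flattened into a list, and the comparison stops as soon as a difference is found.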
Your task is to write a function that tests if two trees have the same fringe. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Fibonacci Numbers
July 30, 2010
One of the first functions taught to programmers who are just learning about recursion is the function to compute the fibonacci numbers. The naive function takes exponential time, as each recursive call must compute the values of smaller fibonacci numbers, so programmers are next taught how to remove the recursion by explicitly storing state information, giving a linear-time iterative algorithm. The upshot is that programming students are left with the impression that recursion is bad and iteration is good.
It is actually possible to improve the performance with a logarithmic-time algorithm. Consider the matrices
Each time the matrix is multiplied by itself, the number in the lower left-hand corner is the next fibonacci number; for instance, F_4 = 3 (F_0 = 0 is a special case). Of course, powering can be done using a binary square-and-multiply algorithm, as in the ipow and expm functions of the Standard Prelude, giving a logarithmic algorithm for computing the nth fibonacci number.
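The three approaches can be sketched in Python as shown here; the function names are made up for this example. The logarithmic version assumes the standard identity that the nth power of the matrix with rows (1, 1) and (1, 0) has F_n in its lower left-hand corner, and raises that matrix to a power by square-and-multiply.

```python
def fib_exp(n):
    """Naive recursion: exponential time."""
    return n if n < 2 else fib_exp(n - 1) + fib_exp(n - 2)

def fib_lin(n):
    """Iteration with two state variables: linear time."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def mat_mul(x, y):
    """Product of two 2x2 matrices given as tuples of rows."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def fib_log(n):
    """Binary powering of ((1, 1), (1, 0)): logarithmic number of steps."""
    result = ((1, 0), (0, 1))            # identity matrix
    base = ((1, 1), (1, 0))
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[1][0]                  # lower left-hand corner
```

Starting from the identity matrix means the special case F_0 = 0 needs no separate test.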
Your task is to write the three fibonacci functions — exponential, linear, and logarithmic — described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
HAMURABI.BAS
July 27, 2010
Back in the 1970s, David Ahl wrote a new game program each month for Creative Computing magazine. Those were the days of all-caps teletypes (if you were rich you could get a new-fangled “glass teletype”) and punched paper tapes (it was fun to play with the confetti they made). MS-BASIC permitted twenty-six single-letter variable names; later they also allowed a single letter followed by a single digit. There were no user-defined functions and no recursion. GOTO was common, resulting in a phenomenon called “spaghetti code.” There was good news, however: it was acceptable programming practice to GOTO the middle of a FOR loop and run the code there, as long as you jumped back out of the loop before the corresponding NEXT — try to do that in your favorite functional language!
One of Ahl’s most memorable games was Hamurabi, in which the player took the role of the administrator of the ancient city of Sumeria, managing the grain and land resources of the city and trying to keep the residents from starvation. It is typical of the genre, with simple numeric input and scrolling text output. Here is a description and sample game, and the original BASIC source code is reproduced on the next page. By my count, there are fourteen lines that are unreachable except by an IF…THEN, GOSUB or GOTO, forty-three lines that redirect control flow away from the line below, and four instances (line 555 to 215, bypassing line 210, 453 and 479 to 440, bypassing 430, 441 to 511, bypassing 510, and 880 and 885 to 565, bypassing 560) of jumping into the middle of a block of code; that’s a fine bowl of spaghetti, considering the entire program is only 120 lines. Variable P represents the current population, S is the number of bushels in stores, and A is the number of acres of farmland owned by the city, but other variables are used inconsistently — for instance D sometimes represents the number of deaths in the current year, but other times it represents the current input value, and other times Q is used to represent the current input value.
Your task is to reimplement HAMURABI.BAS in a more modern computer language. Don’t peek at the solution unless you want to deprive yourself of the sheer joy of working out the spaghetti code and figuring out what the variables really stand for. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Happy Numbers
July 23, 2010
Over at SteamCode, Scott LaBounty suggests that writing a program to compute the happy numbers less than n makes a good interview question. According to Wikipedia:
A happy number is defined by the following process. Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy numbers, while those that do not end in 1 are unhappy numbers (or sad numbers).
For example, 7 is a happy number, as 7^2=49, 4^2+9^2=16+81=97, 9^2+7^2=81+49=130, 1^2+3^2+0^2=1+9+0=10, and 1^2+0^2=1+0=1. But 17 is not a happy number, as 1^2+7^2=1+49=50, 5^2+0^2=25+0=25, 2^2+5^2=4+25=29, 2^2+9^2=4+81=85, 8^2+5^2=64+25=89, 8^2+9^2=64+81=145, 1^2+4^2+5^2=1+16+25=42, 4^2+2^2=16+4=20, 2^2+0^2=4+0=4, 4^2=16, 1^2+6^2=1+36=37, 3^2+7^2=9+49=58, and 5^2+8^2=25+64=89, which forms a loop.
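One possible interview-sized answer, sketched in Python with invented function names, follows the definition directly: repeat the digit-square sum and remember every value seen, so that a repeated value signals a loop.

```python
def is_happy(n):
    """True if repeatedly summing the squares of the digits reaches 1."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)                      # remember values to detect a cycle
        n = sum(int(digit) ** 2 for digit in str(n))
    return n == 1

def happy_numbers(limit):
    """All happy numbers less than limit, in increasing order."""
    return [n for n in range(1, limit) if is_happy(n)]
```

For the two examples in the text, is_happy(7) is true and is_happy(17) is false, because the sequence starting at 17 returns to 89.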
Your task is to write a function to identify the happy numbers less than a given limit; you should work at the level of a programming interview, taking no more than about fifteen minutes, and giving a short explanation of your work to the interviewer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Solving Systems Of Linear Equations
July 20, 2010
In today’s exercise we continue the examination of matrix operations that we began previously. Our goal is to be able to solve a system of equations; along the way we will see two methods for decomposing matrices.
We begin with some terminology. All the matrices we will be looking at are square, meaning that they have the same number of rows and columns. A lower-triangular matrix L has all entries L_ij = 0 for i < j; thus, all entries above the main northwest-to-southeast diagonal are zero. An upper-triangular matrix U has all entries U_ij = 0 for i > j; thus, all entries below the main northwest-to-southeast diagonal are zero. A lower- or upper-triangular matrix is unit lower- or upper-triangular if all the entries along the main diagonal are 1. A permutation matrix P has exactly one 1 in each row and column and 0 elsewhere; it is called a permutation matrix because multiplying a vector X by a permutation matrix has the effect of permuting the elements of X. The identity matrix I has 1 in each entry along the main diagonal and 0 elsewhere. A matrix M is singular if it has no inverse, that is, there is no matrix M^-1 such that M M^-1 = I.
An LU decomposition of a matrix A finds two matrices L, which is unit lower-triangular, and U, which is upper-triangular, such that A = L U. The algorithm is called Gaussian elimination, and works from top to bottom. First, multiples of the first equation are subtracted from the other equations so that the first variable is removed from those equations. Then multiples of the second equation are subtracted from the remaining equations so that the second variable is removed from those equations. Then the third equation, and the fourth, and so on, until all the equations have been processed and the matrix is in upper-triangular form. Here is an example of the LU decomposition of matrix A into its factors L × U:
There are two problems with LU decomposition: First, the algorithm leads to a divide-by-zero error whenever a pivot element is zero, as happens on singular matrices and also on some non-singular ones, such as a matrix with a zero in its upper left-hand corner. Second, it is prone to numerical instability when a pivot is small. The solution is to rearrange, or permute, the equations so that the pivot element is always the largest remaining element in its column, greatly reducing the likelihood of numerical instability.
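For reference, the plain LU decomposition by Gaussian elimination without any permutation can be sketched in Python like this. The name lu_decompose and the list-of-rows representation are assumptions of the example, not taken from the suggested solution.

```python
def lu_decompose(a):
    """Return (L, U) with L unit lower-triangular, U upper-triangular
    and A = L U, for a square matrix given as a list of rows."""
    n = len(a)
    u = [list(row) for row in a]         # work on a copy of A
    l = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if u[k][k] == 0:
            # A zero pivot would make the elimination, or a later
            # back substitution, divide by zero.
            raise ZeroDivisionError("zero pivot in LU decomposition")
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]        # multiple of row k to subtract
            l[i][k] = m                  # record the multiplier in L
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    return l, u
```

Each multiplier that is used to clear an entry below the diagonal of U is exactly the entry stored at the same position in L, which is why the product L U reproduces A.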
An improved decomposition is the LUP decomposition, which finds for an input matrix A three matrices L, U, and a permutation matrix P such that P A = L U. Rather than actually moving equations, the permutation matrix records the rearrangements. For example, here is the LUP decomposition of the matrix A given by P × A = L × U:
Given the LUP decomposition, it is simple to solve a system of linear equations. Forward substitution solves the lower-triangular system by calculating the first variable, which is part of an equation with one unknown, then substitutes that into the second equation, reducing it from two unknowns to one unknown, and so on. Then back substitution runs backward, calculating the final values of the variables in the original matrix. Here’s an example, where we wish to solve for the vector X given A X = B:
[Worked example not reproduced: the LUP decomposition P A = L U, then the forward substitution L Y = P B giving Y, then the back substitution U X = Y giving X, which is the solution.]
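The decomposition and the two substitution passes can be sketched in Python as follows. The function names and the storage scheme are assumptions of this example: L and U are packed into a single matrix (L below the diagonal with its unit diagonal implied, U on and above it), and the permutation matrix P is kept as a list perm in which perm[i] is the index of the original row that ends up in row i.

```python
def lup_decompose(a):
    """Return (lu, perm) such that P A = L U, choosing as pivot the
    entry of largest absolute value in the current column."""
    n = len(a)
    lu = [list(row) for row in a]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]          # swap the rows
        perm[k], perm[pivot] = perm[pivot], perm[k]  # and record the swap
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                     # entry of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]      # entry of U
    return lu, perm

def lup_solve(lu, perm, b):
    """Solve A X = B given the packed LUP decomposition of A."""
    n = len(lu)
    y = [0] * n
    for i in range(n):                   # forward substitution: L Y = P B
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0] * n
    for i in reversed(range(n)):         # back substitution: U X = Y
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x
```

A typical call is lu, perm = lup_decompose(a) followed by lup_solve(lu, perm, b); the decomposition can be reused for any number of right-hand sides b.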
Your task is to write functions that perform LU-decomposition and LUP-decomposition and solve systems of linear equations. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Contents: Themes
July 16, 2010
In two previous exercises, we looked at the programs behind the tables of contents pages in chronological order and permuted by keywords. In today’s exercise, we complete this trio of exercises by writing the program that creates the themed table of contents page. Like the others, it is based on the praxis.info file, which was described previously. We won’t give the output format here, except to say that it is similar to the others, but with an additional header section for the list of themes.
Your task is to write a program that extracts needed data from the praxis.info file and writes the themed table of contents. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Word Cube
July 13, 2010
Word cube is a game in which players form words from the nine letters in a cube. Words must have four or more letters and must use the central letter from the cube; at least one word will use all nine letters in the cube. The player who forms the most words wins. Many newspapers publish a word cube on their puzzle page, and Stealthcopter publishes a word cube online daily. Wikipedia describes word cubes under the caption “word polygon.” There are twelve words formed from the word cube at right: bonnie, bunion, coin, concubine, conic, cubic, ennui, icon, nice, nine, nuncio, and union.
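A minimal Python sketch of the search might look like this. The function name word_cube, the encoding of the cube as a nine-letter string read row by row (so that the central letter is at index 4), and the word list passed in by the caller are all assumptions made for the example.

```python
from collections import Counter

def word_cube(letters, words):
    """Return, in sorted order, the words that can be formed from the cube.

    letters: the nine letters of the cube, row by row (center at index 4).
    words:   any iterable of candidate words, such as a dictionary file.
    """
    center = letters[4]
    available = Counter(letters)
    return sorted(
        w for w in words
        if len(w) >= 4                      # four or more letters
        and center in w                     # must use the central letter
        and not (Counter(w) - available)    # no letter used too often
    )
```

Subtracting one Counter from another keeps only the letters that the word needs more often than the cube supplies, so an empty result means the word fits.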
Your task is to write a program that finds all matching words for a given word cube. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Contents: Permuted Table Of Contents
July 9, 2010
We examined in a previous exercise a program that extracts a chronological listing of the exercises on the Programming Praxis website from the praxis.info file. We also discussed in a previous exercise a program that creates a permuted index. In today’s exercise we will combine those two programs into the program that is used to create the Permuted Table of Contents page at Programming Praxis.
The format of the praxis.info file was given in a previous exercise. The output from today’s program should look like this:
<table cellpadding="10">
<tr><td>129</td><td>20 Apr 2010</td><td align="right"> </td><td>145 Puzzle: Build and evaluate expressions using the digits one through nine and the simple arithmetic operators</td><td><a href="/2010/04/20/145-puzzle/">exercise</a> <a href="/2010/04/20/145-puzzle/2/">solution</a> <a href="http://programmingpraxis.codepad.org/SzbrJbjx">codepad</a></td></tr>
<tr><td>51</td><td>17 Jul 2009</td><td align="right">International Mathematical Olympiad: Three exercises from</td><td>1960s math competitions</td><td><a href="/2009/07/17/international-mathematical-olympiad/">exercise</a> <a href="/2009/07/17/international-mathematical-olympiad/2/">solution</a> <a href="http://programmingpraxis.codepad.org/JRGmt2wZ">codepad</a></td></tr>
...
</table>
Your task is to write a program that reads praxis.info and produces the permuted table of contents. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.