The Daily WTF, a web site that chronicles “curious perversions in information technology,” recently introduced a new feature called Programming Praxis in which simple programming exercises are assigned to readers who post their solutions and discuss the exercise in the comments. Alex Papadimoulis runs The Daily WTF.

On June 23rd, Programming Praxis published an exercise based on one of the stories at The Daily WTF. That was done only after consulting with The Daily WTF to ensure there was no copyright violation, and credited The Daily WTF as the source of the exercise, even providing a link back to the original The Daily WTF article.

Papadimoulis liked what he saw at Programming Praxis, and began discussing with me some kind of collaboration between the two web sites. After some discussion, on July 22nd The Daily WTF published a programming exercise of its own, based on the Russian peasant multiplication algorithm. That article used the phrase “Programming Praxis” in its title, and credited me with the idea, but did not refer to the Programming Praxis web site. That article was a success, generating over seven hundred comments with a high signal-to-noise ratio, and Papadimoulis and I began seriously discussing a collaboration.

While we were discussing how a collaboration would work, on July 29th The Daily WTF published a second programming exercise second article that also used the phrase “Programming Praxis” in its title, and borrowed the exercise from one previously discussed at Programming Praxis, but did not credit me or refer to the Programming Praxis web site.

At that point discussions about collaboration broke down. The problem was that the two sites had different goals: The Daily WTF is primarily entertainment, and Programming Praxis is primarily educational. The difference was highlighted by the decision to use the Josephus problem; Papadimoulis selected that problem because he thought of a neat way to use an animated gif to show how the soldiers die. I notified Papadimoulis that no collaboration was possible, and asked him not to use the name “Programming Praxis” in any future exercises he might publish.

Papadimoulis never responded to my email, but did respond on The Daily WTF by publishing on August 5th another exercise based on a common mathematical problem. The problem used the phrase “Programming Praxis” in its title, and Papadimoulis wrote, in the first comment, that “Programming Praxis” now had its own category on The Daily WTF; he also asked readers to submit tips for future “Programming Praxis” articles.

The name “Programming Praxis” belongs to me, not Papadimoulis. I have been publishing under that name twice a week for six months, and own the programmingpraxis.com domain. Papadimoulis is using the name without my permission, and against my expressed wishes.

Papadimoulis’ improper use of the name has already caused confusion in the marketplace of ideas. At proggit (http://www.reddit.com/r/programming/comments/95o33/programming_praxis_josephus_circle/), a reader named “bhrgunatha” says “I think the real WTF here is they are taking these exercises from the actual programming praxis site apparently against the license.”

After publication of the third exercise, I sent an email demanding that Papadimoulis cease and desist from using the phrase “Programming Praxis” to describe his weekly programming exercises. After four days, Papadimoulis responded that it is now too late to change the name, and that we would have to be happy to share it. I am not happy to share my name, and on Monday I sent a registered letter for next-day delivery demanding that Papadimoulis cease and desist from using the phrase “Programming Praxis.” However, The Daily WTF continues to use the phrase “Programming Praxis,” publishing yet another exercise under that name today.

A log of emails between Papadimoulis and me, complete except for a few emails arranging the time of our telephone conversation, appears on the next page. The emails show the history of the situation as it transpired.

I must defend my name. Thus, I am publishing this account of what happened. I will also explore trademark protection for my name, and such other legal action as may be required.

Thank you to all my regular readers for listening to my story. If you wish to help, you may act to provide wide attention to this situation in the blogosphere; feel free to post a link to this blog entry to your favorite forum, and add your comments. Your code, your comments and your private emails inspire me to continue publishing Programming Praxis. I apologize that I must engage you in this ugliness.

/s/ Philip L. Bewig

Pages: 1 2

Someone mentioned Uncle Bob’s Bowling Game Kata to me a few days ago; it is apparently famous as a tutorial on object-oriented programming and test-driven development, though I had never heard of either Uncle Bob or his Bowling Game Kata.

A game of tenpins bowling lasts ten frames, in each of which the bowler makes one or two attempts to knock down ten pins arranged in a triangle. If the bowler knocks down all ten pins on the first attempt (that’s called a “strike”), he scores ten pins plus the number of pins knocked down on his next two rolls. If the bowler knocks down all ten pins after two attempts (that’s called a “spare”), he scores ten pins plus the number of pins knocked down on his next roll. If the bowler fails to knock down all ten pins (that’s called an “open frame”), he scores the number of pins he knocked down. The scores accumulate through all ten frames. At the last frame, if necessary, the pins are reset for one or two additional rolls to count the final bonus. The sample scoresheet below shows how the calculations work:

Drawing by my daughter Kate

For instance, the score in the second frame is 9, the sum of the two balls in the open frame. The score in the third frame is 15, which is 10 for the spare plus 5 on the next ball. The score in the ninth frame is 20, which is 10 for the strike plus 10 for the spare on the first two balls of the tenth frame. The score in the tenth frame is 16, which is 10 for the spare plus 6 for the extra ball, which is a bonus ball not really part of any frame (the two balls of the tenth frame have already been rolled).

Your task is to write a function that calculates the score of a tenpins bowling game. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

ADFGX

August 7, 2009

The ADFGX cipher, and its later sibling the ADFGVX cipher, were field ciphers used by the German Army during the First World War. It is called ADFGX because those are the only letters that appear in the ciphertext, chosen to reduce operator error during Morse code transmission because they sound so different.

ADFGX is a fractionating transposition cipher invented by German Army Colonel Fritz Nebel in March 1918, and famously cryptanalyzed by French Army Lieutenant Georges Painvin in April 1918. The French version of the story is that Painvin broke the cipher in a single, sustained forty-eight hour session, reading a message that suggested where Ludendorff intended to attack, allowing the French to foil the attack and win the war; the German version differs, but the winners write the history.

ADFGX works in two phases. The first uses a 5×5 polybius square to represent the alphabet, with I and J combined:

  A D F G X
A b t a l p
D d h o z k
F q f v s n
G g j c u x
X m r e w y

The plaintext message is converted to ciphertext by noting the row and column headers where each character is located; for instance:

P  R  O  G  R  A  M  M  I  N  G  P  R  A  X  I  S
AX XD DF GA XD AF XA XA GD FX GA AX XD AF GX GD FG

Then, the partially-enciphered message is subject to columnar transposition; the original German version of the cipher added nulls at the end of short columns, but we will eschew this method in favor of the normal columnar transposition. Cryptographically, this is a rather strong method, especially in the days when cryptanalysis was done without aid of computers, because the two ciphertext characters that represent a single plaintext character are split, or fractionated, from one another. Using the transposition keyword TULIP, the transposition worked like this:

T U L I P
3 4 1 0 2
A X X D D
F G A X D
A F X A X
A G D F X
G A A X X
D A F G X
G D F G

Then the columns are read off in order: DXAFXGGXAXDAFFDDXXXXAFAAGDGXGFGAAD.

Decipherment is the inverse operation.

There are two keys, the polybius square and the transposition. German doctrine called for the square to be changed weekly and the transposition key to be changed daily, and use of the ADFGX cipher was restricted to high-level communications only. Painvin attacked the cipher by comparing multiple messages with identical beginnings, from which he was able to work out the transposition matrix (he kept swapping columns until a frequency count of digrams had the same shape as German-army plain-text), then he solved the remaining monoalphabetic substitution cipher on digrams by frequency analysis. Though the cipher is no longer secure, the combination of fractionated substitution and transposition is still used as the basis of some modern ciphers, including DES.

Your task is to implement encryption and decryption of the ADFGX cipher. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

With this exercise, we return to the original arrangement in which the solution appears on the same day as the exercise. Delaying the solution makes the blog harder to maintain, and confuses readers who don't know where to write their comments. Solutions that appeared in the last few weeks have been merged with their corresponding exercises; in particular, the solution to Lenstra's Algorithm that was promised for today has been posted along with the original exercise.

Pages: 1 2

Lenstra’s Algorithm

August 4, 2009

Hendrik Lenstra devised the elliptic curve factorization algorithm in 1987, an algorithm that is simultaneously elegant and of immense practical importance. This exercise describes his original algorithm. With some algorithmic tweaks that we won’t discuss here, Lenstra’s algorithm is generally quick to find factors in the 20- to 25-digit range, slower to find factors in the 40-digit range, and ineffective if the factors are much larger than about 50 digits; at this writing, the largest factor found by the elliptic curve algorithm has 67 digits.

Lenstra’s elliptic curve factorization algorithm works by searching for a point on the elliptic pseudo-curve Ea,b (mod n), where n =pq, that exists modulo p but not modulo q, or vice versa. In the previous exercise, we searched for that point by picking an initial point and repeatedly adding it to multiples of itself until the elliptic algebra failed.

Instead of repeated addition, Lenstra speeds the search by borrowing from Pollard’s p-1 method the idea of searching over a large product of small integers. Pollard’s method uses the least common multiple of numerous small integers, but Lenstra instead uses a large factorial. For instance, if you multiply 10000! by a point on an elliptic pseudo-curve, you are likely to find a factor of a 50-digit number somewhere in the middle of the computation. But Lenstra’s method admits a continuation that Pollard’s lacks; if one elliptic curve fails, you can choose another and try again. Or you can also increase the multiplier; if you don’t find a factor with a limit of 10000! and 1000 curves, you can try again with a limit of 1000000! and 1000000 curves.

Of course, there are some details to fill in, and instead of computing 10000! and then performing the elliptic multiplication, the work is done in small steps, checking the elliptic algebra after each. Here is Lenstra’s original algorithm, searching a single curve to a given limit:

To get started, choose a parameter for the limit, choose a random pseudo-elliptic curve Ea,b (mod n) and choose a random point (x,y) on the curve. Calculate the discriminant d = 4a3 + 27b2 of the curve; if d = 0, the curve has a cusp or is self-intersecting, so choose a different curve. Then you can check if you were lucky; if gcd(d,n) > 1, d is a factor, so report it and quit. But most often d and n are co-prime, and you must continue.

The main body of Lenstra’s algorithm is a double-nested loop starting with random point (x,y):

for each prime p less than the limit
    for as many times as the base-p integer logarithm of the limit
        calculate a new (x,y) on Ea,b (mod n) as p times the old (x,y)
        if the calculation fails, report the finding of a factor

If you find a factor, the loop exits early; if the loop finishes, you can either choose a different curve, or increase the limit, or quit and report failure.

The elliptic addition, doubling and multiplication functions from the prior exercise need to be modified to recognize an error of the elliptic algebra before it occurs. For instance, when adding two points, first compute the denominator of the slope x1x2, and check that it has an inverse by calculating gcd(x1x2,n) before computing the slope and determining the sum of the two points; if the gcd is 1, you can proceed to calculate the inverse, if it is n you have been horribly unlucky (the point you are calculating isn’t on either of the two true sub-curves of the pseudo-curve) and you must exit the loop with failure, but if the gcd is strictly between 1 and n, it is a factor of n, so you can report it and quit. You may wish to calculate the integer logarithm in the inner loop by multiplying an accumulating product times the base p during the loop, stopping the loop when the accumulator exceeds the limit. The calculation of the large factorial is implicit in the two loops.

It is hard to choose the limit and the number of curves. In his original paper, Lenstra gives a formula for calculating exactly the limit and the number of curves, but unfortunately, the formula relies on the factorization, which of course is not yet known; his formula does give him a good way to prove the time-complexity bound, however. There are several papers that examine the question, but specific advice must be based on the specific implementation. The only general advice we can give is that the limit and number of curves must increase as the number to be factored increases, using too big or too small a limit and number of curves will cost you time rather than save it, and you will have to experiment with your implementation to know what works for you. Be aware that there can be considerable variance in the time it takes to perform a factorization, depending on how the random number generator chooses the parameters of the search.

Lenstra’s algorithm, as given above, searches a single curve to a given limit. To make a complete factorization, you must arrange to call it repeatedly in the event of failure, and then check the result, which may itself be composite, calling Lenstra’s algorithm recursively until the factorization is complete. And any real implementation of Lenstra’s algorithm first uses some light-weight method, such as trial division by small primes or Pollard’s rho algorithm, to quickly reduce the scope of a factorization.

Your task is to write a function that factors integers using Lenstra’s algorithm. Use your function to find the factors of (1081-1)/9; you may wish to do a timing comparison with Pollard’s rho factorization function. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

In the previous exercise we wrote a small library of functions on elliptic curves. Today we will use elliptic curves to perform integer factorization. The material in this exercise is based on a discussion in the second edition of the book Introduction to Cryptography with Coding Theory by Wade Trappe and Lawrence C. Washington.

It is easiest to explain the algorithm by an example. Consider the factorization of the number n = pq = 2773. To factor n, we choose an elliptic curve at random, let’s say y2x3 + 4x + 4 (mod 2773), where the modulus is the number to be factored. Of course, this is not really an elliptic curve, because the modulus of a real elliptic curve must be prime in order for us to compute the modular inverse required to determine the slope of the line connecting two points on the curve. It is exactly that fact that the elliptic curve algorithm exploits.

Next we choose a point on the curve; in this case, the point P = (1,3) works fine. In practice, it is easier to choose the a parameter randomly on the range 0 ≤ a < n, let the point be (1,1), then the b parameter is just –a, but you can select the parameters any way you like as long as a, b, x and y are all on the range 0 ≤ a < n and the point (x,y) is on the curve.

Now we begin a series of elliptic additions. The point 2P is (1771,705). Next we try to calculate 3P, but fail. The line through the points 2P = (1771,705) and P = (1,3) is 705 – 3 divided by 1771 – 1, the rise over the run, but gcd(1770,2773) = 59, so we cannot invert 1770 to calculate the slope of the line. We have found a point that is not on the elliptic curve. Of course, that is exactly what we were trying to do; 59 is a factor of 2773 = 59 × 47.

By the Chinese remainder theorem, the curve y2x3 + 4x + 4 (mod 2773) can be regarded as a pair of elliptic curves, one with modulus 59 and the other with modulus 47. Point 3P exists with modulus 47 but not with modulus 59. That’s how elliptic curve factorization works, allowing us to isolate the factor 59.

So the basic elliptic curve factorization algorithm is to choose a random elliptic curve (actually, a pseudo-elliptic curve) modulo the number to be factored, and a random point on the curve, then repeatedly build up multiples of the random point until the elliptic arithmetic fails, at which point the factor can be identified. The algorithm described here is a vastly simplified form of elliptic curve factorization, and is intended to teach the basic concepts rather than create a useful program; we’ll soon see a better version of the algorithm.

Your task is to write a function that performs integer factorization using the elliptic curve algorithm described above. Use your function to calculate the factors of 455839. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

Elliptic Curves

July 28, 2009

Image by Jean Brette.  Licensed under Creative Commons Attribution 3.0 Unported license.Elliptic curves are a mathematical oddity. Originally derived from elliptic integrals that measure the area under an ellipse, elliptic curves have relationships to algebra, geometry, and number theory. Elliptic curves featured in the proof of Fermat’s last theorem and are used today in integer factorization and cryptography. We’ll see applications of elliptic curves in future exercises; the task today is to work out the basic arithmetic of elliptic curves. The simple definition of an elliptic curve is that it provides all the solutions to a generalized cubic equation.

The standard form of an elliptic curve is y2 = x3 + ax + b; we can also write Ea,b. Any formula that is cubic in two variables can be rewritten into this form via suitable transformations. In addition to the x,y points defined by the formula, an elliptic curve includes a “point at infinity,” denoted by ∞, analogous to an artist’s “vanishing point” that allows him to depict a three-dimensional image on a two-dimensional canvas. We will consider only elliptic curves where the discriminant 4a3 + 27b2 is non-zero, meaning that the elliptic curve has no cusps or self-intersections.

The standard operation on an elliptic curve is the addition of two points, as shown in the diagram; this is, of course, different from vector addition of two points on a plane. Geometrically, points P and Q can be added by projecting a line between them and finding the “third point” R where the line intersects the curve, which is the negative of the point P + Q, as shown in the drawing by Jean Brette shown at right. Further, the point at infinity is the identity element for addition, so that P + ∞ = ∞ + P = P for any point P on an elliptic curve. Addition is both associative and commutative.

The addition law on elliptic curves is:

Given points P1 = (x1,y1) and P2 = (x2,y2), the point P3 = (x3,y3) = P1 + P2, where points P1, P2 and P3 lie on an elliptic curve y2 = x3 + ax + b with non-zero discriminant 4a3 + 27b2, can be calculated as

    x3 = s2x1x2

    y3 = y1 + s(x3x1)

where the slope s is

    s = (3x12 + a) / (2y1)

if P1 = P2 and

    s = (y1y2) / (x1x2)

otherwise.

The slope of the line connecting two points on an elliptic curve is calculated in the normal way, the difference in the rise divided by the difference in the run, except that if the two points are identical, the line “connecting” the two points is the tangent to the curve. Subtraction is addition of the negative, where the negative of (x,y) is (x,-y). Multiplication on an elliptic curve is just repeated addition, as it is with regular arithmetic.

We’ve been discussing elliptic curves over the real numbers, but the formulas work perfectly well on a finite field containing integers relative to a modulus. The division in the calculation of the slope is multiplication by the modular inverse, so the modulus must be prime. Our earlier exercise on modular arithmetic will be helpful.

A simple example may make things clear. Consider the elliptic curve y2x3 + 4x + 4 (mod 5). There are eight points on the curve: (0,2), (0,3), (1,2), (1,3), (2,0), (4,2), (4,3) and ∞; these points can be computed by substituting 0, 1, 2, 3 and 4 for x and computing the modular square root. Note that when x = 3 there are no solutions because 27 + 12 + 4 ≡ 43 ≡ 3 (mod 5) has no square roots, and when x = 2 there is only one solution, 8 + 8 + 4 ≡ 20 ≡ 0 (mod 5), and the square root of 0 is 0 regardless of the modulus. The sum of the points (1,2) and (4,3) is (4,2), the negative of the point (1,2) is (1,3), double the point (1,2) is (2,0), 3 times the point (1,2) is (1,3), 4 times the point (1,2) is ∞, and 5 times the point (1,2) is (1,2), whence the cycle repeats.

Your task is to write a library of functions for elliptic curves on a finite field. Your library should provide functions for addition, negation, subtraction, doubling, and multiplication; you may wish to represent a normal point as a triple (x y 1) and the point at infinity as the triple (0 1 0). When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

Let’s Make A Deal!

July 24, 2009

Let’s Make A Deal! was a television game show that originated in the United States in the 1960s and has since been produced throughout the world. In one of the games, prizes were placed behind three doors, and the contestant was asked to pick a door; one door hid an automobile, the other two doors hid goats. Then the host, who knows what is behind the doors, opened one of the doors that the contestant did not pick, revealing in every case a goat, and asks if you want to switch doors. Is it to your advantage to switch your choice of doors?

Your task is to write a program that simulates a large number of games and determines whether or not it is to your advantage to switch doors. What are the odds that you will win the automobile by switching? When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

Fermat’s little theorem states that for any prime number p, and any other number a, a^{p-1} \equiv 1 \pmod{p}. Rearranging terms, we have a^{p-1} - 1 \equiv 0 \pmod{p}, which means that p divides the left-hand side of the equation, thus p is a factor of the left-hand side.

John Pollard used this idea to develop a factoring method in 1974. His idea is to choose a very large number and see if it has any factors in common with a^{p-1} - 1, thus giving a factor of p. A systematic way to test many very large numbers is by taking large factorials, which have many small factors within them. Thus, Pollard’s p-1 factorization algorithm is this:

1) Choose b, which is known as the smoothness bound, and calculate the product of the primes to b by k = \mathrm{lcm}(1,2,3,\ldots,b).

2) Choose a random integer a such that 1 < a < n.

3) Calculate g=\gcd(a,n). If g is strictly greater than 1, then it is a non-trivial factor of n. Otherwise, continue to the next step.

4) Calculate d = \gcd(a^k - 1, n). If 1 < d < n, then d is a non-trivial factor of n. If d=1, go to Step 1 and choose a larger b. If d=n, go back to Step 2 and choose another a.

In Step 4, you can quit with failure instead of returning to Step 1 ifb becomes too large, where “too large” is probably somewhere around 10^6; in that case, you will need to continue with some other factoring algorithm.

Your task is to write a function that factors integers by Pollard’s p-1 method. What are the factors of 2^{98}-1? When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

These three problems from the International Mathematical Olympiad recently popped up on the web.

IMO 1960 Problem 01

Determine all three-digit numbers N having the property that N is divisible by 11, and N/11 is equal to the sum of the squares of the digits of N.

IMO 1962 Problem 01

Find the smallest natural number n which has the following properties:

(a) Its decimal representation has 6 as the last digit.
(b) If the last digit 6 is erased and placed in front of the remaining digits, the resulting number is four times as large as the original number n.

IMO 1963 Problem 06

Five students, A,B,C,D,E, took part in a contest. One prediction was that the contestants would finish in the order ABCDE. This prediction was very poor. In fact no contestant finished in the position predicted, and no two contestants predicted to finish consecutively actually did so. A second prediction had the contestants finishing in the order DAECB. This prediction was better. Exactly two of the contestants finished in the places predicted, and two disjoint pairs of students predicted to finish consecutively actually did so. Determine the order in which the contestants finished.

Your task is to solve the three problems. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

The Daily Cryptogram

July 14, 2009

Many newspapers publish a cryptogram each day; for instance:

P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC

The deciphered cryptogram is usually a quote from a famous author, often advice for daily life or a pithy saying. Cryptographically, these puzzles are monoalphabetic substitution ciphers, with each of the twenty-six plain-text letters systematically replaced by a cipher-text letter.

Your task is to write a program that decrypts the daily cryptogram, and use that program to read the message given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2