Pollard Rho, Revisited

January 21, 2011

We examined John Pollard’s rho algorithm for factoring integers in a previous exercise. In today’s exercise we will consider some improvements to the basic algorithm.

Let’s begin with a quick refresher. Consider the sequence xxn+1 = (xn)2 + c (mod n), which becomes cyclic in approximately √n steps. If n = p q with p and q relatively prime, then there will be a pair of numbers xi and xj such that p = gcd(|xixj|, n). The rho algorithm runs the sequence, for random x0 and c, looking for the pair of numbers xi and xj for which the gcd is between 1 and n, giving a factor of n.

If the sequence becomes cyclic without finding a factor, it is necessary to try again with different values of x0 and c. Pollard used Robert Floyd’s tortoise-and-hare cycle finding algorithm, which compares the values of xi (the “tortoise”) and x2i (the “hare”, because it moves twice as fast through the cycle as the tortoise); if they are ever equal, a cycle has been identified. Later, Richard Brent devised an alternate method of cycle detection. Each time i is a power of two, the value of xi is saved; if a subsequent xj = xi is found before j = 2i, a cycle has been identified. Brent’s cycle-finding algorithm requires only one modular multiplication per step, instead of the three modular multiplications required by Floyd’s cycle-finding algorithm, so even though Brent’s method typically requires more steps that Floyd’s method, in practice the number of modular multiplications is generally about a quarter less than Floyd’s method, giving a welcome speed-up to Pollard’s factoring algorithm.

There are two other improvements to Pollard’s algorithm that apply to both the Floyd and Brent variants. First, it may be useful to perform trial division to find small divisors before the main algorithm starts; it is certainly necessary to remove factors of 2, and removing larger factors can also save time. Second, because the gcd required at each step is computationally expensive, the numbers |xixj| may be accumulated (multiplied all together) modulo n for several steps, then the gcd of the modular product is taken with n. If the number of steps between gcd operations is, say, 100, the cost of a gcd at each step is replaced by the cost of a modular multiplication, which is smaller.

Your task is to write two versions of the Pollard rho function, one using Floyd’s cycle-finding algorithm, the other using Brent’s cycle-finding algorithm, both versions using trial division to identify small factors and both using the short-circuit gcd procedure, and to compare timings to determine which algorithm, using which parameters for the limit of trial division and the limit of the gcd short-circuit, makes the fastest implementation of Pollard’s algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Solitaire Cipher

January 18, 2011

In his book Cryptonomicon, Neal Stephenson has his characters communicate using a cipher called Pontifex. Pontifex is based on the Solitaire cipher developed by Bruce Schneier, and uses an ordinary deck of cards, thirteen cards in each of four suits, plus two distinguishable jokers, to generate a keystream that is added to plain-text to form cipher-text, or subtracted from cipher-text to form plain-text.

After the deck is keyed, a single step consists of four operations on the deck. First, the “A” joker is moved one card down the deck, wrapping around the end of the deck if necessary. Second, the “B” joker is moved two cards down the deck, again wrapping around the deck if necessary. Third, a triple-cut swaps all the cards above the highest joker in the deck with all the cards below the lowest joker in the deck, leaving the two jokers and the cards between them in place. Fourth, a counted cut, based on the number of the bottom card in the deck, moves the top “count” cards to just above the bottom card; the cards are numbered 1 to 52 in “bridge order” with ace low to king high in each suit, clubs, diamonds, hearts, spades, and either joker counting as 53. Then look at the top card in the deck and count down the given number to determine the current key card.

For example, given an initial deck in bridge order 1, 2, …, 52, A, B, where the two jokers are A and B, the first operation moves the A joker one card down the deck leaving 1, 2, …, 52, B A, the second operation moves the B joker two cards down the deck leaving 1, B, 2, …, 52, A, the third operation performs a triple cut (the second half of the cut is empty) leaving B, 2, …, 52, A, 1, and the fourth step performs a count cut taking one card (because the bottom card on the deck is 1) leaving 2, …, 52, A, B, 1. Then the output card is 4, the four of clubs, because the top card of the deck is 2 and the second card below it is 4.

Before encrypting or decrypting a message, the deck must be “keyed.” Begin with a deck in bridge order and perform a single step. Then, for each character in the key, do a counted cut on the number of the current character, with A=1 … Z=26, followed by another single step. Once the deck is keyed and you have a keystream, each character is added (for encryption) or subtracted (for decryption) from the current text character, wrapping around the alphabet as necessary, so that A+A=B and T+Q=K; note that Z is the identity character, so F+Z=F. The plain-text has nulls (the letter X) added to the end to make the message length a multiple of five, and the cipher-text is split into five-character blocks for convenience.

Schneier gives three examples. Given the plaintext AAAAAAAAAA and null key, the keystream is 4 49 10 (53) 24 8 51 44 6 4 33 (the joker is skipped) and the ciphertext is EXKYI ZSGEH. Given the plaintext AAAAAAAAAAAAAAA and key FOO, the keystream is 8 19 7 25 20 (53) 9 8 22 32 43 5 26 17 (53) 38 48 and the ciphertext is ITHZU JIWGR FARMW. Given the plaintext SOLITAIRE and key CRYPTONOMICON, the keystream is 44 46 32 18 17 18 23 44 22 42 and the ciphertext is KIRAK SFJAN.

If you actually run the cipher with a deck of cards, you will find that, with just a little practice, your hands work the keystream generator themselves with little conscious thought, and you will soon memorize the wrap-around character addition rules like T+Q=K; the biggest problem with the cipher, like any output-feedback cipher, is that a single mistake renders all trailing text unreadable. This cipher is best used for low-volume transmission of short messages. If you use it for real security, your key should have at least eighty characters, and you should never use the same key to transmit two different messages. An easy way for two communicants to manage keys is for both to use some printed source, say the lead editorial in the daily newspaper or the pages of a favorite novel (be sure both are using the same edition), selecting the key as the first 80 characters starting at the 37th, say, and giving the date or page as a header to the encrypted message.

Your task is to write functions that encrypt and decrypt using the solitaire cipher. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Slots

January 14, 2011

We played Hamurabi in a previous exercise. Today we will play the slot machines, using a version of the game from the same source as Hamurabi. Playing instructions and source code appear on the original pages or our copies.

Your task is to recreate the BASIC game. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3 4

Two Integrals

January 11, 2011

The exponential integral appears frequently in the study of physics, and the related logarithmic integral appears both in physics and in number theory. With the Euler-Mascheroni constant γ = 0.5772156649015328606065, formulas for computing the exponential and logarithmic integral are:

\mathrm{Ei}\ (x) = -\int_{-x}^{\infty}\frac{e^{-t}\ d\ t}{t} = \gamma + \mathrm{ln}\ x + \sum_{k=1}^{\infty}\frac{x^k}{k \cdot k!}

\mathrm{Li}\ (x) = \mathrm{Ei}\ (\mathrm{ln}\ x) = \int_{0}^{x}\frac{d\ t}{\mathrm{ln}\ t} = \gamma + \mathrm{ln}\ \mathrm{ln}\ x + \sum_{k=1}^{\infty}\frac{(\mathrm{ln}\ x)^k}{k \cdot k!}

Since there is a singularity at Li(1) = −∞, the logarithmic integral is often given in an offset form with the integral starting at 2 instead of 0; the two forms of the logarithmic integral are related by Lioffset(x) = Li(x) – Li(2) = Li(x) – 1.04516378011749278. It is this form that we are most interested in, because the offset logarithmic integral is a good approximation of the prime counting function π(x), which computes the number of primes less than or equal to x:

x 106 1021
Lioffset(x) 78627 21127269486616126182
π(x) 78498 21127269486018731928

If you read the mathematical literature, you should be aware that there is some notational confusion about the two forms of the logarithmic integral: some authors use Li for the logarithmic integral and li for its offset variant, other authors turn that convention around, and still other authors use either notation in either (or both!) contexts. The good news is that in most cases it doesn’t matter which variant you choose.

Your task is to write functions that compute the exponential integral and the two forms of the logarithmic integral. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Counting Primes

January 7, 2011

We celebrate our 200th exercise by implementing the prime-counting function, which has long been of interest to mathematicians. Generally denoted by the letter π, the prime-counting function π(n) returns the number of primes less than or equal to n; for instance, π(100) = 25. A related function, generally denoted as Pn, returns the nth prime number; thus, π(Pn) = n.

There are various analytic methods to compute the prime-counting function; the method invented by Ernst Meissel, improved by Derrick Henry Lehmer, and automated by Andrew Odlyzko is the best-known of them. But the obvious brute-force method of generating and counting the primes is hard to beat if you pre-compute a list of starting points, then use a segmented sieve to interpolate.

Your task is to write the prime-counting function and the nth-prime function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

Dijkstra’s Algorithm

January 4, 2011

[ Today’s exercise was written by guest author Graham Enos, a PhD student in the Applied Mathematics program at UNC Charlotte, with solution in Python rather than Scheme. Suggestions for exercises are always welcome, or you may wish to contribute your own exercise; feel free to contact me if you are interested. ]

We’ve worked with directed graphs (“digraphs”) in a recent exercise. Today’s exercise is another graph theoretical procedure with many applications: Dijkstra’s Algorithm.

Published by Edgar Dijkstra in 1959, this algorithm searches a digraph for the shortest paths beginning at a specified vertex; to quote the Wikipedia article,

For example, if the vertices of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra’s algorithm can be used to find the shortest route between one city and all other cities. As a result, the shortest path first is widely used in network routing protocols, most notably IS-IS and OSPF (Open Shortest Path First).

For instance, in the digraph pictured at right, the shortest path from vertex a to e is [a, c, e]. This path has a length of 7, which is shorter than the path [a, d, e] (which has a length of 16).

The algorithm works its way through the vertices, indexed by their estimated distance from the starting point. At each vertex, the procedure “relaxes” all its outward edges, updating distance and predecessor estimates accordingly. Once complete, the algorithm has in effect constructed a subtree of the graph that gives all the shortest distance paths from the starting vertex to any other vertex in the graph via the predecessor attribute. The shortest path to the desired finishing vertex can then be reconstructed by traversing the predecessor subtree backwards. See Dijkstra’s 1959 paper “A Note on Two Problems in Connexion with Graphs” in Numerische Mathematik (Volume 1, pages 269-271) for more information.

Your task: given a weighted digraph D = (V, E) where the edge weights represent distances between two vertices, a starting vertex s, and a destination vertex f, use Dijkstra’s Algorithm to find the shortest path from s to f in G. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Arithmetic Drill

December 31, 2010

Children just learning the basic facts of arithmetic need repetitious drill to certify their knowledge, something that a computer does well. Toy stores purvey many brightly-colored plastic boxes that do the job, at a price. Today’s exercise asks you to do the same, minus the plastic box. Consider the following dialog, where the computer’s output is in roman and the child’s response is in italic:

4 + 4 = 8
Right!
8 + 3 = 12
Wrong, try again!
8 + 3 = 11
Right!
9 + 4 = 13
Right!
7 + 8 = ?
15
9 + 5 = 14
Right!
8 + 0 = ^Z
Goodbye!

After the drill program presents a problem, the child either enters his answer, asks for help by entering a question mark, or quits by entering an end-of-file. Correct answers are rewarded, incorrect answers cause the problem to be asked again.

Your task is to write a program that drills children on their addition facts. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Carmichael Numbers

December 28, 2010

Pierre de Fermat’s little theorem states that if p is prime, then for any integer a, apa is evenly divisible by p, which is usually stated as apa (mod p); Fermat’s little theorem is often stated as the corollary ap−1 ≡ 1 (mod p).

By turning it around, Fermat’s little theorem may be used as a primality test: if you can find an a for which ap−1 ≢ 1 (mod p), then p is certainly composite; for instance, 262 ≡ 4 (mod 63), so 63 is composite, and 2 is a witness to its compositeness. Thus Fermat’s primality test consists of choosing many numbers and checking if they are witnesses to the compositeness of the number being tested.

Unfortunately, there are some composite numbers which pass Fermat’s primality test for all possible witnesses; they are called Carmichael numbers, after Robert Carmichael, who studied them. Carmichael discovered in 1910 that 561 is a Fermat pseudo-prime to all possible bases.

One way to test that a number is a Carmichael number is to test all primes less than the number that are coprime to it. A faster test, due to Alwin Korselt, notes that Carmichael numbers are odd, composite, square-free (no factors appear more than once) and f−1 ∣ n−1 for all factors f of n.

Because there exist numbers that fool Fermat’s primality test for all bases, a strong pseudo-primality test is often used. A composite number n = d · 2s + 1 with d odd is a strong pseudoprime to a relatively prime base a if and only if either ad ≡ 1 (mod n) or ad·2r ≡ −1 (mod n) for some 0 ≤ rs−1. For instance, 231 ≡ 2 (mod 63), so 63 = 31 · 21 + 1 is composite, and 2 is a witness to that compositeness. The beauty of the strong pseudo-primality test is that it harbors no “Carmichael numbers;” every composite has at least one strong witness.

Your tasks are to write two functions that test if a number is a Fermat pseudo-prime or a strong pseudo-prime to a given base, two functions that test primality using the Fermat and strong pseudo-prime tests, and two functions that test if a number is a Carmichael number, and to identify all the Carmichael numbers less than 10,000. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Tracking Santa

December 24, 2010

Each year since 1955, Norad has tracked Santa Claus on his annual journey to deliver toys to good little girls and boys around the world. Since Santa must file his flight plan in advance, we already know where his journey will take him: the route at http://www.noradsanta.org/js/data.js is reproduced on the next page.

Your task is to calculate the number of miles that Santa will travel during his journey; you might find Wikipedia’s Great-circle distance page helpful. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

Interval Arithmetic

December 21, 2010

Over at his blog, where he has been solving the exercises in SICP, Bill Cruise reached the section of the book that discusses interval arithmetic; if you want an introduction to the topic, you might want to look at http://www.cs.utep.edu/interval-comp/hayes.pdf.

Given an interval [x,y] with x the lower bound and y the upper bound, the basic operations of interval arithmetic are defined as follows:

  • [a,b] + [c,d] = [a+c,b+d]
  • [a,b] − [c,d] = [ad,bc]
  • [a,b] × [c,d] = [min(ac,ad,bc,bd), max(ac,ad,bc,bd)]
  • [a,b] ÷ [c,d] = [min(a/c,a/d,b/c,b/d), max(a/c,a/d,b/c,b/d)], where division by an interval containing zero is undefined

Your task is to write a basic library for interval arithmetic; you may do as many of the SICP exercises as you wish. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2