Population Count
January 28, 2011
The population count of a bitstring is the number of set bits (1-bits) in the string. For instance, the population count of the number 23, which is represented in binary as 101112, is 4. The population count is used in cryptography and error-correcting codes, among other topics in computer science; some people use it as an interview question. The population count is also known as Hamming weight.
Your task is to write a function that determines the population count of a number representing a bitstring; you should concern yourself with the efficiency of your solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Rational Numbers
January 25, 2011
A rational number, or fraction, is the ratio of two integers; for instance, 4/7 is the ratio of 4 (the numerator) divided by 7 (the denominator). No rational number may have a denominator of zero. Rational numbers respond to all the same operations as integers, including addition, subtraction, multiplication and division, and they may also be compared by the less-than operator.
Your task is to write functions that perform arithmetic on rational numbers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Pollard Rho, Revisited
January 21, 2011
We examined John Pollard’s rho algorithm for factoring integers in a previous exercise. In today’s exercise we will consider some improvements to the basic algorithm.
Let’s begin with a quick refresher. Consider the sequence xxn+1 = (xn)2 + c (mod n), which becomes cyclic in approximately √n steps. If n = p q with p and q relatively prime, then there will be a pair of numbers xi and xj such that p = gcd(|xi – xj|, n). The rho algorithm runs the sequence, for random x0 and c, looking for the pair of numbers xi and xj for which the gcd is between 1 and n, giving a factor of n.
If the sequence becomes cyclic without finding a factor, it is necessary to try again with different values of x0 and c. Pollard used Robert Floyd’s tortoise-and-hare cycle finding algorithm, which compares the values of xi (the “tortoise”) and x2i (the “hare”, because it moves twice as fast through the cycle as the tortoise); if they are ever equal, a cycle has been identified. Later, Richard Brent devised an alternate method of cycle detection. Each time i is a power of two, the value of xi is saved; if a subsequent xj = xi is found before j = 2i, a cycle has been identified. Brent’s cycle-finding algorithm requires only one modular multiplication per step, instead of the three modular multiplications required by Floyd’s cycle-finding algorithm, so even though Brent’s method typically requires more steps that Floyd’s method, in practice the number of modular multiplications is generally about a quarter less than Floyd’s method, giving a welcome speed-up to Pollard’s factoring algorithm.
There are two other improvements to Pollard’s algorithm that apply to both the Floyd and Brent variants. First, it may be useful to perform trial division to find small divisors before the main algorithm starts; it is certainly necessary to remove factors of 2, and removing larger factors can also save time. Second, because the gcd required at each step is computationally expensive, the numbers |xi – xj| may be accumulated (multiplied all together) modulo n for several steps, then the gcd of the modular product is taken with n. If the number of steps between gcd operations is, say, 100, the cost of a gcd at each step is replaced by the cost of a modular multiplication, which is smaller.
Your task is to write two versions of the Pollard rho function, one using Floyd’s cycle-finding algorithm, the other using Brent’s cycle-finding algorithm, both versions using trial division to identify small factors and both using the short-circuit gcd procedure, and to compare timings to determine which algorithm, using which parameters for the limit of trial division and the limit of the gcd short-circuit, makes the fastest implementation of Pollard’s algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Solitaire Cipher
January 18, 2011
In his book Cryptonomicon, Neal Stephenson has his characters communicate using a cipher called Pontifex. Pontifex is based on the Solitaire cipher developed by Bruce Schneier, and uses an ordinary deck of cards, thirteen cards in each of four suits, plus two distinguishable jokers, to generate a keystream that is added to plain-text to form cipher-text, or subtracted from cipher-text to form plain-text.
After the deck is keyed, a single step consists of four operations on the deck. First, the “A” joker is moved one card down the deck, wrapping around the end of the deck if necessary. Second, the “B” joker is moved two cards down the deck, again wrapping around the deck if necessary. Third, a triple-cut swaps all the cards above the highest joker in the deck with all the cards below the lowest joker in the deck, leaving the two jokers and the cards between them in place. Fourth, a counted cut, based on the number of the bottom card in the deck, moves the top “count” cards to just above the bottom card; the cards are numbered 1 to 52 in “bridge order” with ace low to king high in each suit, clubs, diamonds, hearts, spades, and either joker counting as 53. Then look at the top card in the deck and count down the given number to determine the current key card.
For example, given an initial deck in bridge order 1, 2, …, 52, A, B, where the two jokers are A and B, the first operation moves the A joker one card down the deck leaving 1, 2, …, 52, B A, the second operation moves the B joker two cards down the deck leaving 1, B, 2, …, 52, A, the third operation performs a triple cut (the second half of the cut is empty) leaving B, 2, …, 52, A, 1, and the fourth step performs a count cut taking one card (because the bottom card on the deck is 1) leaving 2, …, 52, A, B, 1. Then the output card is 4, the four of clubs, because the top card of the deck is 2 and the second card below it is 4.
Before encrypting or decrypting a message, the deck must be “keyed.” Begin with a deck in bridge order and perform a single step. Then, for each character in the key, do a counted cut on the number of the current character, with A=1 … Z=26, followed by another single step. Once the deck is keyed and you have a keystream, each character is added (for encryption) or subtracted (for decryption) from the current text character, wrapping around the alphabet as necessary, so that A+A=B and T+Q=K; note that Z is the identity character, so F+Z=F. The plain-text has nulls (the letter X) added to the end to make the message length a multiple of five, and the cipher-text is split into five-character blocks for convenience.
Schneier gives three examples. Given the plaintext AAAAAAAAAA and null key, the keystream is 4 49 10 (53) 24 8 51 44 6 4 33 (the joker is skipped) and the ciphertext is EXKYI ZSGEH. Given the plaintext AAAAAAAAAAAAAAA and key FOO, the keystream is 8 19 7 25 20 (53) 9 8 22 32 43 5 26 17 (53) 38 48 and the ciphertext is ITHZU JIWGR FARMW. Given the plaintext SOLITAIRE and key CRYPTONOMICON, the keystream is 44 46 32 18 17 18 23 44 22 42 and the ciphertext is KIRAK SFJAN.
If you actually run the cipher with a deck of cards, you will find that, with just a little practice, your hands work the keystream generator themselves with little conscious thought, and you will soon memorize the wrap-around character addition rules like T+Q=K; the biggest problem with the cipher, like any output-feedback cipher, is that a single mistake renders all trailing text unreadable. This cipher is best used for low-volume transmission of short messages. If you use it for real security, your key should have at least eighty characters, and you should never use the same key to transmit two different messages. An easy way for two communicants to manage keys is for both to use some printed source, say the lead editorial in the daily newspaper or the pages of a favorite novel (be sure both are using the same edition), selecting the key as the first 80 characters starting at the 37th, say, and giving the date or page as a header to the encrypted message.
Your task is to write functions that encrypt and decrypt using the solitaire cipher. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Slots
January 14, 2011
We played Hamurabi in a previous exercise. Today we will play the slot machines, using a version of the game from the same source as Hamurabi. Playing instructions and source code appear on the original pages or our copies.
Your task is to recreate the BASIC game. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Two Integrals
January 11, 2011
The exponential integral appears frequently in the study of physics, and the related logarithmic integral appears both in physics and in number theory. With the Euler-Mascheroni constant γ = 0.5772156649015328606065, formulas for computing the exponential and logarithmic integral are:
Since there is a singularity at Li(1) = −∞, the logarithmic integral is often given in an offset form with the integral starting at 2 instead of 0; the two forms of the logarithmic integral are related by Lioffset(x) = Li(x) – Li(2) = Li(x) – 1.04516378011749278. It is this form that we are most interested in, because the offset logarithmic integral is a good approximation of the prime counting function π(x), which computes the number of primes less than or equal to x:
x | 106 | 1021 |
Lioffset(x) | 78627 | 21127269486616126182 |
π(x) | 78498 | 21127269486018731928 |
If you read the mathematical literature, you should be aware that there is some notational confusion about the two forms of the logarithmic integral: some authors use Li for the logarithmic integral and li for its offset variant, other authors turn that convention around, and still other authors use either notation in either (or both!) contexts. The good news is that in most cases it doesn’t matter which variant you choose.
Your task is to write functions that compute the exponential integral and the two forms of the logarithmic integral. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Counting Primes
January 7, 2011
We celebrate our 200th exercise by implementing the prime-counting function, which has long been of interest to mathematicians. Generally denoted by the letter π, the prime-counting function π(n) returns the number of primes less than or equal to n; for instance, π(100) = 25. A related function, generally denoted as Pn, returns the nth prime number; thus, π(Pn) = n.
There are various analytic methods to compute the prime-counting function; the method invented by Ernst Meissel, improved by Derrick Henry Lehmer, and automated by Andrew Odlyzko is the best-known of them. But the obvious brute-force method of generating and counting the primes is hard to beat if you pre-compute a list of starting points, then use a segmented sieve to interpolate.
Your task is to write the prime-counting function and the nth-prime function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Dijkstra’s Algorithm
January 4, 2011
[ Today’s exercise was written by guest author Graham Enos, a PhD student in the Applied Mathematics program at UNC Charlotte, with solution in Python rather than Scheme. Suggestions for exercises are always welcome, or you may wish to contribute your own exercise; feel free to contact me if you are interested. ]
We’ve worked with directed graphs (“digraphs”) in a recent exercise. Today’s exercise is another graph theoretical procedure with many applications: Dijkstra’s Algorithm.
Published by Edgar Dijkstra in 1959, this algorithm searches a digraph for the shortest paths beginning at a specified vertex; to quote the Wikipedia article,
For example, if the vertices of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra’s algorithm can be used to find the shortest route between one city and all other cities. As a result, the shortest path first is widely used in network routing protocols, most notably IS-IS and OSPF (Open Shortest Path First).
For instance, in the digraph pictured at right, the shortest path from vertex a to e is [a, c, e]. This path has a length of 7, which is shorter than the path [a, d, e] (which has a length of 16).
The algorithm works its way through the vertices, indexed by their estimated distance from the starting point. At each vertex, the procedure “relaxes” all its outward edges, updating distance and predecessor estimates accordingly. Once complete, the algorithm has in effect constructed a subtree of the graph that gives all the shortest distance paths from the starting vertex to any other vertex in the graph via the predecessor attribute. The shortest path to the desired finishing vertex can then be reconstructed by traversing the predecessor subtree backwards. See Dijkstra’s 1959 paper “A Note on Two Problems in Connexion with Graphs” in Numerische Mathematik (Volume 1, pages 269-271) for more information.
Your task: given a weighted digraph D = (V, E) where the edge weights represent distances between two vertices, a starting vertex s, and a destination vertex f, use Dijkstra’s Algorithm to find the shortest path from s to f in G. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.