Integer Roots
April 19, 2013
We filled in a missing piece of the Standard Prelude in the previous exercise. In today’s exercise we fill in another missing piece, the integer root function: iroot(k, n) returns the largest integer x such that xk does not exceed n, assuming k and n are both positive integers. For example, iroot(3, 125) equals 5, because 53 equals 125; likewise, iroot(3, 126) equals 5, but iroot(3, 124) is 4, because 53 is greater than 124.
Your task is to write the integer root function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Date Formatting
April 16, 2013
In many programming languages, dates have a kind of “second class” relationship to data types such as integers, characters, strings and arrays; functions related to dates, such as finding the number of days between two dates, may be relegated to libraries or omitted entirely. Our Standard Prelude provides a few simple date functions, but missing is a function to neatly format a date.
Your task is to write a function that takes a date and returns a neatly-formatted string representing the input date; your function may be as simple or as elaborate as you wish. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Booth’s Algorithm
April 12, 2013
One of the pleasures of writing a blog like mine is that I get to learn from my readers just as they learn from me. In a comment on the previous exercise on cyclic equality, reader Maurits pointed to Booth’s algorithm, which I had never previously heard of. Note that Booth’s algorithm is somewhat different than the one we gave in the previous exercise, since Booth’s algorithm relies on an ordering predicate between elements of the cycle while our algorithm relies only on an equality predicate.
Booth’s algorithm creates a canonical rotation of a cycle by putting it in the least possible lexicographic order out of all possible rotations. This is done by a variant of the Knuth-Morris-Pratt string search algorithm. We won’t give the algorithm here, as Wikipedia does a fine job; Booth’s original paper, referenced at Wikipedia, is behind a paywall. Here is Wikipedia’s explanation:
The algorithm uses a modified preprocessing function from the Knuth-Morris-Pratt string search algorithm. The failure function for the string is computed as normal, but the string is rotated during the computation so some indices must be computed more than once as they wrap around. Once all indices of the failure function have been successfully computed without the string rotating again, the minimal lexicographical rotation is known to be found and its starting index is returned. The correctness of the algorithm is somewhat difficult to understand, but it is easy to implement.
Your task is to implement the cyclic equality test of the prior exercise using Booth’s linear-time algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Cyclic Equality
April 9, 2013
Two cyclic lists are equal if they have the same number of elements in the same order. For instance, the cyclic lists (1 2 3 4 5) and (3 4 5 1 2) are equal. So are the cyclic lists (1 1 2 2) and (2 1 1 2), and also the cyclic lists (1 1 1 1) and (1 1 1 1). The two cyclic lists (1 2 3 4) and (1 2 3 5) are not equal because they have different elements, and the cyclic lists (1 1 1) and (1 1 1 1) are not equal because they have different lengths.
Your task is to write a function that takes two cyclic lists and determines if they are equal. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Last Non-Zero Digit Of A Factorial
April 5, 2013
Today’s exercise appears from time to time on beginning-programmer message boards:
Write a program that, given n, returns the last non-zero digit of n! (factorial). For instance, 7! = 1 * 2 * 3 * 4 * 5 * 6 * 7 = 5040, and its last non-zero digit is 4.
Your task is to write a program to find the last non-zero digit of a factorial. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Quadratic Sieve With Large Primes
April 2, 2013
We implemented a very basic quadratic sieve in a previous exercise. In today’s exercise we will implement three improvements to the sieving phase of the algorithm.
The first improvement comes from the world of code tuning: we will segment the sieve instead of employing a single monolithic sieve. This allows us to tune the memory footprint of the sieve and keep everything in cache memory, which makes the entire process much faster. The second improvement also comes from the world of code tuning: we will use logarithms base 2 instead of base e, because they keep the entire calculation within the realm of integer arithmetic instead of floating-point arithmetic, and we will define an error factor on the sum of the logarithms based on the size of n.
The third improvement is algorithmic. Sometimes when checking a possible smooth number, a number is found that is almost smooth, with a single large prime factor between f and f2; such relations are called partial relations. If the large primes of two partial relations are equal, they can be multiplied together to form a match, which can then be treated like any other relation in the exponent vector. By the birthday paradox, matches are more common than you might think. Thus, sieving produces two lists, a list of relations and a list of partials, then the partials are sorted on their leading prime and a simple scan through the list of partials identifies the matches. The effect is to find more usable relations with the same amount of sieving.
Your task is to modify the quadratic sieve of the previous exercise to incorporate these changes. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
One Million Hits
March 29, 2013
Programming Praxis passed one-million all-time hits yesterday at 9:22pm GMT (that’s 4:22pm CDT where I live). My thanks to all of you for making this happen.
Your task is to write a program that outputs the number 1000000; be as fun and creative as you can. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Google Code Jam Qualification Round Africa 2010, Revisited
March 26, 2013
We looked at this problem, from the Google Code Jam Qualification Round Africa 2010, in a previous exercise:
Store Credit: You receive a credit C at a local store and would like to buy two items. You first walk through the store and create a list L of all available items. From this list you would like to buy two items that add up to the entire value of the credit. The solution you provide will consist of the two integers indicating the positions of the items in your list (smaller number first). For instance, with C=100 and L={5,75,25} the solution is 2,3; with C=200 and L={150,24,79,50,88,345,3} the solution is 1,4; and with C=8 and L={2,1,9,4,4,56,90,3} the solution is 4,5.
The solution we gave there used two nested loops to look at all combinations and return the first that solved the problems. That takes O(n2) time and O(1) space. But there are other solutions. You could insert each item into some flavor of balanced binary search tree in a first pass, then check the “conjugate” of each item in a second pass; that takes O(n log n) space for the tree plus O(n log n) for each of the two passes. Using a hash table instead of a balanced binary search tree reduces the time and space requirement to O(n), assuming that each hash table lookup is O(1), which is not necessarily true. A fourth option is to sort value/position pairs by increasing value, then use binary search to find matches; that takes O(n log n) for the sort plus O(n log n) for the binary searches, but requires only O(1) space beyond the space used for the input.
Your task is to write the four programs described above; bonus points for finding a solution with some other space/time complexity (is there an exponential algorithm that solves the problem?). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Jumping Jack
March 22, 2013
We have a fun little programming puzzle today; I found it here, but don’t know the original source of the puzzle:
Jack starts at 0 on the number line and jumps one unit in either direction. Each successive jump he makes is longer than the previous by 1 unit, and can be made in either direction. Write a program that takes a number and returns the minimum number of jumps Jack makes to reach that number.
For example, it takes seven jumps to reach 12, first a jump left to -1, then a jump right to 1, a jump right to 4, a jump right to 8, a jump right to 13, a jump right to 19, and a jump left to 12. To go to -12, simply reverse the direction of each jump.
Your task is to write programs that compute the minimum number of steps Jack must take to reach his target and that compute the actual steps. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Quadratic Sieve
March 19, 2013
One of the most powerful factoring methods is the quadratic sieve, invented as a “thought experiment” by Carl Pomerance in the early 1980s; it is routinely used to factor semi-primes up to a hundred digits, sometimes more. The quadratic sieve was famously used to factor RSA-129, a challenge number given by Martin Gardner in the August 1977 edition of Scientific American; using the methods available at the time, it was estimated that factorization would take forty quadrillion years, but in fact the number was factored in April 1994, revealing the encoded message “The magic words are squeamish ossifrage.”
Today’s exercise presents a basic version of the quadratic sieve; we will present more powerful (and more complicated) variants in future exercises. We will be following the thesis Factoring Integers with the Self-Initializing Quadratic Sieve by Scott Contini, which is something of a bible for implementers of the quadratic sieve; Contini’s thesis is available at http://www.crypto-world.com/documents/contini_siqs.pdf. Today’s exercise is drawn from Figure 2.1 of the thesis.
The quadratic sieve shares its basic structure with other factoring methods such as Dixon’s method and the continued fraction method that we have previously studied (today’s exercise will probably make more sense if you read those exercises first). The idea is that if x2 ≡ y2 (mod n) with x ≢ ± y (mod n), then gcd(x±y, n) are two factors of n; about half the time, the two factors will be 1 and n, but the other half of the time the two factors will be non-trivial. All three methods compute a series of relations of the form x2 ≡ y (mod n) where y is smooth over a relatively small factor base; the factored y are then combined using linear algebra to find x2 ≡ y2 (mod n) and thus perform the factorization. The three methods differ in the way they compute the relations: Dixon’s method computes them at random, the continued fraction method finds them in the convergents of the square root of n, and the quadratic sieve finds them by sieving a polynomial of the form g(x) = (x + b)2 − n, where b is the smallest integer greater than the square root of n. The exponent 2 in function g is the “quadratic” in the name of the method.
Contini gives a description of the basic quadratic sieve algorithm that we adapt here:
Compute startup data: Choose f, the upper bound for factor base primes, and m, which is half the size of the sieve. Let b = ⌈√n⌉ and g(x) = (x + b)2 – n. Determine the factor base primes p < f such that the jacobi symbol (n/p) is 1 indicating that there is a solution to t2 ≡ n (mod p); also include 2 in the factor base. For each factor base prime p, compute t, a modular square root of n (mod p). Then compute and store soln1p = t − b (mod p), soln2p = −t − b (mod p) and lp = ⌊log p⌉ (rounding off) for each factor base prime p.
Perform sieving: Initialize a sieve array to 0’s. For each odd prime p in the factor base, add lp to the locations soln1p + ip and soln2p + ip of the sieve array for i = 0, 1, 2, …. For the prime p = 2, sieve only with soln1p.
Trial division: Scan sieve array for locations x that have accumulated a value of a least log 2 x √n minus a small error term. Trial divide g(x) by the primes in the factor base. If g(x) factors into primes less than f, then save smooth relation. After scanning entire sieve array, if there are more smooth relations than primes in the factor base, then go to linear algebra step. Otherwise, increase f or m or both and return to sieving step.
Linear algebra: Solve the matrix as in Dixon’s method and the continued fraction method. For each null vector, construct relation of form x2 ≡ y2 (mod n) and attempt to factor n by computing gcd(x − y, n). If all null vectors fail to give a non-trivial factorization, increase f or m or both and return to sieving step.
Let’s step back and look at the algorithm from a higher perspective. We are looking for relations of the form y2 ≡ g(x) (mod n) where g(x) is smooth so we can combine the g(x) using linear algebra to find a set of relations for which the g(x) form a perfect square, which will give us an x and y that we can use to compute a factor of n. Sieving the polynomial g(x) is done in the same way as sieving for primes in the Sieve of Eratosthenes; in fact, in an optimized Sieve of Eratosthenes over the odd numbers starting from 3, the sieving is performed over the polynomial 2x+1.
The use of logarithms in the sieving is an optimization; we could equally sieve by the primes themselves if we chose. But if we did that, we would have to sieve by all the prime powers as well as the primes, and we would have to keep track of which primes and prime powers appeared at each item of the sieve, which takes lots of memory and limits the size of the sieve. Using logarithms keeps memory consumption down, and the small error factor allows us to ignore prime powers. Note that the logarithms are rounded to integers, so we don’t need the space to store floats or doubles.
The sieving range is b ± m. Any relations found in the left half of the sieve must have a factor of −1 added to their factorizations, which is then treated as any other prime in the factor base during the linear algebra step.
The only other thing to note is that, if you are clever, when increasing f or m after a failure, you only need to compute the new pieces, assuming you saved the old pieces from sieving step.
A complete, worked example is shown on the next page.
Your task is to write a program to factor integers using the quadratic sieve as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.