March 19, 2013
One of the most powerful factoring methods is the quadratic sieve, invented as a “thought experiment” by Carl Pomerance in the early 1980s; it is routinely used to factor semi-primes up to a hundred digits, sometimes more. The quadratic sieve was famously used to factor RSA-129, a challenge number given by Martin Gardner in the August 1977 edition of Scientific American; using the methods available at the time, it was estimated that factorization would take forty quadrillion years, but in fact the number was factored in April 1994, revealing the encoded message “The magic words are squeamish ossifrage.”
Today’s exercise presents a basic version of the quadratic sieve; we will present more powerful (and more complicated) variants in future exercises. We will be following the thesis Factoring Integers with the Self-Initializing Quadratic Sieve by Scott Contini, which is something of a bible for implementers of the quadratic sieve; Contini’s thesis is available at http://www.crypto-world.com/documents/contini_siqs.pdf. Today’s exercise is drawn from Figure 2.1 of the thesis.
The quadratic sieve shares its basic structure with other factoring methods such as Dixon’s method and the continued fraction method that we have previously studied (today’s exercise will probably make more sense if you read those exercises first). The idea is that if x2 ≡ y2 (mod n) with x ≢ ± y (mod n), then gcd(x±y, n) are two factors of n; about half the time, the two factors will be 1 and n, but the other half of the time the two factors will be non-trivial. All three methods compute a series of relations of the form x2 ≡ y (mod n) where y is smooth over a relatively small factor base; the factored y are then combined using linear algebra to find x2 ≡ y2 (mod n) and thus perform the factorization. The three methods differ in the way they compute the relations: Dixon’s method computes them at random, the continued fraction method finds them in the convergents of the square root of n, and the quadratic sieve finds them by sieving a polynomial of the form g(x) = (x + b)2 − n, where b is the smallest integer greater than the square root of n. The exponent 2 in function g is the “quadratic” in the name of the method.
Contini gives a description of the basic quadratic sieve algorithm that we adapt here:
Compute startup data: Choose f, the upper bound for factor base primes, and m, which is half the size of the sieve. Let b = ⌈√n⌉ and g(x) = (x + b)2 – n. Determine the factor base primes p < f such that the jacobi symbol (n/p) is 1 indicating that there is a solution to t2 ≡ n (mod p); also include 2 in the factor base. For each factor base prime p, compute t, a modular square root of n (mod p). Then compute and store soln1p = t − b (mod p), soln2p = −t − b (mod p) and lp = ⌊log p⌉ (rounding off) for each factor base prime p.
Perform sieving: Initialize a sieve array to 0′s. For each odd prime p in the factor base, add lp to the locations soln1p + ip and soln2p + ip of the sieve array for i = 0, 1, 2, …. For the prime p = 2, sieve only with soln1p.
Trial division: Scan sieve array for locations x that have accumulated a value of a least log 2 x √n minus a small error term. Trial divide g(x) by the primes in the factor base. If g(x) factors into primes less than f, then save smooth relation. After scanning entire sieve array, if there are more smooth relations than primes in the factor base, then go to linear algebra step. Otherwise, increase f or m or both and return to sieving step.
Linear algebra: Solve the matrix as in Dixon’s method and the continued fraction method. For each null vector, construct relation of form x2 ≡ y2 (mod n) and attempt to factor n by computing gcd(x − y, n). If all null vectors fail to give a non-trivial factorization, increase f or m or both and return to sieving step.
Let’s step back and look at the algorithm from a higher perspective. We are looking for relations of the form y2 ≡ g(x) (mod n) where g(x) is smooth so we can combine the g(x) using linear algebra to find a set of relations for which the g(x) form a perfect square, which will give us an x and y that we can use to compute a factor of n. Sieving the polynomial g(x) is done in the same way as sieving for primes in the Sieve of Eratosthenes; in fact, in an optimized Sieve of Eratosthenes over the odd numbers starting from 3, the sieving is performed over the polynomial 2x+1.
The use of logarithms in the sieving is an optimization; we could equally sieve by the primes themselves if we chose. But if we did that, we would have to sieve by all the prime powers as well as the primes, and we would have to keep track of which primes and prime powers appeared at each item of the sieve, which takes lots of memory and limits the size of the sieve. Using logarithms keeps memory consumption down, and the small error factor allows us to ignore prime powers. Note that the logarithms are rounded to integers, so we don’t need the space to store floats or doubles.
The sieving range is b ± m. Any relations found in the left half of the sieve must have a factor of −1 added to their factorizations, which is then treated as any other prime in the factor base during the linear algebra step.
The only other thing to note is that, if you are clever, when increasing f or m after a failure, you only need to compute the new pieces, assuming you saved the old pieces from sieving step.
A complete, worked example is shown on the next page.
Your task is to write a program to factor integers using the quadratic sieve as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.