Pollard’s P-1 Factorization Algorithm, Revisited

September 16, 2011

We have studied John Pollard’s p−1 algorithm for integer factorization on two previous occasions, giving first the basic single-stage algorithm and later adding a second stage. In today’s exercise we look at a somewhat different version of the second stage, known as the improved standard continuation, that greatly improves the speed of the algorithm. The trick is to remove the modular exponentiations in the second stage, replacing them with modular multiplications, which are obviously much faster. Let’s begin with an example, computing the factorization of 16309 with B₁=10 and B₂=50:

For the first stage we will need the primes less than B₁, which are 2, 3, 5 and 7. We will also need to know the integer logarithms of B₁ with respect to each of those primes, which are 3, 2, 1, 1; that is, 2³=8 is the largest power of 2 less than 10, 3²=9 is the largest power of 3 less than 10, 5¹=5 is the largest power of 5 less than 10, and 7¹=7 is the largest power of 7 less than 10. Q is initially 2. For each prime we calculate q = q^{p^a} where p is the prime and a is the integer logarithm. Thus, q = 2⁸ mod 16309 = 256 after prime 2, q = 256⁹ mod 16309 = 7011 after prime 3, q = 7011⁵ mod 16309 = 4239 after prime 5, and q = 4239⁷ mod 16309 = 9884 after prime 7. We calculated the greatest common divisor of q−1 and n at each step, and in each case the gcd was 1 (for instance, at the last step gcd(9883, 16309) = 1) and the first stage failed to split a factor.

For the second stage we will need the next prime larger than B₁, which is 11, and the differences between the primes from there to B₂, which are 2, 4, 2, 4, 6, 2, 6, 4, 2 and 4, representing the primes 13, 17, 19, 23, 29, 31, 37, 41, 43 and 47. We also need the ending q from the first stage and the modular exponentiation q^d mod n for each of the differences 2, 4 and 6, which are 9884² mod 16309 = 2546, 9884⁴ mod 16309 = 7443 and 9884⁶ mod 16309 = 15129, respectively.

With this setup, the second stage is very fast. Q is initialized with the next prime, 11, so q = 9884¹¹ mod 16309 = 13427, and p = 1. Now we iterate over the differences. The first difference is 2, representing the prime 13, and the modular multiplication is q = 13427 × 2546 mod 16309 = 1478 with p = 1 × 147 mod 16309 = 1477. The second difference is 4, representing the prime 17, and q = 1478 × 7443 mod 16309 = 8488 with p = 1477 × 8487 mod 16309 = 9987. The third difference is 2, representing the prime 19, and q = 8488 × 2546 mod 16309 = 1023 with p = 9987 × 1022 mod 16309 = 13589. The fourth difference is 4, representing the prime 23, and q = 1023 × 7443 mod 16309 = 14195 with p = 13589 × 14194 mod 16309 = 12032. We’ve been taking the gcd(p, n) at each step, and they have all been 1, but here gcd(12032, 16309) = 47, which is a factor of 16309. The complete factorization is 16309 = 47 × 347. We can summarize the calculations like this:

B1=10, B2=50 ps = 2, 3, 5, 7, 11 as = 3, 2, 1, 1 ds = 2, 4, 2, 4, 6, 2, 6, 4, 2, 4


First Stage

init: q=2

2^3: 2^8 mod 16309 = 256

3^2: 256^9 mod 16309 = 7011

5^1; 7011^5 mod 16309 = 4239

7^1; 4239^7 mod 16309 = 9884

gcd(9883, 16309) = 1
Second Stage

9884^2 mod 16309 = 2546

9884^4 mod 16309 = 7443

9884^6 mod 16309 = 15129

init: q = 9884^11 mod 16309 = 13427, p = 1 2(13): q = 13427 * 2546 mod 16309 = 1478, p = 1 * 1477 mod 16309 = 1477 4(17): q = 1478 * 7443 mod 16309 = 8488, p = 1477 * 8487 mod 16309 = 9987 2(19): q = 8488 * 2546 mod 16309 = 1023, p = 9987 * 1022 mod 16309 = 13589 4(23): q = 1023 * 7443 mod 16309 = 14195, p = 13589 * 14194 mod 16309 = 12032 gcd(12032, 16309) = 47

You may recall from the earlier exercises that Pollard’s p−1 method is based on Fermat’s little theorem a^p ≡ a (mod p), which can equivalently be stated a^p−1 ≡ 1 (mod p). As a consequence, if p−1 divides M, then p divides gcd(2^M−1, n). Pollard’s p−1 method computes M as the least common multiple of a bound B; thus, if all the factors of p−1 are less than B, then gcd(2^lcm[1..B] − 1, n) is a factor of n.

In the two-stage version of Pollard’s p−1 method, also called the large-prime version, there are two bounds, B₁ and B₂. The first stage calculates the gcd as described above. The second stage then computes gcd(2^{k · lcm[1..B₁]} − 1, n) for each prime k such that B₁ < k < B₂. This is useful because most numbers factor as the product of several small factors and a large factor; numbers that factor as the product of two large primes are relatively rare.

We saw in the example how the least common multiple of the integers less than a bound B can be computed economically using powers of the prime numbers less than the bound. The large-prime variant can be computed economically by a trick. At the end of the first stage we have computed q = 2^lcm[1..B₁] mod n; that was 9884 in the example above. Then we store a table of the differences of the primes from B₁ to B₂, and for each difference d compute q^d; since these differences are small and few in number, the table is small and quick to compute. Then we can compute each 2^{k · lcm[1..B₁]} by successively multiplying each q by the successive differences in the primes, thus converting a costly modular exponentiation to a much simpler modular multiplication. Finally, at each step we subtract 1 and compute the gcd, stopping when we find a factor.

In addition to the modular expontiations, the other large time cost of the algorithm is the computation of the greatest common divisor, and there is another trick that reduces that cost. Instead of taking the greatest common divisor after each step in either stage, take it only periodically, say after every hundred steps. If the gcd at that point is between 1 and n, then you have your factor and you are finished. If the gcd at that point is 1, then all of the intermediate gcds would also have been 1, but none of them needed to be calculated. Finally, if the gcd at that point is n, there are two factors since the last gcd calculation, so backtrack to the prior gcd and recompute the steps, taking the gcd at each step.

Your task is to write a function that finds factors using the p−1 algorithm, using both tricks described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

Programming Praxis