Monte Carlo Factorization
June 19, 2009
We have previously examined two methods of integer factorization, trial division using wheels and Fermat’s method of the differences of squares. In this exercise we examine a probabilistic method of integer factorization that takes only a small, fixed amount of memory and tends to be very fast. This method was developed by John Pollard in 1975 and is commonly called the Pollard rho algorithm.
The algorithm is based on the observation that, for any two random integers x and y that are congruent modulo p, the greatest common divisor of their difference and n, the integer being factored, will be 1 if p and n are coprime (have no factors in common) but will be between 1 and n if p is a factor of n. By the birthday paradox, p will be found with reasonably large probability after trying about random pairs.
Pollard’s algorithm uses a function modulo n to generate a pseudo-random sequence. Two copies of the sequence are run, one twice as fast as the other, and their values are saved as x and y. At each step, we calculate gcd(x–y,n). If the greatest common divisor is one, we loop, since the two values are coprime. If the greatest common divisor is n, then the values of the two sequences have become equal and Pollard’s algorithm fails, since the sequences have fallen into a cycle, which is detected by Floyd’s tortoise-and-hare cycle-finding algorithm; that’s why we have two copies of the sequence, one (the “hare”) running twice as fast as the other (the “tortoise”). But if the greatest common divisor is between 1 and n, we have found a factor of n.
Failure doesn’t mean failure. It just means that the particular pseudo-random sequence that we chose doesn’t lead to success. Our response to failure is to try another sequence. We use the function x² + c (mod n), where c is initially 1. If Pollard’s algorithm fails, we increase c to 2, then 3, and so on. If we keep increasing c, we will eventually find a factor, though it may take a long time if n is large.
Your task is to implement Pollard’s factorization algorithm. You can test it by calculating the factors of the 98th Mersenne number, 298 – 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
[…] Praxis – Monte Carlo factorization By Remco Niemeijer In today’s Programming Praxis problem we have to implement John Pollard’s factorization algorithm. Our […]
My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/19/programming-praxis-monte-carlo-factorization/ for a version with comments):
import Control.Arrow
import Data.Bits
import Data.List
import System.Random
isPrime :: Integer -> StdGen -> Bool
isPrime n g =
let (s, d) = (length *** head) . span even $ iterate (`div` 2) (n-1)
xs = map (expm n d) . take 50 $ randomRs (2, n – 2) g
in all (\x -> elem x [1, n – 1] ||
any (== n-1) (take s $ iterate (expm n 2) x)) xs
expm :: Integer -> Integer -> Integer -> Integer
expm m e b = foldl’ (\r (b’, _) -> mod (r * b’) m) 1 .
filter (flip testBit 0 . snd) .
zip (iterate (flip mod m . (^ 2)) b) $
takeWhile (> 0) $ iterate (`shiftR` 1) e
factor :: Integer -> Integer -> Integer
factor c n = factor’ 2 2 1 where
f x = mod (x * x + c) n
factor’ x y 1 = factor’ x’ y’ (gcd (x’ – y’) n) where
(x’, y’) = (f x, f $ f y)
factor’ _ _ d = if d == n then factor (c + 1) n else d
factors :: Integer -> StdGen -> [Integer]
factors n g = sort $ fs n where
fs x | even x = 2 : fs (div x 2)
| isPrime x g = [x]
| otherwise = f : fs (div x f) where f = factor 1 x
main :: IO ()
main = print . factors (2^98 – 1) =<< getStdGen [/sourcecode]
Here’s my attempt in Python. A couple of issues in the code remain. The factors that it discovers aren’t guaranteed to be prime. I cribbed the Miller-Rabin test from one of the python code repositories. And, I don’t really understand exactly how this works. :-) Back to the reference books.
Okay, I fixed a couple of things, and extended the program a tiny bit. It now is a numeric calculator of sorts. It’s not industrial strength or anything, but you can basically type any python numeric expression, and it will use eval() (at least with a predefined environment) to evaluate the number. I’ve also predefined a couple of built in functions. prime(n) will return an n digit prime. rsa(n) will return an rsa key which is the combination of two n/2 digit primes. factor(n) factors n. I’ve also added code to do some trial division as well, to get rid of small factors, and it collapses multiple occurrences of a factor (instead of printing 128 copies of 2 when factoring 2^128, it outputs “2**128”).
Instead of eval, you might want to build your own calculator. See the very first Programming Praxis exercise for an RPN calculator.