Monte Carlo Factorization

June 19, 2009

We have previously examined two methods of integer factorization, trial division using wheels and Fermat’s method of the differences of squares. In this exercise we examine a probabilistic method of integer factorization that takes only a small, fixed amount of memory and tends to be very fast. This method was developed by John Pollard in 1975 and is commonly called the Pollard rho algorithm.

The algorithm is based on the observation that, for any two random integers x and y that are congruent modulo p, the greatest common divisor of their difference and n, the integer being factored, will be 1 if p and n are coprime (have no factors in common) but will be between 1 and n if p is a factor of n. By the birthday paradox, p will be found with reasonably large probability after trying about $\sqrt{p}$ random pairs.

Pollard’s algorithm uses a function modulo n to generate a pseudo-random sequence. Two copies of the sequence are run, one twice as fast as the other, and their values are saved as x and y. At each step, we calculate gcd(x–y,n). If the greatest common divisor is one, we loop, since the two values are coprime. If the greatest common divisor is n, then the values of the two sequences have become equal and Pollard’s algorithm fails, since the sequences have fallen into a cycle, which is detected by Floyd’s tortoise-and-hare cycle-finding algorithm; that’s why we have two copies of the sequence, one (the “hare”) running twice as fast as the other (the “tortoise”). But if the greatest common divisor is between 1 and n, we have found a factor of n.

Failure doesn’t mean failure. It just means that the particular pseudo-random sequence that we chose doesn’t lead to success. Our response to failure is to try another sequence. We use the function x² + c (mod n), where c is initially 1. If Pollard’s algorithm fails, we increase c to 2, then 3, and so on. If we keep increasing c, we will eventually find a factor, though it may take a long time if n is large.

Your task is to implement Pollard’s factorization algorithm. You can test it by calculating the factors of the 98th Mersenne number, 2⁹⁸ – 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

5 Comments »

5 Responses to “Monte Carlo Factorization”

Programming Praxis – Monte Carlo factorization « Bonsai Code said
June 19, 2009 at 1:00 PM
[…] Praxis – Monte Carlo factorization By Remco Niemeijer In today’s Programming Praxis problem we have to implement John Pollard’s factorization algorithm. Our […]
Remco Niemeijer said
June 19, 2009 at 1:01 PM
My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/19/programming-praxis-monte-carlo-factorization/ for a version with comments):

import Control.Arrow
import Data.Bits
import Data.List
import System.Random

isPrime :: Integer -> StdGen -> Bool
isPrime n g =
let (s, d) = (length *** head) . span even $ iterate (`div` 2) (n-1)
xs = map (expm n d) . take 50 $ randomRs (2, n – 2) g
in all (\x -> elem x [1, n – 1] ||
any (== n-1) (take s $ iterate (expm n 2) x)) xs

expm :: Integer -> Integer -> Integer -> Integer
expm m e b = foldl’ (\r (b’, _) -> mod (r * b’) m) 1 .
filter (flip testBit 0 . snd) .
zip (iterate (flip mod m . (^ 2)) b) $
takeWhile (> 0) $ iterate (`shiftR` 1) e

factor :: Integer -> Integer -> Integer
factor c n = factor’ 2 2 1 where
f x = mod (x * x + c) n
factor’ x y 1 = factor’ x’ y’ (gcd (x’ – y’) n) where
(x’, y’) = (f x, f $ f y)
factor’ _ _ d = if d == n then factor (c + 1) n else d

factors :: Integer -> StdGen -> [Integer]
factors n g = sort $ fs n where
fs x | even x = 2 : fs (div x 2)
| isPrime x g = [x]
| otherwise = f : fs (div x f) where f = factor 1 x

main :: IO ()
main = print . factors (2^98 – 1) =<< getStdGen [/sourcecode]

Mark VandeWettering said

July 1, 2009 at 4:38 PM

Here’s my attempt in Python. A couple of issues in the code remain. The factors that it discovers aren’t guaranteed to be prime. I cribbed the Miller-Rabin test from one of the python code repositories. And, I don’t really understand exactly how this works. :-) Back to the reference books.

#!/usr/bin/env python

#
# a basic implementation of the Pollard rho factorization
# Written by Mark VandeWettering <markv@pixar.com>
#

import sys
import locale
import random

class FactorError(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

def miller_rabin_pass(a, n):
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1
        s += 1

    a_to_power = pow(a, d, n)
    if a_to_power == 1:
        return True
    for i in xrange(s-1):
        if a_to_power == n - 1:
            return True
        a_to_power = (a_to_power * a_to_power) % n
    return a_to_power == n - 1

def isprime(n):
    for repeat in xrange(20):
        a = 0
        while a == 0:
            a = random.randrange(n)
        if not miller_rabin_pass(a, n):
            return False
    return True

def gcd(a, b):
    while b != 0:
        a, b = b, a%b
    return a

def findfactor(n):
    for c in range(1, 50):
        x = y = random.randint(1, n-1)
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        while True:
            t = gcd(n, abs(x-y))
            if t == 1:
                x = (x * x + c) % n
                y = (y * y + c) % n
                y = (y * y + c) % n
            elif t == n:
                break
            else:
                return t
    raise FactorError("couldn't find a factor.")

def factor(n):
    r = []
    while True:
        if isprime(n):
            r.append(n)
            break
        try:
            f = findfactor(n)
            r.append(f)
            n = n / f
        except FactorError, msg:
            r.append(n)
            break
    r.sort()
    return r

def doit(n):
        flist = factor(n)
        print locale.format("%d", n, True), "="
        for f in flist:
                print "\t%s" % locale.format("%d", f, True)

locale.setlocale(locale.LC_ALL, "")

doit(2**98-1)

Mark VandeWettering said

July 1, 2009 at 9:48 PM

Okay, I fixed a couple of things, and extended the program a tiny bit. It now is a numeric calculator of sorts. It’s not industrial strength or anything, but you can basically type any python numeric expression, and it will use eval() (at least with a predefined environment) to evaluate the number. I’ve also predefined a couple of built in functions. prime(n) will return an n digit prime. rsa(n) will return an rsa key which is the combination of two n/2 digit primes. factor(n) factors n. I’ve also added code to do some trial division as well, to get rid of small factors, and it collapses multiple occurrences of a factor (instead of printing 128 copies of 2 when factoring 2^128, it outputs “2**128”).

#!/usr/bin/env python

#
# a basic implementation of the Pollard rho factorization
# Written by Mark VandeWettering <markv@pixar.com>
#

import sys
import locale
import random
try:
    import readline
except ImportError, msg:
    print msg
    print "Line editing disabled."


# an inefficient but straightforward way to find primes...

def primes(n):
    primes = [2]
    for x in range(3, n, 2):
        prime = True
        for p in primes:
            if p * p > n:
                break
            if x % p == 0:
                # it's composite..
                prime = False
                break
        if prime:
            primes.append(x)
    return primes

class FactorError(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

def miller_rabin_pass(a, n):
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1
        s += 1

    a_to_power = pow(a, d, n)
    if a_to_power == 1:
        return True
    for i in xrange(s-1):
        if a_to_power == n - 1:
            return True
        a_to_power = (a_to_power * a_to_power) % n
    return a_to_power == n - 1

def isprime(n):
    for repeat in xrange(20):
        a = 0
        while a == 0:
            a = random.randrange(n)
        if not miller_rabin_pass(a, n):
            return False
    return True

def gcd(a, b):
    while b != 0:
        a, b = b, a%b
    return a

def findfactor(n):
    for c in range(1, 10):
        x = y = random.randint(1, n-1)
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        while True:
            t = gcd(n, abs(x-y))
            if t == 1:
                x = (x * x + c) % n
                y = (y * y + c) % n
                y = (y * y + c) % n
            elif t == n:
                break
            else:
                return t
    raise FactorError("couldn't find a factor.")

def factor(n):
    r = []
    for p in primes(10000):
        while n % p == 0:
            r.append(p)
            n = n / p
    if n == 1:
        return r
    while True:
        if isprime(n):
            r.append(n)
            break
        try:
            f = findfactor(n)
            r.append(f)
            n = n / f
        except FactorError, msg:
            r.append(n)
            break
    r.sort()
    return r

# this function would be easier to write recursively, but
# python isn't good at tail recursion, so in theory, it could
# fail.  Too bad.

def shorten(flist):
    slist = []
    idx = 0
    while flist[idx:] != []:
        hd = flist[idx]
        idx = idx + 1
        exp = 1
        while flist[idx:] != [] and flist[idx] == hd:
            exp = exp + 1
            idx = idx + 1
        if exp > 1:
                slist.append(locale.format("%d", hd, True) + "**"+str(exp))
        else:
                slist.append(locale.format("%d", hd, True))
    return slist

def factorit(n):
        flist = factor(n)
        print locale.format("%d", n, True), "="
        for f in shorten(flist):
                print "\t%s" % f

locale.setlocale(locale.LC_ALL, "")

import string

def prime(n):
        "generate an n bit prime"
        while True:
                x = int(''.join([random.choice(string.digits) for i in range(n)]))
                if isprime(x):
                        return x

def rsapair(n):
        return prime(n//2)*prime(n//2)

env = { "prime" : prime,
        "rsa" : rsapair,
        "factor" : factorit}

while True:
        try:
                num = raw_input("enter a python numexpr >> ")
                n = eval(num, env)
                if n:
                        print n
        except NameError, msg:
                print >> sys.stderr, msg
        except SyntaxError, msg:
                print >> sys.stderr, msg
        except KeyboardInterrupt, msg:
                print >> sys.stderr, "**Interrupted**"
                continue
        except EOFError, msg:
                print
                break

programmingpraxis said
July 1, 2009 at 11:59 PM
Instead of eval, you might want to build your own calculator. See the very first Programming Praxis exercise for an RPN calculator.

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Programming Praxis