Monte Carlo Factorization

June 19, 2009

We have previously examined two methods of integer factorization, trial division using wheels and Fermat’s method of the differences of squares. In this exercise we examine a probabilistic method of integer factorization that takes only a small, fixed amount of memory and tends to be very fast. This method was developed by John Pollard in 1975 and is commonly called the Pollard rho algorithm.

The algorithm is based on the observation that, for any two random integers x and y that are congruent modulo p, the greatest common divisor of their difference and n, the integer being factored, will be 1 if p and n are coprime (have no factors in common) but will be between 1 and n if p is a factor of n. By the birthday paradox, p will be found with reasonably large probability after trying about \sqrt{p} random pairs.

Pollard’s algorithm uses a function modulo n to generate a pseudo-random sequence. Two copies of the sequence are run, one twice as fast as the other, and their values are saved as x and y. At each step, we calculate gcd(xy,n). If the greatest common divisor is one, we loop, since the two values are coprime. If the greatest common divisor is n, then the values of the two sequences have become equal and Pollard’s algorithm fails, since the sequences have fallen into a cycle, which is detected by Floyd’s tortoise-and-hare cycle-finding algorithm; that’s why we have two copies of the sequence, one (the “hare”) running twice as fast as the other (the “tortoise”). But if the greatest common divisor is between 1 and n, we have found a factor of n.

Failure doesn’t mean failure. It just means that the particular pseudo-random sequence that we chose doesn’t lead to success. Our response to failure is to try another sequence. We use the function x² + c (mod n), where c is initially 1. If Pollard’s algorithm fails, we increase c to 2, then 3, and so on. If we keep increasing c, we will eventually find a factor, though it may take a long time if n is large.

Your task is to implement Pollard’s factorization algorithm. You can test it by calculating the factors of the 98th Mersenne number, 298 – 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisement

Pages: 1 2

5 Responses to “Monte Carlo Factorization”

  1. […] Praxis – Monte Carlo factorization By Remco Niemeijer In today’s Programming Praxis problem we have to implement John Pollard’s factorization algorithm. Our […]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/06/19/programming-praxis-monte-carlo-factorization/ for a version with comments):

    import Control.Arrow
    import Data.Bits
    import Data.List
    import System.Random

    isPrime :: Integer -> StdGen -> Bool
    isPrime n g =
    let (s, d) = (length *** head) . span even $ iterate (`div` 2) (n-1)
    xs = map (expm n d) . take 50 $ randomRs (2, n – 2) g
    in all (\x -> elem x [1, n – 1] ||
    any (== n-1) (take s $ iterate (expm n 2) x)) xs

    expm :: Integer -> Integer -> Integer -> Integer
    expm m e b = foldl’ (\r (b’, _) -> mod (r * b’) m) 1 .
    filter (flip testBit 0 . snd) .
    zip (iterate (flip mod m . (^ 2)) b) $
    takeWhile (> 0) $ iterate (`shiftR` 1) e

    factor :: Integer -> Integer -> Integer
    factor c n = factor’ 2 2 1 where
    f x = mod (x * x + c) n
    factor’ x y 1 = factor’ x’ y’ (gcd (x’ – y’) n) where
    (x’, y’) = (f x, f $ f y)
    factor’ _ _ d = if d == n then factor (c + 1) n else d

    factors :: Integer -> StdGen -> [Integer]
    factors n g = sort $ fs n where
    fs x | even x = 2 : fs (div x 2)
    | isPrime x g = [x]
    | otherwise = f : fs (div x f) where f = factor 1 x

    main :: IO ()
    main = print . factors (2^98 – 1) =<< getStdGen [/sourcecode]

  3. Here’s my attempt in Python. A couple of issues in the code remain. The factors that it discovers aren’t guaranteed to be prime. I cribbed the Miller-Rabin test from one of the python code repositories. And, I don’t really understand exactly how this works. :-) Back to the reference books.

    #!/usr/bin/env python
    
    #
    # a basic implementation of the Pollard rho factorization
    # Written by Mark VandeWettering <markv@pixar.com>
    #
    
    import sys
    import locale
    import random
    
    class FactorError(Exception):
        def __init__(self, value):
            self.value = value
        def __str__(self):
            return repr(self.value)
    
    def miller_rabin_pass(a, n):
        d = n - 1
        s = 0
        while d % 2 == 0:
            d >>= 1
            s += 1
    
        a_to_power = pow(a, d, n)
        if a_to_power == 1:
            return True
        for i in xrange(s-1):
            if a_to_power == n - 1:
                return True
            a_to_power = (a_to_power * a_to_power) % n
        return a_to_power == n - 1
    
    def isprime(n):
        for repeat in xrange(20):
            a = 0
            while a == 0:
                a = random.randrange(n)
            if not miller_rabin_pass(a, n):
                return False
        return True
    
    def gcd(a, b):
        while b != 0:
            a, b = b, a%b
        return a
    
    def findfactor(n):
        for c in range(1, 50):
            x = y = random.randint(1, n-1)
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            while True:
                t = gcd(n, abs(x-y))
                if t == 1:
                    x = (x * x + c) % n
                    y = (y * y + c) % n
                    y = (y * y + c) % n
                elif t == n:
                    break
                else:
                    return t
        raise FactorError("couldn't find a factor.")
    
    def factor(n):
        r = []
        while True:
            if isprime(n):
                r.append(n)
                break
            try:
                f = findfactor(n)
                r.append(f)
                n = n / f
            except FactorError, msg:
                r.append(n)
                break
        r.sort()
        return r
    
    def doit(n):
            flist = factor(n)
            print locale.format("%d", n, True), "="
            for f in flist:
                    print "\t%s" % locale.format("%d", f, True)
    
    locale.setlocale(locale.LC_ALL, "")
    
    doit(2**98-1)
    
    
  4. Okay, I fixed a couple of things, and extended the program a tiny bit. It now is a numeric calculator of sorts. It’s not industrial strength or anything, but you can basically type any python numeric expression, and it will use eval() (at least with a predefined environment) to evaluate the number. I’ve also predefined a couple of built in functions. prime(n) will return an n digit prime. rsa(n) will return an rsa key which is the combination of two n/2 digit primes. factor(n) factors n. I’ve also added code to do some trial division as well, to get rid of small factors, and it collapses multiple occurrences of a factor (instead of printing 128 copies of 2 when factoring 2^128, it outputs “2**128”).

    #!/usr/bin/env python
    
    #
    # a basic implementation of the Pollard rho factorization
    # Written by Mark VandeWettering <markv@pixar.com>
    #
    
    import sys
    import locale
    import random
    try:
        import readline
    except ImportError, msg:
        print msg
        print "Line editing disabled."
    
    
    # an inefficient but straightforward way to find primes...
    
    def primes(n):
        primes = [2]
        for x in range(3, n, 2):
            prime = True
            for p in primes:
                if p * p > n:
                    break
                if x % p == 0:
                    # it's composite..
                    prime = False
                    break
            if prime:
                primes.append(x)
        return primes
    
    class FactorError(Exception):
        def __init__(self, value):
            self.value = value
        def __str__(self):
            return repr(self.value)
    
    def miller_rabin_pass(a, n):
        d = n - 1
        s = 0
        while d % 2 == 0:
            d >>= 1
            s += 1
    
        a_to_power = pow(a, d, n)
        if a_to_power == 1:
            return True
        for i in xrange(s-1):
            if a_to_power == n - 1:
                return True
            a_to_power = (a_to_power * a_to_power) % n
        return a_to_power == n - 1
    
    def isprime(n):
        for repeat in xrange(20):
            a = 0
            while a == 0:
                a = random.randrange(n)
            if not miller_rabin_pass(a, n):
                return False
        return True
    
    def gcd(a, b):
        while b != 0:
            a, b = b, a%b
        return a
    
    def findfactor(n):
        for c in range(1, 10):
            x = y = random.randint(1, n-1)
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            while True:
                t = gcd(n, abs(x-y))
                if t == 1:
                    x = (x * x + c) % n
                    y = (y * y + c) % n
                    y = (y * y + c) % n
                elif t == n:
                    break
                else:
                    return t
        raise FactorError("couldn't find a factor.")
    
    def factor(n):
        r = []
        for p in primes(10000):
            while n % p == 0:
                r.append(p)
                n = n / p
        if n == 1:
            return r
        while True:
            if isprime(n):
                r.append(n)
                break
            try:
                f = findfactor(n)
                r.append(f)
                n = n / f
            except FactorError, msg:
                r.append(n)
                break
        r.sort()
        return r
    
    # this function would be easier to write recursively, but
    # python isn't good at tail recursion, so in theory, it could
    # fail.  Too bad.
    
    def shorten(flist):
        slist = []
        idx = 0
        while flist[idx:] != []:
            hd = flist[idx]
            idx = idx + 1
            exp = 1
            while flist[idx:] != [] and flist[idx] == hd:
                exp = exp + 1
                idx = idx + 1
            if exp > 1:
                    slist.append(locale.format("%d", hd, True) + "**"+str(exp))
            else:
                    slist.append(locale.format("%d", hd, True))
        return slist
    
    def factorit(n):
            flist = factor(n)
            print locale.format("%d", n, True), "="
            for f in shorten(flist):
                    print "\t%s" % f
    
    locale.setlocale(locale.LC_ALL, "")
    
    import string
    
    def prime(n):
            "generate an n bit prime"
            while True:
                    x = int(''.join([random.choice(string.digits) for i in range(n)]))
                    if isprime(x):
                            return x
    
    def rsapair(n):
            return prime(n//2)*prime(n//2)
    
    env = { "prime" : prime,
            "rsa" : rsapair,
            "factor" : factorit}
    
    while True:
            try:
                    num = raw_input("enter a python numexpr >> ")
                    n = eval(num, env)
                    if n:
                            print n
            except NameError, msg:
                    print >> sys.stderr, msg
            except SyntaxError, msg:
                    print >> sys.stderr, msg
            except KeyboardInterrupt, msg:
                    print >> sys.stderr, "**Interrupted**"
                    continue
            except EOFError, msg:
                    print
                    break
    
  5. programmingpraxis said

    Instead of eval, you might want to build your own calculator. See the very first Programming Praxis exercise for an RPN calculator.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: