Pollard Rho, Revisited
January 21, 2011
We begin with the code that is common to both the Floyd and Brent variants. Since either may return a non-prime factor, we have to test each return value for primality and call the algorithm on the reduced factor when it is composite. The exercise suggested using trial division to find small factors, but we will use a 2,3,5-wheel instead, because that is nearly as simple but twice as fast as trial division by two and the odd numbers:
(define (wheel-factors n limit)
(define (wheel . xs) (set-cdr! (last-pair xs) xs) xs)
(let loop ((n n) (f 2) (fs '())
(ws (cons* 1 2 2 (wheel 4 2 4 2 4 6 2 6))))
(cond ((< limit f) (values n (reverse fs)))
((or (= n 1) (< n (* f f))) (values 1 (reverse (cons n fs))))
((zero? (modulo n f)) (loop (/ n f) f (cons f fs) ws))
(else (loop n (+ f (car ws)) fs (cdr ws))))))
Our framework takes an argument that specifies the variant and calls it as necessary until all the composites have been reduced to their prime factors. It makes sense to keep the limit of the wheel factorization small, since Pollard’s method is more powerful than the wheel method:
(define (pollard factor n)
(let-values (((n fs) (wheel-factors n 1000)))
(sort < (let fact ((n n) (fs fs))
(cond ((= n 1) fs)
((prime? n) (cons n fs))
(else (let ((f (factor n 1 100000000)))
(append fs (fact f '()) (fact (/ n f) '())))))))))
Here is the rho algorithm using Floyd’s cycle-finding algorithm. T
and h
are the tortoise and hare, and x
and y
are the saved values of t
and h
at the last short-circuit. Loop1
is the normal rho algorithm, with j counting the steps to limit
; loop2
replays the individual steps of the last short-circuit when the product equals n, with k counting the steps. Function g
accumulates the product.
(define (floyd n c limit)
(define (f x) (modulo (+ (* x x) c) n))
(define (g p t h) (modulo (* p (abs (- t h))) n))
(let loop1 ((j 1) (t 2) (h (f 2)) (x 2) (y (f 2)) (p 1))
(if (= j limit) (error 'floyd "timeout")
(if (= t h) (floyd n (+ c 1) (- limit j))
(if (positive? (modulo j 100)) (loop1 (+ j 1) (f t) (f (f h)) x y (g p t h))
(let ((d (gcd p n)))
(if (= d 1) (loop1 (+ j 1) (f t) (f (f h)) t h 1)
(if (< 1 d n) d
(let loop2 ((k 1) (x x) (y y) (d (gcd (- x y) n)))
(if (= k 100) (floyd n (+ c 1) (- limit j))
(if (= d 1) (loop2 (+ k 1) (f x) (f (f y)) (gcd (- x y) n))
(if (= d n) (floyd n (+ c 1) (- limit j))
d))))))))))))
Brent's variant uses the same sequence as Floyd's and keeps three values of the sequence: x is the running value, y is the value at the last gcd calculation, and z is the value at the last power of two. The main loop also defines j as the step counter, q as the next power of two, and p as the current product for the short-circuit gcd calculation:
(define (brent n c limit)
(define (f y) (modulo (+ (* y y) c) n))
(define (g p x y) (modulo (* p (abs (- x y))) n))
(let loop1 ((x 2) (y (+ 4 c)) (z (+ 4 c)) (j 1) (q 2) (p 1))
(if (= j limit) (error 'brent "timeout")
(if (= x y) (brent n (+ c 1) (- limit j)) ; cycle
(if (= j q) (let ((t (f y))) (loop1 y (f y) z (+ j 1) (* q 2) (g p y t)))
(if (positive? (modulo j 100)) (loop1 x (f y) z (+ j 1) q (g p x y))
(let ((d (gcd p n)))
(if (= d 1) (loop1 x (f y) y (+ j 1) q (g p x y))
(if (< 1 d n) d ; factor
(let loop2 ((k 1) (z (f z)))
(if (= k 100) (brent n (+ c 1) (- limit j))
(let ((d (gcd (- z x) n)))
(if (= d 1) (loop2 (+ k 1) (f z))
(if (= d n) (brent n (+ c 1) (- limit j))
d))))))))))))))
We factor the fibonacci numbers to F100 to compare timings of the two variants. Here is the timing harness:
(define (fib n)
(define (square x) (* x x))
(cond ((= n 0) 0) ((= n 1) 1) ((= n 2) 1)
((odd? n) (let ((n2 (+ (quotient n 2) 1)))
(+ (square (fib n2)) (square (fib (- n2 1))))))
(else (let ((n2 (quotient n 2)))
(* (fib n2) (+ (* (fib (- n2 1)) 2) (fib n2)))))))
(time (do ((i 3 (+ i 1))) ((< 100 i))
(display i) (display " ")
(display (pollard floyd (fib i)))
(newline)))
(time (do ((i 3 (+ i 1))) ((< 100 i))
(display i) (display " ")
(display (pollard brent (fib i)))
(newline)))
The results depend on the Scheme interpreter and the underlying machine. In various tests on three different machines and three different Scheme interpreters, there was no clear winner between Floyd’s algorithm and Brent’s algorithm, so our experience differs from the theory that Brent’s algorithm should be about 25% faster. The test run at http://programmingpraxis.codepad.org/WnqrjcxX shows that Floyd’s algorithm takes 280ms while Brent’s algorithm is only 180ms.
We used prime?
from a previous exercise, cons*
and sort
from the Standard Prelude, and had to write last-pair
, which is standard everywhere except, apparently, at codepad.org
.