Two Sieving Problems
August 27, 2013
Whenever you need to enumerate a large number of primes, it is best to find a way to use a sieve to find them; generating candidates and testing them with a primality checker will always be slower. We use two different sieves to solve the two problems.
For the first problem, it is not possible to sieve by all the primes less than the square root of 1050 + 106; there are so many sieving primes that no one knows how many. What we do instead is sieve to some convenient limit
, then apply a primality checker to those numbers that survive the sieve:
(define (big-primes lo hi delta limit)
(let* ((output (list))
(sieve (make-vector delta #t))
(ps (cdr (primes limit)))
(qs (map (lambda (p) (modulo (* -1/2 (+ lo p 1)) p)) ps)))
(let loop ((lo lo) (qs qs))
(if (not (< lo hi)) (reverse output)
(begin
(do ((i 0 (+ i 1))) ((= i delta)) (vector-set! sieve i #t))
(do ((ps ps (cdr ps)) (qs qs (cdr qs))) ((null? ps))
(do ((j (car qs) (+ j (car ps)))) ((<= delta j))
(vector-set! sieve j #f)))
(do ((i 0 (+ i 1)) (t (+ lo 1) (+ t 2)))
((or (<= delta i) (<= hi t)))
(if (and (vector-ref sieve i) (prime? t))
(set! output (cons t output))))
(loop (+ lo (* 2 delta))
(map (lambda (p q) (modulo (- q delta) p)) ps qs)))))))
This is just a segmented sieve, as in a prior exercise, but with the sieving primes in ps
limited. It is called like this:
> (length (big-primes #e1e50 (+ #e1e50 #e1e6) 25000 1000000))
8737
That took a little bit less than a minute on my old slow machine at home. Only about one second of that minute was spent sieving, the rest of the time was devoted to checking the 40808 candidates that survived the sieve. Increasing limit
to 2000000 reduced the candidate count to 38888, and increasing limit
to 5000000 reduced the candidate count to 36551. Changing from a Miller-Rabin test with k = 10 to a single pseudoprimality test to base 2 left the result unchanged and saved only a little bit of time, proving Henri Cohen’s comment about “industrial-grade primes.”
The second problem uses a segmented sieve that contains only the numbers congruent to 1 (mod 4); thus, instead of sieving 2, 3, 4, 5, … or 3, 5, 7, 9, …, we sieve 5, 9, 13, 17, …. Here delta
is one-quarter the segment length, instead of one-half, and the definition of the qs
changes to reflect the numbers that are in the sieve:
(define (primes1mod4 lo hi delta)
(let* ((output (list))
(sieve (make-vector delta #t))
(ps (cdr (primes (isqrt hi))))
(qs (map (lambda (p) (modulo (* -1 (inverse 4 p) (+ lo p 1)) p)) ps)))
(let loop ((lo lo) (qs qs))
(if (not (< lo hi)) (reverse output)
(begin
(do ((i 0 (+ i 1))) ((= i delta)) (vector-set! sieve i #t))
(do ((ps ps (cdr ps)) (qs qs (cdr qs))) ((null? ps))
(do ((j (car qs) (+ j (car ps)))) ((<= delta j))
(vector-set! sieve j #f)))
(do ((i 0 (+ i 1)) (t (+ lo 1) (+ t 4)))
((or (<= delta i) (<= hi t)))
(if (vector-ref sieve i) (set! output (cons t output))))
(loop (+ lo (* 4 delta))
(map (lambda (p q) (modulo (- q delta) p)) ps qs)))))))
It takes only 70ms to solve the problem:
> (length (primes1mod4 1000000 2000000 25000))
35241
We have a bonus exercise today; it’s not about enumerating primes, but the solution does involve a sieve, so we include it here. The task is to make a list of all numbers less than or equal to n that have k distinct prime factors. For instance, with n = 25 and k = 1, the solution includes both prime numbers (their only factor is themselves) and prime powers (their only factor is the base prime): 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25. The solution is on the next page.
My first answers are in J; it’s definitely cheating when your language has a
“prime factors of” function.
Problem 1:
This takes a very long time to get anywhere; the second is much faster:
I originally had simpler answers, but decided that only working over odd
numbers (first question) or numbers equivalent to 1 mod 4 (second question) was
worth the uglification of my code.
My second answers are in Ruby; I can’t imagine trying to do these in a language
like C or C++, even with the help of multiprecision libraries like GMP or NTL.
I went with the Miller-Rabin primality test.
A Python version with segments.