Two Sieving Problems

August 27, 2013

Whenever you need to enumerate a large number of primes, it is best to find a way to use a sieve to find them; generating candidates and testing them with a primality checker will always be slower. We use two different sieves to solve the two problems.

For the first problem, it is not possible to sieve by all the primes less than the square root of 10⁵⁰ + 10⁶; there are so many sieving primes that no one knows how many. What we do instead is sieve to some convenient limit, then apply a primality checker to those numbers that survive the sieve:

(define (big-primes lo hi delta limit) (let* ((output (list)) (sieve (make-vector delta #t)) (ps (cdr (primes limit))) (qs (map (lambda (p) (modulo (* -1/2 (+ lo p 1)) p)) ps))) (let loop ((lo lo) (qs qs)) (if (not (< lo hi)) (reverse output) (begin (do ((i 0 (+ i 1))) ((= i delta)) (vector-set! sieve i #t)) (do ((ps ps (cdr ps)) (qs qs (cdr qs))) ((null? ps)) (do ((j (car qs) (+ j (car ps)))) ((<= delta j)) (vector-set! sieve j #f))) (do ((i 0 (+ i 1)) (t (+ lo 1) (+ t 2))) ((or (<= delta i) (<= hi t))) (if (and (vector-ref sieve i) (prime? t)) (set! output (cons t output)))) (loop (+ lo (* 2 delta)) (map (lambda (p q) (modulo (- q delta) p)) ps qs)))))))

This is just a segmented sieve, as in a prior exercise, but with the sieving primes in ps limited. It is called like this:

> (length (big-primes #e1e50 (+ #e1e50 #e1e6) 25000 1000000)) 8737

That took a little bit less than a minute on my old slow machine at home. Only about one second of that minute was spent sieving, the rest of the time was devoted to checking the 40808 candidates that survived the sieve. Increasing limit to 2000000 reduced the candidate count to 38888, and increasing limit to 5000000 reduced the candidate count to 36551. Changing from a Miller-Rabin test with k = 10 to a single pseudoprimality test to base 2 left the result unchanged and saved only a little bit of time, proving Henri Cohen’s comment about “industrial-grade primes.”

The second problem uses a segmented sieve that contains only the numbers congruent to 1 (mod 4); thus, instead of sieving 2, 3, 4, 5, … or 3, 5, 7, 9, …, we sieve 5, 9, 13, 17, …. Here delta is one-quarter the segment length, instead of one-half, and the definition of the qs changes to reflect the numbers that are in the sieve:

(define (primes1mod4 lo hi delta) (let* ((output (list)) (sieve (make-vector delta #t)) (ps (cdr (primes (isqrt hi)))) (qs (map (lambda (p) (modulo (* -1 (inverse 4 p) (+ lo p 1)) p)) ps))) (let loop ((lo lo) (qs qs)) (if (not (< lo hi)) (reverse output) (begin (do ((i 0 (+ i 1))) ((= i delta)) (vector-set! sieve i #t)) (do ((ps ps (cdr ps)) (qs qs (cdr qs))) ((null? ps)) (do ((j (car qs) (+ j (car ps)))) ((<= delta j)) (vector-set! sieve j #f))) (do ((i 0 (+ i 1)) (t (+ lo 1) (+ t 4))) ((or (<= delta i) (<= hi t))) (if (vector-ref sieve i) (set! output (cons t output)))) (loop (+ lo (* 4 delta)) (map (lambda (p q) (modulo (- q delta) p)) ps qs)))))))

It takes only 70ms to solve the problem:

> (length (primes1mod4 1000000 2000000 25000)) 35241

We have a bonus exercise today; it’s not about enumerating primes, but the solution does involve a sieve, so we include it here. The task is to make a list of all numbers less than or equal to n that have k distinct prime factors. For instance, with n = 25 and k = 1, the solution includes both prime numbers (their only factor is themselves) and prime powers (their only factor is the base prime): 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25. The solution is on the next page.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

2 Responses to “Two Sieving Problems”

Graham said

August 27, 2013 at 11:07 PM

My first answers are in J; it’s definitely cheating when your language has a
“prime factors of” function.
Problem 1:

+/@(1=#@q:)(10x^50)+1+i.1e6%2

This takes a very long time to get anywhere; the second is much faster:

+/@(1=#@q:)1e6+1+4*1e6%4

I originally had simpler answers, but decided that only working over odd
numbers (first question) or numbers equivalent to 1 mod 4 (second question) was
worth the uglification of my code.

My second answers are in Ruby; I can’t imagine trying to do these in a language
like C or C++, even with the help of multiprecision libraries like GMP or NTL.
I went with the Miller-Rabin primality test.

require 'mathn'

def decomp n
  s, d = 0, n
  while d.even?
    s, d = s + 1, d >> 1
  end
  return s, d
end


def modpow n, e, m
  r = 1
  while e > 0
    if e.odd?
      r = (r * n) % m
    end
    e >>= 1
    n = (n * n) % m
  end
  return r
end

def miller_rabin n, k=42
  s, d= decomp(n - 1)
  k.times do
    a = 2 + rand(n - 4)
    x = modpow a, d, n
    next if [1, n - 1].include? x
    flag = (s - 1).times do
      x = (x * x) % n
      return false if x == 1
      break n - 1 if x == n - 1
    end
    next if flag == n - 1
    return false
  end
  return true
end

class Integer
  def prime?
    return false if self < 2
    return true if self == 2
    return false if self.even?
    return miller_rabin self
  end
end

def problem1
  (10**50 + 1 .. 10**50 + 10**6).step(2).select(&:prime?).length
end

def problem2
  (10**6 + 1 .. 2 * 10**6).step(4).select(&:prime?).length
end

Paul said

August 30, 2013 at 3:42 PM

A Python version with segments.

import time
from math import ceil, sqrt

from gmpy2 import is_prime

from ma.primegmp import sieve

def erase(flags, primes, q, n):
    for pk, qk in zip(primes, q):
        for i in xrange(qk, n, pk):
            flags[i] = 0

def gen_sieve_seg(l, r, number_miller_rabin=25):
    """segmented sieve
       uses prime numbes <= 1000000
       checks candidates with miller rabin
    """
    primes = list(sieve(1000000))
    primes.remove(2)
    q = [(-(l + 1 + pk) // 2) % pk for pk in primes]
    n = ((r - l + 1) // 2)
    flags = [1] * n
    erase(flags, primes,q, n)
    for i, f in enumerate(flags):
        if f:
            cand = l + i * 2 + 1
            if is_prime(cand, number_miller_rabin):
                yield cand
                
def gen_sieve_seg_4np1(l, r):
    """segmented sieve
       ony numbers 4*n+1 are sieved
    """
    primes = list(sieve(int(ceil(sqrt(r)))))
    primes.remove(2)
    assert not l % 4
    q = [(-(l + 1 - (pk % 4) * pk) // 4) % pk for pk in primes]
    n = (r - l) // 4
    flags = [1] * n
    erase(flags, primes,q, n)
    for i, f in enumerate(flags):
        if f:
            yield l + i * 4 + 1
            
low = 10 ** 50
high = low + 10 ** 6

t0 = time.clock()
print sum(1 for p in gen_sieve_seg(low, high, 5)),
print "{:6.1f} sec".format(time.clock() - t0)

t0 = time.clock()
print sum(1 for p in gen_sieve_seg_4np1(1000000, 2000000)),
print "{:6.3f} sec".format(time.clock() - t0)

"""
8737    4.3 sec
35241  0.044 sec
"""

Programming Praxis