Incremental Sieve Of Eratosthenes

July 31, 2015

Here is our implementation of the PRIMEGEN algorithm:

(define-generator (primegen) (yield 2) (yield 3) (let* ((ps (primegen)) (p (and (ps) (ps))) (q (* p p)) (d (make-hashtable identity =))) (define (add m s) (while (hashtable-contains? d m) (set! m (+ m s))) (hashtable-set! d m s)) (do ((c (+ p 2) (+ c 2))) (#f) (cond ((hashtable-contains? d c) (let ((s (hashtable-ref d c #f))) (hashtable-delete! d c) (add (+ c s) s))) ((< c q) (yield c)) (else (add (+ c p p) (+ p p)) (set! p (ps)) (set! q (* p p)))))))

Here, the infinite loop is created by the (#f) in the do-loop, and the while loops in steps 4 and 6 have been replaced by an internal function add. We calculate the ten thousandth prime like this:

> (let ((ps (primegen))) (do ((i 0 (+ i 1)) (p 0 (ps))) ((= i 10000) p))) 104729

In general, if you know the bound on n in advance, or can determine a reasonable approximation for it, you should use either the simple Sieve of Eratosthenes or the segmented Sieve of Eratosthenes, as they are both faster than the incremental Sieve of Eratosthenes (indexing an array is quicker than fetching and updating a dictionary) and consume less space (because they use bit arrays instead of dictionaries).

We used the identity function and the while and define-generator macros from the Standard Prelude. You can see the code assembled at http://ideone.com/wzcg5j, but it doesn’t run because the necessary libraries aren’t available on ideone. A python version that runs is available at http://ideone.com/S9BewN.

Posted by programmingpraxis

Filed in Exercises

10 Comments »

10 Responses to “Incremental Sieve Of Eratosthenes”

Paul said

July 31, 2015 at 10:10 AM

In Python. Nice method, but indeed a little slow.

def primegen():
    yield 2
    yield 3
    D = {}
    def set_next_m(c, s):
        m = c + s
        while m in D:
            m += s
        D[m] = s
    ps = primegen()
    c = p = next(ps) and next(ps)
    q = p * p
    while True:
        c += 2
        if c in D:
            set_next_m(c, D.pop(c))
        else:
            if c != q:
                yield c
            else:
                set_next_m(c, p + p)
                p = next(ps)
                q = p * p

GordonBGood said
August 1, 2015 at 2:28 AM
It should be noted that due to the deficiencies in execution speed as noted in the given answer, a better solution for an unbounded Sieve of Eratosthenes for larger ranges of over a million primes or so is to use a page segmented version where the upper bound is not specified and the sieve range grows by whatever the page size (which may even be a variable page size to better suit larger and larger ranges). In this way, the “increment” is a full page of primes rather than single primes, space efficiency is preserved by using bit packed arrays instead of hash table entries, and speed efficiency is also preserved in culling composites by page ranges instead of one at a time.
programmingpraxis said
August 1, 2015 at 11:44 AM
@GordonBGood: You are correct, it is better to use a segmented Sieve of Eratosthenes rather than the algorithm given here. We did exactly that in a previous exercise.

Lucas A. Brown said

August 2, 2015 at 12:26 AM

In Python, I present a direct translation of the pseudocode in the problem statement…

def primegen():
    yield 2; yield 3
    ps = serat()
    D = {}
    p = ps.next()
    p = ps.next()
    q = p * p
    c = p
    while True:
        c += 2
        if c in D:
            s = D[c]
            m = c + s
            del D[c]
            while m in D: m += s
            D[m] = s
            continue
        if c != q:
            yield c
            continue
        else:
            s = p + p
            m = c + s
            while m in D: m += s
            D[m] = s
            p = ps.next()
            q = p*p

… and terser variant that somebody else wrote and that I’ve been using all this time instead:

def primegen():
    yield 2; yield 3; yield 5; yield 7; yield 11; yield 13
    ps = primegen() # yay recursion
    p = ps.next() and ps.next()
    q, sieve, n = p**2, {}, 13
    while True:
        if n not in sieve:
            if n < q: yield n
            else:
                next, step = q + 2*p, 2*p
                while next in sieve: next += step
                sieve[next] = step
                p = ps.next()
                q = p**2
        else:
            step = sieve.pop(n)
            next = n + step
            while next in sieve: next += step
            sieve[next] = step
        n += 2

Paul said
August 3, 2015 at 1:13 PM
A small update of my earlier version, where a function is used to update the dictionary. This function is called a lot. Removing the function gives a speedup of about 20%. I played with the idea given in the post of @Lucas A. Brown to preload more primes from the start (not 2,3 but 2,3,5,7,11,13). This did not help on my PC to speed up execution. The new version is on ideone.

Mike said

August 4, 2015 at 4:48 AM

Here’s a function I use on occasion. It combines steps 4 and 6 of the algorithm in the problem. When a prime p is found, the composite p*p is loaded into the dictionary with a step of 2*p. Then at step 4, if c is in D it is composite, if c is not in D it is prime. No need to check if c == q. Also no recursion.

from itertools import count

def primegen():
    yield 2
    D = {}
    for candidate in count(3, 2):
        step = D.pop(candidate, None)

        if step is None:
            yield candidate
            D[candidate*candidate] = 2*candidate

        else:
            composite = candidate + step
            while composite in D:
                composite += step
            D[composite] = step

programmingpraxis said
August 4, 2015 at 12:45 PM
@Mike: I think yours takes space O(n), but mine takes space O(sqrt n) because addition of the new prime to the dictionary is postponed until its square is seen; here n is the number of primes seen, not the magnitude of the current largest prime.
Paul said
August 4, 2015 at 12:47 PM
@Mike. The method you show works fine, of course. The drawback is that the dictionary is filling up much quicker than the method that @programmingpraxis proposes. For example: after calculating the primes up to a million, the dictionary has a 78497 entries with your method (the largest entry is 999966000289), where the dictionaries of @programmingpraxis will have only some 300 entries (the largest entry is 1002043). The @programmingpraxis will keep running longer without a memory error. The extra entries with your method are the squares of the primes processed so far. The @programmingpraxis version is avoiding all these.
Generator Push-Back | Programming Praxis said
September 1, 2017 at 9:01 AM
[…] Generators are a useful programming tool; we provide an implementation in the Standard Prelude. Here is a Python prime-number generator that we discussed in a previous exercise: […]
I Think I’m Crazy! | Programming Praxis said
February 23, 2018 at 10:00 AM
[…] decided to write a prime number generator in Awk, based on this previous exercise. Never mind that Awk doesn’t provide iterators, I ought to be able to figure this out. And I […]

Programming Praxis