Incremental Sieve Of Eratosthenes

July 31, 2015

Here is our implementation of the PRIMEGEN algorithm:

(define-generator (primegen)
  (yield 2) (yield 3)
    (let* ((ps (primegen))
         (p (and (ps) (ps)))
         (q (* p p))
         (d (make-hashtable identity =)))
    (define (add m s)
      (while (hashtable-contains? d m)
        (set! m (+ m s)))
      (hashtable-set! d m s))
    (do ((c (+ p 2) (+ c 2))) (#f)
      (cond ((hashtable-contains? d c)
              (let ((s (hashtable-ref d c #f)))
                (hashtable-delete! d c)
                (add (+ c s) s)))
            ((< c q) (yield c))
            (else (add (+ c p p) (+ p p))
                  (set! p (ps))
                  (set! q (* p p)))))))

Here, the infinite loop is created by the (#f) in the do-loop, and the while loops in steps 4 and 6 have been replaced by an internal function add. We calculate the ten thousandth prime like this:

> (let ((ps (primegen)))
    (do ((i 0 (+ i 1)) (p 0 (ps)))
        ((= i 10000) p)))
104729

In general, if you know the bound on n in advance, or can determine a reasonable approximation for it, you should use either the simple Sieve of Eratosthenes or the segmented Sieve of Eratosthenes, as they are both faster than the incremental Sieve of Eratosthenes (indexing an array is quicker than fetching and updating a dictionary) and consume less space (because they use bit arrays instead of dictionaries).

We used the identity function and the while and define-generator macros from the Standard Prelude. You can see the code assembled at http://ideone.com/wzcg5j, but it doesn’t run because the necessary libraries aren’t available on ideone. A python version that runs is available at http://ideone.com/S9BewN.

Pages: 1 2

10 Responses to “Incremental Sieve Of Eratosthenes”

  1. Paul said

    In Python. Nice method, but indeed a little slow.

    def primegen():
        yield 2
        yield 3
        D = {}
        def set_next_m(c, s):
            m = c + s
            while m in D:
                m += s
            D[m] = s
        ps = primegen()
        c = p = next(ps) and next(ps)
        q = p * p
        while True:
            c += 2
            if c in D:
                set_next_m(c, D.pop(c))
            else:
                if c != q:
                    yield c
                else:
                    set_next_m(c, p + p)
                    p = next(ps)
                    q = p * p
    
  2. It should be noted that due to the deficiencies in execution speed as noted in the given answer, a better solution for an unbounded Sieve of Eratosthenes for larger ranges of over a million primes or so is to use a page segmented version where the upper bound is not specified and the sieve range grows by whatever the page size (which may even be a variable page size to better suit larger and larger ranges). In this way, the “increment” is a full page of primes rather than single primes, space efficiency is preserved by using bit packed arrays instead of hash table entries, and speed efficiency is also preserved in culling composites by page ranges instead of one at a time.

  3. programmingpraxis said

    @GordonBGood: You are correct, it is better to use a segmented Sieve of Eratosthenes rather than the algorithm given here. We did exactly that in a previous exercise.

  4. Lucas A. Brown said

    In Python, I present a direct translation of the pseudocode in the problem statement…

    def primegen():
        yield 2; yield 3
        ps = serat()
        D = {}
        p = ps.next()
        p = ps.next()
        q = p * p
        c = p
        while True:
            c += 2
            if c in D:
                s = D[c]
                m = c + s
                del D[c]
                while m in D: m += s
                D[m] = s
                continue
            if c != q:
                yield c
                continue
            else:
                s = p + p
                m = c + s
                while m in D: m += s
                D[m] = s
                p = ps.next()
                q = p*p
    

    … and terser variant that somebody else wrote and that I’ve been using all this time instead:

    def primegen():
        yield 2; yield 3; yield 5; yield 7; yield 11; yield 13
        ps = primegen() # yay recursion
        p = ps.next() and ps.next()
        q, sieve, n = p**2, {}, 13
        while True:
            if n not in sieve:
                if n < q: yield n
                else:
                    next, step = q + 2*p, 2*p
                    while next in sieve: next += step
                    sieve[next] = step
                    p = ps.next()
                    q = p**2
            else:
                step = sieve.pop(n)
                next = n + step
                while next in sieve: next += step
                sieve[next] = step
            n += 2
    
  5. Paul said

    A small update of my earlier version, where a function is used to update the dictionary. This function is called a lot. Removing the function gives a speedup of about 20%. I played with the idea given in the post of @Lucas A. Brown to preload more primes from the start (not 2,3 but 2,3,5,7,11,13). This did not help on my PC to speed up execution. The new version is on ideone.

  6. Mike said

    Here’s a function I use on occasion. It combines steps 4 and 6 of the algorithm in the problem. When a prime p is found, the composite p*p is loaded into the dictionary with a step of 2*p. Then at step 4, if c is in D it is composite, if c is not in D it is prime. No need to check if c == q. Also no recursion.

    from itertools import count
    
    def primegen():
        yield 2
        D = {}
        for candidate in count(3, 2):
            step = D.pop(candidate, None)
    
            if step is None:
                yield candidate
                D[candidate*candidate] = 2*candidate
    
            else:
                composite = candidate + step
                while composite in D:
                    composite += step
                D[composite] = step
    
  7. programmingpraxis said

    @Mike: I think yours takes space O(n), but mine takes space O(sqrt n) because addition of the new prime to the dictionary is postponed until its square is seen; here n is the number of primes seen, not the magnitude of the current largest prime.

  8. Paul said

    @Mike. The method you show works fine, of course. The drawback is that the dictionary is filling up much quicker than the method that @programmingpraxis proposes. For example: after calculating the primes up to a million, the dictionary has a 78497 entries with your method (the largest entry is 999966000289), where the dictionaries of @programmingpraxis will have only some 300 entries (the largest entry is 1002043). The @programmingpraxis will keep running longer without a memory error. The extra entries with your method are the squares of the primes processed so far. The @programmingpraxis version is avoiding all these.

  9. […] Generators are a useful programming tool; we provide an implementation in the Standard Prelude. Here is a Python prime-number generator that we discussed in a previous exercise: […]

  10. […] decided to write a prime number generator in Awk, based on this previous exercise. Never mind that Awk doesn’t provide iterators, I ought to be able to figure this out. And I […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: