Incremental Sieve Of Eratosthenes

July 31, 2015

The standard Sieve of Eratosthenes requires that you specify an upper bound before you start. But often, programs that use prime numbers don’t know the upper bound in advance, or don’t want to pre-compute and store all of the primes up to their bound. In those cases, an incremental Sieve of Eratosthenes can be used:

Algorithm PRIMEGEN: Create a generating function that returns the next prime number each time it is called, starting from zero.
1. [Prime the pump] Yield 2, then yield 3.
2. [Initialization] Set ps ← PRIMEGEN, D ← {}, p ← next(ps), p ← next(ps) again, q ← p × p, and c ← p.
3. [Next candidate] Set c ← c + 2.
4. [Composite candidate] If c ∉ D, skip to Step 5. Otherwise, set s ← D{c}, m ← c + s. Remove c from D. While m ∈ D, set m ← m + s. Set D{m} ← s. Go to Step 3.
5. [Prime candidate] If c = q, skip to Step 6. Otherwise, yield c and go to Step 3.
6. [Next sieving prime] Set s ← p + p, m ← c + s. While m ∈ D, set m ← m + s. Set D{m} ← s. Set p ← next(ps), q ← p × p. Go to Step 3.

The PRIMEGEN algorithm creates the sequence of prime numbers, returning the next prime in the sequence each time it is called; the exact mechanism to do that depends on the implementation language. The first step primes the pump (sorry!) by issuing the first two primes: 2 is a special case because it is even, and 3 is a special case because the pump only considers odd numbers as prime candidates.

The second step initializes two important data structures: ps is the list of sieving primes, which is determined by calling PRIMEGEN recursively, and D is a dictionary of composite/stride pairs, one for each sieving prime less than the square root of the current prime; the dictionary will most likely be implemented as a hash table, but other data structures such as a balanced binary tree could also be used. The other initializations are the current sieving prime p, its square q, and the initial prime candidate c.

The third step begins the main loop of the function by calculating the next prime candidate; eventually, all odd numbers starting from 5 will be prime candidates. Then there are three possibilities, each handled by a separate step.

The fourth step handles the case that the candidate c is composite, resetting the dictionary entry for the appropriate sieving prime. The fifth step handles the case that the candidate c is both composite and less than the square q of the current sieving prime, which indicates that the candidate is prime. The sixth step occurs when the candidate is composite but not in the dictionary, having reached the square of the current sieving prime, when a new sieving prime is added to the dictionary and the current sieving prime is updated in variables p and q. After the appropriate option has been processed, the algorithm returns to the top of the main loop to obtain the next prime.

In the fourth and sixth steps, the while loop calculates the new dictionary entry for the current sieving prime: the stride s = 2p is the distance between odd multiples of the sieving prime, and m is the smallest multiple of the sieving prime p greater than the current candidate c that is not already in the dictionary. The dictionary is keyed by m, which is a multiple of a sieving prime, with the corresponding stride s as its value.

There are several points to note about this algorithm. First, it is recursive, and there will eventually be a stack of sieving prime sequences, which sounds bizarre but actually makes sense. Second, by postponing the addition of new sieving primes to the dictionary until reaching their squares, only primes less than the square root of the current candidate need to be stored. And third, eliminating duplicate dictionary entries with the while loop of steps 4 and 6 keeps the size of the dictionary at exactly the number of sieving primes already processed. The whole algorithm is very efficient, making it useful whenever processing primes incrementally.

Time complexity of the algorithm is O(n log log n), the same as any other implementation of the Sieve of Eratosthenes, and space complexity is O(sqrt n) to store the dictionary of sieving primes.

Your task is to implement the incremental Sieve of Eratosthenes. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

10 Comments »

10 Responses to “Incremental Sieve Of Eratosthenes”

Paul said

July 31, 2015 at 10:10 AM

In Python. Nice method, but indeed a little slow.

def primegen():
    yield 2
    yield 3
    D = {}
    def set_next_m(c, s):
        m = c + s
        while m in D:
            m += s
        D[m] = s
    ps = primegen()
    c = p = next(ps) and next(ps)
    q = p * p
    while True:
        c += 2
        if c in D:
            set_next_m(c, D.pop(c))
        else:
            if c != q:
                yield c
            else:
                set_next_m(c, p + p)
                p = next(ps)
                q = p * p

GordonBGood said
August 1, 2015 at 2:28 AM
It should be noted that due to the deficiencies in execution speed as noted in the given answer, a better solution for an unbounded Sieve of Eratosthenes for larger ranges of over a million primes or so is to use a page segmented version where the upper bound is not specified and the sieve range grows by whatever the page size (which may even be a variable page size to better suit larger and larger ranges). In this way, the “increment” is a full page of primes rather than single primes, space efficiency is preserved by using bit packed arrays instead of hash table entries, and speed efficiency is also preserved in culling composites by page ranges instead of one at a time.
programmingpraxis said
August 1, 2015 at 11:44 AM
@GordonBGood: You are correct, it is better to use a segmented Sieve of Eratosthenes rather than the algorithm given here. We did exactly that in a previous exercise.

Lucas A. Brown said

August 2, 2015 at 12:26 AM

In Python, I present a direct translation of the pseudocode in the problem statement…

def primegen():
    yield 2; yield 3
    ps = serat()
    D = {}
    p = ps.next()
    p = ps.next()
    q = p * p
    c = p
    while True:
        c += 2
        if c in D:
            s = D[c]
            m = c + s
            del D[c]
            while m in D: m += s
            D[m] = s
            continue
        if c != q:
            yield c
            continue
        else:
            s = p + p
            m = c + s
            while m in D: m += s
            D[m] = s
            p = ps.next()
            q = p*p

… and terser variant that somebody else wrote and that I’ve been using all this time instead:

def primegen():
    yield 2; yield 3; yield 5; yield 7; yield 11; yield 13
    ps = primegen() # yay recursion
    p = ps.next() and ps.next()
    q, sieve, n = p**2, {}, 13
    while True:
        if n not in sieve:
            if n < q: yield n
            else:
                next, step = q + 2*p, 2*p
                while next in sieve: next += step
                sieve[next] = step
                p = ps.next()
                q = p**2
        else:
            step = sieve.pop(n)
            next = n + step
            while next in sieve: next += step
            sieve[next] = step
        n += 2

Paul said
August 3, 2015 at 1:13 PM
A small update of my earlier version, where a function is used to update the dictionary. This function is called a lot. Removing the function gives a speedup of about 20%. I played with the idea given in the post of @Lucas A. Brown to preload more primes from the start (not 2,3 but 2,3,5,7,11,13). This did not help on my PC to speed up execution. The new version is on ideone.

Mike said

August 4, 2015 at 4:48 AM

Here’s a function I use on occasion. It combines steps 4 and 6 of the algorithm in the problem. When a prime p is found, the composite p*p is loaded into the dictionary with a step of 2*p. Then at step 4, if c is in D it is composite, if c is not in D it is prime. No need to check if c == q. Also no recursion.

from itertools import count

def primegen():
    yield 2
    D = {}
    for candidate in count(3, 2):
        step = D.pop(candidate, None)

        if step is None:
            yield candidate
            D[candidate*candidate] = 2*candidate

        else:
            composite = candidate + step
            while composite in D:
                composite += step
            D[composite] = step

programmingpraxis said
August 4, 2015 at 12:45 PM
@Mike: I think yours takes space O(n), but mine takes space O(sqrt n) because addition of the new prime to the dictionary is postponed until its square is seen; here n is the number of primes seen, not the magnitude of the current largest prime.
Paul said
August 4, 2015 at 12:47 PM
@Mike. The method you show works fine, of course. The drawback is that the dictionary is filling up much quicker than the method that @programmingpraxis proposes. For example: after calculating the primes up to a million, the dictionary has a 78497 entries with your method (the largest entry is 999966000289), where the dictionaries of @programmingpraxis will have only some 300 entries (the largest entry is 1002043). The @programmingpraxis will keep running longer without a memory error. The extra entries with your method are the squares of the primes processed so far. The @programmingpraxis version is avoiding all these.
Generator Push-Back | Programming Praxis said
September 1, 2017 at 9:01 AM
[…] Generators are a useful programming tool; we provide an implementation in the Standard Prelude. Here is a Python prime-number generator that we discussed in a previous exercise: […]
I Think I’m Crazy! | Programming Praxis said
February 23, 2018 at 10:00 AM
[…] decided to write a prime number generator in Awk, based on this previous exercise. Never mind that Awk doesn’t provide iterators, I ought to be able to figure this out. And I […]

Programming Praxis