Incremental Sieve Of Eratosthenes
July 31, 2015
Here is our implementation of the PRIMEGEN algorithm:
(define-generator (primegen)
(yield 2) (yield 3)
(let* ((ps (primegen))
(p (and (ps) (ps)))
(q (* p p))
(d (make-hashtable identity =)))
(define (add m s)
(while (hashtable-contains? d m)
(set! m (+ m s)))
(hashtable-set! d m s))
(do ((c (+ p 2) (+ c 2))) (#f)
(cond ((hashtable-contains? d c)
(let ((s (hashtable-ref d c #f)))
(hashtable-delete! d c)
(add (+ c s) s)))
((< c q) (yield c))
(else (add (+ c p p) (+ p p))
(set! p (ps))
(set! q (* p p)))))))
Here, the infinite loop is created by the (#f)
in the do
-loop, and the while
loops in steps 4 and 6 have been replaced by an internal function add
. We calculate the ten thousandth prime like this:
> (let ((ps (primegen)))
(do ((i 0 (+ i 1)) (p 0 (ps)))
((= i 10000) p)))
104729
In general, if you know the bound on n in advance, or can determine a reasonable approximation for it, you should use either the simple Sieve of Eratosthenes or the segmented Sieve of Eratosthenes, as they are both faster than the incremental Sieve of Eratosthenes (indexing an array is quicker than fetching and updating a dictionary) and consume less space (because they use bit arrays instead of dictionaries).
We used the identity
function and the while
and define-generator
macros from the Standard Prelude. You can see the code assembled at http://ideone.com/wzcg5j, but it doesn’t run because the necessary libraries aren’t available on ideone. A python version that runs is available at http://ideone.com/S9BewN.
In Python. Nice method, but indeed a little slow.
It should be noted that due to the deficiencies in execution speed as noted in the given answer, a better solution for an unbounded Sieve of Eratosthenes for larger ranges of over a million primes or so is to use a page segmented version where the upper bound is not specified and the sieve range grows by whatever the page size (which may even be a variable page size to better suit larger and larger ranges). In this way, the “increment” is a full page of primes rather than single primes, space efficiency is preserved by using bit packed arrays instead of hash table entries, and speed efficiency is also preserved in culling composites by page ranges instead of one at a time.
@GordonBGood: You are correct, it is better to use a segmented Sieve of Eratosthenes rather than the algorithm given here. We did exactly that in a previous exercise.
In Python, I present a direct translation of the pseudocode in the problem statement…
… and terser variant that somebody else wrote and that I’ve been using all this time instead:
A small update of my earlier version, where a function is used to update the dictionary. This function is called a lot. Removing the function gives a speedup of about 20%. I played with the idea given in the post of @Lucas A. Brown to preload more primes from the start (not 2,3 but 2,3,5,7,11,13). This did not help on my PC to speed up execution. The new version is on ideone.
Here’s a function I use on occasion. It combines steps 4 and 6 of the algorithm in the problem. When a prime p is found, the composite p*p is loaded into the dictionary with a step of 2*p. Then at step 4, if c is in D it is composite, if c is not in D it is prime. No need to check if c == q. Also no recursion.
@Mike: I think yours takes space O(n), but mine takes space O(sqrt n) because addition of the new prime to the dictionary is postponed until its square is seen; here n is the number of primes seen, not the magnitude of the current largest prime.
@Mike. The method you show works fine, of course. The drawback is that the dictionary is filling up much quicker than the method that @programmingpraxis proposes. For example: after calculating the primes up to a million, the dictionary has a 78497 entries with your method (the largest entry is 999966000289), where the dictionaries of @programmingpraxis will have only some 300 entries (the largest entry is 1002043). The @programmingpraxis will keep running longer without a memory error. The extra entries with your method are the squares of the primes processed so far. The @programmingpraxis version is avoiding all these.
[…] Generators are a useful programming tool; we provide an implementation in the Standard Prelude. Here is a Python prime-number generator that we discussed in a previous exercise: […]
[…] decided to write a prime number generator in Awk, based on this previous exercise. Never mind that Awk doesn’t provide iterators, I ought to be able to figure this out. And I […]