## Pritchard’s Wheel Sieve

### January 6, 2012

We begin at the end with the main function that drives the others; setup is done in the `let*`, then the loop processing updates the set S until k is greater than m:

```(define (pritchard n)   (let* ((ps (eratosthenes (isqrt n)))          (m (length ps))          (k (compute-k n ps)))     (let loop ((k (+ k 1)) (s (initial-s n k ps)))       (if (< m k) (append ps (cdr s))         (loop (+ k 1) (next n s))))))```

The `next` function derives a new version of S from its predecessor:

```(define (next n s)   (let* ((p (cadr s)) (pg (make-pg p)))     (let loop ((ss s) (ps (list p)))       (let ((p (+ (car ps)                   (vector-ref pg (/ (- (cadr ss) (car ss) 2) 2)))))         (if (< n p) (list-minus s (reverse ps))           (loop (cdr ss) (cons p ps)))))))```

`Next` calls `make-pg` to build a lookup-table of extended gaps, converting a multiplication to a lookup, and `list-minus` subtracts one list from another:

```(define (make-pg p)   (let ((pg (make-vector (- p 1) 0)))     (do ((i 0 (+ i 1))) ((= i (- p 1)) pg)       (vector-set! pg i (* 2 p (+ i 1))))))```

```(define (list-minus xs ys)   (let loop ((xs xs) (ys ys) (zs (list)))     (cond ((null? xs) (reverse zs))           ((null? ys) (append (reverse zs) xs))           ((= (car xs) (car ys))             (loop (cdr xs) (cdr ys) zs))           (else (loop (cdr xs) ys (cons (car xs) zs))))))```

`Compute-k` and `initial-s` are used in the setup phase:

```(define (compute-k n ps)   (let loop ((k 0) (m 2) (ps (cdr ps)))     (if (< (/ n (log n)) m) k       (loop (+ k 1) (* m (car ps)) (cdr ps)))))```

```(define (initial-s n k ps)   (let ((s (make-vector (+ n 1) #t)))     (vector-set! s 0 #f)     (let loop ((k k) (ps ps))       (if (zero? k) (enlist s)         (do ((i (car ps) (+ i (car ps))))             ((< n i) (loop (- k 1) (cdr ps)))           (vector-set! s i #f))))))```

The set S is initially represented as a bit vector locally in `initial-s` then translated to a list by the `enlist` function:

```(define (enlist s)   (let loop ((i (- (vector-length s) 1)) (ss (list)))     (cond ((negative? i) ss)           ((vector-ref s i)             (loop (- i 1) (cons i ss)))           (else (loop (- i 1) ss)))))```

We also use the sieve of Eratosthenes from a previous exercise to initialize the small list of primes, and `isqrt` from the Standard Prelude. Here’s the program in action:

```> (pritchard 100) (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97) > (length (pritchard 1000)) 168```

You can run the program at http://programmingpraxis.codepad.org/22VkAlSX. You will find that Pritchard’s sieve is quite slow compared to Eratosthenes’ sieve. Though that is true in general, our implementation is also rather inefficient due to its use of the set data structure. A better implementation uses arrays, but if you decide to do that, be careful; a naive translation from sets to array indexed front-to-back will fail because some of the values of the S set are changed (stricken from the set) before the sweep through the array catches them, causing errors. There are various ways to solve that problem (Pritchard swept the array back-to-front, other programmers came up with other methods), but all add to the problem that Pritchard’s sieve is slower in practice than Eratosthenes’, even with a better asymptotic complexity. Other versions of Pritchard’s sieve improve on this version, but are still slower in practice than Eratosthenes’. In fact, all of the advanced sieves that we have studied are slower than the original algorithm from the ancient Greek mathematician.

Pages: 1 2

### 5 Responses to “Pritchard’s Wheel Sieve”

1. Giorgio Valoti said

If I understand correctly, the wheel for 2, 3, 5, 7 is [10 2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8 6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2]. However, in this paper http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf, the wheel starts from 2: [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8 6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10].

Why the difference?

2. Sanjay Giri said

There is an error here The second time through the loop, p = 7, the gaps are 7−1=6, 11−7=4, 13−11=6, 17−13=4, and 19−17=2, and the numbers that are stricken are 7, 7+6×7=49, 49+4×7=77, and 77+2×7=91, leaving S4 = {1 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97}.

i.e 13-11 is 2

3. nondescript said

Thanks for the algorithm!

4. Willy Good said

@Giorgio Valoti, the difference is that one of the wheel sets has been rolled one position from the other as usually the wheels are used starting from their next wheel position, as is W0 starts at 2, W1 starts at 3, W2 starts at 5, etc.- the wheel excluding 7 (W4 by my definition) starts at 11.

The algorithm is interesting, but really it is just a wheel factorized Sieve of Eratosthenes with less wheels used for smaller sieve ranges and larger ones used for bigger sieve ranges in order to keep the O(n) performance: one may as well use a constant maximum practical sieve size such as 2/3/5/7/11/13/17/19 (W8 by my definition) and get even better performance for small ranges, the same performance when this algorithm would call for this size of wheel anyway (at about 200 million to quite a few billion) and accept that performance above that point will be a little slower given that one doesn’t really want to deal with a huge gap table.

As to “worse performance than the Sieve of Eratosthenes”, that isn’t necessarily true, or at least not by much: I suspect that your implementation is slow because you persist in using lists to store the gaps/wheel primes and the set of candidates to be culled of composites: lists are very handy but the bane of performance (and consume memory since as well as the actual data per element, they also have to pointer link to the next element of 4 bytes/8 bytes for 32/64 bit systems, respectively). Of the three operations you mention, 1) you don’t have to calculate gaps per culling operations if you store the wheel as gaps (preferably in an array for faster access rather than a list), 2) integer multiplies aren’t expensive for a modern CPU and can be streamed to not take much more time than the other operations performed in parallel, and 3) integer additions are done in any case and take almost zero average time for a modern CPU.

The most practical sieves such a (C) primesieve fill the array with a precull using a pattern of the large wheel as above and then do further culling using only the W4 wheel of 48 “hit” positions out of the wheel range of 210; this seems to maximize efficiency and trade-offs between use of memory including optimizing the use of CPU caches vs. execution speed. In order to keep the inner loop simple and avoid having to jump by gaps, the ‘hit” modulos are often separated out so that the innermost loops can do simple addition offsets for their culling (jumping by even numbers of wheels), with separate innermost loops running for each modulo for each base prime. This can be done in any language, including Scheme.

for instance, see Python 2, in this case using only a 2/3/5 wheel and no preculling, but that would be fairly easy to add as would increasing the size of the wheel, here the wheel is given as constant literals but computing the wheel isn’t hard in any language:

```def primes235(limit):
yield 2; yield 3; yield 5
if limit < 7: return
modPrms = [7,11,13,17,19,23,29,31]
gaps = [4,2,4,2,4,6,2,6,4,2,4,2,4,6,2,6] # 2 loops for overflow
ndxs = [0,0,0,0,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5,5,5,6,6,7,7,7,7,7,7]
lmtbf = (limit + 23) // 30 * 8 - 1 # integral number of wheels rounded up
lmtsqrt = (int(limit ** 0.5) - 7)
lmtsqrt = lmtsqrt // 30 * 8 + ndxs[lmtsqrt % 30] # round down on the wheel
buf = [True] * (lmtbf + 1)
for i in xrange(lmtsqrt + 1):
if buf[i]:
ci = i & 7; p = 30 * (i >> 3) + modPrms[ci]
s = p * p - 7; p8 = p << 3
for j in range(8):
c = s // 30 * 8 + ndxs[s % 30]
buf[c::p8] = [False] * ((lmtbf - c) // p8 + 1)
s += p * gaps[ci]; ci += 1
for i in xrange(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])
```
5. Larry Cornell said

The whole process runs better if you use the sequence of products of the first n-primes as your bases. There really is no point in venturing outside this since the totient is minimal at those values and all primes less than any of them will lie in the reduced residue system which can be constructed easily by enumerating over the r.r.s. of its predecessor from zero to p-1.