## Pritchard’s Wheel Sieve

### January 6, 2012

We have seen several different sieves that enumerate the prime numbers not greater than *n* due to Eratosthenes, Atkin, Euler and Sundaram. In the 1980s, Paul Pritchard, an Australian mathematician, developed a family of sieve algorithms based on wheels, eventually finding an algorithm with *O*(*n* / log log *n*) time complexity and *O*(√*n*) space complexity. We examine a simple version of Pritchard’s wheel sieve in today’s exercise.

We begin with some definitions. *M _{k}* is the product of the first

*k*primes; for instance,

*M*

_{7}= 2×3×5×7×11×13×17=510510. The totatives of

*M*are those numbers from 1 to

_{k}*M*that are coprime to

_{k}*M*(that is, they have no factors in common). It is easy to determine the totatives of

_{k}*M*by sieving: make a list of the integers from 1 to

_{k}*M*, then for each of the primes that form the product

_{k}*M*, strike out from the list the prime and all of its multiples; for instance, with

_{k}*M*

_{3}=2×3×5=30, sieving with 2 strikes out 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 and 30, sieving with 3 strikes out 3, 6, 9, 12, 15, 18, 21, 24, 27, and 30, and sieving with 5 strikes out 5, 10, 15, 20, 25 and 30, leaving the totatives 1, 7, 11, 13, 17, 19, 23 and 29. A factoring wheel

*W*contains the gaps between successive totatives, wrapping around at the end; for instance,

_{k}*W*

_{3}consists of the gaps 6, 4, 2, 4, 2, 4, 6 and 2, corresponding to the gaps 7−1, 11−7, and so on, ending with 29−23 and 31−29 when the wheel wraps around at the end.

Pritchard’s wheel sieve uses wheels repeatedly to strike out composite numbers from the sieve. It operates in two phases: a setup phase and a processing loop.

The setup phase first computes a parameter *k* such that *M _{k}* < n / log

_{e}n <

*M*

_{k+1}, then computes the wheel

*W*and forms the set

_{k}*S*from 1 to

*n*by rolling the

*W*wheel. The setup phase also computes the list of

_{k}*m*primes not greater than the square root of

*n*.

We compute the primes not greater than 100 as an example. We compute k=2, since 100/log(100) is 21.714724 which is between *M*_{2}=6 and *M*_{3}=30. The *W*_{2} wheel is {4 2} and the set *S* is {1 5 7 11 13 17 19 23 25 29 31 35 37 41 43 47 49 53 55 59 61 65 67 71 73 77 79 83 85 89 91 95 97}; although there is only one set *S*, we will refer to this set as *S*_{2}, since it is the result of rolling the *W*_{2} wheel. Finally, the square root of 100 is 10, and the m=4 primes less than 10 are {2 3 5 7}.

The processing loop iterates from *k*+1 to *m*. At each loop we will strike some of the elements of *S*, reducing *S* from *S*_{k} to *S*_{k+1}. Each time through the loop we first identify *p*, the smallest member of *S* that is greater than 1, and strike it from the set *S*. We also strike from *S* the successive multiples *p*(*s*'−*s*) less than *n*, where *s*'−*s* are the successive gaps in *S*. Finally, we increment *k* by 1 and repeat the loop as long as *k*≤*m*.

In our example computing the primes not greater than 100, *i* will take the values 3 and 4. The first time through the loop, *p* = 5, the gaps are 5−1=4, 7−5=2, 11−7=4, 13−11=2, 17−13=4 and 19−17=2, and the numbers that are stricken are 5, 5+4×5=25, 25+2×5=35, 35+4×5=55, 55+2×5=65, 65+4×5=85, and 85+2×5=95, leaving *S*_{3} = {1 7 11 13 17 19 23 29 31 37 41 43 47 49 53 59 61 67 71 73 77 79 83 89 91 97}. The second time through the loop, *p* = 7, the gaps are 7−1=6, 11−7=4, 13−11=6, 17−13=4, and 19−17=2, and the numbers that are stricken are 7, 7+6×7=49, 49+4×7=77, and 77+2×7=91, leaving *S*_{4} = {1 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97}.

Once *k*>*m* and the final *S* has been computed, the list of primes is returned, consisting of the primes less than the square root of *n* followed by the elements of *S* excluding 1. Thus the primes not greater than 100 are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 and 97.

This version of Pritchard’s sieve has time complexity and space complexity both equal to *O*(*n*), where the standard sieve of Eratosthenes has time complexity *O*(*n* log log *n*). The improvement comes from the fact that Pritchard’s sieve strikes each composite only once, whereas Eratosthenes’ sieve strikes each composite once for each of its distinct prime factors; for instance, Eratosthenes’ sieve strike 15 twice, one for the factor of 3 and once for the factor of 5. But despite the improvement in the asymptotic complexity, Eratosthenes’ sieve is fast because its inner loop consists only of addition, while Pritchard’s sieve is slower because its inner loop consists of a subtraction to compute the gap in the wheel, a multiplication to extend that gap by the current sieving prime, and an addition to add the gap to the previously-stricken element. Thus, in practice, Eratosthenes’ sieve is faster than Pritchard’s.

Your task is to write a program to compute the primes not greater than *n* using Pritchard’s wheel sieve. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

If I understand correctly, the wheel for 2, 3, 5, 7 is [10 2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8 6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2]. However, in this paper http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf, the wheel starts from 2: [2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 8 6 4 6 2 4 6 2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10].

Why the difference?

There is an error here The second time through the loop, p = 7, the gaps are 7−1=6, 11−7=4, 13−11=6, 17−13=4, and 19−17=2, and the numbers that are stricken are 7, 7+6×7=49, 49+4×7=77, and 77+2×7=91, leaving S4 = {1 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97}.

i.e 13-11 is 2

Thanks for the algorithm!

@Giorgio Valoti, the difference is that one of the wheel sets has been rolled one position from the other as usually the wheels are used starting from their next wheel position, as is W0 starts at 2, W1 starts at 3, W2 starts at 5, etc.- the wheel excluding 7 (W4 by my definition) starts at 11.

The algorithm is interesting, but really it is just a wheel factorized Sieve of Eratosthenes with less wheels used for smaller sieve ranges and larger ones used for bigger sieve ranges in order to keep the O(n) performance: one may as well use a constant maximum practical sieve size such as 2/3/5/7/11/13/17/19 (W8 by my definition) and get even better performance for small ranges, the same performance when this algorithm would call for this size of wheel anyway (at about 200 million to quite a few billion) and accept that performance above that point will be a little slower given that one doesn’t really want to deal with a huge gap table.

As to “worse performance than the Sieve of Eratosthenes”, that isn’t necessarily true, or at least not by much: I suspect that your implementation is slow because you persist in using lists to store the gaps/wheel primes and the set of candidates to be culled of composites: lists are very handy but the bane of performance (and consume memory since as well as the actual data per element, they also have to pointer link to the next element of 4 bytes/8 bytes for 32/64 bit systems, respectively). Of the three operations you mention, 1) you don’t have to calculate gaps per culling operations if you store the wheel as gaps (preferably in an array for faster access rather than a list), 2) integer multiplies aren’t expensive for a modern CPU and can be streamed to not take much more time than the other operations performed in parallel, and 3) integer additions are done in any case and take almost zero average time for a modern CPU.

The most practical sieves such a (C) primesieve fill the array with a precull using a pattern of the large wheel as above and then do further culling using only the W4 wheel of 48 “hit” positions out of the wheel range of 210; this seems to maximize efficiency and trade-offs between use of memory including optimizing the use of CPU caches vs. execution speed. In order to keep the inner loop simple and avoid having to jump by gaps, the ‘hit” modulos are often separated out so that the innermost loops can do simple addition offsets for their culling (jumping by even numbers of wheels), with separate innermost loops running for each modulo for each base prime. This can be done in any language, including Scheme.

for instance, see Python 2, in this case using only a 2/3/5 wheel and no preculling, but that would be fairly easy to add as would increasing the size of the wheel, here the wheel is given as constant literals but computing the wheel isn’t hard in any language:

The whole process runs better if you use the sequence of products of the first n-primes as your bases. There really is no point in venturing outside this since the totient is minimal at those values and all primes less than any of them will lie in the reduced residue system which can be constructed easily by enumerating over the r.r.s. of its predecessor from zero to p-1.

[…] Pritchard’s Wheel Sieve […]

I have taken your PY code and have tried to make it work for values other the 8 primes and Wheel Size 30, by changing the value 8 to NP and the value 30 to NX. Also some additional changes such as changing the I <>3 shifts to division by 8 or NP in t=my modified code) . The I & 7, I change to I % NP. I aslo made a few simple changes to print some intermediate information. I have included my PY code in this email. It works OK when I set the values of NP = 8 and NX = 30. However, when I try to extend the code to consider additional primes beyond 31, to 37 and 41 – changing NP = 10 and NX= 30 and also changing “gaps” and “ndxs” to account for the additional primes, it fails miserably—see the code below. I get many additional values that are not prime and miss a few that are. I believe the issue is in the code fragment:

buf[c::p8] = [0] * ((lmtbf – c) // p8 + 1)

In the case where NP = 10 the p8 value would be p*NP. I can’t figure it out. If you find a bug or two I wouldn’t be surprised. You seem to be an expert on the sieving processes, maybe you can help?

My PY code—

#!/usr/bin/python

def primes235(limit) :

print “\nlimit %20d\n” % limit

#yield 2; yield 3; yield 5

if limit limit+1 : break

#for i in xrange(lmtbf – 6 + (ndxs[(limit – 7) % 30])): # adjust for extras

# if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])

#print “buf[i] %20d\n” % buf[i]

return [2,3,5] + [(NX * (i//NP) + modPrms[i % NP]) | 1 for i in xrange(lmtbf – 6 + (ndxs[(limit – 7) % NX])) if buf[i]]

“””***************************************************************************************

A test driver.

***************************************************************************************”””

if __name__ == ‘__main__’ :

#for p in primes235(1000000000) :

# if p > 999999900 and p < 1000000000: print(p)

p = primes235(2310)

print "\n"

print p

print "\n\n Prime count: %d\n" % len(p)

k = 1;

for i in range (len(p)) :

print k, p[i]

k = k + 1

@Peter: You don’t seem to understand how a prime wheel works. A 2,3,5-wheel uses the totatives of 2*3*5=30 to determine the stops of the wheel. The next larger wheel is not 37 or 41, it’s a 2,3,5,7-wheel with a circumference of 2*3*5*7=210. To learn more about prime wheels, look at my Wheel Factorization exercise.

@Peter, as programmingpraxis says, you don’t seem to understand factorization wheels. For the 2,3,5,7 wheel, “modPrms” will be the totients from 11 up to 220 excluding those numbers evenly divisible by 2,3,5, or 7 for a total of 48 entries, “gaps” will be two times 48 elements long, and “ndxs” will be 210 elements long; these can be auto generated by little programs. The resulting program needs adjustment for almost every line as to the factors of 48 and 210 rather than 8 and 30, and the resulting program won’t gain as much performance as expected because there will need to be divisions by 48 rather than just right shifts by 3 to effect the division by 8. For large ranges, this wouldn’t be the way to write this anyway, as using the “one large buffer array” won’t be very efficient and a page segmented approach would be much better, also would not require a fixed upper limit bound, although the resulting program would be a little more complex.

For my own interest I generalized Willy’s code to take the number of initial primes as an argument:

[python]

def primesk(k, limit):

”’Takes the number of inital primes k, and generates primes up to limit”’

inital = [2, 3, 5, 7, 11, 13]

if k <= 0 or k >= len(inital):

print(f”k must be in range 0 to {len(inital)}”)

sys.exit(1)

# return the initial primes

s = 0

prms = []

while s < k:

p = inital[s]

prms.append(p)

if limit < p:

return

yield p

s = s+1

# setup

m = reduce(lambda x, y: x

y, prms) # the product of the initial primesgaps[i] for i in range(lenModPrms)]p = inital[s]

modPrms = [x for x in range(p, p+m) if all([x % i != 0 for i in prms])]

m0 = modPrms[0]

lenModPrms = len(modPrms)

gaps = [modPrms[i+1]-modPrms[i] for i in range(lenModPrms-1)]

gaps.append(m-modPrms[-1]+modPrms[0])

gaps = gaps * 2 # 2 loops for overflow

ndxs = [item for sublist in [[i]

for item in sublist]

if debug:

print(f”prms={prms}, m={m}, modPrms={modPrms}, gaps={gaps}, ndxs={ndxs}”)

# integral number of wheels rounded up

lmtbf = (limit + m – m0) // m * lenModPrms – 1

lmtsqrt = (int(limit ** 0.5) – m0)

lmtsqrt = lmtsqrt // m * lenModPrms + \

ndxs[lmtsqrt % m] # round down on the wheel

if debug:

print(f”lmtbf={lmtbf}, lmtsqrt={lmtsqrt}”)

buf = [True] * (lmtbf + 1)

for i in range(lmtsqrt + 1):

if buf[i]:

ci = i % lenModPrms

p = m * (i // lenModPrms) + modPrms[ci]

s = p * p – m0

p2 = p * lenModPrms

if debug:

print(f”ci={ci}, p={p}, p2={p2}”)

for j in range(lenModPrms):

c = s // m * lenModPrms + ndxs[s % m]

buf[c::p2] = [False] * ((lmtbf – c) // p2 + 1)

if debug:

print(f”s={s}, c={c}, buf[{c}::{p2}]={buf[c::p2]}”)

s += p * gaps[ci]

ci += 1

# adjust for extras

end = lmtbf – (lenModPrms – 1) + (ndxs[(limit – m0) % m])

if debug:

print(f”end={end}”)

for i in range(end+1):

if buf[i]:

p = m * (i // lenModPrms) + modPrms[i % lenModPrms]

if debug:

print(f”i={i}, p={p}”)

yield p

[/python]

Reposting with (hopefully) proper formatting:

[…] I have been looking for fast and scalable implementations. The fastest Python version I found is here in a comment by Willy […]

I owe you my gratitude for covering my algorithm. Thanks! This was the only significant mention of it I could find on the web. However, the version you have presented doesn’t show it in its best light. I’d refer those interested to the Wikipedia page https://en.wikipedia.org/wiki/Sieve_of_Pritchard. The standard implementation outlined there was designed to show the sublinear running time as clearly as possible; a detailed code animation for N=150 is given at https://www.youtube.com/watch?v=GxgGMwLfTjE. You give the strong impression that the classic version of the sieve of Eratosthenes (so without wheels or segmentation) is necessarily faster than the wheel sieve. This is not true. A more than competitive fast implementation of the wheel sieve in C++ is given at https://github.com/paulpritchard/dynamic_wheel_sieve. Finally, some minor corrections. Although I confess to being Australian, I was a computer scientist rather than a mathematician, never worked for IBM, and discovered the wheel sieve in 1979.

[…] 我讨论了一个简单版本的Pritchard的筛子在 我的博客. […]

[…] following strip was modified to support python 3 and is taken from a comment left by Willy good on the forum of programming praxis. The general conception behind this code is the wheel sieve of […]

[…] following is an excerpt taken from Willy Good in a comment left by him regarding culling operations and wheel sieve implementations. As to […]

[…] found a pure Python 2 prime generator here , in a comment by Willy Good, that is faster than […]