## Sieve Of Atkin

### February 12, 2010

Our solution follows the one in *Wikipedia*:

```scheme
(define (atkin limit)
  (let ((v (make-vector (+ limit 1) #f)))
    (define (flip k)
      (vector-set! v k (not (vector-ref v k))))
    ; toggle each candidate that has a solution to one of the three
    ; quadratics, filtered by the residue of n modulo 12
    (do ((x 1 (+ x 1))) ((> (* x x) limit))
      (do ((y 1 (+ y 1))) ((> (* y y) limit))
        (let* ((n (+ (* 4 x x) (* y y))) (nmod12 (modulo n 12)))
          (if (and (<= n limit) (member nmod12 '(1 5))) (flip n)))
        (let* ((n (+ (* 3 x x) (* y y))) (nmod12 (modulo n 12)))
          (if (and (<= n limit) (= nmod12 7)) (flip n)))
        (let* ((n (- (* 3 x x) (* y y))) (nmod12 (modulo n 12)))
          (if (and (> x y) (<= n limit) (= nmod12 11)) (flip n)))))
    ; eliminate multiples of squares of primes
    (do ((n 5 (+ n 1))) ((> (* n n) limit))
      (if (vector-ref v n)
          (do ((k (* n n) (+ k (* n n)))) ((> k limit))
            (vector-set! v k #f))))
    ; collect the surviving primes, seeding the list with 2 and 3
    (let ((ps '(3 2)))
      (do ((n 5 (+ n 1))) ((> n limit) (reverse ps))
        (if (vector-ref v n) (set! ps (cons n ps)))))))
```

This implementation isn’t particularly fast. Dan Bernstein, co-creator of the sieve, has a screamingly fast implementation, much faster than any Sieve of Eratosthenes, at http://cr.yp.to/primegen.html. You can run our implementation at http://programmingpraxis.codepad.org/WAgoUQGS.
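For readers who don't follow Scheme, here is a line-for-line Python transcription of the function above (a sketch for illustration; like the original, it assumes `limit` is at least 3):

```python
def atkin(limit):
    """Sieve of Atkin, transcribed from the Scheme version above."""
    v = [False] * (limit + 1)
    x = 1
    while x * x <= limit:
        y = 1
        while y * y <= limit:
            # toggle candidates of each quadratic form, by residue mod 12
            n = 4 * x * x + y * y
            if n <= limit and n % 12 in (1, 5):
                v[n] = not v[n]
            n = 3 * x * x + y * y
            if n <= limit and n % 12 == 7:
                v[n] = not v[n]
            n = 3 * x * x - y * y
            if x > y and n <= limit and n % 12 == 11:
                v[n] = not v[n]
            y += 1
        x += 1
    # eliminate multiples of squares of primes
    n = 5
    while n * n <= limit:
        if v[n]:
            for k in range(n * n, limit + 1, n * n):
                v[k] = False
        n += 1
    # collect the primes, seeding the list with 2 and 3
    return [2, 3] + [n for n in range(5, limit + 1) if v[n]]
```

For example, `atkin(50)` produces the fifteen primes not exceeding 50.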

The equations written like x² + y² = n should be x² + y² = k.

Thanks. Fixed.

According to the Wikipedia page, the equation x² + y² = k should read 4x² + y² = k. Your code also uses the factor 4.

Fixed again. Thanks.

Tightened up the loop limits a little bit. It’s about 50% slower than a plain sieve on getting the primes less than a million (0.96 sec vs. 0.63 sec).
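The tightened limits aren't shown in the comment. One plausible form of the tightening (a sketch, not necessarily the author's exact change) is to end each inner loop as soon as its quadratic exceeds the limit, rather than running y all the way to √limit and testing n ≤ limit on every iteration. For the 4x² + y² pass:

```python
def flips_tight(limit):
    """The 4x^2 + y^2 pass with tightened bounds: x stops once even
    the smallest n = 4x^2 + 1 is out of range, and y stops as soon
    as n itself exceeds the limit."""
    v = [False] * (limit + 1)
    x = 1
    while 4 * x * x + 1 <= limit:
        y = 1
        while (n := 4 * x * x + y * y) <= limit:
            if n % 12 in (1, 5):
                v[n] = not v[n]
            y += 1
        x += 1
    return v
```

The loose loops visit every (x, y) pair with x, y ≤ √limit; the tight loops visit only pairs whose quadratic is in range, yet produce exactly the same flips.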

[Translated from Spanish:] I hear wonders about the sieve of Atkin, but I still haven't seen code that works for me. I wrote a different kind of sieve that works another way and I want to compare: I generate primes up to 600 million in 14 seconds in Visual C, but I don't know whether this sieve is better or worse. I work on a Lenovo N200.

This is a naive implementation of the old pseudocode from the Wikipedia article (since replaced with pseudocode much truer to the Atkin algorithm). Using the simplification of modulo 12 rather than modulo 60 adds a constant factor of about 23.5% to the work. Worse, because the x and y loops run all the way to the square root of the range, the performance isn't even O(n): as the range grows, it spends an increasing share of its time computing quadratics that are then rejected by the various conditions, including the expensive modulo operations. This old pseudocode will never beat even an odds-only Sieve of Eratosthenes, given this huge waste of computing power, let alone a maximally wheel-factorized true SoE. Even then, neither will be all that fast: with a single huge sieve buffer, memory access time becomes a bottleneck once the buffer exceeds the CPU cache sizes (though one won't notice this in interpreted languages such as Python, whose loop overhead dwarfs the memory access times).
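The "odds only Sieve of Eratosthenes" used here as the baseline can be sketched in Python (an illustration, not the commenter's code; index i in the array stands for the odd number 2i + 3, so even numbers cost neither space nor time):

```python
def eratosthenes_odds(limit):
    """Odds-only Sieve of Eratosthenes; sieve[i] represents 2*i + 3."""
    if limit < 2:
        return []
    sieve = [True] * ((limit - 1) // 2)   # odd numbers 3, 5, ..., <= limit
    i = 0
    while (3 + 2 * i) ** 2 <= limit:
        if sieve[i]:
            p = 3 + 2 * i
            # consecutive odd multiples of p differ by 2p, i.e. p index steps
            for j in range((p * p - 3) // 2, len(sieve), p):
                sieve[j] = False
        i += 1
    return [2] + [3 + 2 * i for i, s in enumerate(sieve) if s]
```

Crossing off starts at p², and only odd multiples are ever touched, which is what makes this simple sieve a surprisingly strong baseline.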

Even the new pseudocode at Wikipedia still wastes almost half of its operations on redundant calculations that are rejected by the modulo conditions, but at least that waste is a constant factor, so O(n) performance is restored. With the large-array algorithm it is quite easy to correct this by adding an intermediate loop that skips almost all of the computations the modulo tests would reject. When that is done, the maximally factorized SoE is still slightly faster than the SoA for sieving ranges up to just over a billion; beyond that the SoA gradually pulls ahead, by up to 20% at about 60 billion, at which point most operating systems and languages refuse to give us more memory (the 2-gigabyte limit, even on 64-bit systems).
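The kind of redundant, modulo-rejected computation described here can be illustrated on the simplest case (a Python sketch, not the commenter's code): for n = 4x² + y², an even y always gives an even n, which can never be ≡ 1 or 5 (mod 12), so stepping y over odd values only eliminates half of the modulo tests outright without changing a single flip:

```python
def pass_4x2_y2(limit, odd_y_only=False):
    """The 4x^2 + y^2 pass. With odd_y_only=True, even y is skipped,
    since 4x^2 + y^2 is then even and can never be 1 or 5 mod 12."""
    v = [False] * (limit + 1)
    tests = 0
    step = 2 if odd_y_only else 1
    x = 1
    while 4 * x * x + 1 <= limit:
        y = 1
        while (n := 4 * x * x + y * y) <= limit:
            tests += 1                  # count modulo tests performed
            if n % 12 in (1, 5):
                v[n] = not v[n]
            y += step
        x += 1
    return v, tests
```

Both variants toggle exactly the same entries; the second simply performs far fewer modulo tests. The fuller versions of this trick step x and y so that no rejected residue class is ever visited for any of the three quadratics.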

I’ve written versions of the large-memory-array (bit-packed) SoA and the maximally wheel-factorized (also bit-packed) SoE, and found that both run in about the same time at a range of one billion, just a few seconds each (about 30 CPU clock cycles per operation). A page-segmented version of the SoE runs in not much more than half a second (less than 5 CPU clock cycles per operation), whereas Bernstein’s page-segmented, highly tuned C SoA runs in about 0.8 seconds on the same machine (the SoA averages more CPU cycles per operation because a page-segmented SoA is harder to implement, and that difficulty grows faster with range than it does for the SoE). So no, the Sieve of Atkin isn’t “screamingly fast,” and in fact it gets rather slow at the large sieve ranges where it is supposed to excel.

In fact, Bernstein “held back” the reference Sieve of Eratosthenes implementation against which he compared: there are signs in his code that he at least considered pre-culling the page-segment buffers with a small-primes wheel pattern, which would have made the SoE faster than the SoA at all the ranges compared. He limited the SoE to the same wheel factorization as the SoA, but the SoE’s factorization can be raised well above that, whereas the SoA’s is fixed by the algorithm.

I won’t post the code here as there is a more up to date exercise at: https://programmingpraxis.com/2010/02/19/sieve-of-atkin-improved/.