Sieve of Eratosthenes
February 19, 2009
Over two millenia ago, Eratosthenes, who calculated the circumference of the earth, the distance to the Sun and the tilt of the Earth’s axis, developed a system of latitude and longitude, and invented the leap day, created a systematic method to enumerate the prime numbers that is still in use today. Eratosthenes was born in Cyrene (presentday Libya), lived from 276 B.C. to 194 B.C., and spent most of his life in Alexandria, Egypt, where he was the second Chief Librarian of the Great Library, succeeding Apollonius of Rhodes; he was a good friend of Archimedes.
The Sieve of Eratosthenes starts by making a list of all the numbers up to a desired maximum; we’ll illustrate the method by calculating the prime numbers through thirty:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Now take the first number on the list, 2, and cross off every second number:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
(Although it may not be obvious, the number 4 is crossed off the list; in some fonts, the crossbar of the 4 coincides with the strikethrough bar.) Next, take the next number on the list that isn’t crossed off, 3, and cross off every third number; some of them have already been crossed off:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Repeat that last step for the next uncrossed number on the list, 5:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
And so on, each time crossing off all multiples of the next uncrossed number on the list. The list of prime numbers are all those that haven’t been crossed off:
2 3 5 7 11 13 17 19 23 29
This method is called a sieve because it sweeps through a range of numbers, with each prime number, as it is discovered, blocking all its multiples from falling through as prime numbers. The sieve admits several optimizations. First, only odd numbers are considered, since the initial sifting crosses off all the even numbers except 2, which is handled separately. Second, crossing off starts at the square of the number being sifted, since all smaller primes have already been crossed off by previous steps of the sieve; for instance, sifting by 3 starts at 9, since 6 was already crossed off when sifting by 2. Third, sifting stops at the square root of the maximum number in the sieve, since any nonprimes larger than the square root must have already been crossed off at previous levels of the sieve; thus, in the above example there is no need to sieve on the prime number 7, or any larger prime number, since the square of 7 is greater than 30, which is the largest number in the list.
Write a function that takes a single argument n and returns a list of prime numbers less than or equal to n using the optimized sieving algorithm described above. Apply the function to the argument 15485863 and count the number of primes returned.
My haskell solution : http://codepad.org/OMywU3lJ
In C. http://programmingpraxis.pastebin.com/f6f889fa4
[…] this is the second Programming Praxis exercise I’ve done, and this one was slightly more difficult to get right, or at least test. […]
I’m confused by one aspect of the above solution. I don’t understand why startj is defined as (+ (* 2 i i) (* 6 i) 3).
I understand why p is (+ i i 3) – that’s the sequence of odd numbers, starting at 3. But what relationship does that have with startj?
Can anyone enlighten me?
That’s the implementation of the second optimization.
Suppose you have already sieved 2, 3, and 5 and are now beginning to sieve 7. Start by adding 7 to the list of primes. Then 14, but that has already been sifted out by 2; likewise, 21 has been sifted out by 3, 28 has been sifted out by 2, 35 has been sifted out by 5, and 42 has been sifted out by 2 (and also 3, but that doesn’t matter). The first multiple of 7 that gets sifted out is 49, which is 7 times 7. In general, when sifting by n, sifting starts at nsquared because all the previous multiples of n have already been sieved.
The rest of the expression has to do with the crossreference between numbers and sieve indexes. There’s a 2 in the expression because we eliminated all the even numbers before we ever started. There’s a 3 in the expression because Scheme vectors are zerobased, and the numbers 0, 1 and 2 aren’t part of the sieve. I think the 6 is actually a combination of the 2 and the 3, but it’s been a while since I looked at the code, so I’ll leave it to you to figure out.
Good question!
The number 2 is handled specially, so the vector v looks this:
index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
number 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
When you want to begin sieving 7, the first number you sift out is 7 × 7 = 49. The number 7 is at index i = 2 in the vector, and the number 49 is at index startj = (+ (* 2 i i) (* 6 i) 3) = 23 in the vector.
So, it seems we can alternatively define startj as:
(quotient ( (expt p 2) 3) 2)
That is,
– psquared, to give us the number to start sifting at
– shifted by 3 because we ignore the first 3 numbers (0, 1 and 2)
– divided by 2 to get the index, because we ignore every second number.
Is that correct?
The straightforward implementation in C, storing all the odd numbers in an array and then zeroing all nonprimes. Not being much of a programmer, I’ll admit I’m amazed my laptop can compute 1,000,000 primes in under half a second… And I suppose the code could even be made to run twice as fast on my c2duo by having two threads sifting through the array.
…turns out the multithreaded version does not work, at least the way I’ve implemented it. I’ve effectively split the seek_primes in two threads, one sifting for (i = 0; i < n; i+=2), and the other one for (i = 1; i < n; i+=2).
It appears my first thread finds nonzero elements in the numbers[] array in seek_primes, but there is no guarantee these same elements are not going to be zeroed by the second thread later.
In fact up to 100 I find 3 primes too many, and up to 1000 about 38 (and of course the actual number found might even be different from one run to the next).
My python solution is at http://pastebin.com/Qup0TuFu.
I generated all primes from the integer list of 0100000. Since the last prime in that list is 99991, using the square of the primes method, it is relatively easy and fast deriving the prime sequence of most numbers. I think I began having problems around the number of primes for the integer 9990000000.
Here’s my Clojure version:
(require ‘[clojure.contrib.math :as math])
;; faster for dropping a few leading nils than (into [] (dropwhile nil? sieve))
(defn dropwhilenil? [sieve] (if (first sieve) sieve (recur (subvec sieve 1))))
;; returns sieve with all multiples of (first sieve) set to nil,
;; leading nils removed
(defn removeeverynth [sieve]
(let [countsieve (count sieve), n (first sieve)]
(loop [s sieve, i 1, ni n]
(if (> ni countsieve)
(dropwhilenil? (subvec s 1))
(recur (assoc s ni nil), (inc i), (* n (inc i)))))))
(defn eratosthenes [r]
(loop [s (into [] (range 2 r)), primes [1] ]
(if (> (peek primes) (math/sqrt r))
(concat primes (remove nil? s))
(recur (removeeverynth s), (conj primes (first s))))))
(map #(count (time (eratosthenes %))) [10 100 1000 10000 100000 ])
Simple and simple tailrecursive solutions, but missing the “start eliminating at (* p p)” optimization:
Did you calculate the number of primes less than 15485863? I think it will take a while.
Yes, it takes about fifty times longer than the vectorbased solution above, mostly in GC.
Using Petite Chez Scheme for profiling:
(time (display (length (…))))
6 collections
3021 ms elapsed cpu time, including 248 ms collecting
3022 ms elapsed real time, including 247 ms collecting
110325408 bytes allocated, including 40188320 bytes reclaimed
(time (length (tailerat (…) …)))
2767 collections
167705 ms elapsed cpu time, including 103201 ms collecting
167817 ms elapsed real time, including 103268 ms collecting
23602922896 bytes allocated, including 24373515936 bytes reclaimed
Moving to an inplace filtering of the list (see below) brought this time down by a factor of ten, though it’s still slower than the vector version:
(time (length (tailerat (…) …)))
105 collections
11498 ms elapsed cpu time, including 5577 ms collecting
11503 ms elapsed real time, including 5590 ms collecting
1135547536 bytes allocated, including 3928896 bytes reclaimed
This was achieved by replacing filter with the following inplace filter! routine (and staying in Chez Scheme, where setcdr! is available).
A more general filter!, implemented as syntax! so it could consistently alter the list in place but not depend on returning a list would be interesting, but isn’t needed for this — this implementation returns the result of modifying the list in place, but the original value passed in is only modified from the first value not matching pred? on…
You’re missing the point. The Sieve is fast because it is based on addition. You are using division (the modulo function). Your algorithm is not the Sieve of Eratosthenes.
Oh, I see… let me revisit…
Naive attempt at moving to vector and using addition over vector indices gives a slight speedup, looking further:
(time (length (primes 15485863)))
4 collections
10296 ms elapsed cpu time, including 366 ms collecting
10300 ms elapsed real time, including 366 ms collecting
155888768 bytes allocated, including 124098560 bytes reclaimed
I got curious as to how big a difference the additonvsmodule distinction is on modern hardware (historically, it has been very large, as you point out).
I used the following code to run 10^8 integer additions, modulo operations, and modulo operations with a poweroftwo modulus (another traditional distinction in performance).
I may well be missing something big, but the output shows about a 25% difference in speed between additions and modulo operations. This obviously adds up, but looking at the above code, it seems to be small compared to other differences (particularly, the various operations which control how many of whichever operation are carried out, and which reduce the infrastructural cost of the algorithm).
(In the code below, I use three vectors in an attempt to minimize the effect of cache readahead on subsequent vectormaps during earlier testing with a smaller n.)
my OCaml version. Because 15000000 is bigger than the bound for basic array, i had to use Bigarray.
[…] ancient Sieve of Eratosthenes that computes the list of prime numbers is inefficient in the sense that some composite numbers are […]
This was my first Factor program I wrote a few months ago, to learn the language. Updated it with the optimization to start sieving at the square of the prime being sieved. (Didn’t know about that trick before :)
Session:
[…] As mentioned in my last article, I started doing some challenges from the Programming Praxis website. And here comes my PHP solution to the second challenge. […]
I’m working my way through old exercises; here’s my submission.
Python was a bit slow for me, and I’m trying to learn Common Lisp, so here’s
another version. I was only able to figure out how to get 2/3 of the optimizations
working, but it’s still absurdly fast compared to my last submission.
I just started learning ruby. Here is my first try.
def prime_numbers(n)
return [] if n < 2 # no prime numbers before 2
max = (n**0.5).to_i; # define until when we need to loop
primes = []
# optimization 1: checking only odd numbers
primes = 3.step( n, 2 ).to_a
primes.each do p
next unless p
# optimization 3: checking until the root of the maximum number
break unless p <= max
# optimization 2: checking starts at the square
((p**2)/21).step( primes.length1, p) do i
primes[i] = nil
end
end
primes.insert(0, 2).delete(nil) # add the prime number 2 and remove false numbers
primes
end
puts prime_numbers(15485863).length
[\sourcecode]
Another clojure version,
because the first one die on me with OutOfMemoryError to compute (eratosthenes 15485863)
An imperative OCaml Batteries
BitSet
based version, without multiplications, only additions, subtractions and shifts:My C++ solution where a “crossing out” means “made 0”
http://ideone.com/fdmux
Here’s a Matlab solution:
It’s a tiny bit faster than the nearlyidentical Matlab implementation, called primes.
A rather simple version of a Python solution.
def primes_sieve(limit):
limit += 1
not_prime = [False] * limit
primes = []
for i in xrange(2, limit):
if not_prime[i]:
continue
for j in xrange(i*2, limit, i):
not_prime[j] = True
primes.append(i)
return primes
if __name__ == "__main__":
print len(primes_sieve(15485863))
I tried another solution in C++ using 2 threads, one running upwards on the numbers, the other running downwards. They stop if the reach another.
The solution is correct, but the performance decreases severly.
Here is my code:
#include
#include
#include
#include
#include
using namespace std;
const unsigned long MAX_NUMBER = 15485863;
vector numbers(MAX_NUMBER+1);
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
unsigned long l1,l2;
// thread specific arguments
struct thread_data
{
unsigned int thread_id;
unsigned long n; // data to sieve
bool bw; // backward or forward search
};
struct thread_data thread_data_array[2];
void init_numbers(vector * numbers);
void *prime_sieve(void *arg);
void init_numbers(vector * numbers)
{
unsigned long i;
(*numbers)[2]=true;
for (i=3;ibw))
{
while (l1 <= l2)
{
if (numbers[l1] != false)
{
c = l1*l1;
pos_move = (l1 << 1);
while (c n)
{
pthread_mutex_lock(&mutex1);
numbers[c]=false;
pthread_mutex_unlock(&mutex1);
c += pos_move;
}
}
pthread_mutex_lock(&mutex1);
l1++;
pthread_mutex_unlock(&mutex1);
}
}
else
{
while(l2 >= l1)
{
if (numbers[l2] != false)
{
c = l2*l2;
pos_move = (l2 << 1);
while (c n)
{
pthread_mutex_lock(&mutex1);
numbers[c]=false;
pthread_mutex_unlock(&mutex1);
c += pos_move;
}
}
pthread_mutex_lock(&mutex1);
l2–;
pthread_mutex_unlock(&mutex1);
}
}
pthread_exit(NULL);
}
unsigned long get_primes(vector numbers, unsigned long n)
{
unsigned long primes=0;
for(unsigned int i=0;i<=n;i++)
{
primes+=numbers[i];
}
return primes;
}
int main(int argc, char *argv[])
{
pthread_t threads[2];
int rc;
unsigned long n = MAX_NUMBER;
init_numbers(&numbers);
l1 = 2;
l2 = (int)(sqrt(n))+1;
thread_data_array[0].thread_id = 0;
thread_data_array[0].n = MAX_NUMBER;
thread_data_array[0].bw = false;
rc = pthread_create(&threads[0], NULL, prime_sieve, (void *) &thread_data_array[0]);
thread_data_array[1].thread_id = 1;
thread_data_array[1].n = MAX_NUMBER;
thread_data_array[1].bw = true;
rc = pthread_create(&threads[1], NULL, prime_sieve, (void *) &thread_data_array[1]);
void* retval;
pthread_join(threads[0], &retval);
pthread_join(threads[1], &retval);
cout << get_primes(numbers, n) << endl;
}
Using python:
from math import *
def sieve2(n):
index = 0
numberList = range(3,n+1,2)
while numberList[index] = sqrt(n):
break
return [2] + numberList
^


Didn’t post correctly
https://github.com/gcapell/ProgrammingPraxis/blob/master/02_sieve_of_eratosthenes/sieve.go
compact ruby implementation:
A Scala solution:
Previous post did not copy properly:
#include
#include
#include
#include
#include
unsigned LIMIT 1000000000;
int main()
{
int all[LIMIT],i,j=2,k=0,lim;
float start,stop;
printf(“Enter the limit(excluding 0)”);
scanf(“%d”,&lim);
for(i=0;i<lim;i++)
{
all[i]=i;
}
all[0]=0;
all[1]=0;
stop=(unsigned)sqrt(lim);
while(j<stop)
{
start=0;
if(all[j]!=0)
{
start=all[j]*all[j];
if(all[start]%all[j]==0)
{
all[start]=0;
start++;
}
}
j++;
}
for(i=2;i<lim2;i++)
{
if(all[i]!=0)
{
printf(" %d",all[i]);
}
}
getch();
return 0;
}
Corrected solution :) very simple works like a charm
#include “stdafx.h”
#include
#include
#include
#include
#include
int main()
{
int all[100],i,j=2,k=0,lim,start;
float stop;
printf(“Enter the limit(excluding 0)”);
scanf(“%d”,&lim);
lim=lim+1;
for(i=0;i<=lim;i++)
{
all[i]=i;
}
all[0]=0;
all[1]=0;
stop=sqrt((float)lim);
while(j<stop)
{
start=0;
if(all[j]!=0)
{
start=all[j]*all[j];
while(start<=lim)
{
all[start]=0;
start=start+j;
}
}
j++;
}
for(i=2;i<=lim2;i++)
{
if(all[i]!=0)
{
printf(" %d",all[i]);
}
}
getch();
return 0;
}
now if any of you guys take a look here how can you make this work for a billionth using the constant did not give me result,are linked list only solution???
A memoryhungry Python solution: https://gist.github.com/gmcclure/5475130
C# solution:
public class Primer
{
public int[] GetPrimes(int max)
{
if (max < 0) max = max * 1;
if (max 0)
{
for (int i = start * start; i < primeIndexes.Length; i += start)
{
primeIndexes[i] = true;
}
start++;
}
List primes = new List(max / 10);
for (int i = 1; i primeIndexes.Length)
return 0;
for (int i = minIndex; i < primeIndexes.Length; i++)
{
if (primeIndexes[i] == false)
return i;
}
return 0;
}
}
Just over a second on Core2 Duo.
Hm, it didn’t format properly, here’s a gist:
Hi, just learn python and still beginner level, so I don’t know
def sieve(number):
array = [False,False] + [True] * (number – 1)
result = []
prime_found = 0
index = 2
while index <= number:
if array[index]:
prime_found = index
result.append(index)
index *= index
while index <= number:
array[index] = False
index += prime_found
index = prime_found + 1
else:
index += 1
return result
But I remember in java, an array of boolean is really cheap, only 1 bit per element. and if I switch so False is Boolean while True is not or not examined yet, this will probably safe some space and array creation time.
Sorry, Just disappointed with the way my code shown here, so I just try this to see how to format my code appropriately:
def just_a_test(‘my apology’):
print ‘Again, really sorry’
Java solution:
jUnit Tests:
Output on 15485863 took a while, although I didn’t time it. Does anybody have some performance suggestions, my java’s a bit rusty.
My solution in C using all optimizations. It prints primes as it finds them, and then once it hits the square root mark prints any remaining untouched variables in the array. Chews through N=15485863 in about 3.3 seconds.
In lua, not efficient, but short:
function sieve(n,s)
for i=2,math.ceil(math.sqrt(n)) do
for j=2*i,n,i do s[j]=false end
end
return s
end
s=sieve(15485863,{false,false})
for i=1,15485863 do if s[i]==nil then print(i) end end
Well, it “only” takes only 5.5s for 15485863 (and 262MB RAM)…
function sieve(n,s)
for i=2,math.ceil(math.sqrt(n)) do
if (i==2) or (i%2==1) then for j=i*i,n,(i%2+1)*i do s[j]=false end end
end
return s
end
s=sieve(15485863,{false,false})
for i=1,15485863 do if s[i]==nil then print(i) end end
(a few improvements, now it only takes 2.3s [and 1.5s for printing…])
Just stumbled across this page. I have written a multithreaded nthprime algorithm in C# (it finds the millionth prime in 7 milliseconds using 16 threads on a 4 core – 8 hyperthreaded core – i7 3770s running at 3.1 GHz while using only 405 KB for the sieve) and was looking for information on how to convert it to racket. I’m new to racket and still trying to wrap my head around the functional approach to the sieve.
The various examples in functional languages should give me what I was looking for. Thanks to everyone for sharing.
If anyone is interested in the multithreaded algorithm, let me know (I will be notified of follow up comments). Alternatively, as it appears that this page is all about coding challenges, how about a multithreaded sieve challenge? I also want to convert my algo to c and compile it with clang to see what kind of boost clang’s optimers might provide vs. compiling with gcc (and vs. my visual studio compiled c#).
Note that my algorithm is an nthprime algorithm – the inverse of what was requested on this page. In other words, given the input 1000000, my algo outputs 15485863.
There are three reasons I want to convert it to racket:
1. To use racket’s native big integers (the C# version begins to return clipped answers somewhere in the neighborhood of the 240 millionth prime).
2. To see what kind of performance penalty racket’s native big integers incurr (a simple tail recursive factorial in wicked fast in racket – faster than any big integer factorial I can find in c#).
3. It’s fun to learn new languages.
This is my PHP version that calculates half a million before the terminal crashes, don’t have a local copy of PHP to test with at the minute so using a remote low powered box, had to chunk the number sequences as a result of the low performing box.
In Rust: http://xojoc.pw/programmingpraxis/sieveeratosthenes.html
Python / Numpy solution
[…] start at p^2 instead of p and the outer loop can stop at the square root of n. I’ll leave the optimized version for you to work […]