## Counting Primes Using Legendre’s Formula

### July 22, 2011

We will need functions to compute the nth prime, for the calculation of φ, and to compute π(x) for small values of x, to terminate the recursion in the formula for π. We do both by using the sieve of Eratosthenes to create a vector of the primes up to some convenient limit (the vector has an additional nonsense value in the 0th position so we don’t have to keep subtracting 1 whenever we index the vector), then just index the vector to compute the nth prime and use binary search to compute π:

(define ps (list->vector (cons -1 (primes #e1e5))))

(define (p n) (vector-ref ps n))

(define (pi n)   (let loop ((lo 1) (hi (- (vector-length ps) 1)))     (let ((mid (quotient (+ lo hi) 2)))       (cond ((<= hi lo) mid)             ((< n (p mid)) (loop lo (- mid 1)))             ((< (p mid) n) (loop (+ mid 1) hi))             (else mid)))))

The φ function is computed according to the recurrence equation. We memoize the function because small results appear again and again in the recursive calculations:

(define phi   (let ((memo (make-hash (lambda (x) (+ (* 1000 (car x)) (cadr x))) equal? #f 9973)))     (lambda args       (let ((x (car args)) (a (cadr args)) (t (memo 'lookup args)))         (cond (t t) ; return memoized value               ((= a 1) (let ((t (quotient (+ x 1) 2))) (memo 'insert args t) t))               (else (let ((t (- (phi x (- a 1)) (phi (quotient x (p a)) (- a 1)))))                       (memo 'insert args t) t)))))))

Now the calculation of π is simple:

(define (prime-pi n)   (let ((a (pi (isqrt n))))     (+ (phi n a) a -1)))

Here is Legendre’s calculation:

> (prime-pi 1000000) 78498

We used hash tables and the isqrt function from the Standard Prelude, and primes from a previous exercise. You can run the program at http://programmingpraxis.codepad.org/sNCV5Y0d.

Pages: 1 2

### 16 Responses to “Counting Primes Using Legendre’s Formula”

1. Graham said

Even with memoization, my Python solution didn’t finish quickly enough for me. Interestingly, (or not, as the case may be)
the following naive Haskell implementation finishes much more quickly (especially when compiled via GHC).

import Data.List (foldl')
import Data.Numbers.Primes (primes)

sieve :: Int -> [Int] -> [Int]
sieve p = filter (\x -> x mod p /= 0)

phi :: Int -> Int -> Int
phi x a = length $foldl' (flip sieve) [1..x] (take a primes) isqrt :: Int -> Int isqrt = floor . sqrt . fromIntegral primePi :: Int -> Int primePi 1 = 0 primePi x = phi x a + a - 1 where a = primePi$ isqrt x

main :: IO ()
main = print $primePi 1000000  I guess the next step is to get good enough at Haskell to figure out a way to cut the number of phi calls. 2. programmingpraxis said The reference solution finds pi(10^6) in 72ms on my machine, and pi(10^7) in about ten times that, which is still less than a second. What kind of times are you seeing? Sieving by trial division to get a list of the primes is ugly. The fastest, prettiest prime generator that I know in Haskell is by Melissa O’Neill: http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf. 3. Graham said The reference solution finds pi(10^6) in about 0.3 seconds and pi(10^7) in about 1.6 (under Petite Chez Scheme). Running the naive Haskell above with GHC’s runhaskell command (so, intepreted) takes almost 9.6 seconds for pi(10^6), while compiled with gch --make -O2 take about 0.5 seconds. My Haskell interpreted chokes when going for pi(10^7), but manages to finish in about 11 seconds when compiled. Thanks for the O’Neill paper; I think I’d come across it before, but hadn’t checked it out in depth. 4. Mike said Python 3.2 from bisect import bisect from functools import lru_cache from math import sqrt from utils import primesTo prime = list(primesTo(10000)) @lru_cache(maxsize=None) def phi(x, a): if a == 1: return (x+1)//2 else: return phi(x, a-1) - phi(x//prime[a - 1], a - 1) def pi(n): ''' return number of primes <= n >>> pi(10) 4 >>> pi(1000) 168 >>> pi(1000000) 78498 ''' if n < prime[-1]: return bisect(prime, n) else: a = pi(int(sqrt(n))) return phi(n, a) + a - 1 ##if __name__ == "__main__": ## import doctest ## doctest.testmod()  5. Graham said @Mike, your solution seems to run into the same problem my Python work did, even with Python 3.2: trying pi(10^7) causes the interpreter to crash after exceeding the recursion limit. Any thoughts? 6. Mike said @Graham, I had trouble with the stack limit as well. Evaluating the code for pi() with n = 1e7, a = 966. So, pi(1e7) = phi(1e7, 966) + 966 – 1. phi(1e7, 966) depends on phi(1e7,965), which depends on phi(1e7,964), … down to phi(1e7, 1). That’s 966 stack frames, which is close to the stack overflow limit. I suspect the python shell uses a few stack frames, but probably not enough to push pi(1e7) over the limit. I believe the culprit is the lru cache decorator for the phi() function. Looking at the source code in functools.py, when there is a cache miss, the wrapped function calls the original code for phi(). So I think that on a cache miss, two frames get pushed on the stack (one for the call to the wrapped function and one for the call to the original function). The original function may the recurse and call the wrapped function, etc. I have not tested my hypothesis. However, if I pre-populate the cache by calling pi() in a loop with gradually increasing arguments, I can calculate pi() for larger arguments without hitting the stack limit. 7. programmingpraxis said Scheme has no trouble with stack overflow because stack frames are stored on the heap, not on the stack. I don’t know enough about Python to help. Here are some random thoughts on the matter. I don’t know if any of them are useful. Setting a higher stack overflow limit only helps for a little while. I don’t know if that is possible in Python, but in any event it is not a serious solution. Does Python have some kind of lazy evaluation method? If it does, and if it uses the heap instead of the stack, maybe you could arrange to use that instead of the normal recursion stack. You could build your own stack out of cons pairs stored on the heap, and make it however big you want, instead of using Python’s built-in recursion stack. You don’t necessarily need to use the recurrence equation to calculate Legendre’s sum. See the MathWorld article for Legendre’s Formula. Specifically, equation (1) may be useful. 8. Mike said Here is a Python 3 version of phi(x,a) using a separate stack. I optimized the code a bit. –If x < prime[a], the second term in the recurrence equation is zero, and the first term in the recurrence equation is basically doing a linear search through the primes to find the largest prime that divides x. The bisect() call replaces the linear search with a binary search to find the largest prime less than x. –As stated in the problem, the recurrence terminates when a == 1. The recurrence can also be terminated when x == 1, where phi(1,a) is +1 or -1 depending on the sign the phi term would have (the variable s in the code). from bisect import bisect from math import sqrt from utils import primesTo # pad prime[] so that prime[n] is the nth prime number prime =  + primesTo(100000) def phi(x, a): if a == 0: return x stack = [(1,x,a)] value = 0 while stack: s, x, a = stack.pop() if x == 1: value += s else: if x < prime[a]: a = bisect(prime, x, 0, a) – 1 if a == 1: value += s * ((x + 1)//2) else: stack.extend([(s, x, a-1), (-s, x//prime[a], a-1)]) return value def pi(x): if x < 2: return 0 else: a = pi(int(sqrt(x))) return phi(x, a) + a – 1 There is no caching of intermediate results, so it is not very fast: _n_______time (s)_____pi____ 10^0 0.0000083810 0 10^1 0.0000455365 4 10^2 0.0001030857 25 10^3 0.0007869715 168 10^4 0.0064016516 1229 10^5 0.0681326817 9592 10^6 0.2781906639 78498 10^7 2.4777696607 664579 10^8 25.5805003150 5761455 10^9 262.7440423294 50847534 9. Mike said With formatting: from bisect import bisect from math import sqrt from utils import primesTo # pad prime[] so that prime[n] is the nth prime number prime =  + primesTo(100000) def phi(x, a): if a == 0: return x stack = [(1,x,a)] value = 0 while stack: s, x, a = stack.pop() if x == 1: value += s else: if x < prime[a]: a = bisect(prime, x, 0, a) – 1 if a == 1: value += s * ((x + 1)//2) else: stack.extend([(s, x, a-1), (-s, x//prime[a], a-1)]) return value def pi(x): if x < 2: return 0 else: a = pi(int(sqrt(x))) return phi(x, a) + a – 1 _n_______time (s)_____pi____ 10^0 0.0000083810 0 10^1 0.0000455365 4 10^2 0.0001030857 25 10^3 0.0007869715 168 10^4 0.0064016516 1229 10^5 0.0681326817 9592 10^6 0.2781906639 78498 10^7 2.4777696607 664579 10^8 25.5805003150 5761455 10^9 262.7440423294 50847534  10. David said Clojure also runs out of stack space, and just using a stack and a loop is not obvious how to memoize the intermediate results. So I took advantage of Mike’s optimization above (I admit I never would have figured out what the recurrence actually does on my own.) Uses O’Neil sieve algorithm to create enough primes to calculate pi to 1e9. (load "lazy-sieve") (def max-prime (int 1e9)) (def prime (into (vector-of :int 0) (take-while #(< %1 (int (Math/sqrt max-prime))) lazy-primes))) (defn small-pi "pi(n) for small n {n <= (last prime)}" [n] (let [pos (java.util.Collections/binarySearch prime (int n))] (if (>= pos 0) pos (- (+ pos 2))))) (defn phi [x a] (loop [stack (list [1, x, a]), n 0] (if (not (empty? stack)) (let [[sgn, x, a] (first stack), l (rest stack)] (if (= x 1) (recur l (+ n sgn)) (let [a' (if (< x (prime a)) (small-pi x) a)] (if (= a' 1) (recur l (+ n (* sgn (int (/ (inc x) 2))))) (recur (conj (conj l [sgn, x, (dec a')]) [(- sgn), (int (/ x (prime a'))), (dec a')]) n))))) n))) (defn prime-pi "Compute pi(n), upto 1 billion" [n] (if (or (< n 0) (> n max-prime)) (throw (IllegalArgumentException. "argument to prime-pi out of range."))) (if (<= n (last prime)) (small-pi n) (let [a (prime-pi (int (Math/sqrt n)))] (+ (phi n a) a -1)))) (defmacro timex "Evaluates expr and returns expr and the time it took in milliseconds." [expr] (let [start# (. System (nanoTime)), ret# ~expr] [(/ (double (- (. System (nanoTime)) start#)) 1e6), ret#])) (defn pi-table "Count primes by powers of 10" [limit] (println "n time (s) pi(n) % prime") (loop [n 1, c 0] (let [[t, pi] (timex (prime-pi n))] (printf "1e%d %11.6f %11d %5.2f%%\n" c (/ t 1000) pi (/ (* 100.0 pi) n)) (if (< c limit) (recur (* 10 n) (inc c)))))) (comment user=> (pi-table 9) n time (s) pi(n) % prime 1e0 0.000949 0 0.00% 1e1 0.000764 4 40.00% 1e2 0.000762 25 25.00% 1e3 0.000822 168 16.80% 1e4 0.000759 1229 12.29% 1e5 0.046668 9592 9.59% 1e6 0.218717 78498 7.85% 1e7 2.206287 664579 6.65% 1e8 22.933560 5761455 5.76% 1e9 234.952010 50847534 5.08% )  11. programmingpraxis said My experience is different that everyone else. I have no trouble with recursion limits, and my timings are quite fast. I just computed pi(10^9) from an initial load with no previously-cached results: Petite Chez Scheme Version 8.4 Copyright (c) 1985-2011 Cadence Research Systems   > (load "e:/prime.ss") > (time (prime-pi #e1e9)) (time (prime-pi 1000000000)) 35 collections 1014 ms elapsed cpu time, including 62 ms collecting 1010 ms elapsed real time, including 72 ms collecting 298285040 bytes allocated, including 277790944 bytes reclaimed 50847534 One second is a whole lot better than the four minutes David reported. What’s going on? 12. David said There is probably only a marginal effect if you run the scheme code with the cache empty or with previously cached results. The reason why it is so fast is that there are so many redundant calculations that don’t need to be performed when the results are cached. Without a cache, calculating phi for 1e9 will take (I don’t know, days?) Calculating phi for pi(1e6) naively without caching takes hours (and I’m not sure how long because I gave up after it was chugging for a few hours.) Mike’s optimization removes some computational dead ends, and he also figured out what the recurrence actually does, so replacing a big chunk of the recurrence with the call to binary search the prime array saves a lot of time, but probably still leaves a lot of redundant computation in place. It’s time is still much improved over trying to compute phi naively without a cache. Clojure is based on the JVM, and Java was not designed to be a functional programming language, so there’s not a whole lot of stack. I’m not sure why Python has so much problems. I worked on the Python code in the early 90’s and at that time stack frames were allocated on the heap. Because Python now has generators, there is even less reason to use the system stack for procedure calls, so it doesn’t make sense that there would be a stack limit… 13. Dana said This was fun, especially as my copy of Riesel arrived tonight. Two versions in Perl, not tested extensively. An obvious one:  #!/usr/bin/env perl use strict; use warnings; no warnings 'recursion'; use Memoize; use Math::Prime::Util qw/primes/;  my @primea = @{primes(1_000_000)}; memoize('phi'); sub phi { my ($x, $a) = @_; return ($a ? int(($x+1)/2) :$x) if $a <= 1; return 1 if$x < $primea[$a]; return phi($x,$a-1) - phi( int($x /$primea[$a-1]),$a-1); } sub pi { my $n = shift; return 0 if$n < 2; 

 my $a = pi(int(sqrt($n))); return phi($n,$a) + $a - 1; }  0.104s for 10**6 (78498), 41.1s for 10**9 (50847534). I then made a non-recursive phi:  #!/usr/bin/env perl use strict; use warnings; use Math::Prime::Util qw/primes/;  my @primea = @{primes(1_000_000)}; sub phi { my ($x, $a) = @_; return$x if $a == 0; my @add = ($x); my @sub; my $sum = 0; while ($a-- > 1) { my $prime =$primea[$a]; my @newadd = map { int($_ / $prime) } @sub; my @newsub = map { int($_ / $prime) } @add; push @add, @newadd; push @sub, @newsub;$prime = $primea[$a-1]; my $count_ones = (scalar @add) - (scalar @sub); @add = grep {$_ > $prime } @add; @sub = grep {$_ > $prime } @sub;$count_ones = $count_ones - (scalar @add) + (scalar @sub);$sum += $count_ones; }$sum += int(($_+1)/2) for @add;$sum -= int(($_+1)/2) for @sub;$sum; } sub pi { my $n = shift; return 0 if$n < 2; 

 my $a = pi(int(sqrt($n))); return phi($n,$a) + $a - 1; }  0.016s for 10**6 (78498), 11.5s for 10**9 (50847534). Math::Prime::Util::prime_count(10**9) takes under 0.5s but that’s cheating (it’s also written in C — its pure perl prime_count code is much slower). 14. danaj said Hopefully formatting will work this time (I wish there was a preview). A new version, with better phi(x,a), and also Lehmer essentially straight from from Riesel (page 22): #!/usr/bin/env perl use strict; use warnings; use Math::Prime::Util qw/-nobigint primes prime_count prime_precalc/; use Time::HiRes qw(gettimeofday tv_interval); # Get 10M primes -- enough to test to at least 10^13 Math::Prime::Util::prime_precalc(10_000_000); my @primea = @{primes(10_000_000)}; sub phi { my ($x, $a) = @_; # Some shortcuts for small x, a = 0, a = 1 return$x if $a == 0; return int(($x+1)/2) if $a == 1; return ($x > 0 ? 1 : 0) if $x <$primea[$a]; my %vals = ($x => 1 );
my $sum = 0; while ($a-- > 2) {
my $p1 =$primea[$a]; my$p2 = $primea[$a-1];
my %newvals;
while (my($v,$c) = each %vals) {
next if $c == 0; # Prune out zero counts if ($v > $p2) {$newvals{$v} +=$c;
my $sval = int($v / $p1); if ($sval > $p2) {$newvals{$sval} -=$c;
} else {
$sum -=$c;
}
}
}
%vals = %newvals;
}
die unless $a == 1; while (my($v,$c) = each %vals) {$sum = $sum +$c * int(($v+1)/2) -$c * int((int($v/3)+1)/2); }$sum;
}

sub legendre_pi {
my $n = shift; return 0 if$n < 2;

my $a = legendre_pi(int(sqrt($n)));
return phi($n,$a) + $a - 1; } sub lehmer { my$x = shift;
return prime_count($x) if$x < 1_000;
my $z = int(sqrt($x+0.5));
my $a = lehmer(int(sqrt($z)+0.5));
my $b = lehmer($z);
my $c = lehmer(int($x**(1/3)+0.5));
my $sum = phi($x, $a) + int(($b + $a - 2) * ($b - $a + 1) / 2); # Make sure these calls are fast. # 1M for 10**8, 32M for 10**10, 5600M for 10**12 prime_precalc( int($x / $primea[$a]) );

foreach my $i ($a+1 .. $b) { my$w = int($x /$primea[$i-1]);$sum = $sum - prime_count($w);
if ($i <=$c) {
my $bi = prime_count(int(sqrt($w)+0.5));
foreach my $j ($i .. $bi) {$sum = $sum - prime_count(int($w / $primea[$j-1])) + $j - 1; } } }$sum;
}

for my $e (0 .. 13) { my$start = [gettimeofday];
my $pi = lehmer( int(10 **$e) );
my $seconds = tv_interval($start);
printf "10^$e %10.6fs$pi\n", $seconds; }  Timing on my machine for 10**9 is 6.2s for Legendre, 0.12s for Lehmer. I tried Lehmer counts 10**12 in 24.9s, 10**13 in 186.2s. Pretty fantastic for Perl. I am using a Perl/XS library to get the prime list and do the small prime_counts in the Lehmer loop. In theory those could all be precalculated (MPU’s prime count is doing bit counts on the pre-sieved area to get the result). More timing:  primesieve MPU Legendre Lehmer Python-Legendre 10** 8 0.01s 0.08s 0.82s 0.02s 12.4s 10** 9 0.04s 0.5s 6.2s 0.09s 123.3s 10**10 0.4s 5.3s 47.7s 0.49s 1272.6s 10**11 4.3s 67.5s 408.0s 3.1s 10**12 52.6s 1029.9s 22.8s 10**13 642.0s 177.7s 10**14 1472.5s  primesieve is helped a lot by using 12 threads but it’s seriously fast even with only one thread (4-10x slower than multi-threaded). Both it and Math::Prime::Util are sieving the entire range (in segments) and doing fast bit counts, and both use very low memory since they run segmented. “Python-Legendre” is Mike’s code on my machine. The Lehmer results are great (IMO) for a first pass. Lots of optimization could be done on the “small” prime_count calls rather than its current method of making a big sieve and then counting all the bits up to the target, for each call. 15. David said Memoizing the phi function is the way to go. Ruby (which is slower language in execution than Clojure, generally) can compute pi(1e9) in 70 seconds, vs. 235 in Clojure. This is because there are no stack space issues here… require 'benchmark' require_relative 'lazy-sieve'$primegen = LazySieve.new
$prime =  def required_primes(min) if$prime[-1] < min then
($primegen.upto {|n| n <= min}).each {|p|$prime << p}
end
end

def small_pi(x)
lo = 0
hi = $prime.length while lo < hi do m = (lo + hi) / 2 case x <=>$prime[m]
when  0 then return m
when -1 then hi = m
when  1 then lo = m + 1
end
end
return lo - 1
end

$phi_cache = {} def phi(x, a) y =$phi_cache[[x,a]]
return y unless y.nil?

$phi_cache[[x,a]] = if a == 1 then (x + 1) / 2 else phi(x, a-1) - phi(x /$prime[a], a - 1)
end
end

def legendre_pi(n)
rt_n = Math.sqrt(n).to_i
required_primes [rt_n, 100].max

if n <= \$prime[-1] then
small_pi(n)
else
a = small_pi(rt_n)
phi(n, a) + a - 1
end
end

def pi_table(limit)
n = 1
pi = 0

puts "n      time (s)       pi(n)    % prime"
(0..limit).each do |c|
seconds = Benchmark.realtime {pi = legendre_pi(n)}
printf "1e%d %11.6f %11d    %5.2f%%\n", c, seconds, pi, pi.to_f/n * 100
n *= 10
end
end

pi_table 9

PS C:\Users\dave\Documents\dev\ruby> ruby .\prime-pi.rb
n      time (s)       pi(n)    % prime
1e0    0.003001           0     0.00%
1e1    0.000000           4    40.00%
1e2    0.000000          25    25.00%
1e3    0.001000         168    16.80%
1e4    0.002000        1229    12.29%
1e5    0.020001        9592     9.59%
1e6    0.120007       78498     7.85%
1e7    0.769044      664579     6.65%
1e8    6.259358     5761455     5.76%
1e9   70.721045    50847534     5.08%
`
16. danaj said

David, I don’t have your LazySieve, but just threw in a standard sieve to the nth_upper_limit of rt_n (and verified the sieving is a trivial amount of time and the prime counts are identical). I get “stack level too deep” errors in phi when trying to run 1e8 using Ruby 1.8.7. I had to set ‘ulimit -s 32768’ to get it to run 1e9 successfully (hmm, looks like ruby for Windows presets the stack to a giant size so that’s why it runs fine for you). Sadly I’m getting times that are *much* slower than yours — perhaps a 1.8.7 vs. 1.9.x issue. It’s 3.1s for 1e7, 58s for 1e8, and over 10 minutes for 1e9.

I believe ‘Memoizing the phi function is the way to go’ only applies to very small n values. My Legendre version in Perl above is over 10x faster than your Ruby numbers, and Perl isn’t _that_ much faster (I have a fast computer, but my memoized versions in Perl are slower than your Ruby numbers). When I did memoization it was not only slow for large n, but used gawdawful amounts of memory. The ruby code is using over 700MB of memory computing pi(10**9).

One of the main issues I ran into when doing Lehmer with large values (see https://programmingpraxis.com/2011/07/26/more-prime-counting-functions/) was space more so than just speed, making simple memoization a very bad solution.