Essay: Programming With Prime Numbers
September 25, 2012
We introduce today a new feature of Programming Praxis: essays. An essay isn’t an exercise; it doesn’t present a task for you to perform, but instead provides extended discussion and complete source code. Though essays may be based on exercises, the idea is that an essay can delve more deeply into a topic than is possible in the limited space and time of an exercise.
Our first essay is about a topic dear to my heart: programming with prime numbers. The essay presents two versions of the Sieve of Eratosthenes, uses trial division to identify primes and factor composites, and provides the Miller-Rabin pseudoprime checker and Pollard rho factoring algorithm. Source code is given in five languages: C, Haskell, Java, Python, and Scheme.
Please read the essay, and feel free to comment on it below; comments on the essay itself are closed. Let me know if you see any errors. And feel free to link to the essay on the internet if you know of places where it is appropriate.
Beautiful! This essay just made it to the required reading list in my Algorithms class. (And my Programming Language class!)
Thank you.
Outstanding paper. Thank you very much for the work.
If possible, colored syntax highlighting would be awesome for the source code!
Loving this essay format. A few more of these and you have the makings of a must-have book. So far I have spotted one error: On page 5, in the upper-right paragraph, there is an am+1 that should be a^m+1.
On page 6, “And Zhenxiang Zhang plausibly conjectures that there are no errors less than 10^3*6 when using…” I think you meant 10^36.
Question: the prime number theorem offers a very tight range for the value of the nth prime. It’s possible to check each of the numbers in that range to see which are prime in a reasonably efficient manner. If, however, the range contains multiple primes, is there any way to figure out which one is the nth?
@treeowl: Yes. See https://programmingpraxis.com/2011/08/02/the-nth-prime/.
Thank you very much for enlightening advices, I was stuck with third problem from Project Euler and then I found your clear explaination in stackoverflow, it worked very well and fast!!! Thanks again!
Paul Hofstra writes with notice of a bug in the Python version of the
primes
function: the first while loop should readwhile p*p <= n
. Otherwise, if n is the square of a prime, it will be incorrectly included in the list of primes returned by the function; for instance,primes(9)
returns the list[2, 3, 5, 7, 9]
, which is incorrect. Thanks, Paul.I think there’s either a bug in the python code for rho_factors or in my understanding of same!
In the rho_factor subroutine, the check “limit == 0” is made. Limit starts at a large number by default and is never decremented, so that test will always return False. Should it be “limit == c” instead?
No, not “limit == c”, because c gets incremented outside the “while d == 1” loop. Perhaps some other counter that’s incremented each time the hare/tortoise step.
You’ve found a bug:
limit
should be decremented each time through the loop.Gah, I must be thicker than normal today. The correction (I think) should be after the “if d == n” test, when the return statement should be “return rho_factor(n, c + 1, limit – 1)”
Decrement
limit
in the main processing loopwhile d == 1
, after testingif limit == 0
; thus, the line that recomputest
,h
andd
should also setlimit = limit - 1
.In that case, there should probably be some way to reset the limit after a 1 < d < n has been found. Suppose for the sake of simplicity that we've set limit = 10. We try c = 1, and it takes 3 iterations to find a loop (no factor). So we then try c = 2 with a limit of 7, and it finds a factor d after 5 iterations. But d is composite, so we start the algorithm again to factor d. Surely with a new number to factor we don't want to set limit = 2? There's probably also a case for resetting the limit after we change c, maybe to some sub-limit, to avoid 'wasting' all our iterations on a function f that has an unusually long tail.
Sorry for harping on about this, I just want to understand what's going on.
P.S. I realise now that "return rho_factor(n, c + 1, limit – 1)" is virtually no different than checking "limit == c". Whoops!
The use of
limit
is completely arbitrary, as determined by the needs of the user, which is the reason it was made a parameter to the function. If you initializelimit
to -1, the rho algorithm will continue until all factors have been found, which may be what you want to do when the n being factored is fairly small (say, less than 20- or 25-digits). But, if you are using the rho algorithm to quickly extract small factors from a very large number, you probably want to initializelimit
to a fairly small number so that once you have the small factors you can switch to a more powerful factoring method such as the elliptic curve algorithm. For example, if you want to factor a 120-digit n, you might start by trial division to 1000 or 10,000 to find all 3- or 4-digit factors, followed by a million steps of Pollard rho to find all factors up to 10- or 12-digits, followed by the elliptic curve algorithm until you are convinced the remaining co-factor is a semi-prime (because you have, with high probability, found all factors less than the cube root of the co-factor), followed by the quadratic sieve. But all that is really beyond the scope of what this essay does.The
c
parameter is unrelated tolimit
;c
specifies the random-number generator, andlimit
specifies your patience.Ok, thanks for that. I guess it’s not so bad that, having found a composite factor d, we run the algorithm again with a smaller limit – with a smaller number to factor it should return much quicker. Further, if we remove all the small factors using trial division as you suggest, then the likelihood of finding a composite factor is greatly diminished.
I understand the difference between limit and c. I merely observed that my two proposed ‘solutions’ were the essentially the same: If you increment c at the same time you decrement limit, then checking whether limit == c or limit == 0 amount to the same thing (almost, because c starts at 1 instead of 0).