Essay: Programming With Prime Numbers

September 25, 2012

We introduce today a new feature of Programming Praxis: essays. An essay isn’t an exercise; it doesn’t present a task for you to perform, but instead provides extended discussion and complete source code. Though essays may be based on exercises, the idea is that an essay can delve more deeply into a topic than is possible in the limited space and time of an exercise.

Our first essay is about a topic dear to my heart: programming with prime numbers. The essay presents two versions of the Sieve of Eratosthenes, uses trial division to identify primes and factor composites, and provides the Miller-Rabin pseudoprime checker and Pollard rho factoring algorithm. Source code is given in five languages: C, Haskell, Java, Python, and Scheme.

Please read the essay, and feel free to comment on it below; comments on the essay itself are closed. Let me know if you see any errors. And feel free to link to the essay on the internet if you know of places where it is appropriate.


17 Responses to “Essay: Programming With Prime Numbers”

  1. Serge said

    Beautiful! This essay just made it to the required reading list in my Algorithms class. (And my Programming Language class!)
    Thank you.

  2. ardnew said

    Outstanding paper. Thank you very much for the work.

  3. ardnew said

    If possible, colored syntax highlighting would be awesome for the source code!

  4. Mark said

    Loving this essay format. A few more of these and you have the makings of a must-have book. So far I have spotted one error: On page 5, in the upper-right paragraph, there is an am+1 that should be a^m+1.

  5. Mark said

    On page 6, “And Zhenxiang Zhang plausibly conjectures that there are no errors less than 10^3*6 when using…” I think you meant 10^36.

  6. treeowl said

    Question: the prime number theorem offers a very tight range for the value of the nth prime. It’s possible to check each of the numbers in that range to see which are prime in a reasonably efficient manner. If, however, the range contains multiple primes, is there any way to figure out which one is the nth?

  7. Luca said

    Thank you very much for enlightening advices, I was stuck with third problem from Project Euler and then I found your clear explaination in stackoverflow, it worked very well and fast!!! Thanks again!

  8. programmingpraxis said

    Paul Hofstra writes with notice of a bug in the Python version of the primes function: the first while loop should read while p*p <= n. Otherwise, if n is the square of a prime, it will be incorrectly included in the list of primes returned by the function; for instance, primes(9) returns the list [2, 3, 5, 7, 9], which is incorrect. Thanks, Paul.

  9. Owen said

    I think there’s either a bug in the python code for rho_factors or in my understanding of same!
    In the rho_factor subroutine, the check “limit == 0” is made. Limit starts at a large number by default and is never decremented, so that test will always return False. Should it be “limit == c” instead?

  10. Owen said

    No, not “limit == c”, because c gets incremented outside the “while d == 1” loop. Perhaps some other counter that’s incremented each time the hare/tortoise step.

  11. programmingpraxis said

    You’ve found a bug: limit should be decremented each time through the loop.

  12. Owen said

    Gah, I must be thicker than normal today. The correction (I think) should be after the “if d == n” test, when the return statement should be “return rho_factor(n, c + 1, limit – 1)”

  13. programmingpraxis said

    Decrement limit in the main processing loop while d == 1, after testing if limit == 0; thus, the line that recomputes t, h and d should also set limit = limit - 1.

  14. Owen said

    In that case, there should probably be some way to reset the limit after a 1 < d < n has been found. Suppose for the sake of simplicity that we've set limit = 10. We try c = 1, and it takes 3 iterations to find a loop (no factor). So we then try c = 2 with a limit of 7, and it finds a factor d after 5 iterations. But d is composite, so we start the algorithm again to factor d. Surely with a new number to factor we don't want to set limit = 2? There's probably also a case for resetting the limit after we change c, maybe to some sub-limit, to avoid 'wasting' all our iterations on a function f that has an unusually long tail.

    Sorry for harping on about this, I just want to understand what's going on.
    P.S. I realise now that "return rho_factor(n, c + 1, limit – 1)" is virtually no different than checking "limit == c". Whoops!

  15. programmingpraxis said

    The use of limit is completely arbitrary, as determined by the needs of the user, which is the reason it was made a parameter to the function. If you initialize limit to -1, the rho algorithm will continue until all factors have been found, which may be what you want to do when the n being factored is fairly small (say, less than 20- or 25-digits). But, if you are using the rho algorithm to quickly extract small factors from a very large number, you probably want to initialize limit to a fairly small number so that once you have the small factors you can switch to a more powerful factoring method such as the elliptic curve algorithm. For example, if you want to factor a 120-digit n, you might start by trial division to 1000 or 10,000 to find all 3- or 4-digit factors, followed by a million steps of Pollard rho to find all factors up to 10- or 12-digits, followed by the elliptic curve algorithm until you are convinced the remaining co-factor is a semi-prime (because you have, with high probability, found all factors less than the cube root of the co-factor), followed by the quadratic sieve. But all that is really beyond the scope of what this essay does.

    The c parameter is unrelated to limit; c specifies the random-number generator, and limit specifies your patience.

  16. Owen said

    Ok, thanks for that. I guess it’s not so bad that, having found a composite factor d, we run the algorithm again with a smaller limit – with a smaller number to factor it should return much quicker. Further, if we remove all the small factors using trial division as you suggest, then the likelihood of finding a composite factor is greatly diminished.

    I understand the difference between limit and c. I merely observed that my two proposed ‘solutions’ were the essentially the same: If you increment c at the same time you decrement limit, then checking whether limit == c or limit == 0 amount to the same thing (almost, because c starts at 1 instead of 0).

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: