Fenderbender’s Square Reckoner
November 29, 2016
In many of our programs involving prime numbers and number theory, we need to be able to determine if a number n is a perfect square. One way to do that is to determine the integer square root of the number, using Newton’s method, then multiply to determine if the original number is a square. But that’s slow. In a previous exercise, we used a method devised by Henri Cohen to calculate the quadratic residues of n to various moduli, which can quickly determine that some n cannot be perfect squares.
Over at Mersenne Forum, fenderbender
extends Cohen’s idea to make a ridiculously fast square predicate: he precalculates multiple moduli to reduce the operation from big integers to 32-bit integers, chooses the moduli after extensive testing, and tests the quadratic residues using a 64-bit bloom filter. The result is impressive. Where Cohen eliminates the expensive square root calculation in 99% of cases, fenderbender
eliminates the expensive square root calculation in 99.92% of cases, and does it faster than Cohen. Go read fenderbender
‘s explanation to see a beautiful combination of number theory, wonky programming, and sheer artistry.
Your task is to implement fenderbender
‘s square predicate. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
@programmingpraxis: I converted the code to Python. It works about twice as fast compared to a method where you just use the 2 last digits for a first screening. Maybe I found an example where the method does not work. 152415787532388367504942236884722755800955129 is a square, but fenderbender’s method tells me not; it fails on the test using mod19. Could you check with your code?
@Paul: Works fine for me: http://ideone.com/sqY3zf.
@programmingparaxis: thanks. Apparently some checking to do.
Does anyone know why fenderbender chose mod x with x = 128? I have tried other x. By trial and error, 5! (= 120) gives a rejection rate of 85% and 6! a rejection rate of 93.3%.
Would either of these be more appropriate?
@Ernie: I suspect it is due to the computational efficiency of bit-shifting rather than performing a big-integer modulo.
An interesting method. In Python it seems to be more effective to use sets i.s.o. Bloom filters. The version with sets is about a factor of 1.7 times faster. For the sets version I used mod64 i.s.o. mod128, because I could not find the quadratic residues for mod128.
@Paul: It’s easy to compute the quadratic residues of a modulus, let’s say n = 11. For each number from 0 to n−1, square the number and take it mod n; thus, (0*0)%11=0, (1*1)%11=1, . . . , (5*5)%11=3, . . . , (10*10)%11=1. Those numbers are the quadratic residues of 11: 0, 1, 3, 4, 5, and 9. For the quadratic residues of 128 I get
@programmingpraxis: thanks, I found out in the meantime (and get the same numbers as you). Using mod128 did not give significant different results from using mod64. I am still a little working on the problem, what would be the best mod’s to use.
I’ve implemented 4 square finding routines and timed them.
> Timings (Standard Python):
3.6116340733525862
6.418930000824364
9.941192323010673
11.628690812502668
> Timings (Pypy):
0.3549592833256139
2.8981952367599213
0.24083208547439217
0.14977518674845003
As you can see it seems to matter with which interpreter you run your script. Feel free to correct me if any of that source code is wrong.
To make the function more general purpose, it is useful to handle negative n properly (it should return false) and n = 0 (it should return 0).