March 20, 2012
On February 15th, the New York Times published an article that described an attack on the RSA key-generation algorithm by Arjen Lenstra and others; Nadia Heninger and her friends did the same study, but have not yet published their work. The Times missed the essence of the Lenstra paper and reported that two out of every thousand internet passwords are insecure, which is incorrect; Dan Kaminsky gives a better description of the vulnerability, and explains why there is little to worry about.
The problem was not in the RSA algorithm itself, but in generating the keys that are used by the RSA algorithm. We studied the key-generation algorithm in a previous exercise. Briefly, the key-generation algorithm selects two large primes, usually called p and q, and uses them to create a public key and a private key; the public key is the product of p and q. The security of the RSA algorithm relies on the fact that it is difficult to factor the public key into its two components p and q. However, if you do, it is easy to determine the corresponding private key, and anyone who does so can decrypt any message thus encoded.
What Lenstra/Heninger found was that, due to a flaw in some implementations of the key-generation algorithm, multiple private keys used the same p. Here’s what Lenstra/Heninger and their colleagues did: First, they collected a large corpus of RSA public keys by scanning every IPv4 address on the internet using
nmap; both teams collected somewhere around twelve million keys. Then, they computed the gcd of each key paired with every other key; where the gcd was not 1, it was one of the factors of the key, and the other factor could easily be found by division, and from the two factors the private key could be determined.
Today’s exercise challenges you to recreate the computation that Lenstra/Heninger used to find the factors. We’ll use this set of 21 public keys; these keys are small enough to be easily factored, but real keys are much bigger:
708894553 704488409 705674273
707478271 710167019 708093251
702013379 704030867 708691187
708374743 712748719 713581951
704387447 707015741 704308279
710872423 707947453 704996429
706323757 705031051 714623803
We’ll use two different methods to find the factorable keys. The first method, mentioned above, is to pair each key with every other key and compute the gcd. The time complexity of that algorithm is O(n 2), which is fine if n=21 but not so fine if n=12,000,000; 144trillion of anything will take a while. A second algorithm first multiplies all the keys, then for each key k computes gcd(k, kprod (mod k 2) / k). The second algorithm takes time O(n log log n); the product takes linear time, the gcd with each key takes linear time, and there is an additional factor of log log n because of the big-integer arithmetic on the product of the keys. The algorithm that Lenstra/Heninger actually used was neither of these; a paper by Daniel Bernstein gives the algorithm, but it’s rather more complicated than we care to deal with today, and the linear algorithm described above isn’t too bad, as long as n isn’t too large and as long as you have a good big-integer library.
Your task is to write functions that implement the two algorithms given above and use them to factor as many of the keys as you can. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.