## Factoring Multiple RSA Keys

### March 20, 2012

On February 15th, the *New York Times* published an article that described an attack on the RSA key-generation algorithm by Arjen Lenstra and others; Nadia Heninger and her friends did the same study, but have not yet published their work. The *Times* missed the essence of the Lenstra paper and reported that two out of every thousand internet passwords are insecure, which is incorrect; Dan Kaminsky gives a better description of the vulnerability, and explains why there is little to worry about.

The problem was not in the RSA algorithm itself, but in generating the keys that are used by the RSA algorithm. We studied the key-generation algorithm in a previous exercise. Briefly, the key-generation algorithm selects two large primes, usually called *p* and *q*, and uses them to create a public key and a private key; the public key is the product of *p* and *q*. The security of the RSA algorithm relies on the fact that it is difficult to factor the public key into its two components *p* and *q*. However, if you do, it is easy to determine the corresponding private key, and anyone who does so can decrypt any message thus encoded.

What Lenstra/Heninger found was that, due to a flaw in some implementations of the key-generation algorithm, multiple private keys used the same p. Here’s what Lenstra/Heninger and their colleagues did: First, they collected a large corpus of RSA public keys by scanning every IPv4 address on the internet using `nmap`

; both teams collected somewhere around twelve million keys. Then, they computed the gcd of each key paired with every other key; where the gcd was not 1, it was one of the factors of the key, and the other factor could easily be found by division, and from the two factors the private key could be determined.

Today’s exercise challenges you to recreate the computation that Lenstra/Heninger used to find the factors. We’ll use this set of 21 public keys; these keys are small enough to be easily factored, but real keys are much bigger:

`708894553 704488409 705674273`

707478271 710167019 708093251

702013379 704030867 708691187

708374743 712748719 713581951

704387447 707015741 704308279

710872423 707947453 704996429

706323757 705031051 714623803

We’ll use two different methods to find the factorable keys. The first method, mentioned above, is to pair each key with every other key and compute the gcd. The time complexity of that algorithm is *O*(*n* ^{2}), which is fine if *n*=21 but not so fine if *n*=12,000,000; 144trillion of anything will take a while. A second algorithm first multiplies all the keys, then for each key *k* computes gcd(*k*, *k _{prod}* (mod

*k*

^{2}) /

*k*). The second algorithm takes time

*O*(

*n*log log

*n*); the product takes linear time, the gcd with each key takes linear time, and there is an additional factor of log log

*n*because of the big-integer arithmetic on the product of the keys. The algorithm that Lenstra/Heninger actually used was neither of these; a paper by Daniel Bernstein gives the algorithm, but it’s rather more complicated than we care to deal with today, and the linear algorithm described above isn’t too bad, as long as

*n*isn’t too large and as long as you have a good big-integer library.

Your task is to write functions that implement the two algorithms given above and use them to factor as many of the keys as you can. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

I’m glad to see you reference Nadia’s work. I know she was worried when the first NY Times article came out that her work would be lost in the shuffle.

Continuing my exercises in J, here’s what i have:

I’m sorry but I couldn’t get the idea of the second algorithm. All gathered public key are: Pi(p,qi)=p*qi where 1 <= i <= 21.

Then we follow next steps:

1. Compute kij=Pi*Pj where i,j is between [1,21].

2. gcd with the next arguments:

a. Pi where 1 is between [1,21]

b. kprod = kij. What means (mod k 2)

Second question, why it solves the problem? Why its outcome gives us p and q?

Thanks for your time.

Best regards

Diego

My Python solution:

import itertools

keys = ”’

708894553 704488409 705674273

707478271 710167019 708093251

702013379 704030867 708691187

708374743 712748719 713581951

704387447 707015741 704308279

710872423 707947453 704996429

706323757 705031051 714623803

”’

keys = [int(i) for i in keys.strip().split()]

# uses algorithm from paxis problem: three binary algorithms

def gcd(a, b):

if a == 0: return b

if b == 0: return a

if a % 2 == 0 and b % 2 == 0:

return gcd(a >> 1, b >> 1) <> 1, b)

if b % 2 == 0: return gcd(a, b >> 1)

if a > b: return gcd(a-b, b)

else: return gcd(a, b-a)

def algo1(keys):

result = set()

for x,y in itertools.combinations(keys, 2):

g = gcd(x,y)

if g > 1:

result.add(g)

return list(result)

def algo2(keys):

prod = reduce(lambda x,y: x*y, keys)

result = set()

for key in keys:

g = gcd(key, (prod % key**2)/key)

if g > 1:

result.add(g)

return [int(x) for x in result]

print algo1(keys)

print algo2(keys)

Johann: If you know Nadia, please point out this blog entry to her.

Diego: You’re making it too complicated. Let’s say the keys are K_1 through K_21. Multiply them together to produce K_prod. Then for each K_i, reduce K_prod modulo K_i squared, divide by K_i, and take the gcd with K_i; if the gcd is greater than 1, you’ve found a factor.

PHP CLI :

output:

right?

[…] public keys are insecure. A couple days ago I noticed that the excellent Programming Praxis has a challenge based on that research and subsequent reporting in the press. This is excellent because it exposes […]

I managed to do it eventually, this has to be one of the most interesting exercises I’ve completed on here, along with the spigot algorithm for pi and the Mark V. Shaney exercise.