Factoring Multiple RSA Keys

March 20, 2012

On February 15th, the New York Times published an article that described an attack on the RSA key-generation algorithm by Arjen Lenstra and others; Nadia Heninger and her friends did the same study, but have not yet published their work. The Times missed the essence of the Lenstra paper and reported that two out of every thousand internet passwords are insecure, which is incorrect; Dan Kaminsky gives a better description of the vulnerability, and explains why there is little to worry about.

The problem was not in the RSA algorithm itself, but in generating the keys that are used by the RSA algorithm. We studied the key-generation algorithm in a previous exercise. Briefly, the key-generation algorithm selects two large primes, usually called p and q, and uses them to create a public key and a private key; the public key is the product of p and q. The security of the RSA algorithm relies on the fact that it is difficult to factor the public key into its two components p and q. However, if you do, it is easy to determine the corresponding private key, and anyone who does so can decrypt any message thus encoded.

What Lenstra/Heninger found was that, due to a flaw in some implementations of the key-generation algorithm, multiple private keys used the same p. Here’s what Lenstra/Heninger and their colleagues did: First, they collected a large corpus of RSA public keys by scanning every IPv4 address on the internet using nmap; both teams collected somewhere around twelve million keys. Then, they computed the gcd of each key paired with every other key; where the gcd was not 1, it was one of the factors of the key, and the other factor could easily be found by division, and from the two factors the private key could be determined.

Today’s exercise challenges you to recreate the computation that Lenstra/Heninger used to find the factors. We’ll use this set of 21 public keys; these keys are small enough to be easily factored, but real keys are much bigger:

708894553 704488409 705674273 707478271 710167019 708093251 702013379 704030867 708691187 708374743 712748719 713581951 704387447 707015741 704308279 710872423 707947453 704996429 706323757 705031051 714623803

We’ll use two different methods to find the factorable keys. The first method, mentioned above, is to pair each key with every other key and compute the gcd. The time complexity of that algorithm is O(n ²), which is fine if n=21 but not so fine if n=12,000,000; 144trillion of anything will take a while. A second algorithm first multiplies all the keys, then for each key k computes gcd(k, k_prod (mod k ²) / k). The second algorithm takes time O(n log log n); the product takes linear time, the gcd with each key takes linear time, and there is an additional factor of log log n because of the big-integer arithmetic on the product of the keys. The algorithm that Lenstra/Heninger actually used was neither of these; a paper by Daniel Bernstein gives the algorithm, but it’s rather more complicated than we care to deal with today, and the linear algorithm described above isn’t too bad, as long as n isn’t too large and as long as you have a good big-integer library.

Your task is to write functions that implement the two algorithms given above and use them to factor as many of the keys as you can. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “Factoring Multiple RSA Keys”

Johann Hibschman said

March 20, 2012 at 2:37 PM

I’m glad to see you reference Nadia’s work. I know she was worried when the first NY Times article came out that her work would be lost in the shuffle.

Continuing my exercises in J, here’s what i have:

k=: 708894553 704488409 705674273 707478271 710167019 708093251 702013379 704030867 708691187 708374743 712748719 713581951 704387447 707015741 704308279 710872423 707947453 704996429 706323757 705031051 714623803x

NB. Method 1: gcd with themselves
diag=: (<0 1)&|:           NB. select the diagonal
IR=: @(i.@$@])             NB. indices of ravel
gtOne=: #~ >&1             NB. select items greater than one
ktab=: 1 diag IR} +./~k    NB. gcd table, replace diagonal with 1
kfacts=: gtOne&.:>"1 ktab  NB. factors found

NB. Method 2: product of keys
kgcd=: k+.((*:k)|*/k)%k    NB. gcds
kfacts2=: gtOne&.> kgcd    NB. factors found

Diego Sanchez said
March 20, 2012 at 2:58 PM
I’m sorry but I couldn’t get the idea of the second algorithm. All gathered public key are: Pi(p,qi)=p*qi where 1 <= i <= 21.

Then we follow next steps:
1. Compute kij=Pi*Pj where i,j is between [1,21].
2. gcd with the next arguments:
a. Pi where 1 is between [1,21]
b. kprod = kij. What means (mod k 2)

Second question, why it solves the problem? Why its outcome gives us p and q?

Thanks for your time.

Best regards
Diego
jasonostrander (@jasonostrander) said
March 20, 2012 at 4:13 PM
My Python solution:

import itertools

keys = ”’
708894553 704488409 705674273
707478271 710167019 708093251
702013379 704030867 708691187
708374743 712748719 713581951
704387447 707015741 704308279
710872423 707947453 704996429
706323757 705031051 714623803
”’

keys = [int(i) for i in keys.strip().split()]

# uses algorithm from paxis problem: three binary algorithms
def gcd(a, b):
if a == 0: return b
if b == 0: return a
if a % 2 == 0 and b % 2 == 0:
return gcd(a >> 1, b >> 1) <> 1, b)
if b % 2 == 0: return gcd(a, b >> 1)
if a > b: return gcd(a-b, b)
else: return gcd(a, b-a)

def algo1(keys):
result = set()
for x,y in itertools.combinations(keys, 2):
g = gcd(x,y)
if g > 1:
result.add(g)
return list(result)

def algo2(keys):
prod = reduce(lambda x,y: x*y, keys)

result = set()
for key in keys:
g = gcd(key, (prod % key**2)/key)
if g > 1:
result.add(g)
return [int(x) for x in result]

print algo1(keys)
print algo2(keys)
programmingpraxis said
March 20, 2012 at 4:47 PM
Johann: If you know Nadia, please point out this blog entry to her.

Diego: You’re making it too complicated. Let’s say the keys are K_1 through K_21. Multiply them together to produce K_prod. Then for each K_i, reduce K_prod modulo K_i squared, divide by K_i, and take the gcd with K_i; if the gcd is greater than 1, you’ve found a factor.

Manoj Senguttuvan said

March 21, 2012 at 8:30 PM

PHP CLI :

<?
$keys=array(gmp_init(708894553),gmp_init(704488409),gmp_init(705674273),gmp_init(707478271),gmp_init(710167019),gmp_init(708093251),gmp_init(702013379),gmp_init(704030867),gmp_init(708691187),gmp_init(708374743),gmp_init(712748719),gmp_init(713581951),gmp_init(704387447),gmp_init(707015741),gmp_init(704308279),gmp_init(710872423),gmp_init(707947453),gmp_init(704996429),gmp_init(706323757),gmp_init(705031051),gmp_init(714623803));
$prod=gmp_init(1);
foreach($keys as $k)
	$prod=gmp_mul($prod,$k);
foreach ($keys as $k)
{
	$val=gmp_strval(gmp_gcd($k,gmp_div(gmp_mod($prod,gmp_pow($k,2)),$k)));
	if($val!=1) print $val."\n";
}
?>

Manoj Senguttuvan said

March 21, 2012 at 8:46 PM

output:

right?

About Those Insecure GSA Keys | Irreal said
March 22, 2012 at 5:32 PM
[…] public keys are insecure. A couple days ago I noticed that the excellent Programming Praxis has a challenge based on that research and subsequent reporting in the press. This is excellent because it exposes […]

Sam Kennedy said

May 22, 2012 at 1:09 AM

I managed to do it eventually, this has to be one of the most interesting exercises I’ve completed on here, along with the spigot algorithm for pi and the Mark V. Shaney exercise.

#include <iostream>
#include <mpir.h>
#include <vector>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int_distribution.hpp>
#include <time.h>
#include <Windows.h>

using namespace std;

vector<long long> list_of_primes;

boost::random::mt19937 gen;

int sieve(){
    vector<int> consecutive;
    int p = 0;
    int i;

    cout << "Initialising...\n";

    consecutive.push_back(0);
    consecutive.push_back(0);
    consecutive.push_back(2);

    for(i = 3; i <= 50000; i+=2) {
        consecutive.push_back(i);
        consecutive.push_back(0);
    }

    while(p <= 50000) {
        if(consecutive[p] != 0) {
            p = consecutive[p];
            for(i = p; i <= 50000; i+=p) {
                if(i > p && consecutive[i] != 0) {
                    consecutive[i] = 0;
                }
            }
        }
        p++;
    }
    for(i = 0; i <= 50000; i++) {
        if(consecutive[i] != 0) {
            list_of_primes.push_back(consecutive[i]);
        }
    }
	consecutive.erase(consecutive.begin(), consecutive.end());
    return 0;
}

int main(){
	sieve();
	gen.seed(static_cast<unsigned int>(time(0)));
	vector<long long> values_for_p;
	vector<long long> values_for_q;
	vector<unsigned long long> values_for_n;
	vector<unsigned long long> values_for_d;
	vector<long long> evalues;
	mpz_t n;
	mpz_init(n);
	mpz_t p;
	mpz_init(p);
	mpz_t q;
	mpz_init(q);
	mpz_t t;
	mpz_init(t);
	mpz_t e;
	mpz_init(e);
	mpz_t d;
	mpz_init(d);
	mpz_t buf;
	mpz_init(buf);
	mpz_t buf2;
	mpz_init(buf2);
	mpz_t pmo;
	mpz_init(pmo);
	mpz_t qmo;
	mpz_init(qmo);
	mpz_t n1;
	mpz_init(n1);
	mpz_t n2;
	mpz_init(n2);
	int x;
	int i;
	int outercounter;
	int single_q;
	
	for(x = 1; x <= 6; x++){
		boost::random::uniform_int_distribution<> dist(1, 4);
		boost::random::uniform_int_distribution<> dist2(0, 5000);
		int loops = dist(gen);
		int single_p = list_of_primes[dist2(gen)];
		for(i = 1; i <= loops; i++){
			values_for_p.push_back(single_p);
		}
	}
	for(outercounter = 0; outercounter < values_for_p.size(); outercounter++){
		boost::random::uniform_int_distribution<> dist(1, 4);
		boost::random::uniform_int_distribution<> dist2(0, 5000);
		Sleep(1000);
		mpz_set_ui(p, values_for_p[outercounter]);
		single_q = dist2(gen);
		single_q = list_of_primes[single_q];
		values_for_q.push_back(single_q);
		mpz_set_ui(q, single_q);
		mpz_mul(n, p, q);
		values_for_n.push_back(mpz_get_ui(n));
		mpz_sub_ui(pmo, p, 1);
		mpz_sub_ui(qmo, q, 1);
		mpz_mul(t, pmo, qmo);
		
		for(i = 0; i < list_of_primes.size(); i++){
			mpz_set_ui(buf, list_of_primes[i]);
			mpz_gcd(buf, t, buf);
			if(mpz_cmp_ui(buf, 1) == 0 && mpz_cmp_ui(t, list_of_primes[i]) > 0){
				evalues.push_back(list_of_primes[i]);
			}
		}
		gen.seed(static_cast<unsigned int>(time(0)));
		boost::random::uniform_int_distribution<> dist3(0, (evalues.size() - 1));
		mpz_set_ui(e, evalues[dist3(gen)]);
		mpz_invert(d, e, t);
		values_for_d.push_back(mpz_get_ui(d));
		mpz_out_str(stdout, 10, n);
		cout << endl;
		evalues.erase(evalues.begin(), evalues.end());
	}
		int frontpointer;
		int backpointer;
		for(frontpointer = 0; frontpointer < values_for_n.size() - 1; frontpointer++){
			for(backpointer = frontpointer + 1; backpointer < values_for_n.size(); backpointer++){
				mpz_set_ui(n1, values_for_n[frontpointer]);
				mpz_set_ui(n2, values_for_n[backpointer]);
				mpz_gcd(buf, n1, n2);
				int tester = mpz_get_ui(buf);
				if(tester > 1){
					cout << "Found Factors For: " << values_for_n[frontpointer] << " and " << values_for_n[backpointer] << "!" << endl;
					cout << values_for_n[frontpointer] << ": p = ";
					mpz_out_str(stdout, 10, buf);
					mpz_div(buf2, n1, buf);
					cout << " q = ";
					mpz_out_str(stdout, 10, buf2);
					cout << endl;
					cout << values_for_n[backpointer] << ": p = ";
					mpz_out_str(stdout, 10, buf);
					mpz_div(buf, n2, buf);
					cout << " q = ";
					mpz_out_str(stdout, 10, buf);
					cout << endl;
				}
			}
		}

	system("pause");

	cout << "Actual Values: \n";
	for(i = 0; i < values_for_p.size(); i++){
		cout << "p: " << values_for_p[i] << " q: " << values_for_q[i] << " n: " << values_for_n[i] << endl;
	}

	system("pause");
	return 0;
}

Programming Praxis