Elliptic Curve Factorization
July 31, 2009
In the previous exercise we wrote a small library of functions on elliptic curves. Today we will use elliptic curves to perform integer factorization. The material in this exercise is based on a discussion in the second edition of the book Introduction to Cryptography with Coding Theory by Wade Trappe and Lawrence C. Washington.
It is easiest to explain the algorithm by an example. Consider the factorization of the number n = pq = 2773. To factor n, we choose an elliptic curve at random, let’s say y2 ≡ x3 + 4x + 4 (mod 2773), where the modulus is the number to be factored. Of course, this is not really an elliptic curve, because the modulus of a real elliptic curve must be prime in order for us to compute the modular inverse required to determine the slope of the line connecting two points on the curve. It is exactly that fact that the elliptic curve algorithm exploits.
Next we choose a point on the curve; in this case, the point P = (1,3) works fine. In practice, it is easier to choose the a parameter randomly on the range 0 ≤ a < n, let the point be (1,1), then the b parameter is just –a, but you can select the parameters any way you like as long as a, b, x and y are all on the range 0 ≤ a < n and the point (x,y) is on the curve.
Now we begin a series of elliptic additions. The point 2P is (1771,705). Next we try to calculate 3P, but fail. The line through the points 2P = (1771,705) and P = (1,3) is 705 – 3 divided by 1771 – 1, the rise over the run, but gcd(1770,2773) = 59, so we cannot invert 1770 to calculate the slope of the line. We have found a point that is not on the elliptic curve. Of course, that is exactly what we were trying to do; 59 is a factor of 2773 = 59 × 47.
By the Chinese remainder theorem, the curve y2 ≡ x3 + 4x + 4 (mod 2773) can be regarded as a pair of elliptic curves, one with modulus 59 and the other with modulus 47. Point 3P exists with modulus 47 but not with modulus 59. That’s how elliptic curve factorization works, allowing us to isolate the factor 59.
So the basic elliptic curve factorization algorithm is to choose a random elliptic curve (actually, a pseudo-elliptic curve) modulo the number to be factored, and a random point on the curve, then repeatedly build up multiples of the random point until the elliptic arithmetic fails, at which point the factor can be identified. The algorithm described here is a vastly simplified form of elliptic curve factorization, and is intended to teach the basic concepts rather than create a useful program; we’ll soon see a better version of the algorithm.
Your task is to write a function that performs integer factorization using the elliptic curve algorithm described above. Use your function to calculate the factors of 455839. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Elliptic Curves
July 28, 2009
Elliptic curves are a mathematical oddity. Originally derived from elliptic integrals that measure the area under an ellipse, elliptic curves have relationships to algebra, geometry, and number theory. Elliptic curves featured in the proof of Fermat’s last theorem and are used today in integer factorization and cryptography. We’ll see applications of elliptic curves in future exercises; the task today is to work out the basic arithmetic of elliptic curves. The simple definition of an elliptic curve is that it provides all the solutions to a generalized cubic equation.
The standard form of an elliptic curve is y2 = x3 + ax + b; we can also write Ea,b. Any formula that is cubic in two variables can be rewritten into this form via suitable transformations. In addition to the x,y points defined by the formula, an elliptic curve includes a “point at infinity,” denoted by ∞, analogous to an artist’s “vanishing point” that allows him to depict a three-dimensional image on a two-dimensional canvas. We will consider only elliptic curves where the discriminant 4a3 + 27b2 is non-zero, meaning that the elliptic curve has no cusps or self-intersections.
The standard operation on an elliptic curve is the addition of two points, as shown in the diagram; this is, of course, different from vector addition of two points on a plane. Geometrically, points P and Q can be added by projecting a line between them and finding the “third point” R where the line intersects the curve, which is the negative of the point P + Q, as shown in the drawing by Jean Brette shown at right. Further, the point at infinity is the identity element for addition, so that P + ∞ = ∞ + P = P for any point P on an elliptic curve. Addition is both associative and commutative.
The addition law on elliptic curves is:
Given points P1 = (x1,y1) and P2 = (x2,y2), the point P3 = (x3,y3) = P1 + P2, where points P1, P2 and P3 lie on an elliptic curve y2 = x3 + ax + b with non-zero discriminant 4a3 + 27b2, can be calculated as
x3 = s2 – x1 – x2
y3 = y1 + s(x3 – x1)
where the slope s is
s = (3x12 + a) / (2y1)
if P1 = P2 and
s = (y1 – y2) / (x1 – x2)
otherwise.
The slope of the line connecting two points on an elliptic curve is calculated in the normal way, the difference in the rise divided by the difference in the run, except that if the two points are identical, the line “connecting” the two points is the tangent to the curve. Subtraction is addition of the negative, where the negative of (x,y) is (x,-y). Multiplication on an elliptic curve is just repeated addition, as it is with regular arithmetic.
We’ve been discussing elliptic curves over the real numbers, but the formulas work perfectly well on a finite field containing integers relative to a modulus. The division in the calculation of the slope is multiplication by the modular inverse, so the modulus must be prime. Our earlier exercise on modular arithmetic will be helpful.
A simple example may make things clear. Consider the elliptic curve y2 ≡ x3 + 4x + 4 (mod 5). There are eight points on the curve: (0,2), (0,3), (1,2), (1,3), (2,0), (4,2), (4,3) and ∞; these points can be computed by substituting 0, 1, 2, 3 and 4 for x and computing the modular square root. Note that when x = 3 there are no solutions because 27 + 12 + 4 ≡ 43 ≡ 3 (mod 5) has no square roots, and when x = 2 there is only one solution, 8 + 8 + 4 ≡ 20 ≡ 0 (mod 5), and the square root of 0 is 0 regardless of the modulus. The sum of the points (1,2) and (4,3) is (4,2), the negative of the point (1,2) is (1,3), double the point (1,2) is (2,0), 3 times the point (1,2) is (1,3), 4 times the point (1,2) is ∞, and 5 times the point (1,2) is (1,2), whence the cycle repeats.
Your task is to write a library of functions for elliptic curves on a finite field. Your library should provide functions for addition, negation, subtraction, doubling, and multiplication; you may wish to represent a normal point as a triple (x y 1) and the point at infinity as the triple (0 1 0). When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Let’s Make A Deal!
July 24, 2009
Let’s Make A Deal! was a television game show that originated in the United States in the 1960s and has since been produced throughout the world. In one of the games, prizes were placed behind three doors, and the contestant was asked to pick a door; one door hid an automobile, the other two doors hid goats. Then the host, who knows what is behind the doors, opened one of the doors that the contestant did not pick, revealing in every case a goat, and asks if you want to switch doors. Is it to your advantage to switch your choice of doors?
Your task is to write a program that simulates a large number of games and determines whether or not it is to your advantage to switch doors. What are the odds that you will win the automobile by switching? When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Pollard’s P-1 Factorization Algorithm
July 21, 2009
Fermat’s little theorem states that for any prime number , and any other number
,
. Rearranging terms, we have
, which means that
divides the left-hand side of the equation, thus
is a factor of the left-hand side.
John Pollard used this idea to develop a factoring method in 1974. His idea is to choose a very large number and see if it has any factors in common with , thus giving a factor of
. A systematic way to test many very large numbers is by taking large factorials, which have many small factors within them. Thus, Pollard’s
factorization algorithm is this:
1) Choose , which is known as the smoothness bound, and calculate the product of the primes to
by
.
2) Choose a random integer such that
.
3) Calculate . If
is strictly greater than 1, then it is a non-trivial factor of
. Otherwise, continue to the next step.
4) Calculate . If
, then
is a non-trivial factor of
. If
, go to Step 1 and choose a larger
. If
, go back to Step 2 and choose another
.
In Step 4, you can quit with failure instead of returning to Step 1 if becomes too large, where “too large” is probably somewhere around
; in that case, you will need to continue with some other factoring algorithm.
Your task is to write a function that factors integers by Pollard’s method. What are the factors of
? When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
International Mathematical Olympiad
July 17, 2009
These three problems from the International Mathematical Olympiad recently popped up on the web.
IMO 1960 Problem 01
Determine all three-digit numbers N having the property that N is divisible by 11, and N/11 is equal to the sum of the squares of the digits of N.
IMO 1962 Problem 01
Find the smallest natural number n which has the following properties:
(a) Its decimal representation has 6 as the last digit.
(b) If the last digit 6 is erased and placed in front of the remaining digits, the resulting number is four times as large as the original number n.
IMO 1963 Problem 06
Five students, A,B,C,D,E, took part in a contest. One prediction was that the contestants would finish in the order ABCDE. This prediction was very poor. In fact no contestant finished in the position predicted, and no two contestants predicted to finish consecutively actually did so. A second prediction had the contestants finishing in the order DAECB. This prediction was better. Exactly two of the contestants finished in the places predicted, and two disjoint pairs of students predicted to finish consecutively actually did so. Determine the order in which the contestants finished.
Your task is to solve the three problems. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
The Daily Cryptogram
July 14, 2009
Many newspapers publish a cryptogram each day; for instance:
P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC
The deciphered cryptogram is usually a quote from a famous author, often advice for daily life or a pithy saying. Cryptographically, these puzzles are monoalphabetic substitution ciphers, with each of the twenty-six plain-text letters systematically replaced by a cipher-text letter.
Your task is to write a program that decrypts the daily cryptogram, and use that program to read the message given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
A Golden Exercise
July 12, 2009
The exercise that will be published on Tuesday will be the fiftieth exercise published at Programming Praxis, marking our Golden Anniversary. My thanks to all you readers, and especially those who contributed code and other comments, for making this blog special. I am always delighted to hear from my readers, especially if they are contributing an idea for an exercise (hint, hint). It’s a lot of fun writing these exercises, but also a lot of work, and your contributions make it worthwhile. <grin>I can tell that the blog is starting to be successful because spam comments, which I must delete manually, now exceed ham comments.</grin>
Tuesday’s exercise will also introduce a change in the structure of the exercises. The suggested solution will be on a separate page published forty-eight hours after the exercise. That gives regular readers a chance to work on their own solutions without being influenced by the suggested solution. Comments will be attached to the exercise, not the solution.
Our first fifty exercises have developed several themes: related sequences of exercises involving prime numbers and cryptography, exercises in data structures like ternary search tries and exercises in algorithms like modular arithmetic, some exercises simple and others not-quite-so simple, and puzzles like sudoku and Feynman. The next fifty exercises will be more of the same, featuring some new sequences of related exercises, some old friends, and the odd puzzle or two. And I am always open to suggestions!
To celebrate our Golden Anniversary, Tuesday’s exercise will be somewhat more ambitious than normal, so get those compilers revving. Happy coding! And please enjoy a piece of virtual anniversary cake!
And now, may I ask for your help. I would like to move the site from wordpress.com to its own host, still running WordPress software, so that I can finally fix the theme and, more importantly, automate some processes that I am currently doing by hand (and some others I am not doing at all). But I have never blogged before, and have no experience with hosting, or migrating a domain registration, or maintaining a WordPress installation. If some reader would like to offer advice, please contact me at programmingpraxis@gmail.com.
The Golden Ratio
July 10, 2009
I was helping my daughter with her math homework recently, and came across this problem in continued fractions:
This fraction can be considered as a sequence of terms
Or, in general
The first ten elements of the sequence are 1, 2, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, 55/34, and 89/55.
Your task is to write a program that evaluates the nth element of the sequence. What is the value of ? When you are finished, you may read or run a suggested solution, or post your own solution or discuss the exercise in the comments below.
Modular Arithmetic
July 7, 2009
Modular arithmetic is a branch of number theory that is useful in its own right and as a tool in such disciplines as integer factorization, calendrical and astronomical calculations, and cryptography. Modular arithmetic was systematized by Carl Friedrich Gauss in his book Disquisitiones Arithmeticae, published in 1801.
Modular arithmetic is a system of arithmetic for integers in which all results are expressed as non-negative integers less than the modulus. A familiar use of modular arithmetic is the twelve-hour clock, in which eight hours after nine o’clock is five o’clock; that is, 9 + 8 ≡ 5 (mod 12). The ≡ sign used in modular arithmetic specifies congruence, not equality. Two numbers are congruent in a given modulus if the difference between them is an even multiple of the modulus; for instance 17 ≡ 5 (mod 12). Modular addition is like regular addition, but “wraps around” at the modulus. Modular subtraction is the inverse operation of modular addition, and modular multiplication is just repeated modular addition; for instance 4 − 9 ≡ 7 (mod 12), and 3 × 7 ≡ 9 (mod 12).
Modular division is the inverse operation of modular multiplication, just as regular division is the inverse operation of regular multiplication. In modular arithmetic, the modular inverse of an integer p is that integer q for which p × q ≡ 1 (mod n); the modular inverse is undefined, just as division by zero is undefined, unless the target number is coprime to the modulus. The extended Euclidean algorithm calculates the inverse; given ax + by = g = gcd(x, y), when g = 1 and x > 0, b (mod x) and a (mod y) are the inverses of y (mod x) and x (mod y), respectively. Then modular division is just modular multiplication by the inverse. For instance, 9 ÷ 7 ≡ 3 (mod 12).
Exponentiation in modular arithmetic is repeated modular multiplication, just as exponentiation is repeated multiplication in normal arithmetic. Modular exponentiation could be performed by first computing the exponentiation and then performing the modulo operation, but that could involve a large intermediate result. A better method performs the modulo operations at each multiplication, squaring when possible to reduce the number of intermediate multiplications.
Square root is defined in modular arithmetic as that number which, when multiplied by itself, equals the target number, just as in normal arithmetic. And just as in normal arithmetic, every square root has a conjugate on the “other side” of zero, except that in modular arithmetic there is no “other side” of zero, so the two conjugates are the negatives of each other, adding to the modulus. For instance the square roots of 18 (mod 31) are 7 and 24, since 7 × 7 ≡ 18 (mod 31) and 24 × 24 ≡ 18 (mod 31), and 7 + 24 = 31.
Square root is a more difficult concept in modular arithmetic than regular arithmetic, because there are many cases where the modular square root does not exist. For instance, consider the squares x2 (mod 13), 0 ≤ x < 13: 0, 1, 4, 9, 3, 12, 10, 10, 12, 3, 9, 4, 1. There is no number that can be squared modulo 13 to evaluate to 7, so the square root of 7 modulo 13 does not exist; the only numbers that have square roots modulo 13 are 1, 3, 4, 9, 10, and 12, or, equivalently, ±1, ±3, and ±4. Another restriction is that the modular square root is only defined if the modulus is an odd prime. For instance, consider the squares x2 (mod 15), 0 ≤ x < 15: 0, 1, 4, 9, 1, 10, 6, 4, 4, 6, 10, 1, 9, 4, 1. The number 4 has two sets of conjugate square roots: ±2 and ±7. Since the solution is not unique, the modular square root can be said not to exist when the modulus is composite.
If you need a refresher on the math that supports modular arithmetic, you can look at any number theory textbook, or search the internet for terms such as modular arithmetic, bezout identity, modular inverse, jacobi symbol, and quadratic residue.
Your task is to write a library that provides convenient access to the basic operations of modular arithmetic: addition, subtraction, multiplication, division, exponentiation, and square root. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
The Playfair Cipher
July 3, 2009
One of the most famous of the classical ciphers was the Playfair cipher, invented by Sir Charles Wheatstone in 1854 but popularized by Lord Lyon Playfair. The cipher was used by the British during the Boer War and World War I, and was still in use as a field-grade cipher as late as World War II, probably because it is simple to teach, fast to use, requires no special equipment, and was reasonably secure by the standards of the day; by the time enemy cryptanalysts could break the message, its contents were useless to them.
Playfair uses a 5-square block containing the pass-phrase; the letter J is mapped to I to make the 26-letter alphabet fit the 5-square block. The pass-phrase is first filled in to the 5-square block, then the remaining letters of the alphabet complete the 5-square block, with duplicates removed. For instance, the pass-phrase PLAYFAIR leads to this 5-square block:
P L A Y F
I R B C D
E G H K M
N O Q S T
U V W X Z
Some variants of Playfair omit Q instead of J, or start the key from the bottom-right corner and work backwards; we’ll stick with the standard method.
The message to be encrypted is split into digrams, with J mapped to I. In addition, no digram may use the same letter twice, so an X is inserted if needed (some Playfair variants use Z or Q instead of X). Likewise, if the last digram has only a single letter, an X is added to the end of the message. For instance, the plain-text PROGRAMMING PRAXIS is split as:
PR OG RA MX MI NG PR AX IS
Then each digram is enciphered separately according to the following rules:
- If the two letters of the digram are in the same row, they are replaced pairwise by the letters to their immediate right, wrapping around to the left of the row if needed.
- If the two letters of the digram are in the same column, they are replaced pairwise by the letters immediately below, wrapping around to the top of the column if needed.
- Otherwise, the first letter of the digram is replaced by the letter in the same row as the first letter of the digram and the same column as the second letter of the digram, and the second letter of the digram is replaced by the letter in the same row as the second letter of the digram and the same column as the first letter of the digram.
For instance, the PR digram is enciphered as LI, because P and R are in different rows and columns, and the OG digram is enciphered as VO, since O and G are in the same column. The complete encipherment is shown below:
PR OG RA MX MI NG PR AX IS
LI VO BL KZ ED OE LI YW CN
The coded message is thus LIVOBLKZEDOELIYWCN. Decryption is simply the inverse of encryption.
Playfair is rather more secure than simpler substitution ciphers because it works with digrams rather than single letters, rendering simple frequency analysis moot. Frequency analysis on digrams is possible, but requires considerably more cipher-text than frequency analysis on single letters because Playfair admits 600 digrams whereas simple substitution admits only 26 letters. Manual cryptanalysis looks at commonly-reversed digrams such as REceivER and DEcodED; if some plain-text is known (such as a standard message header), it is easy to recover at least a partial key.
The most famous use of the Playfair cipher came in 1943. David Kahn writes in The Codebreakers:
Perhaps the most famous cipher of 1943 involved the future president of the U.S., J. F. Kennedy, Jr. On 2 August 1943, Australian Coastwatcher Lieutenant Arthur Reginald Evans of the Royal Australian Naval Volunteer Reserve saw a pinpoint of flame on the dark waters of Blackett Strait from his jungle ridge on Kolombangara Island, one of the Solomons. He did not know that the Japanese destroyer Amagiri had rammed and sliced in half an American patrol boat PT-109, under the command of Lieutenant John F. Kennedy, United States Naval Reserve. Evans received the following message at 0930 on the morning of the 2 of August 1943:
KXJEY UREBE ZWEHE WRYTU HEYFS
KREHE GOYFI WTTTU OLKSY CAJPO
BOTEI ZONTX BYBWT GONEY CUZWR
GDSON SXBOU YWRHE BAAHY USEDQThe translation:
PT BOAT ONE OWE NINE LOST IN ACTION IN BLACKETT
STRAIT TWO MILES SW MERESU COVE X CREW OF TWELVE
X REQUEST ANY INFORMATION.The coastwatchers regularly used the Playfair system. Evans deciphered it with the key ROYAL NEW ZEALAND NAVY and learned of Kennedy’s fate. […] About ten hours later, at 10:00 p.m. Kennedy and his crew was rescued.
Your task is to write functions that encipher and decipher messages using the Playfair cipher. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.