The Factorization Of F7: Part 1
September 14, 2010
In the seventeenth century, the French jurist and amateur mathematician Pierre de Fermat investigated numbers of the form 22n+1, now known as Fermat numbers: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617, 340282366920938463463374607431768211457, … (Sloane’s A000215). Fermat noted that the first five numbers in the sequence, F0 to F4, were prime, and conjectured that all the remaining numbers in the sequence were prime. He was wrong, as Leonhard Euler proved in 1732 with the factorization of F5 = 641 · 6700417; then Thomas Clausen factored F6 = 274177 · 67280421310721 in 1855. Those first five numbers are the only members of the sequence that are known to be prime, and it now conjectured that all the remaining Fermat numbers are composite. There is something of a cottage industry among mathematicians and programmers to find new factors of Fermat numbers; you can find the current status of that friendly competition at http://www.prothsearch.net/fermat.html.
On September 13th, 1970, Michael A. Morrison and John Brillhart factored the seventh Fermat number: F7 = 227+1 = 2128+1 = 340282366920938463463374607431768211457 = 59649589127497217 · 5704689200685129054721. Their feat inaugurated the modern culture of factorization by computer, and their now-deprecated method was the direct predecessor of the quadratic sieve that is today the most powerful factorization method available for personal computers, able to factor numbers up to about a hundred digits (the most powerful factoring algorithm known is the number field sieve, which can factor numbers up to about 150 digits, but requires a large cluster of computers). On the fortieth anniversary of their achievement we recreate their computation in this exercise and the next. We are following their description of their work in “A Method of Factoring and the Factorization of F7” in Mathematics of Computation, Volume 29, Number 129, January 1975, Pages 183-205, available on-line at http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0371800-5/S0025-5718-1975-0371800-5.pdf.
The math behind the method dates to Fermat, who noted that a factor of N could be found when x2 – N was a perfect square; thus, Fermat’s factorization method started at the square root of N and worked x backwards until it reached a solution, as we did in a previous exercise. Early in the twentieth century, Maurice Kraitchik, a Ukrainian emigré and recreational mathematician living in Paris, observed that Fermat’s formula could be written as x2 ≡ y2 (mod N). Then, in 1931, D. H. Lehmer and R. E. Powers discovered that the convergents of the continued fraction representation of the square root of N provided a suitable x and y, but their method was considered unusable because the calculations were too tedious and time-consuming for manual calculation, and the method frequently failed to find a factorization. In 1965, Brillhart realized that the tedious calculations were fit for computer calculation, and he and Morrison spent the summer of 1970 developing their method.
The continued fraction method works in two stages, a first stage that computes successive convergents of the continued fraction representing the square root of the number being factored, and a second stage that performs linear algebra on the factors of the convergents and computes a factor of the original number. Morrison and Brillhart, using an IBM 360/91 mainframe computer at UCLA, coding in PL/1 with assembly-language libraries for large-integer processing, divided the work into programs RESIDUE and ANSWER to make it fit into the machine. We will likewise divide the work into two exercises, looking at RESIDUE today and ANSWER in the next exercise.
We saw continued fractions previously, in the exercise on the golden ratio. Square roots can be computed by continued fractions, and are always periodic; we’ll refer to the article by Morrison and Brillhart for the math and skip directly to their algorithm to expand √N, or √kN for some suitable multiplier k ≥ 1, into a simple continued fraction:
Initialize A−2 = 0, A−1 = 1, Q−1 = kN, r−1 = g, P0 = 0, Q0 = 1, and g = ⌊√kN⌋.
For each n from 0 to a user-specified limit:
- Compute qn and rn using the formula g + Pn = qnQn + rn, where 0 ≤ rn < Qn.
- Compute An (mod N) using the formula An ≡ qnAn−1 + An−2 (mod N).
- Compute g + Pn+1 using the formula g + Pn+1 = 2g − rn.
- Compute Qn+1 using the formula Qn+1 = Qn−1 + qn(rn −rn−1).
Some of the Qn must be factored. That “some” sounds odd. The idea is to factor those Qn that have all their prime factors less than some pre-specified limit, said primes having a Jacobi symbol of 0 or 1 indicating that they are quadratic residues of kN and thus potential factors; we saw the Jacobi symbol in the exercise on modular arithmetic. The procedure is to choose a limit, compute a factor base of those primes that are potential factors, then use trial division to determine if the Qn should be added to the out-going list; Morrison and Brillhart call such a Qn an A − Q pair.
Morrison and Brillhart, based on Lehmer and Powers before them, give this example: let N = 13290059 and k = 1; then g = 3645. Then selected results from the expansion of √kN are given below:
| n | g+Pn | Qn | qn | rn | An−1 (mod N) | Qn factored |
| −1 | … | 13290059 | … | 3645 | 0 | … |
| 0 | 3645 | 1 | 3645 | 0 | 1 | … |
| 1 | 7290 | 4034 | 1 | 3256 | 3645 | 2 · 2017 |
| 2 | 4034 | 3257 | 1 | 777 | 3646 | 3257 |
| 3 | 6513 | 1555 | 4 | 293 | 7291 | 5 · 311 |
| 4 | 6997 | 1321 | 5 | 392 | 32810 | 1321 |
| 5 | 6898 | 2050 | 3 | 748 | 171341 | 2 · 52 · 41 |
| 10 | 6318 | 1333 | 4 | 986 | 6700527 | 31 · 43 |
| 22 | 4779 | 4633 | 1 | 146 | 5235158 | 41 · 113 |
| 23 | 7144 | 226 | 31 | 138 | 1914221 | 2 · 113 |
| 26 | 5622 | 3286 | 1 | 2336 | 11455708 | 2 · 31 · 53 |
| 31 | 6248 | 5650 | 1 | 598 | 1895246 | 2 · 52 · 53 |
| 40 | 6576 | 4558 | 1 | 2018 | 3213960 | 2 · 43 · 53 |
| 52 | 7273 | 25 | 290 | 23 | 2467124 | 52 |
The output from RESIDUE is a list of the following items for each Qn that could be factored over the factor base: n, An−1, Qn, and a list of the odd-power primes dividing Qn (thus, factors like 52 of Q31 are omitted); forty years ago, the output was given on punched cards! In the example above, the rows for Q5, Q10, Q22, Q23, Q26, Q31, and Q40 should appear in the output, based on a factor base containing 2, 5, 31, 41, 43, 53, and 113. Note that 5 is part of the factor base even though it doesn’t appear as an odd power in any of the Qn.
Your task is to write the RESIDUE function that computes a factor base and returns a list of factored Qn; we will complete the factorization of F7 in the next exercise. Apply your function to F7 using a factor base of the first 2700 suitable primes less than 60000 and a multiplier of 257. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Data Encryption Standard: Part 4
September 10, 2010
We conclude our series of exercises on the Data Encryption Standard with this program that provides a convenient interface to the functions we have written:
NAME
des— Data Encryption StandardUSAGE
des-dmode [-isalt]-kkey [filename]
des-dmode [-isalt]-1key1-2key2 [-3key3] [filename]
des-emode [-isalt]-kkey [filename]
des-emode [-isalt]-1key1-2key2 [-3key3] [filename]
des-hn-kkey [filename]
des-ppasswordDESCRIPTION
Desprovides 64-bit block cryptography using the Data Encryption Standard, with-eproviding encryption and-dproviding decryption. Mode may beECBfor Electronic Code Book,CBCfor Cipher Block Chaining,CFBfor Cipher Feedback, orOFBfor Output Feedback. Salt (the initialization vector) should be specified forCBC,CFBandOFBmodes; if no salt is given, the first 64-bit block is taken as the salt. Regular DES is performed if a single key is given by-kkey, and Triple DES is performed if three keys are given; for Triple DES, the third key is optional, and defaults to key1 if not given. Keys and salt are specified by sixteen hexadecimal digits. Input may be specified by filename or on standard input, and output is written to standard output. An n-bit cryptographic hash, where 16 ≤ n ≤ 64 and n ≡ 0 (mod 8), is computed by-h, and an ascii password can be converted to a 64-bit key by-p.REFERENCES
FIPS 46-3 — Data Encryption Standard (http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf)
FIPS 81 — DES Modes of Operation (http://www.itl.nist.gov/fipspubs/fip81.htm)
FIPS 113 — Computer Data Authentication (http://www.itl.nist.gov/fipspubs/fip113.htm)
BUGS
Padding in
ECBandCBCmodes, and in hashing and password generation which is based onCBCmode, is done by adding a 1-bit followed by enough 0-bits to complete the final 64-bit block. Other implementations ofdesmay pad differently, leading to differences in the final two blocks of an encrypted file.Though
desis simple to use, it requires considerable cryptographic sophistication to use effectively.
Your task is to write the des program described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Data Encryption Standard: Part 3
September 7, 2010
Our third exercise related to the Data Encryption Standard is simpler than the two previous exercises. We will look at Triple DES, cryptographic hashing, and password management.
Triple DES is defined in FIPS 46-3, along with regular DES. Triple DES uses three keys and the formulas CT = EK3(DK2(EK1 PT)) for encryption and PT = DK1(EK2(DK3 CT) for decryption. Triple DES is strongest when all three keys are unique, but is often used with K1 = K3, which is simpler to manage and only somewhat less secure. If K1 = K2 = K3, Triple DES is just the same as regular DES.
FIPS 113 defines cryptographic hashing using DES in CBC block mode. With the input encrypted using an initialization vector of 64 zero-bits, the hash is just the leading n bits of the final block, where 16 ≤ n ≤ 64 and n ≡ 0 (mod 8).
One application of cryptographic hashing converts an ascii plaintext password to a 64-bit key. The hash initializes using the normal zero-vector and is calculated using a single key specific to the application. Then 56 bits are taken from the hash and parity bits are inserted to form a 64-bit key.
Your task is to write Triple DES enciphering and deciphering functions, a cryptographic hash function, and a password-to-key converter. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Data Encryption Standard: Part 2
September 3, 2010
In the previous exercise we examined the working of the DES block cipher for a single 64-bit block. In today’s exercise we extend encryption to an entire file, following the procedures of FIPS 81.
In the descriptions that follow, Pi is the ith plain-text block, Ci is the ith cipher-text block, Ek is the encryption function with key k, Dk is the decryption function with key k, IV is the initialization vector, and ⊕ is the xor operation. There are four block modes:
- Electronic Codebook (ECB) treats each block separately, padding the final block. The encryption algorithm is Ci = Ek(Pi) and the decryption algorithm is Pi = Dk(Ci).
- Cipher Block Chaining (CBC) links each block to the previous one, starting from an initialization vector, padding the final block. The encryption algorithm is Ci = Ek(Pi ⊕ Ci-1) with C0 = IV and the decryption algorithm is Pi = Dk(Ci) ⊕ Ci-1 with C0 = IV.
- Cipher Feedback (CFB) is similar to CBC. The encryption algorithm is Ci = Ek(Ci-1) ⊕ Pi with C0 = IV and the decryption algorithm is Pi = Ek(Ci-1) ⊕ Ci with C0 = IV; note that both encryption and decryption use Ek. The final block is not padded; instead, the leading bits of Ek(Ci-1) are xor’ed with the partial block.
- Output Feedback (OFB) is symmetric for encryption and decryption. The encryption algorithm is Ci = Pi ⊕ Oi and the decryption algorithm is Pi = Ci ⊕ Oi. For both encryption and decryption, Oi = Ek(Oi-1) with O0 = IV. The final block is not padded; instead, the leading bits of Ek(Oi-1) are xor’ed with the partial block.
The initialization vector is a 64-bit block given by the user as a “salt” to the cryptographic process. Padding gives the final block a length of eight bytes and can be accomplished in many ways, all of which must be reversible for either ascii or binary files. FIPS 81 specifies a method that never increases the number of blocks in the file, but requires an out-of-band indicator (say, in the message header) to specify whether or not padding was applied; that method is generally no longer used. Our padding method adds a byte of x80 followed by sufficient bytes of x00 to fill the final 64-bit block, so the message length always increases; in particular, a final block that is exactly eight bytes long causes another full 8-byte block to be added. It is simple to remove the padding; just remove all trailing x00 bytes and the immediately preceding x80 byte.
Of the four methods, ECB is probably the most-frequently used but also the least secure, since repeated blocks will be encrypted identically. Of the others, OFB is most resistant to bit-errors during transmission. If in doubt, CBC is always a good choice. Since there is no particular need for secrecy, the IV can be prepended as the first 64-bit block in the encrypted message (or the last block, or the nth block for some n agreed between sender and receiver), as long as care is taken that no IV is reused with the same key; thus, a common IV embeds the current date and time.
Your task is to write functions that encrypt and decrypt files using the four block modes described above. In the next exercise we will continue our examination of the Data Encryption Standard by looking at Triple DES, cryptographic hashing, and keying procedures. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Data Encryption Standard: Part 1
August 31, 2010
The Data Encryption Standard has been one of the most successful ciphers in history, and is still in use today, especially in its Triple DES variant. The Data Encryption Standard is officially described by FIPS 46-3, though if you are not fond of reading algorithm descriptions written by government lawyers there are many other descriptions available on the internet.
DES is a block cipher, operating on 64 bits at a time. Here is an example:
PT P r o g P r a x
HEX 50 72 6F 67 50 72 61 78
KEY 01 23 45 67 89 AB CD EF
CT CC 99 EA 46 B1 6E 28 90
There is more than one way to encrypt a message longer than 64 bits; we will examine them in a later exercise.
Your task is to write the code to encipher and decipher a single 64-bit block using the Data Encryption Standard. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Chinese Remainders
August 27, 2010
In ancient China, two thousand years ago, a general wanted to count his troops. He first had them line up in ranks of eleven, and there were ten troops left over in the last rank. Then he had his troops line up in ranks of twelve, and there were four left over in the last rank. Finally he had them line up in ranks of thirteen, and there were twelve troops remaining in the last rank.
Your task is to determine how many troops the general had under his command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Daniel Shanks’ Square Form Factorization Algorithm
August 24, 2010
In the mid-1970s, Daniel Shanks exploited the binary quadratic forms introduced by Legendre in the late eighteenth century and perfected by Gauss in the early nineteenth century to create a method of factoring integers. Binary quadratic forms are expressions of the form Ax2 + Bxy + Cy2 with A, B and C integers, normally represented as the triple (A, B, C). Binary quadratic forms are often called square forms.
We won’t discuss the mathematics of square forms, but we do need to know how to reduce a square form. A discriminant delta is calculated as Δ = b2 − 4ac. A single reduction step is called a rho reduction, and is given by ρ(a, b, c) = (c, r, (r2−Δ)/4c) where r is that unique integer b mod 2c such that −|c| < r <= |c|, if √Δ < |c|, or √Δ − 2|c| < r < √Δ, if |c| < √Δ.
A square form is in reduced form if |√Δ−2|a|| < b < √Δ; this reduction is accomplished by continually applying ρ until the reduced form is achieved.
Shanks’ method uses two loops. First, an initial square form is calculated, based on the number to be factored, which must be odd, composite, and not a perfect square; the details of that initialization are tedious, and appear in the solution on the next page. Then the square form is repeatedly reduced (this is the “forward” loop) until it is in proper form, which is described below. Then the reduced inverse square root of the square form calculated in the forward loop is reduced in a “backward” loop until a factor is identified when two successive square forms (A, B, C=c2) have the same B; the factor is then c, if c is odd, or c/2, if c is even. The reduced inverse square root of (A, B, C=c2) is initially (−c, B, −cA), which is then put into reduced form as described above.
Shanks defined a square form as proper using a queue of values derived in a rather complicated way. We will use a simpler method known as a sufficient list. Let L be the integer square root of the square root of the discriminant. If a square form (A, B, C) is encountered with |C| < L with C even, or |C| < L/2 with C odd, then C is placed on the sufficient list. Then a square form (A, B, C=c2) is proper if c is not on the sufficient list.
Your task is to write a function to factor integers using Shanks’ square form algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Marriage Sort
August 20, 2010
Today’s exercise is about a relatively new sorting algorithm. We start with an article Optimizing Your Wife by Kevin Brown, which proposes that the best way for a man to find a wife is to decide how many women he is willing to date before he chooses a wife, we’ll call that N, determine which of the first √N women is “best,” according to whatever matters to him, and then choose the next woman after the first √N that is better than any of the first √N women. For instance, to find the marriageable woman in a batch of a hundred, date ten of them, then marry the next one that is better than any of those ten. You may not find the optimal woman, but you’ll be close.
Eric Burnett turned Brown’s idea into a sorting algorithm. First, sample the first √N values at the beginning of an array, then swap any of the remaining values that are better than the greatest value of the sample to the end of the array, swap the greatest value of the sample just before those at the end, then recur on the smaller array before those greatest values. Finish the sort by performing insertion sort on the entire array; that will be quick, since most values are near their final positions.
Burnett’s algorithm requires three pointers: the current location of the end of the sample, the current location of the end of the array still under consideration, and a pointer that sweeps through the array. The time complexity is O(n1.5), which is similar to other sorting methods like shell sort and comb sort that have a first stage that nearly sorts the input followed by insertion sort to clean up the rest.
Your task is to write a function that sorts an array using marriage sort. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Cut
August 17, 2010
Unix V7 provided a utility called cut that reads its input a line at a time and selectively copies portions of each input line to standard output. Portions are selected either by character position or by character-delimited field. Cut is invoked as cut -clist [file …] or cut -flist [-dchar] [file …].
Character mode, invoked with the -c option, retains in the output those character positions that are mentioned in the list, which may contain column numbers, or ranges of column numbers separated by a dash, all separated by commas; counting starts from one. Field mode, invoked with the -f option, specifies a list of fields in a similar manner to character mode; fields are delimited by tab characters unless the field delimiter is changed by the -d option.
For example, the command cut -f1,3 -d: /etc/passwd prints user names and userid numbers from the password file.
Your task is to write a program to implement cut. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
E
August 13, 2010
This puzzle made the rounds of the programmer mathematician blogs a while ago:
Given a random number generator that returns a real number between zero and one, how many random numbers must be selected, on average, to make their accumulated total greater than one?
Your task is to write a program that answers the question above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.