September | 2010 | Programming Praxis

Maxiphobic Heaps

September 28, 2010

Priority queues are a data structure in which items arrive in random order and exit in priority order; they provide the operations find-min, delete-min, insert and merge. We implemented priority queues using leftist heaps in one exercise and using pairing heaps in another exercise. In today’s exercise, we will implement priority queues using the maxiphobic heaps of Chris Okasaki.

Maxiphobic heaps are a variant of leftist heaps. Like leftist heaps, maxiphobic heaps are represented as binary trees arranged according to the heap property that every element is less than or equal to its two children. Find-min looks at the root of the tree, delete-min discards the root and merges its two children, and insert merges the existing tree with a singleton tree containing the new element.

The key to leftist trees and maxiphobic trees is the merge operation. In leftist trees, the rank of each left child is always less than the rank of its sibling, where the rank of a node is defined as the length of its right spine, and the merge operation enforces this invariant. In maxiphobic trees, the merge operation is implemented by comparing the roots of the two trees. The smaller root survives as the root of the merged tree. That leaves three sub-trees: the tree that lost in the comparison of the two roots, and the child sub-trees of the winner. They are merged by first merging the two smaller trees, where the size of a tree is determined as the number of elements it contains, then attaching that merged tree along with the remaining tree as the children of the new root. The name maxiphobic, meaning “biggest avoiding,” refers to the merge of the two smaller sub-trees.

Your task is to write functions that implement maxiphobic trees. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

Alien Numbers

September 24, 2010

Today’s exercise comes to us from the Google Code Jam via Christian Harms’ blog:

Problem

The decimal numeral system is composed of ten digits, which we represent as “0123456789” (the digits in a system are written from lowest to highest). Imagine you have discovered an alien numeral system composed of some number of digits, which may or may not be the same as those used in decimal. For example, if the alien numeral system were represented as “oF8”, then the numbers one through ten would be (F, 8, Fo, FF, F8, 8o, 8F, 88, Foo, FoF). We would like to be able to work with numbers in arbitrary alien systems. More generally, we want to be able to convert an arbitrary number that’s written in one alien system into a second alien system.

Input

The first line of input gives the number of cases, N. N test cases follow. Each case is a line formatted as

alien_number source_language target_language

Each language will be represented by a list of its digits, ordered from lowest to highest value. No digit will be repeated in any representation, all digits in the alien number will be present in the source language, and the first digit of the alien number will not be the lowest valued digit of the source language (in other words, the alien numbers have no leading zeroes). Each digit will either be a number 0-9, an uppercase or lowercase letter, or one of the following symbols !”#$%&'()*+,-./:;?@[\]^_`{|}~

Output

For each test case, output one line containing “Case #x: ” followed by the alien number translated from the source language to the target language.

Your task is to write a program to solve the Google Code Jam. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

10 Comments »

Kaprekar Numbers

September 21, 2010

Wolfram’s MathWorld describes Kaprekar numbers like this:

Consider an n-digit number k. Square it and add the right n digits to the left n or n-1 digits. If the resultant sum is k, then k is called a Kaprekar number. For example, 9 is a Kaprekar number since 9² = 81 and 8 + 1 = 9 and 297 is a Kaprekar number since 297² = 88209 and 88 + 209 = 297.

Your task is to write a function that identifies Kaprekar numbers and to determine the Kaprekar numbers less than a thousand. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

24 Comments »

The Factorization Of F7: Part 2

September 17, 2010

In the previous exercise, we wrote RESIDUE, which implements the first half of Morrison and Brillhart’s factorization algorithm based on continued fractions. In today’s exercise, we complete the factorization of F₇ by writing the ANSWER program, which takes the factored Q_n and computes a factorization of N.

The math underlying the continued fraction factorization algorithm, which we skipped in the previous exercise, is that we are searching for x and y such that x² ≡ y² (mod N) with x ≢ y (mod N). The convergents of the continued fraction expansion of √N have the property that A_n−1 and Q_n, as defined in the previous exercise, will indicate a factor of N whenever Q_n is a perfect square, as we saw in the example of Q₅₂ in the expansion of √13290059. But such Q_n are rare, which is why the method of Lehmer and Powers never found favor. Much more common is the case that two or more Q_n multiply to a perfect square, as they share some odd-power prime factors that, when multiplied together, become an even power.

Consider the example of the previous exercise. The product of Q₅, Q₂₂ and Q₂₃ is the perfect square 2050 · 4633 · 226 = 2146468900 = 46330² = (2 · 5 · 41 · 113)². The product of the corresponding A_n−1 is 171341 · 5235158 · 1914221 = 1717050890347212038 ≡ 1469504 (mod 13290059), and we have the congruence (171341 · 5235158 · 1914221)² ≡ (2 · 5 · 41 · 113)² (mod 13290059) or 1469504² ≡ 46330² (mod 13290059). Thus a factor of N = 13290059 is GCD(1469504 − 46330, 13290059) = 4261 and N = 3119 · 4261.

ANSWER uses the Gaussian elimination algorithm of linear algebra to find a set of Q_n (Morrison and Brillhart call it an S-set) with product a perfect square. The primes in the factorizations of the Q_n are 2, 31, 41, 43, 53 and 113, and we also need −1, as there is an implied factor of −1 attached to each Q_n with odd n. Thus, any Q_n can be represented as a vector of the powers of the exponents of its factors, modulo 2; for instance, Q₅, with odd-power factors 2 and 41, is represented by the vector [1 1 0 1 0 0 0]. Associated with each exponent vector is a history vector that records our work, which initially has 0 everywhere except 1 in the ordinal position corresponding to the row with which it is associated. Here are the exponent matrix, on the left, and history matrix, on the right, corresponding to the seven Q_n of our example problem; the columns of the exponent matrix are factors, left to right from low to high, starting with -1, the columns of the history matrix are Q_n, left to right from low to high, and the rows of both matrices are Q_n, top to bottom from low to high:

1 1 0 1 0 0 0    1 0 0 0 0 0 0
0 0 1 0 1 0 0    0 1 0 0 0 0 0
0 0 0 1 0 0 1    0 0 1 0 0 0 0
1 1 0 0 0 0 1    0 0 0 1 0 0 0
0 1 1 0 0 1 0    0 0 0 0 1 0 0
1 1 0 0 0 0 1    0 0 0 0 0 1 0
0 1 0 0 1 1 0    0 0 0 0 0 0 1

Once the exponent matrix and history matrix are established, the reduction procedure, which performs the forward step of Gaussian elimination, is done as follows:

For each column in the exponent matrix, working from right to left:

Find the “pivot” vector of smallest subscript (nearest the top) whose rightmost 1 is in the current column. If none exists, continue working on the next column to the left.

For each other vector with rightmost 1 in the current column:

Replace the exponent vector with the element-wise sum, modulo 2, of the pivot vector and the current vector.

Replace the associated history vector with the element-wise sum, modulo 2, of the pivot vector and the current vector.

The reduction of the initial matrices given above is shown below:

1 1 0 1 0 0 0    1 0 0 0 0 0 0
0 0 1 0 1 0 0    0 1 0 0 0 0 0
0 0 0 1 0 0 1    0 0 1 0 0 0 0
0 0 0 0 0 0 0    1 0 1 1 0 0 0
0 1 1 0 0 1 0    0 0 0 0 1 0 0
0 0 0 0 0 0 0    1 0 1 0 0 1 0
0 0 0 0 0 0 0    0 1 0 0 1 0 1

Each of the three rows with zero exponent vectors represents an S-set. Some of those S-sets lead to factorizations, but others fail. For instance, the S-set in the seventh row of the reduced matrix gives the congruence (6700527 · 11455708 · 3213960)² ≡ (2 · 31 · 43 · 53)² (mod 13290059), or 141298² ≡ 141298² (mod 13290059), which is useless. The S-set in the sixth row gives the congruence (171341 · 5235158 · 1895246)² ≡ (2 · 5² · 41 · 113)² (mod 13290059), or 13058409² ≡ 231650² (mod 13290059); however, GCD(13058409 − 231650, 13290059) = 1 and the factorization fails. However, the S-set in the fourth row gives a completed factorization, as shown above.

Your task is to write the ANSWER program described above, and then to find the factors of F₇ using the factor base and residues from the previous exercise. When you are finished, you are welcome to read, run or print a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

1 Comment »

The Factorization Of F7: Part 1

September 14, 2010

In the seventeenth century, the French jurist and amateur mathematician Pierre de Fermat investigated numbers of the form 2^2ⁿ+1, now known as Fermat numbers: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617, 340282366920938463463374607431768211457, … (Sloane’s A000215). Fermat noted that the first five numbers in the sequence, F₀ to F₄, were prime, and conjectured that all the remaining numbers in the sequence were prime. He was wrong, as Leonhard Euler proved in 1732 with the factorization of F₅ = 641 · 6700417; then Thomas Clausen factored F₆ = 274177 · 67280421310721 in 1855. Those first five numbers are the only members of the sequence that are known to be prime, and it now conjectured that all the remaining Fermat numbers are composite. There is something of a cottage industry among mathematicians and programmers to find new factors of Fermat numbers; you can find the current status of that friendly competition at http://www.prothsearch.net/fermat.html.

On September 13th, 1970, Michael A. Morrison and John Brillhart factored the seventh Fermat number: F₇ = 2^2⁷+1 = 2¹²⁸+1 = 340282366920938463463374607431768211457 = 59649589127497217 · 5704689200685129054721. Their feat inaugurated the modern culture of factorization by computer, and their now-deprecated method was the direct predecessor of the quadratic sieve that is today the most powerful factorization method available for personal computers, able to factor numbers up to about a hundred digits (the most powerful factoring algorithm known is the number field sieve, which can factor numbers up to about 150 digits, but requires a large cluster of computers). On the fortieth anniversary of their achievement we recreate their computation in this exercise and the next. We are following their description of their work in “A Method of Factoring and the Factorization of F₇” in Mathematics of Computation, Volume 29, Number 129, January 1975, Pages 183-205, available on-line at http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0371800-5/S0025-5718-1975-0371800-5.pdf.

The math behind the method dates to Fermat, who noted that a factor of N could be found when x² – N was a perfect square; thus, Fermat’s factorization method started at the square root of N and worked x backwards until it reached a solution, as we did in a previous exercise. Early in the twentieth century, Maurice Kraitchik, a Ukrainian emigré and recreational mathematician living in Paris, observed that Fermat’s formula could be written as x² ≡ y² (mod N). Then, in 1931, D. H. Lehmer and R. E. Powers discovered that the convergents of the continued fraction representation of the square root of N provided a suitable x and y, but their method was considered unusable because the calculations were too tedious and time-consuming for manual calculation, and the method frequently failed to find a factorization. In 1965, Brillhart realized that the tedious calculations were fit for computer calculation, and he and Morrison spent the summer of 1970 developing their method.

The continued fraction method works in two stages, a first stage that computes successive convergents of the continued fraction representing the square root of the number being factored, and a second stage that performs linear algebra on the factors of the convergents and computes a factor of the original number. Morrison and Brillhart, using an IBM 360/91 mainframe computer at UCLA, coding in PL/1 with assembly-language libraries for large-integer processing, divided the work into programs RESIDUE and ANSWER to make it fit into the machine. We will likewise divide the work into two exercises, looking at RESIDUE today and ANSWER in the next exercise.

We saw continued fractions previously, in the exercise on the golden ratio. Square roots can be computed by continued fractions, and are always periodic; we’ll refer to the article by Morrison and Brillhart for the math and skip directly to their algorithm to expand √N, or √kN for some suitable multiplier k ≥ 1, into a simple continued fraction:

Initialize A₋₂ = 0, A₋₁ = 1, Q₋₁ = kN, r₋₁ = g, P₀ = 0, Q₀ = 1, and g = ⌊√kN⌋.

For each n from 0 to a user-specified limit:

Compute q_n and r_n using the formula g + P_n = q_nQ_n + r_n, where 0 ≤ r_n < Q_n.

Compute A_n (mod N) using the formula A_n ≡ q_nA_n−1 + A_n−2 (mod N).

Compute g + P_n+1 using the formula g + P_n+1 = 2g − r_n.

Compute Q_n+1 using the formula Q_n+1 = Q_n−1 + q_n(r_n −r_n−1).

Some of the Q_n must be factored. That “some” sounds odd. The idea is to factor those Q_n that have all their prime factors less than some pre-specified limit, said primes having a Jacobi symbol of 0 or 1 indicating that they are quadratic residues of kN and thus potential factors; we saw the Jacobi symbol in the exercise on modular arithmetic. The procedure is to choose a limit, compute a factor base of those primes that are potential factors, then use trial division to determine if the Q_n should be added to the out-going list; Morrison and Brillhart call such a Q_n an A − Q pair.

Morrison and Brillhart, based on Lehmer and Powers before them, give this example: let N = 13290059 and k = 1; then g = 3645. Then selected results from the expansion of √kN are given below:

n	g+P_n	Q_n	q_n	r_n	A_n−1 (mod N)	Q_n factored
−1	…	13290059	…	3645	0	…
0	3645	1	3645	0	1	…
1	7290	4034	1	3256	3645	2 · 2017
2	4034	3257	1	777	3646	3257
3	6513	1555	4	293	7291	5 · 311
4	6997	1321	5	392	32810	1321
5	6898	2050	3	748	171341	2 · 5² · 41
10	6318	1333	4	986	6700527	31 · 43
22	4779	4633	1	146	5235158	41 · 113
23	7144	226	31	138	1914221	2 · 113
26	5622	3286	1	2336	11455708	2 · 31 · 53
31	6248	5650	1	598	1895246	2 · 5² · 53
40	6576	4558	1	2018	3213960	2 · 43 · 53
52	7273	25	290	23	2467124	5²

The output from RESIDUE is a list of the following items for each Q_n that could be factored over the factor base: n, A_n−1, Q_n, and a list of the odd-power primes dividing Q_n (thus, factors like 5² of Q₃₁ are omitted); forty years ago, the output was given on punched cards! In the example above, the rows for Q₅, Q₁₀, Q₂₂, Q₂₃, Q₂₆, Q₃₁, and Q₄₀ should appear in the output, based on a factor base containing 2, 5, 31, 41, 43, 53, and 113. Note that 5 is part of the factor base even though it doesn’t appear as an odd power in any of the Q_n.

Your task is to write the RESIDUE function that computes a factor base and returns a list of factored Q_n; we will complete the factorization of F₇ in the next exercise. Apply your function to F₇ using a factor base of the first 2700 suitable primes less than 60000 and a multiplier of 257. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

1 Comment »

Data Encryption Standard: Part 4

September 10, 2010

We conclude our series of exercises on the Data Encryption Standard with this program that provides a convenient interface to the functions we have written:

NAME

des — Data Encryption Standard

USAGE

des -d mode [-i salt] -k key [filename]
des -d mode [-i salt] -1 key1 -2 key2 [-3 key3] [filename]

des -e mode [-i salt] -k key [filename]
des -e mode [-i salt] -1 key1 -2 key2 [-3 key3] [filename]

des -h n -k key [filename]

des -p password

DESCRIPTION

Des provides 64-bit block cryptography using the Data Encryption Standard, with -e providing encryption and -d providing decryption. Mode may be ECB for Electronic Code Book, CBC for Cipher Block Chaining, CFB for Cipher Feedback, or OFB for Output Feedback. Salt (the initialization vector) should be specified for CBC, CFB and OFB modes; if no salt is given, the first 64-bit block is taken as the salt. Regular DES is performed if a single key is given by -k key, and Triple DES is performed if three keys are given; for Triple DES, the third key is optional, and defaults to key1 if not given. Keys and salt are specified by sixteen hexadecimal digits. Input may be specified by filename or on standard input, and output is written to standard output. An n-bit cryptographic hash, where 16 ≤ n ≤ 64 and n ≡ 0 (mod 8), is computed by -h, and an ascii password can be converted to a 64-bit key by -p.

REFERENCES

FIPS 46-3 — Data Encryption Standard (http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf)

FIPS 81 — DES Modes of Operation (http://www.itl.nist.gov/fipspubs/fip81.htm)

FIPS 113 — Computer Data Authentication (http://www.itl.nist.gov/fipspubs/fip113.htm)

BUGS

Padding in ECB and CBC modes, and in hashing and password generation which is based on CBC mode, is done by adding a 1-bit followed by enough 0-bits to complete the final 64-bit block. Other implementations of des may pad differently, leading to differences in the final two blocks of an encrypted file.

Though des is simple to use, it requires considerable cryptographic sophistication to use effectively.

Your task is to write the des program described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

Data Encryption Standard: Part 3

September 7, 2010

Our third exercise related to the Data Encryption Standard is simpler than the two previous exercises. We will look at Triple DES, cryptographic hashing, and password management.

Triple DES is defined in FIPS 46-3, along with regular DES. Triple DES uses three keys and the formulas CT = E_K3(D_K2(E_K1 PT)) for encryption and PT = D_K1(E_K2(D_K3 CT) for decryption. Triple DES is strongest when all three keys are unique, but is often used with K1 = K3, which is simpler to manage and only somewhat less secure. If K1 = K2 = K3, Triple DES is just the same as regular DES.

FIPS 113 defines cryptographic hashing using DES in CBC block mode. With the input encrypted using an initialization vector of 64 zero-bits, the hash is just the leading n bits of the final block, where 16 ≤ n ≤ 64 and n ≡ 0 (mod 8).

One application of cryptographic hashing converts an ascii plaintext password to a 64-bit key. The hash initializes using the normal zero-vector and is calculated using a single key specific to the application. Then 56 bits are taken from the hash and parity bits are inserted to form a 64-bit key.

Your task is to write Triple DES enciphering and deciphering functions, a cryptographic hash function, and a password-to-key converter. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

Data Encryption Standard: Part 2

September 3, 2010

In the previous exercise we examined the working of the DES block cipher for a single 64-bit block. In today’s exercise we extend encryption to an entire file, following the procedures of FIPS 81.

In the descriptions that follow, P_i is the ith plain-text block, C_i is the ith cipher-text block, E_k is the encryption function with key k, D_k is the decryption function with key k, IV is the initialization vector, and ⊕ is the xor operation. There are four block modes:

Electronic Codebook (ECB) treats each block separately, padding the final block. The encryption algorithm is C_i = E_k(P_i) and the decryption algorithm is P_i = D_k(C_i).
Cipher Block Chaining (CBC) links each block to the previous one, starting from an initialization vector, padding the final block. The encryption algorithm is C_i = E_k(P_i ⊕ C_i-1) with C₀ = IV and the decryption algorithm is P_i = D_k(C_i) ⊕ C_i-1 with C₀ = IV.
Cipher Feedback (CFB) is similar to CBC. The encryption algorithm is C_i = E_k(C_i-1) ⊕ P_i with C₀ = IV and the decryption algorithm is P_i = E_k(Ci-1) ⊕ C_i with C₀ = IV; note that both encryption and decryption use E_k. The final block is not padded; instead, the leading bits of E_k(C_i-1) are xor’ed with the partial block.
Output Feedback (OFB) is symmetric for encryption and decryption. The encryption algorithm is C_i = P_i ⊕ O_i and the decryption algorithm is P_i = C_i ⊕ O_i. For both encryption and decryption, O_i = E_k(O_i-1) with O₀ = IV. The final block is not padded; instead, the leading bits of E_k(O_i-1) are xor’ed with the partial block.

The initialization vector is a 64-bit block given by the user as a “salt” to the cryptographic process. Padding gives the final block a length of eight bytes and can be accomplished in many ways, all of which must be reversible for either ascii or binary files. FIPS 81 specifies a method that never increases the number of blocks in the file, but requires an out-of-band indicator (say, in the message header) to specify whether or not padding was applied; that method is generally no longer used. Our padding method adds a byte of x80 followed by sufficient bytes of x00 to fill the final 64-bit block, so the message length always increases; in particular, a final block that is exactly eight bytes long causes another full 8-byte block to be added. It is simple to remove the padding; just remove all trailing x00 bytes and the immediately preceding x80 byte.

Of the four methods, ECB is probably the most-frequently used but also the least secure, since repeated blocks will be encrypted identically. Of the others, OFB is most resistant to bit-errors during transmission. If in doubt, CBC is always a good choice. Since there is no particular need for secrecy, the IV can be prepended as the first 64-bit block in the encrypted message (or the last block, or the nth block for some n agreed between sender and receiver), as long as care is taken that no IV is reused with the same key; thus, a common IV embeds the current date and time.

Your task is to write functions that encrypt and decrypt files using the four block modes described above. In the next exercise we will continue our examination of the Data Encryption Standard by looking at Triple DES, cryptographic hashing, and keying procedures. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

Programming Praxis

Maxiphobic Heaps

September 28, 2010

Alien Numbers

September 24, 2010

Kaprekar Numbers

September 21, 2010

The Factorization Of F7: Part 2

September 17, 2010

The Factorization Of F7: Part 1

September 14, 2010

Data Encryption Standard: Part 4

September 10, 2010

Data Encryption Standard: Part 3

September 7, 2010

Data Encryption Standard: Part 2

September 3, 2010

Categories

Archives

Archives