Maxiphobic Heaps

September 28, 2010

Priority queues are a data structure in which items arrive in random order and exit in priority order; they provide the operations find-min, delete-min, insert and merge. We implemented priority queues using leftist heaps in one exercise and using pairing heaps in another exercise. In today’s exercise, we will implement priority queues using the maxiphobic heaps of Chris Okasaki.

Maxiphobic heaps are a variant of leftist heaps. Like leftist heaps, maxiphobic heaps are represented as binary trees arranged according to the heap property that every element is less than or equal to its two children. Find-min looks at the root of the tree, delete-min discards the root and merges its two children, and insert merges the existing tree with a singleton tree containing the new element.

The key to leftist trees and maxiphobic trees is the merge operation. In leftist trees, the rank of each left child is always less than the rank of its sibling, where the rank of a node is defined as the length of its right spine, and the merge operation enforces this invariant. In maxiphobic trees, the merge operation is implemented by comparing the roots of the two trees. The smaller root survives as the root of the merged tree. That leaves three sub-trees: the tree that lost in the comparison of the two roots, and the child sub-trees of the winner. They are merged by first merging the two smaller trees, where the size of a tree is determined as the number of elements it contains, then attaching that merged tree along with the remaining tree as the children of the new root. The name maxiphobic, meaning “biggest avoiding,” refers to the merge of the two smaller sub-trees.

Your task is to write functions that implement maxiphobic trees. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisement

Pages: 1 2

Alien Numbers

September 24, 2010

Today’s exercise comes to us from the Google Code Jam via Christian Harms’ blog:

Problem

The decimal numeral system is composed of ten digits, which we represent as “0123456789” (the digits in a system are written from lowest to highest). Imagine you have discovered an alien numeral system composed of some number of digits, which may or may not be the same as those used in decimal. For example, if the alien numeral system were represented as “oF8”, then the numbers one through ten would be (F, 8, Fo, FF, F8, 8o, 8F, 88, Foo, FoF). We would like to be able to work with numbers in arbitrary alien systems. More generally, we want to be able to convert an arbitrary number that’s written in one alien system into a second alien system.

Input

The first line of input gives the number of cases, N. N test cases follow. Each case is a line formatted as

alien_number source_language target_language

Each language will be represented by a list of its digits, ordered from lowest to highest value. No digit will be repeated in any representation, all digits in the alien number will be present in the source language, and the first digit of the alien number will not be the lowest valued digit of the source language (in other words, the alien numbers have no leading zeroes). Each digit will either be a number 0-9, an uppercase or lowercase letter, or one of the following symbols !”#$%&'()*+,-./:;?@[\]^_`{|}~

Output

For each test case, output one line containing “Case #x: ” followed by the alien number translated from the source language to the target language.

Your task is to write a program to solve the Google Code Jam. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Kaprekar Numbers

September 21, 2010

Wolfram’s MathWorld describes Kaprekar numbers like this:

Consider an n-digit number k. Square it and add the right n digits to the left n or n-1 digits. If the resultant sum is k, then k is called a Kaprekar number. For example, 9 is a Kaprekar number since 92 = 81 and 8 + 1 = 9 and 297 is a Kaprekar number since 2972 = 88209 and 88 + 209 = 297.

Your task is to write a function that identifies Kaprekar numbers and to determine the Kaprekar numbers less than a thousand. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

In the previous exercise, we wrote RESIDUE, which implements the first half of Morrison and Brillhart’s factorization algorithm based on continued fractions. In today’s exercise, we complete the factorization of F7 by writing the ANSWER program, which takes the factored Qn and computes a factorization of N.

The math underlying the continued fraction factorization algorithm, which we skipped in the previous exercise, is that we are searching for x and y such that x2y2 (mod N) with xy (mod N). The convergents of the continued fraction expansion of √N have the property that An−1 and Qn, as defined in the previous exercise, will indicate a factor of N whenever Qn is a perfect square, as we saw in the example of Q52 in the expansion of √13290059. But such Qn are rare, which is why the method of Lehmer and Powers never found favor. Much more common is the case that two or more Qn multiply to a perfect square, as they share some odd-power prime factors that, when multiplied together, become an even power.

Consider the example of the previous exercise. The product of Q5, Q22 and Q23 is the perfect square 2050 · 4633 · 226 = 2146468900 = 463302 = (2 · 5 · 41 · 113)2. The product of the corresponding An−1 is 171341 · 5235158 · 1914221 = 1717050890347212038 ≡ 1469504 (mod 13290059), and we have the congruence (171341 · 5235158 · 1914221)2 ≡ (2 · 5 · 41 · 113)2 (mod 13290059) or 14695042 ≡ 463302 (mod 13290059). Thus a factor of N = 13290059 is GCD(1469504 − 46330, 13290059) = 4261 and N = 3119 · 4261.

ANSWER uses the Gaussian elimination algorithm of linear algebra to find a set of Qn (Morrison and Brillhart call it an S-set) with product a perfect square. The primes in the factorizations of the Qn are 2, 31, 41, 43, 53 and 113, and we also need −1, as there is an implied factor of −1 attached to each Qn with odd n. Thus, any Qn can be represented as a vector of the powers of the exponents of its factors, modulo 2; for instance, Q5, with odd-power factors 2 and 41, is represented by the vector [1 1 0 1 0 0 0]. Associated with each exponent vector is a history vector that records our work, which initially has 0 everywhere except 1 in the ordinal position corresponding to the row with which it is associated. Here are the exponent matrix, on the left, and history matrix, on the right, corresponding to the seven Qn of our example problem; the columns of the exponent matrix are factors, left to right from low to high, starting with -1, the columns of the history matrix are Qn, left to right from low to high, and the rows of both matrices are Qn, top to bottom from low to high:

1 1 0 1 0 0 0    1 0 0 0 0 0 0
0 0 1 0 1 0 0    0 1 0 0 0 0 0
0 0 0 1 0 0 1    0 0 1 0 0 0 0
1 1 0 0 0 0 1    0 0 0 1 0 0 0
0 1 1 0 0 1 0    0 0 0 0 1 0 0
1 1 0 0 0 0 1    0 0 0 0 0 1 0
0 1 0 0 1 1 0    0 0 0 0 0 0 1

Once the exponent matrix and history matrix are established, the reduction procedure, which performs the forward step of Gaussian elimination, is done as follows:

For each column in the exponent matrix, working from right to left:

Find the “pivot” vector of smallest subscript (nearest the top) whose rightmost 1 is in the current column. If none exists, continue working on the next column to the left.

For each other vector with rightmost 1 in the current column:

Replace the exponent vector with the element-wise sum, modulo 2, of the pivot vector and the current vector.

Replace the associated history vector with the element-wise sum, modulo 2, of the pivot vector and the current vector.

The reduction of the initial matrices given above is shown below:

1 1 0 1 0 0 0    1 0 0 0 0 0 0
0 0 1 0 1 0 0    0 1 0 0 0 0 0
0 0 0 1 0 0 1    0 0 1 0 0 0 0
0 0 0 0 0 0 0    1 0 1 1 0 0 0
0 1 1 0 0 1 0    0 0 0 0 1 0 0
0 0 0 0 0 0 0    1 0 1 0 0 1 0
0 0 0 0 0 0 0    0 1 0 0 1 0 1

Each of the three rows with zero exponent vectors represents an S-set. Some of those S-sets lead to factorizations, but others fail. For instance, the S-set in the seventh row of the reduced matrix gives the congruence (6700527 · 11455708 · 3213960)2 ≡ (2 · 31 · 43 · 53)2 (mod 13290059), or 1412982 ≡ 1412982 (mod 13290059), which is useless. The S-set in the sixth row gives the congruence (171341 · 5235158 · 1895246)2 ≡ (2 · 52 · 41 · 113)2 (mod 13290059), or 130584092 ≡ 2316502 (mod 13290059); however, GCD(13058409 − 231650, 13290059) = 1 and the factorization fails. However, the S-set in the fourth row gives a completed factorization, as shown above.

Your task is to write the ANSWER program described above, and then to find the factors of F7 using the factor base and residues from the previous exercise. When you are finished, you are welcome to read, run or print a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

In the seventeenth century, the French jurist and amateur mathematician Pierre de Fermat investigated numbers of the form 22n+1, now known as Fermat numbers: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617, 340282366920938463463374607431768211457, … (Sloane’s A000215). Fermat noted that the first five numbers in the sequence, F0 to F4, were prime, and conjectured that all the remaining numbers in the sequence were prime. He was wrong, as Leonhard Euler proved in 1732 with the factorization of F5 = 641 · 6700417; then Thomas Clausen factored F6 = 274177 · 67280421310721 in 1855. Those first five numbers are the only members of the sequence that are known to be prime, and it now conjectured that all the remaining Fermat numbers are composite. There is something of a cottage industry among mathematicians and programmers to find new factors of Fermat numbers; you can find the current status of that friendly competition at http://www.prothsearch.net/fermat.html.

On September 13th, 1970, Michael A. Morrison and John Brillhart factored the seventh Fermat number: F7 = 227+1 = 2128+1 = 340282366920938463463374607431768211457 = 59649589127497217 · 5704689200685129054721. Their feat inaugurated the modern culture of factorization by computer, and their now-deprecated method was the direct predecessor of the quadratic sieve that is today the most powerful factorization method available for personal computers, able to factor numbers up to about a hundred digits (the most powerful factoring algorithm known is the number field sieve, which can factor numbers up to about 150 digits, but requires a large cluster of computers). On the fortieth anniversary of their achievement we recreate their computation in this exercise and the next. We are following their description of their work in “A Method of Factoring and the Factorization of F7” in Mathematics of Computation, Volume 29, Number 129, January 1975, Pages 183-205, available on-line at http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0371800-5/S0025-5718-1975-0371800-5.pdf.

The math behind the method dates to Fermat, who noted that a factor of N could be found when x2N was a perfect square; thus, Fermat’s factorization method started at the square root of N and worked x backwards until it reached a solution, as we did in a previous exercise. Early in the twentieth century, Maurice Kraitchik, a Ukrainian emigré and recreational mathematician living in Paris, observed that Fermat’s formula could be written as x2y2 (mod N). Then, in 1931, D. H. Lehmer and R. E. Powers discovered that the convergents of the continued fraction representation of the square root of N provided a suitable x and y, but their method was considered unusable because the calculations were too tedious and time-consuming for manual calculation, and the method frequently failed to find a factorization. In 1965, Brillhart realized that the tedious calculations were fit for computer calculation, and he and Morrison spent the summer of 1970 developing their method.

The continued fraction method works in two stages, a first stage that computes successive convergents of the continued fraction representing the square root of the number being factored, and a second stage that performs linear algebra on the factors of the convergents and computes a factor of the original number. Morrison and Brillhart, using an IBM 360/91 mainframe computer at UCLA, coding in PL/1 with assembly-language libraries for large-integer processing, divided the work into programs RESIDUE and ANSWER to make it fit into the machine. We will likewise divide the work into two exercises, looking at RESIDUE today and ANSWER in the next exercise.

We saw continued fractions previously, in the exercise on the golden ratio. Square roots can be computed by continued fractions, and are always periodic; we’ll refer to the article by Morrison and Brillhart for the math and skip directly to their algorithm to expand √N, or √kN for some suitable multiplier k ≥ 1, into a simple continued fraction:

Initialize A−2 = 0, A−1 = 1, Q−1 = kN, r−1 = g, P0 = 0, Q0 = 1, and g = ⌊√kN⌋.

For each n from 0 to a user-specified limit:

  • Compute qn and rn using the formula g + Pn = qnQn + rn, where 0 ≤ rn < Qn.
  • Compute An (mod N) using the formula AnqnAn−1 + An−2 (mod N).
  • Compute g + Pn+1 using the formula g + Pn+1 = 2grn.
  • Compute Qn+1 using the formula Qn+1 = Qn−1 + qn(rnrn−1).

Some of the Qn must be factored. That “some” sounds odd. The idea is to factor those Qn that have all their prime factors less than some pre-specified limit, said primes having a Jacobi symbol of 0 or 1 indicating that they are quadratic residues of kN and thus potential factors; we saw the Jacobi symbol in the exercise on modular arithmetic. The procedure is to choose a limit, compute a factor base of those primes that are potential factors, then use trial division to determine if the Qn should be added to the out-going list; Morrison and Brillhart call such a Qn an AQ pair.

Morrison and Brillhart, based on Lehmer and Powers before them, give this example: let N = 13290059 and k = 1; then g = 3645. Then selected results from the expansion of √kN are given below:

n g+Pn Qn qn rn An−1 (mod N) Qn factored
−1 13290059 3645 0
0 3645 1 3645 0 1
1 7290 4034 1 3256 3645 2 · 2017
2 4034 3257 1 777 3646 3257
3 6513 1555 4 293 7291 5 · 311
4 6997 1321 5 392 32810 1321
5 6898 2050 3 748 171341 2 · 52 · 41
10 6318 1333 4 986 6700527 31 · 43
22 4779 4633 1 146 5235158 41 · 113
23 7144 226 31 138 1914221 2 · 113
26 5622 3286 1 2336 11455708 2 · 31 · 53
31 6248 5650 1 598 1895246 2 · 52 · 53
40 6576 4558 1 2018 3213960 2 · 43 · 53
52 7273 25 290 23 2467124 52

The output from RESIDUE is a list of the following items for each Qn that could be factored over the factor base: n, An−1, Qn, and a list of the odd-power primes dividing Qn (thus, factors like 52 of Q31 are omitted); forty years ago, the output was given on punched cards! In the example above, the rows for Q5, Q10, Q22, Q23, Q26, Q31, and Q40 should appear in the output, based on a factor base containing 2, 5, 31, 41, 43, 53, and 113. Note that 5 is part of the factor base even though it doesn’t appear as an odd power in any of the Qn.

Your task is to write the RESIDUE function that computes a factor base and returns a list of factored Qn; we will complete the factorization of F7 in the next exercise. Apply your function to F7 using a factor base of the first 2700 suitable primes less than 60000 and a multiplier of 257. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

We conclude our series of exercises on the Data Encryption Standard with this program that provides a convenient interface to the functions we have written:

NAME

des — Data Encryption Standard

USAGE

des -d mode [-i salt] -k key [filename]
des -d mode [-i salt] -1 key1 -2 key2 [-3 key3] [filename]

des -e mode [-i salt] -k key [filename]
des -e mode [-i salt] -1 key1 -2 key2 [-3 key3] [filename]

des -h n -k key [filename]

des -p password

DESCRIPTION

Des provides 64-bit block cryptography using the Data Encryption Standard, with -e providing encryption and -d providing decryption. Mode may be ECB for Electronic Code Book, CBC for Cipher Block Chaining, CFB for Cipher Feedback, or OFB for Output Feedback. Salt (the initialization vector) should be specified for CBC, CFB and OFB modes; if no salt is given, the first 64-bit block is taken as the salt. Regular DES is performed if a single key is given by -k key, and Triple DES is performed if three keys are given; for Triple DES, the third key is optional, and defaults to key1 if not given. Keys and salt are specified by sixteen hexadecimal digits. Input may be specified by filename or on standard input, and output is written to standard output. An n-bit cryptographic hash, where 16 ≤ n ≤ 64 and n ≡ 0 (mod 8), is computed by -h, and an ascii password can be converted to a 64-bit key by -p.

REFERENCES

FIPS 46-3 — Data Encryption Standard (http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf)

FIPS 81 — DES Modes of Operation (http://www.itl.nist.gov/fipspubs/fip81.htm)

FIPS 113 — Computer Data Authentication (http://www.itl.nist.gov/fipspubs/fip113.htm)

BUGS

Padding in ECB and CBC modes, and in hashing and password generation which is based on CBC mode, is done by adding a 1-bit followed by enough 0-bits to complete the final 64-bit block. Other implementations of des may pad differently, leading to differences in the final two blocks of an encrypted file.

Though des is simple to use, it requires considerable cryptographic sophistication to use effectively.

Your task is to write the des program described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Our third exercise related to the Data Encryption Standard is simpler than the two previous exercises. We will look at Triple DES, cryptographic hashing, and password management.

Triple DES is defined in FIPS 46-3, along with regular DES. Triple DES uses three keys and the formulas CT = EK3(DK2(EK1 PT)) for encryption and PT = DK1(EK2(DK3 CT) for decryption. Triple DES is strongest when all three keys are unique, but is often used with K1 = K3, which is simpler to manage and only somewhat less secure. If K1 = K2 = K3, Triple DES is just the same as regular DES.

FIPS 113 defines cryptographic hashing using DES in CBC block mode. With the input encrypted using an initialization vector of 64 zero-bits, the hash is just the leading n bits of the final block, where 16 ≤ n ≤ 64 and n ≡ 0 (mod 8).

One application of cryptographic hashing converts an ascii plaintext password to a 64-bit key. The hash initializes using the normal zero-vector and is calculated using a single key specific to the application. Then 56 bits are taken from the hash and parity bits are inserted to form a 64-bit key.

Your task is to write Triple DES enciphering and deciphering functions, a cryptographic hash function, and a password-to-key converter. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

In the previous exercise we examined the working of the DES block cipher for a single 64-bit block. In today’s exercise we extend encryption to an entire file, following the procedures of FIPS 81.

In the descriptions that follow, Pi is the ith plain-text block, Ci is the ith cipher-text block, Ek is the encryption function with key k, Dk is the decryption function with key k, IV is the initialization vector, and ⊕ is the xor operation. There are four block modes:

  • Electronic Codebook (ECB) treats each block separately, padding the final block. The encryption algorithm is Ci = Ek(Pi) and the decryption algorithm is Pi = Dk(Ci).
  • Cipher Block Chaining (CBC) links each block to the previous one, starting from an initialization vector, padding the final block. The encryption algorithm is Ci = Ek(PiCi-1) with C0 = IV and the decryption algorithm is Pi = Dk(Ci) ⊕ Ci-1 with C0 = IV.
  • Cipher Feedback (CFB) is similar to CBC. The encryption algorithm is Ci = Ek(Ci-1) ⊕ Pi with C0 = IV and the decryption algorithm is Pi = Ek(Ci-1) ⊕ Ci with C0 = IV; note that both encryption and decryption use Ek. The final block is not padded; instead, the leading bits of Ek(Ci-1) are xor’ed with the partial block.
  • Output Feedback (OFB) is symmetric for encryption and decryption. The encryption algorithm is Ci = PiOi and the decryption algorithm is Pi = CiOi. For both encryption and decryption, Oi = Ek(Oi-1) with O0 = IV. The final block is not padded; instead, the leading bits of Ek(Oi-1) are xor’ed with the partial block.

The initialization vector is a 64-bit block given by the user as a “salt” to the cryptographic process. Padding gives the final block a length of eight bytes and can be accomplished in many ways, all of which must be reversible for either ascii or binary files. FIPS 81 specifies a method that never increases the number of blocks in the file, but requires an out-of-band indicator (say, in the message header) to specify whether or not padding was applied; that method is generally no longer used. Our padding method adds a byte of x80 followed by sufficient bytes of x00 to fill the final 64-bit block, so the message length always increases; in particular, a final block that is exactly eight bytes long causes another full 8-byte block to be added. It is simple to remove the padding; just remove all trailing x00 bytes and the immediately preceding x80 byte.

Of the four methods, ECB is probably the most-frequently used but also the least secure, since repeated blocks will be encrypted identically. Of the others, OFB is most resistant to bit-errors during transmission. If in doubt, CBC is always a good choice. Since there is no particular need for secrecy, the IV can be prepended as the first 64-bit block in the encrypted message (or the last block, or the nth block for some n agreed between sender and receiver), as long as care is taken that no IV is reused with the same key; thus, a common IV embeds the current date and time.

Your task is to write functions that encrypt and decrypt files using the four block modes described above. In the next exercise we will continue our examination of the Data Encryption Standard by looking at Triple DES, cryptographic hashing, and keying procedures. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2