June | 2011 | Programming Praxis

Big Numbers: Functions

June 28, 2011

In today’s exercise we complete the big number library that we have been building over the past several exercises with a handful of useful functions. Big-gcd finds the greatest common divisor of two big numbers. Big-expt raises a big number to a power. Big-expm raises a big number to a power, modulo another big number. Big-sqrt calculates the square root of a big number. Big-rand returns a random big number, and optionally resets the seed of the random number generator. Integer->big and big->integer are optional; if the underlying language supports big numbers natively, these functions convert between our big numbers and native big numbers.
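As a rough illustration of three of these functions, here is a minimal sketch that works on Python's native integers rather than on the library's own big-number representation. The names `gcd`, `expm`, and `isqrt` are illustrative only, and taking "square root" to mean the integer (floor) square root is an assumption.

```python
def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return abs(a)

def expm(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring (exponent >= 0)."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                 # low bit set: multiply this power in
            result = result * base % modulus
        base = base * base % modulus     # square for the next bit
        exponent >>= 1
    return result

def isqrt(n):
    """Floor of the square root of n, by Newton's method on integers."""
    if n < 0:
        raise ValueError("square root of a negative number")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)  # initial guess, never below sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:                        # no further decrease: x is the answer
            return x
        x = y
```

Repeated squaring keeps every intermediate product below the square of the modulus, which is why a separate modular power function is worth having alongside plain exponentiation.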

Your task is to complete the big number library. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

Thank God It’s Friday!

June 24, 2011

We have previously given two algorithms to calculate the day of the week, one in the Standard Prelude and one in the exercise on Zeller’s Congruence. In today’s exercise we give three more algorithms to calculate the day of the week.

Our first method is due to Carl Gauss, and is based on moving January and February to the end of the preceding year, then fitting a straight line through the number of days in each month. Gauss gives the formula $w = \left( d + \lfloor 2.6 m - 0.2 \rfloor + y + \lfloor \frac{y}{4} \rfloor + \lfloor \frac{c}{4} \rfloor - 2c \right) \pmod{7}$ where Y is the input year, except that it is reduced by 1 in January and February, d is the day of the month, m is the number of the month, with 1 for March through 12 for February, y is the last two digits of Y, c is the first two digits of Y, and w is the day of the week, with 0 for Sunday through 6 for Saturday. For instance, June 24, 2011 is calculated as 24 + floor(2.6×4−0.2) + 11 + floor(11÷4) + floor(20÷4) – 2×20, modulo 7, which is 5 for Friday.
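Gauss's formula translates almost directly into code. The sketch below is one possible rendering, with a hypothetical function name; it replaces the floating-point expression floor(2.6m − 0.2) with the equivalent integer expression (26m − 2) div 10 so that rounding cannot interfere.

```python
def gauss_weekday(year, month, day):
    """Day of the week by Gauss's formula: 0 = Sunday ... 6 = Saturday."""
    # January and February count as months 11 and 12 of the preceding year.
    if month < 3:
        year -= 1
    m = (month - 3) % 12 + 1       # 1 = March ... 12 = February
    c, y = divmod(year, 100)       # first two and last two digits of Y
    # floor(2.6*m - 0.2), computed in integers to avoid float rounding
    month_term = (26 * m - 2) // 10
    return (day + month_term + y + y // 4 + c // 4 - 2 * c) % 7
```

The sum inside the parentheses can be negative; Python's `%` operator always returns a non-negative result for a positive modulus, so no extra adjustment is needed here, though other languages may require one.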

Our second method is due to Tomohiko Sakamoto, who gives a table of offsets for the days of each month from the day at the start of the year: t = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4}. Then Sakamoto subtracts 1 from the input year in January and February and calculates the day of the week with the formula $\left( y + \lfloor \frac{y}{4} \rfloor - \lfloor \frac{y}{100} \rfloor + \lfloor \frac{y}{400} \rfloor + t[m-1] + d \right) \pmod{7}$ . For instance, June 24, 2011 is calculated as 2011 + floor(2011÷4) − floor(2011÷100) + floor(2011÷400) + t[6−1] + 24 = 2011 + 502 – 20 + 5 + 3 + 24 = 2525, modulo 7, which is 5 for Friday.
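Sakamoto's method is short enough to sketch in a few lines. The function name below is illustrative; the table and formula are the ones given above.

```python
# Offsets for the first day of each month, January first.
T = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]

def sakamoto_weekday(year, month, day):
    """Day of the week by Sakamoto's method: 0 = Sunday ... 6 = Saturday."""
    # January and February are treated as part of the preceding year.
    if month < 3:
        year -= 1
    return (year + year // 4 - year // 100 + year // 400
            + T[month - 1] + day) % 7
```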

Our third method is due to John Horton Conway, and is intended for mental calculation. Conway’s method is based on calculating the anchor day for the requested century, the doomsday for the requested year, and interpolating from various repetitions of the doomsday through the year.

The anchor day is calculated as $\left( 5c + \lfloor \frac{c-1}{4} \rfloor \right) \pmod{7} + Thursday$ , where c is the century; note that century years, such as 2000, are part of the succeeding century, so c=21 for the year 2000. For example, the anchor day for the 21st century is Tuesday, calculated as 5×21 + floor(20÷4), modulo 7, plus Thursday. Anchor days repeat every four centuries, in the cycle Friday, Wednesday, Tuesday, Sunday starting from the year 1800.

The doomsday is calculated by dividing the last two digits of the year by 12 to calculate the quotient and remainder. Then the doomsday is the quotient, plus the remainder, plus the quotient of the remainder divided by 4, plus the anchor day for the century. For example, the doomsday for 2011 is Monday, calculated as 0 + 11 + floor(11÷4) = 13, which is 6 modulo 7, plus the anchor day Tuesday.

Once the doomsday is known, the day of the week is calculated by locating the nearest doomsday in each month, which can be memorized in the following manner: For April, June, August, October, and December, the doomsday is the month number: 4/4, 6/6, 8/8, 10/10, and 12/12. For May, July, September, and November, the doomsday can be calculated by the ditty “I worked 9 to 5 at 7-11” which gives 5/9, 7/11, 9/5, and 11/7. The last day of February is a doomsday, whether a common year or a leap year, and this gives an easy way to calculate the day of the week for March, where doomsday is 3/0 (the “zeroth” day of March). All that’s left is January, for which the doomsday is 1/10 in common years and 1/11 in leap years.

Thus, the day of the week for June 24, 2011 is calculated as 24−6=18 ≡ 4 (mod 7) days past the doomsday, which gives an answer of Friday. Conway claims to be able to calculate any day of the week in two seconds, though I confess I have not been able to make the required calculations reliably except by using pencil and paper to assist.
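The three steps of Conway's method (anchor day, doomsday, offset from the nearest memorized doomsday) can be sketched as follows. All function names are hypothetical, weekdays are numbered 0 for Sunday through 6 for Saturday as in the other two methods, and the century number c is taken as the year divided by 100, plus 1, following the convention stated above.

```python
THURSDAY = 4  # weekdays are numbered 0 = Sunday ... 6 = Saturday

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def anchor_day(year):
    """Anchor day for the century containing `year`."""
    c = year // 100 + 1            # c = 21 for the years 2000 through 2099
    return ((5 * c + (c - 1) // 4) % 7 + THURSDAY) % 7

def doomsday(year):
    """Weekday on which every doomsday of `year` falls."""
    q, r = divmod(year % 100, 12)
    return (q + r + r // 4 + anchor_day(year)) % 7

def conway_weekday(year, month, day):
    """Day of the week, counted from a doomsday date in the same month."""
    leap = is_leap(year)
    # One doomsday per month, January first: 1/10 (1/11 in leap years),
    # the last day of February, 3/0, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5,
    # 10/10, 11/7, 12/12.
    dates = [11 if leap else 10, 29 if leap else 28,
             0, 4, 9, 6, 11, 8, 5, 10, 7, 12]
    return (doomsday(year) + day - dates[month - 1]) % 7
```

For the exercise itself only `doomsday` is required; `conway_weekday` is included to show how the memorized dates turn the doomsday into an answer for a specific date.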

Your task is to write programs to calculate the day of the week using the three functions described above; for Conway’s algorithm, you should calculate the doomsday for the year rather than the day of the week for the date. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Big Numbers: Testing

June 21, 2011

In the last several exercises we have been building a library of functions for dealing with big integers. Our library supports basic predicates and comparisons, the four basic arithmetic operations, and input and output. Now is a good time to pause, consolidate what we have done, and test it thoroughly.

Your task is to write a test suite for the big number library. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Adi Shamir’s Threshold Scheme

June 17, 2011

[ Today’s exercise was written by guest author Graham Enos, a PhD student in the Applied Mathematics program at UNC Charlotte, with solution in Python rather than Scheme. Suggestions for exercises are always welcome, or you may wish to contribute your own exercise; feel free to contact me if you are interested. ]

In his 1979 paper “How to Share A Secret,” Adi Shamir (the S in RSA) proposed a cryptographic scheme that allows n people to share a secret number S in such a way that it takes at least k of them pooling their resources to reconstruct S. This (k, n) threshold scheme uses modular arithmetic and polynomials to give each of the n participants 1/k of the needed information. For our discussion, we’ll use a mix of Shamir’s notation and that found in chapter 12 of the book Handbook of Applied Cryptography by Menezes, van Oorschot, and Vanstone.

In his paper, Shamir describes how this scheme can be used to allow groups of k people to retrieve the secret number S even if the other n−k pieces of information have been lost or destroyed. For another use case, suppose S is a 2048-bit private RSA key that’s been used to encrypt a message. Once k participants get together and pool their information, they can find S and decode the message. However, at least k of them must cooperate to retrieve S; no smaller number of participants will do. Note that S and n can be arbitrarily large integers with k ≤ n. For instance, S could be the ASCII value of some secret letter, or a word encoded by taking letters as digits in base 36. Now for the details.

Given a secret value S, the number of participants n, the threshold number k, and some prime number p > max(S, n), we first construct in secret a polynomial y = q(x) of degree k−1 (modulo our prime p) with constant term S by picking independent random integers between 1 and p−1, inclusive, for the coefficients. Next we choose n unique random integers x between 1 and p−1, inclusive, and evaluate the polynomial at those n points. Each of the n participants is given an (x, y) pair.
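The share-construction step might look like the sketch below. The names `eval_poly` and `make_shares` are illustrative, not taken from the suggested solution; the code assumes p is prime without checking it, and it draws random numbers from the operating system's generator since the scheme's security depends on unpredictable coefficients.

```python
import random

def eval_poly(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo p by Horner's rule."""
    result = 0
    for coeff in reversed(coeffs):
        result = (result * x + coeff) % p
    return result

def make_shares(secret, k, n, p, rng=None):
    """Split `secret` into n shares (x, y); any k of them determine it."""
    if not (0 <= secret < p and 0 < k <= n < p):
        raise ValueError("need 0 <= secret < p and 0 < k <= n < p")
    rng = rng or random.SystemRandom()
    # Degree k-1 polynomial: constant term is the secret, the remaining
    # k-1 coefficients are random integers from 1 to p-1.
    coeffs = [secret] + [rng.randrange(1, p) for _ in range(k - 1)]
    xs = rng.sample(range(1, p), n)   # n distinct points from 1 to p-1
    return [(x, eval_poly(coeffs, x, p)) for x in xs]
```

Note that x = 0 is never handed out, because the share at 0 would be the secret itself.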

To reconstruct S from k pairs (x, y), we use Lagrange Interpolation. In general this technique can rebuild the entire polynomial y = q(x), but since S = q(0), we only need to find q(0):

$S = \sum_{i = 1}^k \left[ y_i \prod_{1 \le j \le k, j \ne i} x_j (x_j - x_i)^{-1} \right] \mod p$

Note: the exponent −1 signifies taking the multiplicative inverse mod p, that is, the integer z such that z · (x_j – x_i) ≡ 1 (mod p).
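The reconstruction formula can be sketched directly as a double loop. The function name is hypothetical, and the modular inverse relies on Python's three-argument `pow` with an exponent of −1, which requires Python 3.8 or later.

```python
def recover_secret(shares, p):
    """Recover q(0) from k shares (x, y) by Lagrange interpolation mod p."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        term = y_i
        for j, (x_j, _) in enumerate(shares):
            if j != i:
                # multiply by x_j * (x_j - x_i)^(-1), all modulo p
                inverse = pow((x_j - x_i) % p, -1, p)
                term = term * x_j * inverse % p
        secret = (secret + term) % p
    return secret
```

Reducing after every multiplication keeps the intermediate values below p², which matters once p is hundreds of digits long. Passing fewer than k shares does not raise an error; it simply returns a value unrelated to the secret.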

As an example, suppose p = 23, S = 17, and our polynomial y = q(x) is 17 + 4x + 13x². Since this polynomial has degree two, we need at least three points to reconstruct this polynomial. Suppose furthermore that to three of our n recipients we gave the points (14, 22), (2, 8), and (21, 15). Lagrange Interpolation could be used to recreate the whole polynomial, but we’re only interested in the constant term $S = \sum_{i = 1}^3 \left[ y_i \prod_{1 \le j \le 3, j \ne i} x_j (x_j - x_i)^{-1} \right] \pmod{23}$ :

S = [22 · 2(2−14)⁻¹ · 21(21−14)⁻¹] + [8 · 14(14−2)⁻¹ · 21(21−2)⁻¹] + [15 · 14(14−21)⁻¹ · 2(2−21)⁻¹] (mod 23)

= [22 · 2 · 11⁻¹ · 21 · 7⁻¹] + [8 · 14 · 12⁻¹ · 21 · 19⁻¹] + [15 · 14 · 16⁻¹ · 2 · 4⁻¹] (mod 23)

= [22 · 2 · 21 · 21 · 10] + [8 · 14 · 2 · 21 · 17] + [15 · 14 · 13 · 2 · 6] (mod 23)

= 194040 + 79968 + 32760 (mod 23)

= 306768 (mod 23)

= 17

The beauty of this scheme is twofold. First, it is rather simple and elegant; the majority of the actual code used to implement the scheme takes fewer than 15 lines in Python. Second, it has information-theoretic security. That is, the security of the scheme relies entirely upon the fact that at least k points are needed to reconstruct a degree k−1 polynomial; nothing less than k points will do. This means its security is based on something being impossible, as opposed to something being believed to be difficult, but not yet proven to be so (e.g. factoring the product of two large primes). This scheme also enjoys other useful properties; see the above references for more.

Your task is to write functions that perform both portions of Shamir’s (k, n) threshold scheme. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Big Numbers: Input And Output

June 14, 2011

We continue our series on implementing a big number library by writing the functions that translate between strings and big numbers. You may recall that we cheated in the first big number exercise by using the native big numbers of Scheme to provide input and output. In today’s exercise we overcome that cheating by writing our own functions.

The two functions that convert between strings and big numbers both take an optional argument that represents the radix in which the strings are represented, which can range from 2 to 36 inclusive; if no radix is given, it defaults to 10. The functions are similar to the digits and undigits functions of the Standard Prelude.
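To make the radix handling concrete, here is a minimal sketch of the two conversions on Python's native integers; the real exercise works on the library's big-number representation instead. The function names and the choice of lower-case letters for digits above 9 are assumptions.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def number_to_string(n, radix=10):
    """Render integer n as a string of digits in the given radix (2..36)."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    if n == 0:
        return "0"
    sign, n = ("-", -n) if n < 0 else ("", n)
    out = []
    while n > 0:
        n, d = divmod(n, radix)      # peel off the least significant digit
        out.append(DIGITS[d])
    return sign + "".join(reversed(out))

def string_to_number(s, radix=10):
    """Parse a string of digits in the given radix (2..36) into an integer."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    negative = s.startswith("-")
    body = s[1:] if s[:1] in ("+", "-") else s
    if not body:
        raise ValueError("no digits in input")
    n = 0
    for ch in body:
        d = DIGITS.find(ch.lower())
        if d < 0 or d >= radix:
            raise ValueError("invalid digit for radix: " + repr(ch))
        n = n * radix + d            # Horner's rule, most significant first
    return -n if negative else n
```

Output repeatedly divides by the radix and input repeatedly multiplies by it, so a big-number version needs only division and multiplication by a small number plus addition of a single digit.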

Your task is to write the two functions that convert between big numbers and strings. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Steganography

June 10, 2011

In his book Dead or Alive, Tom Clancy describes a cryptographic system used by terrorists. His description is incomplete, but it seems to be a two-stage system, with a hand-operable cipher hidden by steganography inside images on a web site. Clancy talks about a one-time pad that doesn’t really seem to be a one-time pad and creates a stream of two-digit numbers using the middle-square method; it may sound good to his readers, but even my limited knowledge of cryptography suggests it’s bad crypto. Or, on one crypto forum where I asked about it, “really really awful” crypto.

Let’s see if we can do better than Clancy. We have four objectives: The system must be hand-operable by terrorists in similar situations to Clancy’s. The system must use both cryptography and steganography, as Clancy’s did. The system must be easily explainable in the context of a novel such as Clancy’s. And the system must be reasonably secure, certainly better than Clancy’s “really, really, really awful” system.

We’ll use Playfair for the cipher and hide our message inside the text of a typical spam email — everybody ignores spam, anyway, so what better place to hide a message? For Playfair we’ll use an 8×8 grid with 26 upper-case letters, 26 lower-case letters, 10 digits, a space, and a period as the only punctuation character. The daily passphrase is the first sentence of the lead editorial in the Wall Street Journal; as I write this on June 6th, the passphrase is “President Obama’s visit to a Chrysler plant in Toledo, Ohio, on Friday was the culmination of a campaign to portray the auto bailouts as a brilliant success with no unpleasant side effects.” We’ll refer to the previous exercise for details. If you don’t like Playfair, bifid makes a reasonable alternative.

The primary point of today’s exercise is to discuss steganography, a word which derives from the Greek for “hidden writing”; cryptography, by contrast, is “secret writing.” In ancient times, steganography was performed by shaving a slave’s head, tattooing the message on his scalp, waiting for his hair to grow back, then having him travel to the intended recipient; nowadays, there are numerous programs that hide a message inside a JPEG image. We’ll hide our message in a plain text spam message like this one:

Get V I A G R A today!!!! Call (638)555-1212!!!!!

subduct mythos qua backrest chanter Kioto cronyism Lettish Badajoz Saida moody megavolt gondola coward Tibetan stoss andiron magenta Biisk Henry tumbler coquet SHAPE affable flattery blear Bahaism lance meteor limbate hit anyway yoni Hengist phaeton Papua snick whiffle ankh Firdausi Chaplin triolein ampliate hum putsch desire buttocks Golconda groat fickle mensural utopia oecology scapula bruit Stuart foamy Jane futures Vedic Halifax misquote agitate whereon resonate melodic aground smoky muezzin riddance Aarau dB elm robin bugloss duckbill pe last pow chanter winglet temporal yeanling Sidon Auckland regimen Cheviot skatole gobo splenic neolith amid braiding lowlife riant Sunnite styrene ywis teacart flyspeck deplore chyack Titan Percy hidalgo sniffle unbridle zig kinsfolk immense opaline bebeeru heeled topsail yurt lobby trucking stridor Selden mullet

We’ve all seen spam like that; the extra words are intended to get the message past the spam trap. We can hide a message in the spam in this manner: Each word after the empty line represents a binary 1-bit if it has an odd number of characters and a binary 0-bit if it has an even number of characters. A word is a maximal sequence of non-whitespace characters.
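The word-length encoding can be sketched as a pair of functions. The names and the two word-list parameters are hypothetical, and the sketch assumes the header contains no empty line and that lines end with a plain newline. How the bits map to and from Playfair ciphertext characters is left open here.

```python
def extract_bits(message):
    """Return the hidden bits as a string of '0' and '1' characters."""
    # Everything after the first empty line carries the payload.
    _, _, body = message.partition("\n\n")
    # Odd-length words are 1-bits, even-length words are 0-bits.
    return "".join(str(len(word) % 2) for word in body.split())

def hide_bits(bits, header, odd_words, even_words):
    """Build a message whose filler words encode `bits`.

    odd_words and even_words are supplies of filler words with odd and
    even lengths; each must hold enough words for the bits to be hidden.
    """
    odd, even = iter(odd_words), iter(even_words)
    words = [next(odd) if bit == "1" else next(even) for bit in bits]
    return header + "\n\n" + " ".join(words)
```

For instance, the first four filler words of the sample message, subduct, mythos, qua, and backrest, have 7, 6, 3, and 8 characters and so carry the bits 1, 0, 1, 0.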

Your task is to write functions that perform encryption and decryption using the system described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Big Numbers: Division

June 7, 2011

We continue our examination of big-integer arithmetic today with a look at division. Long division was hard in grade school, and it’s hard for computers, too, with tricky algorithms and lots of special cases. Fortunately, Donald Knuth has made things easier for us, and we will be following his Algorithm 4.3.1 D.

Division takes as input two numbers, n (the dividend, also called the numerator) and d (the divisor, also called the denominator) and returns two numbers q (the quotient) and r (the remainder) such that q · d + r = n, with 0 ≤ r < d. The basic idea of Algorithm D is to take successive partial divisions, where in each case the partial dividend has one more digit than the divisor, each successive partial division revealing one more digit of the quotient. As with the other arithmetic operators, there is a notion of carry from one partial division to the next.
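The idea of carrying a remainder from one partial division to the next is easiest to see in the special case where the divisor is a single digit. The sketch below shows only that simpler case, usually called short division; it is not Knuth's Algorithm D. The representation, a most-significant-first list of base-1000 digits, is an assumption made for illustration.

```python
def short_divide(digits, d, base=1000):
    """Divide a big number by a single digit d, where 0 < d < base.

    digits is a most-significant-first list of base-`base` digits.
    Returns (quotient_digits, remainder).
    """
    if not 0 < d < base:
        raise ValueError("divisor must be a single non-zero digit")
    quotient = []
    carry = 0
    for digit in digits:
        # The partial dividend is the carried remainder plus the next digit.
        partial = carry * base + digit
        quotient.append(partial // d)
        carry = partial % d
    # Drop leading zeros but keep at least one digit.
    while len(quotient) > 1 and quotient[0] == 0:
        quotient.pop(0)
    return quotient, carry
```

With a multi-digit divisor each quotient digit can no longer be found by one machine division; estimating it and correcting the estimate is where most of the work in Algorithm D goes.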

We’re not going to give a detailed explanation here, because Knuth says it far better; you can run to your nearest copy of Knuth, or peek at the solution. Beware that the code is lengthy and has lots of tricky bits, so plan from the start to do lots of debugging.

Your task is to write a function that performs division on big integers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Mersenne Primes

June 3, 2011

Numbers of the form M_n = 2ⁿ−1 are known as Mersenne numbers, named for the French monk Marin Mersenne who studied them early in the seventeenth century. Of the infinite set of Mersenne numbers, 47 are currently known to be prime; see Sloane’s A000043 for a list of their indices. Mersenne primes can be identified by the Lucas-Lehmer test, devised by Édouard Lucas in the 1870s and cast into its modern form by Derrick Lehmer in 1930:

For p an odd prime, the Mersenne number $2^p - 1$ is prime if and only if $2^p - 1$ divides $S_{p-1}$ where $S_{n+1} = S_n^2 - 2$ and $S_1 = 4$.
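The test is short to express in code. In the sketch below the function names are illustrative, the sequence is reduced modulo the Mersenne number at each step so the values stay small, and the exponent 2 is handled as a special case because the test is stated only for odd primes. The sketch also relies on the standard fact that a Mersenne number can be prime only when its exponent is prime.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def lucas_lehmer(p):
    """True if 2**p - 1 is prime, for p an odd prime."""
    m = (1 << p) - 1
    s = 4                        # S_1
    for _ in range(p - 2):       # p - 2 steps take S_1 to S_(p-1)
        s = (s * s - 2) % m
    return s == 0

def mersenne_exponents(limit):
    """Exponents p <= limit for which 2**p - 1 is prime."""
    found = [2] if limit >= 2 else []   # M_2 = 3 is prime; the test needs odd p
    for p in range(3, limit + 1, 2):
        if is_prime(p) and lucas_lehmer(p):
            found.append(p)
    return found
```

Calling `mersenne_exponents(256)` covers the range asked for in the task.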

The special form of Mersenne primes makes them easy to identify, and for many years the largest known prime has been a Mersenne prime. A cooperative project on the internet, the Great Internet Mersenne Prime Search (GIMPS), has found all of the recent Mersenne primes, because the numbers have grown so large that a single computer can’t handle the workload.

Your task is to use the Lucas-Lehmer test to find the Mersenne primes through M₂₅₆. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
