Big Numbers: Testing

June 21, 2011

In the last several exercises we have been building a library of functions for dealing with big integers. Our library supports basic predicates and comparisons, the four basic arithmetic operations, and input and output. Now is a good time to pause, consolidate what we have done, and test it thoroughly.

Your task is to write a test suite for the big number library. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

[ Today’s exercise was written by guest author Graham Enos, a PhD student in the Applied Mathematics program at UNC Charlotte, with solution in Python rather than Scheme. Suggestions for exercises are always welcome, or you may wish to contribute your own exercise; feel free to contact me if you are interested. ]

In his 1979 paper “How to Share A Secret,” Adi Shamir (the S in RSA) proposed a cryptographic scheme that allows n people to share a secret number S in such a way that it takes at least k of them pooling their resources to reconstruct S. This (k, n) threshold scheme uses modular arithmetic and polynomials to give each of the n participants 1/k of the needed information. For our discussion, we’ll use a mix of Shamir’s notation and that found in chapter 12 of the book Handbook of Applied Cryptography by Menezes, van Oorschot, and Vanstone.

In his paper, Shamir describes how this scheme can be used to allow groups of k people to retrieve the secret number S even if the other nk pieces of information have been lost or destroyed. For another use case, suppose S is a 2048-bit private RSA key that’s been used to encrypt a message. Once k participants get together and pool their information, they can find S and decode the message. However, at least k of them must cooperate to retrieve S; no smaller number of participants will do. Note that S and n can be arbitrarily large integers with kn. For instance, S could be the ASCII value of some secret letter, or a word encoded by taking letters as digits in base 36. Now for the details.

Given a secret value S, the number of participants n, the threshold number k, and some prime number p > max(S, n), we first construct in secret a polynomial y = q(x) of degree k−1 (modulo our prime p) with constant term S by picking independent random integers between 1 and p−1, inclusive, for the coefficients. Next we choose n unique random integers x between 1 and p−1, inclusive, and evaluate the polynomial at those n points. Each of the n participants is given an (x, y) pair.

To reconstruct S from k pairs (x, y), we use Lagrange Interpolation. In general this technique can rebuild the entire polynomial y = q(x), but since S = q(0), we only need to find q(0):

S =  \sum_{i = 1}^k \left[ y_i \prod_{1 \le j \le k, j \ne i} x_j (x_j - x_i)^{-1}      \right] \mod p

Note: the exponent −1 signifies taking the multiplicative inverse mod p, that is, the integer z such that z · (xjxi) ≡ 1 (mod p).

As an example, suppose p = 23, S = 17, and our polynomial y = q(x) is 17 + 4x + 13x2. Since this polynomial has degree two, we need at least three points to reconstruct this polynomial. Suppose furthermore that to three of our n recipients we gave the points (14, 22), (2, 8), and (21, 5). Lagrange Interpolation could be used to recreate the whole polynomial, but we’re only interested in the constant term S = \sum_{i = 1}^3 \left[ y_i \prod_{1 \le j \le k, j \ne i} x_j (x_j - x_i)^{-1} \right] \pmod{23}:

S = [22 · 2(2−14)−1 · 21(21−14)−1] + [8 · 14(14−2)−1 · 21(21−2)−1] + [5 · 14(14−21)−1 · 2(2−21)−1] (mod 23)

  = [22 · 2 · 11−1 · 21 · 7−1] + [8 · 14 · 12−1 · 21 · 19−1] + [5 · 14 · 16−1 · 21 · 4−1] (mod 23)

  = [22 · 2 · 21 · 21 · 10] + [8 · 14 · 2 · 23 · 17] + [5 · 14 · 13 · 21 · 6] (mod 23)

  = 194040 + 87584 + 114660 (mod 23)

  = 396284 (mod 23)

  = 17

The beauty of this scheme is twofold. First, it is rather simple and elegant; the majority of the actual code used to implement the scheme takes less than 15 lines in Python. Second, it has information theoretic security. That is, the security of the scheme relies entirely upon the fact that at least k points are needed to reconstruct a degree k−1 polynomial; nothing less than k points will do. This means its security is based on something being impossible, as opposed to something being believed to be difficult, but not yet proven to be so (e.g. factoring the product of two large primes). This scheme also enjoys other useful properties; see the above references for more.

Your task is to write functions that perform both portions of Shamir’s (k, n) threshold scheme. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

We continue our series on implementing a big number library by writing the functions that translate between strings and big numbers. You may recall that we cheated in the first big number exercise by using the native big numbers of Scheme to provide input and output. In today’s exercise we overcome that cheating by writing our own functions.

The two functions that convert between strings and big numbers both take an optional argument that represents the radix in which the strings are represented, which can range from 2 to 36 inclusive; if no radix is given, it defaults to 10. The functions are similar to the digits and undigits functions of the Standard Prelude.

Your task is to write the two functions that convert between big numbers and strings. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

Steganography

June 10, 2011

In his book Dead or Alive, Tom Clancy describes a cryptographic system used by terrorists. His description is incomplete, but it seems to be a two-stage system, with a hand-operable cipher hidden by steganography inside images on a web site. Clancy talks about a one-time pad that doesn’t really seem to be a one-time pad and creates a stream of two-digit numbers using the middle-square method; it may sound good to his readers, but even my limited knowledge of cryptography suggests it’s bad crypto. Or, on one crypto forum where I asked about it, “really really awful” crypto.

Let’s see if we can do better than Clancy. We have four objectives: The system must be hand-operable by terrorists in similar situations to Clancy’s. The system must use both cryptography and steganography, as Clancy’s did. The system must be easily explainable in the context of a novel such as Clancy’s. And the system must be reasonably secure, certainly better than Clancy’s “really, really, really awful” system.

We’ll use Playfair for the cipher and hide our message inside the text of a typical spam email — everybody ignores spam, anyway, so what better place to hide a message? For Playfair we’ll use an 8×8 grid with 26 upper-case letters, 26 lower-case letters, 10 digits, a space, and a period as the only punctuation character. The daily passphrase is the first sentence of the lead editorial in the Wall Street Journal; as I write this on June 6th, the passphrase is “President Obama’s visit to a Chrysler plant in Toledo, Ohio, on Friday was the culmination of a campaign to portray the auto bailouts as a brilliant success with no unpleasant side effects.” We’ll refer to the previous exercise for details. If you don’t like Playfair, bifid makes a reasonable alternative.

The primary point of today’s exercise is to discuss steganography, a word which derives from the Greek for “hidden writing;” cryptography, by contrast is “secret writing.” In ancient times, steganography was performed by shaving a slave’s head, tattooing the message on his scalp, waiting for his hair to grow back, then having him travel to the intended recipient; nowadays, there are numerous programs that hide a message inside a JPEG image. We’ll hide our message in a plain text spam message like this one:

Get V I A G R A today!!!! Call (638)555-1212!!!!!

subduct mythos qua backrest chanter Kioto cronyism Lettish Badajoz Saida moody megavolt gondola coward Tibetan stoss andiron magenta Biisk Henry tumbler coquet SHAPE affable flattery blear Bahaism lance meteor limbate hit anyway yoni Hengist phaeton Papua snick whiffle ankh Firdausi Chaplin triolein ampliate hum putsch desire buttocks Golconda groat fickle mensural utopia oecology scapula bruit Stuart foamy Jane futures Vedic Halifax misquote agitate whereon resonate melodic aground smoky muezzin riddance Aarau dB elm robin bugloss duckbill pe last pow chanter winglet temporal yeanling Sidon Auckland regimen Cheviot skatole gobo splenic neolith amid braiding lowlife riant Sunnite styrene ywis teacart flyspeck deplore chyack Titan Percy hidalgo sniffle unbridle zig kinsfolk immense opaline bebeeru heeled topsail yurt lobby trucking stridor Selden mullet

We’ve all seen spam like that; the extra words are intended to get the message past the spam trap. We can hide a message in the spam in this manner: Each word after the empty line represents a binary 1-bit if it has an odd number of characters and a binary 0-bit if it has an even number of characters. A word is a maximal sequence of non-white characters.

Your task is to write functions that perform encryption and decryption using the system described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Big Numbers: Division

June 7, 2011

We continue our examination of big-integer arithmetic today with a look at division. Long division was hard in grade school, and it’s hard for computers, too, with tricky algorithms and lots of special cases. Fortunately, Donald Knuth has made things easier for us, and we will be following his Algorithm 4.3.1 D.

Division takes as input two numbers, n (the dividend, also called the numerator) and d (the divisor, also called the denominator) and returns two numbers q (the quotient) and r (the remainder) such that q · d + r = n, with 0 ≤ r < d. The basic idea of Algorithm D is to take successive partial divisions, where in each case the partial dividend has one more digit than the divisor, each successive partial division revealing one more digit of the quotient. As with the other arithmetic operators, there is a notion of carry from one partial division to the next.

We’re not going to give a detailed explanation here, because Knuth says it far better; you can run to your nearest copy of Knuth, or peek at the solution. Beware that the code is lengthy, and there are lots of tricky bits, and plan from the start to do lots of debugging.

Your task is to write a function that performs division on big integers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2 3

Mersenne Primes

June 3, 2011

Numbers of the form Mn = 2n−1 are known as Mersenne numbers, named for the French monk Marin Mersenne who studied them early in the seventeenth century. Of the infinite set of Mersenne numbers, 47 are currently known to be prime; see Sloane’s A000043 for a list of their indices. Mersenne primes can be identified by the Lucas-Lehmer test, devised by Édouard Lucas in the 1870s and cast into its modern form by Derrick Lehmer in 1930:

For p an odd prime, the Mersenne number 2p−1 is prime if and only if 2p−1 divides Sp−1 where Sn+1 = Sn2−2 and S1 = 4.

The special form of Mersenne primes makes them easy to identify, and for many years the largest known prime has been a Mersenne prime. A cooperative project on the internet, the Great Internet Mersenne Prime Search (GIMPS), has found all of the recent Mersenne primes, because the numbers have grown so large that a single computer can’t handle the workload.

Your task is to use the Lucas-Lehmer test to find the Mersenne primes through M256. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

In the previous exercise, we decided on a representation for big numbers and wrote some simple functions concerning them. In today’s exercise we look at addition, subtraction, and multiplication. We will use the standard grade-school algorithms; there are better ways to do multiplication, but it’s best to get something working first, and maybe later we can improve it.

The trick with all of these functions is to handle the sign separately from the arithmetic. Consider addition. If you want to calculate a + b = c, you must first figure out which of |a| + |b|, |a| − |b|, or |b| − |a| is equal to |c|, perform the appropriate arithmetic, then attach the appropriate sign. Likewise subtraction, except that the result may have fewer big-digits than either input, so you have to be careful to remove redundant leading zeros from the result. Multiplication is simpler; the result is positive if the signs are the same, otherwise negative.

For addition, the arithmetic is done by iterating over the big-digits from least to most significant, adding the two big-digits in each position and carrying to the next position any excess over the digit base. Since the carry is always either 0 or 1, it is not possible for a single carry to propagate beyond the next big-digit (though several big-digit sums may incur carries in succession). In those cases where the addition is done by subtraction, the carry (in grade-school they called it borrow) is either 0 or -1, and it may indeed propagate to multiple big-digits (think of decimal 1000 − 1, where the borrow propagates three positions).

Subtraction is done by changing the sign and calling the function for addition, in order to eliminate the need for duplicating code.

Multiplication is only a little bit harder. The grade-school method forms partial products for each big-digit, and adds all the partial products at the end, but it is easier to add the partial products at each big-digit. The carry may be greater than 1, but it is still true that a single carry cannot propagate beyond the next big-digit.

Your task is to write functions that perform big number addition, subtraction and multiplication. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Upside Up

May 27, 2011

An “upside up” number is a number that reads the same when it is rotated 180°. For instance, 689 and 1961 are upside up numbers.

Your task is to find the next upside up number greater than 1961, and to count the number of upside up numbers less than ten thousand. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Over the next few exercises we’ll be developing a library of functions that provide arithmetic on big numbers: integers that are too large to fit into a machine word. Most languages, including Scheme, provide big numbers, either natively or through a standard library, so you are unlikely to need a library for big numbers. On the other hand, coding big numbers is fun, and makes a good exercise. We’ll get started today by defining a representation for big numbers and writing a few small functions.

The first decision to make when writing a big number library is to choose the radix in which numbers are represented. With binary computers, it is most economical to make the radix the largest convenient power of two; for a 32-bit CPU, the preferred radix is 216, which means that single-place multiplication plus a carry never overflows a machine word. During development, we choose a radix of 1000 (“millenial digits”), which are convenient for debugging since the numbers are easily read by human eyes, but we’ll be careful not to rely on the radix, so that it is easy to switch to a larger radix when development is finished.

The second decision is how to represent numbers. A number is, of course, represented as a polynomial in the base of the radix, with the digits stored separately. We use a list, rather than an array, since lists are more natural in Scheme, and since lists don’t impose any additional limits on the size of the numbers. The exact representation we choose is called the signed magnitude representation, where the head of the list gives the number of digits in the list, as well as the sign of the number, and the tail of the list gives the digits themselves, least-significant digit first, always stored as a positive number. Thus, the number 12345678 is stored in our representation as the list (3 678 345 12) and the number -87654321 is stored as (-3 321 654 87), 1 is (1 1), -1 is (-1 1), and zero is (0). Among other things, this representation makes it easy to compare numbers: first compare the signed magnitude, and only compare the numbers digit-by-digit if the signed magnitudes are the same.

In today’s exercise we will implement the following procedures: functions to convert between big numbers and the native numbers of the underlying language, functions to take the absolute value of a big number and to negate a big number, predicates to identify positive, negative or zero big numbers and even or odd big numbers, and functions to compare two big numbers.

Your task is to implement those big number functions described using a signed-magnitude representation; we’ll write functions to do arithmetic on big numbers in future exercises. When you are finished, you are welcome to read or run a suggested solution, or to discuss the exercise or post your own solution in the comments below.

Pages: 1 2

ISBN Validation

May 20, 2011

In a previous exercise we studied Luhn’s algorithm for validating credit card numbers. In today’s exercise we will study the algorithms for validating 10-digit and 13-digit ISBN book numbers, and fetch the author and title of the book from an internet server.

The term ISBN refers to the 10-digit book number that libraries and bookstores use to catalog their collections. The ISBN number consists of four variable-length parts, separated by dashes or spaces, that always have 10 digits in total: the first part is a region, the second part is a publisher, the third part is a title, and the fourth part is a modulo-11 check digit, which may be a digit 0 through 9, or the letter X, which represents 10. To validate the number, ignore the dashes and spaces, multiply the ten digits by 10, 9, 8, …, 1 and sum the digital products; if the sum is 0 modulo 11, the ISBN number is valid.

The term EAN refers to the new 13-digit book numbers. The EAN number consists of five variable-length parts, separated by dashes or spaces, that always have 13 digits in total: the first part is the three digits 9 7 8, the second, third and fourth parts are the region, publisher and title of the ISBN, and the fifth part is a check digit. To validate the number, ignore the dashes and spaces, multiply the ten digits by 1, 3, 1, 3, …, 1 (alternating 1 and 3) and sum the digital products; if the sum is 0 modulo 10, the EAN number is valid.

In practice, the dashes or spaces are frequently omitted. It is easy to convert between ISBN and EAN by adding or deleting the 978 prefix and recomputing the check digit. The standard definitions of ISBN and EAN are available at http://www.isbn.org/standards/home/isbn/international/html/usm7.htm. There are several internet servers that provide free ISBN lookup services; we will use the one at http://isbndb.com.

Your task is to write functions that validate ISBN and EAN numbers, convert between them, and fetch the title and author associated with an ISBN from an internet server. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2