In the previous exercise, we decided on a representation for big numbers and wrote some simple functions concerning them. In today’s exercise we look at addition, subtraction, and multiplication. We will use the standard grade-school algorithms; there are better ways to do multiplication, but it’s best to get something working first, and maybe later we can improve it.
The trick with all of these functions is to handle the sign separately from the arithmetic. Consider addition. If you want to calculate a + b = c, you must first figure out which of |a| + |b|, |a| − |b|, or |b| − |a| is equal to |c|, perform the appropriate arithmetic, then attach the appropriate sign. Likewise subtraction, except that the result may have fewer big-digits than either input, so you have to be careful to remove redundant leading zeros from the result. Multiplication is simpler; the result is positive if the signs are the same, otherwise negative.
For addition, the arithmetic is done by iterating over the big-digits from least to most significant, adding the two big-digits in each position and carrying to the next position any excess over the digit base. Since the carry is always either 0 or 1, it is not possible for a single carry to propagate beyond the next big-digit (though several big-digit sums may incur carries in succession). In those cases where the addition is done by subtraction, the carry (in grade-school they called it borrow) is either 0 or -1, and it may indeed propagate to multiple big-digits (think of decimal 1000 − 1, where the borrow propagates three positions).
Subtraction is done by changing the sign and calling the function for addition, in order to eliminate the need for duplicating code.
Multiplication is only a little bit harder. The grade-school method forms partial products for each big-digit, and adds all the partial products at the end, but it is easier to add the partial products at each big-digit. The carry may be greater than 1, but it is still true that a single carry cannot propagate beyond the next big-digit.
Your task is to write functions that perform big number addition, subtraction and multiplication. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Upside Up
May 27, 2011
An “upside up” number is a number that reads the same when it is rotated 180°. For instance, 689 and 1961 are upside up numbers.
Your task is to find the next upside up number greater than 1961, and to count the number of upside up numbers less than ten thousand. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Big Numbers: Getting Started
May 24, 2011
Over the next few exercises we’ll be developing a library of functions that provide arithmetic on big numbers: integers that are too large to fit into a machine word. Most languages, including Scheme, provide big numbers, either natively or through a standard library, so you are unlikely to need a library for big numbers. On the other hand, coding big numbers is fun, and makes a good exercise. We’ll get started today by defining a representation for big numbers and writing a few small functions.
The first decision to make when writing a big number library is to choose the radix in which numbers are represented. With binary computers, it is most economical to make the radix the largest convenient power of two; for a 32-bit CPU, the preferred radix is 216, which means that single-place multiplication plus a carry never overflows a machine word. During development, we choose a radix of 1000 (“millenial digits”), which are convenient for debugging since the numbers are easily read by human eyes, but we’ll be careful not to rely on the radix, so that it is easy to switch to a larger radix when development is finished.
The second decision is how to represent numbers. A number is, of course, represented as a polynomial in the base of the radix, with the digits stored separately. We use a list, rather than an array, since lists are more natural in Scheme, and since lists don’t impose any additional limits on the size of the numbers. The exact representation we choose is called the signed magnitude representation, where the head of the list gives the number of digits in the list, as well as the sign of the number, and the tail of the list gives the digits themselves, least-significant digit first, always stored as a positive number. Thus, the number 12345678 is stored in our representation as the list (3 678 345 12) and the number -87654321 is stored as (-3 321 654 87), 1 is (1 1), -1 is (-1 1), and zero is (0). Among other things, this representation makes it easy to compare numbers: first compare the signed magnitude, and only compare the numbers digit-by-digit if the signed magnitudes are the same.
In today’s exercise we will implement the following procedures: functions to convert between big numbers and the native numbers of the underlying language, functions to take the absolute value of a big number and to negate a big number, predicates to identify positive, negative or zero big numbers and even or odd big numbers, and functions to compare two big numbers.
Your task is to implement those big number functions described using a signed-magnitude representation; we’ll write functions to do arithmetic on big numbers in future exercises. When you are finished, you are welcome to read or run a suggested solution, or to discuss the exercise or post your own solution in the comments below.
ISBN Validation
May 20, 2011
In a previous exercise we studied Luhn’s algorithm for validating credit card numbers. In today’s exercise we will study the algorithms for validating 10-digit and 13-digit ISBN book numbers, and fetch the author and title of the book from an internet server.
The term ISBN refers to the 10-digit book number that libraries and bookstores use to catalog their collections. The ISBN number consists of four variable-length parts, separated by dashes or spaces, that always have 10 digits in total: the first part is a region, the second part is a publisher, the third part is a title, and the fourth part is a modulo-11 check digit, which may be a digit 0 through 9, or the letter X, which represents 10. To validate the number, ignore the dashes and spaces, multiply the ten digits by 10, 9, 8, …, 1 and sum the digital products; if the sum is 0 modulo 11, the ISBN number is valid.
The term EAN refers to the new 13-digit book numbers. The EAN number consists of five variable-length parts, separated by dashes or spaces, that always have 13 digits in total: the first part is the three digits 9 7 8, the second, third and fourth parts are the region, publisher and title of the ISBN, and the fifth part is a check digit. To validate the number, ignore the dashes and spaces, multiply the ten digits by 1, 3, 1, 3, …, 1 (alternating 1 and 3) and sum the digital products; if the sum is 0 modulo 10, the EAN number is valid.
In practice, the dashes or spaces are frequently omitted. It is easy to convert between ISBN and EAN by adding or deleting the 978 prefix and recomputing the check digit. The standard definitions of ISBN and EAN are available at http://www.isbn.org/standards/home/isbn/international/html/usm7.htm. There are several internet servers that provide free ISBN lookup services; we will use the one at http://isbndb.com.
Your task is to write functions that validate ISBN and EAN numbers, convert between them, and fetch the title and author associated with an ISBN from an internet server. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Two Bad Sorts
May 17, 2011
We have some fun today implementing two really bad sorting algorithms.
Stooge sort is a recursively-defined algorithm that sorts n items in O(nlog 3 / log 1.5) comparisons: If the value at the end of the list is less than the value at the start of the list swap them. Then, if there are three are more items in the list, recursively sort the first two-thirds of the items in the list, then the last two-thirds of the items in the list, and finally (again) the first two-thirds of the items in the list. This is so bad that it takes a few seconds just to convince yourself that it actually works.
Bogosort is particularly bad, with factorial time complexity, on average, and no guarantee that it ever terminates: collect the items in the list in random order. If they are sorted, return them, otherwise try again with a new random ordering of the items in the list.
Your task is to implement stooge sort and bogosort. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Dixon’s Factorization Algorithm
May 13, 2011
In 1981, John D. Dixon, a mathematician at Carleton University, developed the integer factorization algorithm that bears his name. Dixon’s algorithm is not used in practice, because it is quite slow, but it is important in the realm of number theory because it is the only sub-exponential factoring algorithm with a deterministic (not conjectured) run time, and it is the precursor to the quadratic sieve factorization algorithm, which is eminently practical.
Dixon’s algorithm is based on the well-known fact of number theory that, given x2 ≡ y2 (mod n), with x ≠ ±y (mod n), it is likely that gcd(x–y, n) will be a factor of n. The algorithm is similar to the CFRAC algorithm of Morrison and Brillhart: the first phase finds smooth relations, and the second phase uses linear algebra to combine them and form a factorization. The difference is the first phase, where CFRAC looks in the convergents of the continued fraction of the square root of the number being factored, but Dixon’s algorithm simply chooses numbers at random.
Thus, the first stage of Dixon’s method is this: choose a bound B and identify the factor base of all primes less than B that are quadratic residues of n (their jacobi symbol is 1). Then choose a positive integer r < n, calculate its square modulo n, and if the square is smooth over the factor base, add r and its square to the linear algebra; repeat this step until you have found a sufficient number of smooth squares. The second stage is identical to CFRAC.
Your task is to write a function that factors integers by Dixon’s algorithm. When you are finished you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Comm
May 10, 2011
We have today another exercise in our continuing series of Unix V7 commands:
NAME
comm — select or reject lines common to two sorted files
SYNOPSIS
comm [ -[123] ] file1 file2
DESCRIPTION
Comm reads file1 and file2, which should be ordered in ASCII collating sequence, and produces a three column output: lines only in file1; lines only in file2; and lines in both files. The filename ‘-‘ means the standard input. Flags 1, 2, or 3 suppress printing of the corresponding column. Thus comm -12 prints only the lines common to the two files; comm -23 prints only lines in the first file but not in the second; comm -123 is a no-op.
Your task is to implement the comm command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Entab And Detab
May 6, 2011
In ancient times, say the 1970s, religious wars were fought about whether to indent blocks of code using tabs or spaces, and how wide the indents should be. Then editing environments got better and programmers stopped arguing about tabs and started arguing about other things, like where the braces go. Now, with the advent of the internet, the problem of tab/space indentation seems to be returning, as copy/paste operations between web browsers and text editors seem to get the indentation wrong (especially when the source is the evil PDF format).
In ancient times, the normal solution to the tab/space indentation problem was a pair of programs, entab and detab, that could easily convert between the two formats. Now, with the internet, it behooves us to resurrect those old programs.
Your task is to write programs that convert files using tabs for indentation to files using spaces for indentation, and vice versa; be sure to permit an argument specifying the width of the tab. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Squaring The Bishop
May 3, 2011
Among the many other works of Charles Babbage is the game of word squares. A word square consists of a set of words written in a square grid in such a way that the same words can be read both horizontally and vertically. Some sample word squares are shown below; the one on the left, in Latin, was found in the ruins of Pompeii, the one in the middle is due to Doug McIlroy, and the one on the right is due to Babbage:
S A T O R W A S S A I L D E A N
A R E P O A N T E N N A E A S E
T E N E T S T R I N G Y A S K S
O P E R A S E I Z U R E N E S T
R O T A S A N N U L A R
I N G R A T E
L A Y E R E D
Babbage, in his memoirs, proposed to find a word square based on BISHOP, but was unable to do so.
Your task is to write a program that, given a single word, creates a word square using that word in its top row, then use that program to find a word square for the starting word BISHOP. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.