Streaming Median

May 29, 2012

The median of a set of numbers is the number in the middle when the items are arranged in sorted order, when the number of items is odd, or the average of the two numbers in the middle, when the number of items is even; for instance, the median of {3 7 4 1 2 6 5} is 4 and the median of {4 2 1 3} is 2.5. The normal algorithm for computing the median considers the entire set of numbers at once; the streaming median algorithm recalculates the median of each successive prefix of the set of numbers, and can be applied to a prefix of an infinite sequence. For instance, the streaming medians of the original sequence of numbers are 3, 5, 4, 3.5, 3, 3.5, and 4.

The streaming median is computed using two heaps. All the numbers less than or equal to the current median are in the left heap, which is arranged so that the maximum number is at the root of the heap. All the numbers greater than or equal to the current median are in the right heap, which is arranged so that the minimum number is at the root of the heap. Note that numbers equal to the current median can be in either heap. The count of numbers in the two heaps never differs by more than 1.

When the process begins the two heaps are initially empty. The first number in the input sequence is added to one of the heaps, it doesn’t matter which, and returned as the first streaming median. The second number in the input sequence is then added to the other heap, if the root of the right heap is less than the root of the left heap the two heaps are swapped, and the average of the two numbers is returned as the second streaming median.

Then the main algorithm begins. Each subsequent number in the input sequence is compared to the current median, and added to the left heap if it is less than the current median or to the right heap if it is greater than the current median; if the input number is equal to the current median, it is added to whichever heap has the smaller count, or to either heap arbitrarily if they have the same count. If that causes the counts of the two heaps to differ by more than 1, the root of the larger heap is removed and inserted in the smaller heap. Then the current median is computed as the root of the larger heap, if they differ in count, or the average of the roots of the two heaps, if they are the same size.

Your task is to write a function that computes the streaming medians of a sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisement

Pages: 1 2

Ackermann’s Function

May 25, 2012

In the 1920s, Wilhelm Ackermann demonstrated a computable function that was not primitive-recursive, settling an important argument in the run-up to the theory of computation. There are several versions of his function, of which the most common is

A(m,n) = \left\{    \begin{array}{ll}      n+1 & \mbox{if \(m=0\)} \\      A(m-1,1) & \mbox{if \(m > 0\) and \(n = 0\)} \\      A(m-1, A(m,n-1)) &        \mbox{if \(m > 0\) and \(n > 0\)}    \end{array}  \right.

defined over non-negative integers m and n. The function grows very rapidly, even for very small inputs; as an example, A(4,2) is an integer of about 20,000 digits.

Your task is to implement Ackermann’s function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Hamming Codes

May 22, 2012

Hamming codes, which were invented by Richard Hamming in 1950, are a method of transmitting data over a noisy channel that give the recipient the ability to correct simple errors. The sender adds parity bits to the transmission stream so that, when the data bits and parity bits are combined, any single-bit error, in either the data bits or parity bits, can be identified and corrected. The number of parity bits that are required is given by the Hamming rule d + p + 1 ≤ 2p where d is the number of data bits and p is the number of parity bits. The length of the code word c, which combines the data bits and parity bits, is d + p, and a Hamming code is described by (c,d). We will illustrate using a 4-bit data word, which requires 3 parity bits to satisfy the Hamming rule (2 is insufficient because 4+2+1>4, but 3 is sufficient because 4+3+1≤8) and is described by (7,4).

A particular instance of a Hamming code uses two matrices, G the generator matrix and H the syndrome matrix. Here are sample generator (left) and syndrome (right) matrices for a (7,4) Hamming code:

1 0 0 0 1 1 1    1 0 1 1 1 0 0
0 1 0 0 0 1 1    1 1 0 1 0 1 0
0 0 1 0 1 0 1    1 1 1 0 0 0 1
0 0 0 1 1 1 0

The generator matrix is denoted [ I : A ] and consists of an identity matrix I in the left-most four columns (the number of data bits) and a parity coding matrix A in the right-most three columns (the number of parity bits). There is no formula for the A matrix; it must be constructed so that each data bit is checked by two or more parity bits, in such a way that no combination of parity bits overlaps a data bit. The syndrome matrix is denoted [ AT : I ] and consists of the transpose of the parity coding matrix A in the left-most four columns and the identity matrix in the right-most four columns.

A data word is encoded by multiplying it by the generator matrix, with all arithmetic done modulo 2; we gave an algorithm for matrix multiplication in a previous exercise. For instance, the data word [1 0 0 1] is encoded as [1 0 0 1 0 0 1] like this:

                                  | 1 |
                                  | 0 |
              | 1 0 0 0 1 1 1 |   | 0 |
| 1 0 0 1 | * | 0 1 0 0 0 1 1 | = | 1 |
              | 0 0 1 0 1 0 1 |   | 0 |
              | 0 0 0 1 1 1 0 |   | 0 |
                                  | 1 |

Decoding is the inverse operation, with the syndrome matrix multiplied by the encoded data:

                    | 1 |
                    | 0 |
| 1 0 1 1 1 0 0 |   | 0 |   | 0 |
| 1 1 0 1 0 1 0 | * | 1 | = | 0 |
| 1 1 1 0 0 0 1 |   | 0 |   | 0 |
                    | 0 |
                    | 1 |

If all of the result bits are zero, then the transmission succeeded without errors, and the input code is the first four bits of the encoding [1 0 0 1]. But if there was a transmission error, the syndrome points to it. For instance, if the recipient received the tranmission as [1 0 1 1 0 0 1], then the syndrome computes as [1 0 1], which matches the third column of the H matrix, indicating that the third bit of the transmission was in error, so instead of [1 0 1 1] the message is [1 0 0 1], which is correct.

Your task is to write functions that encode and decode a message using Hamming codes as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

It is often necessary for programs to produce numeric output in various formats, and most languages provide libraries for this purpose; for instance, C provides the printf function, which includes the d and f format specifications for decimal numbers (integers) and floating point numbers, respectively.

Your task is to write library functions that format integers and floating point numbers; you may follow the formatting conventions of C, or those of some other language, or invent your own. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Streaming Knapsack

May 15, 2012

A famous problem of computer science is the knapsack problem, in which you are to find a combination of items from a population that sums to a given target, often with some kind of constraint such as maximizing the value of the items. In today’s problem we want to find the first possible combination of k integers from a stream of positive integers that sum to n. For instance, given the input stream 4, 8, 9, 2, 10, 2, 17, 2, 12, 4, 5, …, we want to find the knapsack containing 4, 2, 10, 2, 2 immediately after reading the third 2, without reading the 12, 4, 5 that follow it.

Your task is to write a program that takes parameters k and n and an input stream and returns the first possible knapsack. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Partitions

May 11, 2012

The partitions of an integer is the set of all sets of integers that sum to the given integer. For instance, the partitions of 4 is the set of sets ((1 1 1 1) (1 1 2) (1 3) (2 2) (4)). We computed the number of partitions of an integer in a previous exercise. In today’s exercise, we will make a list of the partitions.

The process is recursive. There is a single partition of 0, the empty set (). There is a single partition of 1, the set (1). There are two partitions of 2, the sets (1 1) and (2). There are three partitions of 3, the sets (1 1 1), (1 2) and (3). There are five partitions of 4, the sets (1 1 1 1), (1 1 2), (1 3), (2 2), and (4). There are seven partitions of 5, the sets (1 1 1 1 1), (1 1 1 2), (1 2 2), (1 1 3), (1 4), (2 3) and (5). And so on. In each case, the next-larger set of partitions is determined by adding each integer x less than or equal to the desired integer n to all the sets formed by the partition of nx, eliminating any duplicates.

Your task is to write a function that generates the set of all partitions of a given integer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Factor Tables

May 8, 2012

Before the dawn of computers, most computations were done with the aid of tables: logarithm tables, sine tables, and so on. These tables were ubiquitous, indispensable, and riddled with errors. Number theorists who needed to factor numbers used tables of the least prime factor of a number. The oldest such table dates to 1603 (it contained the least prime factor of all numbers to 750), and new tables were being constructed as late as Derrick N. Lehmer’s table of least prime factors to ten million in 1909 (he was the father of Derrick H. Lehmer); Maarten Bullynck gives the history. Here’s a sample page from a large table, showing the least prime factors of all numbers less than a thousand; numbers divisible by 2 and 5 are omitted, and primes are skipped:

      0    1    2    3    4    5    6    7    8    9
 1        --    3    7   --    3   --   --    3   17
 3   --   --    7    3   13   --    3   19   11    3
 7   --   --    3   --   11    3   --    7    3   --
 9    3   --   11    3   --   --    3   --   --    3
11   --    3   --   --    3    7   13    3   --   --
13   --   --    3   --    7    3   --   23    3   11
17   --    3    7   --    3   11   --    3   19    7
19   --    7    3   11   --    3   --   --    3   --
21    3   11   13    3   --   --    3    7   --    3
23   --    3   --   17    3   --    7    3   --   13
27    3   --   --    3    7   17    3   --   --    3
29   --    3   --    7    3   23   17    3   --   --
31   --   --    3   --   --    3   --   17    3    7
33    3    7   --    3   --   13    3   --    7    3
37   --   --    3   --   19    3    7   11    3   --
39    3   --   --    3   --    7    3   --   --    3
41   --    3   --   11    3   --   --    3   29   --
43   --   11    3    7   --    3   --   --    3   23
47   --    3   13   --    3   --   --    3    7   --
49    7   --    3   --   --    3   11    7    3   13
51    3   --   --    3   11   19    3   --   23    3
53   --    3   11   --    3    7   --    3   --   --
57    3   --   --    3   --   --    3   --   --    3
59   --    3    7   --    3   13   --    3   --    7
61   --    7    3   19   --    3   --   --    3   31
63    3   --   --    3   --   --    3    7   --    3
67   --   --    3   --   --    3   23   13    3   --
69    3   13   --    3    7   --    3   --   11    3
71   --    3   --    7    3   --   11    3   13   --
73   --   --    3   --   11    3   --   --    3    7
77    7    3   --   13    3   --   --    3   --   --
79   --   --    3   --   --    3    7   19    3   11
81    3   --   --    3   13    7    3   11   --    3
83   --    3   --   --    3   11   --    3   --   --
87    3   11    7    3   --   --    3   --   --    3
89   --    3   17   --    3   19   13    3    7   23
91    7   --    3   17   --    3   --    7    3   --
93    3   --   --    3   17   --    3   13   19    3
97   --   --    3   --    7    3   17   --    3   --
99    3   --   13    3   --   --    3   17   29    3

For example the table shows that the least prime factor of 923, in the column headed 9 and row headed 23, is 13, and 997 is the greatest prime less than a thousand. To factor a number, find its least prime factor in the table, compute the remaining cofactor by division, and repeat until the cofactor is prime.

When building the table, the least prime factor is computed by sieving, not by trial division. The setup is the same as the Sieve of Eratosthenes, except that integers are used instead of booleans, and each item in the sieve is initialized to 1. Then each successively-smallest prime p is sieved, but instead of changing true to false, each 1 in the chain of multiples of p is changed to p (other values are ignored). The same optimizations as the normal sieve — odd numbers only, start at the square of p, and stop when p2 is greater than n — apply here.

Your task is to write a function that sieves for least prime factors, and to use that function to write a program that builds factor tables as illustrated above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Even-Odd Partition

May 4, 2012

I’m not sure where this problem comes from; it’s either homework or an interview question. Nonetheless, it is simple and fun:

Take an array of integers and partition it so that all the even integers in the array precede all the odd integers in the array. Your solution must take linear time in the size of the array and operate in-place with only a constant amount of extra space.

Your task is to write the indicated function. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Legendre’s Symbol

May 1, 2012

The Legendre Symbol and its cousin the Jacobi Symbol are used in modular arithmetic to determine if a number a is a quadratic residue to the modulus m. A number a is a quadratic residue if there exists a number x such that x2a (mod m) and is defined only when a and m are co-prime. For instance, a2 (mod 7) for a from 0 to 6 is the list 02 (mod 7) = 0, 12 (mod 7) = 1, 22 (mod 7) = 4, 32 (mod 7) = 2, 42 (mod 7) = 2, 52 (mod 7) = 4, and 62 (mod 7) = 1, so the quadratic residues of 7 are 1, 2 and 4 (0 is excluded because it isn’t co-prime to 7). The jacobi symbol considers any odd modulus; the legendre symbol is limited to odd prime moduli. The symbols are usually written in parentheses with a over m, like this: \left( a \atop m \right). Sometimes the symbol is written with a horizontal rule between the a and m, and sometimes it is written on a single line as (a / m).

The legendre/jacobi symbol can be calculated according to the following three termination rules:

1. (0 / m) = 0

2. (1 / m) = 1

3. (2 / m) = −1 if m mod 8 ∈ {3, 5} or 1 if m mod 8 ∈ {1, 7}

and the following three reduction rules:

4. [reducing factors of 2] (2a / m) = (2 / m) × (a / m)

5. [reducing modulo m] (a / m) = (a (mod m) / m) if am or a < 0

6. [reducing odd co-primes a and m] (a /m) = −(m / a) if am ≡ 3 (mod 4), or (m / a) otherwise

Thus, the legendre/jacobi symbol is 1 if a is a quadratic residue, -1 if a is not a quadratic residue, and 0 if a and m are not co-prime.

Our various prime-number programs have used a definition of the legendre/jacobi symbol that doesn’t work; for some values of a and m it returned wrong results, and for other values of a and m it entered an infinite loop. This exercise fixes the problem.

Your task is to write a function that calculates the legendre/jacobi symbol using the rules given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2