February | 2011 | Programming Praxis

Sieve Of Euler

February 25, 2011

The ancient Sieve of Eratosthenes that computes the list of prime numbers is inefficient in the sense that some composite numbers are struck out more than once; for instance, 21 is struck out by both 3 and 7. The great Swiss mathematician Leonhard Euler invented a sieve that strikes out each composite number exactly once, at the cost of some additional bookkeeping. It works like this:

First, make a list of numbers from 2, as large as you wish; call the maximum number n.

Second, extract the first number from the list, make a new list in which each element of the original list, including the first, is multiplied by the extracted first number.

Third, “subtract” the new list from the original, keeping in an output list only those numbers in the original list that do not appear in the new list.

Fourth, output the first number from the list, which is prime, and repeat the second, third and fourth steps on the reduced list excluding its first element, continuing until the input list is exhausted.

For example, start with the list 2 3 4 5 … 30. Then the new list is 4 6 8 10 … 60. Subtracting gives the list 2 3 5 7 9 … 29. Now 2 is prime and the process repeats on the list 3 5 7 9 … 29. At the next step, the new list is 9 15 21 27 … 87, subtracting gives the list 3 5 7 11 13 … 29, now 2 and 3 are prime and the process repeats on the list 5 7 11 13 … 29. Likewise for the primes 5 and 7, and since 7 · 7 > 30, the process stops, with the remaining list 11 13 17 19 23 29, so the complete list of primes less than 30 is 2 3 5 7 11 13 17 19 23 29.

Just as in the Sieve of Eratosthenes, you can speed up the Sieve of Euler by considering only odd numbers, by stopping once the first item in the list is greater than the square root of n, and by computing the new list in the second step only as far as n.

Your task is to write a program that implements the Sieve of Euler, then compare its performance to the Sieve of Eratosthenes. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

16 Comments »

Sliding Window Minimum

February 22, 2011

The sliding window minimum problem takes a list of n numbers and a window size k and returns a list of the minimum values in each of the n−k+1 successive windows. For instance, given the list {4, 3, 2, 1, 5, 7, 6, 8, 9} and a window of size 3, the desired output is the list {2, 1, 1, 1, 5, 6, 6}. Richard Harter discusses this problem at his blog, along with several different solutions.

The obvious solution is to report the minimum of the first k elements of the list, slide one position down the list, take the minimum of the first k elements starting at the new position, and so on, until there are less than k elements remaining.

Harter presents a better solution that he calls the ascending minima algorithm that requires O(n) time and O(k) space. He uses an auxiliary data structure, a queue, that is initialized as the minimum value of the initial window, followed by the minimum value of those items in the initial window that follow the right-most occurrence of the minimum value, followed by the minimum value of those items in the initial window that follow the right-most occurrence of the second value, and so on, up to a maximum of k ascending minimums; each minimum is paired with the index of the position where the minimum disappears from the window. For instance, with k=6 and the first six items of the input list {5, 2, 8, 6, 4, 7}, the queue will have three pairs (2 7), (4 10), and (7 11), indicating that 2 is the minimum value up to and including the 7^th list element, then 4 is the minimum value for the 8^th, 9^th and 10^th elements, and so on, unless a smaller item appears.

Once the queue of ascending minimums is initialized, output is produced by emitting the first value in the queue then updating the queue with the next item beyond the end of the current sliding window using the following three-step process:

remove from the queue all items with value greater than the incoming item,
append the incoming item to the end of the queue, along with its “death index,” and
remove the head of the queue if it is beyond its death index.

Let’s look again at the list {4, 3, 2, 1, 5, 7, 6, 8, 9} with a window of size 3. The initial queue has the single entry (2 5), because the minimum item is the last element of the initial window (the largest possible queue arises when the window has monotonically increasing values); the queue entry indicates that the minimum value will be 2 until after the 5^th item in the input, when that minimum value “dies.” The initial minimum 2 is output and the new item 1 is added to the queue; since 2 is greater than 1, the (2 5) entry is removed from the queue, a new (1 6) entry is added to the end of the queue, and since the current index 4 is less than the death index 6, we proceed to the next item. Now we output the current minimum 1, add an entry (5 7) to the queue, and move on. Again we output the current minimum 1, add an entry (6 8) to the queue, and move on. One more time we output the current minimum 1, and an entry (8 9) to the queue, and delete (1 6) from the head of the queue since the current index has reached its death index; the queue currently contains the items (5 7) (6 8) and (8 9). Going forward, we output 5 and delete the head of the queue, output 6, and output 6 again, at which point we stop because the current index has reached the end of the input.

Your task is to write two functions that solve the sliding window minimum problem, using the two algorithms described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

6 Comments »

Two Factoring Games

February 18, 2011

Over at mersenneforum.org, a small community of people interested in factoring large integers talk about their craft, share tips and accomplishments, and sometimes play factoring games. Today’s exercise is about two of those games.

The home prime of a number n is computed by factoring the number into its prime factors, concatenating the digits of the prime factors (which have been sorted into ascending order), and repeating until the result is prime; this prime number is known as the home prime. For instance, 99 factors as 3 · 3 · 11; 3311 factors as 7 · 11 · 43, and 71143 is prime, so HP(99) = 71143, computed in three steps. Sloane has the list of home primes at A037274. Note that the home prime of a prime number is the number itself. Many numbers resolve to their home prime in just a few steps, as above, but others take longer, and it not known if all numbers eventually resolve to a home prime. At mersenneforum, and at the World of Numbers, Alex Gruppa, Paul Leyland, and Nicolas Daminelli have been working on the calculation of HP49 for ten years, and are currently stalled, after 105 steps, by a 210-digit composite.

Another factoring game involves the Euclid-Mullin sequence, which starts with a₁ = 2 and continues $a_n = \mathrm{lpf} \left( 1 + \prod_{k=1}^{n-1} a_k \right)$ ; the first few terms are 2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, … (A000945). Currently, only the first 47 terms are known, with calculations stalled by a 256-digit composite; the 43rd term was only found last year after a lengthy effort. It is conjectured but not known that the sequence includes all the prime numbers, as it is closely related to Euclid’s original proof of the infinitude of primes.

Your task is to write two functions to compute the home prime of a given number and to compute the Euclid-Mullin sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

3 Comments »

Google Code Jam Qualification Round Africa 2010

February 15, 2011

Today’s three programming exercises come from the Google Code Jam Qualification Round Africa 2010:

Store Credit: You receive a credit C at a local store and would like to buy two items. You first walk through the store and create a list L of all available items. From this list you would like to buy two items that add up to the entire value of the credit. The solution you provide will consist of the two integers indicating the positions of the items in your list (smaller number first). For instance, with C=100 and L={5,75,25} the solution is 2,3; with C=200 and L={150,24,79,50,88,345,3} the solution is 1,4; and with C=8 and L={2,1,9,4,4,56,90,3} the solution is 4,5.

Reverse Words: Given a list of space separated words, reverse the order of the words. Each input string contains L letters and W words. An input string will only consist of letters and space characters. There will be exactly one space character between each pair of consecutive words. For instance, the reverse of “this is a test” is “test a is this”, the reverse of “foobar” is “foobar”, and the reverse of “all your base” is “base your all”.

T9 Spelling: The Latin alphabet contains 26 characters and telephones only have ten digits on the keypad. We would like to make it easier to write a message to your friend using a sequence of keypresses to indicate the desired characters. The letters are mapped onto the digits as 2=ABC, 3=DEF, 4=GHI, 5=JKL, 6=MNO, 7=PQRS, 8=TUV, 9=WXYZ. To insert the character B for instance, the program would press 22. In order to insert two characters in sequence from the same key, the user must pause before pressing the key a second time. The space character should be printed to indicate a pause. For example “2 2” indicates AA whereas “22” indicates B. Each message will consist of only lowercase characters a-z and space characters. Pressing zero emits a space. For instance, the message “hi” is encoded as “44 444”, “yes” is encoded as “999337777”, “foo bar” (note two spaces) is encoded as “333666 6660022 2777”, and “hello world” is encoded as “4433555 555666096667775553”.

Your task is to solve the three Code Jam exercises. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

23 Comments »

Sums Of Powers

February 11, 2011

We saw the calculation of the Bernoulli numbers in the previous exercise. In today’s exercise we will study the Bernoulli numbers in their original context in computing the sums of powers. For instance, S₁₀(1000) = 1¹⁰ + 2¹⁰ + … + 1000¹⁰ = 91409924241424243424241924242500, a result calculated by the Swiss mathematician Jakob Bernoulli in “less than half of a quarter of an hour” in the late seventeenth century.

Any discussion of Bernoulli numbers is complicated by the fact that there are several similar sequences that go by that name. The oldest is -1/2, 1/6, 0, -1/30, 0, 1/42, 0, -1/30, 0, 5/66, …; that’s the sequence we calculated previously. The modern version of the sequence prepends a 1, so it’s 1, -1/2, 1/6, 0, -1/30, 0, 1/42, 0, -1/30, 0, 5/66, …. And sometimes the -1/2 is given as +1/2, so the sequence is 1, 1/2, 1/6, 0, -1/30, 0, 1/42, 0, -1/30, 0, 5/66, …. Even worse, sometimes the Bernoulli numbers are numbered counting from zero, though mostly they are numbered counting from one, so you have to be careful when you talk about the Bernoulli numbers.

The particular sequence that we want to look at is the third sequence that starts with 1, 1/2, 1/6, 0, -1/30, …. In 1999, S. Akiyama and Y. Tanigawa found a simple way to compute that sequence in linear time using a dynamic program represented in a triangular matrix:

1	1/2	1/3	1/4	1/5
1/2	1/3	1/4	1/5	…
1/6	1/6	3/20	…	…
0	1/30	…	…	…
-1/30	…	…	…	…

In the first row, A_0,j is computed as 1 / (j+1). In subsequent rows, A_i,j is computed as (j+1) × (A_i−1,j − A_i−1,j+1). The Bernoulli number B_n is located in the first column at A_n,0.

Given Bernoulli numbers, sums of powers can be computed by Bernoulli’s formula, where $\left( \begin{array}{c} n \\ k \end{array} \right) = \frac{n!}{(n-k)!k!}$ is the binomial coefficient:

$S_m(n) = \frac{1}{m+1} \sum_{k=1}^n \left( \begin{array}{c} m+1 \\ k \end{array} \right) B_k n^{m+1-k}$

Your task is to write functions that return a list of Bernoulli numbers up to a requested limit and that compute the sum of powers S_m(n). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

4 Comments »

The First Computer Program

February 8, 2011

By Mirko Tobias Schaefer. Licensed under CC-BY-2.0. Beginning in the 1820s, and continuing until his death in 1871, Charles Babbage, an English mathematician (he was for ten years the Lucasian Professor of Mathematics at Cambridge, a chair once held by Sir Isaac Newton and currently held by Stephen Hawking), engineer, scientist and inventor was fascinated with the idea of a machine that could perform arithmetic calculations. His first machine, the Difference Engine, first described in 1822, used finite differences to compute the values of polynomials. His second machine, the Analytical Engine, first described in 1837, was much more ambitious, consisting of a control unit, an arithmetic processor, and a storage space for intermediate calculations, and could be programmed to perform specific calculations by use of punched cards. Both machines were mechanical, not electrical, powered by steam. Neither machine was built prior to his death, due to a lack of money and the expertise to manufacture the needed parts with the required precision.

Augusta Ada King (née Byron), Countess of Lovelace, daughter of the English poet Lord Byron and an amateur mathematician, was fascinated by Babbage’s plans for the Analytical Engine. In 1842, at Babbage’s request, she translated a manuscript about the Analytical Engine by the Italian mathematician Luigi Menabrea (who later became the Prime Minister of Italy), adding her own notes, which were more extensive than the original manuscript. Note G describes the use of the Analytical Engine to compute the Bernoulli numbers; it is complete and rigorous, and is now recognized as the first computer program, making the Countess the first computer programmer.

The Countess’ translated manuscript and notes, available from FourmiLab, make fascinating reading, a terrific way to pass a cold winter evening. She was clearly in full command of her subject, and anticipated much of computer architecture and modern computer programming languages a full century before they were discovered. For instance, if you ignore the stilted language, this quote could have been written last week:

In studying the action of the Analytical Engine, we find that the peculiar and independent nature of the considerations which in all mathematical analysis belong to operations, as distinguished from the objects operated upon and from the results of the operations performed upon those objects, is very strikingly defined and separated.

It is hard to overstate the Countess' accomplishment. She had no high-level programming language, no libraries, no integrated development environment, not even an assembler or a debugger; she was working on "bare metal" — literally! The only documentation she had was that which she wrote herself. And she was working in untrodden territory, as there were no textbooks about programming, no tutors, no "Learn X in 21 Days" to help her. She didn't even have a machine to run her program. It is amazing that she wrote the program at all; incredibly, modern computer scientists have examined the program and declared it bug-free, and today’s programmers must simply stand in awe. The Countess was not only the first computer programmer, she was also an extremely talented one.

The program that the Countess chose to demonstrate the Analytical Engine was a program to compute the Bernoulli numbers and is based on Equation 8 of Note G:

$0 = -\frac{1}{2} \cdot \frac{2n-1}{2n+1}$ $+ \mathrm{B}_1\left(\frac{2n}{2}\right)$ $+ \mathrm{B}_3\left(\frac{2n\cdot(2n-1)\cdot(2n-2)}{2\cdot3\cdot4}\right)$ $+ \mathrm{B}_5\left(\frac{2n\cdot(2n-1)\cdot\ldots\cdot(2n-4)}{2\cdot3\cdot4\cdot5\cdot6}\right)$ $+ \ldots$ $+ \mathrm{B}_{2n-1}$

Thus, B₁ = -1/2 when n = 1, B₃ = 1/6 when n = 2, B₅ = -1/30 when n = 3, B₇ = 1/42 when n = 4, B₉ = -1/30 when n = 5, B₁₁ = 5/66 when n = 6, B₁₃ = -691/2730 when n = 7, B₁₅ = 7/6 when n = 8, B₁₇ = -3617/510 when n = 9, B₁₉ = 43867/798 when n = 10, and so on. The successive Bernoulli numbers are computed in order, each feeding the computation of the next, the various numerators and denominators being computed according to the pattern in the formula.

Your task is to write a program to calculate the Bernoulli numbers, using the method of the Countess’ Note G. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

5 Comments »

Excel Columns

February 4, 2011

The columns in an Excel spreadsheet are “numbered” A through IV, where A corresponds to 1, Z corresponds to 26, AA corresponds to 27, and IV corresponds to 256.

Your task is to write functions that translate back and forth between the two representations. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

13 Comments »

Cuckoo Hashing

February 1, 2011

By Rasmus Pagh. Licensed under CC BY-SA 3.0. Hashing is a way of maintaining a dictionary data structure that provides insert, delete and lookup operations. The most common hashing method, called chained hashing, uses a single hash function to provide an offset into an array; each bucket contains a linked list, which is searched to find the desired item. This method is fast if the hash function does a good job of distributing the keys, with average access in constant time, but has an annoying linear-time worst case if all the keys hash to the same value.

An alternate hashing method, called cuckoo hashing, was invented by Rasmus Pagh and Flemming Friche Rodler. The advantage of cuckoo hashing is that it guarantees constant-time lookups, and its amortized-constant-time insertions are within a constant factor of optimal. The code that implements cuckoo hashing is simple, and cuckoo hashing performs very well in practice.

In cuckoo hashing, instead of storing a list of key/value pairs in a single bucket, each “nest” position in an array either has a single key/value pair, or is empty. And instead of a single hash function, there are two; any key must be found at one of the two nests, guaranteeing constant-time lookup. Insertion works by looking at the two possible nests; if either is empty, the new key/value pair is placed there, but if both are occupied, the item being inserted is placed in one of them and the displaced item is then inserted recursively in a different nest. If insertion ever encounters a cycle, the whole data structure is rehashed into a new one using two new hash functions; it is possible that rehashing encounters a new cycle, and so on, ad idfinitum, but in practice that almost never happens, and if it does, can be fixed by increasing the size of the array. The diagram at right shows a chain of key/value pairs; note the cycle between H and W. Cuckoo hashing derives its name from the behavior of the cuckoo bird that makes its nest by finding another bird’s nest and driving away its original occupant.

Several variants of cuckoo hashing have been described in the academic literature. The biggest disadvantage of cuckoo hashing is wasted space; the version described above requires a load factor of no more than 50% nests occupied by key/value pairs in order to guarantee amortized-constant-time insertions. Adding a third hash function increases the average fill rate from 50% to 91%, and adding additional hash functions increases that rate even more. Another possibility is to allow a fixed number of items, greater than one, in each slot of the hash table; increasing from one item to two increases the average fill rate to 80%. Of course, these ideas can be used together; using three “hatch” functions and two birds per nest increases the average fill rate to 97%, and even higher fill rates are possible, though at the cost of more work per item (but the big-oh time guarantees remain unchanged). Most real implementations of cuckoo hashing provide for the number of nests to change dynamically, growing and shrinking as the number of items in the table changes.

Your task is to write a library that maintains a dictionary of key/value pairs using cuckoo hashing; you should provide operators for the three basic operators (lookup, insert and delete) plus a function to extract the key/value pairs to a list. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

3 Comments »

Programming Praxis

Sieve Of Euler

February 25, 2011

Sliding Window Minimum

February 22, 2011

Two Factoring Games

February 18, 2011

Google Code Jam Qualification Round Africa 2010

February 15, 2011

Sums Of Powers

February 11, 2011

The First Computer Program

February 8, 2011

Excel Columns

February 4, 2011

Cuckoo Hashing

February 1, 2011

Categories

Archives

Archives