The First Two Programs

September 7, 2012

In their book The C Programming Language, Brian Kernigan and Dennis Ritchie say that the first program every programmer should write is a program that writes “Hello, world!” to the console. Then they give the second program that produces a fahrenheit/celsius temperature conversion table, with fahrenheit temperatures every 20 degrees from 0 to 300 and the corresponding celsius temperature to the nearest degree, each pair written on its own line with the two temperatures separated by tabs.

Your task is to write the first two programs. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Fountain Codes

September 4, 2012

Fountain codes provide an effective method of file transmission over lossy channels. Invented by Michael Luby in 1998, but not published unti 2002, they are used to send bit torrents and streaming media, transmit map and traffic data to in-car receivers, and probably for the recent “brain transplant” to the Curiosity rover on Mars (I’m speculating, but wouldn’t you expect to lose a few packets between here and Mars?). The idea is to send many chunks of a file, larger than the original file, like a fountain spraying droplets of water, and for the receiver to collect drops in a bucket, which can be reassembled into the original file without collecting all the drops sprayed by the fountain.

The encoding process is simple. Pick a random number d between 1 and n, the number of blocks in the file. Pick d blocks from the file at random (no duplicates) and xor them together. Send the xor of the combined blocks, along with the list of blocks used to make the xor. The distribution of d is important; it is not just a uniform random number. We choose d with probability d / (1/2)n(n+1), so if the file has 20 blocks, the denominator is 210 (the sum of the integers from 1 to 20), the degree d is 1 with probability 20/210, 2 with probability 19/210, and so on, until it is 20 with probability 1/210.

The decoding process isn’t much harder. Start with an array of blocks of length n, with an indication that each block is not currently decoded. Then each packet is analyzed as it is received. If the packet has a single block, it can be added immediately to the array of blocks; if not, any known blocks in the input list can be xor’ed into the packet, reducing its degree. A packet that is reduced to a single block is added to the output, others are added to a hold list. Any time a new block is discovered, the entire hold list is scanned for any further reductions in any of its packets.

Let’s illustrate with a fountain that sends blocks of a single byte from the “Programming Praxis” input string. We begin with an empty array and hold list and fetch the first packet from the fountain (22 15 9). That packet has two blocks, so we put it in the hold and go on. The next packet is (80 0), so we put a “P” in block 0 and go on. The next packet is (55 12 10), which we add to the hold. Next we get a packet (14 10 8), which doesn’t help immediately, but it shares a block with (55 12 10), which will be useful later; we add it to the hold, which now contains (22 15 9), (55 12 10) and (14 10 8). The next packet is (103 10). We can immediately add a “g” at block 10, then xor 55 with 103 to get the “P” at block 12 and xor 14 with 103 to get the “i” at block 8; our string is now “P…….i.g.P…..” and our hold is now (22 15 9). We next receive packets (19 5 1), (88 11 15), (45 1 9 14 12), (105 16) and (114 1), so we can add an “i” at block 16, an “r” at block 1, and, by virtue of the (19 5 1) packet in the hold, an “a” at block 5; now the string is “Pr…a..i.g.P…..” and the hold contains (22 15 9), (88 11 15), and (45 1 9 14 12). The next packet is (97 14), which gives us another block, the “a” of “Praxis,” and also lets us reduce the packet (45 1 9 14 12) in the hold to (76 1 9 12). The next packet is (99 16 4 10 1 7), which we can reduce to (118 16 4 7) by applying blocks 10 and 1. Next we receive (120 15), which gives us the “x” in block 15 directly, and also gives us the “n” in block 9 and the space character in block 11 by applying block 15 to the (22 15 9) and (88 11 15) items in the hold; we now have “Pr…a..ing P..x..” and packets (76 1 9 12) and (118 16 4 7) in the hold, with nine blocks left to be decoded. And so the process continues. Eventually the entire string is revealed, on average after O(n loge n) packets have been received, which is about 52 for our 18-block string.

In practice, it is common to do things a little bit differently. Certainly it is normal to use blocks larger than a single byte. Instead of sending the list of blocks, it is normal to send the seed to a simple random number generator; if both ends of the transmission use the same random number generator, both can create the list of blocks from the same seed. And it is common to send a checksum with each packet; if the packet fails the checksum, it is simply ignored, as if it had never been received.

It is also common to use a different degree-distribution than the one we used here. It turns out that a bimodal degree-distribution with lots of 1-, 2- or 3-block packets, combined with an n-block packet and a few n-minus-two-or-three block packets, with very few packets in between, makes the whole process go much quicker, an average O(n) with just a small overhead of 5% or 10% over the size of the file being transmitted. Use of this degree-distribution is called a raptor code.

Your task is to write a program that creates a fountain and decoder. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Once In A Blue Moon

August 31, 2012

Sometimes the phases of the moon fall on the calendar in such a way that one calendar month has two full moons; the second full moon of the month is called a blue moon, for reasons that have been lost in antiquity. Poets and songwriters sometimes use the phrase “once in a blue moon” to indicate an event that occurs infrequently, but in fact blue moons occur every two or three years, on average. Today’s full moon is the blue moon of August 2012.

We looked at the phases of the moon in a previous exercise. There we learned that new moons occur every 29.530588853 days, that a new moon occurred on January 6, 2000 (julian date 2451550.1), and that full moons occur halfway between two new moons.

Your task is to write a program that calculates all the blue moons of the twenty-first century. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Random Access Lists

August 28, 2012

Lists are ubiquitous in programming, especially in languages like Scheme, but elsewhere as well. Their primary advantage over arrays is that they can easily grow as needed, but that brings a corresponding disadvantage: it takes O(n) time to access the nth item in a list, but only O(1) time to access the nth item an an array.

Chris Okasaki has invented a remarkably clever data structure that provides the normal O(1) time complexity for the cons, head and tail operators of lists but reduces random access to the nth item in a list to O(log n), which means lists can sometimes be used in place of arrays, especially when it is inconvenient to determine the size of the array in advance or when the items of the array are normally accessed in sequence.

I won’t try to explain Okasaki’s data structure here; you can look at http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.5156 for the details, or look at Figure 9.7 in Okasaki’s book Purely Functional Data Structures, as I did.

Your task is to implement a library for random access lists, including the functions cons, head, tail, lookup and update. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

One of the classic data structures of computer science is the hash table, which provides constant-time access to key/value data in the average case. The idea is to store a key/value pair in a location that is computed based on the value of the key, which works fine when all the hash values are unique; the problem arises when two keys hash to the same value. The hash tables of the Standard Prelude use a method called chaining to resolve collisions; today’s exercise uses a different method called open addressing.

All hash tables work by computing an address at which to store an item based on its key. Sometimes two keys hash to the same value, which causes a collision. Chaining, as used in the Standard Prelude, resolves collisions by storing multiple items at the same location, forming a list of items. Open addressing instead computes a second address, or if necessary a third, or fourth, or …, continuing until it finds an empty spot. It is necessary that the computation of secondary addresses eventually visits every possible storage location; a simple approach, which we will adopt, is called linear probing, in which storage locations are accessed in increasing order until an empty location is found. It is possible, of course, for the hash table to become completely filled, in which case an error is reported when a new item cannot be inserted.

The tricky part of hashing with open addressing is deletions, because it’s not possible to simply delete an item because some other item may rely on the the fact that its storage location was filled when the item was inserted. The solution is to have three types of items in a storage location: nil, which indicates that the storage location has never been used; deleted, which indicates that the storage location is currently empty but has been used in the past, and in use, for those storage locations that are currently occupied.

Your task is to write functions that maintain a hash table with open addressing. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Two More Random Exercises

August 21, 2012

We did two tasks related to random numbers in the most recent exercise, and we have looked at high-quality random number generators in several previous exercises. In today’s exercise we look at two very low-quality random number generators, which should not be used for any production application.

The first, invented by John von Neumann in 1949, occasioned his famous quip “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.” The middle-square method takes a number with an even number of digits, squares it, and extracts the middle digits for the next iteration; for instance, if the seed is 675248, the square is 455959861504, and the middle digits are 959861.

The second, invented by IBM in the early 1960s, caused Donald Knuth to claim “its very name RANDU is enough to bring dismay into the eyes and stomachs of many computer scientists!”. RANDU is based on the recursion xn+1 = 65539 · xn (mod 231), with x0 odd.

Your task is to write functions that generate random numbers by the middle-square and RANDU methods. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Two Random Exercises

August 17, 2012

We have two exercises related to random numbers today. I’m not sure of the source, but they look to me like homework exercises.

First, given a function rand3 that returns a number from 1 to 3 inclusive chosen at random, write a function that returns a number from 1 to 9, inclusive.

Second, given a function rand5 that returns a number from 1 to 5 inclusive chosen at random, write a function that returns a number from 1 to 7, inclusive.

In both cases all possible output numbers should be generated with equal frequency. You should demonstrate that your functions behave properly.

Your task is to write the rand9 and rand7 functions and demonstrate that they work properly. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

4SUM

August 14, 2012

We have today another exercise from our inexhaustible stock of interview questions:

Given an array of integers, output a list of four integers that sum to zero (the same input integer can be used multiple times), or indicate that no such set of four integers exists. For example, given the array (2 3 1 0 -4 -1), the set of four integers (3 1 0 -4) sums to zero, as does the set (0 0 0 0).

Your task is to write a program that solves the interview question. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Minimum Scalar Product

August 10, 2012

Today’s exercise comes to us from the practice round of Google Code Jam 2008.

You are given two vectors v1=(x1,x2,…,xn) and v2=(y1,y2,…,yn). The scalar product of these vectors is a single number, calculated as x1y1+x2y2+…+xnyn.

Suppose you are allowed to permute the coordinates of each vector as you wish. Choose two permutations such that the scalar product of your two new vectors is the smallest possible, and output that minimum scalar product.

Google gives two examples: the minimum scalar product of the two vectors (1 3 -5) and (-2 4 1) is -25, and the minimum scalar product of the two vectors (1 2 3 4 5) and (1 0 1 0 1) is 6.

Your task is to write a program that finds the minimum scalar product of two vectors. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Make

August 7, 2012

We continue today our occasional series based on classic Unix utilites. The make command maintains a set of file dependencies, given a definition of those files which depend on others and the commands used to update the predecessors. Make is commonly used to automate program compilation, but it also finds use in other ways, such as maintaining the dependencies between the files that make up a complicated document or website. Here is a typical makefile for a small C program:

prog:   a.o b.o c.o
        cc a.o b.o c.o -ly -o prog

a.o:    prog.h a.c
        cc -c prog.h a.c

b.o:    prog.h b.c
        cc -c prog.h b.c

c.o:    c.c
        cc -c c.c

c.c:    c.y
        yacc c.y
        mv y.tab.c c.c

print:
        pr prog.h a.c b.c c.y

The first line says that the target prog depends on files a.o, b.o and c.o and is generated by calling the C compiler to link a.o, b.o, c.o and the y library into the executable file prog. The value of make is that it eliminates needless work; if everything is up to date and a change is made to the yacc grammar in c.y, the only commands that need to be run are the yacc and mv commands to rebuild the c.y file, the cc command that rebuilds c.c, and the cc command that rebuilds prog:

yacc c.y
mv y.tab.c c.c
cc -c c.c
cc a.o b.o c.o -ly -o prog

The make program begins by reading the makefile and storing the dependencies and the associated commands. Then it takes the target, checks the filesystem to determine the ages of the target’s predecessors, calls itself recursively to update any older predecessors, and finally calls the commands associated with the target.

Your task is to write a program that takes a target and updates it according to the rules of the makefile. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2