Fountain Codes

September 4, 2012

Fountain codes provide an effective method of file transmission over lossy channels. Invented by Michael Luby in 1998, but not published unti 2002, they are used to send bit torrents and streaming media, transmit map and traffic data to in-car receivers, and probably for the recent “brain transplant” to the Curiosity rover on Mars (I’m speculating, but wouldn’t you expect to lose a few packets between here and Mars?). The idea is to send many chunks of a file, larger than the original file, like a fountain spraying droplets of water, and for the receiver to collect drops in a bucket, which can be reassembled into the original file without collecting all the drops sprayed by the fountain.

The encoding process is simple. Pick a random number d between 1 and n, the number of blocks in the file. Pick d blocks from the file at random (no duplicates) and xor them together. Send the xor of the combined blocks, along with the list of blocks used to make the xor. The distribution of d is important; it is not just a uniform random number. We choose d with probability d / (1/2)n(n+1), so if the file has 20 blocks, the denominator is 210 (the sum of the integers from 1 to 20), the degree d is 1 with probability 20/210, 2 with probability 19/210, and so on, until it is 20 with probability 1/210.

The decoding process isn’t much harder. Start with an array of blocks of length n, with an indication that each block is not currently decoded. Then each packet is analyzed as it is received. If the packet has a single block, it can be added immediately to the array of blocks; if not, any known blocks in the input list can be xor’ed into the packet, reducing its degree. A packet that is reduced to a single block is added to the output, others are added to a hold list. Any time a new block is discovered, the entire hold list is scanned for any further reductions in any of its packets.

Let’s illustrate with a fountain that sends blocks of a single byte from the “Programming Praxis” input string. We begin with an empty array and hold list and fetch the first packet from the fountain (22 15 9). That packet has two blocks, so we put it in the hold and go on. The next packet is (80 0), so we put a “P” in block 0 and go on. The next packet is (55 12 10), which we add to the hold. Next we get a packet (14 10 8), which doesn’t help immediately, but it shares a block with (55 12 10), which will be useful later; we add it to the hold, which now contains (22 15 9), (55 12 10) and (14 10 8). The next packet is (103 10). We can immediately add a “g” at block 10, then xor 55 with 103 to get the “P” at block 12 and xor 14 with 103 to get the “i” at block 8; our string is now “P…….i.g.P…..” and our hold is now (22 15 9). We next receive packets (19 5 1), (88 11 15), (45 1 9 14 12), (105 16) and (114 1), so we can add an “i” at block 16, an “r” at block 1, and, by virtue of the (19 5 1) packet in the hold, an “a” at block 5; now the string is “Pr…a..i.g.P…..” and the hold contains (22 15 9), (88 11 15), and (45 1 9 14 12). The next packet is (97 14), which gives us another block, the “a” of “Praxis,” and also lets us reduce the packet (45 1 9 14 12) in the hold to (76 1 9 12). The next packet is (99 16 4 10 1 7), which we can reduce to (118 16 4 7) by applying blocks 10 and 1. Next we receive (120 15), which gives us the “x” in block 15 directly, and also gives us the “n” in block 9 and the space character in block 11 by applying block 15 to the (22 15 9) and (88 11 15) items in the hold; we now have “Pr…a..ing P..x..” and packets (76 1 9 12) and (118 16 4 7) in the hold, with nine blocks left to be decoded. And so the process continues. Eventually the entire string is revealed, on average after O(n loge n) packets have been received, which is about 52 for our 18-block string.

In practice, it is common to do things a little bit differently. Certainly it is normal to use blocks larger than a single byte. Instead of sending the list of blocks, it is normal to send the seed to a simple random number generator; if both ends of the transmission use the same random number generator, both can create the list of blocks from the same seed. And it is common to send a checksum with each packet; if the packet fails the checksum, it is simply ignored, as if it had never been received.

It is also common to use a different degree-distribution than the one we used here. It turns out that a bimodal degree-distribution with lots of 1-, 2- or 3-block packets, combined with an n-block packet and a few n-minus-two-or-three block packets, with very few packets in between, makes the whole process go much quicker, an average O(n) with just a small overhead of 5% or 10% over the size of the file being transmitted. Use of this degree-distribution is called a raptor code.

Your task is to write a program that creates a fountain and decoder. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

One Response to “Fountain Codes”

  1. Paul Hofstra said

    A python version. I had a lot of fun with this exercise.

    txt = "Programming Praxis Part rocks!"
    
    import random
    from operator import xor
    
    def get_boxes(d, n, prng):
        L = range(n)
        prng.shuffle(L)
        return sorted(L[:d])
            
    def get_nr_packets(n, r):
        for i in xrange(n):
            r -= n - i
            if r < 0:
                return i + 1
        
    def make_fountain(source):
        """ a generator function that emits the fountain """
        prng = random.Random()
        n = len(source)
        yield n
        source = [ord(c) for c in source]
        den = n * (n + 1) / 2
        r = prng.randint(0, 2 ** 32- 1)
        prng.seed(r)
        yield r # the random seed
        while 1:
            d = get_nr_packets(n, prng.randint(0, den-1))
            boxes = get_boxes(d, n, prng)
            yield reduce(xor, [source[b] for b in boxes])
            
    def decode_fountain(datagen):
        """ """
        prng = random.Random()
        n = datagen.next()
        prng.seed(datagen.next())
        den = n * (n + 1) / 2
        park = {}
        solved = {}
        npackets = 0
        for info in datagen:
            npackets += 1
            d = get_nr_packets(n, prng.randint(0, den-1))
            boxes = get_boxes(d, n, prng)
            s = set(boxes).intersection(solved.keys())
            for si in s:
                boxes.remove(si)
                info ^= solved[si]
            tobeprocessed = []
            if len(boxes) > 1:
                park[tuple(boxes)] = info
            elif len(boxes) == 1:
                tobeprocessed.append((boxes[0], info))
                while tobeprocessed:
                    box, info = tobeprocessed.pop()
                    if box not in solved:
                        solved[box] = info
                        for b2, info2 in park.items():
                            if box in b2:
                                del park[b2]
                                b2 = list(b2)
                                b2.remove(box)
                                info2 ^= info
                                if len(b2) == 1:
                                    tobeprocessed.append((b2[0], info2))
                                else:
                                    park[tuple(b2)] = info2
            if len(solved) == n:
                L = solved.items()
                L.sort()
                #print "number of packets", npackets, n
                return "".join([chr(z) for i, z in L])
            
    print decode_fountain(make_fountain(txt))
    

Leave a comment