Morris Counting

February 20, 2015

We have today an algorithm from the early days of computing that is still relevant today: counting a large number of events using only a small amount of memory. The technique was invented by Robert Morris (early unix researcher and NSA cryptographer, father of the RTM of “internet worm” fame) and described in his 1978 paper
“Counting Large Numbers of Events in Small Registers.”

The basic idea is to count logarithms instead of discrete events. Assuming base-2 logarithms, the algorithm is simple:

Initialize a counter C to 0.

When an event occurs, increment the counter with probability 2C.

When asked for the count, return 2C − 1.

It probably seems like a trivial saving to record a count in a single byte instead of, say, a 4-byte integer. But the savings multiply quickly if you need to count a large number of distinct events; the difference between 1-megabyte and 4-megabytes of counters could be significant in a large program where counting is only a small part of the whole.

Your task is to write a program that does Morris counting. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

5 Responses to “Morris Counting”

  1. Jussi Piitulainen said

    Interesting. I think the following Python code displays a run of the last section of the paper at a number of different counts. I set a higher parameter for greater accuracy and a smaller range, and I made the counter stop at the maximum eight-bit value. Python treats a boolean False as 0 and True as 1 when added to the counter.

    from random import random as r
    for k in (0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000):
        a, j, Δ = 70, 0, lambda : (a/(a + 1)) ** j
        for _ in range(k): j += j < 255 and r() < Δ()
        print('{} ≈ {} (j = {})'.format(k, a * ((1 + 1/a) ** j - 1), j))
    

    Here’s the output of a run. The count for 2000 events almost hit the maximum, but it was also much overestimated in this run. Small counts are approximated pretty well.

    0 ≈ 0.0 (j = 0)
    1 ≈ 0.9999999999999964 (j = 1)
    2 ≈ 2.0142857142856996 (j = 2)
    5 ≈ 5.144912578092436 (j = 5)
    10 ≈ 10.667969805273657 (j = 10)
    20 ≈ 21.65241341555693 (j = 19)
    50 ≈ 55.21913428230192 (j = 41)
    100 ≈ 96.29411148109324 (j = 61)
    200 ≈ 211.06598708822676 (j = 98)
    500 ≈ 509.40739928399057 (j = 149)
    1000 ≈ 937.4791718819866 (j = 188)
    2000 ≈ 2499.3096676730383 (j = 254)

    Perhaps one could live with this accuracy for frequency bands for (pairs or triples of adjacent) words in a large amount of text. The number of different types could be quite large, and the vast majority of counts would be quite small in the end.

  2. Graham said

    Similar implementation in Common Lisp. Thought I’d have my closure accept messages to either increment or return the count.

  3. […] can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look […]

Leave a comment