Morris Counting

February 20, 2015

We have today an algorithm from the early days of computing that is still relevant today: counting a large number of events using only a small amount of memory. The technique was invented by Robert Morris (early unix researcher and NSA cryptographer, father of the RTM of “internet worm” fame) and described in his 1978 paper
“Counting Large Numbers of Events in Small Registers.”

The basic idea is to count logarithms instead of discrete events. Assuming base-2 logarithms, the algorithm is simple:

Initialize a counter C to 0.

When an event occurs, increment the counter with probability 2^−C.

When asked for the count, return 2^C − 1.

It probably seems like a trivial saving to record a count in a single byte instead of, say, a 4-byte integer. But the savings multiply quickly if you need to count a large number of distinct events; the difference between 1-megabyte and 4-megabytes of counters could be significant in a large program where counting is only a small part of the whole.

Your task is to write a program that does Morris counting. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

5 Comments »

5 Responses to “Morris Counting”

1p – Morris Counting | blog.offeryour.com said
February 20, 2015 at 10:36 PM
[…] https://programmingpraxis.com/2015/02/20/morris-counting/ […]
1p – Morris Counting – Exploding Ads said
February 20, 2015 at 10:38 PM
[…] https://programmingpraxis.com/2015/02/20/morris-counting/ […]
Jussi Piitulainen said
February 21, 2015 at 8:07 PM
Interesting. I think the following Python code displays a run of the last section of the paper at a number of different counts. I set a higher parameter for greater accuracy and a smaller range, and I made the counter stop at the maximum eight-bit value. Python treats a boolean False as 0 and True as 1 when added to the counter.
```
from random import random as r
for k in (0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000):
    a, j, Δ = 70, 0, lambda : (a/(a + 1)) ** j
    for _ in range(k): j += j < 255 and r() < Δ()
    print('{} ≈ {} (j = {})'.format(k, a * ((1 + 1/a) ** j - 1), j))
```
Here’s the output of a run. The count for 2000 events almost hit the maximum, but it was also much overestimated in this run. Small counts are approximated pretty well.
0 ≈ 0.0 (j = 0) 1 ≈ 0.9999999999999964 (j = 1) 2 ≈ 2.0142857142856996 (j = 2) 5 ≈ 5.144912578092436 (j = 5) 10 ≈ 10.667969805273657 (j = 10) 20 ≈ 21.65241341555693 (j = 19) 50 ≈ 55.21913428230192 (j = 41) 100 ≈ 96.29411148109324 (j = 61) 200 ≈ 211.06598708822676 (j = 98) 500 ≈ 509.40739928399057 (j = 149) 1000 ≈ 937.4791718819866 (j = 188) 2000 ≈ 2499.3096676730383 (j = 254)
Perhaps one could live with this accuracy for frequency bands for (pairs or triples of adjacent) words in a large amount of text. The number of different types could be quite large, and the vast majority of counts would be quite small in the end.
Graham said
February 24, 2015 at 3:57 AM
Similar implementation in Common Lisp. Thought I’d have my closure accept messages to either increment or return the count.
Loglog | Programming Praxis said
August 4, 2020 at 9:00 AM
[…] can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look […]

Programming Praxis

Morris Counting

February 20, 2015

5 Responses to “Morris Counting”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Morris Counting

February 20, 2015

Share this:

Related

5 Responses to “Morris Counting”

Leave a comment

Categories

Archives

Archives