Morris Counting

February 20, 2015

We package the algorithm in a closure that hides the logarithm and returns the approximate count each time it is called:

(define (make-counter)
  (let ((c 0))
    (lambda ()
      (when (zero? (randint (expt 2 c)))
        (set! c (+ c 1)))
      (- (expt 2 c) 1))))

Then we call it like this:

> (define x (make-counter))
> (x)
1
> (x)
1
> (x)
3
> (x)
3
> (x)
3
> (x)
3
> (x)
3
> (x)
7
> (x)
7
> (x)
7
> (x)
7
> (x)
15

Indeed, 15 is a reasonable approximation of the actual count of 12. As a further test, we write a function that returns the approximate count for a known number of events:

(define (count n)
  (let ((x (make-counter)))
    (do ((n n (- n 1))) ((zero? n) (x)) (x))))

Then we can see the variance over several runs of the counter:

> (list-of (count 25000) (x range 20))
(65535 4095 16383 32767 32767 16383 32767 16383 32767 32767
  16383 32767 16383 16383 16383 16383 32767 8191 32767 16383)

The average count over those twenty runs is 24370, so our probabilistic counter can be seen to be very effective.

From the Standard Prelude we used random numbers for the counter and list comprehensions for the demonstration. You can run the program at http://ideone.com/4CbPSh.

Pages: 1 2

5 Responses to “Morris Counting”

  1. Jussi Piitulainen said

    Interesting. I think the following Python code displays a run of the last section of the paper at a number of different counts. I set a higher parameter for greater accuracy and a smaller range, and I made the counter stop at the maximum eight-bit value. Python treats a boolean False as 0 and True as 1 when added to the counter.

    from random import random as r
    for k in (0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000):
        a, j, Δ = 70, 0, lambda : (a/(a + 1)) ** j
        for _ in range(k): j += j < 255 and r() < Δ()
        print('{} ≈ {} (j = {})'.format(k, a * ((1 + 1/a) ** j - 1), j))
    

    Here’s the output of a run. The count for 2000 events almost hit the maximum, but it was also much overestimated in this run. Small counts are approximated pretty well.

    0 ≈ 0.0 (j = 0)
    1 ≈ 0.9999999999999964 (j = 1)
    2 ≈ 2.0142857142856996 (j = 2)
    5 ≈ 5.144912578092436 (j = 5)
    10 ≈ 10.667969805273657 (j = 10)
    20 ≈ 21.65241341555693 (j = 19)
    50 ≈ 55.21913428230192 (j = 41)
    100 ≈ 96.29411148109324 (j = 61)
    200 ≈ 211.06598708822676 (j = 98)
    500 ≈ 509.40739928399057 (j = 149)
    1000 ≈ 937.4791718819866 (j = 188)
    2000 ≈ 2499.3096676730383 (j = 254)

    Perhaps one could live with this accuracy for frequency bands for (pairs or triples of adjacent) words in a large amount of text. The number of different types could be quite large, and the vast majority of counts would be quite small in the end.

  2. Graham said

    Similar implementation in Common Lisp. Thought I’d have my closure accept messages to either increment or return the count.

  3. […] can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: