Morris Counting

February 20, 2015

We package the algorithm in a closure that hides the logarithm and returns the approximate count each time it is called:

(define (make-counter) (let ((c 0)) (lambda () (when (zero? (randint (expt 2 c))) (set! c (+ c 1))) (- (expt 2 c) 1))))

Then we call it like this:

> (define x (make-counter)) > (x) 1 > (x) 1 > (x) 3 > (x) 3 > (x) 3 > (x) 3 > (x) 3 > (x) 7 > (x) 7 > (x) 7 > (x) 7 > (x) 15

Indeed, 15 is a reasonable approximation of the actual count of 12. As a further test, we write a function that returns the approximate count for a known number of events:

(define (count n) (let ((x (make-counter))) (do ((n n (- n 1))) ((zero? n) (x)) (x))))

Then we can see the variance over several runs of the counter:

> (list-of (count 25000) (x range 20)) (65535 4095 16383 32767 32767 16383 32767 16383 32767 32767 16383 32767 16383 16383 16383 16383 32767 8191 32767 16383)

The average count over those twenty runs is 24370, so our probabilistic counter can be seen to be very effective.

From the Standard Prelude we used random numbers for the counter and list comprehensions for the demonstration. You can run the program at http://ideone.com/4CbPSh.

Posted by programmingpraxis

Filed in Exercises

5 Comments »

5 Responses to “Morris Counting”

1p – Morris Counting | blog.offeryour.com said
February 20, 2015 at 10:36 PM
[…] https://programmingpraxis.com/2015/02/20/morris-counting/ […]
1p – Morris Counting – Exploding Ads said
February 20, 2015 at 10:38 PM
[…] https://programmingpraxis.com/2015/02/20/morris-counting/ […]
Jussi Piitulainen said
February 21, 2015 at 8:07 PM
Interesting. I think the following Python code displays a run of the last section of the paper at a number of different counts. I set a higher parameter for greater accuracy and a smaller range, and I made the counter stop at the maximum eight-bit value. Python treats a boolean False as 0 and True as 1 when added to the counter.
```
from random import random as r
for k in (0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000):
    a, j, Δ = 70, 0, lambda : (a/(a + 1)) ** j
    for _ in range(k): j += j < 255 and r() < Δ()
    print('{} ≈ {} (j = {})'.format(k, a * ((1 + 1/a) ** j - 1), j))
```
Here’s the output of a run. The count for 2000 events almost hit the maximum, but it was also much overestimated in this run. Small counts are approximated pretty well.
0 ≈ 0.0 (j = 0) 1 ≈ 0.9999999999999964 (j = 1) 2 ≈ 2.0142857142856996 (j = 2) 5 ≈ 5.144912578092436 (j = 5) 10 ≈ 10.667969805273657 (j = 10) 20 ≈ 21.65241341555693 (j = 19) 50 ≈ 55.21913428230192 (j = 41) 100 ≈ 96.29411148109324 (j = 61) 200 ≈ 211.06598708822676 (j = 98) 500 ≈ 509.40739928399057 (j = 149) 1000 ≈ 937.4791718819866 (j = 188) 2000 ≈ 2499.3096676730383 (j = 254)
Perhaps one could live with this accuracy for frequency bands for (pairs or triples of adjacent) words in a large amount of text. The number of different types could be quite large, and the vast majority of counts would be quite small in the end.
Graham said
February 24, 2015 at 3:57 AM
Similar implementation in Common Lisp. Thought I’d have my closure accept messages to either increment or return the count.
Loglog | Programming Praxis said
August 4, 2020 at 9:00 AM
[…] can be seen as an improvement on Robert Morris’ counting algorithm that we studied in a previous exercise. We will study Flajolet’s loglog algorithm in today’s exercise and perhaps have a look […]

Programming Praxis

Morris Counting

February 20, 2015

5 Responses to “Morris Counting”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Morris Counting

February 20, 2015

Share this:

Related

5 Responses to “Morris Counting”

Leave a comment

Categories

Archives

Archives