Approximate Median
August 21, 2020
Jeff McClintock writes the program in C++:
float median = 0.0f; float average = 0.0f; // for each sample { average += ( sample - average ) * 0.1f; // rough running average. median += _copysign( average * 0.01, sample - median ); }
It is a pleasure to translate that to Scheme:
(define (copysign magnitude sign) (* (abs magnitude) (if (negative? sign) -1 1)))
(define (median n gen tolerance) (let loop ((n n) (x (gen)) (average 0.0) (median 0.0)) (if (zero? n) median (let ((average (+ (* (- x average) 0.1) average))) (loop (- n 1) (gen) average (+ (copysign (* average tolerance) (- x median)) median))))))
Our test generator returns a random number from 0 to 100, inclusive, so the median should be 50, given a sufficiently large stream:
(define (gen) (randint 101))
> (median 100000 gen 0.01) 49.3059762897895
That result was the best of about a dozen attempts, with a range from about 46 to about 57, so this algorithm clearly has a lot of noise.
We did something similar in a previous exercise. You can run the program at https://ideone.com/9pa1tx.
I don’t understand why you would ever do this this way (other than for practice, of course). You can just store the current average and the number of terms used to calculate that average? I suppose the biggest problem is if you’re calculating the average of a number of terms larger than INT_MAX.