Approximate Median

August 21, 2020

[ I offer my apologies to my readers for my recent absence. My employer, a local community college, is struggling with this virus business, revising nearly all of its business practices, and my programmer colleagues and I have been very busy. The Fall semester starts next week (mostly on-line classes, some on-campus classes for science labs and the nursing students), so hopefully things on the virus front will get better soon. But we are also in the middle of changing our main computing system from running on HP-UX on fifteen-year old hardware to Linux on new hardware, and having all kinds of setup problems (all of the people who set up the current system twenty years ago are gone, and no one seems to know how to set up the new system), so maybe not too soon. I hope all is well with all of you. — Phil ]

We have previously studied algorithms for the streaming median and sliding median that calculate the median of a stream of numbers; the streaming median requires storage of all the numbers previously seen, and the sliding median requires storage of the last k numbers in the stream, for some k.

Today’s exercise estimates the median of a stream of numbers while storing only two numbers:

The idea is at each iteration the median inches toward the input signal at a constant rate. The rate depends on what magnitude you estimate the median to be. I use the average as an estimate of the magnitude of the median, to determines the size of each increment of the median. If you need your median accurate to about 1%, use a step-size of 0.01 * the average.

Your task is to write a program that estimates the streaming median according to the given algorithm. When you are finished, you are welcome to read or run the suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

One Response to “Approximate Median”

  1. Antonio said

    I don’t understand why you would ever do this this way (other than for practice, of course). You can just store the current average and the number of terms used to calculate that average? I suppose the biggest problem is if you’re calculating the average of a number of terms larger than INT_MAX.

    #include <stdio.h>
    #include <stdlib.h>
    
    float newAvg(float, int*, float);
    
    int main(void){
        char str[100]; // general purpose string
        int cnt = 0;
        float avg, new;
    
        while(1){
            printf(
                "\nCount: %d | Average: %s\nEnter a number ('Q' to quit).\n\t> ",
                cnt, cnt ? str : "----" // there is no average initially
            );
    
            gets(str); // take input
    
            if(str[0] == 'Q' || str[0] == 'q') break;
            else{
                new = atof(str);                // convert to float
                avg = newAvg(avg, &cnt, new);   // calculate average
                sprintf(str, "%-.2lf", avg);    // & create output string
            }
        }
    }
    /*  avg is current average
        cnt is the number of terms used to calculate that average
        new is the new number to be included in the average
    */
    float newAvg(float avg, int* cnt, float new){
        return (avg * *cnt + new) / ++*cnt;
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: