Finding The Median

September 11, 2015

The median of an array is the value in the middle if the array was sorted; if the array has an odd number of items n, the median is the (n+1)/2’th largest item in the array (which is also the (n+1)/2’th smallest item in the array), and if the array has an even number of items n, the median is the arithmetic average of the n/2’th smallest item in the array and the n/2’th largest item in the array. For instance, the median of the array [2,4,5,7,3,6,1] is 4 and the median of the array [5,2,1,6,3,4] is 3.5.

Your task is to write a program that takes an array of 8-bit integers (possibly but not necessarily in sort) and finds the median value in the array; you should find an algorithm that takes linear time and constant space. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

10 Responses to “Finding The Median”

  1. mcmillhj said

    Similar solution in Perl:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use feature qw/say/;
    
    sub median {
       my @array = @_;
       my @freqs = (0) x 256;
    
       foreach my $elem ( @array ) {
          $freqs[$elem]++;
       }
    
       my ($low, $high) = (0, 255);
       my ($low_count, $high_count) = (0, 0);
       my $len = @array;
       while ( 1 ) {
          if ( $low_count < ($len / 2) ) {
             $low_count += $freqs[$low++];
          }
          elsif ( $high_count < ($len / 2) ) {
             $high_count += $freqs[$high--];
          }
          else {
             last;
          }
       }
       
       return ($low + $high) / 2;
    }
    
    say median(qw(2 4 5 7 3 6 1)); # 4
    say median(qw(5 2 1 6 3 4));   # 3.5
    
  2. graham said

    Clever! My knee-jerk reaction was to ignore the 8-bit integer part and just go for the slower generic solution.

    I’ve used this as an opportunity to learn some Rust. I like that it made me think about the empty array case.

    fn median(v : &Vec<i8>) -> Option<f32> {
        let len = v.len();
        match len {
            0 => None,
            _ => {
                let mut counts = vec![0; 256];
                for x in v {
                    counts[*x as usize] += 1;
                }
                let mut lo = 0;   let mut lo_count = 0;
                let mut hi = 255; let mut hi_count = 0;
                loop {
                    if lo_count < (len / 2) {
                        lo += 1; lo_count += counts[lo];
                    } else if hi_count < (len / 2) {
                        hi -= 1; hi_count += counts[hi];
                    } else {
                        break;
                    }
                }
                Some((lo as f32 + hi as f32) / 2.0)
            },
        }
    }
    
    fn main() {
        for v in vec![vec![],
                      vec![2, 4, 5, 7, 3, 6, 1],
                      vec![5, 2, 1, 6, 3, 4]] {
            println!("median({:?}) = {:?}", &v, median(&v));
        }
    }
    
  3. Mike said

    Python solution.

    def median(lst):
        m = len(lst)
        if m:
            m /= 2
            count = [0]*256
            for x in lst:
                count[x] += 1
                
            lo, lo_count = 0, 0
            while lo_count < m:
                    lo_count += count[lo]
                    lo += 1
                    
            hi, hi_count = 255, 0
            while hi_count < m:
                    hi_count +=  count[hi]
                    hi -= 1
                    
            return (lo + hi) / 2
    

    @graham, I think you have an “off by one error”. In the loop ‘lo’ is incremented before use, so the value of count[0] is never added to lo_count. Similarly, ‘hi’ is pre-decremented, so count[255] is never added to hi_count.

  4. Jussi Piitulainen said
    ;;; Praxis exercise: median of given bytes in O(n) time, O(1) space.                                             
    ;;; Plan: binary search for the middle points in the cumulative                                                  
    ;;; frequency distribution in O(n) + O(1) + 2 * O(log n) = O(n) time.                                            
    ;;; To do: the conditions to extract quartiles, minimum, and maximum.                                            
    
    (define (distribution bytes) ; new cumulative vector of counts                                                   
      (let ((u (make-vector 256 0))
            (n (vector-length bytes)))
        (do ((k 0 (+ k 1)))
            ((= k n) (do ((p 0 (+ p 1))
                          (k 1 (+ k 1)))
                         ((>= k 256) u)
                       (vector-set! u k (+ (vector-ref u p)
                                           (vector-ref u k)))))
          (let ((o (vector-ref bytes k)))
    	(vector-set! u o (+ 1 (vector-ref u o)))))))
    
    (define (index u lo? hi?)
      ;; an index k of vector u for which neither (lo? u k) nor (hi? u k)                                            
      (let seek ((b 0) (e (vector-length u))) ; b <= e                                                               
        (if (= b e) (raise 'no-such-index))
        (let ((m (quotient (+ b e) 2))) ; b <= m < e; b < m + 1 <= e                                                 
          (cond
           ((lo? u m) (seek (+ m 1) e))   ; e - (m + 1) < e - b                                                      
           ((hi? u m) (seek b m))         ; m - b < e - b                                                            
           (else m)))))
    
    ;;; The conditions were tricky to work out, so there are auxiliaries.                                            
    
    (define (half u) ; half of the number of observations                                                            
      (/ (vector-ref u 255) 2))
    
    (define (below k u) ; how many observations are strictly below k                                                 
      (if (= k 0) 0 (at-most (- k 1) u)))
    
    (define (at-most k u) ; how many observations are at most k                                                      
      (vector-ref u k))
    
    (define (at-least k u) ; how many observations are at least k                                                    
      (if (= k 0) (vector-ref u 255) (above (- k 1) u)))
    
    (define (above k u) ; how many observations are strictly above k                                                 
      (- (vector-ref u 255) (vector-ref u k)))
    
    ;;; "sub-median" is the lower-ranking candidate when there are two;                                              
    ;;; "super-median" is the higher-ranking candidate; the following                                                
    ;;; predicates test whether an index is too low or too high to be one                                            
    ;;; of these in a given cumulative byte frequency distribution.                                                  
    
    (define (sub-median-lo? u k)
      ;; whether less than half are at most k                                                                        
      (< (at-most k u) (half u)))
    
    (define (sub-median-hi? u k)
      ;; whether at least half are below and at most half are above k                                                
      (<= (above k u) (half u) (below k u)))
    
    (define (super-median-lo? u k)
      ;; whether at most half are below and at least half are above k                                                
      (<= (below k u) (half u) (above k u))) ;                                                                       
    
    (define (super-median-hi? u k)
      ;; whether less than half are at least k                                                                       
      (< (at-least k u) (half u)))
    
    ;;; --- elision of tests used to get the conditions right ---
    
    (define (median observations)
      (let* ((u (distribution observations))
             (b (index u sub-median-lo? sub-median-hi?))
             (p (index u super-median-lo? super-median-hi?)))
        (values b p (/ (+ b p) 2))))
    
    ;;; Simpler test output *after* getting the conditions right.                                                    
    
    (for-each
     (lambda (data)
       (display data) (display " => ")
       (call-with-values
           (lambda ()
             (median data))
         (lambda (b p m)
           (display b) (display " ") (display p)
           (display " => ") (display m) (newline))))
     '( #(3)
        #(3 3)
        #(3 5)
        #(1 3 5)
        #(1 3 3 5)
        #(1 3 5 5)
        #(3 3 3 5 7)
        #(3 3 5 7 11)
        #(1 3 3 3 3 3)
        #(1 3 3 5 5 5)
        #(1 3 3 3 5 7)
        #(1 3 3 5 7 11)
        #(2 4 5 7 3 6 1)
        #(5 2 1 6 3 4) ))
     
  5. Jussi Piitulainen said
    ;;; Oops. My binary searches are O(1) time, better than O(log n). Even a linear search
    ;;; of a constant-size array (256 elements) would be O(1). Silly me.
    
  6. graham said

    @Mike: good eye! Sorry for the errors, all.

  7. […] Programming Praxis problem. The problem is nicely constrained by limiting it to 8-bit […]

  8. graham said

    It looks like my solution (along with the issue @Mike caught) doesn’t correctly handle the case of a single element array. Apologies!

  9. shafi47 said

    #!/usr/bin/perl -w
    use strict;

    my @array = qw/1 2 3 4 9 10 11/;

    print median(@array);

    sub median
    {
    my @vals = sort {$a $b} @_;
    my $len = @vals;
    if($len%2) #odd?
    {
    return $vals[int($len/2)];
    }
    else #even
    {
    return ($vals[int($len/2)-1] + $vals[int($len/2)])/2;
    }
    }

  10. Mayur Lokare said

    #include
    int main()
    {
    int a[10],i,j,n;
    printf(“Enter the no of events\n”);
    scanf(“%d”,&n);
    printf(“Input\n”);
    for(i=0;i<n;i++)
    scanf("%d",a+i);
    for(i=1;i<n;i++)
    for(j=0;ja[i])
    a[j] = a[j]*a[i]/(a[i]=a[j]);

    printf(“Output\n”);
    for(i=0;i %d”,a[n/2]);
    else
    printf(“Median is -> %d”,(a[n/2]+a[n/2 -1])/2);

    }

Leave a comment