Finding The Median
September 11, 2015
The median of an array is the value in the middle if the array was sorted; if the array has an odd number of items n, the median is the (n+1)/2’th largest item in the array (which is also the (n+1)/2’th smallest item in the array), and if the array has an even number of items n, the median is the arithmetic average of the n/2’th smallest item in the array and the n/2’th largest item in the array. For instance, the median of the array [2,4,5,7,3,6,1] is 4 and the median of the array [5,2,1,6,3,4] is 3.5.
Your task is to write a program that takes an array of 8-bit integers (possibly but not necessarily in sort) and finds the median value in the array; you should find an algorithm that takes linear time and constant space. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Similar solution in Perl:
#!/usr/bin/env perl use strict; use warnings; use feature qw/say/; sub median { my @array = @_; my @freqs = (0) x 256; foreach my $elem ( @array ) { $freqs[$elem]++; } my ($low, $high) = (0, 255); my ($low_count, $high_count) = (0, 0); my $len = @array; while ( 1 ) { if ( $low_count < ($len / 2) ) { $low_count += $freqs[$low++]; } elsif ( $high_count < ($len / 2) ) { $high_count += $freqs[$high--]; } else { last; } } return ($low + $high) / 2; } say median(qw(2 4 5 7 3 6 1)); # 4 say median(qw(5 2 1 6 3 4)); # 3.5Clever! My knee-jerk reaction was to ignore the 8-bit integer part and just go for the slower generic solution.
I’ve used this as an opportunity to learn some Rust. I like that it made me think about the empty array case.
fn median(v : &Vec<i8>) -> Option<f32> { let len = v.len(); match len { 0 => None, _ => { let mut counts = vec![0; 256]; for x in v { counts[*x as usize] += 1; } let mut lo = 0; let mut lo_count = 0; let mut hi = 255; let mut hi_count = 0; loop { if lo_count < (len / 2) { lo += 1; lo_count += counts[lo]; } else if hi_count < (len / 2) { hi -= 1; hi_count += counts[hi]; } else { break; } } Some((lo as f32 + hi as f32) / 2.0) }, } } fn main() { for v in vec![vec![], vec![2, 4, 5, 7, 3, 6, 1], vec![5, 2, 1, 6, 3, 4]] { println!("median({:?}) = {:?}", &v, median(&v)); } }Python solution.
def median(lst): m = len(lst) if m: m /= 2 count = [0]*256 for x in lst: count[x] += 1 lo, lo_count = 0, 0 while lo_count < m: lo_count += count[lo] lo += 1 hi, hi_count = 255, 0 while hi_count < m: hi_count += count[hi] hi -= 1 return (lo + hi) / 2@graham, I think you have an “off by one error”. In the loop ‘lo’ is incremented before use, so the value of count[0] is never added to lo_count. Similarly, ‘hi’ is pre-decremented, so count[255] is never added to hi_count.
;;; Praxis exercise: median of given bytes in O(n) time, O(1) space. ;;; Plan: binary search for the middle points in the cumulative ;;; frequency distribution in O(n) + O(1) + 2 * O(log n) = O(n) time. ;;; To do: the conditions to extract quartiles, minimum, and maximum. (define (distribution bytes) ; new cumulative vector of counts (let ((u (make-vector 256 0)) (n (vector-length bytes))) (do ((k 0 (+ k 1))) ((= k n) (do ((p 0 (+ p 1)) (k 1 (+ k 1))) ((>= k 256) u) (vector-set! u k (+ (vector-ref u p) (vector-ref u k))))) (let ((o (vector-ref bytes k))) (vector-set! u o (+ 1 (vector-ref u o))))))) (define (index u lo? hi?) ;; an index k of vector u for which neither (lo? u k) nor (hi? u k) (let seek ((b 0) (e (vector-length u))) ; b <= e (if (= b e) (raise 'no-such-index)) (let ((m (quotient (+ b e) 2))) ; b <= m < e; b < m + 1 <= e (cond ((lo? u m) (seek (+ m 1) e)) ; e - (m + 1) < e - b ((hi? u m) (seek b m)) ; m - b < e - b (else m))))) ;;; The conditions were tricky to work out, so there are auxiliaries. (define (half u) ; half of the number of observations (/ (vector-ref u 255) 2)) (define (below k u) ; how many observations are strictly below k (if (= k 0) 0 (at-most (- k 1) u))) (define (at-most k u) ; how many observations are at most k (vector-ref u k)) (define (at-least k u) ; how many observations are at least k (if (= k 0) (vector-ref u 255) (above (- k 1) u))) (define (above k u) ; how many observations are strictly above k (- (vector-ref u 255) (vector-ref u k))) ;;; "sub-median" is the lower-ranking candidate when there are two; ;;; "super-median" is the higher-ranking candidate; the following ;;; predicates test whether an index is too low or too high to be one ;;; of these in a given cumulative byte frequency distribution. (define (sub-median-lo? u k) ;; whether less than half are at most k (< (at-most k u) (half u))) (define (sub-median-hi? u k) ;; whether at least half are below and at most half are above k (<= (above k u) (half u) (below k u))) (define (super-median-lo? u k) ;; whether at most half are below and at least half are above k (<= (below k u) (half u) (above k u))) ; (define (super-median-hi? u k) ;; whether less than half are at least k (< (at-least k u) (half u))) ;;; --- elision of tests used to get the conditions right --- (define (median observations) (let* ((u (distribution observations)) (b (index u sub-median-lo? sub-median-hi?)) (p (index u super-median-lo? super-median-hi?))) (values b p (/ (+ b p) 2)))) ;;; Simpler test output *after* getting the conditions right. (for-each (lambda (data) (display data) (display " => ") (call-with-values (lambda () (median data)) (lambda (b p m) (display b) (display " ") (display p) (display " => ") (display m) (newline)))) '( #(3) #(3 3) #(3 5) #(1 3 5) #(1 3 3 5) #(1 3 5 5) #(3 3 3 5 7) #(3 3 5 7 11) #(1 3 3 3 3 3) #(1 3 3 5 5 5) #(1 3 3 3 5 7) #(1 3 3 5 7 11) #(2 4 5 7 3 6 1) #(5 2 1 6 3 4) ))@Mike: good eye! Sorry for the errors, all.
[…] Programming Praxis problem. The problem is nicely constrained by limiting it to 8-bit […]
It looks like my solution (along with the issue @Mike caught) doesn’t correctly handle the case of a single element array. Apologies!
#!/usr/bin/perl -w
use strict;
my @array = qw/1 2 3 4 9 10 11/;
print median(@array);
sub median
{
my @vals = sort {$a $b} @_;
my $len = @vals;
if($len%2) #odd?
{
return $vals[int($len/2)];
}
else #even
{
return ($vals[int($len/2)-1] + $vals[int($len/2)])/2;
}
}
#include
int main()
{
int a[10],i,j,n;
printf(“Enter the no of events\n”);
scanf(“%d”,&n);
printf(“Input\n”);
for(i=0;i<n;i++)
scanf("%d",a+i);
for(i=1;i<n;i++)
for(j=0;ja[i])
a[j] = a[j]*a[i]/(a[i]=a[j]);
printf(“Output\n”);
for(i=0;i %d”,a[n/2]);
else
printf(“Median is -> %d”,(a[n/2]+a[n/2 -1])/2);
}