## Selection

### December 11, 2009

In statistics, the *median* is the smallest item that is greater than half the values in a set. Statisticians also define statistics such as quartiles, deciles and percentiles that show the value of an item relative to the other items in a set. A function known as `select`

allows us to find the *k*th smallest item in a set; that is, the smallest item for which *k* of the items in the set are smaller.

A simple algorithm for `select`

has us first partition the set into two subsets that are respectively smaller and larger than some randomly-chosen item from the set. If the smaller-than set has more than *k* elements, it must contain the desired element, so search recursively for the desired element in the smaller-than set; otherwise, subtract the size of the smaller-than set *k*, then search recursively for the desired element in the larger-than set.

Your task is to write a function that implements the selection algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

My solution in Ruby:

def select(k,array)

elem = array[array.size / 2]

smaller = array.find_all { |e| e < elem }

larger = array.find_all { |e| e >= elem }

if(smaller.size == k)

return elem

end

if (smaller.size > k)

select(k,smaller)

else

select(k – smaller.size,larger)

end

end

Python solution:

Unefficient Common Lisp, but very directâ€¦

1 (

defunselect (k li)2 (

when(>=k 0)3 (

nthk (sortli#'<))))Jos Koot pointed out a bug:

`(select 3 '(1 1 2 2 3 3 4 4 5 5 6 6 7 7)) results in an infinite loop. The bug manifests itself whenever the`

kth element has a duplicate.`Both`

`select`

and`partition`

have to change to fix the bug. Here is`partition`

, where the`(cdr xs)`

in the initialization of the loop ensures that at least one element is removed from the list at each iteration, thus forcing progress toward termination:`(define (partition xs)`

(let ((x (car xs)))

(let loop ((xs (cdr xs)) (lt '()) (ge '()))

(cond ((null? xs) (values lt x ge))

((< (car xs) x)

(loop (cdr xs) (cons (car xs) lt) ge))

(else (loop (cdr xs) lt (cons (car xs) ge)))))))

`Select`

also has to change, because the partition element is no longer included in the "greater-than" partition:`(define (select k xs)`

(if (<= (length xs) k)

(error 'select "out of range")

(let loop ((k k) (xs (shuffle xs)))

(let-values (((lt x ge) (partition xs)))

(cond ((< k (length lt)) (loop k lt))

((< (length lt) k) (loop (- k (length lt) 1) ge))

(else x))))))

The variable name

`gt`

is changed to`ge`

to reflect that equal elements are possible, and are (arbitrarily) placed in the "greater-than" partition. The order of the return elements from`partition`

is also changed, so that when reading debug output from`partition`

the elements are physically displayed in their proper order, a small point that nonetheless improves the program.My implementation in C

Here is a java method to accomplish the same. I think there is a better solution feasible. Please suggest/comment.

public static int select(int[] a, int k)

{

int length = a.length;

int randomElementIndex = (new Long(Math.round((length-1)*Math.random()))).intValue();

int randomElement = a[randomElementIndex];

//Lower and Upper Limits of k are handled in the following loop.

if (k == 0 || k==length)

{

Arrays.sort(a);

return k==0?a[0]:a[length-1];

}

int[] leftTransientArray = new int[length];

int[] rightTransientArray = new int[length];

int laIndex = 0, raIndex = 0;

for(int i=0; i<length; i++)

{

if (a[i] randomElement)

{

rightTransientArray[raIndex]=a[i];

raIndex++;

}

}

int[] leftArray = Arrays.copyOfRange(leftTransientArray, 0, laIndex);

int[] rightArray = Arrays.copyOfRange(rightTransientArray, 0, raIndex);

if (leftArray.length == k+1)

return randomElement;

else if (leftArray.length <= k )

return select(rightArray, k-leftArray.length);

else

return select(leftArray, k);

}