Selection
December 11, 2009
Our select
function implements the algorithm exactly. Randomization is accomplished by shuffling as the algorithm begins:
(define (select k xs)
(if (<= (length xs) k)
(error 'select "out of range")
(let loop ((k k) (xs (shuffle xs)))
(let-values (((x lt gt) (partition xs)))
(cond ((< k (length lt)) (loop k lt))
((< (length lt) k) (loop (- k (length lt)) gt))
(else x))))))
Partition
returns the partition element, the smaller-than partition, and the greater-than partition:
(define (partition xs)
(let ((x (car xs)))
(let loop ((xs xs) (lt '()) (gt '()))
(cond ((null? xs) (values x lt gt))
((< (car xs) x)
(loop (cdr xs) (cons (car xs) lt) gt))
(else (loop (cdr xs) lt (cons (car xs) gt)))))))
We used shuffle
from the Standard Prelude, which in turn calls randint
and rand
; we also used let-values
, but most Scheme systems now provide this syntax as part of their standard environment. You can run the code at http://programmingpraxis.codepad.org/EpyiXmk9.
My solution in Ruby:
def select(k,array)
elem = array[array.size / 2]
smaller = array.find_all { |e| e < elem }
larger = array.find_all { |e| e >= elem }
if(smaller.size == k)
return elem
end
if (smaller.size > k)
select(k,smaller)
else
select(k – smaller.size,larger)
end
end
Python solution:
Unefficient Common Lisp, but very direct…
1 (defun select (k li)
2 (when (>= k 0)
3 (nth k (sort li #'<))))
Jos Koot pointed out a bug:
(select 3 '(1 1 2 2 3 3 4 4 5 5 6 6 7 7)) results in an infinite loop. The bug manifests itself whenever the kth element has a duplicate.
Both
select
andpartition
have to change to fix the bug. Here ispartition
, where the(cdr xs)
in the initialization of the loop ensures that at least one element is removed from the list at each iteration, thus forcing progress toward termination:(define (partition xs)
(let ((x (car xs)))
(let loop ((xs (cdr xs)) (lt '()) (ge '()))
(cond ((null? xs) (values lt x ge))
((< (car xs) x)
(loop (cdr xs) (cons (car xs) lt) ge))
(else (loop (cdr xs) lt (cons (car xs) ge)))))))
Select
also has to change, because the partition element is no longer included in the "greater-than" partition:(define (select k xs)
(if (<= (length xs) k)
(error 'select "out of range")
(let loop ((k k) (xs (shuffle xs)))
(let-values (((lt x ge) (partition xs)))
(cond ((< k (length lt)) (loop k lt))
((< (length lt) k) (loop (- k (length lt) 1) ge))
(else x))))))
The variable name
gt
is changed toge
to reflect that equal elements are possible, and are (arbitrarily) placed in the "greater-than" partition. The order of the return elements frompartition
is also changed, so that when reading debug output frompartition
the elements are physically displayed in their proper order, a small point that nonetheless improves the program.My implementation in C
Here is a java method to accomplish the same. I think there is a better solution feasible. Please suggest/comment.
public static int select(int[] a, int k)
{
int length = a.length;
int randomElementIndex = (new Long(Math.round((length-1)*Math.random()))).intValue();
int randomElement = a[randomElementIndex];
//Lower and Upper Limits of k are handled in the following loop.
if (k == 0 || k==length)
{
Arrays.sort(a);
return k==0?a[0]:a[length-1];
}
int[] leftTransientArray = new int[length];
int[] rightTransientArray = new int[length];
int laIndex = 0, raIndex = 0;
for(int i=0; i<length; i++)
{
if (a[i] randomElement)
{
rightTransientArray[raIndex]=a[i];
raIndex++;
}
}
int[] leftArray = Arrays.copyOfRange(leftTransientArray, 0, laIndex);
int[] rightArray = Arrays.copyOfRange(rightTransientArray, 0, raIndex);
if (leftArray.length == k+1)
return randomElement;
else if (leftArray.length <= k )
return select(rightArray, k-leftArray.length);
else
return select(leftArray, k);
}