Selection
December 11, 2009
Our select function implements the algorithm exactly. Randomization is accomplished by shuffling as the algorithm begins:
(define (select k xs)
(if (<= (length xs) k)
(error 'select "out of range")
(let loop ((k k) (xs (shuffle xs)))
(let-values (((x lt gt) (partition xs)))
(cond ((< k (length lt)) (loop k lt))
((< (length lt) k) (loop (- k (length lt)) gt))
(else x))))))
Partition returns the partition element, the smaller-than partition, and the greater-than partition:
(define (partition xs)
(let ((x (car xs)))
(let loop ((xs xs) (lt '()) (gt '()))
(cond ((null? xs) (values x lt gt))
((< (car xs) x)
(loop (cdr xs) (cons (car xs) lt) gt))
(else (loop (cdr xs) lt (cons (car xs) gt)))))))
We used shuffle from the Standard Prelude, which in turn calls randint and rand; we also used let-values, but most Scheme systems now provide this syntax as part of their standard environment. You can run the code at http://programmingpraxis.codepad.org/EpyiXmk9.
My solution in Ruby:
def select(k,array)
elem = array[array.size / 2]
smaller = array.find_all { |e| e < elem }
larger = array.find_all { |e| e >= elem }
if(smaller.size == k)
return elem
end
if (smaller.size > k)
select(k,smaller)
else
select(k – smaller.size,larger)
end
end
Python solution:
def select(S, k): # make sure we have a builtin 'set' if not isinstance(S, set): S = set(S) assert len(S) >= k # else it's *bad* input item = S.pop() smaller = set() greater = set() for it in S: if it < item: smaller.add(it) else: greater.add(it) l = len(smaller) if l == k: return item elif l > k: return select(smaller, k) else: return select(greater, k-l-1)Unefficient Common Lisp, but very direct…
1 (defun select (k li)
2 (when (>= k 0)
3 (nth k (sort li #'<))))
Jos Koot pointed out a bug:
(select 3 '(1 1 2 2 3 3 4 4 5 5 6 6 7 7)) results in an infinite loop. The bug manifests itself whenever the kth element has a duplicate.Both
selectandpartitionhave to change to fix the bug. Here ispartition, where the(cdr xs)in the initialization of the loop ensures that at least one element is removed from the list at each iteration, thus forcing progress toward termination:(define (partition xs)(let ((x (car xs)))
(let loop ((xs (cdr xs)) (lt '()) (ge '()))
(cond ((null? xs) (values lt x ge))
((< (car xs) x)
(loop (cdr xs) (cons (car xs) lt) ge))
(else (loop (cdr xs) lt (cons (car xs) ge)))))))
Selectalso has to change, because the partition element is no longer included in the "greater-than" partition:(define (select k xs)(if (<= (length xs) k)
(error 'select "out of range")
(let loop ((k k) (xs (shuffle xs)))
(let-values (((lt x ge) (partition xs)))
(cond ((< k (length lt)) (loop k lt))
((< (length lt) k) (loop (- k (length lt) 1) ge))
(else x))))))
The variable name
gtis changed togeto reflect that equal elements are possible, and are (arbitrarily) placed in the "greater-than" partition. The order of the return elements frompartitionis also changed, so that when reading debug output frompartitionthe elements are physically displayed in their proper order, a small point that nonetheless improves the program.My implementation in C
static void swap(int arr[], int i, int j); static int partition(int arr[], int i, int j); static int select_imp(int arr[], int left, int right, int k); /* assume arr[left-1] <= arr[k] for left <= k <= right and arr[right+1] >= arr[k] for left <= k <= right */ int select(int arr[], int left, int right, int k) { if(right < left) return -1; if(k > (right + 1) || k < (left+1)) return -1; return select_imp(arr, left, right, k); } static int select_imp(int arr[], int left, int right, int k) { int pos; pos = partition(arr, left, right); if(pos == k) return arr[pos]; else if(pos > k) return select_imp(arr, left, pos-1, k); else return select_imp(arr, pos+1, right, k); } /* assume arr[left-1] <= arr[k] for left <= k <= right and arr[right+1] >= arr[k] for left <= k <= right */ static int partition(int arr[], int left, int right) { int pivot, mid; int i, j, key; /* find partition element */ mid = (left + right)/2; pivot = (left + mid + right)/3; swap(arr, left, pivot); /* partition array */ for(i = left+1, j = right, key = arr[left]; ; i++, j--) { /* find first bigger element by moving from left to right */ for(; arr[i] < key; i++); /* find first smaller element by moving from right to left */ for(; arr[j] > key; j--); if(j <= i) break; /* swap arr[i] with arr[j] */ swap(arr, i, j); } /* swap pivot element with jth element */ swap(arr, left, j); return j; } static void swap(int arr[], int i, int j) { int tmp; tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp; }Here is a java method to accomplish the same. I think there is a better solution feasible. Please suggest/comment.
public static int select(int[] a, int k)
{
int length = a.length;
int randomElementIndex = (new Long(Math.round((length-1)*Math.random()))).intValue();
int randomElement = a[randomElementIndex];
//Lower and Upper Limits of k are handled in the following loop.
if (k == 0 || k==length)
{
Arrays.sort(a);
return k==0?a[0]:a[length-1];
}
int[] leftTransientArray = new int[length];
int[] rightTransientArray = new int[length];
int laIndex = 0, raIndex = 0;
for(int i=0; i<length; i++)
{
if (a[i] randomElement)
{
rightTransientArray[raIndex]=a[i];
raIndex++;
}
}
int[] leftArray = Arrays.copyOfRange(leftTransientArray, 0, laIndex);
int[] rightArray = Arrays.copyOfRange(rightTransientArray, 0, raIndex);
if (leftArray.length == k+1)
return randomElement;
else if (leftArray.length <= k )
return select(rightArray, k-leftArray.length);
else
return select(leftArray, k);
}