## Selection

### December 11, 2009

In statistics, the median is the smallest item that is greater than half the values in a set. Statisticians also define statistics such as quartiles, deciles and percentiles that show the value of an item relative to the other items in a set. A function known as select allows us to find the kth smallest item in a set; that is, the smallest item for which k of the items in the set are smaller.

A simple algorithm for select has us first partition the set into two subsets that are respectively smaller and larger than some randomly-chosen item from the set. If the smaller-than set has more than k elements, it must contain the desired element, so search recursively for the desired element in the smaller-than set; otherwise, subtract the size of the smaller-than set k, then search recursively for the desired element in the larger-than set.

Your task is to write a function that implements the selection algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

### 6 Responses to “Selection”

My solution in Ruby:

def select(k,array)
elem = array[array.size / 2]
smaller = array.find_all { |e| e < elem }
larger = array.find_all { |e| e >= elem }
if(smaller.size == k)
return elem
end
if (smaller.size > k)
select(k,smaller)
else
select(k – smaller.size,larger)
end
end

2. wja said

Python solution:

def select(S, k):
# make sure we have a builtin 'set'
if not isinstance(S, set): S = set(S)
assert len(S) >= k # else it's *bad* input
item = S.pop()
smaller = set()
greater = set()
for it in S:
l = len(smaller)
if l == k: return item
elif l > k: return select(smaller, k)
else: return select(greater, k-l-1)

3. Axio said

Unefficient Common Lisp, but very directâ€¦

1 (defun select (k li)
2   (when (>= k 0)
3     (nth k (sort li #'<))))

4. programmingpraxis said

Jos Koot pointed out a bug: (select 3 '(1 1 2 2 3 3 4 4 5 5 6 6 7 7)) results in an infinite loop. The bug manifests itself whenever the kth element has a duplicate.

Both select and partition have to change to fix the bug. Here is partition, where the (cdr xs) in the initialization of the loop ensures that at least one element is removed from the list at each iteration, thus forcing progress toward termination:

(define (partition xs)
(let ((x (car xs)))
(let loop ((xs (cdr xs)) (lt '()) (ge '()))
(cond ((null? xs) (values lt x ge))
((< (car xs) x)
(loop (cdr xs) (cons (car xs) lt) ge))
(else (loop (cdr xs) lt (cons (car xs) ge)))))))

Select also has to change, because the partition element is no longer included in the "greater-than" partition:

(define (select k xs)
(if (<= (length xs) k)
(error 'select "out of range")
(let loop ((k k) (xs (shuffle xs)))
(let-values (((lt x ge) (partition xs)))
(cond ((< k (length lt)) (loop k lt))
((< (length lt) k) (loop (- k (length lt) 1) ge))
(else x))))))

The variable name gt is changed to ge to reflect that equal elements are possible, and are (arbitrarily) placed in the "greater-than" partition. The order of the return elements from partition is also changed, so that when reading debug output from partition the elements are physically displayed in their proper order, a small point that nonetheless improves the program.

5. Vikas Tandi said

My implementation in C

static void swap(int arr[], int i, int j);
static int partition(int arr[], int i, int j);
static int select_imp(int arr[], int left, int right, int k);

/* assume arr[left-1] <= arr[k] for left <= k <= right and
arr[right+1] >= arr[k] for left <= k <= right */
int select(int arr[], int left, int right, int k)
{
if(right < left)
return -1;

if(k > (right + 1) || k < (left+1))
return -1;

return select_imp(arr, left, right, k);
}

static int select_imp(int arr[], int left, int right, int k)
{
int pos;

pos = partition(arr, left, right);

if(pos == k)
return arr[pos];
else if(pos > k)
return select_imp(arr, left, pos-1, k);
else
return select_imp(arr, pos+1, right, k);
}

/* assume arr[left-1] <= arr[k] for left <= k <= right and
arr[right+1] >= arr[k] for left <= k <= right */
static int partition(int arr[], int left, int right)
{
int pivot, mid;
int i, j, key;

/* find partition element */
mid = (left + right)/2;
pivot = (left + mid + right)/3;
swap(arr, left, pivot);

/* partition array */
for(i = left+1, j = right, key = arr[left]; ; i++, j--)
{
/* find first bigger element by moving from left to right */
for(; arr[i] < key; i++);

/* find first smaller element by moving from right to left */
for(; arr[j] > key; j--);

if(j <= i)
break;

/* swap arr[i] with arr[j] */
swap(arr, i, j);
}
/* swap pivot element with jth element */
swap(arr, left, j);
return j;
}

static void swap(int arr[], int i, int j)
{
int tmp;

tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
}

6. Maithily said

Here is a java method to accomplish the same. I think there is a better solution feasible. Please suggest/comment.

public static int select(int[] a, int k)
{
int length = a.length;

int randomElementIndex = (new Long(Math.round((length-1)*Math.random()))).intValue();
int randomElement = a[randomElementIndex];

//Lower and Upper Limits of k are handled in the following loop.
if (k == 0 || k==length)
{
Arrays.sort(a);
return k==0?a[0]:a[length-1];
}

int[] leftTransientArray = new int[length];
int[] rightTransientArray = new int[length];

int laIndex = 0, raIndex = 0;

for(int i=0; i<length; i++)
{
if (a[i] randomElement)
{
rightTransientArray[raIndex]=a[i];
raIndex++;
}
}

int[] leftArray = Arrays.copyOfRange(leftTransientArray, 0, laIndex);
int[] rightArray = Arrays.copyOfRange(rightTransientArray, 0, raIndex);

if (leftArray.length == k+1)
return randomElement;
else if (leftArray.length <= k )
return select(rightArray, k-leftArray.length);
else
return select(leftArray, k);
}