## Selection

### December 11, 2009

In statistics, the median is the smallest item that is greater than half the values in a set. Statisticians also define statistics such as quartiles, deciles and percentiles that show the value of an item relative to the other items in a set. A function known as `select` allows us to find the kth smallest item in a set; that is, the smallest item for which k of the items in the set are smaller.

A simple algorithm for `select` has us first partition the set into two subsets that are respectively smaller and larger than some randomly-chosen item from the set. If the smaller-than set has more than k elements, it must contain the desired element, so search recursively for the desired element in the smaller-than set; otherwise, subtract the size of the smaller-than set k, then search recursively for the desired element in the larger-than set.

Your task is to write a function that implements the selection algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Pages: 1 2

### 6 Responses to “Selection”

My solution in Ruby:

def select(k,array)
elem = array[array.size / 2]
smaller = array.find_all { |e| e < elem }
larger = array.find_all { |e| e >= elem }
if(smaller.size == k)
return elem
end
if (smaller.size > k)
select(k,smaller)
else
select(k – smaller.size,larger)
end
end

2. wja said

Python solution:

```def select(S, k):
# make sure we have a builtin 'set'
if not isinstance(S, set): S = set(S)
assert len(S) >= k # else it's *bad* input
item = S.pop()
smaller = set()
greater = set()
for it in S:
l = len(smaller)
if l == k: return item
elif l > k: return select(smaller, k)
else: return select(greater, k-l-1)
```
3. Axio said

Unefficient Common Lisp, but very direct…

1 (defun select (k li)
2   (when (>= k 0)
3     (nth k (sort li #'<))))

4. programmingpraxis said

Jos Koot pointed out a bug: `(select 3 '(1 1 2 2 3 3 4 4 5 5 6 6 7 7)) results in an infinite loop. The bug manifests itself whenever the kth element has a duplicate.`

``` ```

`Both ``select` and `partition` have to change to fix the bug. Here is `partition`, where the `(cdr xs)` in the initialization of the loop ensures that at least one element is removed from the list at each iteration, thus forcing progress toward termination:

```(define (partition xs)   (let ((x (car xs)))     (let loop ((xs (cdr xs)) (lt '()) (ge '()))       (cond ((null? xs) (values lt x ge))             ((< (car xs) x)               (loop (cdr xs) (cons (car xs) lt) ge))             (else (loop (cdr xs) lt (cons (car xs) ge)))))))```

`Select` also has to change, because the partition element is no longer included in the "greater-than" partition:

```(define (select k xs)   (if (<= (length xs) k)       (error 'select "out of range")       (let loop ((k k) (xs (shuffle xs)))         (let-values (((lt x ge) (partition xs)))           (cond ((< k (length lt)) (loop k lt))                 ((< (length lt) k) (loop (- k (length lt) 1) ge))                 (else x))))))```

The variable name `gt` is changed to `ge` to reflect that equal elements are possible, and are (arbitrarily) placed in the "greater-than" partition. The order of the return elements from `partition` is also changed, so that when reading debug output from `partition` the elements are physically displayed in their proper order, a small point that nonetheless improves the program.

5. Vikas Tandi said

My implementation in C

```static void swap(int arr[], int i, int j);
static int partition(int arr[], int i, int j);
static int select_imp(int arr[], int left, int right, int k);

/* assume arr[left-1] <= arr[k] for left <= k <= right and
arr[right+1] >= arr[k] for left <= k <= right */
int select(int arr[], int left, int right, int k)
{
if(right < left)
return -1;

if(k > (right + 1) || k < (left+1))
return -1;

return select_imp(arr, left, right, k);
}

static int select_imp(int arr[], int left, int right, int k)
{
int pos;

pos = partition(arr, left, right);

if(pos == k)
return arr[pos];
else if(pos > k)
return select_imp(arr, left, pos-1, k);
else
return select_imp(arr, pos+1, right, k);
}

/* assume arr[left-1] <= arr[k] for left <= k <= right and
arr[right+1] >= arr[k] for left <= k <= right */
static int partition(int arr[], int left, int right)
{
int pivot, mid;
int i, j, key;

/* find partition element */
mid = (left + right)/2;
pivot = (left + mid + right)/3;
swap(arr, left, pivot);

/* partition array */
for(i = left+1, j = right, key = arr[left]; ; i++, j--)
{
/* find first bigger element by moving from left to right */
for(; arr[i] < key; i++);

/* find first smaller element by moving from right to left */
for(; arr[j] > key; j--);

if(j <= i)
break;

/* swap arr[i] with arr[j] */
swap(arr, i, j);
}
/* swap pivot element with jth element */
swap(arr, left, j);
return j;
}

static void swap(int arr[], int i, int j)
{
int tmp;

tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
}
```
6. Maithily said

Here is a java method to accomplish the same. I think there is a better solution feasible. Please suggest/comment.

public static int select(int[] a, int k)
{
int length = a.length;

int randomElementIndex = (new Long(Math.round((length-1)*Math.random()))).intValue();
int randomElement = a[randomElementIndex];

//Lower and Upper Limits of k are handled in the following loop.
if (k == 0 || k==length)
{
Arrays.sort(a);
return k==0?a:a[length-1];
}

int[] leftTransientArray = new int[length];
int[] rightTransientArray = new int[length];

int laIndex = 0, raIndex = 0;

for(int i=0; i<length; i++)
{
if (a[i] randomElement)
{
rightTransientArray[raIndex]=a[i];
raIndex++;
}
}

int[] leftArray = Arrays.copyOfRange(leftTransientArray, 0, laIndex);
int[] rightArray = Arrays.copyOfRange(rightTransientArray, 0, raIndex);

if (leftArray.length == k+1)
return randomElement;
else if (leftArray.length <= k )
return select(rightArray, k-leftArray.length);
else
return select(leftArray, k);
}