## Two Stream Selection Questions

### February 6, 2015

The first problem has a well-known solution due to Robert Floyd: the first item is selected with probability 1/1, the second item replaces the current selection with probability 1/2, the third item replaces that selection with probability 1/3, and so on, so that the kth item is selected with probability 1/k:

```(define (select-equal xs)   (let loop ((n 1) (x #f) (xs xs))     (cond ((null? xs) x)           ((< (rand) (/ n))             (loop (+ n 1) (car xs) (cdr xs)))           (else (loop (+ n 1) x (cdr xs))))))```

``` ```

```> (select-equal '(A D F A G)) A```

This algorithm appears in the Standard Prelude under the name `fortune`, which derives rom the unix game of the same name that randomly selects an epigram from a file containing one per line.

The second problem is a simple variant of the first; instead of adding 1 at each step, we add the current weight. Thus, in our sample, A is selected with probability 1/1, D replaces A with probability 2/3, F replaces the current selection with probability 5/8, A replaces the current selection with probability 8/11, and G replaces the current selection with probability 9/20:

```(define (select-weighted xs)   (let loop ((n 0) (x #f) (xs xs))     (if (null? xs) x       (let ((y (caar xs)))         (if (< (rand) (/ y (+ n y)))             (loop (+ n y) (cadar xs) (cdr xs))             (loop (+ n y) x (cdr xs)))))))```

``` ```

```> (select-weighted '((1 A) (2 D) (5 F) (3 A) (9 G))) G```

We need to demonstrate that the two functions make selections in the expected proportions. Function `count` makes a table showing the number of times each item is selected during n trials:

```(define (symbol<? a b)   (string<? (symbol->string a) (symbol->string b)))```

``` (define (symbol=? a b)   (string=? (symbol->string a) (symbol->string b))) ```

```(define (count n select input)   (uniq-c symbol=?     (sort symbol<?       (map (select input)            (range n)))))```

And here’s the output for the two sets of trials, clearly showing the correct counts:

```> (for-each     (lambda (x)       (display (car x)) (display #\tab)       (display (cdr x)) (newline))     (count 50000 select-equal '(A D F A G))) A       20095 D       9840 F       10119 G       9946 > (for-each     (lambda (x)       (display (car x)) (display #\tab)       (display (cdr x)) (newline))     (count 200000 select-weighted '((1 A) (2 D) (5 F) (3 A) (9 G)))) A       40273 D       20134 F       49914 G       89679```

We used range, uniq-c and random numbers from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/8yiHSI59.

Pages: 1 2

### 2 Responses to “Two Stream Selection Questions”

1. Mike said
```from itertools import repeat
from random import uniform

def random_pick(items, weights=repeat(1.0)):
"""Provides a weighted selection from items, wherein the probability of being selected is the
item_weight/total_weights.  The optional weights can be ints, floats, or Fractions.
"""
total_weight = 0.0

for item, weight in zip(items, weights):
total_weight += weight
if uniform(0.0, total_weight) < weight:
choice = item

return choice
```

Examples:
>>> items = (‘red’, ‘blue’, ‘green’)
>>> Counter(random_pick(items) for _ in range(1000))
Counter({‘green’: 353, ‘red’: 324, ‘blue’: 323})

>>> weights = (0.5, 1.0, 1.5)
>>> Counter(random_pick(items, weights) for _ in range(1000))
Counter({‘green’: 519, ‘blue’: 317, ‘red’: 164})

>>> from fractions import *
>>> weights = (Fraction(1,2), Fraction(1,3), Fraction(1,6))
>>> Counter(random_pick(items, weights) for _ in range(1000))
Counter({‘red’: 504, ‘blue’: 344, ‘green’: 152})