Two Stream Selection Questions

February 6, 2015

The first problem has a well-known solution due to Robert Floyd: the first item is selected with probability 1/1, the second item replaces the current selection with probability 1/2, the third item replaces that selection with probability 1/3, and so on, so that the k^th item is selected with probability 1/k:

(define (select-equal xs) (let loop ((n 1) (x #f) (xs xs)) (cond ((null? xs) x) ((< (rand) (/ n)) (loop (+ n 1) (car xs) (cdr xs))) (else (loop (+ n 1) x (cdr xs))))))

> (select-equal '(A D F A G)) A

This algorithm appears in the Standard Prelude under the name fortune, which derives rom the unix game of the same name that randomly selects an epigram from a file containing one per line.

The second problem is a simple variant of the first; instead of adding 1 at each step, we add the current weight. Thus, in our sample, A is selected with probability 1/1, D replaces A with probability 2/3, F replaces the current selection with probability 5/8, A replaces the current selection with probability 8/11, and G replaces the current selection with probability 9/20:

(define (select-weighted xs) (let loop ((n 0) (x #f) (xs xs)) (if (null? xs) x (let ((y (caar xs))) (if (< (rand) (/ y (+ n y))) (loop (+ n y) (cadar xs) (cdr xs)) (loop (+ n y) x (cdr xs)))))))

> (select-weighted '((1 A) (2 D) (5 F) (3 A) (9 G))) G

We need to demonstrate that the two functions make selections in the expected proportions. Function count makes a table showing the number of times each item is selected during n trials:

(define (symbol<? a b) (string<? (symbol->string a) (symbol->string b)))


(define (symbol=? a b)

  (string=? (symbol->string a) (symbol->string b)))

(define (count n select input) (uniq-c symbol=? (sort symbol<? (map (select input) (range n)))))

And here’s the output for the two sets of trials, clearly showing the correct counts:

> (for-each (lambda (x) (display (car x)) (display #\tab) (display (cdr x)) (newline)) (count 50000 select-equal '(A D F A G))) A 20095 D 9840 F 10119 G 9946 > (for-each (lambda (x) (display (car x)) (display #\tab) (display (cdr x)) (newline)) (count 200000 select-weighted '((1 A) (2 D) (5 F) (3 A) (9 G)))) A 40273 D 20134 F 49914 G 89679

We used range, uniq-c and random numbers from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/8yiHSI59.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

2 Responses to “Two Stream Selection Questions”

Mike said
February 6, 2015 at 6:10 PM
```
from itertools import repeat
from random import uniform

def random_pick(items, weights=repeat(1.0)):
    """Provides a weighted selection from items, wherein the probability of being selected is the
        item_weight/total_weights.  The optional weights can be ints, floats, or Fractions.
        """
    total_weight = 0.0

    for item, weight in zip(items, weights):
        total_weight += weight
        if uniform(0.0, total_weight) < weight:
            choice = item

    return choice
```
Examples:
>>> items = (‘red’, ‘blue’, ‘green’)
>>> Counter(random_pick(items) for _ in range(1000))
Counter({‘green’: 353, ‘red’: 324, ‘blue’: 323})

>>> weights = (0.5, 1.0, 1.5)
>>> Counter(random_pick(items, weights) for _ in range(1000))
Counter({‘green’: 519, ‘blue’: 317, ‘red’: 164})

>>> from fractions import *
>>> weights = (Fraction(1,2), Fraction(1,3), Fraction(1,6))
>>> Counter(random_pick(items, weights) for _ in range(1000))
Counter({‘red’: 504, ‘blue’: 344, ‘green’: 152})
xojoc said
February 6, 2015 at 10:02 PM
In golang: http://xojoc.pw/programming-praxis/reservoir-sampling.html

Programming Praxis