Two Stream Selection Questions

February 6, 2015

The first problem has a well-known solution due to Robert Floyd: the first item is selected with probability 1/1, the second item replaces the current selection with probability 1/2, the third item replaces that selection with probability 1/3, and so on, so that the kth item is selected with probability 1/k:

(define (select-equal xs)
  (let loop ((n 1) (x #f) (xs xs))
    (cond ((null? xs) x)
          ((< (rand) (/ n))
            (loop (+ n 1) (car xs) (cdr xs)))
          (else (loop (+ n 1) x (cdr xs))))))

> (select-equal '(A D F A G))
A

This algorithm appears in the Standard Prelude under the name fortune, which derives rom the unix game of the same name that randomly selects an epigram from a file containing one per line.

The second problem is a simple variant of the first; instead of adding 1 at each step, we add the current weight. Thus, in our sample, A is selected with probability 1/1, D replaces A with probability 2/3, F replaces the current selection with probability 5/8, A replaces the current selection with probability 8/11, and G replaces the current selection with probability 9/20:

(define (select-weighted xs)
  (let loop ((n 0) (x #f) (xs xs))
    (if (null? xs) x
      (let ((y (caar xs)))
        (if (< (rand) (/ y (+ n y)))
            (loop (+ n y) (cadar xs) (cdr xs))
            (loop (+ n y) x (cdr xs)))))))

> (select-weighted '((1 A) (2 D) (5 F) (3 A) (9 G)))
G

We need to demonstrate that the two functions make selections in the expected proportions. Function count makes a table showing the number of times each item is selected during n trials:

(define (symbol<? a b)
  (string<? (symbol->string a) (symbol->string b)))

(define (symbol=? a b)
  (string=? (symbol->string a) (symbol->string b)))

(define (count n select input)
  (uniq-c symbol=?
    (sort symbol<?
      (map (select input)
           (range n)))))

And here’s the output for the two sets of trials, clearly showing the correct counts:

> (for-each
    (lambda (x)
      (display (car x)) (display #\tab)
      (display (cdr x)) (newline))
    (count 50000 select-equal '(A D F A G)))
A       20095
D       9840
F       10119
G       9946
> (for-each
    (lambda (x)
      (display (car x)) (display #\tab)
      (display (cdr x)) (newline))
    (count 200000 select-weighted '((1 A) (2 D) (5 F) (3 A) (9 G))))
A       40273
D       20134
F       49914
G       89679

We used range, uniq-c and random numbers from the Standard Prelude. You can run the program at http://programmingpraxis.codepad.org/8yiHSI59.

Advertisement

Pages: 1 2

2 Responses to “Two Stream Selection Questions”

  1. Mike said
    from itertools import repeat
    from random import uniform
    
    def random_pick(items, weights=repeat(1.0)):
        """Provides a weighted selection from items, wherein the probability of being selected is the
            item_weight/total_weights.  The optional weights can be ints, floats, or Fractions.
            """
        total_weight = 0.0
    
        for item, weight in zip(items, weights):
            total_weight += weight
            if uniform(0.0, total_weight) < weight:
                choice = item
    
        return choice
    

    Examples:
    >>> items = (‘red’, ‘blue’, ‘green’)
    >>> Counter(random_pick(items) for _ in range(1000))
    Counter({‘green’: 353, ‘red’: 324, ‘blue’: 323})

    >>> weights = (0.5, 1.0, 1.5)
    >>> Counter(random_pick(items, weights) for _ in range(1000))
    Counter({‘green’: 519, ‘blue’: 317, ‘red’: 164})

    >>> from fractions import *
    >>> weights = (Fraction(1,2), Fraction(1,3), Fraction(1,6))
    >>> Counter(random_pick(items, weights) for _ in range(1000))
    Counter({‘red’: 504, ‘blue’: 344, ‘green’: 152})

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: