Two Stream Selection Questions
February 6, 2015
Today’s exercise provides two questions involving selection from a stream. Both require a solution that takes O(n) time and O(1) space.
First: Given a stream of unknown length, select a random item from the stream, with all items selected with equal probability. For instance, given the stream {A, D, F, A, G} the selection would return A with probability 40% and D, F or G with probability 20%.
Second: Given a stream of unknown length, each item in the stream paired with a weight, select a random item from the stream with all items selected with probability according to their weights. For instance, given the stream {1 A, 2 D, 5 F, 3 A, 9 G} the selection would return A with probability 20%, D with probability 10%, F with probability 25%, and G with probability 45%.
Your task is to write the two programs that select items from streams. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Examples:
>>> items = (‘red’, ‘blue’, ‘green’)
>>> Counter(random_pick(items) for _ in range(1000))
Counter({‘green’: 353, ‘red’: 324, ‘blue’: 323})
>>> weights = (0.5, 1.0, 1.5)
>>> Counter(random_pick(items, weights) for _ in range(1000))
Counter({‘green’: 519, ‘blue’: 317, ‘red’: 164})
>>> from fractions import *
>>> weights = (Fraction(1,2), Fraction(1,3), Fraction(1,6))
>>> Counter(random_pick(items, weights) for _ in range(1000))
Counter({‘red’: 504, ‘blue’: 344, ‘green’: 152})
In golang: http://xojoc.pw/programming-praxis/reservoir-sampling.html