The 37% Rule
September 21, 2018
This is easy. For a given problem size n, we start with the integers from 0 to n − 1, shuffle them, find the maximum of the first 37%, then examine the rest one-by-one until we find the winner:
(define (37percent n) (let* ((candidates (shuffle (range n))) (size (round (* 37/100 n))) (cutoff (apply max (take size candidates)))) (display candidates) (display " ") (let loop ((cs (drop size candidates)) (prev #f)) (cond ((null? cs) prev) ((< cutoff (car cs)) (car cs)) (else (loop (cdr cs) (car cs)))))))
Here we display the shuffled list of candidates, so you can verify for yourself that the program works properly:
> (37percent 20) (15 4 14 3 11 7 2 18 0 8 16 5 10 19 1 13 6 12 9 17) 18
We examine the first 37% * 20 = 7 candidates, finding a cutoff of 15; the first candidate after the seventh is 18, which is our winner. Note that we only examined 8 of the 20 candidates. Averaged over 1000 runs, with n = 100, the average winner was 82, which isn’t perfect, but isn’t bad, either:
> (sum (map (lambda (x) (37percent 100)) (range 1000))) 81729
You can run the program at https://ideone.com/3ROCbv, and learn more about the math at Wikipedia.
In Python. Did this a long time ago. Another possible application is: how to choose your wife.
This is a silly algorithm to select hires.
Note that the original algorithm said that if you didn’t find a better candidate in the 63% remaining, you had to take the last one.
This algorithm is assuming that you cannot go back to select a previously seen candidate. The original example application was to select a husband or wife. But here again, it’s not the right algorithm, because you don’t know how many potential spouses you will encounter. And even then you can behave properly and be able to come back to a previous candidate successfully (even if it’s not always easy).
So the conditions of application of this mathematical theorem are rather restrictive: you need to know the number of candidate you will encounter, you may be able to consider only one candidate at a time, and you must decide immediately whether you keep it or pass for ever. Also, the order in which the candidates occur must be random!
When you have to hire a new programmer from a pool of 100 applicants, (this is the purpose of this exercise right?) the right algorithm to use is to filter them out using a set of criteria, ordering the criteria by the rejection rate they allow.
Eg. you can reject 75% of them on their resume (bad formatting, bad spelling, bad job). Then you can eliminate further 75% of them by asking them to write a small program remotely at their leisure time.
That leaves 6 candidates that you can interview in an afternoon. You will be able to reject easily the one who smells, the one you can’t fathom working with, the two that are obviously not fit for the job, and take the one asking the cheapest salary of the two remaining ones.
This is a variation of the secretary problem.
Here’s a solution in C. The average ranking of the selected candidate was 20 (where the best candidate is ranked 1 and the worst is ranked 100).
The secretary problem maximizes the probability of selecting the best candidate. The cardinal payoff variant maximizes the expected value of the hire. Under the assumptions of that variant of the problem, the optimal strategy is to reject the first sqrt(N) = 10 applicants, and proceed the same way as the original problem. Quick experiments suggest that strategy does not minimize the expected ranking (it’s formulated to maximize the expected value, not minimize expected ranking). Rejecting the first 9 users (out of 100) and proceeding as usual seems to minimize the expected ranking.
Example: