The 37% Rule

September 21, 2018

You have to hire a new programmer from a pool of 100 applicants. One method is to interview all 100 and hire the best, but that takes a while. Mathematicians have developed the 37% rule:

Examine 37% of the candidates, knowing in advance you will not select any of them. Then examine the remaining candidates one-by-one, choosing the first of them that is better than any of the first 37%.

Your task is to simulate this process and determine how close you get to the optimal candidate. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

4 Comments »

4 Responses to “The 37% Rule”

Paul said

September 21, 2018 at 9:57 AM

In Python. Did this a long time ago. Another possible application is: how to choose your wife.

import itertools as IT
import math
import time
import random

def strategy3(options, perc):
    """
    Choose a candidate by checking perc percent first and
    then take the first better after that
    Return True if best candidate is selected
    """
    ibest = options.index(max(options))
    left = options[:ibest]
    ileftbest = left.index(max(left)) if left else 0
    first_size = round(len(options) * perc)
    return ileftbest <= first_size < ibest

def create_random(option_size):
    """Create a random case with brides
    """
    wrk = list(range(option_size))
    random.shuffle(wrk)
    return wrk

def gen_random(cases, option_size):
    return IT.cycle(create_random(option_size) for i in range(cases))

def batch(cases, perc, ran):
    """ Run cases
    """
    return sum([strategy3(next(ran), perc) for i in range(cases)]) / cases

def test(cases, options_size, perc_list):
    """ RUn many cases for various percentages
    """
    start_time = time.time()
    ran = gen_random(cases, options_size)
    for perc in perc_list:
        success = batch(cases, perc, ran)
        print("%4.2f %6.4f %s" % (perc, success, pr_result(success)))
    print("Time elapsed", time.time() - start_time)

def pr_result(res):
    i = int(res*100)
    return "".join(["*"] * i)

if __name__ == "__main__":
    test(100000, 50, [0.05 * i for i in range(1, 20)])
    print("Theoretical value for % skipped and chance of success", 1.0 / math.e)

skip% %success    
0.05 0.1790 *****************
0.10 0.2627 **************************
0.15 0.3163 *******************************
0.20 0.3413 **********************************
0.25 0.3589 ***********************************
0.30 0.3725 *************************************
0.35 0.3753 *************************************
0.40 0.3720 *************************************
0.45 0.3639 ************************************
0.50 0.3443 **********************************
0.55 0.3192 *******************************
0.60 0.3006 ******************************
0.65 0.2772 ***************************
0.70 0.2391 ***********************
0.75 0.1961 *******************
0.80 0.1643 ****************
0.85 0.1132 ***********
0.90 0.0764 *******
0.95 0.0199 *
Time elapsed 20.012144565582275
Theoretical value for % skipped and chance of success 0.36787944117144233

Pascal Bourguignon said
September 21, 2018 at 5:45 PM
This is a silly algorithm to select hires.

Note that the original algorithm said that if you didn’t find a better candidate in the 63% remaining, you had to take the last one.

This algorithm is assuming that you cannot go back to select a previously seen candidate. The original example application was to select a husband or wife. But here again, it’s not the right algorithm, because you don’t know how many potential spouses you will encounter. And even then you can behave properly and be able to come back to a previous candidate successfully (even if it’s not always easy).

So the conditions of application of this mathematical theorem are rather restrictive: you need to know the number of candidate you will encounter, you may be able to consider only one candidate at a time, and you must decide immediately whether you keep it or pass for ever. Also, the order in which the candidates occur must be random!

When you have to hire a new programmer from a pool of 100 applicants, (this is the purpose of this exercise right?) the right algorithm to use is to filter them out using a set of criteria, ordering the criteria by the rejection rate they allow.

Eg. you can reject 75% of them on their resume (bad formatting, bad spelling, bad job). Then you can eliminate further 75% of them by asking them to write a small program remotely at their leisure time.

That leaves 6 candidates that you can interview in an afternoon. You will be able to reject easily the one who smells, the one you can’t fathom working with, the two that are obviously not fit for the job, and take the one asking the cheapest salary of the two remaining ones.

Daniel said

September 21, 2018 at 10:39 PM

This is a variation of the secretary problem.

Here’s a solution in C. The average ranking of the selected candidate was 20 (where the best candidate is ranked 1 and the worst is ranked 100).

The secretary problem maximizes the probability of selecting the best candidate. The cardinal payoff variant maximizes the expected value of the hire. Under the assumptions of that variant of the problem, the optimal strategy is to reject the first sqrt(N) = 10 applicants, and proceed the same way as the original problem. Quick experiments suggest that strategy does not minimize the expected ranking (it’s formulated to maximize the expected value, not minimize expected ranking). Rejecting the first 9 users (out of 100) and proceeding as usual seems to minimize the expected ranking.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_CAND 100
#define N_INIT_REJECTS 37

void shuffle(int* array, int n) {
    for (int i = n - 1; i > 0; --i) {
        // the following is slightly biased when RAND_MAX % (i+1) != 0
        int j = rand() % (i + 1);
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}

int main(int argc, char* argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <ITERATIONS>\n", argv[0]);
        return EXIT_FAILURE;
    }
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    srand(t.tv_nsec);
    int n_iter = atoi(argv[1]);
    int rankings[N_CAND];
    for (int i = 0; i < N_CAND; ++i) {
        rankings[i] = i + 1;
    }
    double running_average = 0.0;
    for (int iter = 1; iter <= n_iter; ++iter) {
        shuffle(rankings, N_CAND);
        int cutoff = N_CAND + 1;
        for (int i = 0; i < N_INIT_REJECTS; ++i) {
            if (rankings[i] < cutoff) cutoff = rankings[i];
        }
        int selection = rankings[N_CAND - 1];
        for (int i = N_INIT_REJECTS; i < N_CAND; ++i) {
            if (rankings[i] < cutoff) selection = rankings[i];
        }
        running_average += (selection - running_average) / iter;
    }
    printf("%f\n", running_average); return EXIT_SUCCESS;
}

Example:

$ ./a.out 100000000
20.018909

Mark said

September 22, 2018 at 2:03 AM

from random import shuffle

num_trials = 10000
cnt = 0
candidates = list(range(100))

for _ in range(num_trials):
    shuffle(candidates)
    best_so_far = max(candidates[:37])
    if best_so_far == 99: # out of luck
        continue
    for n in candidates[37:]: # the rest
        if n > best_so_far:
            if n == 99: # you got the best
                cnt += 1
            break
print(cnt/num_trials)

Programming Praxis

The 37% Rule

September 21, 2018

4 Responses to “The 37% Rule”

Leave a comment

Categories

Archives

Archives

Programming Praxis

The 37% Rule

September 21, 2018

Share this:

Related

4 Responses to “The 37% Rule”

Leave a comment

Categories

Archives

Archives