Two Interview Questions
June 21, 2016
I like to read a web site called Career Cup, both to enjoy solving some of the programming exercise given there and to find exercise for Programming Praxis. As I write this exercise, here are the two most recent exercises on Career Cup:
- Given a function
rand2
that returns 0 or 1 with equal probability, write a functionrand3
that returns 0, 1 or 2 with equal probability, using onlyrand2
as a source of random numbers. - Given a set of characters and a dictionary of words, find the shortest word in the dictionary that contains all of the characters in the set. In case of a tie, return all the words of the same (shortest) length.
Your task is to write the two programs described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
The rand3 one, and another rand2 in terms of rand3 using the same method (but not mutually recursive :)
Here’s a C++ solution to the first problem.
#include <algorithm>
#include <iostream>
#include <random>
using std::cout;
using std::endl;
int rand2() {
std::random_device rd;
std::default_random_engine generator(rd());
std::uniform_int_distribution<int> distribution(0,1);
return distribution(generator);
}
int rand3() {
while (true) {
int r = (2 * rand2()) + rand2();
if (r < 3) {
return r;
}
}
}
int main(int argc, char* argv[]) {
int arr[3] = {0};
int n = 100000;
for (int i = 0; i < n; i++) {
arr[rand3()]++;
}
cout << "Distribution:" << endl;
for (int i = 0; i < 3; i++) {
cout << arr[i] / static_cast<float>(n);
if (i < 2) {
cout << ", ";
}
}
}
And here’s the output, which shows the distribution across 0, 1, and 2.
Here’s a C++ solution to the second problem.
#include <algorithm>
#include <iostream>
#include <random>
using std::cout;
using std::endl;
int rand2() {
std::random_device rd;
std::default_random_engine generator(rd());
std::uniform_int_distribution<int> distribution(0,1);
return distribution(generator);
}
int rand3() {
while (true) {
int r = (2 * rand2()) + rand2();
if (r < 3) {
return r;
}
}
}
int main(int argc, char* argv[]) {
int arr[3] = {0};
int n = 100000;
for (int i = 0; i < n; i++) {
arr[rand3()]++;
}
cout << "Distribution:" << endl;
for (int i = 0; i < 3; i++) {
cout << arr[i] / static_cast<float>(n);
if (i < 2) {
cout << ", ";
}
}
}
And here’s the output:
I mistakenly posted the first solution twice. Here’s the second solution. I mistakenly entered lang=”c++” in my earlier posts, but it should be lang=”cpp” to get the formatting. I’ll do that in this post.
And here’s the output:
In Python.
A more interesting version of the first question is to ask for a rand3 that returns in a bounded amount of time, assuming that rand2 returns in a bounded amount of time. I don’t know an answer to this one.
A technically correct solution is simply to arrange for repeated calls on rand3 to return 0, 1, 2, 0, 1, 2 …., ignoring rand2 completely. This is equiprobable without being random in the usual sense.
I find the probabilistic bound interesting: successive rejections can be made vanishingly rare.
Generate n bits to get one out of many possible integers in binary. At a small probability, reject one or two bit patterns so the number of remaining bit patterns is divisible by three. Map the first third of them to 0, the second third to 1, the third third to 2, giving uniform probability.
With n = 4, reject 1 pattern at the small probability 1/4. With n = 62, reject 1 pattern at the even smaller probability 1/4611686018427387904.
The respective probabilities of two rejections in a row would be 1/16 and 1/21267647932558653966460912964485513216 where the smaller bound comes at the cost of always generating extra bits that are usually not needed for uniformity. Programming language random generators do tend to generate a number of bits at a time, which a simple rand2 then throws away. I suppose in this exercise we are pretending to have a less wasteful rand2.
Yet another way to think of it might be that even without any rejection, a large n would produce a distribution that is very close to uniform. One third of the large range would have 1 bit pattern fewer or more than the others. That may not be really in the spirit of this exercise.
@JohnCowan, here’s a proof that rand3 (and all randN where N is not a power of 2) cannot return in a bounded amount of time.
http://cs.stackexchange.com/a/9143
Regarding your proposed approach, I don’t consider that as “technically correct”. I believe the ‘relative frequency’ interpretation of probability assumes independent trials. In the rand3 you proposed, trials are not independent. Then again, you did say “without being random in the usual sense”…
> “Regarding your proposed approach, I don’t consider that as ‘technically correct’.”
Perhaps I’m just being pedantic (unintentionally). I suppose the same argument I made could be made about an ordinary seeded pseudo-random number generator, since the trials are not independent. However, from an observer’s perspective, the trials *seem* independent, whereas they wouldn’t if the function returned 0, 1, 2, 0, 1, 2, …
Solutions in F#:
A Haskell version of the random number problem.
A Haskell version of the dictionary question.
Hmm, the output got a little mangled by the shell redirection. Let’s try it another way.
Ah, but adding two random numbers does not give you a random number back. Take the case where x is 0 or 1 with equal probability, and also y is 0 or 1 with equal probability. Then there are four cases for the sum: 0+0=0, 0+1=1, 1+0=1 and 1+1=2. Therefore the result is twice as likely to be 1 as it is to be 0 or to be 2. So we see the rand3 example above returns eleven ones in a sample of twenty numbers.
@pollyhancock Thanks for pointing that out. I knew I would have to set the bits of the result (pseudo-) independently, but when writing the code I forgot to multiply one of the random numbers by 2. That’s what I get for simply eyeballing the results of a small test run.
In this version I’ve replaced `let z = x + y` with `let z = x + 2*y` and changed the way it prints the results, making it easier to count the values produced.
one easy and probably proper methode is to call rand2() for a couple times (like ten times) and since it returns only 0 or 1, it is possible to combine all the returned 0 and 1s to build up a number in base 2 then converted to base 10, the returned value for rand3() would be the remaider of the built number by 3, which is indeed either 0 or 1 or 2.