## Two Interview Questions

### June 21, 2016

For `rand3`, calculate `2 * rand2() + rand2()`. If the result is less than 3, return it. Otherwise, call `rand3` again:

```(define (rand3)
(let ((x (+ (* 2 (rand2)) (rand2))))
(if (< x 3) x (rand3))))```

With some help from the Standard Prelude, it is easy to demonstrate that the output of `rand3` is uniform:

```> (uniq-c = (sort < (map (lambda (x) (rand3)) (range 100000))))
((0 . 33220) (1 . 33472) (2 . 33308))```

For the shortest word containing a set of characters, there are several solutions, depending on how you want to represent the set and the words. The basic idea is to form each word into a set, intersect it with the set of characters, and if the result is the same as the set of characters, declare a match; then, while iterating through the dictionary, compare any matches found with the previous minimum, keep the match if it is the same length as the current minimum, set a new minimum if it is shorter than the current minimum, or discard it if it is longer than the current minimum. We’ll represent a set as a list of characters in sorted order, which means that our matching algorithm is quadratic in the length of the word being matched:

```(define (make-set chars)
(if (member x xs) xs (cons x xs)))
(do ((chars (string->list chars) (cdr chars))
(set (list) (adjoin (car chars) set)))
((null? chars) (sort char<? set))))```
```(define (intersect set1 set2)
(let loop ((set1 set1) (set2 set2) (result (list)))
(cond ((or (null? set1) (null? set2)) (reverse result))
((char<? (car set1) (car set2))
(loop (cdr set1) set2 result))
((char<? (car set2) (car set1))
(loop set1 (cdr set2) result))
(else (loop (cdr set1) (cdr set2)
(cons (car set1) result))))))```
```(define (min-set chars words)
(let ((set (make-set chars)))
(let loop ((words words) (len 1000) (result (list)))
(if (null? words) result
(if (equal? (intersect set (make-set (car words))) set)
(let ((match-len (string-length (car words))))
(cond ((< match-len len)
(loop (cdr words) match-len (list (car words))))
((< len match-len)
(loop (cdr words) len result))
(else (loop (cdr words) len (cons (car words) result)))))
(loop (cdr words) len result))))))```

`Make-set` turns a word into a set of characters, stored as described above, `intersect` finds the intersection of two sets of characters, and `min-set` solves the problem. Here are some examples:

```> (min-set "eo" '("hello" "goodbye"))
("hello")
> (min-set "eo" '("hello" "goodbye" "wrote"))
("wrote" "hello")
> (min-set "eo" '("hello" "goodbye" "wrote" "hoe"))
("hoe")```

In the first set, “hello” and “goodbye” both match, but “hello” has fewer characters; in the second set, there is a tie for the shortest match, and in the third set all the words match, but only one has minimum length.

You can run the program at http://ideone.com/IKwBku, where you will also see the contributions from the Standard Prelude. There are better ways to represent the set, so if you have a large amount of data, you will want to replace `make-set` and `intersect` with something better.

Pages: 1 2

### 16 Responses to “Two Interview Questions”

1. Jussi Piitulainen said

The rand3 one, and another rand2 in terms of rand3 using the same method (but not mutually recursive :)

```(define (rand2) (random 2))

(define (rand3)
(let ((hi (rand2))
(lo (rand2)))
(if (= hi lo 1)
(rand3)
(+ hi hi lo))))

(define (x-rand2)
(let ((it (rand3)))
(if (= it 2)
(x-rand2)
it)))

(define (mean rand)
(let ((n 10000))
(do ((k 0 (+ k 1))
(s 0 (+ s (rand))))
((= k n)
(/ s n)))))

(write (exact->inexact (mean rand2))) (newline)
(write (exact->inexact (mean rand2))) (newline)
(write (exact->inexact (mean rand3))) (newline)
(write (exact->inexact (mean rand3))) (newline)
(write (exact->inexact (mean x-rand2))) (newline)
(write (exact->inexact (mean x-rand2))) (newline)

;;; 0.5057
;;; 0.4981
;;; 1.0119
;;; 1.0133
;;; 0.4969
;;; 0.5052
```
2. Daniel said

Here’s a C++ solution to the first problem.

#include <algorithm>
#include <iostream>
#include <random>

using std::cout;
using std::endl;

int rand2() {
std::random_device rd;
std::default_random_engine generator(rd());
std::uniform_int_distribution<int> distribution(0,1);
return distribution(generator);
}

int rand3() {
while (true) {
int r = (2 * rand2()) + rand2();
if (r < 3) {
return r;
}
}
}

int main(int argc, char* argv[]) {
int arr = {0};
int n = 100000;
for (int i = 0; i < n; i++) {
arr[rand3()]++;
}
cout << "Distribution:" << endl;
for (int i = 0; i < 3; i++) {
cout << arr[i] / static_cast<float>(n);
if (i < 2) {
cout << ", ";
}
}
}

And here’s the output, which shows the distribution across 0, 1, and 2.

```Distribution:
0.33362, 0.334, 0.33238
```
3. Daniel said

Here’s a C++ solution to the second problem.

#include <algorithm>
#include <iostream>
#include <random>

using std::cout;
using std::endl;

int rand2() {
std::random_device rd;
std::default_random_engine generator(rd());
std::uniform_int_distribution<int> distribution(0,1);
return distribution(generator);
}

int rand3() {
while (true) {
int r = (2 * rand2()) + rand2();
if (r < 3) {
return r;
}
}
}

int main(int argc, char* argv[]) {
int arr = {0};
int n = 100000;
for (int i = 0; i < n; i++) {
arr[rand3()]++;
}
cout << "Distribution:" << endl;
for (int i = 0; i < 3; i++) {
cout << arr[i] / static_cast<float>(n);
if (i < 2) {
cout << ", ";
}
}
}

And here’s the output:

```Words:
lorem
dolor
```
4. Daniel said

I mistakenly posted the first solution twice. Here’s the second solution. I mistakenly entered lang=”c++” in my earlier posts, but it should be lang=”cpp” to get the formatting. I’ll do that in this post.

```
#include <cstddef>
#include <iostream>
#include <limits>
#include <set>
#include <string>
#include <utility>
#include <vector>

using std::cout;
using std::endl;
using std::numeric_limits;
using std::set;
using std::size_t;
using std::string;
using std::vector;

bool ContainsAllChars(const string& word, set<char> charset) {
for (const char& c : word) {
if (charset.empty()) {
break;
}
charset.erase(c);
}
return charset.empty();
}

// Returns a list of indices corresponding to shortest words in 'dictionary'
// that have all characters in charset.
vector<int> ShortestWordsWithChars(
const vector<string>& dictionary,
const set<char>& charset) {
size_t min_length = std::numeric_limits<size_t>::max();
vector<int> indices;
for (int i = 0; i < dictionary.size(); i++) {
const string& word = dictionary[i];
size_t word_len = word.length();
if (word_len > min_length || !ContainsAllChars(word, charset)) {
continue;
}
if (word_len < min_length) {
min_length = word_len;
indices.clear();
}
indices.push_back(i);
}
return indices;
}

int main(int argc, char* argv[]) {
vector<string> dictionary;
set<char> charset;

charset.insert('o');
charset.insert('l');

dictionary.push_back("lorem");
dictionary.push_back("ipsum");
dictionary.push_back("dolor");
dictionary.push_back("sit");
dictionary.push_back("amet");
dictionary.push_back("consectetur");
dictionary.push_back("elit");

vector<int> indices = ShortestWordsWithChars(dictionary, charset);

if (!indices.empty()) {
cout << "Words:" << endl;
} else {
cout << "No Matches" << endl;
}
for (int i : indices) {
cout << dictionary[i] << endl;
}
}

```

And here’s the output:

```Words:
lorem
dolor
```
5. Paul said

In Python.

```def min_set(chars, dict):
charset = set(chars)
length = float("inf")
for word in dict:
n = len(word)
if n <= length and set(word) > charset:
if n < length:
length = n
words = []
words.append(word)
return words
```
6. John Cowan said

A more interesting version of the first question is to ask for a rand3 that returns in a bounded amount of time, assuming that rand2 returns in a bounded amount of time. I don’t know an answer to this one.

A technically correct solution is simply to arrange for repeated calls on rand3 to return 0, 1, 2, 0, 1, 2 …., ignoring rand2 completely. This is equiprobable without being random in the usual sense.

7. Jussi Piitulainen said

I find the probabilistic bound interesting: successive rejections can be made vanishingly rare.

Generate n bits to get one out of many possible integers in binary. At a small probability, reject one or two bit patterns so the number of remaining bit patterns is divisible by three. Map the first third of them to 0, the second third to 1, the third third to 2, giving uniform probability.

With n = 4, reject 1 pattern at the small probability 1/4. With n = 62, reject 1 pattern at the even smaller probability 1/4611686018427387904.

The respective probabilities of two rejections in a row would be 1/16 and 1/21267647932558653966460912964485513216 where the smaller bound comes at the cost of always generating extra bits that are usually not needed for uniformity. Programming language random generators do tend to generate a number of bits at a time, which a simple rand2 then throws away. I suppose in this exercise we are pretending to have a less wasteful rand2.

Yet another way to think of it might be that even without any rejection, a large n would produce a distribution that is very close to uniform. One third of the large range would have 1 bit pattern fewer or more than the others. That may not be really in the spirit of this exercise.

8. Daniel said

@JohnCowan, here’s a proof that rand3 (and all randN where N is not a power of 2) cannot return in a bounded amount of time.
http://cs.stackexchange.com/a/9143

Regarding your proposed approach, I don’t consider that as “technically correct”. I believe the ‘relative frequency’ interpretation of probability assumes independent trials. In the rand3 you proposed, trials are not independent. Then again, you did say “without being random in the usual sense”…

9. Daniel said

> “Regarding your proposed approach, I don’t consider that as ‘technically correct’.”

Perhaps I’m just being pedantic (unintentionally). I suppose the same argument I made could be made about an ordinary seeded pseudo-random number generator, since the trials are not independent. However, from an observer’s perspective, the trials *seem* independent, whereas they wouldn’t if the function returned 0, 1, 2, 0, 1, 2, …

10. catalinc said

Solutions in F#:

```
let rand2() =
let r = System.Random()
r.Next(0, 2)

let rec rand3() =
let f = rand2()
let s = rand2()
match f, s with
| 0, 0 -> 0
| 0, 1 -> 1
| 1, 0 -> 2
| _, _ -> rand3()

printfn "rand3: %d" (rand3()) // -> 0, 1, 2

let contains letters word =
let wordLetters = Set.ofSeq word
Set.isSubset letters wordLetters

let words = lines |> Seq.filter (contains letters) |> Seq.toList
let minLen = words |> List.minBy String.length |> String.length
let shortest = words |> List.filter (fun w -> String.length w = minLen)
shortest

printfn "shortestWords: %A" (shortestWords "abcde" "words.lst") // -> ["abduce"; "bached"; "backed"; "braced"; "cabbed"; "cabled"]

```
11. Globules said

A Haskell version of the random number problem.

```import Control.Monad.State
import System.Random

-- We're given a function that returns 0 or 1 with equal probability.
rand2 :: RandomGen g => g -> (Int, g)
rand2 = randomR (0, 1)

-- Using rand2 and the technique of rejection sampling we implement a function
-- that returns 0, 1 or 2 with equal probability.  The State monad is used to
-- maintain the random number generator state.
rand3 :: RandomGen g => State g Int
rand3 = do
x <- state rand2
y <- state rand2
let z = x + y
if z < 3 then return z else rand3

main :: IO ()
main = do
rand <- getStdGen
print \$ evalState (replicateM 20 rand3) rand
```
```\$ ./rand3
[1,2,1,0,1,2,1,1,0,1,0,0,1,1,2,1,0,2,1,1]
```
12. Globules said

A Haskell version of the dictionary question.

```import Control.Monad (forM_)
import Data.Char (isLetter, toLower)
import Data.Function (on)
import Data.List (intercalate, nub, sort)
import Data.Map.Lazy (Map)
import qualified Data.Map as M
import System.Environment (getArgs)

-- In the code below we'll call the sorted set of letters in a word
-- its "signature" (or "sig", for short).
--
-- We could improve the performance by using Text instead of String,
-- but the code is a bit easier to read with String.

-- The signature of a word.
sig :: String -> String
sig = nub . sort . map toLower . filter isLetter

-- A mapping from signatures to the words having those signatures.
sigMapFrom :: [String] -> Map String [String]
sigMapFrom = M.fromListWith combine . map (\s -> (sig s, [s]))
where combine :: [String] -> [String] -> [String]
combine ns@[new] os@(old:_) = case (compare `on` length) new old of
LT -> ns
EQ -> new : os
GT -> os
combine _ _ = error "invalid call to combine"

-- The list of shortest words, equal to the given word or found in the
-- dictionary, whose signature is the same as that of the given word.
shortestMatching :: Map String [String] -> String -> [String]
shortestMatching sigMap word = M.findWithDefault [word] (sig word) sigMap

-- Convenience function.
join :: [String] -> String
join = intercalate ", "

main :: IO ()
main = do
args <- getArgs
sigMap <- fmap (sigMapFrom . words) getContents
forM_ args \$ \word ->
putStrLn \$ word ++ " -> " ++ join (shortestMatching sigMap word)
```
```\$ ./shortMatch several tactful mississippi settlers  versal, slaver, serval, salver
tactful -> tactful, factual, factful
mississippi -> simp
settlers -> streel, Selter, Lester
```
13. Globules said

Hmm, the output got a little mangled by the shell redirection. Let’s try it another way.

```\$ cat /usr/share/dict/words | ./shortMatch several tactful mississippi settlers
several -> versal, slaver, serval, salver
tactful -> tactful, factual, factful
mississippi -> simp
settlers -> streel, Selter, Lester
```
14. pollyhancock said

Ah, but adding two random numbers does not give you a random number back. Take the case where x is 0 or 1 with equal probability, and also y is 0 or 1 with equal probability. Then there are four cases for the sum: 0+0=0, 0+1=1, 1+0=1 and 1+1=2. Therefore the result is twice as likely to be 1 as it is to be 0 or to be 2. So we see the rand3 example above returns eleven ones in a sample of twenty numbers.

15. Globules said

@pollyhancock Thanks for pointing that out. I knew I would have to set the bits of the result (pseudo-) independently, but when writing the code I forgot to multiply one of the random numbers by 2. That’s what I get for simply eyeballing the results of a small test run.

In this version I’ve replaced `let z = x + y` with `let z = x + 2*y` and changed the way it prints the results, making it easier to count the values produced.

```import Control.Monad.State
import System.Environment
import System.Random

-- We're given a function that returns 0 or 1 with equal probability.
rand2 :: RandomGen g => g -> (Int, g)
rand2 = randomR (0, 1)

-- Using rand2 and the technique of rejection sampling we implement a function
-- that returns 0, 1 or 2 with equal probability.  The State monad is used to
-- maintain the random number generator state.
rand3 :: RandomGen g => State g Int
rand3 = do
x <- state rand2
y <- state rand2
let z = x + 2*y
if z < 3 then return z else rand3

main :: IO ()
main = do
[n] <- fmap (map read) getArgs
rand <- getStdGen
mapM_ print \$ evalState (replicateM n rand3) rand
```
```\$ ./rand3 10000 | sort | uniq -c
3301 0
3347 1
3352 2
```
16. fl84sg said

one easy and probably proper methode is to call rand2() for a couple times (like ten times) and since it returns only 0 or 1, it is possible to combine all the returned 0 and 1s to build up a number in base 2 then converted to base 10, the returned value for rand3() would be the remaider of the built number by 3, which is indeed either 0 or 1 or 2.