Google Interview Question

April 8, 2016

Here’s our solution:

(define (g xs)
  (car
    (sort
      (lambda (xs ys) (> (car xs) (car ys)))
      (map
        (lambda (xs)
          (cons (* (string-length (car xs))
                   (string-length (cadr xs)))
                xs))
        (filter
          (lambda (xs)
            (null?
              (intersect
                (string->list (car xs))
                (string->list (cadr xs)))))
          (cross xs xs))))))

Starting from the inside, cross forms the cross product of the input words, filter retains only those pairs that do not share any letters (the intersection of their letters is null), map decorates each pair with the product of its string-lengths, sort arranges from largest product to smallest, and car takes the winner. The way to write programs like this is one step at a time: take the cross product, and check that it does what you expect; then filter, and check again; then map, and check again; then sort, and make one last check, then take the winner. It took me only a few minutes to write, and I was confident in my solution at each step along the way. Here it is in action:

> (g '("ABCW" "BAZ" "FOO" "BAR" "XTFN" "ABCDEF"))
(16 "ABCW" "XTFN")

This is obviously a brute-force solution. In an interview situation, I’d probably give that first, just to make sure I had a working solution, then start improving. The cross product could be improved to give only one of each pair of permutations (FOO BAR and BAR FOO), thus reducing the work by half. The intersection could be computed by bit arrays. Instead of sorting, you could simply find the maximum. But this solves the problem, simply and obviously, makes good use of the standard library, and would probably have acceptable performance up to a fairly large input.

You can run the program at http://ideone.com/1WgN56, where you will also see the definitions of cross and intersect.

Posted by programmingpraxis

Filed in Exercises

17 Comments »

17 Responses to “Google Interview Question”

James Curtis-Smith said

April 8, 2016 at 9:26 AM

Again a nice one for perl as it’s string handling!

use List::Util qw(first);
my @words = reverse sort { length $a <=> length $b } qw(ABCW BAZ FOO BAR XTFN ABCDEF);
my $max = 0;
while(my $x = shift @words) {
  my $q = (length first { ! /[$x]/i } @words) * length $x;
  $max  = $q if $max < $q;
}
print "$max\n";

John said

April 8, 2016 at 7:35 PM

Using Factor:

USING: assocs kernel math math.combinatorics sequences sets ;

qw{ ABCW BAZ FOO BAR XTFN ABCDEF }
2 [ first2 intersects? not ] filter-combinations
[ [ first2 [ length ] bi@ * ] keep ] { } map>assoc
supremum

Mike said

April 9, 2016 at 12:08 AM

The first version finds the “maximum” pair of words. The key depends on whether the intersection of their respective sets of letters is empty. If it is empty, the key is the product of the lengths of the two words. Otherwise, the key is zero.

from itertools import combinations

key is len(word1) * len(word2) if intersection of letter in each word is empty, else key is 0

max(permutations(words, 2), key=lambda t:len(t[0])*len(t[1]) if not (set(t[0]) & set(t[1])) else 0)

This second version builds a dictionary mapping letters to sets of words containing that letter. Then set operations can be used to calculate pairs of words that have no letters in common.

from collections import defaultdict

def giq(words):
    # build a dictionary mapping letters to sets of words that have that letter
    # e.g., letter_set['a'] -> a set of words containing the letter 'a'
    letter_set = defaultdict(set)

    for word in words:
        for ch in word:
            letter_set[ch].add(word)

    # for collecting words pairs and their word length product
    pairs = defaultdict(list)

    seen = set()

    for w1 in words:

        # w2 comes from set of words we've seen, after removing words having
        # a letter in common with w1
        #for w2 in seen.difference(*(letter_set[c] for c in w1)):
        for w2 in seen - set.union(*(letter_set[c] for c in set(w1))):
            pairs[len(w1)*len(w2)].append((w1,w2))

        seen.add(w1)

    return max(pairs.items())

words = 'ABCW BAZ FOO BAR XTFN ABCDEF'.split()
print(giq(words))

Mike said

April 9, 2016 at 12:13 AM

The second line was a comment, it lost a ‘#’ character somewhere.

from itertools import combinations
 
#key is len(word1) * len(word2) if intersection of letter in each word is empty, else key is 0
 
max(permutations(words, 2), key=lambda t:len(t[0])*len(t[1]) if not (set(t[0]) & set(t[1])) else 0)

Paul said

April 9, 2016 at 8:16 AM

This works pretty fast for larger number of words.

def g(words):
    """ sort (reversed)  the words on length and save their sets of characters
        try all combinations as long as the length product is large enough
    """
    word_info = sorted(((len(w), set(w)) for w in set(words)), reverse=True)
    maxlength = 0
    for i, (length, chars) in enumerate(word_info):
        if length * length <= maxlength:
            return maxlength
        for length2, chars2 in word_info[i+1:]:
            if length * length2 <= maxlength:
                break
            if not (chars & chars2):
                maxlength = length * length2
                break

matthew said

April 9, 2016 at 3:03 PM

Another Python solution, similar to Paul’s (I borrowed some your code too, thanks Paul), but using a heap to ensure we examine possible solutions in order. Using negative lengths is because Python heapq uses a fixed less than ordering & also to ensure the solutions come out in lexicographic order. This version prints all valid word pairs:

import heapq
wordlist = [ "XTFN","ABCW","ABCDEF","BAZ","FOO","BAR","ZO","BO" ];
info = sorted((-len(w), w, set(w)) for w in wordlist)

def wordlen(a): return -info[a][0]
def word(a): return info[a][1]
def letters(a): return info[a][2]
def makeentry(a,b): return (-wordlen(a)*wordlen(b),a,b)
def valid(a,b): return not(letters(a) & letters(b))
def gen():
    q = [makeentry(0,1)]
    while len(q) > 0:
        (length,a,b) = heapq.heappop(q)
        if valid(a,b): yield(-length,word(a),word(b))
        if b+1 < len(info):
            heapq.heappush(q,makeentry(a,b+1))
            if b == a+1: heapq.heappush(q,makeentry(a+1,b+1))

print(list(gen()))

Paul said

April 10, 2016 at 12:04 PM

Another method. Make a heap of the (-len(word), set(word)) tuples. Then build a sorted list and simultanuously see if any combination of a new item with the already sorted items has no overlap. If so, the solution is found. Otherwise sort further. This method is fast as it is not necessary to sort all items and the items are tested for large products first.

def g3(words):
    word_info = [(-len(w), set(w)) for w in set(words)]
    heapq.heapify(word_info)
    sorted_info = [heapq.heappop(word_info)]
    while word_info:
        length2, chars2 = heapq.heappop(word_info)
        for length, chars in sorted_info:
            if not (chars & chars2):
                return length * length2
        sorted_info.append((length2, chars2))
    return 0

Paul said

April 10, 2016 at 1:25 PM

As usual, simple solutions are also often the fastest. This one is about as fast as g3 and is much simpler.

def g4(words):
    sorted_info = sorted((-len(w), set(w)) for w in set(words))
    for i, (length2, chars2) in enumerate(sorted_info[1:], 1):
        for length, chars in sorted_info[:i]:
            if not (chars & chars2):
                return length * length2
    return 0

Mark said
April 10, 2016 at 6:21 PM
Here’s a solution in Python. It isn’t as concise as the posted solution, but the critical difference is that it runs in O(n) time and constant additional space when given n words of input.

The trick is to realize that we don’t need to keep track of each previously seen word, only the maximum word length seen for each combination of letters, and there’s a large-ish but fixed size of possible letter combinations at 2^26-1. This means we can put a constant upper bound on the amount of work done to process each additional word.

In the solution below, each word gets reduced to an “lset”, a string identifying the set of letters it contains. For example, “FOOBAR” has the lset “ABFOR”; “FOO” and “FFOO” both have the lset “FO”. We store the maximum word length seen for each lset in a dictionary that we iterate over for each new word. The maximum number of lsets in the dictionary is 2^26-1.
```
def max_product(words):
    result = 0
    lset_len = {}

    for word in words:
        n = len(word)

        for lset, m in lset_len.items():
            if nonintersecting(word, lset) and m * n > result:
                result = m * n

        lset = word_lset(word)
        if not lset in lset_len or n > lset_len[lset]:
            lset_len[lset] = n

    return result

def nonintersecting(word, lset):
    return len(set(word.upper()) & set(lset.upper())) == 0

def word_lset(word):
    return ''.join(sorted(set(word.upper())))
```
Paul said
April 11, 2016 at 6:17 AM
@Mark. I do not see that your method is O(n). I see a loop over words and an inner loop over the dict lset_len. I tested the method on 355000 words. After half an hour your method did not finish. g3 and g4 finish in about 5 seconds.
matthew said
April 11, 2016 at 6:23 AM
@Paul – I think Mark’s point is that the size of the lset dict is bounded as there are only 2^26-1 possibly entries, therefore time iterating over all the entries is also bounded – so it’s O(n) but with a rather large constant factor on the upper bound – nice illustration of the limitations of big-O notation.
matthew said
April 11, 2016 at 6:24 AM
Of course, that’s assuming 8-bit characters, doing Unicode would be a different kettle of fish (stll theoretically O(n) though)
matthew said
April 11, 2016 at 6:25 AM
I meant “characters in A-Z”, not 8-bit characters.
Paul said
April 11, 2016 at 12:55 PM
@matthew and @Mark. I understand (now) that one could say that this is O(n). It is certainly so, that the lset_len dict saves space and time as words with the same characters are collapsed into one item. I printed some sizes of the dict and saw that after 15000 (dutch) words the number of entries in the dict were 0.88 * 15000. The time and space saving are not very significant. Therefore the loop over lset_len is still over the majority of the words seen sofar. If the number of words would be much larger than 2^26-1 (about 67,000,000; the number of words in the Bible is about 783,000 (about 12,000 unique words)) then the saving would be significant.

Paul said

April 11, 2016 at 8:38 PM

Methods g3 and g4 have a bug. Here is a (more) correct version of g3.

def g3a(words):
    word_info = [(-len(w), set(w)) for w in set(words)]
    heapq.heapify(word_info)
    sorted_info = [heapq.heappop(word_info)]
    L = sorted_info[0][0]
    maxlength = 0
    while word_info:
        length2, chars2 = heapq.heappop(word_info)
        if length2 * L <= maxlength:
            break
        for length, chars in sorted_info:
            if length * length2 <= maxlength:
                break
            if not (chars & chars2):
                maxlength = length * length2
        sorted_info.append((length2, chars2))
    return maxlength

Kalpesh Patel said
April 16, 2016 at 9:16 PM
Here is Scala solution:
def googInterview(ls: List[String]) = {

ls map(x => x.toUpperCase) sorted

val yy = for {
xx <- ls
yy xx && ((xx.toSet intersect yy.toSet).isEmpty)
}yield((xx length)*(yy length), xx,yy)

yy max
}

pmarfleet said

April 17, 2016 at 4:49 PM

Solution in C#, using LINQ and Extension Methods:

using System.Collections.Generic;
using System.Linq;

namespace ProgrammingPraxis.Core
{
    public static class GoogleInterviewQuestionExtensions
    {
        public static int CalculateProductOfTwoLongestWordsThatDoNotShareAnyLetters(this IEnumerable<string> input)
        {
            return (from first in input
                    from second in input
                    where !ReferenceEquals(first, second) && HaveUniqueLetters(first, second)
                    select first.Length * second.Length).Max();
        }

        private static bool HaveUniqueLetters(string first, string second)
        {
            return first.All(charFirst => !second.Any(charSecond => 
               char.ToLowerInvariant(charFirst) == char.ToLowerInvariant(charSecond)));
        }
    }
}

Programming Praxis