Google Interview Question
April 8, 2016
Today’s exercise is an interview question from Google:
Given a list of words, find the maximum value of the product of the lengths of two words from the list, subject to the constraint that the two words cannot share any letters. For instance, given the words ABCW, BAZ, FOO, BAR, XTFN, and ABCDEF, the pair FOO BAR has a product of 9, the pair BAZ XTFN has a product of 12, and the pair ABCW XTFN has a product of 16, which is the maximum. Note that the pair ABCW ABCDEF doesn’t work because the two words share three letters.
Your task is to write a program to solve Google’s interview question. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
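For illustration (this is not necessarily the suggested solution), a straightforward brute-force sketch in Python compares every pair of words and keeps the largest product among letter-disjoint pairs:

def max_product(words):
    # Largest product of lengths over pairs of words sharing no letters.
    best = 0
    sets = [(len(w), set(w)) for w in words]
    for i, (n1, s1) in enumerate(sets):
        for n2, s2 in sets[i + 1:]:
            if not (s1 & s2):  # no shared letters
                best = max(best, n1 * n2)
    return best

print(max_product(["ABCW", "BAZ", "FOO", "BAR", "XTFN", "ABCDEF"]))  # 16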
Again a nice one for Perl, with its string handling!
Using Factor:
The first version finds the “maximum” pair of words. The key depends on whether the intersection of their respective sets of letters is empty. If it is empty, the key is the product of the lengths of the two words. Otherwise, the key is zero.
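The Factor code isn’t reproduced here; a rough Python rendering of the same idea (names are mine) takes the maximum over all pairs, keyed by the product of lengths when the letter sets are disjoint and by zero otherwise:

from itertools import combinations

def pair_key(pair):
    # Product of lengths if the words share no letters, else zero.
    w1, w2 = pair
    return len(w1) * len(w2) if not set(w1) & set(w2) else 0

def max_pair(words):
    return max(combinations(words, 2), key=pair_key)

print(max_pair(["ABCW", "BAZ", "FOO", "BAR", "XTFN", "ABCDEF"]))
# ('ABCW', 'XTFN')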
This second version builds a dictionary mapping letters to sets of words containing that letter. Then set operations can be used to calculate pairs of words that have no letters in common.
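Again the code itself isn’t shown; in Python the dictionary idea might look like the sketch below: union the word sets for each letter of a word, and anything outside that union is a compatible partner.

from collections import defaultdict

def max_product(words):
    by_letter = defaultdict(set)  # letter -> words containing that letter
    for w in words:
        for c in set(w):
            by_letter[c].add(w)
    best = 0
    all_words = set(words)
    for w in words:
        # Words sharing at least one letter with w (includes w itself).
        sharing = set.union(*(by_letter[c] for c in set(w)))
        for other in all_words - sharing:
            best = max(best, len(w) * len(other))
    return best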
The second line was a comment; it lost a ‘#’ character somewhere.
This works pretty fast for larger numbers of words.
Another Python solution, similar to Paul’s (I borrowed some of your code too, thanks Paul), but using a heap to ensure we examine possible solutions in order. Negative lengths are used because Python’s heapq is a min-heap with a fixed less-than ordering, and also to ensure the solutions come out in lexicographic order. This version prints all valid word pairs:
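A minimal sketch of the approach as described (pushing every pair onto the heap is my assumption about the construction):

import heapq
from itertools import combinations

def print_pairs(words):
    # Negated products make the min-heap pop the largest products first;
    # ties break lexicographically on the word pair.
    heap = [(-len(a) * len(b), a, b) for a, b in combinations(sorted(words), 2)]
    heapq.heapify(heap)
    while heap:
        negprod, a, b = heapq.heappop(heap)
        if not set(a) & set(b):
            print(-negprod, a, b)

print_pairs(["ABCW", "BAZ", "FOO", "BAR", "XTFN", "ABCDEF"])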
Another method. Make a heap of the (-len(word), set(word)) tuples. Then build a sorted list and simultaneously see whether any combination of a new item with the already sorted items has no overlap. If so, the solution is found. Otherwise sort further. This method is fast, as it is not necessary to sort all items and the items are tested for large products first.
As usual, simple solutions are also often the fastest. This one is about as fast as g3 and is much simpler.
Here’s a solution in Python. It isn’t as concise as the posted solution, but the critical difference is that it runs in O(n) time and constant additional space when given n words of input.
The trick is to realize that we don’t need to keep track of each previously seen word, only the maximum word length seen for each combination of letters, and there’s a large-ish but fixed number of possible letter combinations: 2^26-1. This means we can put a constant upper bound on the amount of work done to process each additional word.
In the solution below, each word gets reduced to an “lset”, a string identifying the set of letters it contains. For example, “FOOBAR” has the lset “ABFOR”; “FOO” and “FFOO” both have the lset “FO”. We store the maximum word length seen for each lset in a dictionary that we iterate over for each new word. The maximum number of lsets in the dictionary is 2^26-1.
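Mark’s code isn’t reproduced here, but a sketch of the lset dictionary as described might look like this:

def max_product(words):
    lset_len = {}  # lset -> longest word length seen with that letter set
    best = 0
    for w in words:
        letters = set(w)
        lset = "".join(sorted(letters))
        for other, length in lset_len.items():
            if letters.isdisjoint(other):
                best = max(best, len(w) * length)
        if lset_len.get(lset, 0) < len(w):
            lset_len[lset] = len(w)
    return best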
@Mark. I do not see that your method is O(n). I see a loop over words and an inner loop over the dict lset_len. I tested the method on 355000 words. After half an hour your method did not finish. g3 and g4 finish in about 5 seconds.
@Paul – I think Mark’s point is that the size of the lset dict is bounded as there are only 2^26-1 possible entries, therefore the time spent iterating over all the entries is also bounded – so it’s O(n), but with a rather large constant factor on the upper bound – a nice illustration of the limitations of big-O notation.
Of course, that’s assuming 8-bit characters; doing Unicode would be a different kettle of fish (still theoretically O(n) though).
I meant “characters in A-Z”, not 8-bit characters.
@matthew and @Mark. I understand (now) that one could say that this is O(n). It is certainly true that the lset_len dict saves space and time, as words with the same characters are collapsed into one item. I printed some sizes of the dict and saw that after 15000 (Dutch) words the number of entries in the dict was about 0.88 * 15000. The time and space savings are not very significant. Therefore the loop over lset_len still runs over the majority of the words seen so far. If the number of words were much larger than 2^26-1 (about 67,000,000; the number of words in the Bible is about 783,000, with about 12,000 unique words), then the saving would be significant.
Methods g3 and g4 have a bug. Here is a (more) correct version of g3.
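The corrected code isn’t reproduced here either; a hedged sketch of one correct way to keep the early termination (not necessarily Paul’s actual fix) is to track the best product found so far and stop only when no remaining pair can beat it:

def max_product(words):
    ws = sorted(words, key=len, reverse=True)
    sets = [set(w) for w in ws]
    best = 0
    for i in range(len(ws)):
        if len(ws[i]) ** 2 <= best:
            break  # no pair of the remaining (shorter) words can win
        for j in range(i + 1, len(ws)):
            prod = len(ws[i]) * len(ws[j])
            if prod <= best:
                break  # later partners are shorter still
            if sets[i].isdisjoint(sets[j]):
                best = prod
                break  # longest disjoint partner found for ws[i]
    return best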
Here is a Scala solution:
def googInterview(ls: List[String]) = {
  val words = ls.map(_.toUpperCase).sorted
  val pairs = for {
    xx <- words
    yy <- words if yy > xx && (xx.toSet intersect yy.toSet).isEmpty
  } yield (xx.length * yy.length, xx, yy)
  pairs.max
}
Solution in C#, using LINQ and Extension Methods: