Google Interview Question

April 8, 2016

Here’s our solution:

(define (g xs)
  (car
    (sort
      (lambda (xs ys) (> (car xs) (car ys)))
      (map
        (lambda (xs)
          (cons (* (string-length (car xs))
                   (string-length (cadr xs)))
                xs))
        (filter
          (lambda (xs)
            (null?
              (intersect
                (string->list (car xs))
                (string->list (cadr xs)))))
          (cross xs xs))))))

Starting from the inside, cross forms the cross product of the input words, filter retains only those pairs that do not share any letters (the intersection of their letters is null), map decorates each pair with the product of its string-lengths, sort arranges from largest product to smallest, and car takes the winner. The way to write programs like this is one step at a time: take the cross product, and check that it does what you expect; then filter, and check again; then map, and check again; then sort, and make one last check, then take the winner. It took me only a few minutes to write, and I was confident in my solution at each step along the way. Here it is in action:

> (g '("ABCW" "BAZ" "FOO" "BAR" "XTFN" "ABCDEF"))
(16 "ABCW" "XTFN")

This is obviously a brute-force solution. In an interview situation, I’d probably give that first, just to make sure I had a working solution, then start improving. The cross product could be improved to give only one of each pair of permutations (FOO BAR and BAR FOO), thus reducing the work by half. The intersection could be computed by bit arrays. Instead of sorting, you could simply find the maximum. But this solves the problem, simply and obviously, makes good use of the standard library, and would probably have acceptable performance up to a fairly large input.

You can run the program at http://ideone.com/1WgN56, where you will also see the definitions of cross and intersect.

Advertisement

Pages: 1 2

17 Responses to “Google Interview Question”

  1. Again a nice one for perl as it’s string handling!

    use List::Util qw(first);
    my @words = reverse sort { length $a <=> length $b } qw(ABCW BAZ FOO BAR XTFN ABCDEF);
    my $max = 0;
    while(my $x = shift @words) {
      my $q = (length first { ! /[$x]/i } @words) * length $x;
      $max  = $q if $max < $q;
    }
    print "$max\n";
    
  2. John said

    Using Factor:

    USING: assocs kernel math math.combinatorics sequences sets ;
    
    qw{ ABCW BAZ FOO BAR XTFN ABCDEF }
    2 [ first2 intersects? not ] filter-combinations
    [ [ first2 [ length ] bi@ * ] keep ] { } map>assoc
    supremum
    
  3. Mike said

    The first version finds the “maximum” pair of words. The key depends on whether the intersection of their respective sets of letters is empty. If it is empty, the key is the product of the lengths of the two words. Otherwise, the key is zero.

    from itertools import combinations
    
    key is len(word1) * len(word2) if intersection of letter in each word is empty, else key is 0
    
    max(permutations(words, 2), key=lambda t:len(t[0])*len(t[1]) if not (set(t[0]) & set(t[1])) else 0)
    

    This second version builds a dictionary mapping letters to sets of words containing that letter. Then set operations can be used to calculate pairs of words that have no letters in common.

    from collections import defaultdict
    
    def giq(words):
        # build a dictionary mapping letters to sets of words that have that letter
        # e.g., letter_set['a'] -> a set of words containing the letter 'a'
        letter_set = defaultdict(set)
    
        for word in words:
            for ch in word:
                letter_set[ch].add(word)
    
        # for collecting words pairs and their word length product
        pairs = defaultdict(list)
    
        seen = set()
    
        for w1 in words:
    
            # w2 comes from set of words we've seen, after removing words having
            # a letter in common with w1
            #for w2 in seen.difference(*(letter_set[c] for c in w1)):
            for w2 in seen - set.union(*(letter_set[c] for c in set(w1))):
                pairs[len(w1)*len(w2)].append((w1,w2))
    
            seen.add(w1)
    
        return max(pairs.items())
    
    words = 'ABCW BAZ FOO BAR XTFN ABCDEF'.split()
    print(giq(words))
    
  4. Mike said

    The second line was a comment, it lost a ‘#’ character somewhere.

    from itertools import combinations
     
    #key is len(word1) * len(word2) if intersection of letter in each word is empty, else key is 0
     
    max(permutations(words, 2), key=lambda t:len(t[0])*len(t[1]) if not (set(t[0]) & set(t[1])) else 0)
    
    
  5. Paul said

    This works pretty fast for larger number of words.

    def g(words):
        """ sort (reversed)  the words on length and save their sets of characters
            try all combinations as long as the length product is large enough
        """
        word_info = sorted(((len(w), set(w)) for w in set(words)), reverse=True)
        maxlength = 0
        for i, (length, chars) in enumerate(word_info):
            if length * length <= maxlength:
                return maxlength
            for length2, chars2 in word_info[i+1:]:
                if length * length2 <= maxlength:
                    break
                if not (chars & chars2):
                    maxlength = length * length2
                    break
    
  6. matthew said

    Another Python solution, similar to Paul’s (I borrowed some your code too, thanks Paul), but using a heap to ensure we examine possible solutions in order. Using negative lengths is because Python heapq uses a fixed less than ordering & also to ensure the solutions come out in lexicographic order. This version prints all valid word pairs:

    import heapq
    wordlist = [ "XTFN","ABCW","ABCDEF","BAZ","FOO","BAR","ZO","BO" ];
    info = sorted((-len(w), w, set(w)) for w in wordlist)
    
    def wordlen(a): return -info[a][0]
    def word(a): return info[a][1]
    def letters(a): return info[a][2]
    def makeentry(a,b): return (-wordlen(a)*wordlen(b),a,b)
    def valid(a,b): return not(letters(a) & letters(b))
    def gen():
        q = [makeentry(0,1)]
        while len(q) > 0:
            (length,a,b) = heapq.heappop(q)
            if valid(a,b): yield(-length,word(a),word(b))
            if b+1 < len(info):
                heapq.heappush(q,makeentry(a,b+1))
                if b == a+1: heapq.heappush(q,makeentry(a+1,b+1))
    
    print(list(gen()))
    
  7. Paul said

    Another method. Make a heap of the (-len(word), set(word)) tuples. Then build a sorted list and simultanuously see if any combination of a new item with the already sorted items has no overlap. If so, the solution is found. Otherwise sort further. This method is fast as it is not necessary to sort all items and the items are tested for large products first.

    def g3(words):
        word_info = [(-len(w), set(w)) for w in set(words)]
        heapq.heapify(word_info)
        sorted_info = [heapq.heappop(word_info)]
        while word_info:
            length2, chars2 = heapq.heappop(word_info)
            for length, chars in sorted_info:
                if not (chars & chars2):
                    return length * length2
            sorted_info.append((length2, chars2))
        return 0
    
  8. Paul said

    As usual, simple solutions are also often the fastest. This one is about as fast as g3 and is much simpler.

    def g4(words):
        sorted_info = sorted((-len(w), set(w)) for w in set(words))
        for i, (length2, chars2) in enumerate(sorted_info[1:], 1):
            for length, chars in sorted_info[:i]:
                if not (chars & chars2):
                    return length * length2
        return 0
    
  9. Mark said

    Here’s a solution in Python. It isn’t as concise as the posted solution, but the critical difference is that it runs in O(n) time and constant additional space when given n words of input.

    The trick is to realize that we don’t need to keep track of each previously seen word, only the maximum word length seen for each combination of letters, and there’s a large-ish but fixed size of possible letter combinations at 2^26-1. This means we can put a constant upper bound on the amount of work done to process each additional word.

    In the solution below, each word gets reduced to an “lset”, a string identifying the set of letters it contains. For example, “FOOBAR” has the lset “ABFOR”; “FOO” and “FFOO” both have the lset “FO”. We store the maximum word length seen for each lset in a dictionary that we iterate over for each new word. The maximum number of lsets in the dictionary is 2^26-1.

    def max_product(words):
        result = 0
        lset_len = {}
    
        for word in words:
            n = len(word)
    
            for lset, m in lset_len.items():
                if nonintersecting(word, lset) and m * n > result:
                    result = m * n
    
            lset = word_lset(word)
            if not lset in lset_len or n > lset_len[lset]:
                lset_len[lset] = n
    
        return result
    
    def nonintersecting(word, lset):
        return len(set(word.upper()) & set(lset.upper())) == 0
    
    def word_lset(word):
        return ''.join(sorted(set(word.upper())))
    
  10. Paul said

    @Mark. I do not see that your method is O(n). I see a loop over words and an inner loop over the dict lset_len. I tested the method on 355000 words. After half an hour your method did not finish. g3 and g4 finish in about 5 seconds.

  11. matthew said

    @Paul – I think Mark’s point is that the size of the lset dict is bounded as there are only 2^26-1 possibly entries, therefore time iterating over all the entries is also bounded – so it’s O(n) but with a rather large constant factor on the upper bound – nice illustration of the limitations of big-O notation.

  12. matthew said

    Of course, that’s assuming 8-bit characters, doing Unicode would be a different kettle of fish (stll theoretically O(n) though)

  13. matthew said

    I meant “characters in A-Z”, not 8-bit characters.

  14. Paul said

    @matthew and @Mark. I understand (now) that one could say that this is O(n). It is certainly so, that the lset_len dict saves space and time as words with the same characters are collapsed into one item. I printed some sizes of the dict and saw that after 15000 (dutch) words the number of entries in the dict were 0.88 * 15000. The time and space saving are not very significant. Therefore the loop over lset_len is still over the majority of the words seen sofar. If the number of words would be much larger than 2^26-1 (about 67,000,000; the number of words in the Bible is about 783,000 (about 12,000 unique words)) then the saving would be significant.

  15. Paul said

    Methods g3 and g4 have a bug. Here is a (more) correct version of g3.

    def g3a(words):
        word_info = [(-len(w), set(w)) for w in set(words)]
        heapq.heapify(word_info)
        sorted_info = [heapq.heappop(word_info)]
        L = sorted_info[0][0]
        maxlength = 0
        while word_info:
            length2, chars2 = heapq.heappop(word_info)
            if length2 * L <= maxlength:
                break
            for length, chars in sorted_info:
                if length * length2 <= maxlength:
                    break
                if not (chars & chars2):
                    maxlength = length * length2
            sorted_info.append((length2, chars2))
        return maxlength
    
  16. Kalpesh Patel said

    Here is Scala solution:
    def googInterview(ls: List[String]) = {

    ls map(x => x.toUpperCase) sorted

    val yy = for {
    xx <- ls
    yy xx && ((xx.toSet intersect yy.toSet).isEmpty)
    }yield((xx length)*(yy length), xx,yy)

    yy max
    }

  17. pmarfleet said

    Solution in C#, using LINQ and Extension Methods:

    using System.Collections.Generic;
    using System.Linq;
    
    namespace ProgrammingPraxis.Core
    {
        public static class GoogleInterviewQuestionExtensions
        {
            public static int CalculateProductOfTwoLongestWordsThatDoNotShareAnyLetters(this IEnumerable<string> input)
            {
                return (from first in input
                        from second in input
                        where !ReferenceEquals(first, second) && HaveUniqueLetters(first, second)
                        select first.Length * second.Length).Max();
            }
    
            private static bool HaveUniqueLetters(string first, string second)
            {
                return first.All(charFirst => !second.Any(charSecond => 
                   char.ToLowerInvariant(charFirst) == char.ToLowerInvariant(charSecond)));
            }
        }
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: