Count All Matches

March 10, 2015

Count All Matches

Today’s exercise is an interview question from Google, as reported at Career Cup:

Given two strings, find the number of times the first string occurs in the second, whether continuous or discontinuous. For instance, the string CAT appears in the string CATAPULT three times, as CATapult, CAtapulT, and CatApulT.

Your task is to write the indicated program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below:

Advertisement

Pages: 1 2

8 Responses to “Count All Matches”

  1. Francesco said

    Haskell:

    import Control.Monad (filterM)
    
    f n h = length . filter (== n) . filterM (const [True, False]) $ h
    
  2. Paul said

    In Python.

    def matches(m, txt):
        if not m:
            return 1
        if not txt:
            return 0
        n = matches(m, txt[1:])
        if m[0] == txt[0]:
            n += matches(m[1:], txt[1:])
        return n
    
  3. Sergei said

    (defun count-all-matches (substr text)
    (labels ((cam (ss tt)
    (let ((count 0))
    (cond ((null ss) 1)
    ((null tt) 0)
    (t (loop while (setq tt (member (car ss) tt)) do
    (incf count (cam (subseq ss 1) (subseq tt 1)))
    (setq tt (subseq tt 1)))
    count)))))
    (cam (coerce substr ‘list) (coerce text ‘list))))))

  4. Scott said
           private static int numSubStrings(String sub, String full) {
    		char[] subArray = sub.toCharArray();
    		int[] found = new int[subArray.length];
    		
    		for(char c : full.toCharArray()){
    			if(c == subArray[0]){
    				found[0]++;
    			}
    			else{
    				for(int i = 1; i < subArray.length; i++){
    					if(c == subArray[i] && found[i-1] > 0){
    						found[i] += found[i-1];
    						break;
    					}
    				}
    			}
    		}
    		return found[found.length-1];	
    	}
    
  5. matthew said

    My initial C++ solution was like Scott’s, but there were some problems with eg. “count aaaa aaaaaaaa” (the perils of imperative programming). The following seems to works (and when applied to the above, the intermediate arrays of counts make a nice illustration of Pascal’s triangle). It’s sort of a compact memoized/dynamic programming version of the recursive solution (and a good deal more efficient than generating all possible substrings and comparing each one with the target):

    #include <string.h>
    #include <stdio.h>
    #include <vector>
    int main(int argc, char *argv[])
    {
       char *s = argv[1], *t = argv[2];
       int slen = strlen(s), tlen = strlen(t);
       std::vector<int> a(slen+1);
       a[0] = 1;
       for (int i = 0; i < tlen; i++) {
          for (int j = slen; j > 0; j--) {
             if (t[i] == s[j-1]) {
                a[j] += a[j-1];
             }
          }
          for (int i = 0; i < slen+1; i++) {
             printf("%d%s", a[i],(i<slen)?" ":"\n");
          }
       }
       printf("%d\n", a[slen]);
    }
    
    $ ./count aaaa aaaaaaaa
    1 1 0 0 0
    1 2 1 0 0
    1 3 3 1 0
    1 4 6 4 1
    1 5 10 10 5
    1 6 15 20 15
    1 7 21 35 35
    1 8 28 56 70
    70
    
  6. Scott said

    My previous solution fails for duplicate letters in the sub string e.g ‘tat’ –> ‘tatapult’. This corrects that mistake

    private static int numSubStrings(String sub, String full) {
    		char[] subArray = sub.toCharArray();
    		int[] found = new int[subArray.length];
    
    		for(char c : full.toCharArray()){
    			for(int i = 0; i < subArray.length; i++){
    				if(i == 0 && c == subArray[0]){
    					found[0]++;
    				}
    				else if(c == subArray[i] && found[i-1] > 0){
    					found[i] += found[i-1];
    				}
    			}
    		}
    		return found[found.length-1];	
    }
    
  7. Mike said

    Python.

    def count_all_matches(s, t):
        def cam(s, t):
            if s:
                return sum(cam(s[1:], t[i+1:]) for i in range(len(t)) if s[0] == t[i])
            return 1
        return cam(s, t) if s else 0
    

    If t is very long, you might want to preprocess it to remove characters that are not in s.

    I think this is equivalent to Francesco’s Haskell:

    def count(s, t):
    	u = tuple(s)
    	return sum(u == v for v in it.combinations(t, len(u)))
    
  8. Jan Van lent said

    Another Haskell solution.

    cam [] _ = 1
    cam _ [] = 0
    cam xxs@(x:xs) (y:ys) = if x == y then m + n else n
      where m = cam xs ys
            n = cam xxs ys
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: