Count All Matches

March 10, 2015

This is simple if you think recursively:

(define (count-all-matches needle haystack) (let loop ((needle (string->list needle)) (haystack (string->list haystack))) (cond ((null? needle) 1) ((null? haystack) 0) ((char=? (car needle) (car haystack)) (+ (loop (cdr needle) (cdr haystack)) (loop needle (cdr haystack)))) (else (loop needle (cdr haystack))))))

The first two cond clauses check for termination: if the needle is empty, a match has been found, but if the haystack is empty when the needle is not, matching failed. In the third cond clause, where the first characters of needle and haystack match, count the number of times the remainder of the needle and the remainder of the haystack match, and the last two cond clauses also count the number of times the needle is found in the remainder of the haystack. Fortunately, the code is simpler than this long-winded explanation. Here are some examples:

> (count-all-matches "cat" "catapult") 3 > (count-all-matches "cap" "catapult") 2 > (count-all-matches "cut" "catapult") 1 > (count-all-matches "cup" "catapult") 0

You can run the program at http://ideone.com/SlYKiF.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “Count All Matches”

Francesco said

March 10, 2015 at 12:07 PM

Haskell:

import Control.Monad (filterM)

f n h = length . filter (== n) . filterM (const [True, False]) $ h

Paul said

March 10, 2015 at 12:35 PM

In Python.

def matches(m, txt):
    if not m:
        return 1
    if not txt:
        return 0
    n = matches(m, txt[1:])
    if m[0] == txt[0]:
        n += matches(m[1:], txt[1:])
    return n

Sergei said
March 10, 2015 at 2:03 PM
(defun count-all-matches (substr text)
(labels ((cam (ss tt)
(let ((count 0))
(cond ((null ss) 1)
((null tt) 0)
(t (loop while (setq tt (member (car ss) tt)) do
(incf count (cam (subseq ss 1) (subseq tt 1)))
(setq tt (subseq tt 1)))
count)))))
(cam (coerce substr ‘list) (coerce text ‘list))))))

Scott said

March 10, 2015 at 2:09 PM

       private static int numSubStrings(String sub, String full) {
		char[] subArray = sub.toCharArray();
		int[] found = new int[subArray.length];
		
		for(char c : full.toCharArray()){
			if(c == subArray[0]){
				found[0]++;
			}
			else{
				for(int i = 1; i < subArray.length; i++){
					if(c == subArray[i] && found[i-1] > 0){
						found[i] += found[i-1];
						break;
					}
				}
			}
		}
		return found[found.length-1];	
	}

matthew said

March 10, 2015 at 6:12 PM

My initial C++ solution was like Scott’s, but there were some problems with eg. “count aaaa aaaaaaaa” (the perils of imperative programming). The following seems to works (and when applied to the above, the intermediate arrays of counts make a nice illustration of Pascal’s triangle). It’s sort of a compact memoized/dynamic programming version of the recursive solution (and a good deal more efficient than generating all possible substrings and comparing each one with the target):

#include <string.h>
#include <stdio.h>
#include <vector>
int main(int argc, char *argv[])
{
   char *s = argv[1], *t = argv[2];
   int slen = strlen(s), tlen = strlen(t);
   std::vector<int> a(slen+1);
   a[0] = 1;
   for (int i = 0; i < tlen; i++) {
      for (int j = slen; j > 0; j--) {
         if (t[i] == s[j-1]) {
            a[j] += a[j-1];
         }
      }
      for (int i = 0; i < slen+1; i++) {
         printf("%d%s", a[i],(i<slen)?" ":"\n");
      }
   }
   printf("%d\n", a[slen]);
}

$ ./count aaaa aaaaaaaa
1 1 0 0 0
1 2 1 0 0
1 3 3 1 0
1 4 6 4 1
1 5 10 10 5
1 6 15 20 15
1 7 21 35 35
1 8 28 56 70
70

Scott said

March 10, 2015 at 6:19 PM

My previous solution fails for duplicate letters in the sub string e.g ‘tat’ –> ‘tatapult’. This corrects that mistake

private static int numSubStrings(String sub, String full) {
		char[] subArray = sub.toCharArray();
		int[] found = new int[subArray.length];

		for(char c : full.toCharArray()){
			for(int i = 0; i < subArray.length; i++){
				if(i == 0 && c == subArray[0]){
					found[0]++;
				}
				else if(c == subArray[i] && found[i-1] > 0){
					found[i] += found[i-1];
				}
			}
		}
		return found[found.length-1];	
}

Mike said

March 10, 2015 at 8:17 PM

Python.

def count_all_matches(s, t):
    def cam(s, t):
        if s:
            return sum(cam(s[1:], t[i+1:]) for i in range(len(t)) if s[0] == t[i])
        return 1
    return cam(s, t) if s else 0

If t is very long, you might want to preprocess it to remove characters that are not in s.

I think this is equivalent to Francesco’s Haskell:

def count(s, t):
	u = tuple(s)
	return sum(u == v for v in it.combinations(t, len(u)))

Jan Van lent said

March 20, 2015 at 11:55 PM

Another Haskell solution.

cam [] _ = 1
cam _ [] = 0
cam xxs@(x:xs) (y:ys) = if x == y then m + n else n
  where m = cam xs ys
        n = cam xxs ys

S	M	T	W	T	F	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Programming Praxis