String Search: Rabin-Karp

September 1, 2009

We examined the brute-force, Knuth-Morris-Pratt and Boyer-Moore methods of searching for a pattern in a string in three previous exercises. Today, we conclude this series of string-searching exercises with a look at a method devised by Michael Rabin (the same Michael Rabin we saw earlier in the exercise on primality checking) and Richard Karp in 1987 that uses hashing to find a substring in a text.

The basic idea of the Rabin-Karp algorithm is to take every pattern-length substring of the string being searched, hash the characters to a number, and compare that number to the hash number of the pattern. When a substring hash is equal to the pattern hash, a final check must be made to confirm the match.

It is useful to use a hash function that considers the characters in order; that way, considering each substring in sequence makes it easy to compute the various hashes. One approach is to add the integer values corresponding to each character; to compute the next hash, drop off the integer value of the first character and add the integer value of the current character to the running hash value.

The hash function that we will adopt treats each substring as a base-256 number. Thus, cat, with ascii values of 99, 97 and 116 for its three letters, hashes to 99 · 2562 + 97 · 2561 + 116 · 2560 = 6513012. Since hash values can grow large quickly, it is customary to take them modulo some convenient prime number.

Given this hash function, it is easy to roll the hash sequentially through the search string. If the search string is catch, the next hash after cat will be 6386787, which can be computed as 6513012 – 97 · 2562 = 24948, multipled by 256, thus 24948 × 256 = 6386688, then add 99, thus 6386688 + 99 = 6386787.

Your task is to write a function that searches for a pattern in a string using the Rabin-Karp algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

About these ads

Pages: 1 2

4 Responses to “String Search: Rabin-Karp”

  1. [...] Praxis – String Search: Rabin-Karp By Remco Niemeijer Today‚Äôs Programming Praxis problem marks the end of the string search series. In this final entry, we have [...]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/09/01/programming-praxis-string-search-rabin-karp/ for a version with comments):

    import Data.Char
    import Data.List
    
    asciis :: String -> [Integer]
    asciis = map (fromIntegral . ord)
    
    hash :: Num a => [a] -> a
    hash = sum . zipWith (*) (iterate (* 256) 1) . reverse
    
    hashes :: Int -> String -> [Integer]
    hashes p xs = scanl (\s (f,t) -> 256*s - 256^p*f + t) (hash $ take p ascs) .
                  zip ascs $ drop p ascs where ascs = asciis xs
    
    rabinKarp :: String -> Maybe Int -> String -> Maybe Int
    rabinKarp p s = lookup (hash $ asciis p) . drop (maybe 0 id s) .
                    flip zip [0..] . hashes (length p)
    
  3. Vikas Tandi said

    my implementation in C

    #include <string.h>
    
    static long double rolling_hash(long double hashval, long double val, char s, char e);
    static long double hash(char *s, int m);
    
    int string_index_4(char *s, int slen, char *p, int plen)
    {
    	int i;
    	long double hsub, hs, pow;
    
    	/* sanity check */
    	if(!s || !p || plen <= 0)
    		return -1;
    
    	for(i = 1, pow = 1; i <= (plen-1); i++)
    		pow = pow * 256;
    
    	/* calculate hash for sub-string and string */
    	hs = hash(s, plen);
    	hsub = hash(p, plen);
    
    	/* search sub-string */
    	for(i = 0; i <= (slen - plen); i++)
    	{
    		if(hs == hsub)
    			if(strncmp(&s[i], p, plen) == 0)
    				return i+1;
    
    		hs = rolling_hash(hs, pow, s[i], s[i+plen]);
    	}
    
    	return -1;
    }
    
    static long double hash(char *s, int m)
    {
    	int i;
    	long double hashval;
    	for(hashval = 0.0, i = 0; s[i] != '\0' && i < m; i++)
    		hashval = 256 * hashval + s[i];
    
    	return hashval;
    }
    
    static long double rolling_hash(long double hashval, long double val, char s, char e)
    {
    	hashval = hashval - s * val;
    	hashval = hashval * 256 + e;
    	return hashval;
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 628 other followers

%d bloggers like this: