String Search: Boyer-Moore

August 28, 2009

The two previous exercises discussed the brute-force and Knuth-Morris-Pratt algoritms for searching strings. Today we discuss the Boyer-Moore string search algorithm, invented by Bob Boyer and J Strother Moore in 1977, in a variant devised by Nigel Horspool.

The Boyer-Moore algorithm is a “backwards” version of the Knuth-Morris-Pratt algorithm. It looks at the last character of the pattern first, working its way right-to-left until it finds a mis-match, when it slides the pattern right along the search string for a skip size based on the current character.

Consider the pattern ABABAC, the same pattern used in the prior exercise. The skip array is:

Pages: 1 2

4 Responses to “String Search: Boyer-Moore”

  1. Connochaetes said
    Maxchar = 256
    
    def preprocess(pattern)
    	length    = pattern.length
    	skip 	  = Array.new(Maxchar, length)
    	index     = -1 
    	pattern.each_byte  do |b|
    		skip[b] = length - (index += 1) - 1
    	end
    	skip
    end
    
    def horspool_search(str, pattern)
    	m, n = pattern.length, str.length
    	skip = preprocess(pattern)
    	k, j = m-1, m-1
    	while k < n
    		i = k
    		while j>=0 and str[i] == pattern[j]
    			i -=1; j -= 1
    		end
    		return i+1 if j == -1
    		k += skip[str[k]]
    	end
    	nil
    end
    
  2. […] Praxis – String Search: Boyer-Moore By Remco Niemeijer In yesterday’s Programming Praxis problem we have to implement a more efficient string search algorithm than the […]

  3. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/08/29/programming-praxis-string-search-boyer-moore/ for a version with comments):

    import Data.Map (findWithDefault, fromList, (!))
    
    horspool :: Ord a => [a] -> Maybe Int -> [a] -> Maybe Int
    horspool pat skip xs = f (lp - 1 + maybe 0 id skip) p' where
        (lp, lxs, p') = (length pat, length xs, reverse pat)
        t = fromList $ zip pat [lp - 1, lp - 2..]
        m = fromList $ zip [0..] xs
        f n []     = Just (n + 1)
        f n (p:ps) | n >= lxs   = Nothing
    	           | p == m ! n = f (n - 1) ps
                   | otherwise  = f (n + findWithDefault lp (m ! n) t) p'
    
  4. Vikas Tandi said

    Here is my code in C

    #include <limits.h>
    
    /* Boyer–Moore–Horspool algorithm */
    int string_index_3(char *s, int str_size, char *p, int pattern_size)
    {
    	int i, j;
    	int bad_char_shift[UCHAR_MAX+1];
    
    	/* sanity check */
    	if(!s || !p || pattern_size <= 0)
    		return -1;
    
    	/* initialize shift table */
    	for(i = 0; i <= UCHAR_MAX; i++)
    		bad_char_shift[i] = pattern_size;
    
    	/* build shift table */
    	for(i = pattern_size-2; i >= 0; i--)
    		if(bad_char_shift[p[i]] == pattern_size)
    			bad_char_shift[p[i]] = pattern_size - i - 1;
    
    	/* search pattern */
    	for(i = 0; i < str_size;)
    	{
    		/* check the substring from right to left */
    		for(j = pattern_size -1; p[j] == s[i+j]; j--)
    			if(j == 0)
    				return i+1;
    
    		/* move right using skip table */
    		i += bad_char_shift[s[i+pattern_size-1]];
    	}
    	return -1;
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: