String Search: Boyer-Moore

August 28, 2009

The two previous exercises discussed the brute-force and Knuth-Morris-Pratt algoritms for searching strings. Today we discuss the Boyer-Moore string search algorithm, invented by Bob Boyer and J Strother Moore in 1977, in a variant devised by Nigel Horspool.

The Boyer-Moore algorithm is a “backwards” version of the Knuth-Morris-Pratt algorithm. It looks at the last character of the pattern first, working its way right-to-left until it finds a mis-match, when it slides the pattern right along the search string for a skip size based on the current character.

Consider the pattern ABABAC, the same pattern used in the prior exercise. The skip array is:

A    1
B    2
C    0
else 6

If the current character of the search string isn't in the pattern, you can skip all the way past the current pattern. If the current character of the search string is C, the last character of the pattern, the pattern doesn't move, and the comparison shifts to the next character to the left. If the current character of the search string is A, the next-to-last character of the pattern, slide the pattern one character to the right and restart at the end of the pattern. And if the current character of the search string is B, the second-to-last character of the pattern, slide the pattern two characters to the right and restart at the end of the pattern.

Your task is to write a function that performs string searching using the Horspool variant of the Boyer-Moore algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

About these ads

Pages: 1 2

4 Responses to “String Search: Boyer-Moore”

  1. Connochaetes said
    Maxchar = 256
    
    def preprocess(pattern)
    	length    = pattern.length
    	skip 	  = Array.new(Maxchar, length)
    	index     = -1 
    	pattern.each_byte  do |b|
    		skip[b] = length - (index += 1) - 1
    	end
    	skip
    end
    
    def horspool_search(str, pattern)
    	m, n = pattern.length, str.length
    	skip = preprocess(pattern)
    	k, j = m-1, m-1
    	while k < n
    		i = k
    		while j>=0 and str[i] == pattern[j]
    			i -=1; j -= 1
    		end
    		return i+1 if j == -1
    		k += skip[str[k]]
    	end
    	nil
    end
    
  2. [...] Praxis – String Search: Boyer-Moore By Remco Niemeijer In yesterday’s Programming Praxis problem we have to implement a more efficient string search algorithm than the [...]

  3. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/08/29/programming-praxis-string-search-boyer-moore/ for a version with comments):

    import Data.Map (findWithDefault, fromList, (!))
    
    horspool :: Ord a => [a] -> Maybe Int -> [a] -> Maybe Int
    horspool pat skip xs = f (lp - 1 + maybe 0 id skip) p' where
        (lp, lxs, p') = (length pat, length xs, reverse pat)
        t = fromList $ zip pat [lp - 1, lp - 2..]
        m = fromList $ zip [0..] xs
        f n []     = Just (n + 1)
        f n (p:ps) | n >= lxs   = Nothing
    	           | p == m ! n = f (n - 1) ps
                   | otherwise  = f (n + findWithDefault lp (m ! n) t) p'
    
  4. Vikas Tandi said

    Here is my code in C

    #include <limits.h>
    
    /* Boyer–Moore–Horspool algorithm */
    int string_index_3(char *s, int str_size, char *p, int pattern_size)
    {
    	int i, j;
    	int bad_char_shift[UCHAR_MAX+1];
    
    	/* sanity check */
    	if(!s || !p || pattern_size <= 0)
    		return -1;
    
    	/* initialize shift table */
    	for(i = 0; i <= UCHAR_MAX; i++)
    		bad_char_shift[i] = pattern_size;
    
    	/* build shift table */
    	for(i = pattern_size-2; i >= 0; i--)
    		if(bad_char_shift[p[i]] == pattern_size)
    			bad_char_shift[p[i]] = pattern_size - i - 1;
    
    	/* search pattern */
    	for(i = 0; i < str_size;)
    	{
    		/* check the substring from right to left */
    		for(j = pattern_size -1; p[j] == s[i+j]; j--)
    			if(j == 0)
    				return i+1;
    
    		/* move right using skip table */
    		i += bad_char_shift[s[i+pattern_size-1]];
    	}
    	return -1;
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 628 other followers