String Search: Boyer-Moore
August 28, 2009
The two previous exercises discussed the brute-force and Knuth-Morris-Pratt algoritms for searching strings. Today we discuss the Boyer-Moore string search algorithm, invented by Bob Boyer and J Strother Moore in 1977, in a variant devised by Nigel Horspool.
The Boyer-Moore algorithm is a “backwards” version of the Knuth-Morris-Pratt algorithm. It looks at the last character of the pattern first, working its way right-to-left until it finds a mis-match, when it slides the pattern right along the search string for a skip size based on the current character.
Consider the pattern ABABAC, the same pattern used in the prior exercise. The skip array is:
[…] Praxis – String Search: Boyer-Moore By Remco Niemeijer In yesterday’s Programming Praxis problem we have to implement a more efficient string search algorithm than the […]
My Haskell solution (see http://bonsaicode.wordpress.com/2009/08/29/programming-praxis-string-search-boyer-moore/ for a version with comments):
import Data.Map (findWithDefault, fromList, (!)) horspool :: Ord a => [a] -> Maybe Int -> [a] -> Maybe Int horspool pat skip xs = f (lp - 1 + maybe 0 id skip) p' where (lp, lxs, p') = (length pat, length xs, reverse pat) t = fromList $ zip pat [lp - 1, lp - 2..] m = fromList $ zip [0..] xs f n [] = Just (n + 1) f n (p:ps) | n >= lxs = Nothing | p == m ! n = f (n - 1) ps | otherwise = f (n + findWithDefault lp (m ! n) t) p'Here is my code in C
#include <limits.h> /* Boyer–Moore–Horspool algorithm */ int string_index_3(char *s, int str_size, char *p, int pattern_size) { int i, j; int bad_char_shift[UCHAR_MAX+1]; /* sanity check */ if(!s || !p || pattern_size <= 0) return -1; /* initialize shift table */ for(i = 0; i <= UCHAR_MAX; i++) bad_char_shift[i] = pattern_size; /* build shift table */ for(i = pattern_size-2; i >= 0; i--) if(bad_char_shift[p[i]] == pattern_size) bad_char_shift[p[i]] = pattern_size - i - 1; /* search pattern */ for(i = 0; i < str_size;) { /* check the substring from right to left */ for(j = pattern_size -1; p[j] == s[i+j]; j--) if(j == 0) return i+1; /* move right using skip table */ i += bad_char_shift[s[i+pattern_size-1]]; } return -1; }