String Search: Boyer-Moore

August 28, 2009

The two previous exercises discussed the brute-force and Knuth-Morris-Pratt algorithms for searching strings. Today we discuss the Boyer-Moore string search algorithm, invented by Bob Boyer and J Strother Moore in 1977, in a variant devised by Nigel Horspool.

The Boyer-Moore algorithm can be thought of as a “backwards” version of the Knuth-Morris-Pratt algorithm. It looks at the last character of the pattern first and works its way right-to-left until it finds a mismatch, at which point it slides the pattern to the right along the search string by a skip distance based on the current character.

Consider the pattern ABABAC, the same pattern used in the prior exercise. The skip array is:

A    1
B    2
C    0
else 6
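
As a rough illustration, the table above can be computed by recording, for each character of the pattern, how far its rightmost occurrence lies from the end of the pattern. The sketch below is in Python; the function name `build_skip_table` is made up for this example and is not taken from the suggested solution.

```python
def build_skip_table(pattern):
    """Map each pattern character to its distance from the end of the
    pattern, measured at the character's rightmost occurrence."""
    last = len(pattern) - 1
    table = {}
    for i, ch in enumerate(pattern):
        # Later occurrences overwrite earlier ones, so the rightmost wins.
        table[ch] = last - i
    return table


table = build_skip_table("ABABAC")
# Characters that do not occur in the pattern fall back to its length,
# which is the "else" row of the table.
skip_for_x = table.get("X", len("ABABAC"))
```

Here `table` holds 1 for A, 2 for B and 0 for C, and the `get` call supplies the default of 6 for any other character.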

If the current character of the search string isn't in the pattern, you can slide the pattern all the way past it. If the current character of the search string is C, the last character of the pattern, the pattern doesn't move, and the comparison shifts to the next character to the left. If the current character of the search string is A, the next-to-last character of the pattern, slide the pattern one character to the right and restart at the end of the pattern. And if the current character of the search string is B, the third-to-last character of the pattern, slide the pattern two characters to the right and restart at the end of the pattern.
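
For readers who want something to check their own attempt against, here is one possible sketch of the search in Python. It follows the usual textbook formulation of Horspool's variant rather than any particular solution: the shift is always looked up for the search-string character aligned with the end of the pattern, and the shift table is built from all but the last pattern character so that every shift is at least one. That means C falls under the default of 6 here instead of 0. The name `horspool_search` and its `start` parameter are assumptions made for this example, and `start` is assumed to be non-negative.

```python
def horspool_search(pattern, text, start=0):
    """Return the index of the first occurrence of pattern in text at or
    after start, or -1 if there is none."""
    m, n = len(pattern), len(text)
    if m == 0:
        # Convention chosen for this sketch: the empty pattern matches at start.
        return start if start <= n else -1

    # Shift table built from every pattern character except the last one,
    # so a lookup never returns 0 and the pattern always moves forward.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = start  # index in text where the pattern currently begins
    while pos + m <= n:
        # Compare right to left, starting at the end of the pattern.
        j = m - 1
        while j >= 0 and pattern[j] == text[pos + j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        # Slide by the shift for the text character under the pattern's
        # last position; characters not in the table skip the whole pattern.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

With the pattern ABABAC and the search string ABABABABAC, this sketch slides the pattern two characters to the right twice (each time because B sits under the end of the pattern) and then reports a match at index 4.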

Your task is to write a function that performs string searching using the Horspool variant of the Boyer-Moore algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

