String Search: Boyer-Moore
August 28, 2009
Our bm-search
function works in two phases; the do-loop calculates the skip vector, the named-let loop scans across the search string:
(define (bm-search pat str . s)
(define (ord str s) (char->integer (string-ref str s)))
(let* ((plen (string-length pat))
(slen (string-length str))
(skip (make-vector 256 plen)))
(do ((p 0 (+ p 1))) ((= p plen))
(vector-set! skip (ord pat p) (- plen p 1)))
(let loop ((p (- plen 1)) (s (if (null? s) (- plen 1) (+ (car s) plen -1))))
(cond ((negative? p) (+ s 1))
((<= slen s) #f)
((char=? (string-ref pat p) (string-ref str s)) (loop (- p 1) (- s 1)))
(else (loop (- plen 1) (+ s (vector-ref skip (ord str s)))))))))
Boyer-Moore is very fast, approaching O(m/n), if most of the pattern characters don’t appear in the search string. The Horspool variant is O(mn) in the worst case, but that seldom happens, and it is O(n) in the average case; the standard Boyer-Moore algorithm, not given here, improves that to O(n) in the worst case. In practice, both the standard Boyer-Moore and the Horspool variant are very fast, faster even than Knuth-Morris-Pratt on most patterns, and never very bad. The chief complaint against Boyer-Moore is that it requires backing up, and is thus not suitable for “stream” applications such as searching for a pattern in a file.
You can see the string searcher in action, including a run of the standard test suite, at http://programmingpraxis.codepad.org/CjKUK0jC.
[…] Praxis – String Search: Boyer-Moore By Remco Niemeijer In yesterday’s Programming Praxis problem we have to implement a more efficient string search algorithm than the […]
My Haskell solution (see http://bonsaicode.wordpress.com/2009/08/29/programming-praxis-string-search-boyer-moore/ for a version with comments):
Here is my code in C