String Search: Boyer-Moore

August 28, 2009

Our bm-search function works in two phases; the do-loop calculates the skip vector, the named-let loop scans across the search string:

(define (bm-search pat str . s)
  (define (ord str s) (char->integer (string-ref str s)))
  (let* ((plen (string-length pat))
         (slen (string-length str))
         (skip (make-vector 256 plen)))
    (do ((p 0 (+ p 1))) ((= p plen))
      (vector-set! skip (ord pat p) (- plen p 1)))
    (let loop ((p (- plen 1)) (s (if (null? s) (- plen 1) (+ (car s) plen -1))))
      (cond ((negative? p) (+ s 1))
            ((<= slen s) #f)
            ((char=? (string-ref pat p) (string-ref str s)) (loop (- p 1) (- s 1)))
            (else (loop (- plen 1) (+ s (vector-ref skip (ord str s)))))))))

Boyer-Moore is very fast, approaching O(m/n), if most of the pattern characters don’t appear in the search string. The Horspool variant is O(mn) in the worst case, but that seldom happens, and it is O(n) in the average case; the standard Boyer-Moore algorithm, not given here, improves that to O(n) in the worst case. In practice, both the standard Boyer-Moore and the Horspool variant are very fast, faster even than Knuth-Morris-Pratt on most patterns, and never very bad. The chief complaint against Boyer-Moore is that it requires backing up, and is thus not suitable for “stream” applications such as searching for a pattern in a file.

You can see the string searcher in action, including a run of the standard test suite, at http://programmingpraxis.codepad.org/CjKUK0jC.

About these ads

Pages: 1 2

4 Responses to “String Search: Boyer-Moore”

  1. Connochaetes said
    Maxchar = 256
    
    def preprocess(pattern)
    	length    = pattern.length
    	skip 	  = Array.new(Maxchar, length)
    	index     = -1 
    	pattern.each_byte  do |b|
    		skip[b] = length - (index += 1) - 1
    	end
    	skip
    end
    
    def horspool_search(str, pattern)
    	m, n = pattern.length, str.length
    	skip = preprocess(pattern)
    	k, j = m-1, m-1
    	while k < n
    		i = k
    		while j>=0 and str[i] == pattern[j]
    			i -=1; j -= 1
    		end
    		return i+1 if j == -1
    		k += skip[str[k]]
    	end
    	nil
    end
    
  2. [...] Praxis – String Search: Boyer-Moore By Remco Niemeijer In yesterday’s Programming Praxis problem we have to implement a more efficient string search algorithm than the [...]

  3. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/08/29/programming-praxis-string-search-boyer-moore/ for a version with comments):

    import Data.Map (findWithDefault, fromList, (!))
    
    horspool :: Ord a => [a] -> Maybe Int -> [a] -> Maybe Int
    horspool pat skip xs = f (lp - 1 + maybe 0 id skip) p' where
        (lp, lxs, p') = (length pat, length xs, reverse pat)
        t = fromList $ zip pat [lp - 1, lp - 2..]
        m = fromList $ zip [0..] xs
        f n []     = Just (n + 1)
        f n (p:ps) | n >= lxs   = Nothing
    	           | p == m ! n = f (n - 1) ps
                   | otherwise  = f (n + findWithDefault lp (m ! n) t) p'
    
  4. Vikas Tandi said

    Here is my code in C

    #include <limits.h>
    
    /* Boyer–Moore–Horspool algorithm */
    int string_index_3(char *s, int str_size, char *p, int pattern_size)
    {
    	int i, j;
    	int bad_char_shift[UCHAR_MAX+1];
    
    	/* sanity check */
    	if(!s || !p || pattern_size <= 0)
    		return -1;
    
    	/* initialize shift table */
    	for(i = 0; i <= UCHAR_MAX; i++)
    		bad_char_shift[i] = pattern_size;
    
    	/* build shift table */
    	for(i = pattern_size-2; i >= 0; i--)
    		if(bad_char_shift[p[i]] == pattern_size)
    			bad_char_shift[p[i]] = pattern_size - i - 1;
    
    	/* search pattern */
    	for(i = 0; i < str_size;)
    	{
    		/* check the substring from right to left */
    		for(j = pattern_size -1; p[j] == s[i+j]; j--)
    			if(j == 0)
    				return i+1;
    
    		/* move right using skip table */
    		i += bad_char_shift[s[i+pattern_size-1]];
    	}
    	return -1;
    }
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 615 other followers

%d bloggers like this: