String Search: Rabin-Karp
September 1, 2009
Our function is quite straight forward:
(define (rk-search pat str . s)
(define (hash s)
(fold-left (lambda (x y) (+ (* 256 x) y)) 0
(map char->integer (string->list s))))
(let* ((q 1073741789)
(plen (string-length pat))
(slen (string-length str))
(h (modulo (expt 256 (- plen 1)) q))
(phash (modulo (hash pat) q))
(s (if (null? s) 0 (car s)))
(shash (modulo (hash (substring str s (+ s plen))) q)))
(let loop ((s s) (shash shash))
(cond ((and (= phash shash)
(string=? pat (substring str s (+ s plen)))) s)
((<= (- slen plen) s) #f)
(else (loop (+ s 1)
(modulo (+ (* 256 (- shash (* h (char->integer (string-ref str s)))))
(char->integer (string-ref str (+ s plen)))) q)))))))
Rabin-Karp is O(n) to search sequentially through the string, and O(m) to confirm each match. If each sub-string generates a false match, the total time will be O(mn), which is worse than the other string-searching algorithms. But Rabin-Karp excels when multiple strings are being sought; put all the hashes in a set or bloom filter, then look for them all in parallel (the details get a little tricky if the patterns can all have varying lengths).
Rk-search
uses fold-left from the Standard Prelude. You can see the Rabin-Karp searcher in action, including a run of the standard test suite, at http://programmingpraxis.codepad.org/ybaO1twV.
[…] Praxis – String Search: Rabin-Karp By Remco Niemeijer Today’s Programming Praxis problem marks the end of the string search series. In this final entry, we have […]
My Haskell solution (see http://bonsaicode.wordpress.com/2009/09/01/programming-praxis-string-search-rabin-karp/ for a version with comments):
[…] […]
my implementation in C