Spell Checking
April 17, 2009
Our word list will be stored in a trie that provides t-null
for the empty trie, t-look
to determine if word is in the trie, and t-bind
to bind a new word to the trie. The branches of the trie are stored in association lists, but instead of the standard Scheme a-lists we use our own, in which characters are stored in sorted order. All comparisons are case-insensitive. First the a-list functions:
(define a-null '())
(define (a-look c a)
(cond ((null? a) #f)
((char-ci<? c (caar a)) #f)
((char-ci<? (caar a) c) (a-look c (cdr a)))
(else (car a))))
(define (a-bind c x a)
(cond ((null? a) (list (cons c x)))
((char-ci<? c (caar a)) (cons (cons c x) a))
((char-ci<? (caar a) c) (cons (car a) (a-bind c x (cdr a))))
(else (cons (cons c x) (cdr a)))))
A trie is a pair consisting of a value and an a-list of successors. The value is boxed in a list, so '()
indicates no-value and '(x)
indicates the value x
. The null trie conses the null list to the null a-list:
(define t-null (cons '() a-null))
Trie lookup goes one character at a time. There are two ways for lookup to fail: if the end of the key finds a '()
value, or if an intermediate node does not exist:
(define (t-look ks t)
(if (null? ks)
(if (pair? (car t)) (caar t) #f)
(let ((x (a-look (car ks) (cdr t))))
(if x (t-look (cdr ks) (cdr x))#f))))
Adding or replacing a value in a trie is similar to lookup, except that it never fails; if necessary, the trie is extended with a null node:
(define (t-bind ks x t)
(if (null? ks) (cons (list x) (cdr t))
(let* ((a1 (a-look (car ks) (cdr t)))
(t2 (t-bind (cdr ks) x (if (pair? a1) (cdr a1) t-null))))
(cons (car t) (a-bind (car ks) t2 (cdr t))))))
Make-dict
reads a word list, one word per line, binding each word to #t
; read-line
comes from the Standard Prelude:
(define (make-dict filename)
(with-input-from-file filename
(lambda ()
(let loop ((word (read-line)) (dict t-null))
(if (eof-object? word) dict
(loop (read-line) (t-bind (string->list word) #t dict)))))))
The dictionary is loaded from the standard word list:
(define dict (make-dict "/usr/dict/words"))
Spell-checking a word is done by submitting the key to the dictionary:
(define (spell word)
(t-look (string->list word) dict))
Programming is spelled correctly, but my dictionary thinks praxis is somehow akin to cwm:
> (spell "Programming")
#t
> (spell "Praxis")
#f
You can see the program at http://codepad.org/fK2CUpEQ.
In Haskell:
import qualified Data.ByteString.Char8 as B
import Data.Trie
main = do trie <- fmap (fromList . map (flip (,) ()) . B.lines) $ B.readFile "words.txt" print $ valid trie "qarter" valid trie = flip member trie . B.pack [/sourcecode]
In scheme :
http://codepad.org/aHBN2HbM
Another in scheme:
trie-tests.ss
hosted with ❤ by GitHub
trie.ss
hosted with ❤ by GitHub
implemented in c language:
http://codepad.org/PmYVlSN6
In Rust:
https://gist.github.com/ftt/1e22d9f7873eb955455b11bfee503cf4.js
Sorry, bad link above :( https://gist.github.com/ftt/1e22d9f7873eb955455b11bfee503cf4