Spell Checking
April 17, 2009
Our word list will be stored in a trie that provides t-null
for the empty trie, t-look
to determine if word is in the trie, and t-bind
to bind a new word to the trie. The branches of the trie are stored in association lists, but instead of the standard Scheme a-lists we use our own, in which characters are stored in sorted order. All comparisons are case-insensitive. First the a-list functions:
(define a-null '())
(define (a-look c a)
(cond ((null? a) #f)
((char-ci<? c (caar a)) #f)
((char-ci<? (caar a) c) (a-look c (cdr a)))
(else (car a))))
(define (a-bind c x a)
(cond ((null? a) (list (cons c x)))
((char-ci<? c (caar a)) (cons (cons c x) a))
((char-ci<? (caar a) c) (cons (car a) (a-bind c x (cdr a))))
(else (cons (cons c x) (cdr a)))))
A trie is a pair consisting of a value and an a-list of successors. The value is boxed in a list, so '()
indicates no-value and '(x)
indicates the value x
. The null trie conses the null list to the null a-list:
(define t-null (cons '() a-null))
Trie lookup goes one character at a time. There are two ways for lookup to fail: if the end of the key finds a '()
value, or if an intermediate node does not exist:
(define (t-look ks t)
(if (null? ks)
(if (pair? (car t)) (caar t) #f)
(let ((x (a-look (car ks) (cdr t))))
(if x (t-look (cdr ks) (cdr x))#f))))
Adding or replacing a value in a trie is similar to lookup, except that it never fails; if necessary, the trie is extended with a null node:
(define (t-bind ks x t)
(if (null? ks) (cons (list x) (cdr t))
(let* ((a1 (a-look (car ks) (cdr t)))
(t2 (t-bind (cdr ks) x (if (pair? a1) (cdr a1) t-null))))
(cons (car t) (a-bind (car ks) t2 (cdr t))))))
Make-dict
reads a word list, one word per line, binding each word to #t
; read-line
comes from the Standard Prelude:
(define (make-dict filename)
(with-input-from-file filename
(lambda ()
(let loop ((word (read-line)) (dict t-null))
(if (eof-object? word) dict
(loop (read-line) (t-bind (string->list word) #t dict)))))))
The dictionary is loaded from the standard word list:
(define dict (make-dict "/usr/dict/words"))
Spell-checking a word is done by submitting the key to the dictionary:
(define (spell word)
(t-look (string->list word) dict))
Programming is spelled correctly, but my dictionary thinks praxis is somehow akin to cwm:
> (spell "Programming")
#t
> (spell "Praxis")
#f
You can see the program at http://codepad.org/fK2CUpEQ.
In Haskell:
import qualified Data.ByteString.Char8 as B
import Data.Trie
main = do trie <- fmap (fromList . map (flip (,) ()) . B.lines) $ B.readFile "words.txt" print $ valid trie "qarter" valid trie = flip member trie . B.pack [/sourcecode]
In scheme :
http://codepad.org/aHBN2HbM
Another in scheme:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
trie-tests.ss
hosted with ❤ by GitHub
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
trie.ss
hosted with ❤ by GitHub
implemented in c language:
http://codepad.org/PmYVlSN6
In Rust:
https://gist.github.com/ftt/1e22d9f7873eb955455b11bfee503cf4.js
Sorry, bad link above :( https://gist.github.com/ftt/1e22d9f7873eb955455b11bfee503cf4