Spell Checking

April 17, 2009

Our word list will be stored in a trie that provides t-null for the empty trie, t-look to determine if word is in the trie, and t-bind to bind a new word to the trie. The branches of the trie are stored in association lists, but instead of the standard Scheme a-lists we use our own, in which characters are stored in sorted order. All comparisons are case-insensitive. First the a-list functions:

(define a-null '())

(define (a-look c a)
  (cond ((null? a) #f)
        ((char-ci<? c (caar a)) #f)
        ((char-ci<? (caar a) c) (a-look c (cdr a)))
        (else (car a))))

(define (a-bind c x a)
  (cond ((null? a) (list (cons c x)))
        ((char-ci<? c (caar a)) (cons (cons c x) a))
        ((char-ci<? (caar a) c) (cons (car a) (a-bind c x (cdr a))))
        (else (cons (cons c x) (cdr a)))))

A trie is a pair consisting of a value and an a-list of successors. The value is boxed in a list, so '() indicates no-value and '(x) indicates the value x. The null trie conses the null list to the null a-list:

(define t-null (cons '() a-null))

Trie lookup goes one character at a time. There are two ways for lookup to fail: if the end of the key finds a '() value, or if an intermediate node does not exist:

(define (t-look ks t)
  (if (null? ks)
      (if (pair? (car t)) (caar t) #f)
      (let ((x (a-look (car ks) (cdr t))))
        (if x (t-look (cdr ks) (cdr x))#f))))

Adding or replacing a value in a trie is similar to lookup, except that it never fails; if necessary, the trie is extended with a null node:

(define (t-bind ks x t)
  (if (null? ks) (cons (list x) (cdr t))
    (let* ((a1 (a-look (car ks) (cdr t)))
           (t2 (t-bind (cdr ks) x (if (pair? a1) (cdr a1) t-null))))
      (cons (car t) (a-bind (car ks) t2 (cdr t))))))

Make-dict reads a word list, one word per line, binding each word to #t; read-line comes from the Standard Prelude:

(define (make-dict filename)
  (with-input-from-file filename
    (lambda ()
      (let loop ((word (read-line)) (dict t-null))
        (if (eof-object? word) dict
          (loop (read-line) (t-bind (string->list word) #t dict)))))))

The dictionary is loaded from the standard word list:

(define dict (make-dict "/usr/dict/words"))

Spell-checking a word is done by submitting the key to the dictionary:

(define (spell word)
  (t-look (string->list word) dict))

Programming is spelled correctly, but my dictionary thinks praxis is somehow akin to cwm:

> (spell "Programming")
> (spell "Praxis")

You can see the program at http://codepad.org/fK2CUpEQ.

Pages: 1 2

6 Responses to “Spell Checking”

  1. FalconNL said

    In Haskell:

    import qualified Data.ByteString.Char8 as B
    import Data.Trie

    main = do trie <- fmap (fromList . map (flip (,) ()) . B.lines) $ B.readFile "words.txt" print $ valid trie "qarter" valid trie = flip member trie . B.pack [/sourcecode]

  2. Eric H said

    Another in scheme:

  3. Vikas Tandi said

    implemented in c language:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: