Spell Checking

April 17, 2009

Spell checkers are ubiquitous. Word processors have spell checkers, as do browser-based e-mail clients. They all work the same way: a dictionary is stored in some data structure, then each word of input is submitted to a search in the data structure, and those that fail are flagged as spelling errors. There is a certain art to building the word list on which a spell checker is based; Exxon isn’t a word in anybody’s dictionary, but it is likely included in the word list, but cwm (a geological structure), which is certainly a word, is most likely omitted from the word list, on the grounds it is more likely an error than a legitimate spelling (unless you are a geologist).

There are many appropriate data structures to store the word list, including a sorted array accessed via binary search, a hash table, or a bloom filter. In this exercise you are challenged to store the word list character-by-character in a trie. Consider this sample trie that stores the words CAR, CART, CAT and DOG:

trie

To see that CAT is a valid spelling, follow the C branch, then the A branch, then the T branch, where you find a filled circle, indicating a word. To see that CAB is not a valid spelling, follow the C branch, then the A branch, and fail when there is no B branch.

Tries can be very fast; for instance, to see that QARTER (a misspelling of QUARTER) is not a word, follow the Q branch, then fail when there is no A branch. This is even faster than hashing, which must read all six letters of QARTER just to compute the hash value. And tries can also be space-efficient, since space is shared between common prefixes.

Your task is to build a spell checker based on tries. When you are finished, you can read or run a suggested solution, or post your solution or discuss the exercise in the comments below.

About these ads

Pages: 1 2

4 Responses to “Spell Checking”

  1. FalconNL said

    In Haskell:

    import qualified Data.ByteString.Char8 as B
    import Data.Trie
    
    main = do trie <- fmap (fromList . map (flip (,) ()) . B.lines) $ B.readFile "words.txt"
              print $ valid trie "qarter"
       
    valid trie = flip member trie . B.pack
    
  2. Eric H said

    Another in scheme:

  3. Vikas Tandi said

    implemented in c language:

    http://codepad.org/PmYVlSN6

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 635 other followers

%d bloggers like this: