Dodgson’s Doublets

March 20, 2009

We use make-hash, string-hash, read-line and string-downcase from the Standard Prelude. We assume an on-line wordlist, one word per line, stored at /usr/dict/words; if you don’t have a wordlist, Kevin Atkinson, who maintains the Gnu aspell program, provides several at http://wordlist.sourceforge.net/.

We keep the wordlist in a data structure with the primary task of retrieving, given a word, a list of all the words that are one-letter away. That structure is a graph, but instead of forming the graph by pre-computing the adjacent words of all the words in the wordlist, we employ a trick that’s part of the computing folklore: we keep a hash table of all the words and their corresponding star-words, cross-referenced in both directions, where a word’s star-words are all the words that can be made by replacing each of the word’s letters with a star. For a small wordlist that contains only the words cat, cot, dog and dot, we keep the following entries in the hash table:

*at   cat
*og   dog
*ot   cot dot
c*t   cat cot
ca*   cat
cat   *at c*t ca*
co*   cot
cot   *ot c*t co*
d*g   dog
d*t   dot
do*   dot dog
dog   *og d*g do*
dot   *ot d*t do*

Then to find adjacent words we append all the wordlists of all the star-words of the given word, eliminating the given word from the output. The add-word function puts a word and all its star-words in the hash table, and the adjacent-to function returns a list of the words one-letter away from a given word:

(define (add-word word)
  (define (adjoin key value)
    (words 'insert key (cons value (words 'lookup key))))
  (let ((len (string-length word)))
    (do ((i 0 (+ i 1))) ((= i len))
      (let ((star-word
             (string-append
               (substring word 0 i)
               (string #\*)
               (substring word (+ i 1) len))))
        (adjoin word star-word)
        (adjoin star-word word)))))

(define (adjacent-to word)
  (define (remove ws)
    (cond ((null? ws) ws)
          ((string=? (car ws) word) (cdr ws))
          (else (cons (car ws) (remove (cdr ws))))))
  (apply append
    (map (lambda (w)
           (remove (words 'lookup w)))
         (words 'lookup word))))

The word/star-word pairs are stored in a hash table called words, which is built from the wordlist in /usr/dict/words by the function build-words:

(define (build-words word-file)
  (with-input-from-file word-file
    (lambda ()
      (do ((word (read-line) (read-line)))
          ((eof-object? word))
        (add-word (string-downcase word))))))

(define words (make-hash string-hash string=? '() 999983))

(build-words "/usr/dict/words")

Building a minimal-length word ladder from a source word to a target word is done by breadth-first search through the graph stored in the words hash table; breadth-first search requires a queue, which is stored explicitly using the old trick of front and back lists, with new items consed onto the head of the back list, which is periodically reversed to become the front list:

(define (ladder source target)
  (let loop1 ((front (list (list source)))
              (back '()) (visited (list source)))
    ; (for-each display `(,front " " ,back " " ,visited #\newline))
    (cond ((null? front)
            (if (null? back) '()
              (loop1 (reverse back) '() visited)))
          ((string=? (caar front) target)
            (reverse (car front)))
          (else (let loop2 ((words (adjacent-to (caar front)))
                            (back back) (visited visited))
                  (cond ((null? words)
                          (loop1 (cdr front) back visited))
                        ((member (car words) visited)
                          (loop2 (cdr words) back visited))
                        (else (loop2 (cdr words)
                                     (cons (cons (car words) (car front)) back)
                                     (cons (car words) visited)))))))))

If the queue is ever exhausted, ladder reports failure by returning a null list; it reports success by returning the chain of words, which it stores in each element of the breadth-first queue.

The code, with a sample dictionary containing the words cat, cot, dog and dot, is available at http://programmingpraxis.codepad.org/HFMJWAe3. Debugging output shows the words stored in the hash table and the action of loop1 in the ladder function.

Pages: 1 2

7 Responses to “Dodgson’s Doublets”

  1. […] are a common data type, which we have used in several exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams). Hash tables are often used as the underlying implementation for dictionaries, and in a […]

  2. […] Dodgson’s Doublets Charles Dodgson was an English author, mathematician and logician of the nineteenth century; In 1879, Dodgson published the Doublets word game in the Vanity Fair magazine: […]

  3. […] we have another post from Programming Praxis’ Word Games. This time, it’s a puzzle inspired by Lewis Carroll […]

  4. JP said

    There weren’t as many solutions here as to most of the posts, so I decided to give it a try.

    What I ended up with is actually three solutions (all in Racket): First a naive brute force, depth first solution and then a slightly better version that uses a simple heuristic to check the more likely branches first. Finally, I went for a depth first solution that took *much* longer to run, but is guaranteed to find a shortest solution. For example, HEAD/TAIL gives us HEAD ↦ HEAL ↦ HEIL ↦ HAIL ↦ TAIL which is actually shorter than Dodgson’s given solution.

    The best solution I’ve found for BLACK/WHITE is BLACK ↦ CLACK ↦ CLICK ↦ CHICK ↦ CHINK ↦ CHINE ↦ WHINE ↦ WHITE.

    You can see my code here: Dodgson’s Doublet

  5. Lucas A. Brown said
    #! /usr/bin/env python
    
    # 1. head and tail must be of the same length.
    # 2. Both must be in words.
    # 3. words must be a list of words of the same length as head and tail.
    # 4. We will pretend that the contents of dontuse are not in words.
    def dodgson(head, tail, words, dontuse=[]):
        l = len(head)
        if sum(head[i] != tail[i] for i in xrange(l)) == 1: return [head, tail]
        heads = []
        for w in words:
            if sum(head[i] != w[i] for i in xrange(l)) == 1 and w not in dontuse: heads.append(w)
        if heads == []: return None
        for h in heads: # This loop isn't necessary but on the average results in a huge speedup and significantly shorter ladders.
            if sum(h[i] != tail[i] for i in xrange(l)) == 1: return [head, h, tail]
        for h in heads:
            d = dodgson(h, tail, words, dontuse + [head])
            if not d is None: return [head] + d
        return None
    
    if __name__ == "__main__":
        from sys import argv
        from random import shuffle
        if len(argv) == 4: g = open(argv[3], "r")
        else: g = open("/etc/dictionaries-common/words", "r")
        f = g.read()
        g.close()
        w, words = [], []
        l = len(argv[1])
        for w in f.split():
            if all(l in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" for l in w) and len(w) == l: words.append(w.lower())
        lengths = {}
        for _ in xrange(100000):
            if _ % 100 == 0: print _
            shuffle(words)
            ladder = dodgson(argv[1].lower(), argv[2].lower(), words)
            l = len(ladder)
            if l in lengths: lengths[l] += 1
            else: lengths[l] = 1
            if l <= 6: print ladder
        x = [l for l in lengths]
        x.sort()
        for l in x: print l, lengths[l]
    
  6. […] studied word chains in a previous exercise; for instance, you can convert HEAD to TAIL by the word chain HEAD, HEAL, TEAL, TELL, TALL, TAIL in […]

Leave a comment