Anagram Phrases

January 27, 2012

The key to this exercise is the minus function that removes from a list of characters xs all the characters in a list of characters ys, or returns #f if there is a character in ys that is not in xs in sufficient multiplicity:

(define (minus xs ys)
  (let loop ((xs xs) (ys ys) (zs (list)))
    (cond ((null? ys) (append (reverse zs) xs))
          ((null? xs) #f)
          ((char<? (car xs) (car ys))
            (loop (cdr xs) ys (cons (car xs) zs)))
          ((char<? (car ys) (car xs)) #f)
          (else (loop (cdr xs) (cdr ys) zs)))))

With that, we can write a program that repeatedly subtracts candidate words from the diminishing target list until the target list is empty. Instead of backtracking, we use Wadler's "list of successes" technique which appends a new solution, if one is found, or appends a null list, which is ignored in the final output, if a path leads to no solution:

(define (anagrams str words)
  (let* ((cs (sort charlist str)))))
         (ws (relevant cs words)))
    (map (lambda (ws) (sort string<? ws)) (reverse
      (let anagrams ((cs cs) (ws ws))
        (define (cons-x x) (lambda (xs) (cons x xs)))
        (cond ((null? cs) (list (list)))
              ((null? ws) #f)
              ((equal? cs (cdar ws))
                (list (list (caar ws))))
              (else (append
                (let ((tss (anagrams cs (cdr ws))))
                  (if tss tss (list)))
                (let ((ts (minus cs (cdar ws))))
                  (if ts (let ((tss (anagrams ts (cdr ws))))
                           (if tss (map (cons-x (caar ws)) tss) (list)))
                      (list)))))))))))

The fun happens in the named-let anagrams. The first three cond clauses handle the recursive bases. The else clause appends the anagrams that include the current word (the second let clause, which has to handle specially the case where there are no anagrams) to the anagrams that exclude the current word (the first let clause).

This program incorporates an optimization: The original dictionary is condensed to only those relevant words that have all their letters in the target phrase. This optimization is significant because the list of relevant words is frequently dozens of times smaller than the original word list. The function relevant computes the condensed list:

(define (relevant cs words)
  (filter (lambda (w) (minus cs (cdr w))) words))

The dictionary consists of a list of entries of the form ("word" #\d #\o #\r #\w) containing the string form of a word in its car and the individual letters of the word, sorted in ascending order, in its car. We use map-input, filter-input and read-line from the Standard Prelude to generate the list of words:

(define (read-words dict)
  (define (valid? word)
    (and (< 2 (string-length word))
         (all? (and char-alphabetic? char-lower-case?)
               (string->list word))))
  (define (proc word)
    (cons word (sort char<? (string->list word))))
  (map-input (filter-input read-line valid?) proc dict))

We use the Moby list of common English words:

> (define words (read-words "moby.common"))

Here’s an example:

> (anagrams "Programming Praxis" words)
("aga" "grim" "nix" "prop" "rms")
... 10997 lines elided ...
("primp" "ragman" "rig" "sox"")

You can run the program at http://programmingpraxis.codepad.org/CBWq8vut.

Pages: 1 2

2 Responses to “Anagram Phrases”

  1. […] exercise taken from programmingpraxis is to generate all anagrams from a phrase, e.g “hey man” -> “may hen” […]

Leave a comment