Mark V. Shaney

February 27, 2009

The central data structure of the markov chain algorithm is a state table that shows, for any given state, all possible next states. We keep a hash table indexed by the two prior words with a list of possible following words:

(define state-tab (make-hash string-hash string=? '() 4093))

R6RS Scheme provides hash tables, but R5RS Scheme doesn’t; here are portable make-hash and string-hash procedures:

(define (make-hash hash eql? oops size)
  (define (lookup x xs)
    (cond ((null? xs) oops)
          ((eql? x (caar xs)) (cdar xs))
          (else (lookup x (cdr xs)))))
  (let ((table (make-vector size '())))
    (lambda (message key . val)
      (let* ((index (modulo (hash key) size))
             (alist (vector-ref table index)))
        (case message
          ((show) (do ((i 0 (+ i 1))) ((= i size))
                    (display i) (display " ")
                    (display (vector-ref table i))
                    (newline)))
          ((insert!) (vector-set! table index
            (cons (cons key (car val)) alist)))
          ((lookup) (lookup key alist)))))))

(define (string-hash str)
  (let loop ((cs (string->list str)) (s 0))
    (if (null? cs) s
      (loop (cdr cs) (+ (* s 31)
        (char->integer (car cs)))))))

We build the state table by reading the input a word at a time, adding a new entry to the state table for each input word. Read-word includes punctuation, so that the output will have some sentence structure. Make-key concatenates the two prior words with a tab character between them. The build process is initialized by two non-words, and terminated with another non-word.

(define (read-word)
  (let loop ((c (read-char)) (cs '()))
    (cond ((eof-object? c)
            (if (null? cs) c
              (list->string (reverse cs))))
          ((not (char-whitespace? c))
            (loop (read-char) (cons c cs)))
          ((pair? cs) (list->string (reverse cs)))
          (else (loop (read-char) cs)))))

(define (make-key w1 w2)
  (string-append w1 (string #\tab) w2))

(define (build-state-tab filename)
  (with-input-from-file filename
    (lambda ()
      (let loop ((w1 "") (w2 "") (key (make-key "" "")) (w3 (read-word)))
        (if (eof-object? w3)
            (state-tab 'insert! key
              (cons "" (state-tab 'lookup key)))
            (begin (state-tab 'insert! key
                     (cons w3 (state-tab 'lookup key)))
                   (loop w2 w3 (make-key w2 w3) (read-word))))))))

The usual euphemism for generating output is a “random walk” through the markov chain. The walk procedure is initialized with two non-words, so the first two words of output will be the same as the first two words of the training text. But after that, the rand-item procedure will select one of the possibilities at random.

(define rand
  (let ((a 69069) (c 5)
        (m (expt 2 32))
        (seed 17070415))
    (lambda s
      (set! seed (modulo
        (if (pair? s)
            (numerator (inexact->exact (car s)))
            (+ (* a seed) c)) m))
      (/ seed m))))

(define (rand-item xs)
  (let loop ((n 2) (x (car xs)) (xs (cdr xs)))
    (cond ((null? xs) x)
          ((< (rand) (/ n))
            (loop (+ n 1) (car xs) (cdr xs)))
          (else (loop (+ n 1) x (cdr xs))))))

(define (walk n)
  (let loop ((w1 "") (w2 "") (n n) (ws '()))
    (let* ((key (make-key w1 w2))
           (w3 (rand-item (state-tab 'lookup key))))
      (if (or (string=? w3 "") (zero? n))
          (reverse ws)
          (loop w2 w3 (- n 1) (cons w3 ws))))))

Output from walk is a list of words, which are gathered into sixty-character lines by fmt:

(define (fmt ws)
  (let loop ((n 60) (ws ws))
    (cond ((null? ws) (newline))
          ((< n (string-length (car ws)))
            (newline) (loop 60 ws))
          (else (display (car ws)) (display " ")
                (loop (- n (string-length (car ws)) 1) (cdr ws))))))

The main procedure is called shaney:

(define (shaney file-name n)
  (build-state-tab file-name)
  (fmt (walk n)))

Our solution is derived from the third chapter of The Practice of Programming by Brian W. Kernighan and Rob Pike. As they did, we test by training on the Book of Psalms, available from Project Gutenberg:

> (shaney "psalms.txt" 2000)
Happy those whose way is unerring; the LORD'S holy splendor!
The voice of the Most High, to God Most High, his decrees I
did not cast me aside in my distress I called on God's name;
they called on the day and night, as they traced their long
furrows. But the poor may hear and be glad. Magnify the LORD
forever! Amen and amen. The end of the Most High, Who
humbled their hearts are steadfast, trusting the LORD. God
will rescue you, and you took away the guilt of your favor:
make my home. Even at night I meditate in my paths, In a
moment I would have betrayed your people. Though I am
prompt, I do not know. They repay me evil for good, hatred
for my strength are my God. Grant me justice, LORD!

A. K. Dewdney wrote about Shaney in his “Computer Recreations” column in the June 1989 issue of Scientific American, and Google preserves Mark V. Shaney’s posts on Usenet. The complete code, but no example, is given at http://programmingpraxis.codepad.org/YREG9ZSu.

About these ads

Pages: 1 2

8 Responses to “Mark V. Shaney”

  1. Roger said

    That’s really interesting. It gives surprisingly good results. I have sume problems to running your code, though. Some of the brackets seem to be wrong, but even after correcting them the code does not run in Plt Scheme. Which version of Scheme did you use?

  2. programmingpraxis said

    Sorry. It’s fixed now, including a link to codepad.org. Thanks.

  3. [...] 26, 2009 Dictionaries are a common data type, which we have used in several exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams). Hash tables are often used as the [...]

  4. Interesting and funny.

    import qualified Data.Map as M
    import Data.List
    import Control.Applicative ((<$>))
    import System.Random
    import System.Environment (getArgs)
    
    type WordMap = M.Map (String, String) [String]
    
    -- Utilités
    oneOf    :: [a] -> IO a
    oneOf xs = do
      index <- randomRIO (0, limit)
      return (xs !! index)
     where limit = length xs - 1
    
    takeByN       :: Int -> Int -> [a] -> [[a]]
    takeByN _ _ [] = []
    takeByN n step xs = take n xs : takeByN n step (drop step xs)
    
    
    buildWordTable      :: String -> WordMap
    buildWordTable text = foldl' insertTriple M.empty mots
        where mots = takeByN 3 1 $ words text
              insertTriple m [a, b, c] = M.insertWith' (++) (a,b) [c] m
              insertTriple m _         = m
    
    buildMapFromFile   :: FilePath -> IO WordMap
    buildMapFromFile f = do
      content <- readFile f
      return $ buildWordTable content
    
    -- The random walk through the words
    walk         :: (String, String) -> WordMap -> IO [String]
    walk key@(a,b) m = do
      case M.lookup key m of
        Nothing -> return []
        Just ws -> do
                choice <- oneOf ws
                rest <- walk (b, choice) m
                return $ choice : rest
    
    shaney     :: Int -> FilePath -> IO String
    shaney n f = do
      m <- buildMapFromFile f
      [a,b] <- take 2 . words <$>  readFile f
      randomWords <- walk (a,b) m
      return (format $ a:b: take (n-2) randomWords)
          where format ws = unlines $ map unwords (takeByN 10 10 ws)
    
    main = do
      args <- getArgs
      case args of
        [n, f] -> putStrLn =<< shaney (read n) f
        _ -> error "Deux arguments requis."
    
    
  5. mshacker12 said

    I have found what appears to be a markov-cypher(D) post in the weridist place – on a complaints fourm!!!

    This might be the work of shaney:

    http://www.complaints.com/2007/march/29/radiation_90444.htm

    ————————————————————–

    TELL ME WHAT YOU THINK OF THAT??!

    ODD RIGHT?????

  6. This has to be the most fun programming exercise I have ever done.

    The best thing my program said was “Your monkeys ass is in my daughters orange juice.”

    I’m working on some improvements, it should hopefully be finished tomorrow!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 574 other followers

%d bloggers like this: