The Daily Cryptogram

July 14, 2009

We represent the key as a string of twenty-six upper-case letters; the first letter is the cipher-text representation of the plain-text letter A, the second letter is the cipher-text representation of the plain-text letter B, and so on, until the twenty-sixth letter is the cipher-text representation of the plain-text letter Z. Unknown letters are represented by the underscore character \#_, so the null key is

(define null-key (make-string 26 #\_))

Instead of specifying a full twenty-six letter key, it is common to give the key as a word or phrase, such as THE EAGLE HAS LANDED. That pass-phrase can be made into a key by writing out the unique letters in the phrase, THEAGLSND, followed by the remaining letters of the alphabet. But that approach has a weakness that can be easily exploited by a cryptanalyst: the last several letters of the alphabet will likely represent themselves. So another approach is to continue the alphabet at the next letter after the end of the key phrase, which in the example is F (the first unused letter after D), followed by IJKMOPQRUVWXYZ, followed by those letters that are still unused, BC. The make-key function converts a key phrase to a key; it can build the key in either of the two styles shown above just by commenting or uncommenting a single line of code:

(define (make-key pass-phrase)
  (define (c->z c)
    (map integer->char
      (range (char->integer c) 91)))
  (define (add-if xs ys)
    (let loop ((xs xs) (ys ys))
      (cond ((null? xs) ys)
            ((not (char-alphabetic? (car xs))) (loop (cdr xs) ys))
            ((member (car xs) ys) (loop (cdr xs) ys))
            (else (loop (cdr xs) (cons (car xs) ys))))))
  (let* ((ks (add-if (map char-upcase (string->list pass-phrase)) '()))
         ; uncomment next line for stronger key
         ; (ks (add-if (c->z (car ks)) ks))
         (ks (add-if (c->z #\A) ks)))
    (list->string (reverse ks))))

Now that we can create keys, the next step is to encipher and decipher messages. Functions p->c and c->p convert single letters between plain-text and cipher-text; p->c simply indexes into the key, but c->p must examine each letter of the key until it finds the requested letter:

(define (p->c key p)
  (if (not (char-alphabetic? p)) p
    (string-ref key (- (char->integer (char-upcase p)) 65))))

(define (c->p key c)
  (if (not (char-alphabetic? c)) c
    (let loop ((c-index 25))
      (cond ((negative? c-index) #\_)
            ((char=? (string-ref key c-index) c)
              (integer->char (+ c-index 65)))
            (else (loop (- c-index 1)))))))

Encipher and decipher call p->c and c->p to operate on strings:

(define (encipher key plain-text)
  (list->string
    (map (left-section p->c key)
      (string->list plain-text))))

(define (decipher key cipher-text)
  (list->string
    (map (left-section c->p key)
      (string->list cipher-text))))

As an example, the plain-text THAT’S ONE SMALL STEP FOR MAN, ONE GIANT LEAP FOR MANKIND is enciphered using the key THE EAGLE HAS LANDED as

> (encipher (make-key "THE EAGLE HAS LANDED")
    "THAT'S ONE SMALL STEP FOR MAN, ONE GIANT LEAP FOR MANKIND")
"VNTV'U OMG UKTJJ UVGP LOR KTM, OMG SDTMV JGTP LOR KTMIDMA"

We are finally ready to begin cryptanalysis. The normal tool for cryptanalysis of substitution ciphers is frequency analysis of single letters or of small groups of letters, especially digrams and trigrams. (freq text) performs frequency analysis of single letters, and (freq n text) does the same for groups of n letters:

(define (freq . args)
  (let* ((bite (if (= (length args) 1) 1 (car args)))
         (text (if (= (length args) 1) (car args) (cadr args))))
    (let loop ((ts (string->list text)) (fs '()))
      (if (< (length ts) bite)
          (sort (lambda (a b)
                  (cond ((< (cdr b) (cdr a)) #t)
                        ((< (cdr a) (cdr b)) #f)
                        (else (string<? (car a) (car b)))))
            (uniq-c string=? (sort string<? fs)))
          (loop (cdr ts) (cons (list->string (take bite ts)) fs))))))

Frequency analysis of single letters works fairly well on the unknown text, showing a clear winner; we can associate the cipher-text A with the plain-text E, and the cipher-text Y, E and P are most likely the plain-text T, A and O, in some order:

> (freq "P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC")
((" " . 14) ("A" . 12) ("Y" . 8) ("E" . 7) ("P" . 6)
 ("G" . 5) ("W" . 5) ("C" . 4) ("F" . 3) ("L" . 3) ("M" . 3)
 ("O" . 3) ("U" . 3) ("V" . 3) ("," . 2) ("D" . 2) ("H" . 2)
 ("N" . 2) ("X" . 2) ("Z" . 2) ("\n" . 1) ("I" . 1) ("J" . 1)
 ("S" . 1) ("T" . 1))

Frequency analysis is helpful for manually solving monoalphabetic ciphers, but because the daily cryptogram preserves word length and punctuation, including intra-word apostrophes, a better approach is a dictionary attack. For instance, regardless of the key, the cipher-text OYUUAOEXYW only corresponds to two dictionary words, COLLECTION and CORRECTION. And, if the cipher-text X corresponds to plain-text I, then the first word of the cryptogram, P, must be A. The third word, YM, starts with O, and can’t be ON, which suggests that it is either OF or OR. And so on.

To perform a dictionary attack, we write a function dict that takes a partial key, a cipher word, and a list of dictionary words and returns a list of dictionary words that could possibly match the cipher word. For instance, we want (dict null-key "OYUUAOEXYW") to return the list ("COLLECTION" "CORRECTION"). Here is dict:

(define (dict key cipher-text)
  (let* ((cs (map char-upcase (string->list cipher-text)))
         (len (length cs)))
    (let loop ((words words) (ws '()))
      (cond ((null? words) (reverse ws))
            ((not (= (string-length (car words)) len))
              (loop (cdr words) ws))
            ((match? key cs (string->list (car words)))
              (loop (cdr words) (cons (car words) ws)))
            (else (loop (cdr words) ws))))))

Dict is simple, just looping through the dictionary words reporting all the matches it finds. The real work is done by match?. XXX DESCRIBE THIS XXX

(define (match? key cs ws)
  (let loop ((key key) (cs cs) (ws ws))
    (cond ((null? cs) #t)
          ((and (not (char-alphabetic? (car cs)))
                (not (char-alphabetic? (car ws)))
                (char=? (car cs) (car ws)))
            (loop key (cdr cs) (cdr ws)))
          ((or (not (char-alphabetic? (car cs)))
               (not (char-alphabetic? (car ws)))) #f)
          ((char=? (c->p key (car cs)) (car ws))
            (loop key (cdr cs) (cdr ws)))
          ((char=? (p->c key (car ws)) (car cs))
            (loop key (cdr cs) (cdr ws)))
          ((and (char=? (c->p key (car cs)) #\_)
                (char=? (p->c key (car ws)) #\_))
            (loop (add-key key (list (cons (car ws) (car cs)))) (cdr cs) (cdr ws)))
          (else #f))))

Of course, we also need a dictionary, which we store in the global variable words:

(define words
  (map string-upcase
    (sort string<?
      (unique string=?
        (fold-input read-line (lambda (d a) (cons a d)) '()
          "/usr/dict/words")))))

Now the rest of the program is straight forward, though somewhat long and tedious. String->words returns a list of words in a cryptogram, removing the punctuation and carefully preserving intra-word apostrophes. Enhance-key adds to an existing key the new cipher/plain pairings found in a tentatively-decoded word. Solve loops over a list of words, at each word keeping a list of all possible keys; it calls next, which returns the next possible set of keys given a single cipher-text word, and nexts, which calls next to return the next possible set of keys for all the cipher-text words in a list. Sort-by-counts is a heuristic that orders a list of words by the number of possibilities to which each word can be deciphered, which minimizes the size of the cross-products formed. Cryptogram is the top-level function that wraps the others; it returns a list of possible decipherments of a cipher-text.

(define (string->words str)
  (let loop ((cs (string->list str)) (w '()) (ws '()))
    (cond ((null? cs)
            (reverse
              (if (null? w) ws
                (cons (list->string (reverse w)) ws))))
          ((or (char-alphabetic? (car cs)) (char=? (car cs) #\'))
            (loop (cdr cs) (cons (car cs) w) ws))
          ((pair? w) (loop (cdr cs) '() (cons (list->string (reverse w)) ws)))
          (else (loop (cdr cs) w ws)))))

(define (enhance-key key cipher-text plain-text)
  (let ((key (string-copy key))
        (cipher-text (string-upcase cipher-text))
        (plain-text (string-upcase plain-text)))
    (do ((i 0 (+ i 1))) ((= i (string-length plain-text)))
      (if (char-alphabetic? (string-ref plain-text i))
          (string-set! key (- (char->integer (string-ref plain-text i)) 65)
            (string-ref cipher-text i))))
    key))

(define (solve cs)
  (define (next k c)
    (map (lambda (w) (enhance-key k c w)) (dict k c)))
  (define (nexts ks c)
    (apply append (map (lambda (k) (next k c)) ks)))
  (let loop ((cs cs) (ks (list null-key)))
    ; (display cs) (display (length ks)) (newline)
    (if (null? cs) ks
      (loop (cdr cs) (nexts ks (car cs))))))

(define (sort-by-counts words)
  (map car (sort (lambda (x y) (< (cdr x) (cdr y)))
    (map (lambda (w) (cons w (length (dict null-key w)))) words))))

(define (cryptogram cipher)
  (unique string=? (sort string<?
    (map (lambda (k) (decipher k cipher))
      (solve (sort-by-counts (string->words cipher)))))))

Here you can see the cryptogram solver in action:

> (cryptogram "P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC")
("A COLLECTION OF ETUDES, UPDATED WEEKLY, FOR THE EDUCATION AND ENJOYMENT OF THE SAVVY PROGRAMMER")

The pass-phrase was PROGRAMMING PRAXIS.

Cryptogram doesn’t always work. It will never find a word that is not in the dictionary. It forms the cross-product of all possible keys for all the words in the cryptogram, and if that is too large, or if the heuristic is ineffective, or if early words don’t significantly reduce the keyspace, it will exhaust memory, leading to failure. And some messages offer more than one solution; for instance, you’ll have to figure out for yourself which of these is correct:

> (cryptogram "VNTV'U OMG UKTJJ UVGP LOR KTM, OMG SDTMV JGTP LOR KTMIDMA")
("THAT'S ONE SMALL STEP BOW MAN, ONE GIANT LEAP BOW MANKIND"
  "THAT'S ONE SMALL STEP BOX MAN, ONE GIANT LEAP BOX MANKIND"
  "THAT'S ONE SMALL STEP BOY MAN, ONE GIANT LEAP BOY MANKIND"
  "THAT'S ONE SMALL STEP COB MAN, ONE GIANT LEAP COB MANKIND"
  "THAT'S ONE SMALL STEP COW MAN, ONE GIANT LEAP COW MANKIND"
  "THAT'S ONE SMALL STEP COX MAN, ONE GIANT LEAP COX MANKIND"
  "THAT'S ONE SMALL STEP COY MAN, ONE GIANT LEAP COY MANKIND"
  "THAT'S ONE SMALL STEP FOB MAN, ONE GIANT LEAP FOB MANKIND"
  "THAT'S ONE SMALL STEP FOR MAN, ONE GIANT LEAP FOR MANKIND"
  "THAT'S ONE SMALL STEP FOX MAN, ONE GIANT LEAP FOX MANKIND"
  "THAT'S ONE SMALL STEP JOB MAN, ONE GIANT LEAP JOB MANKIND"
  "THAT'S ONE SMALL STEP JOY MAN, ONE GIANT LEAP JOY MANKIND"
  "THAT'S ONE SMALL STEP ROB MAN, ONE GIANT LEAP ROB MANKIND"
  "THAT'S ONE SMALL STEP ROW MAN, ONE GIANT LEAP ROW MANKIND"
  "THAT'S ONE SMALL STEP ROY MAN, ONE GIANT LEAP ROY MANKIND"
  "THAT'S ONE SMALL STEP VOW MAN, ONE GIANT LEAP VOW MANKIND"
  "THAT'S ONE SMALL STEP YOU MAN, ONE GIANT LEAP YOU MANKIND")

We relied strongly on the Standard Prelude: take, range, string-upcase, read-line, fold-input, sort, unique, uniq-c and left-section. The code is collected at http://programmingpraxis.codepad.org/GBXcEPTy, but you can’t run it, since there is no dictionary.

About these ads

Pages: 1 2

4 Responses to “The Daily Cryptogram”

  1. [...] Praxis – The Daily Cryptogram By Remco Niemeijer Today‚Äôs Programming Praxis problem is an interesting one: we have to write a program to solve [...]

  2. Remco Niemeijer said

    First a small notification: I’ll be on vacation the next 4 weeks, so I won’t be able to post any solutions in that period. I should be back for puzzle 59.

    My Haskell solution (see http://bonsaicode.wordpress.com/2009/07/14/programming-praxis-the-daily-cryptogram/ for a version with comments):

    import Data.Char
    import Data.List
    import qualified Data.Map as M
    import GHC.Exts
    
    match :: M.Map Char String -> String -> String -> Bool
    match _ []     []     = True
    match k (c:cs) (p:ps) = M.findWithDefault [p] c k == [p] &&
                            match (M.insert c [p] k) cs ps
    match _ _      _      = False
    
    substitute :: M.Map Char String -> String -> String
    substitute key = concatMap (\c -> M.findWithDefault [c] c key)
    
    getKeys :: M.Map Char String -> [String] -> String -> [M.Map Char String]
    getKeys k dict c = [M.fromList .
        unionBy (\(a,x) (b,y) -> a == b || x == y) (M.assocs k) .
        zip w $ map return p | w <- words c,
        let ps = filter (match k w) dict, length ps < 100, p <- ps]
    
    score :: M.Map Char String -> String -> Int
    score k = negate . length . filter isLower . substitute k
    
    findBestKeys :: Int -> [String] -> String -> [M.Map Char String]
    findBestKeys n dict c = iterate (take 10 . sortWith (`score` c) .
        concatMap (\k -> getKeys k dict c)) [M.empty] !! n
    
    solve :: Int -> [String] -> String -> [String]
    solve n dict c = map (`substitute` c) $ findBestKeys n dict $
                     filter (\x -> isAlpha x || isSpace x) c
    
    crypto = "P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA " ++
             "AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC"
    
    main :: IO ()
    main = do dict <- fmap lines $ readFile "english-words.10"
              mapM_ print $ solve 7 dict crypto
    
  3. Several years ago, I got involved in Simon Singh’s Code Challenge, and wrote a number of programs to try to crack classic ciphers. As part of this, I wrote a program which used a genetic algorithm to crack cryptograms. In many ways, it’s kind of crude: it doesn’t use a dictionary of english words, and ignored punctuation and word spacing. What it does use is a table containing the logarithm of the probability of each trigram for a corpus of text I analyzed (don’t remember what I used, maybe Sherlock Holmes or the Bible or something from Project Gutenberg). For longer cryptograms, it almost always works perfectly. It mucked up a couple of letters on this one, which is short and includes a fairly diverse set of trigrams, but the message was very easy to read even with these few mistakes in assigning low probability letters.

    Here’s the Python source. For some reason, I can’t find the code which generated the pickled “corpus”, but it shouldn’t be too hard to figure out. Perhaps I’ll rewrite one during lunch. The code ia also strictly speaking not an “algorithm”, since doesn’t actually stop: it just keeps running, trying to maximize the score, and does nothing to detect stasis. But you might find it amusing.

    [ EDIT: I put the word 'python' in single quotes in the sourcecode tag; without the single quotes, the tag is not recognized and the formatting is incorrect. I also fixed some ampersand-coded html characters. I hope this is correct now. ProgPrax ]

    #!/u0/markv/my-python/bin/python
    #                   _                            
    #  __ _ _ _  _ _ __| |_ ___  __ _ _ _ __ _ _ __  
    # / _| '_| || | '_ \  _/ _ \/ _` | '_/ _` | '  \ 
    # \__|_|  \_, | .__/\__\___/\__, |_| \__,_|_|_|_|
    #         |__/|_|           |___/                
    # 
    # A simple program for automatically solving simple substitution ciphers
    # 
    # Written by Mark VandeWettering
    
    
    import pickle
    import random
    import string
    import re
    import sys
    
    ldict = pickle.load(open('newcorpus.p'))
    
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    
    sorig = open(sys.argv[1]).read().upper()
    
    swork = re.sub(r"[^A-Z\!?.,']+", " ", sorig) 
    swork = filter(lambda x : x in r"ABCDEFGHIJKLMNOPQRSTUVWXYZ!?.,' ", swork)
    
    print "SWORK: ", swork
    
    POOLSIZE = 1000
    
    pool = []
    
    for p in range(POOLSIZE):
        a = list(letters)
        random.shuffle(a)
        pool.append(''.join(a))
    
    def calculatescore(x):
        s = 0 
        for i in range(len(x)-3):
    	s = s + ldict.get(x[i:i+3], 0)
        return s 
    
    class BogusError(Exception):
    	pass
    
    def mutate(str):
        if random.uniform(0, 1) < 0.01:
    	a = list(letters)
    	random.shuffle(a)
    	return ''.join(a)
        s = list(str)
        c0 = random.choice(range(26))
        c1 = random.choice(range(26))
        s[c0], s[c1] = s[c1], s[c0] 
        return ''.join(s)
    
    def crossover(pop, mom):
        pop = list(pop)
        mom = list(mom)
        c0 = random.randint(0, 13)
        c1 = random.randint(1, 13)
        c1 = c0 + c1 
        child = pop[:]
        for idx in range(c0, c1):
    	idx2 = pop.index(mom[idx])
    	child[idx], child[idx2] = child[idx2], child[idx]
        child = ''.join(child)
    
        for c in string.uppercase:
    	if c not in child:
    	    raise BogusError
    
        return child
    
    
    def sumpool(pool):
        total = 0 
        sums = []
    
        for score, key in pool:
    	total = total + score
    	sums.append(total)
    
        return total, sums
    
    import bisect
    
    def pick(total, l):
        idx = bisect.bisect_right(l, random.randint(0, total))
        return idx 
    
    while True:
        newpool = []
        for p in pool:
    	twork = string.translate(swork, string.maketrans(letters, p))
    	score = calculatescore(twork)
    	newpool.append((score, p))
        newpool.sort(lambda x, y: cmp(y,x))
    
        newpool = newpool[:POOLSIZE/2]
    
        # now, generate the distribution 
    
        total, sums = sumpool(newpool)
        newpool = map(lambda x: x[1], newpool)
    
        torig = string.translate(sorig, string.maketrans(letters, newpool[0]))
        
        print "Best Decode score = %d" % sums[0]
        print torig
        
        for x in range(POOLSIZE/2):
    
    	idx = pick(total, sums)
    	if (random.uniform(0, 1) < 0.15):
    	    newpool.append(mutate(newpool[idx]))
    	else:
    	    idx2 = pick(total, sums)
    	    newpool.append(crossover(newpool[idx], newpool[idx2]))
    
        pool = newpool 
    
  4. Pete Bevin said

    FYI, words 4 and 5 (AEFGAD and FZGPEAG) aren’t in /usr/share/dict/words on Mac. Only word 4 is missing on Ubuntu.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 621 other followers

%d bloggers like this: