## The Daily Cryptogram

### July 14, 2009

We represent the key as a string of twenty-six upper-case letters; the first letter is the cipher-text representation of the plain-text letter A, the second letter is the cipher-text representation of the plain-text letter B, and so on, until the twenty-sixth letter is the cipher-text representation of the plain-text letter Z. Unknown letters are represented by the underscore character `\#_`, so the null key is

`(define null-key (make-string 26 #\_))`

Instead of specifying a full twenty-six letter key, it is common to give the key as a word or phrase, such as THE EAGLE HAS LANDED. That pass-phrase can be made into a key by writing out the unique letters in the phrase, THEAGLSND, followed by the remaining letters of the alphabet. But that approach has a weakness that can be easily exploited by a cryptanalyst: the last several letters of the alphabet will likely represent themselves. So another approach is to continue the alphabet at the next letter after the end of the key phrase, which in the example is F (the first unused letter after D), followed by IJKMOPQRUVWXYZ, followed by those letters that are still unused, BC. The `make-key` function converts a key phrase to a key; it can build the key in either of the two styles shown above just by commenting or uncommenting a single line of code:

```(define (make-key pass-phrase)   (define (c->z c)     (map integer->char       (range (char->integer c) 91)))   (define (add-if xs ys)     (let loop ((xs xs) (ys ys))       (cond ((null? xs) ys)             ((not (char-alphabetic? (car xs))) (loop (cdr xs) ys))             ((member (car xs) ys) (loop (cdr xs) ys))             (else (loop (cdr xs) (cons (car xs) ys))))))   (let* ((ks (add-if (map char-upcase (string->list pass-phrase)) '()))          ; uncomment next line for stronger key          ; (ks (add-if (c->z (car ks)) ks))          (ks (add-if (c->z #\A) ks)))     (list->string (reverse ks))))```

Now that we can create keys, the next step is to encipher and decipher messages. Functions `p->c` and `c->p` convert single letters between plain-text and cipher-text; `p->c` simply indexes into the key, but `c->p` must examine each letter of the key until it finds the requested letter:

```(define (p->c key p)   (if (not (char-alphabetic? p)) p     (string-ref key (- (char->integer (char-upcase p)) 65))))```

```(define (c->p key c)   (if (not (char-alphabetic? c)) c     (let loop ((c-index 25))       (cond ((negative? c-index) #\_)             ((char=? (string-ref key c-index) c)               (integer->char (+ c-index 65)))             (else (loop (- c-index 1)))))))```

`Encipher` and `decipher` call `p->c` and `c->p` to operate on strings:

```(define (encipher key plain-text)   (list->string     (map (left-section p->c key)       (string->list plain-text))))```

```(define (decipher key cipher-text)   (list->string     (map (left-section c->p key)       (string->list cipher-text))))```

As an example, the plain-text THAT’S ONE SMALL STEP FOR MAN, ONE GIANT LEAP FOR MANKIND is enciphered using the key THE EAGLE HAS LANDED as

```> (encipher (make-key "THE EAGLE HAS LANDED")     "THAT'S ONE SMALL STEP FOR MAN, ONE GIANT LEAP FOR MANKIND") "VNTV'U OMG UKTJJ UVGP LOR KTM, OMG SDTMV JGTP LOR KTMIDMA"```

We are finally ready to begin cryptanalysis. The normal tool for cryptanalysis of substitution ciphers is frequency analysis of single letters or of small groups of letters, especially digrams and trigrams. `(freq `text`)` performs frequency analysis of single letters, and `(freq `n` `text`)` does the same for groups of n letters:

```(define (freq . args)   (let* ((bite (if (= (length args) 1) 1 (car args)))          (text (if (= (length args) 1) (car args) (cadr args))))     (let loop ((ts (string->list text)) (fs '()))       (if (< (length ts) bite)           (sort (lambda (a b)                   (cond ((< (cdr b) (cdr a)) #t)                         ((< (cdr a) (cdr b)) #f)                         (else (string<? (car a) (car b)))))             (uniq-c string=? (sort string<? fs)))           (loop (cdr ts) (cons (list->string (take bite ts)) fs))))))```

Frequency analysis of single letters works fairly well on the unknown text, showing a clear winner; we can associate the cipher-text A with the plain-text E, and the cipher-text Y, E and P are most likely the plain-text T, A and O, in some order:

```> (freq "P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC") ((" " . 14) ("A" . 12) ("Y" . 8) ("E" . 7) ("P" . 6)  ("G" . 5) ("W" . 5) ("C" . 4) ("F" . 3) ("L" . 3) ("M" . 3)  ("O" . 3) ("U" . 3) ("V" . 3) ("," . 2) ("D" . 2) ("H" . 2)  ("N" . 2) ("X" . 2) ("Z" . 2) ("\n" . 1) ("I" . 1) ("J" . 1)  ("S" . 1) ("T" . 1))```

Frequency analysis is helpful for manually solving monoalphabetic ciphers, but because the daily cryptogram preserves word length and punctuation, including intra-word apostrophes, a better approach is a dictionary attack. For instance, regardless of the key, the cipher-text OYUUAOEXYW only corresponds to two dictionary words, COLLECTION and CORRECTION. And, if the cipher-text X corresponds to plain-text I, then the first word of the cryptogram, P, must be A. The third word, YM, starts with O, and can’t be ON, which suggests that it is either OF or OR. And so on.

To perform a dictionary attack, we write a function `dict` that takes a partial key, a cipher word, and a list of dictionary words and returns a list of dictionary words that could possibly match the cipher word. For instance, we want `(dict null-key "OYUUAOEXYW")` to return the list `("COLLECTION" "CORRECTION")`. Here is `dict`:

```(define (dict key cipher-text)   (let* ((cs (map char-upcase (string->list cipher-text)))          (len (length cs)))     (let loop ((words words) (ws '()))       (cond ((null? words) (reverse ws))             ((not (= (string-length (car words)) len))               (loop (cdr words) ws))             ((match? key cs (string->list (car words)))               (loop (cdr words) (cons (car words) ws)))             (else (loop (cdr words) ws))))))```

`Dict` is simple, just looping through the dictionary `words` reporting all the matches it finds. The real work is done by `match?`. XXX DESCRIBE THIS XXX

```(define (match? key cs ws)   (let loop ((key key) (cs cs) (ws ws))     (cond ((null? cs) #t)           ((and (not (char-alphabetic? (car cs)))                 (not (char-alphabetic? (car ws)))                 (char=? (car cs) (car ws)))             (loop key (cdr cs) (cdr ws)))           ((or (not (char-alphabetic? (car cs)))                (not (char-alphabetic? (car ws)))) #f)           ((char=? (c->p key (car cs)) (car ws))             (loop key (cdr cs) (cdr ws)))           ((char=? (p->c key (car ws)) (car cs))             (loop key (cdr cs) (cdr ws)))           ((and (char=? (c->p key (car cs)) #\_)                 (char=? (p->c key (car ws)) #\_))             (loop (add-key key (list (cons (car ws) (car cs)))) (cdr cs) (cdr ws)))           (else #f))))```

Of course, we also need a dictionary, which we store in the global variable `words`:

```(define words   (map string-upcase     (sort string<?       (unique string=?         (fold-input read-line (lambda (d a) (cons a d)) '()           "/usr/dict/words")))))```

Now the rest of the program is straight forward, though somewhat long and tedious. `String->words` returns a list of words in a cryptogram, removing the punctuation and carefully preserving intra-word apostrophes. `Enhance-key` adds to an existing key the new cipher/plain pairings found in a tentatively-decoded word. `Solve` loops over a list of words, at each word keeping a list of all possible keys; it calls `next`, which returns the next possible set of keys given a single cipher-text word, and `nexts`, which calls `next` to return the next possible set of keys for all the cipher-text words in a list. `Sort-by-counts` is a heuristic that orders a list of words by the number of possibilities to which each word can be deciphered, which minimizes the size of the cross-products formed. `Cryptogram` is the top-level function that wraps the others; it returns a list of possible decipherments of a cipher-text.

```(define (string->words str)   (let loop ((cs (string->list str)) (w '()) (ws '()))     (cond ((null? cs)             (reverse               (if (null? w) ws                 (cons (list->string (reverse w)) ws))))           ((or (char-alphabetic? (car cs)) (char=? (car cs) #\'))             (loop (cdr cs) (cons (car cs) w) ws))           ((pair? w) (loop (cdr cs) '() (cons (list->string (reverse w)) ws)))           (else (loop (cdr cs) w ws)))))```

```(define (enhance-key key cipher-text plain-text)   (let ((key (string-copy key))         (cipher-text (string-upcase cipher-text))         (plain-text (string-upcase plain-text)))     (do ((i 0 (+ i 1))) ((= i (string-length plain-text)))       (if (char-alphabetic? (string-ref plain-text i))           (string-set! key (- (char->integer (string-ref plain-text i)) 65)             (string-ref cipher-text i))))     key))```

```(define (solve cs)   (define (next k c)     (map (lambda (w) (enhance-key k c w)) (dict k c)))   (define (nexts ks c)     (apply append (map (lambda (k) (next k c)) ks)))   (let loop ((cs cs) (ks (list null-key)))     ; (display cs) (display (length ks)) (newline)     (if (null? cs) ks       (loop (cdr cs) (nexts ks (car cs))))))```

```(define (sort-by-counts words)   (map car (sort (lambda (x y) (< (cdr x) (cdr y)))     (map (lambda (w) (cons w (length (dict null-key w)))) words))))```

```(define (cryptogram cipher)   (unique string=? (sort string<?     (map (lambda (k) (decipher k cipher))       (solve (sort-by-counts (string->words cipher)))))))```

Here you can see the cryptogram solver in action:

```> (cryptogram "P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC") ("A COLLECTION OF ETUDES, UPDATED WEEKLY, FOR THE EDUCATION AND ENJOYMENT OF THE SAVVY PROGRAMMER")```

The pass-phrase was PROGRAMMING PRAXIS.

`Cryptogram` doesn’t always work. It will never find a word that is not in the dictionary. It forms the cross-product of all possible keys for all the words in the cryptogram, and if that is too large, or if the heuristic is ineffective, or if early words don’t significantly reduce the keyspace, it will exhaust memory, leading to failure. And some messages offer more than one solution; for instance, you’ll have to figure out for yourself which of these is correct:

> (cryptogram "VNTV'U OMG UKTJJ UVGP LOR KTM, OMG SDTMV JGTP LOR KTMIDMA")
("THAT'S ONE SMALL STEP BOW MAN, ONE GIANT LEAP BOW MANKIND"
"THAT'S ONE SMALL STEP BOX MAN, ONE GIANT LEAP BOX MANKIND"
"THAT'S ONE SMALL STEP BOY MAN, ONE GIANT LEAP BOY MANKIND"
"THAT'S ONE SMALL STEP COB MAN, ONE GIANT LEAP COB MANKIND"
"THAT'S ONE SMALL STEP COW MAN, ONE GIANT LEAP COW MANKIND"
"THAT'S ONE SMALL STEP COX MAN, ONE GIANT LEAP COX MANKIND"
"THAT'S ONE SMALL STEP COY MAN, ONE GIANT LEAP COY MANKIND"
"THAT'S ONE SMALL STEP FOB MAN, ONE GIANT LEAP FOB MANKIND"
"THAT'S ONE SMALL STEP FOR MAN, ONE GIANT LEAP FOR MANKIND"
"THAT'S ONE SMALL STEP FOX MAN, ONE GIANT LEAP FOX MANKIND"
"THAT'S ONE SMALL STEP JOB MAN, ONE GIANT LEAP JOB MANKIND"
"THAT'S ONE SMALL STEP JOY MAN, ONE GIANT LEAP JOY MANKIND"
"THAT'S ONE SMALL STEP ROB MAN, ONE GIANT LEAP ROB MANKIND"
"THAT'S ONE SMALL STEP ROW MAN, ONE GIANT LEAP ROW MANKIND"
"THAT'S ONE SMALL STEP ROY MAN, ONE GIANT LEAP ROY MANKIND"
"THAT'S ONE SMALL STEP VOW MAN, ONE GIANT LEAP VOW MANKIND"
"THAT'S ONE SMALL STEP YOU MAN, ONE GIANT LEAP YOU MANKIND")

We relied strongly on the Standard Prelude: take, range, string-upcase, read-line, fold-input, sort, unique, uniq-c and left-section. The code is collected at http://programmingpraxis.codepad.org/GBXcEPTy, but you can’t run it, since there is no dictionary.

Pages: 1 2

### 4 Responses to “The Daily Cryptogram”

1. […] Praxis – The Daily Cryptogram By Remco Niemeijer Today’s Programming Praxis problem is an interesting one: we have to write a program to solve […]

2. Remco Niemeijer said

First a small notification: I’ll be on vacation the next 4 weeks, so I won’t be able to post any solutions in that period. I should be back for puzzle 59.

import Data.Char
import Data.List
import qualified Data.Map as M
import GHC.Exts

match :: M.Map Char String -> String -> String -> Bool
match _ [] [] = True
match k (c:cs) (p:ps) = M.findWithDefault [p] c k == [p] &&
match (M.insert c [p] k) cs ps
match _ _ _ = False

substitute :: M.Map Char String -> String -> String
substitute key = concatMap (\c -> M.findWithDefault [c] c key)

getKeys :: M.Map Char String -> [String] -> String -> [M.Map Char String]
getKeys k dict c = [M.fromList .
unionBy (\(a,x) (b,y) -> a == b || x == y) (M.assocs k) .
zip w \$ map return p | w <- words c, let ps = filter (match k w) dict, length ps < 100, p <- ps] score :: M.Map Char String -> String -> Int
score k = negate . length . filter isLower . substitute k

findBestKeys :: Int -> [String] -> String -> [M.Map Char String]
findBestKeys n dict c = iterate (take 10 . sortWith (`score` c) .
concatMap (\k -> getKeys k dict c)) [M.empty] !! n

solve :: Int -> [String] -> String -> [String]
solve n dict c = map (`substitute` c) \$ findBestKeys n dict \$
filter (\x -> isAlpha x || isSpace x) c

crypto = “P OYUUAOEXYW YM AEFGAD, FZGPEAG JAATUL, MYC ENA ” ++
“AGFOPEXYW PWG AWSYLVAWE YM ENA DPHHL ZCYICPVVAC”

main :: IO ()
main = do dict <- fmap lines \$ readFile "english-words.10" mapM_ print \$ solve 7 dict crypto [/sourcecode]

3. Several years ago, I got involved in Simon Singh’s Code Challenge, and wrote a number of programs to try to crack classic ciphers. As part of this, I wrote a program which used a genetic algorithm to crack cryptograms. In many ways, it’s kind of crude: it doesn’t use a dictionary of english words, and ignored punctuation and word spacing. What it does use is a table containing the logarithm of the probability of each trigram for a corpus of text I analyzed (don’t remember what I used, maybe Sherlock Holmes or the Bible or something from Project Gutenberg). For longer cryptograms, it almost always works perfectly. It mucked up a couple of letters on this one, which is short and includes a fairly diverse set of trigrams, but the message was very easy to read even with these few mistakes in assigning low probability letters.

Here’s the Python source. For some reason, I can’t find the code which generated the pickled “corpus”, but it shouldn’t be too hard to figure out. Perhaps I’ll rewrite one during lunch. The code ia also strictly speaking not an “algorithm”, since doesn’t actually stop: it just keeps running, trying to maximize the score, and does nothing to detect stasis. But you might find it amusing.

[ EDIT: I put the word ‘python’ in single quotes in the sourcecode tag; without the single quotes, the tag is not recognized and the formatting is incorrect. I also fixed some ampersand-coded html characters. I hope this is correct now. ProgPrax ]

#!/u0/markv/my-python/bin/python
# _
# __ _ _ _ _ _ __| |_ ___ __ _ _ _ __ _ _ __
# / _| ‘_| || | ‘_ \ _/ _ \/ _` | ‘_/ _` | ‘ \
# \__|_| \_, | .__/\__\___/\__, |_| \__,_|_|_|_|
# |__/|_| |___/
#
# A simple program for automatically solving simple substitution ciphers
#
# Written by Mark VandeWettering

import pickle
import random
import string
import re
import sys

letters = “ABCDEFGHIJKLMNOPQRSTUVWXYZ”

swork = re.sub(r”[^A-Z\!?.,’]+”, ” “, sorig)
swork = filter(lambda x : x in r”ABCDEFGHIJKLMNOPQRSTUVWXYZ!?.,’ “, swork)

print “SWORK: “, swork

POOLSIZE = 1000

pool = []

for p in range(POOLSIZE):
a = list(letters)
random.shuffle(a)
pool.append(”.join(a))

def calculatescore(x):
s = 0
for i in range(len(x)-3):
s = s + ldict.get(x[i:i+3], 0)
return s

class BogusError(Exception):
pass

def mutate(str):
if random.uniform(0, 1) < 0.01: a = list(letters) random.shuffle(a) return ''.join(a) s = list(str) c0 = random.choice(range(26)) c1 = random.choice(range(26)) s[c0], s[c1] = s[c1], s[c0] return ''.join(s) def crossover(pop, mom): pop = list(pop) mom = list(mom) c0 = random.randint(0, 13) c1 = random.randint(1, 13) c1 = c0 + c1 child = pop[:] for idx in range(c0, c1): idx2 = pop.index(mom[idx]) child[idx], child[idx2] = child[idx2], child[idx] child = ''.join(child) for c in string.uppercase: if c not in child: raise BogusError return child def sumpool(pool): total = 0 sums = [] for score, key in pool: total = total + score sums.append(total) return total, sums import bisect def pick(total, l): idx = bisect.bisect_right(l, random.randint(0, total)) return idx while True: newpool = [] for p in pool: twork = string.translate(swork, string.maketrans(letters, p)) score = calculatescore(twork) newpool.append((score, p)) newpool.sort(lambda x, y: cmp(y,x)) newpool = newpool[:POOLSIZE/2] # now, generate the distribution total, sums = sumpool(newpool) newpool = map(lambda x: x, newpool) torig = string.translate(sorig, string.maketrans(letters, newpool)) print "Best Decode score = %d" % sums print torig for x in range(POOLSIZE/2): idx = pick(total, sums) if (random.uniform(0, 1) < 0.15): newpool.append(mutate(newpool[idx])) else: idx2 = pick(total, sums) newpool.append(crossover(newpool[idx], newpool[idx2])) pool = newpool [/sourcecode]

4. Pete Bevin said

FYI, words 4 and 5 (AEFGAD and FZGPEAG) aren’t in /usr/share/dict/words on Mac. Only word 4 is missing on Ubuntu.