Freq

February 5, 2016

Every so often I get it in mind to start solving the daily cryptogram in the newspaper. For instance:

[Category: Literature] V MVXEGC NK V RGIIZH HYZ IGXDK PZQ YNK QLMCGIIV HYGX AYG KQX NK KYNXNXO VXD HZXAK NA MVUE AYG LNXQAG NA MGONXK AZ CVNX. — LVCE AHVNX

The first word is either A or I, and the repeated two-letter words NK, NK, NA and NA suggest words like IF, IS, IT or OF, ON, OR. Further, the digraph NK also appears as YNK. So we guess that NK is IS, YNK is HIS, and V is A. Now the author LVCE AHVNK is _A__ __AI_, which rather strongly suggests MARK TWAIN. Now we have the phrase WH_N TH_ S_N IS SHININ_, which is almost certainly WHEN THE SUN IS SHINING, and the rest of the cryptogram is easy: A BANKER IS A FELLOW WHO LENDS YOU HIS UMBRELLA WHEN THE SUN IS SHINING AND WANTS IT BACK THE MINUTE IT BEGINS TO RAIN. — MARK TWAIN.

We’ve written a program to solve cryptograms, a long time ago, using a dictionary attack, but sometimes it’s still fun to solve them by hand. One way a computer can help is by creating frequency tables of single letters, digraphs, and trigraphs; in the puzzle shown above, a table of digraphs leads quickly to NX, which appears six times, along with NK (three times) and NA (two times), which quickly limits the possibilities for the letter N. We didn’t use it in the analysis above, but AYG is the most common trigram, and both times it appears it is a word by itself, so it is almost certainly THE or AND. Thus, frequency tables quickly lead to a solution.

Your task is to write a program that generates frequency tables, including single characters, digraphs, trigraphs, and other lengths. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

2 Responses to “Freq”

Rutger said
February 5, 2016 at 12:16 PM
Python Counter does the job.

from collections import Counter

s = "V MVXEGC NK V RGIIZH HYZ IGXDK PZQ YNK QLMCGIIV HYGX AYG KQX NK KYNXNXO VXD HZXAK NA MVUE AYG LNXQAG NA MGONXK AZ CVNX. LVCE AHVNX"

unigraphs = Counter(s)
del unigraphs[‘ ‘]
digraphs = Counter([s[i:i+2] for i in range(len(s)-1) if ‘ ‘ not in s[i:i+2]])
trigraphs = Counter([s[i:i+3] for i in range(len(s)-2) if ‘ ‘ not in s[i:i+3]])
print unigraphs.most_common(5)
print digraphs.most_common(5)
print trigraphs.most_common(5)

# output
# [(‘X’, 12), (‘N’, 11), (‘G’, 9), (‘V’, 9), (‘A’, 8)]
# [(‘NX’, 6), (‘YG’, 3), (‘NK’, 3), (‘HY’, 2), (‘YN’, 2)]
# [(‘VNX’, 2), (‘AYG’, 2), (‘GII’, 2), (‘VUE’, 1), (‘YNK’, 1)]

Jussi Piitulainen said

February 9, 2016 at 9:40 AM

Python’s strings have this nifty pair of methods to build and apply translation tables. I had
forgotten what the message was (though I quickly remembered it was by Mark Twain) so
I had a chance to actually work it out (trying to not remember that it was by Mark Twain).

All I used was the short words, especially the
combination NK V, and after a promising start, the most frequent unigram heuristic. At
that point, clear patterns started to emerge and the message revealed itself. The last
translation steps are commented out in order to show that stage of the output.

There’s a typo in HZXAK.

s = """V MVXEGC NK V RGIIZH HYZ IGXDK PZQ YNK QLMCGIIV HYGX AYG KQX NK
KYNXNXO VXD HZXAK NA MVUE AYG LNXQAG NA MGONXK AZ CVNX. LVCE AHVNX"""

from collections import Counter

unigraphs = Counter(c for c in s if not c == " ")
digraphs  = Counter(c + d for c, d in zip(s, s[1:]))
trigraphs = Counter(c + d + e for c, d, e in zip(s, s[1:], s[2:]))
print("1-grams:", *unigraphs.most_common(7))
print("2-grams:", *digraphs.most_common(6))
print("3-grams:", *trigraphs.most_common(5))
print("1-words:", *(w for w in s.split() if len(w) == 1))
print("2-words:", *(w for w in s.split() if len(w) == 2))
print("3-words:", *(w for w in s.split() if len(w) == 3))
print()
print(s)
print()
print(s
      .translate(str.maketrans("V", "a"))   # "NK V" -> "?? a", "?? i"
      .translate(str.maketrans("NK", "is")) # "NK V" -> "in, if, is, it" "a"
      .translate(str.maketrans("NA", "it")) # looks promising? go with this
      .translate(str.maketrans("AZ", "to"))
      .translate(str.maketrans("XG", "ne"))
      # "etaoin" are supposed to be the most frequent letters, "taoi"
      # are assigned above, "XG" -> "en" led to "aeD" and "Caie" but
      # "XG" -> "ne" worked, as follows
      .translate(str.maketrans("MeOins to Cain", "begins to rain"))
      # .translate(str.maketrans("tYe", "the"))
      # .translate(str.maketrans("Hho", "who"))
      # .translate(str.maketrans("ReIIow", "fellow"))
      # .translate(str.maketrans("LarE twain", "mark twain"))
      # .translate(str.maketrans("lenDs", "lends"))
      # .translate(str.maketrans("PoQ", "you"))
      # .translate(str.maketrans("baUk", "back"))
      .translate(str.maketrans("", "")))

The output at the point when the message began to reveal itself, before I wrote
the commented-out translation steps:

1-grams: ('X', 12) ('N', 11) ('G', 9) ('V', 9) ('K', 8) ('A', 8) ('Y', 6)
2-grams: ('NX', 6) ('K ', 5) (' A', 4) (' N', 4) ('NK', 3) (' H', 3)
3-grams: ('A M', 2) ('VNX', 2) (' HY', 2) ('AYG', 2) (' AY', 2)
1-words: V V
2-words: NK NK NA NA AZ
3-words: HYZ PZQ YNK AYG KQX VXD AYG

V MVXEGC NK V RGIIZH HYZ IGXDK PZQ YNK QLMCGIIV HYGX AYG KQX NK
KYNXNXO VXD HZXAK NA MVUE AYG LNXQAG NA MGONXK AZ CVNX. LVCE AHVNX

a banEer is a ReIIoH HYo IenDs PoQ Yis QLbreIIa HYen tYe sQn is
sYining anD Honts it baUE tYe LinQte it begins to rain. LarE tHain

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29

Programming Praxis

Freq

February 5, 2016

2 Responses to “Freq”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Freq

February 5, 2016

Share this:

Related

2 Responses to “Freq”

Leave a comment

Categories

Archives

Archives