N-Gram Frequency
February 20, 2018
Given a string of characters and an n-gram size n, prepare a frequency table of all combinations of size n blocks of characters. For instance, with the string “Programming Praxis” and an n-gram size of 2, the n-grams are “Pr”, “ro”, “og”, “gr”, “ra”, “am”, “mm”, “mi”, “in”, “ng”, “g “, ” P”, “Pr”, “ra”, “ax”, “xi”, and “is”; the two n-grams “Pr” and “ra” appear twice, and all the others appear once.
Your task is to write a program that computes a frequency table of n-grams. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Using a deque we can iterate through the string, adding each character to the deque, keeping count of the deques’ size, once it reaches the n-gram size, add to result, and
deque
the first item.Here’s a solution in Python 2.
Output:
Here’s a Haskell version.
Generic Python, using dictionaries
{‘Pr’: 2, ‘ro’: 1, ‘og’: 1, ‘gr’: 1, ‘ra’: 2, ‘am’: 1, ‘mm’: 1, ‘mi’: 1, ‘in’: 1, ‘ng’: 1, ‘g ‘: 1, ‘ P’: 1, ‘ax’: 1, ‘xi’: 1, ‘is’: 1}
Here’s my smart-alec J version:
That gives:
The slightly-kinder-formatted version (not being obtuse) is:
Ah, J. I’ll never use you for real work, but you’re so fun.
Java…
err… that didn’t work…