Word Frequencies
March 10, 2009
Write a program that takes a filename and a parameter n and prints the n most common words in the file, and the count of their occurrences, in descending order.
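A minimal Python sketch of one reading of the task, not the site's reference solution. It assumes that a "word" is any whitespace-separated token and that ties are broken alphabetically; the names top_words and print_top_words are made up for illustration.

```python
def top_words(text, n):
    """Return the n most common whitespace-separated words with their counts."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    # Sort by descending count; break ties alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def print_top_words(filename, n):
    """Read the file and print one 'word count' pair per line."""
    with open(filename, encoding="utf-8") as f:
        for word, count in top_words(f.read(), n):
            print(word, count)
```

Calling print_top_words("input.txt", 10) would print the ten most frequent tokens of a (hypothetical) file input.txt. Case folding and punctuation handling are left out on purpose, since the task statement does not specify them.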
perl -ne 'while (/([a-z]+)/gi) {$words{$1}++} END{ map { print "$_ $words{$_}\n"} sort {$words{$b} <=> $words{$a}} keys %words}'
A straightforward (though probably not very efficient) Haskell solution:
import System.Environment
import Control.Arrow
import Data.List
import Data.Ord

main = do
  [n, fileName] <- getArgs
  content <- readFile fileName
  mapM_ print $ findMostCommonWords (read n) content

findMostCommonWords :: Int -> String -> [(String, Int)]
findMostCommonWords n = take n . reverse . sortBy (comparing snd) . map (head &&& length) . group . sort . words
Interesting challenge. Here's my solution in Python.
http://pastebin.com/f1ac76000
Action<string, int> CountWords = (filename, top) =>
{
    foreach (var kv in
        File.ReadAllText(filename)
            .Split()
            .GroupBy(w => w,
                     (w, c) => new { Word = w, Count = c.Count() })
            .OrderByDescending(a => a.Count)
            .Take(top))
        Console.WriteLine(kv.Word + " - " + kv.Count);
};
http://pastebin.com/fbf06089
Here it is in ruby (commented so I don’t have to do it elsewhere) …
Clojure's core library has a handy frequencies function.
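For readers who don't know Clojure: frequencies maps each distinct item of a collection to the number of times it appears. A rough Python analogue, offered as an illustration rather than a translation of anyone's solution here, is collections.Counter from the standard library; the sample sentence is made up.

```python
from collections import Counter

# Count how often each item occurs, much like Clojure's frequencies.
words = "a b a c a b".split()
counts = Counter(words)

# Counter behaves like a dict from item to count.
as_dict = dict(counts)

# most_common(n) returns the n highest counts in descending order.
top_two = counts.most_common(2)
print(top_two)
```

The counting step is the whole exercise apart from sorting and truncating, which is why a built-in like this shortens the solution so much.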
Hello guys,
Check out my solution in Python. I went a bit further and stripped punctuation characters from the source.
This gives more relevant output.
https://github.com/ftt/programming-praxis/blob/master/20090310-word-frequencies/word-frequencies.py
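To illustrate why stripping punctuation matters, here is a small sketch. It is not the linked script, only one possible approach, and it assumes that lowercasing and deleting ASCII punctuation is an acceptable normalization; the name normalized_words is hypothetical.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalized_words(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace.

    Caveat: apostrophes are deleted too, so "don't" becomes "dont", and
    typographic quotes or dashes outside ASCII are left untouched.
    """
    return text.lower().translate(_STRIP_PUNCT).split()


sample = "Hello, world! Hello... world? hello"
counts = Counter(normalized_words(sample))
```

Without the cleanup, "Hello,", "Hello..." and "hello" would be counted as three different words; with it they collapse into a single entry.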
# In Ruby
text = File.read(ARGV[0])
n = ARGV[1].to_i
puts text
  .scan(/\w+/)
  .group_by(&:itself)
  .map { |k, v| [k, v.count] }
  .sort_by { |_, v| -v }
  .take(n)
  .map { |k, v| "#{k}: #{v}" }