Common Words

April 26, 2019

We write our solution in awk.

We must first decide on the definition of a word. Awk’s normal field splitting assumes a word is a maximal sequence of non-space characters, but that includes punctuation. We could consider a word as a maximal sequence of letters, but then contractions like didn’t would count as two words. For the sake of simplicity, and because it solves the problem for the given text, we assume awk’s standard field splitting.

We must also decide what to do with duplicates. If the word WORD appears once on one line and twice on the next, does that count 1 or 2? The only sensible solution is to count 1, even though that is harder to arrange than merely comparing words.

Here is our solution:

$ echo '
> word1 word2 word3 word4
> word4 word5 word6 word7
> word6 word7 word8 word9
> word9 word6 word8 word3
> word1 word4 word5 word4
> ' | awk -v n=3 '
> NR == 1 { for (i = 1; i  NR >  1 { counter = 0
>           for (i = 1; i                if (word[$i]-- > 0) counter++ }
>           if (counter >= n) print $0
>           delete word
>           for (i = 1; i  '
word9 word6 word8 word3

If you want to change the definition of a word, set FS on the awk command line. You can run the program at https://ideone.com/e7zW3o.

Posted by programmingpraxis

Filed in Exercises

2 Comments »

2 Responses to “Common Words”

Daniel said

April 26, 2019 at 8:48 PM

Here’s a solution in Python. This solution essentially only considers the first occurrence of a word on each line. That is, a word appearing twice on line X is not counted as two matches if the word appears on line X – 1.

import sys

n = int(sys.argv[1])
last_words = set()
for line in open(sys.argv[2]):
    line = line.strip()
    words = set(line.split())
    if len(words.intersection(last_words)) == n:
        print(line)
    last_words = words

Example Usage:

$ python common.py 3 words.txt
word9 word6 word8 word3

Globules said

April 27, 2019 at 6:08 AM

A Haskell version.

import Control.Arrow ((>>>), (&&&))
import Data.Function ((&), on)
import Data.List (intersect, nub)
import System.Environment (getArgs)
import Text.Read (readMaybe)

inCommon :: Eq a => Int -> [a] -> [a] -> Bool
inCommon n xs ys = length (xs `intersect` ys) == n

wordsInCommon :: Int -> [String] -> [String]
wordsInCommon n ls = let lws = map (id &&& (nub . words)) ls
                     in zip lws (drop 1 lws) &
                        filter (uncurry (inCommon n `on` snd)) &
                        map (fst . snd)

main :: IO ()
main = do
  args <- getArgs
  case map readMaybe args of
    [Just n] -> interact $ lines >>> wordsInCommon n >>> unlines
    _        -> error "The number of words in common is required."

$ ./common 0 < common.txt 
word1 word4 word5 word4
$ ./common 3 < common.txt 
word9 word6 word8 word3
$ ./common 5 < common.txt 
$

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Programming Praxis

Common Words

April 26, 2019

2 Responses to “Common Words”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Common Words

April 26, 2019

Share this:

Related

2 Responses to “Common Words”

Leave a comment

Categories

Archives

Archives