Common Words

April 26, 2019

Today’s exercise comes from Stack Overflow:

Given a text file like:

word1 word2 word3 word4
word4 word5 word6 word7
word6 word7 word8 word9
word9 word6 word8 word3
word1 word4 word5 word4

Write a program that returns those lines that have exactly n words in common with the previous line. For instance, given the input above and n = 3, the only output line would be:

word9 word6 word8 word3

The original question requested a solution in sed or awk, but you are free to use any language.

Your task is to write a program to extract lines from a text file that have n words in common with the previous line. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
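
As a quick check of the example, the following Python sketch counts the distinct words each line shares with its predecessor. The name `common_counts` is made up for illustration and is not part of the original question. The sketch assumes that words are separated by whitespace and that a word repeated within a line counts once.

```python
def common_counts(lines):
    """Return (line, shared) pairs for every line after the first."""
    result = []
    previous = None  # the first line has no predecessor
    for line in lines:
        words = set(line.split())
        if previous is not None:
            # Number of distinct words shared with the previous line.
            result.append((line, len(words & previous)))
        previous = words
    return result


sample = [
    "word1 word2 word3 word4",
    "word4 word5 word6 word7",
    "word6 word7 word8 word9",
    "word9 word6 word8 word3",
    "word1 word4 word5 word4",
]

for line, shared in common_counts(sample):
    print(shared, line)
```

Lines two through five share 1, 2, 3 and 0 words with their predecessors, so with n = 3 only the fourth line qualifies.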

2 Responses to “Common Words”

  1. Daniel said

    Here’s a solution in Python. It converts each line to a set of words, so a repeated word is counted only once. That is, a word appearing twice on line X counts as one match, not two, if it also appears on line X – 1.

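    import sys

    n = int(sys.argv[1])
    last_words = None  # no previous line yet, so the first line never matches
    with open(sys.argv[2]) as f:
        for line in f:
            line = line.strip()
            words = set(line.split())  # a set ignores repeated words
            # Print the line if it shares exactly n words with the previous one.
            if last_words is not None and len(words & last_words) == n:
                print(line)
            last_words = words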
    

    Example Usage:

    $ python common.py 3 words.txt
    word9 word6 word8 word3
    
  2. Globules said

    A Haskell version.

    import Control.Arrow ((>>>), (&&&))
    import Data.Function ((&), on)
    import Data.List (intersect, nub)
    import System.Environment (getArgs)
    import Text.Read (readMaybe)
    
    inCommon :: Eq a => Int -> [a] -> [a] -> Bool
    inCommon n xs ys = length (xs `intersect` ys) == n
    
    wordsInCommon :: Int -> [String] -> [String]
    wordsInCommon n ls = let lws = map (id &&& (nub . words)) ls
                         in zip lws (drop 1 lws) &
                            filter (uncurry (inCommon n `on` snd)) &
                            map (fst . snd)
    
    main :: IO ()
    main = do
      args <- getArgs
      case map readMaybe args of
        [Just n] -> interact $ lines >>> wordsInCommon n >>> unlines
        _        -> error "The number of words in common is required."
    
    $ ./common 0 < common.txt 
    word1 word4 word5 word4
    $ ./common 3 < common.txt 
    word9 word6 word8 word3
    $ ./common 5 < common.txt 
    $ 
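
Both responses count each distinct word once. To make the difference from counting every occurrence concrete, here is a small Python sketch. The function names `count_distinct` and `count_occurrences` and the two-line input are invented for illustration. Whether repeats should count separately is an assumption about the intended meaning, since the exercise does not say.

```python
def count_distinct(prev_line, line):
    """Distinct words of `line` that also appear in `prev_line`."""
    return len(set(line.split()) & set(prev_line.split()))


def count_occurrences(prev_line, line):
    """Occurrences in `line` of words that appear in `prev_line`."""
    prev_words = set(prev_line.split())
    return sum(1 for word in line.split() if word in prev_words)


# Hypothetical pair of adjacent lines; word4 appears twice on the second.
prev_line = "word4 word5 word6 word7"
line = "word1 word4 word5 word4"

print(count_distinct(prev_line, line))     # 2
print(count_occurrences(prev_line, line))  # 3
```

With the set-based count the second line matches for n = 2. With the per-occurrence count it would match for n = 3 instead.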
    
