Common Words
April 26, 2019
Today’s exercise comes from Stack Overflow:
Given a text file like:
word1 word2 word3 word4
word4 word5 word6 word7
word6 word7 word8 word9
word9 word6 word8 word3
word1 word4 word5 word4
Write a program that returns those lines that have n words in common with the previous line. For instance, with n = 3 and the input above, the only output line would be:
word9 word6 word8 word3
The original question requested a solution in sed or awk, but you are free to use any language.
Your task is to write a program to extract lines from a text file that have n words in common with the previous line. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Here’s a solution in Python. It considers only the first occurrence of a word on each line: a word that appears twice on line X counts as a single match, not two, if it also appears on line X – 1.
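A minimal sketch of that set-based idea, not the commenter's original code. It assumes words are separated by whitespace and that "n words in common" means exactly n distinct shared words; the function name common_lines is made up for illustration.

```python
def common_lines(lines, n):
    """Return the lines that share exactly n distinct words with the line before them."""
    result = []
    prev_words = None  # words of the previous line; None before the first line
    for line in lines:
        # Assumption: words are whitespace-separated. A set keeps one copy
        # of each word, so repeats on a line are counted only once.
        words = set(line.split())
        if prev_words is not None and len(words & prev_words) == n:
            result.append(line)
        prev_words = words
    return result


text = """\
word1 word2 word3 word4
word4 word5 word6 word7
word6 word7 word8 word9
word9 word6 word8 word3
word1 word4 word5 word4"""

for line in common_lines(text.splitlines(), 3):
    print(line)  # prints: word9 word6 word8 word3
```

Changing the comparison from == to >= would select lines with at least n shared words instead; on the sample input with n = 3 both readings give the same result.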
Example Usage:
A Haskell version.