## Common Words

### April 26, 2019

We write our solution in awk.

We must first decide on the definition of a word. Awk’s normal field splitting assumes a word is a maximal sequence of non-space characters, but that includes punctuation. We could consider a word as a maximal sequence of letters, but then contractions like didn’t would count as two words. For the sake of simplicity, and because it solves the problem for the given text, we assume awk’s standard field splitting.

We must also decide what to do with duplicates. If the word WORD appears once on one line and twice on the next, does that count 1 or 2? The only sensible solution is to count 1, even though that is harder to arrange than merely comparing words.

Here is our solution:

```\$ echo '
> word1 word2 word3 word4
> word4 word5 word6 word7
> word6 word7 word8 word9
> word9 word6 word8 word3
> word1 word4 word5 word4
> ' | awk -v n=3 '
> NR == 1 { for (i = 1; i  NR >  1 { counter = 0
>           for (i = 1; i                if (word[\$i]-- > 0) counter++ }
>           if (counter >= n) print \$0
>           delete word
>           for (i = 1; i  '
word9 word6 word8 word3```

If you want to change the definition of a word, set FS on the awk command line. You can run the program at https://ideone.com/e7zW3o.

Pages: 1 2

### 2 Responses to “Common Words”

1. Daniel said

Here’s a solution in Python. This solution essentially only considers the first occurrence of a word on each line. That is, a word appearing twice on line X is not counted as two matches if the word appears on line X – 1.

```import sys

n = int(sys.argv)
last_words = set()
for line in open(sys.argv):
line = line.strip()
words = set(line.split())
if len(words.intersection(last_words)) == n:
print(line)
last_words = words
```

Example Usage:

```\$ python common.py 3 words.txt
word9 word6 word8 word3
```
2. Globules said

```import Control.Arrow ((>>>), (&&&))
import Data.Function ((&), on)
import Data.List (intersect, nub)
import System.Environment (getArgs)

inCommon :: Eq a => Int -> [a] -> [a] -> Bool
inCommon n xs ys = length (xs `intersect` ys) == n

wordsInCommon :: Int -> [String] -> [String]
wordsInCommon n ls = let lws = map (id &&& (nub . words)) ls
in zip lws (drop 1 lws) &
filter (uncurry (inCommon n `on` snd)) &
map (fst . snd)

main :: IO ()
main = do
args <- getArgs
```\$ ./common 0 < common.txt