Comm
May 10, 2011
We have today another exercise in our continuing series of Unix V7 commands:
NAME
comm — select or reject lines common to two sorted files
SYNOPSIS
comm [ -[123] ] file1 file2
DESCRIPTION
Comm reads file1 and file2, which should be ordered in ASCII collating sequence, and produces a three column output: lines only in file1; lines only in file2; and lines in both files. The filename ‘-’ means the standard input. Flags 1, 2, or 3 suppress printing of the corresponding column. Thus comm -12 prints only the lines common to the two files; comm -23 prints only lines in the first file but not in the second; comm -123 is a no-op.
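To make the three-column output concrete, here is a minimal Python sketch of the behavior described above, working on in-memory lists rather than files. The function name `comm`, the `suppress` parameter, and the sample data are hypothetical names chosen for illustration. Two assumptions are baked in: columns are separated by tabs, and a line is indented by one tab for each printed column to its left. Python compares strings by code point, which matches ASCII collating order for ASCII text.

```python
def comm(lines1, lines2, suppress=()):
    """Merge two sorted lists of lines, yielding tab-indented output lines.

    `suppress` holds the column numbers (1, 2, 3) that should not be printed.
    """
    i = j = 0
    while i < len(lines1) or j < len(lines2):
        if j >= len(lines2) or (i < len(lines1) and lines1[i] < lines2[j]):
            line, col = lines1[i], 1      # only in the first input
            i += 1
        elif i >= len(lines1) or lines2[j] < lines1[i]:
            line, col = lines2[j], 2      # only in the second input
            j += 1
        else:
            line, col = lines1[i], 3      # present in both inputs
            i += 1
            j += 1
        if col not in suppress:
            # Assumption: one leading tab per printed column to the left.
            indent = sum(1 for c in range(1, col) if c not in suppress)
            yield "\t" * indent + line


file1 = ["apple", "banana", "cherry"]
file2 = ["banana", "cherry", "date"]

# All three columns: "apple" unindented, "date" behind one tab,
# "banana" and "cherry" behind two tabs.
print("\n".join(comm(file1, file2)))
```

Passing `suppress={1, 2}` corresponds to comm -12 and leaves only the common lines; `suppress={1, 2, 3}` yields nothing, which is the no-op case.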
Your task is to implement the comm command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
My Haskell solution (see http://bonsaicode.wordpress.com/2011/05/10/programming-praxis-comm/ for a version with comments):
import Control.Monad
import System.Environment
import Text.Printf
import System.IO
import qualified System.IO.Strict as SIO
import GHC.IO.Handle

comm :: (Num b, Ord a) => [b] -> [a] -> [a] -> [(a, b)]
comm flags zs = filter ((`notElem` flags) . snd) . f zs where
    f xs [] = map (flip (,) 1) xs
    f [] ys = map (flip (,) 2) ys
    f (x:xs) (y:ys) = case compare x y of
        LT -> (x,1) : f xs (y:ys)
        GT -> (y,2) : f (x:xs) ys
        EQ -> (x,3) : f xs ys

columns :: [(String, Int)] -> IO ()
columns xs =
    let width = maximum (map (length . fst) xs) + 2
    in  mapM_ (\(s,c) -> printf "%*s%-*s\n" ((c - 1) * width) "" width s) xs

main :: IO ()
main = do
    args <- getArgs
    columns =<< case args of
        (('-':p:ps):fs) -> go (map (read . return) (p:ps)) fs
        fs              -> go [] fs
  where
    go args ~[f1, f2] = liftM2 (comm args) (file f1) (file f2)
    file src = fmap lines $ if src == "-" then newStdIn else readFile src
    newStdIn = catch (SIO.hGetContents =<< hDuplicate stdin) (\_ -> return [])

# -*- coding: cp1252 -*-
import argparse

def comparelines(f1,f2):
    line1 = f1.readline()
    line2 = f2.readline()
    while line1 or line2:
        if line1 and (line1 < line2 or line2==''):
            line, col = line1, 1
        elif line2 and (line2 < line1 or line1==''):
            line, col = line2, 2
        else:
            line, col = line2, 3
        yield line, col
        if col != 1: line2 = f2.readline()
        if col != 2: line1 = f1.readline()

parser = argparse.ArgumentParser(
    description="select or reject lines common to two sorted files.",
    epilog="""\
Reads file1 and file2, which should be ordered in ASCII collating
sequence, and produces a three column output: lines only in file1;
lines only in file2; and lines in both files. The filename ‘-’ means
the standard input. Flags 1, 2, or 3 suppress printing of the
corresponding column.
Thus, %(prog)s -12 prints only the lines common to the two files;
%(prog)s -23 prints only lines in the first file but not in the second;
%(prog)s -123 is a no-op."""
    )
parser.add_argument('-1', dest='col1', action='store_false')
parser.add_argument('-2', dest='col2', action='store_false')
parser.add_argument('-3', dest='col3', action='store_false')
parser.add_argument('file1', type=argparse.FileType('r'))
parser.add_argument('file2', type=argparse.FileType('r'))

args = parser.parse_args()
column_flag = [None, args.col1, args.col2, args.col3]

for line, col in comparelines(args.file1, args.file2):
    if column_flag[col]:
        print('{}{}'.format('\t'*(col-1), line.rstrip()))

Solution in Pascal: github