Comm

May 10, 2011

We have today another exercise in our continuing series of Unix V7 commands:

NAME

comm — select or reject lines common to two sorted files

SYNOPSIS

comm [ -[123] ] file1 file2

DESCRIPTION

Comm reads file1 and file2, which should be ordered in ASCII collating sequence, and produces a three column output: lines only in file1; lines only in file2; and lines in both files. The filename ‘-’ means the standard input. Flags 1, 2, or 3 suppress printing of the corresponding column. Thus comm -12 prints only the lines common to the two files; comm -23 prints only lines in the first file but not in the second; comm -123 is a no-op.
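To make the three columns concrete, here is a minimal Python sketch of the classification step the description implies. It is only an illustration under stated assumptions, not the suggested solution: the name `comm`, its `suppress` parameter, and the in-memory lists are all hypothetical, and the one-tab-per-column indentation in the demo loop is just one plausible layout. Python compares strings by code point, which agrees with ASCII order for ASCII input.

```python
def comm(lines1, lines2, suppress=()):
    """Merge two sorted sequences and yield (column, line) pairs.

    column is 1 (line only in lines1), 2 (line only in lines2) or
    3 (line in both); columns listed in `suppress` are skipped.
    """
    i = j = 0
    while i < len(lines1) or j < len(lines2):
        if j == len(lines2) or (i < len(lines1) and lines1[i] < lines2[j]):
            # lines2 is exhausted, or the smaller line comes from lines1
            col, line = 1, lines1[i]
            i += 1
        elif i == len(lines1) or lines2[j] < lines1[i]:
            # lines1 is exhausted, or the smaller line comes from lines2
            col, line = 2, lines2[j]
            j += 1
        else:
            # the current lines are equal: consume one from each input
            col, line = 3, lines1[i]
            i += 1
            j += 1
        if col not in suppress:
            yield col, line


# Small demonstration with made-up data; the tab indentation is an assumption.
a = ["apple", "banana", "cherry"]
b = ["banana", "cherry", "date"]
for col, line in comm(a, b):
    print("\t" * (col - 1) + line)
```

With these two lists, apple lands in column 1, banana and cherry in column 3, and date in column 2. Passing `suppress=(1, 2)` leaves only the common lines, which mirrors comm -12.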

Your task is to implement the comm command. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

3 Responses to “Comm”

  1. My Haskell solution (see http://bonsaicode.wordpress.com/2011/05/10/programming-praxis-comm/ for a version with comments):

    import Control.Monad
    import System.Environment
    import Text.Printf
    import System.IO
    import qualified System.IO.Strict as SIO
    import GHC.IO.Handle
    
    comm :: (Num b, Ord a) => [b] -> [a] -> [a] -> [(a, b)]
    comm flags zs = filter ((`notElem` flags) . snd) . f zs where
        f xs     []     = map (flip (,) 1) xs
        f []     ys     = map (flip (,) 2) ys
        f (x:xs) (y:ys) = case compare x y of 
            LT -> (x,1) : f xs     (y:ys)
            GT -> (y,2) : f (x:xs) ys
            EQ -> (x,3) : f xs     ys
    
    columns :: [(String, Int)] -> IO ()
    columns xs = let width = maximum (map (length . fst) xs) + 2 in
        mapM_ (\(s,c) -> printf "%*s%-*s\n" ((c - 1) * width) "" width s) xs
    
    main :: IO ()
    main = do args <- getArgs
              columns =<< case args of
                  (('-':p:ps):fs) -> go (map (read . return) (p:ps)) fs
                  fs              -> go [] fs
        where go args ~[f1, f2] = liftM2 (comm args) (file f1) (file f2)
              file src = fmap lines $ if src == "-" then newStdIn
                                                    else readFile src
              newStdIn = catch (SIO.hGetContents =<< hDuplicate stdin)
                               (\_ -> return [])
    
  2. Mike said
    # -*- coding: cp1252 -*-
    import argparse
    
    def comparelines(f1,f2):
        line1 = f1.readline()
        line2 = f2.readline()
    
        while line1 or line2:
            if line1 and (line1 < line2 or line2==''):
                line, col = line1, 1
    
            elif line2 and (line2 < line1 or line1==''):
                line, col = line2, 2
    
            else:
                line, col = line2, 3
    
            yield line, col
    
            if col != 1: line2 = f2.readline()
            if col != 2: line1 = f1.readline()
    
    
    parser = argparse.ArgumentParser(
        description="select or reject lines common to two sorted files.",
        epilog="""\
    Reads file1 and file2, which should be ordered in ASCII collating sequence,
    and produces a three column output: lines only in file1; lines only in file2;
    and lines in both files. The filename ‘-’ means the standard input.
    Flags 1, 2, or 3 suppress printing of the corresponding column.  Thus,
        %(prog)s -12 prints only the lines common to the two files;
        %(prog)s -23 prints only lines in the first file but not in the second;
        %(prog)s -123 is a no-op."""
        )
    
    parser.add_argument('-1', dest='col1', action='store_false')
    parser.add_argument('-2', dest='col2', action='store_false')
    parser.add_argument('-3', dest='col3', action='store_false')
    parser.add_argument('file1', type=argparse.FileType('r'))
    parser.add_argument('file2', type=argparse.FileType('r'))
    
    
    args = parser.parse_args()
    column_flag = [None, args.col1, args.col2, args.col3]
    
    for line, col in comparelines(args.file1, args.file2):
        if column_flag[col]:
            print('{}{}'.format('\t'*(col-1), line.rstrip()))
    
    
  3. arturasl said

    Solution in Pascal: GitHub
