Diff

June 8, 2010

The unix program diff identifies differences between text files; it is most useful for comparing two versions of a program.

Given the longest common subsequence between two files, which we computed in a previous exercise, it is easy to compute the diff between the two files; the diff is just those lines that aren’t part of the lcs.

Your task is to write a program that finds the differences between two files. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

2 Responses to “Diff”

  1. […] Praxis – Diff By Remco Niemeijer In today’s Programming Praxis exercise our task is to write a diff command line tool. Let’s get started, […]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2010/06/08/programming-praxis-diff/ for a version with comments):

    import Data.List.LCS.HuntSzymanski
    
    data Change = D | A | C
    
    linenum :: (Int, Int) -> String
    linenum (s, e) = if s == e then show s else show s ++ "," ++ show e
    
    header :: (Int, Int) -> String -> (Int, Int) -> IO ()
    header l op r = putStrLn $ linenum l ++ op ++ linenum r
    
    section :: Char -> [String] -> IO ()
    section c = mapM_ (\s -> putStrLn $ c:' ':s)
    
    diff :: String -> String -> IO ()
    diff xs ys = f 0 0 (lines xs) (lines ys) where 
        f n1 n2 = g where
            g [] b  = change A [] b
            g a  [] = change D a []
            g a  b  = case lcs a b of
                []    -> change C a b
                (d:_) -> case (head a == d, head b == d) of
                    (True, True) -> rec 1 1
                    (True, _   ) -> change A q1 q2 >> rec len1 len2
                    (_   , True) -> change D q1 q2 >> rec len1 len2
                    _            -> change C q1 q2 >> rec len1 len2
                    where [q1, q2] = map (takeWhile (/= d)) [a, b]
                          [len1, len2] = map length [q1, q2]
                          rec l r = f (n1+l) (n2+r) (drop l a) (drop r b)
    
            change D a _ = header (n1+1, n1+length a) "d" (n2, n2) >>
                           section '<' a
            change A _ b = header (n1, n1) "a" (n2+1, n2 + length b) >>
                           section '>' b
            change C a b = header (n1+1, n1+length a) "c" (n2+1, n2+length b) >>
                           section '<' a >> putStrLn "---" >> section '>' b
    

Leave a comment