Diff

June 8, 2010

Here’s our version:

(define (diff file1 file2)
  (let* ((f1 (read-lines file1))
         (f2 (read-lines file2))
         (ds (lcs string=? f1 f2)))
    (let loop ((f1 f1) (f2 f2) (ds ds) (n1 0) (n2 0))
      (cond ((null? f1)
              (when (pair? f2)
                (display-header n1 n1 "a" (+ n2 1) (+ n2 (length f2)))
                (for-each (display-line ">") f2)))
            ((null? f2)
              (when (pair? f1)
                (display-header (+ n1 1) (+ n1 (length f1)) "d" n2 n2)
                (for-each (display-line "<") f1)))
            ((null? ds)
              (let ((len1 (length f1)) (len2 (length f2)))
                (display-header (+ n1 1) (+ n1 len1) "c" (+ n2 1) (+ n2 len2))
                (for-each (display-line "<") f1)
                (display "---") (newline)
                (for-each (display-line ">") f2)))
            ((and (string=? (car f1) (car ds)) (string=? (car f2) (car ds)))
              (loop (cdr f1) (cdr f2) (cdr ds) (+ n1 1) (+ n2 1)))
            ((string=? (car f1) (car ds))
              (let* ((xs (take-while-not-string= (car ds) f2)) (len (length xs)))
                (display-header n1 n1 "a" (+ n2 1) (+ n2 len))
                (for-each (display-line ">") xs)
                (loop f1 (drop len f2) ds n1 (+ n2 len))))
            ((string=? (car f2) (car ds))
              (let* ((xs (take-while-not-string= (car ds) f1)) (len (length xs)))
                (display-header (+ n1 1) (+ n1 len) "d" n2 n2)
                (for-each (display-line "<") xs)
                (loop (drop len f1) f2 ds (+ n1 len) n2)))
             (else (let* ((x1 (take-while-not-string= (car ds) f1))
                          (x2 (take-while-not-string= (car ds) f2))
                          (len1 (length x1)) (len2 (length x2)))
                     (display-header (+ n1 1) (+ n1 len1) "c" (+ n2 1) (+ n2 len2))
                     (for-each (display-line "<") x1)
                     (display "---") (newline)
                     (for-each (display-line ">") x2)
                     (loop (drop len1 f1) (drop len2 f2) ds (+ n1 len1) (+ n2 len2))))))))

The cond expression has seven clauses. The first three clauses terminate the recursion as they identify lines added at the end of the file, lines deleted at the end of the file, and lines that are changed at the end of the file. The fourth clause identifies matching lines that are part of the longest common subsequence; it advances both files and recurs without writing any output. The last three clauses identify lines added to the first file, lines deleted from the second file, and lines that changed between the files; each advances the appropriate file, then recurs. All of the clauses except the fourth, which identifies matching lines, produce output before they recur.

Output is written by two helper functions, display-header, which writes the summary line including line numbers at the beginning of each set of differences, and display-line, which writes a single line with the leading less-than or greater-than sign. Note that display-line is curried, so it can be called easily from for-each.

(define (display-header a b c d e)
  (display a) (when (not (= a b)) (display ",") (display b)) (display c)
  (display d) (when (not (= d e)) (display ",") (display e)) (newline))

(define-curried (display-line c s)
  (display c) (display " ") (display s) (newline))

Function take-while-not-string= extracts an operation that is performed four times:

(define (take-while-not-string= s1 s2)
  (take-while (lambda (s) (not (string=? s1 s))) s2))

Here’s the output from running diff on two files, f1 that contains the letters P R O G R A M M I N G on successive lines and f2 that contains the letters P R A X I S on successive lines; the longest common subsequence is P R A I:

3,5d2
< O
< G
< R
7,8c4
< M
< M
---
> X
10,11c6
< N
< G
---
> S

We used take-while, drop, read-line, the matrix operators, curried-lambda and define-curried from the Standard Prelude, wrote a function read-lines that isn’t shown here, and used lcs from a previous exercise. You can see the entire program, including read-lines, at http://programmingpraxis.codepad.org/tNLiXFvE.

Advertisement

Pages: 1 2

2 Responses to “Diff”

  1. […] Praxis – Diff By Remco Niemeijer In today’s Programming Praxis exercise our task is to write a diff command line tool. Let’s get started, […]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2010/06/08/programming-praxis-diff/ for a version with comments):

    import Data.List.LCS.HuntSzymanski
    
    data Change = D | A | C
    
    linenum :: (Int, Int) -> String
    linenum (s, e) = if s == e then show s else show s ++ "," ++ show e
    
    header :: (Int, Int) -> String -> (Int, Int) -> IO ()
    header l op r = putStrLn $ linenum l ++ op ++ linenum r
    
    section :: Char -> [String] -> IO ()
    section c = mapM_ (\s -> putStrLn $ c:' ':s)
    
    diff :: String -> String -> IO ()
    diff xs ys = f 0 0 (lines xs) (lines ys) where 
        f n1 n2 = g where
            g [] b  = change A [] b
            g a  [] = change D a []
            g a  b  = case lcs a b of
                []    -> change C a b
                (d:_) -> case (head a == d, head b == d) of
                    (True, True) -> rec 1 1
                    (True, _   ) -> change A q1 q2 >> rec len1 len2
                    (_   , True) -> change D q1 q2 >> rec len1 len2
                    _            -> change C q1 q2 >> rec len1 len2
                    where [q1, q2] = map (takeWhile (/= d)) [a, b]
                          [len1, len2] = map length [q1, q2]
                          rec l r = f (n1+l) (n2+r) (drop l a) (drop r b)
    
            change D a _ = header (n1+1, n1+length a) "d" (n2, n2) >>
                           section '<' a
            change A _ b = header (n1, n1) "a" (n2+1, n2 + length b) >>
                           section '>' b
            change C a b = header (n1+1, n1+length a) "c" (n2+1, n2+length b) >>
                           section '<' a >> putStrLn "---" >> section '>' b
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: