Diff
June 8, 2010
Here’s our version:
(define (diff file1 file2)
(let* ((f1 (read-lines file1))
(f2 (read-lines file2))
(ds (lcs string=? f1 f2)))
(let loop ((f1 f1) (f2 f2) (ds ds) (n1 0) (n2 0))
(cond ((null? f1)
(when (pair? f2)
(display-header n1 n1 "a" (+ n2 1) (+ n2 (length f2)))
(for-each (display-line ">") f2)))
((null? f2)
(when (pair? f1)
(display-header (+ n1 1) (+ n1 (length f1)) "d" n2 n2)
(for-each (display-line "<") f1)))
((null? ds)
(let ((len1 (length f1)) (len2 (length f2)))
(display-header (+ n1 1) (+ n1 len1) "c" (+ n2 1) (+ n2 len2))
(for-each (display-line "<") f1)
(display "---") (newline)
(for-each (display-line ">") f2)))
((and (string=? (car f1) (car ds)) (string=? (car f2) (car ds)))
(loop (cdr f1) (cdr f2) (cdr ds) (+ n1 1) (+ n2 1)))
((string=? (car f1) (car ds))
(let* ((xs (take-while-not-string= (car ds) f2)) (len (length xs)))
(display-header n1 n1 "a" (+ n2 1) (+ n2 len))
(for-each (display-line ">") xs)
(loop f1 (drop len f2) ds n1 (+ n2 len))))
((string=? (car f2) (car ds))
(let* ((xs (take-while-not-string= (car ds) f1)) (len (length xs)))
(display-header (+ n1 1) (+ n1 len) "d" n2 n2)
(for-each (display-line "<") xs)
(loop (drop len f1) f2 ds (+ n1 len) n2)))
(else (let* ((x1 (take-while-not-string= (car ds) f1))
(x2 (take-while-not-string= (car ds) f2))
(len1 (length x1)) (len2 (length x2)))
(display-header (+ n1 1) (+ n1 len1) "c" (+ n2 1) (+ n2 len2))
(for-each (display-line "<") x1)
(display "---") (newline)
(for-each (display-line ">") x2)
(loop (drop len1 f1) (drop len2 f2) ds (+ n1 len1) (+ n2 len2))))))))
The cond
expression has seven clauses. The first three clauses terminate the recursion as they identify lines added at the end of the file, lines deleted at the end of the file, and lines that are changed at the end of the file. The fourth clause identifies matching lines that are part of the longest common subsequence; it advances both files and recurs without writing any output. The last three clauses identify lines added to the first file, lines deleted from the second file, and lines that changed between the files; each advances the appropriate file, then recurs. All of the clauses except the fourth, which identifies matching lines, produce output before they recur.
Output is written by two helper functions, display-header
, which writes the summary line including line numbers at the beginning of each set of differences, and display-line
, which writes a single line with the leading less-than or greater-than sign. Note that display-line
is curried, so it can be called easily from for-each
.
(define (display-header a b c d e)
(display a) (when (not (= a b)) (display ",") (display b)) (display c)
(display d) (when (not (= d e)) (display ",") (display e)) (newline))
(define-curried (display-line c s)
(display c) (display " ") (display s) (newline))
Function take-while-not-string=
extracts an operation that is performed four times:
(define (take-while-not-string= s1 s2)
(take-while (lambda (s) (not (string=? s1 s))) s2))
Here’s the output from running diff
on two files, f1
that contains the letters P R O G R A M M I N G on successive lines and f2
that contains the letters P R A X I S on successive lines; the longest common subsequence is P R A I:
3,5d2
< O
< G
< R
7,8c4
< M
< M
---
> X
10,11c6
< N
< G
---
> S
We used take-while, drop, read-line, the matrix operators, curried-lambda and define-curried from the Standard Prelude, wrote a function read-lines
that isn’t shown here, and used lcs
from a previous exercise. You can see the entire program, including read-lines
, at http://programmingpraxis.codepad.org/tNLiXFvE.
[…] Praxis – Diff By Remco Niemeijer In today’s Programming Praxis exercise our task is to write a diff command line tool. Let’s get started, […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/06/08/programming-praxis-diff/ for a version with comments):