Entab And Detab
May 6, 2011
Two of my favorite programming textbooks are Software Tools and Software Tools in Pascal, both by Brian W. Kernighan and P. J. Plauger. Our entab and detab programs are based on the corresponding programs in their books. We begin, as Kernighan and Plauger did, with detab. Here is their pseudo-code:
col := 1
while (getc(c) <> ENDFILE)
    if (c = TAB)
        print one or more blanks and update col
        until tabpos(col)
    else if (c = NEWLINE)
        putc(c)
        col := 1
    else
        putc(c)
        col := col + 1
Our Scheme version of the program is similar. We implement the loop “print one or more blanks” by a recursive call to loop that doesn’t read an additional character:
(define (detab n file-name)
  (with-input-from-file file-name
    (lambda ()
      (let loop ((c (read-char)) (col 1))
        (when (not (eof-object? c))
          (cond ((char=? c #\tab)
                  (display #\space)
                  (if (zero? (modulo col n))
                      (loop (read-char) (+ col 1))
                      (loop c (+ col 1))))
                ((char=? c #\newline)
                  (display c)
                  (loop (read-char) 1))
                (else (display c)
                  (loop (read-char) (+ col 1)))))))))
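To see the column arithmetic without creating a file, the same loop can be run over a string. This is only a sketch: detab-string is a hypothetical helper that is not part of the program above, and it assumes tab stops every n columns, just as detab does.

```scheme
;; Sketch only: detab-string is a hypothetical name, not from the post.
;; It expands tabs in str, with tab stops every n columns, and returns
;; the result as a new string instead of writing to standard output.
(define (detab-string n str)
  (let loop ((cs (string->list str)) (col 1) (out '()))
    (cond ((null? cs)
            (list->string (reverse out)))
          ((char=? (car cs) #\tab)
            ;; Emit one blank per step. The tab is consumed only when
            ;; col is the last column before a tab stop (col divisible
            ;; by n); otherwise the same tab is processed again.
            (loop (if (zero? (modulo col n)) (cdr cs) cs)
                  (+ col 1)
                  (cons #\space out)))
          ((char=? (car cs) #\newline)
            (loop (cdr cs) 1 (cons #\newline out)))
          (else
            (loop (cdr cs) (+ col 1) (cons (car cs) out))))))

;; With n = 4, a tab in column 2 becomes three blanks,
;; so this prints "a   b" with b in column 5.
(display (detab-string 4 (string #\a #\tab #\b)))
(newline)
```

A tab in column 1 expands to n blanks, and a tab in column n expands to a single blank, which is the behaviour the modulo test gives in detab.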
Regarding entab, Kernighan and Plauger give the following advice:
An easy way for entab to keep track of the blanks is to use another variable newcol that moves away from col as blanks are encountered. Whenever a tab is output, col is made to catch up to newcol. Then, when a non-blank character is encountered, if col is less than newcol there are excess blanks accumulated (not enough to be replaced by a tab) which must be output before the character can be.
Their pseudo-code is shown below:
col := 1
repeat
    while (getc(c) = BLANK) { collect blanks }
        if (at tab stop)
            print a tab
    while (any blanks left over)
        put them out
    { c is now ENDFILE or non-blank }
    if (c <> ENDFILE)
        putc(c)
        if (c = NEWLINE)
            col := 1
        else
            col := col + 1
until (c = ENDFILE)
That’s a little bit harder to translate to Scheme, for two reasons: first, Scheme doesn’t have a repeat … until control structure; second, Scheme handles the end-of-file marker out-of-band, so there is no direct comparison between characters and the end-of-file marker. The biggest consequence is that we might have work to do after reading the end-of-file marker, in the case where there are some unprocessed space characters remaining. Here’s our code:
(define (entab n file-name)
  (with-input-from-file file-name
    (lambda ()
      (let loop ((c (read-char)) (col 1) (newcol 1))
        (cond ((eof-object? c)
                (when (< col newcol)
                  (display #\space)
                  (loop c (+ col 1) newcol)))
              ((char=? c #\space)
                (if (zero? (modulo newcol n))
                    (begin (display #\tab)
                           (loop (read-char) (+ newcol 1) (+ newcol 1)))
                    (loop (read-char) col (+ newcol 1))))
              ((< col newcol)
                (display #\space)
                (loop c (+ col 1) newcol))
              ((char=? c #\newline)
                (display c)
                (loop (read-char) 1 1))
              (else (display c)
                (loop (read-char) (+ col 1) (+ newcol 1))))))))
The code is reproduced at http://programmingpraxis.codepad.org/N44f2cDA.
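The way col and newcol drift apart and catch up is easier to follow on a string than on a file. The sketch below uses a hypothetical entab-string helper (not part of the program above) that follows the same col/newcol bookkeeping; the one liberty it takes is to fold the end-of-input case and the non-blank case into a single "flush pending blanks" clause.

```scheme
;; Sketch only: entab-string is a hypothetical name, not from the post.
;; It replaces runs of blanks that reach a tab stop (every n columns)
;; with a tab and returns the result as a new string.
(define (entab-string n str)
  (let loop ((cs (string->list str)) (col 1) (newcol 1) (out '()))
    (cond ((and (pair? cs) (char=? (car cs) #\space))
            (if (zero? (modulo newcol n))
                ;; The pending blanks reach a tab stop: emit one tab
                ;; and let col catch up with newcol.
                (loop (cdr cs) (+ newcol 1) (+ newcol 1) (cons #\tab out))
                ;; Otherwise just remember the blank by advancing newcol.
                (loop (cdr cs) col (+ newcol 1) out)))
          ((< col newcol)
            ;; Blanks left over before a non-blank or the end of input.
            (loop cs (+ col 1) newcol (cons #\space out)))
          ((null? cs)
            (list->string (reverse out)))
          ((char=? (car cs) #\newline)
            (loop (cdr cs) 1 1 (cons #\newline out)))
          (else
            (loop (cdr cs) (+ col 1) (+ newcol 1) (cons (car cs) out))))))

;; With n = 4, the three blanks after "a" end at a tab stop,
;; so the result is a, tab, b.
(write (entab-string 4 "a   b"))
(newline)
```

A single blank between two words never reaches a tab stop unless it sits in column n, 2n, and so on, so ordinary prose passes through mostly unchanged.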
By the way, it might also be useful to have a program like detab that writes html-style non-breaking spaces — you could use it to post your solution in the comments below!
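One possible shape for such a program is sketched below. The name detab-nbsp is hypothetical, and the sketch assumes that ordinary blanks should also become non-breaking spaces so that indentation survives in an HTML comment form; it makes no attempt to escape characters such as < or &.

```scheme
;; Sketch only: detab-nbsp is a hypothetical name, not from the post.
;; Like detab, but every blank it would print becomes "&nbsp;".
;; Assumption: literal blanks in the input are converted as well.
(define (detab-nbsp n str)
  (let loop ((cs (string->list str)) (col 1) (out '()))
    (cond ((null? cs)
            (apply string-append (reverse out)))
          ((char=? (car cs) #\tab)
            ;; Same tab-stop rule as detab: keep the tab until col is
            ;; the last column before a tab stop.
            (loop (if (zero? (modulo col n)) (cdr cs) cs)
                  (+ col 1)
                  (cons "&nbsp;" out)))
          ((char=? (car cs) #\space)
            (loop (cdr cs) (+ col 1) (cons "&nbsp;" out)))
          ((char=? (car cs) #\newline)
            (loop (cdr cs) 1 (cons (string #\newline) out)))
          (else
            (loop (cdr cs) (+ col 1) (cons (string (car cs)) out))))))

;; With n = 2, a leading tab becomes two non-breaking spaces.
(display (detab-nbsp 2 (string #\tab #\x)))
(newline)
```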
My Haskell solution (see http://bonsaicode.wordpress.com/2011/05/06/programming-praxis-entab-and-detab/ for a version with comments):
Remko: you change all tabs/spaces. You should only consider the ones at the head of a line. And they may be mixed.
“Remco”, sorry for the spelling error.
Axio: Correct, detab changes all tabs, since this is the behaviour of the provided solution. entab only processes spaces at the start of the line. Mixed spaces and tabs are already handled correctly.
Gambit-C Scheme, and some macros inspired by Common Lisp…
Not the most beautiful code, and no magic involved.
Will handle mixed tabs and spaces on same line, and stop at the first non-space-nor-tab character.
Procedures to apply to each line of a loaded file.
I think that’s pretty much it…
(define *tab-width* 4)
;
(define (flush seen)
  (unless (zero? seen)
    (for-each (lambda (x) (display " ")) (iota 1 seen))))
;
(define (entab-line line #!optional (tab-width *tab-width*))
  (let ((sl (string-length line)))
    (let loop ((pos 0)
               (seen 0))
      (if (= pos sl)
          (flush seen)
          (case (string-ref line pos)
            ((#\space)
             ;; use the tab-width argument rather than the global default
             (if (= seen (- tab-width 1))
                 (begin
                   (display #\tab)
                   (loop (1+ pos)
                         0))
                 (loop (1+ pos)
                       (1+ seen))))
            ((#\tab)
             (flush seen)
             (display #\tab)
             (loop (1+ pos)
                   0))
            (else
             (flush seen)
             (display (substring line pos sl))))))))
;
(define (detab-line line #!optional (tab-width *tab-width*))
  (let ((sl (string-length line)))
    (let loop ((pos 0))
      (unless (= pos sl)
        (case (string-ref line pos)
          ((#\space)
           (display " ")
           (loop (1+ pos)))
          ((#\tab)
           (flush tab-width)
           (loop (1+ pos)))
          (else
           (display (substring line pos sl))))))))
My solution in C:
Write a program detab that replaces tabs in the input with the proper number of blanks to space to the next tab stop. Assume a fixed set of tab stops, say every n columns. Should n be a variable or a symbolic parameter?