Entab And Detab

May 6, 2011

Two of my favorite programming textbooks are Software Tools and Software Tools in Pascal, both by Brian W. Kernighan and P. J. Plauger. Our entab and detab programs are based on the corresponding programs in their books. We begin, as Kernighan and Plauger did, with detab. Here is their pseudo-code:

col := 1 while (getc(c) <> ENDFILE) if (c = TAB) print one or more blanks and update col until tabpos(col) else if (c = NEWLINE) putc(c) col := 1 else putc(c) col := col + 1

Our Scheme version of the program is similar. We implement the loop “print one or more blanks” by a recursive call to loop that doesn’t read an additional character:

(define (detab n file-name) (with-input-from-file file-name (lambda () (let loop ((c (read-char)) (col 1)) (when (not (eof-object? c)) (cond ((char=? c #\tab) (display #\space) (if (zero? (modulo col n)) (loop (read-char) (+ col 1)) (loop c (+ col 1)))) ((char=? c #\newline) (display c) (loop (read-char) 1)) (else (display c) (loop (read-char) (+ col 1)))))))))

Regarding entab, Kernighan and Plauger give the following advice:

An easy way for entab to keep track of the blanks is to use another variable newcol that moves away from col as blanks are encountered. Whenever a tab is output, col is made to catch up to newcol. Then, when a non-blank character is encountered, if col is less than newcol there are excess blanks accumulated (not enough to be replaced by a tab) which must be output before the character can be.

Their pseudo-code is shown below:

col := 1 repeat while (getc(c) = BLANK) { collect blanks } if (at tab stop) print a tab while (any blanks left over) put them out { c is now ENDFILE or non-blank } if (c <> ENDFILE) putc(c) if (c = NEWLINE) col := 1 else col := col + 1 until (c = ENDFILE)

That’s a little bit harder to translate to Scheme, for two reasons: first, Scheme doesn’t have a repeat … until control structure; second, Scheme handles the end-of-file marker out-of-band, so there is no direct comparison between characters and the end-of-file marker. The biggest consequence is that we might have work to do after reading the end-of-file marker, in the case where there are some unprocessed space characters remaining. Here’s our code:

(define (entab n file-name) (with-input-from-file file-name (lambda () (let loop ((c (read-char)) (col 1) (newcol 1)) (cond ((eof-object? c) (when (< col newcol) (display #\space) (loop c (+ col 1) newcol))) ((char=? c #\space) (if (zero? (modulo newcol n)) (begin (display #\tab) (loop (read-char) (+ newcol 1) (+ newcol 1))) (loop (read-char) col (+ newcol 1)))) ((< col newcol) (display #\space) (loop c (+ col 1) newcol)) ((char=? c #\newline) (display c) (loop (read-char) 1 1)) (else (display c) (loop (read-char) (+ col 1) (+ newcol 1))))))))

The code is reproduced at http://programmingpraxis.codepad.org/N44f2cDA.

By the way, it might also be useful to have a program like detab that writes html-style   non-breaking spaces — you could use it to post your solution in the comments below!

Posted by programmingpraxis

Filed in Exercises

10 Comments »

10 Responses to “Entab And Detab”

Remco Niemeijer said

May 6, 2011 at 10:55 AM

My Haskell solution (see http://bonsaicode.wordpress.com/2011/05/06/programming-praxis-entab-and-detab/ for a version with comments):

import Text.Regex

detab :: Int -> String -> String
detab w s = subRegex (mkRegex "\t") s (replicate w ' ')

entab :: Int -> String -> String
entab w = unlines . map f . lines where
    f s = replicate tabs '\t' ++ replicate spaces ' ' ++ line where
        (indent, line) = span (`elem` " \t") s
        (tabs, spaces) = divMod (sum $ map width indent) w
    width c = if c == '\t' then w else 1

Axio said
May 6, 2011 at 1:14 PM
Remko: you change all tabs/spaces. You should only consider the ones at the head of a line. And they may be mixed.
Axio said
May 6, 2011 at 1:15 PM
“Remco”, sorry for the spelling error.
Remco Niemeijer said
May 6, 2011 at 3:10 PM
Axio: Correct, detab changes all tabs, since this is the behaviour of the provided solution. entab only process spaces at the start of the line. Mixed spaces and tabs are already handled correctly.
Axio said
May 6, 2011 at 3:35 PM
Gambit-C Scheme, and some macros inspired by Common Lisp…
Not the most beautiful code, and no magic involved.
Will handle mixed tabs and spaces on same line, and stop at the first non-space-nor-tab character.
Procedures to apply to each line of a loaded file.

I think that’s pretty much it…

(define *tab-width* 4)
(define (flush seen) (unless (zero? seen) (for-each (lambda (x) (display " ")) (iota 1 seen)))) (define (entab-line line #!optional (tab-width *tab-width*)) (let ((sl (string-length line))) (let loop ((pos 0) (seen 0)) (if (= pos sl) (flush seen) (case (string-ref line pos) ((#\space) (if (= seen (- *tab-width* 1)) (begin (display #\tab) (loop (1+ pos) 0)) (loop (1+ pos) (1+ seen)))) ((#\tab) (flush seen) (display #\tab) (loop (1+ pos) 0)) (else (flush seen) (display (substring line pos sl))))))))
(define (detab-line line #!optional (tab-width *tab-width*)) (let ((sl (string-length line))) (let loop ((pos 0)) (unless (= pos sl) (case (string-ref line pos) ((#\space) (display " ") (loop (1+ pos))) ((#\tab) (flush tab-width) (loop (1+ pos))) (else (display (substring line pos sl))))))))
Axio said
May 6, 2011 at 3:38 PM
With better indentation, hopefully.
(define *tab-width* 4) ; (define (flush seen) (unless (zero? seen) (for-each (lambda (x) (display " ")) (iota 1 seen)))) ; (define (entab-line line #!optional (tab-width *tab-width*)) (let ((sl (string-length line))) (let loop ((pos 0) (seen 0)) (if (= pos sl) (flush seen) (case (string-ref line pos) ((#\space) (if (= seen (- *tab-width* 1)) (begin (display #\tab) (loop (1+ pos) 0)) (loop (1+ pos) (1+ seen)))) ((#\tab) (flush seen) (display #\tab) (loop (1+ pos) 0)) (else (flush seen) (display (substring line pos sl)))))))) ; (define (detab-line line #!optional (tab-width *tab-width*)) (let ((sl (string-length line))) (let loop ((pos 0)) (unless (= pos sl) (case (string-ref line pos) ((#\space) (display " ") (loop (1+ pos))) ((#\tab) (flush tab-width) (loop (1+ pos))) (else (display (substring line pos sl))))))))

Lautaro Pecile said

May 6, 2011 at 7:42 PM

class Start(object):
    def __init__(self, machine):
        self.machine = machine
    def __call__(self, c):
        if c == '\t':
            return c
        elif c == ' ':
            self.machine.spaces += 1
            if self.machine.spaces % self.machine.tablen == 0:
                self.machine.spaces = 0
                return '\t'
            return ''
        else:
            self.machine.state = End()
            return ' ' * self.machine.spaces + c

        
class End(object):
    def __call__(self, c):
        return c


class Machine(object):
    """
    A state machine. Transforms spaces into tabs until it finds 
        the first non whitespace character.
    There are two states in this machine, represented by the
        classes Start and End.
    """
    def __init__(self, tablen=8):
        self.state = Start(self)
        self.spaces = 0
        self.tablen = tablen

    def __call__(self, c):
        return self.state(c)


def entab(s, tablen=8):
    m = Machine(tablen)
    return ''.join(map(m, s))


def detab(s, tablen=8):
    return s.expandtabs(tablen)

arturasl said

May 8, 2011 at 10:10 AM

My solution in C:

#include      <string.h>
#include      <stdio.h>

int main(int argc, char **argv) {
	int bEntab = 0, nSize = 4;
	int chCur, bStartOfLine = 1, i, nSpaces = 0;

	while (--argc) {
		++argv;
		bEntab = (strcmp(*argv, "e") == 0);
	}

	while ((chCur = getchar()) != EOF) {
		if (bEntab == 1 && chCur == ' ' && bStartOfLine == 1) {
			++nSpaces;

			if (nSpaces == nSize) {
				putchar('\t');
				nSpaces = 0;
			}
		} else if (bEntab == 0 && chCur == '\t' && bStartOfLine == 1) {
			for (i = 0; i < nSize; ++i) {
				putchar(' ');
			}
		} else if (chCur == '\n') {
			putchar(chCur);
			bStartOfLine = 1;
			nSpaces = 0;
		} else {
			putchar(chCur);
			bStartOfLine = 0;
		}
	}

	return 0;
}

The C Programming Language (K&R) Chapter 5 Ex 10-20 « 盲頭烏蠅 said
August 5, 2011 at 5:02 PM
[…] – entab and detab are used to handle problems on copy-and-paste from text files (ref) […]
siri said
June 1, 2012 at 10:39 AM
Write a program detab that replaces tabs in the input with the proper number of blanks to space to the next tab stop. Assume a fixed set of tab stops, say every n columns. Should n be a variable or a symbolic parameter?

S	M	T	W	T	F	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Programming Praxis

Entab And Detab

May 6, 2011

10 Responses to “Entab And Detab”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Entab And Detab

May 6, 2011

Share this:

Related

10 Responses to “Entab And Detab”

Leave a comment

Categories

Archives

Archives