Entab And Detab

May 6, 2011

Two of my favorite programming textbooks are Software Tools and Software Tools in Pascal, both by Brian W. Kernighan and P. J. Plauger. Our entab and detab programs are based on the corresponding programs in their books. We begin, as Kernighan and Plauger did, with detab. Here is their pseudo-code:

col := 1
while (getc(c) <> ENDFILE)
    if (c = TAB)
        print one or more blanks and update col
        until tabpos(col)
    else if (c = NEWLINE)
        putc(c)
        col := 1
    else
        putc(c)
        col := col + 1

Our Scheme version of the program is similar. We implement the loop “print one or more blanks” by a recursive call to loop that doesn’t read an additional character:

(define (detab n file-name)
  (with-input-from-file file-name
    (lambda ()
      (let loop ((c (read-char)) (col 1))
        (when (not (eof-object? c))
          (cond ((char=? c #\tab)
                  (display #\space)
                  (if (zero? (modulo col n))
                      (loop (read-char) (+ col 1))
                      (loop c (+ col 1))))
                ((char=? c #\newline)
                  (display c)
                  (loop (read-char) 1))
                (else (display c)
                      (loop (read-char) (+ col 1)))))))))

Regarding entab, Kernighan and Plauger give the following advice:

An easy way for entab to keep track of the blanks is to use another variable newcol that moves away from col as blanks are encountered. Whenever a tab is output, col is made to catch up to newcol. Then, when a non-blank character is encountered, if col is less than newcol there are excess blanks accumulated (not enough to be replaced by a tab) which must be output before the character can be.

Their pseudo-code is shown below:

col := 1
repeat
    while (getc(c) = BLANK) { collect blanks }
        if (at tab stop)
            print a tab
    while (any blanks left over)
        put them out
    { c is now ENDFILE or non-blank }
    if (c <> ENDFILE)
        putc(c)
        if (c = NEWLINE)
            col := 1
        else
            col := col + 1
until (c = ENDFILE)

That’s a little bit harder to translate to Scheme, for two reasons: first, Scheme doesn’t have a repeatuntil control structure; second, Scheme handles the end-of-file marker out-of-band, so there is no direct comparison between characters and the end-of-file marker. The biggest consequence is that we might have work to do after reading the end-of-file marker, in the case where there are some unprocessed space characters remaining. Here’s our code:

(define (entab n file-name)
  (with-input-from-file file-name
    (lambda ()
      (let loop ((c (read-char)) (col 1) (newcol 1))
        (cond ((eof-object? c)
                (when (< col newcol)
                  (display #\space)
                  (loop c (+ col 1) newcol)))
              ((char=? c #\space)
                (if (zero? (modulo newcol n))
                    (begin (display #\tab)
                           (loop (read-char) (+ newcol 1) (+ newcol 1)))
                    (loop (read-char) col (+ newcol 1))))
              ((< col newcol)
                (display #\space)
                (loop c (+ col 1) newcol))
              ((char=? c #\newline)
                (display c)
                (loop (read-char) 1 1))
              (else (display c)
                    (loop (read-char) (+ col 1) (+ newcol 1))))))))

The code is reproduced at http://programmingpraxis.codepad.org/N44f2cDA.

By the way, it might also be useful to have a program like detab that writes html-style &nbsp; non-breaking spaces — you could use it to post your solution in the comments below!

Pages: 1 2

10 Responses to “Entab And Detab”

  1. My Haskell solution (see http://bonsaicode.wordpress.com/2011/05/06/programming-praxis-entab-and-detab/ for a version with comments):

    import Text.Regex
    
    detab :: Int -> String -> String
    detab w s = subRegex (mkRegex "\t") s (replicate w ' ')
    
    entab :: Int -> String -> String
    entab w = unlines . map f . lines where
        f s = replicate tabs '\t' ++ replicate spaces ' ' ++ line where
            (indent, line) = span (`elem` " \t") s
            (tabs, spaces) = divMod (sum $ map width indent) w
        width c = if c == '\t' then w else 1
    
  2. Axio said

    Remko: you change all tabs/spaces. You should only consider the ones at the head of a line. And they may be mixed.

  3. Axio said

    “Remco”, sorry for the spelling error.

  4. Axio: Correct, detab changes all tabs, since this is the behaviour of the provided solution. entab only process spaces at the start of the line. Mixed spaces and tabs are already handled correctly.

  5. Axio said

    Gambit-C Scheme, and some macros inspired by Common Lisp…
    Not the most beautiful code, and no magic involved.
    Will handle mixed tabs and spaces on same line, and stop at the first non-space-nor-tab character.
    Procedures to apply to each line of a loaded file.

    I think that’s pretty much it…


    (define *tab-width* 4)

    (define (flush seen)
    (unless (zero? seen)
    (for-each (lambda (x) (display " ")) (iota 1 seen))))

    (define (entab-line line #!optional (tab-width *tab-width*))
    (let ((sl (string-length line)))
    (let loop ((pos 0)
    (seen 0))
    (if (= pos sl)
    (flush seen)
    (case (string-ref line pos)
    ((#\space)
    (if (= seen (- *tab-width* 1))
    (begin
    (display #\tab)
    (loop (1+ pos)
    0))
    (loop (1+ pos)
    (1+ seen))))
    ((#\tab)
    (flush seen)
    (display #\tab)
    (loop (1+ pos)
    0))
    (else
    (flush seen)
    (display (substring line pos sl))))))))

    (define (detab-line line #!optional (tab-width *tab-width*))
    (let ((sl (string-length line)))
    (let loop ((pos 0))
    (unless (= pos sl)
    (case (string-ref line pos)
    ((#\space)
    (display " ")
    (loop (1+ pos)))
    ((#\tab)
    (flush tab-width)
    (loop (1+ pos)))
    (else
    (display (substring line pos sl))))))))

  6. Axio said

    With better indentation, hopefully.

    (define *tab-width* 4)
    ;
    (define (flush seen)
      (unless (zero? seen)
        (for-each (lambda (x) (display " ")) (iota 1 seen))))
    ;
    (define (entab-line line #!optional (tab-width *tab-width*))
      (let ((sl (string-length line)))
        (let loop ((pos 0)
                   (seen 0))
          (if (= pos sl)
            (flush seen)
            (case (string-ref line pos)
              ((#\space)
               (if (= seen (- *tab-width* 1))
                 (begin
                   (display #\tab)
                   (loop (1+ pos)
                         0))
                 (loop (1+ pos)
                       (1+ seen))))
              ((#\tab)
               (flush seen)
               (display #\tab)
               (loop (1+ pos)
                     0))
              (else
                (flush seen)
                (display (substring line pos sl))))))))
    ;
    (define (detab-line line #!optional (tab-width *tab-width*))
      (let ((sl (string-length line)))
        (let loop ((pos 0))
          (unless (= pos sl)
            (case (string-ref line pos)
              ((#\space)
               (display " ")
               (loop (1+ pos)))
              ((#\tab)
               (flush tab-width)
               (loop (1+ pos)))
              (else
                (display (substring line pos sl))))))))

  7. Lautaro Pecile said
    class Start(object):
        def __init__(self, machine):
            self.machine = machine
        def __call__(self, c):
            if c == '\t':
                return c
            elif c == ' ':
                self.machine.spaces += 1
                if self.machine.spaces % self.machine.tablen == 0:
                    self.machine.spaces = 0
                    return '\t'
                return ''
            else:
                self.machine.state = End()
                return ' ' * self.machine.spaces + c
    
            
    class End(object):
        def __call__(self, c):
            return c
    
    
    class Machine(object):
        """
        A state machine. Transforms spaces into tabs until it finds 
            the first non whitespace character.
        There are two states in this machine, represented by the
            classes Start and End.
        """
        def __init__(self, tablen=8):
            self.state = Start(self)
            self.spaces = 0
            self.tablen = tablen
    
        def __call__(self, c):
            return self.state(c)
    
    
    def entab(s, tablen=8):
        m = Machine(tablen)
        return ''.join(map(m, s))
    
    
    def detab(s, tablen=8):
        return s.expandtabs(tablen)
    
  8. arturasl said

    My solution in C:

    #include      <string.h>
    #include      <stdio.h>
    
    int main(int argc, char **argv) {
    	int bEntab = 0, nSize = 4;
    	int chCur, bStartOfLine = 1, i, nSpaces = 0;
    
    	while (--argc) {
    		++argv;
    		bEntab = (strcmp(*argv, "e") == 0);
    	}
    
    	while ((chCur = getchar()) != EOF) {
    		if (bEntab == 1 && chCur == ' ' && bStartOfLine == 1) {
    			++nSpaces;
    
    			if (nSpaces == nSize) {
    				putchar('\t');
    				nSpaces = 0;
    			}
    		} else if (bEntab == 0 && chCur == '\t' && bStartOfLine == 1) {
    			for (i = 0; i < nSize; ++i) {
    				putchar(' ');
    			}
    		} else if (chCur == '\n') {
    			putchar(chCur);
    			bStartOfLine = 1;
    			nSpaces = 0;
    		} else {
    			putchar(chCur);
    			bStartOfLine = 0;
    		}
    	}
    
    	return 0;
    }
    
  9. […] – entab and detab are used to handle problems on copy-and-paste from text files (ref) […]

  10. siri said

    Write a program detab that replaces tabs in the input with the proper number of blanks to space to the next tab stop. Assume a fixed set of tab stops, say every n columns. Should n be a variable or a symbolic parameter?

Leave a comment