Text Formatting

October 7, 2014

This isn’t hard, but there are several special cases that we have to deal with. When words is empty, the first cond clause reads another line of input and splits it into words, handling end-of-file and blank lines by itself. The second cond clause skips the empty strings that string-split returns when the input contains multiple successive space characters, and the third cond clause handles words that are longer than width by printing the pending line and then printing the long word on a line of its own. The fourth cond clause adds the next word to the current line, and the final cond clause prints a full line and continues with the rest of the text:

(define (format file-name . args)
  (let ((width (if (null? args) 60 (car args))))
    (with-input-from-file file-name (lambda ()
      (let loop ((words (list)) (line ""))
        (cond ((null? words)
                (let ((in-line (read-line)))
                  (cond ((eof-object? in-line)
                          (print-line line))
                        ((string=? in-line "")
                          (print-line line)
                          (newline)
                          (loop words ""))
                        (else (loop (string-split #\space in-line) line)))))
              ((string=? (car words) "") (loop (cdr words) line))
              ((< width (string-length (car words)))
                (print-line line)
                (display (car words)) (newline)
                (loop (cdr words) ""))
              ((< (+ (string-length line) (string-length (car words))) width)
                (loop (cdr words) (string-append line " " (car words))))
              (else (print-line line) (loop words ""))))))))
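
The format function relies on a string-split helper that is not shown on this page. Here is a minimal sketch of one possible definition, assuming the separator-first argument order seen in the call above; the body is an illustration, not necessarily the definition the program actually uses. It also shows why the second cond clause is needed: two successive spaces produce an empty string in the list of words.

```scheme
; Hypothetical stand-in for the string-split called by format.
; Assumption: the separator comes first, as in (string-split #\space in-line).
(define (string-split sep str)
  (let loop ((chars (string->list str)) (word '()) (words '()))
    (cond ((null? chars)
            ; end of input: emit the last word and restore the original order
            (reverse (cons (list->string (reverse word)) words)))
          ((char=? (car chars) sep)
            ; separator: close the current word, which may be empty
            (loop (cdr chars) '() (cons (list->string (reverse word)) words)))
          (else (loop (cdr chars) (cons (car chars) word) words)))))

(write (string-split #\space "Four  score and")) (newline)
; writes ("Four" "" "score" "and")
```

The empty string between "Four" and "score" is exactly what the second cond clause discards.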

Function format calls print-line to handle the actual printing, chopping off the space character that starts the line; it’s a separate function because it’s called in four places:

(define (print-line line)
  (when (positive? (string-length line))
    (display (substring line 1 (string-length line)))
    (newline)))
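
Because line always carries that leading space, the sum computed in the fourth cond clause of format equals the printed length of the line with the next word appended. The sketch below repeats that test in a hypothetical helper named fits? (the name is an assumption made for illustration) and applies it to the first line of the Gettysburg Address:

```scheme
; fits? is a hypothetical helper that repeats the test from the fourth
; cond clause of format; line is expected to carry its leading space.
(define (fits? line word width)
  (< (+ (string-length line) (string-length word)) width))

; 26 printed characters plus the leading space gives 27
(define line " Four score and seven years")

(display (fits? line "ago" 30)) (newline) ; 27 + 3 = 30 is not less than 30
(display (fits? line "ago" 31)) (newline) ; 30 is less than 31
```

Since the comparison is a strict less-than, a line that would be exactly width characters long is rejected, so at a width of 30 the word "ago" starts a new line.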

Here’s an example:

> (format "gettysburg.txt" 30)
Four score and seven years
ago our fathers brought forth
on this continent a new
nation, conceived in Liberty,
and dedicated to the
proposition that all men are
created equal.

Now we are engaged in a great
civil war, testing whether
that nation, or any nation,
so conceived and so
dedicated, can long endure.
We are met on a great
battle-field of that war. We
have come to dedicate a
portion of that field, as a
final resting place for those
who here gave their lives
that that nation might live.
It is altogether fitting and
proper that we should do
this.

But, in a larger sense, we
can not dedicate -- we can
not consecrate -- we can not
hallow -- this ground. The
brave men, living and dead,
who struggled here, have
consecrated it, far above our
poor power to add or detract.
The world will little note,
nor long remember what we say
here, but it can never forget
what they did here. It is for
us the living, rather, to be
dedicated here to the
unfinished work which they
who fought here so nobly
advanced. It is rather for us
to be here dedicated to the
great task remaining before
us -- that from these honored
dead we take increased
devotion to that cause for
which they gave the last full
measure of devotion -- that
we here highly resolve that
these dead shall not have
died in vain -- that this
nation, under God, shall have
a new birth of freedom -- and
that government of the
people, by the people, for
the people, shall not perish
from the earth.

You can run the program at http://programmingpraxis.codepad.org/5QN1g3TD.


3 Responses to “Text Formatting”

  1. matthew said

    Well, if no one else is going to step up to the crease, here’s some C++. It uses a simple state machine to keep track of what needs to be done; in particular, in state BREAK it inserts a paragraph break before the next text output. Multiple blank lines are treated as one, and blank lines at the start and end are ignored. Words are read from an STL stringstream with the usual iostream operations. Input and output are just done through cin and cout. It would probably be best to do these things with a pipeline of filter functions, of course.

    #include <stdlib.h>
    #include <iostream>
    #include <sstream>
    #include <string>
    
    int main(int argc, char *argv[])
    {
      int maxlen = 60;
      if (argc > 1) maxlen = strtol(argv[1],NULL,0);
      std::string line;
      int outlen = 0;
      enum { START, TEXT, BREAK } state = START;
      while (std::getline(std::cin, line)) {
        std::istringstream s(line);
        std::string word;
        bool empty = true;
        while(s >> word) {
          int wlen = word.size();
          switch (state) {
          case START:
            state = TEXT;
            break;
          case TEXT:
            if (outlen + wlen > maxlen) {
              outlen = 0;
              std::cout << "\n";
            } else {
              std::cout << " ";
            }
            break;
          case BREAK:
            outlen = 0;
            std::cout << "\n\n";
            state = TEXT;
            break;
          }
          outlen += wlen + 1;
          std::cout << word;
          empty = false;
        }
        if (empty && state == TEXT) state = BREAK;
      }
      if (state != START) std::cout << "\n";
    }
    
  2. Andras said

    Java solution. It does not handle spaces perfectly.

    
    
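    import java.io.IOException;
    import java.io.Writer;
    
    // Nested class: it belongs inside an enclosing class, with the imports above at the top of that file.
    public static class FormattedWriter extends Writer {
    
            // LS and PS are not defined in the comment as posted; these two definitions are assumptions.
            private static final String LS = System.lineSeparator();
            private static final String PS = LS + LS;
    
            private static final int DEFAULT_LINE_LENGTH = 60;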
    
            private Writer writer;
            private final StringBuilder sb = new StringBuilder();
            private final int WIDTH;
    
            public FormattedWriter(Writer writer) {
                this(writer, DEFAULT_LINE_LENGTH);
            }
    
            public FormattedWriter(Writer writer, int lineWidth) {
                this.writer = writer;
                this.WIDTH = lineWidth;
            }
    
            @Override
            public void write(char[] cbuf, int off, int len) throws IOException {
                sb.append(cbuf, off, len);
                String actual = sb.toString();
                String[] paragraphs = actual.split(PS, -1);
                for (int i = 0; i < paragraphs.length; i++) {
                    String paragraph = paragraphs[i];
                    if (i < paragraphs.length - 1) {
                        writer.write(format(paragraph) + LS);
                    } else {
                        sb.setLength(0);
                        sb.append(paragraph);
                    }
                }
            }
    
            private String format(String paragraph) {
                String formatted = "";
                String ohneLS = paragraph.replace(LS, " ").trim();
                //            System.out.println("P: " + ohneLS);
    
                String[] words = ohneLS.split(" ", -1);
                String actualLine = "";
                for (String word : words) {
                    //                System.out.println("  W: " + word);
                    if (actualLine.length() + 1 + word.length() > WIDTH) {
                        if (actualLine.length() > 0) {
                            formatted += actualLine + LS;
                        }
                        actualLine = word;
                    } else {
                        actualLine += actualLine.length() == 0 ? word : " " + word;
                    }
                    while (actualLine.length() > WIDTH) {
                        formatted += actualLine.substring(0, WIDTH) + LS;
                        actualLine = actualLine.substring(WIDTH);
                    }
                    //                System.out.println("  AL: " + actualLine + " F:" + formatted.replaceAll(LS, "*"));
                }
                formatted += actualLine;
    
                return formatted;
            }
    
            @Override
            public void flush() throws IOException {
                writer.write(format(sb.toString()));
                writer.flush();
            }
    
            @Override
            public void close() throws IOException {
                flush();
                writer.close();
                writer = null;
            }
        }
    
  3. svenningsson said

    A Haskell solution. The line width argument is mandatory, not optional. It’s possible but tedious to make it optional.

    format width = unlines . map unwords . fmt width . map words . lines
    
    lineLength l = sum (map length l) + length l - 1
    
    fmt :: Int -> [[String]] -> [[String]]
    fmt width ([]:[]:rest) = []:[]:rest
    fmt width ([]:rest) = fmt width rest
    fmt width (s:(w:ws):rest)
      | lineLength (s++[w]) <= width = fmt width ((s++[w]): if null ws then rest else ws :rest)
    fmt width (s:(w:ws):rest) = s : fmt width ((w:ws):rest)
    fmt width (s:[]:rest) = s : [] : fmt width rest
    fmt width (s:[]) = [s]
    fmt width [] = []
    
