Text Formatting

October 7, 2014

Text formatting is a huge topic. Today’s exercise looks at a simple text formatter. Input to the formatter is a file of ASCII text; the input is free-form, except that paragraphs are separated by blank lines (two successive newlines). The formatter copies the file to its output, moving text from each line up to the previous line so that every line is filled as much as possible. The width of a line can be specified; if none is given, it defaults to sixty characters.
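The filling rule described above can be sketched in a few lines of C++. This is only one possible reading of the task, not the suggested solution: the function name `fill_text` and its signature are made up for illustration, and it assumes that words are separated by whitespace, that runs of blank lines collapse into a single paragraph break, and that a word longer than the width is left on a line of its own.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): greedily refills `text`
// so that no output line exceeds `width` characters, unless a single word
// is itself longer than `width`.
std::string fill_text(const std::string& text, std::size_t width = 60) {
    std::istringstream in(text);
    std::string line, word, out;
    std::size_t col = 0;         // characters already on the current output line
    bool pending_break = false;  // a blank line was seen since the last word

    while (std::getline(in, line)) {
        std::istringstream words(line);
        bool blank = true;
        while (words >> word) {
            blank = false;
            if (out.empty()) {
                // First word of the output: nothing to put in front of it.
            } else if (pending_break) {
                out += "\n\n";   // end the line, then one blank line
                col = 0;
            } else if (col + 1 + word.size() > width) {
                out += '\n';     // word does not fit: start a new line
                col = 0;
            } else {
                out += ' ';      // word fits: separate it with one space
                ++col;
            }
            pending_break = false;
            out += word;
            col += word.size();
        }
        if (blank) pending_break = true;
    }
    if (!out.empty()) out += '\n';
    return out;
}
```

With a width of 10, `fill_text("aaa bbb ccc ddd\n\neee\n", 10)` returns `"aaa bbb\nccc ddd\n\neee\n"`: the first paragraph is refilled into two lines and the blank line before `eee` is kept.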

Your task is to write a simple text formatter. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


3 Responses to “Text Formatting”

  1. matthew said

    Well, if no one else is going to step up to the crease, here’s some C++. It uses a simple state machine to keep track of what needs to be done; in particular, in state BREAK it inserts a paragraph break before the next text output. Multiple blank lines are treated as one, and blank lines at the start and end are ignored. Words are read from an STL stringstream with the usual iostream operations. Input and output are just done through cin and cout. It would probably be best to do these things with a pipeline of filter functions, of course.

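    #include <cstdlib>
    #include <iostream>
    #include <sstream>
    #include <string>
    
    int main(int argc, char *argv[])
    {
      int maxlen = 60;  // default line width
      if (argc > 1) maxlen = std::strtol(argv[1],NULL,0);
      std::string line;
      int outlen = 0;   // length of the current output line plus one trailing space
      // START: nothing written yet; TEXT: inside a paragraph;
      // BREAK: a blank line was seen, so a paragraph break is pending.
      enum { START, TEXT, BREAK } state = START;
      while (std::getline(std::cin, line)) {
        std::istringstream s(line);
        std::string word;
        bool empty = true;  // stays true if the line holds no words
        while(s >> word) {
          int wlen = word.size();
          switch (state) {
          case START:
            state = TEXT;
            break;
          case TEXT:
            // Start a new line if the word does not fit, else add a space.
            if (outlen + wlen > maxlen) {
              outlen = 0;
              std::cout << "\n";
            } else {
              std::cout << " ";
            }
            break;
          case BREAK:
            // End the current line and emit one blank line.
            outlen = 0;
            std::cout << "\n\n";
            state = TEXT;
            break;
          }
          outlen += wlen + 1;
          std::cout << word;
          empty = false;
        }
        if (empty && state == TEXT) state = BREAK;
      }
      // Terminate the last line, unless the input held no words at all.
      if (state != START) std::cout << "\n";
    }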
    
  2. Andras said

    Java solution. It does not handle spaces perfectly.

    
    
    // Nested in an enclosing class (not shown); needs java.io.IOException and java.io.Writer.
    public static class FormattedWriter extends Writer {
    
            private static final int DEFAULT_LINE_LENGTH = 60;
            // LS and PS were not defined in the posted snippet; these definitions are assumed.
            private static final String LS = System.lineSeparator();
            private static final String PS = LS + LS; // a blank line separates paragraphs
    
            private Writer writer;
            private final StringBuilder sb = new StringBuilder();
            private final int WIDTH;
    
            public FormattedWriter(Writer writer) {
                this(writer, DEFAULT_LINE_LENGTH);
            }
    
            public FormattedWriter(Writer writer, int lineWidth) {
                this.writer = writer;
                this.WIDTH = lineWidth;
            }
    
            @Override
            public void write(char[] cbuf, int off, int len) throws IOException {
                sb.append(cbuf, off, len);
                String actual = sb.toString();
                String[] paragraphs = actual.split(PS, -1);
                // Every piece but the last is a complete paragraph; the last stays buffered.
                for (int i = 0; i < paragraphs.length; i++) {
                    String paragraph = paragraphs[i];
                    if (i < paragraphs.length - 1) {
                        // Write PS rather than LS so the blank line between paragraphs survives.
                        writer.write(format(paragraph) + PS);
                    } else {
                        sb.setLength(0);
                        sb.append(paragraph);
                    }
                }
            }
    
            private String format(String paragraph) {
                String formatted = "";
                String ohneLS = paragraph.replace(LS, " ").trim();
    
                // Splitting on single spaces keeps runs of spaces as empty words.
                String[] words = ohneLS.split(" ", -1);
                String actualLine = "";
                for (String word : words) {
                    if (actualLine.length() + 1 + word.length() > WIDTH) {
                        if (actualLine.length() > 0) {
                            formatted += actualLine + LS;
                        }
                        actualLine = word;
                    } else {
                        actualLine += actualLine.length() == 0 ? word : " " + word;
                    }
                    // A word longer than WIDTH is cut into WIDTH-sized pieces.
                    while (actualLine.length() > WIDTH) {
                        formatted += actualLine.substring(0, WIDTH) + LS;
                        actualLine = actualLine.substring(WIDTH);
                    }
                }
                formatted += actualLine;
    
                return formatted;
            }
    
            @Override
            public void flush() throws IOException {
                // Formats whatever is buffered, so flushing mid-paragraph ends the line there.
                writer.write(format(sb.toString()));
                sb.setLength(0); // do not write the same text again on the next flush or close
                writer.flush();
            }
    
            @Override
            public void close() throws IOException {
                flush();
                writer.close();
                writer = null;
            }
        }
    
  3. svenningsson said

    A Haskell solution. The line width argument is mandatory, not optional. It’s possible but tedious to make it optional.

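    format :: Int -> String -> String
    format width = unlines . map unwords . fmt width . concatMap split . map words . lines
      where
        -- Put each word on its own line first, so that input lines longer
        -- than the width are refilled too; blank lines stay as markers.
        split [] = [[]]
        split ws = map (:[]) ws
    
    -- Printed length of a line: the words plus one space between each pair.
    lineLength :: [String] -> Int
    lineLength l = sum (map length l) + length l - 1
    
    fmt :: Int -> [[String]] -> [[String]]
    -- Drop a leading blank line (this also collapses runs of blank lines).
    fmt width ([]:rest) = fmt width rest
    fmt width (s:(w:ws):rest)
      -- Pull the next word up while it still fits on the current line.
      | lineLength (s++[w]) <= width = fmt width ((s++[w]) : if null ws then rest else ws : rest)
      | otherwise = s : fmt width ((w:ws):rest)
    -- A blank line ends the paragraph and is copied through.
    fmt width (s:[]:rest) = s : [] : fmt width rest
    fmt _ [s] = [s]
    fmt _ [] = []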
    
