Formatting Text, Again

October 10, 2014

Our solution today is written in Python, because I felt like it. We read the input and accumulate words into an array. With cnt words on a line, there are cnt - 1 holes to fill with spaces. If the blanks don’t divide evenly among the holes, the extras go at the left on one justified line and at the right on the next, alternating, to avoid “rivers” of white space.

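import sys

def justify(filename, width = 60):
    with open(filename, 'r') as file:
        # words on the current line, their count, their total length,
        # and the side (0 or 1) that receives the extra blanks
        words, cnt, size, dir = [], 0, 0, 0
        for line in file:
            if line in ['\n','\r\n']: # a blank line ends the paragraph
                printline(width, words, cnt, size, dir, 'no')
                words, cnt, size, dir = [], 0, 0, dir
                sys.stdout.write('\n')
                continue
            for word in line.split():
                # cnt + size + len(word) is the line length with word added;
                # cnt > 0 keeps an over-long word from emitting an empty line
                if cnt > 0 and cnt + size + len(word) > width:
                    printline(width, words, cnt, size, dir, 'yes')
                    words, cnt, size, dir = [], 0, 0, 1 - dir
                words.append(word); cnt += 1; size += len(word)
        printline(width, words, cnt, size, dir, 'no')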

The dir variable is either 0 or 1; the computation of n in printline uses dir to put the extra spaces on the left or right side of the line. Writing each word with sys.stdout.write, rather than with Python 2's print statement and a trailing comma, keeps Python from adding an unwanted space after each word.

def printline(width, words, cnt, size, dir, just):
    if just == 'no' or cnt == 1:
        # last line of a paragraph, or a lone word: single spaces only
        for i in range(0, cnt):
            sys.stdout.write(words[i])
            sys.stdout.write('\n' if i == cnt - 1 else ' ')
    elif cnt > 1:
        nblanks, holes, i = width - size, cnt - 1, 0
        while holes > 0:
            # dir == 1 rounds up (wide holes first); dir == 0 rounds down
            n = (nblanks - dir) // holes + dir
            sys.stdout.write(words[i] + ' ' * n)
            nblanks -= n; holes -= 1; i += 1
        sys.stdout.write(words[cnt - 1] + '\n')
    else:
        sys.stdout.write('\n')

Here’s an example:

>>> justify("gettysburg.txt", 40)
Four  score  and  seven  years  ago  our
fathers  brought forth on this continent
a new nation, conceived in Liberty,  and
dedicated  to  the  proposition that all
men are created equal.

Now we are engaged in a great civil war,
testing  whether  that  nation,  or  any
nation, so conceived and  so  dedicated,
can  long  endure. We are met on a great
battle-field of that war. We  have  come
to  dedicate a portion of that field, as
a final resting place for those who here
gave  their lives that that nation might
live.  It  is  altogether  fitting   and
proper that we should do this.

But,  in  a  larger  sense,  we  can not
dedicate -- we can not consecrate --  we
can not hallow -- this ground. The brave
men,  living  and  dead,  who  struggled
here, have consecrated it, far above our
poor power to add or detract. The  world
will little note, nor long remember what
we say here, but  it  can  never  forget
what  they  did  here.  It is for us the
living, rather, to be dedicated here  to
the   unfinished  work  which  they  who
fought here so  nobly  advanced.  It  is
rather  for  us  to be here dedicated to
the great task remaining  before  us  --
that  from  these  honored  dead we take
increased devotion  to  that  cause  for
which they gave the last full measure of
devotion -- that we here highly  resolve
that  these  dead shall not have died in
vain -- that  this  nation,  under  God,
shall have a new birth of freedom -- and
that government of the  people,  by  the
people, for the people, shall not perish
from the earth.

You can run the program at http://programmingpraxis.codepad.org/1ich81ti. Much of the code is stolen from Exercise 5.13 of The Awk Programming Language by Al Aho, Brian Kernighan and Peter Weinberger.
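
The hole-filling arithmetic is easier to see when it is pulled out of the printing code. What follows is only a sketch of that idea, not part of the program above: the names gaps and justify_line are made up for illustration, and the code is written so that it should run under Python 3 as well as Python 2. It returns the padded line as a string instead of writing it to standard output.

```python
def gaps(nblanks, holes, direction):
    """Return the width of each hole, from left to right.

    direction == 1 rounds up first, so the wide holes land on the left;
    direction == 0 rounds down first, so the wide holes land on the right.
    """
    widths = []
    while holes > 0:
        n = (nblanks - direction) // holes + direction
        widths.append(n)
        nblanks -= n
        holes -= 1
    return widths


def justify_line(words, width, direction):
    """Pad a list of words to exactly `width` columns (hypothetical helper)."""
    if len(words) < 2:
        return ' '.join(words)  # nothing to pad between
    size = sum(len(w) for w in words)
    widths = gaps(width - size, len(words) - 1, direction)
    # zip stops at the shorter list, so the last word is appended unpadded
    pieces = [w + ' ' * n for w, n in zip(words, widths)]
    return ''.join(pieces) + words[-1]


print(gaps(6, 5, 1))  # second line of the example: 6 blanks, 5 holes
print(gaps(7, 6, 0))  # third line of the example: 7 blanks, 6 holes
print(justify_line("fathers brought forth on this continent".split(), 40, 1))
print(justify_line("a new nation, conceived in Liberty, and".split(), 40, 0))
```

The two justify_line calls reproduce the second and third lines of the example output: with direction 1 the single extra blank follows “fathers”, and with direction 0 it precedes “and”.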

2 Responses to “Formatting Text, Again”

  1. matthew said

    I thought that might be the next stage. Here’s a version of my C++ program – it attempts to spread out the padding space evenly through the line (and goes in alternating directions – perhaps we should use the Morse-Thue sequence for this to ensure there aren’t any other regularities).

    It’s a bit long-winded, but here it is:

    #include <stdlib.h>
    #include <iostream>
    #include <sstream>
    #include <string>
    
    // spaces is the number of spaces already in the line
    // n is the desired length of the line
    void padline(std::string &line, int n, int spaces, int lineno)
    {
      int lsize = line.size();
      if (lsize < n && spaces > 0) {
        int needed = n-lsize;
        line.resize(n); // Make room
        int t = 0; // Sort of a token bucket
        int nleft = spaces;
        int i = lsize-1; // Indexes for string copying
        int j = n-1;
        while (nleft > 0) {
          line[j] = line[i]; // Move a character up
          if (line[i] == ' ') {
            nleft--;
            int scount = 0; // Number of spaces to insert
            if (lineno%2) {
              t += needed; // Add some more tokens
              while (t >= spaces) { t -= spaces; scount++; }
            } else {
              t -= needed; // Reverse of above
              while (t < 0) { t += spaces; scount++; }
            }
            for (int i = 0; i < scount; i++) { line[--j] = ' '; }
          }
          --j; --i;
        }
      }
    }
    
    int main(int argc, char *argv[])
    {
       std::string line;
       std::string outline;
       int current = 0;
       enum { START, NORMAL, PARA } state = START;
       int max = atoi(argv[1]);
       int spaces = 0;
       int lineno = 0;
       while(std::getline(std::cin, line)) {
          std::istringstream s(line);
          std::string word;
          bool empty = true;
          while(s >> word) {
             int wlen = word.size();
             switch (state) {
             case START:
                state = NORMAL;
                break;
             case NORMAL:
                if (current + wlen > max) {
                   current = 0;
                   padline(outline,max,spaces,lineno);
                   lineno++;
                   std::cout << outline << "\n";
                   outline = "";
                   spaces = 0;
                } else {
                   outline += " ";
                   spaces++;
                }
                break;
             case PARA:
                current = 0;
                lineno++;
                std::cout << outline << "\n\n";
                outline = "";
                spaces = 0;
                state = NORMAL;
                break;
             }
             current += wlen + 1;
             outline += word;
             empty = false;
          }
          if (empty && state == NORMAL) state = PARA;
       }
       if (state != START) std::cout << outline << "\n";
    }
    
  2. use strict;
    use warnings;
    
    my $width = shift @ARGV || 60;
    $/ = undef;
    
    print f($_) foreach split m{\n\s*\n}mxs, <>;
    
    sub pad {
      my ( $len, $ll, @line ) = @_;
      my $s   = (@line-1)/2;
      my $sta = ($len - $ll)/$s+0.0001;
      my $n=0;
      foreach(0..($s-1)) {
        $n += $sta;
        $line[2*$_] .= q( )x($n+.5);
        $n -= int ($n+.5);
      }
      return @line;
    }
    
    sub f {
      my ($ll,@out,@line) = (-1);
      foreach( split m{\s+}, shift ) {
        if( $ll + 1 + length $_ > $width ) {
          push @out, pad( $width, $ll, @line ), "\n";
          @line=();
          $ll = -1;
        } else {
          push @line, q( ) if @line;
        }
        $ll+= 1 + length $_;
        push @line,$_;
      }
      push @out, @line,"\n\n";
      return join q(), @out;
    }
    
