Formatting Text, Again
October 10, 2014
Our solution today is written in Python, because I felt like it. We read the input and accumulate words into an array, counting the words in cnt and their total length in size. There are cnt words on a line, so there are cnt - 1 holes to fill with spaces. If the extra blanks don’t divide evenly among the holes, the extras are placed alternately on the left and right on adjacent lines to avoid “rivers” of white space.
import sys

def justify(filename, width = 60):
    with open(filename, 'r') as file:
        words, cnt, size, dir = [], 0, 0, 0
        for line in file:
            if line in ['\n','\r\n']:
                printline(width, words, cnt, size, dir, 'no')
                words, cnt, size, dir = [], 0, 0, dir
                print ''
                continue
            for word in line.split():
                if cnt + size + len(word) > width:
                    printline(width, words, cnt, size, dir, 'yes')
                    words, cnt, size, dir = [], 0, 0, 1 - dir
                words.append(word); cnt += 1; size += len(word)
        printline(width, words, cnt, size, dir, 'no')
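The line-breaking test in justify can be looked at in isolation. The following is only a sketch, written for Python 3 rather than the Python 2 used above; the name fill_lines is hypothetical, and the extra guard against flushing an empty line is an assumption that differs slightly from the original, which prints an empty line when a single word is wider than the page.

```python
def fill_lines(words, width=60):
    """Greedily group words into lines that fit in `width` columns."""
    lines, current, size = [], [], 0
    for word in words:
        # Adding `word` would give len(current) + 1 words, so at least
        # len(current) single blanks are needed between them.
        if current and len(current) + size + len(word) > width:
            lines.append(current)
            current, size = [], 0
        current.append(word)
        size += len(word)
    if current:
        lines.append(current)
    return lines


text = ("Four score and seven years ago our fathers "
        "brought forth on this continent")
for line in fill_lines(text.split(), 40):
    print(" ".join(line))
```

With a width of 40, the first line ends at "our" because 7 words and 28 letters plus the 7 letters of "fathers" would need 42 columns.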
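The dir variable is either 0 or 1; the calculation of n in printline uses dir to round each gap down (dir = 0), which leaves the extra spaces for the right side of the line, or up (dir = 1), which puts them on the left side. It is necessary to use sys.stdout.write to prevent Python from adding unnecessary spaces after each word.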
def printline(width, words, cnt, size, dir, just):
    if just == 'no' or cnt == 1:
        for i in range(0, cnt):
            sys.stdout.write(words[i])
            sys.stdout.write('\n' if i == cnt - 1 else ' ')
    elif cnt > 1:
        nblanks, holes, i = width - size, cnt - 1, 0
        while holes > 0:
            n = int((nblanks - dir) / holes) + dir
            sys.stdout.write(words[i] + ' ' * n)
            nblanks -= n; holes -= 1; i += 1
        print words[cnt - 1]
    else:
        print ''
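To see how dir changes the gaps, the arithmetic from the while loop can be pulled out into a helper. This is a sketch for Python 3, where // stands in for the integer division that Python 2 performs on int operands; gap_widths and direction are hypothetical names (direction plays the role of dir).

```python
def gap_widths(nblanks, holes, direction):
    """Return the width of each gap, left to right, for one justified line."""
    widths = []
    while holes > 0:
        # direction == 0 rounds down, so wide gaps end up on the right;
        # direction == 1 rounds up, so wide gaps end up on the left.
        n = (nblanks - direction) // holes + direction
        widths.append(n)
        nblanks -= n
        holes -= 1
    return widths


print(gap_widths(7, 6, 0))  # [1, 1, 1, 1, 1, 2]
print(gap_widths(6, 5, 1))  # [2, 1, 1, 1, 1]
```

The two calls correspond to a 40-column line with 33 letters in 7 words and one with 34 letters in 6 words.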
Here’s an example:
>>> justify("gettysburg.txt", 40)
Four  score  and  seven  years  ago  our
fathers  brought forth on this continent
a new nation, conceived in Liberty,  and
dedicated  to  the  proposition that all
men are created equal.

Now we are engaged in a great civil war,
testing  whether  that  nation,  or  any
nation, so conceived and  so  dedicated,
can  long  endure. We are met on a great
battle-field of that war. We  have  come
to  dedicate a portion of that field, as
a final resting place for those who here
gave  their lives that that nation might
live.  It  is  altogether  fitting   and
proper that we should do this.

But,  in  a  larger  sense,  we  can not
dedicate -- we can not consecrate --  we
can not hallow -- this ground. The brave
men,  living  and  dead,  who  struggled
here, have consecrated it, far above our
poor power to add or detract. The  world
will little note, nor long remember what
we say here, but  it  can  never  forget
what  they  did  here.  It is for us the
living, rather, to be dedicated here  to
the   unfinished  work  which  they  who
fought here so  nobly  advanced.  It  is
rather  for  us  to be here dedicated to
the great task remaining  before  us  --
that  from  these  honored  dead we take
increased devotion  to  that  cause  for
which they gave the last full measure of
devotion -- that we here highly  resolve
that  these  dead shall not have died in
vain -- that  this  nation,  under  God,
shall have a new birth of freedom -- and
that government of the  people,  by  the
people, for the people, shall not perish
from the earth.
You can run the program at http://programmingpraxis.codepad.org/1ich81ti. Much of the code is stolen from Exercise 5.13 of The Awk Programming Language by Al Aho, Brian Kernighan and Peter Weinberger.
I thought that might be the next stage. Here’s a version of my C++ program – it attempts to spread out the padding space evenly through the line (and goes in alternating directions – perhaps we should use the Morse-Thue sequence for this to ensure there aren’t any other regularities).
It’s a bit long-winded, but here it is:
#include <stdlib.h>
#include <iostream>
#include <sstream>
#include <string>

// spaces is the number of spaces already in the line
// n is the desired length of the line
void padline(std::string &line, int n, int spaces, int lineno)
{
  int lsize = line.size();
  if (lsize < n && spaces > 0) {
    int needed = n-lsize;
    line.resize(n); // Make room
    int t = 0; // Sort of a token bucket
    int nleft = spaces;
    int i = lsize-1; // Indexes for string copying
    int j = n-1;
    while (nleft > 0) {
      line[j] = line[i]; // Move a character up
      if (line[i] == ' ') {
        nleft--;
        int scount = 0; // Number of spaces to insert
        if (lineno%2) {
          t += needed; // Add some more tokens
          while (t >= spaces) {
            t -= spaces; scount++;
          }
        } else {
          t -= needed; // Reverse of above
          while (t < 0) {
            t += spaces; scount++;
          }
        }
        for (int i = 0; i < scount; i++) {
          line[--j] = ' ';
        }
      }
      --j; --i;
    }
  }
}

int main(int argc, char *argv[])
{
  std::string line;
  std::string outline;
  int current = 0;
  enum { START, NORMAL, PARA } state = START;
  int max = atoi(argv[1]);
  int spaces = 0;
  int lineno = 0;
  while(std::getline(std::cin, line)) {
    std::istringstream s(line);
    std::string word;
    bool empty = true;
    while(s >> word) {
      int wlen = word.size();
      switch (state) {
      case START:
        state = NORMAL;
        break;
      case NORMAL:
        if (current + wlen > max) {
          current = 0;
          padline(outline,max,spaces,lineno);
          lineno++;
          std::cout << outline << "\n";
          outline = "";
          spaces = 0;
        } else {
          outline += " ";
          spaces++;
        }
        break;
      case PARA:
        current = 0;
        lineno++;
        std::cout << outline << "\n\n";
        outline = "";
        spaces = 0;
        state = NORMAL;
        break;
      }
      current += wlen + 1;
      outline += word;
      empty = false;
    }
    if (empty && state == NORMAL) state = PARA;
  }
  if (state != START) std::cout << outline << "\n";
}

use strict;
use warnings;

my $width = shift @ARGV || 60;
$/ = undef;
print f($_) foreach split m{\n\s*\n}mxs, <>;

sub pad {
  my ( $len, $ll, @line ) = @_;
  my $s = (@line-1)/2;
  my $sta = ($len - $ll)/$s+0.0001;
  my $n=0;
  foreach(0..($s-1)) {
    $n += $sta;
    $line[2*$_] .= q( )x($n+.5);
    $n -= int ($n+.5);
  }
  return @line;
}

sub f {
  my ($ll,@out,@line) = (-1);
  foreach( split m{\s+}, shift ) {
    if( $ll + 1 + length $_ > $width ) {
      push @out, pad( $width, $ll, @line ), "\n";
      @line=();
      $ll = -1;
    } else {
      push @line, q( ) if @line;
    }
    $ll+= 1 + length $_;
    push @line,$_;
  }
  push @out, @line,"\n\n";
  return join q(), @out;
}
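For comparison with the Python solution at the top, the "token bucket" idea in the C++ padline can be sketched in a few lines of Python 3. This is a loose paraphrase under my own assumptions, not a port: spread_gaps and its reverse flag are hypothetical names, and it returns the number of extra blanks per gap instead of editing the string in place.

```python
def spread_gaps(needed, spaces, reverse=False):
    """Spread `needed` extra blanks as evenly as possible over `spaces` gaps."""
    extras, t = [], 0
    for _ in range(spaces):
        t += needed            # add tokens for this gap
        count = 0
        while t >= spaces:     # every `spaces` tokens buy one extra blank
            t -= spaces
            count += 1
        extras.append(count)
    # Reversing the list mirrors the pattern, for alternating lines.
    return extras[::-1] if reverse else extras


print(spread_gaps(3, 7))                # [0, 0, 1, 0, 1, 0, 1]
print(spread_gaps(3, 7, reverse=True))  # [1, 0, 1, 0, 1, 0, 0]
```

Unlike the solution in the post, which pushes all the wide gaps to one end of the line, this scatters them across the whole line.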