Literate Programming

August 10, 2010

I wrote this fourteen years ago:

When Donald Knuth invented literate programming, he designed very big
systems which included prettyprinting, macro processing and rudimentary
version control systems, among other features. Even the more modern
literate programming systems, which tend to be much smaller, include
such features as automatic indexing of code chunks and identifiers. But
the essence of literate programming is simply the rearranging of chunks
of code, from an order which is convenient for the programmer to an
order which is convenient for the compiler.

Described below is a complete literate programming system which is
implemented in a dozen and a half lines of awk. It has almost no
features. Input files consist of text, which is ignored, and chunks of
code, which are rearranged as indicated by the programmer. This
document is itself a literate program.

The form of the input file is very simple. Code chunks are introduced
by a line beginning with double less-than signs and ending with double
greater-than signs and an equals sign; there may be no white space at
the beginning or end of the line. Code chunks are referenced on any
line within another code chunk by surrounding the name of the chunk,
which must exactly match the name given on the definition line, with
double less-than and greater-than signs; there may be only one reference
per line. A code chunk ends at the first blank line following its
beginning, or at the end of the file, whichever comes sooner.

The action of the essential literate programming system is to read an
input file, expand a root chunk named "*", and write the expanded file
to standard output; that is, it performs the functions of the tangle
program of most other literate programming systems. The program looks
like this:

<<*>>=
<<read input and store chunks>>
END { <<output tangled file>> }
<<recursively tangle a chunk>>

Reading the input file is simple. Lines are read and ignored until a
chunk definition line is seen. Then, all of the following non-blank
lines are saved. After the end of a chunk, lines are again read and
ignored until the next chunk definition line is seen.

<<read input and store chunks>>=
/^<<.+>>=$/ {
    name = substr($0, 3, length($0) - 5)
    while (getline > 0) {
        if (length($0) == 0) next
        chunk[name, ++count[name]] = $0 } }

It is necessary that the tangled output file preserve the indentation of
the input, so that code included by reference is aligned at the same
relative level of indentation as the reference itself. Function tangle
takes two arguments: a chunk name to expand and a string containing the
necessary white space to be prepended to each output line. Whenever
tangle sees a code reference, it calls itself recursively to include the
referenced code.

<<recursively tangle a chunk>>=
function tangle(name, prefix,    i, tag, suffix) {
    for (i = 1; i <= count[name]; i++) {
        if (i == 2) gsub(/[^ \t]/, " ", prefix)
        if (match(chunk[name,i], /<<.+>>/)) {
            tag = substr(chunk[name,i], RSTART + 2, RLENGTH - 4)
            if (tag in count) {
                suffix = substr(chunk[name,i], RSTART + RLENGTH)
                tangle(tag, prefix substr(chunk[name,i], 1, RSTART - 1))
                printf "%s", suffix }
            else printf "%s%s", prefix, chunk[name,i] }
        else printf "%s%s", prefix, chunk[name,i]
        if (i < count[name]) printf "\n" } }

Basically, tangle encodes a depth-first search through the chunk-call
graph, replacing each reference to a chunk with the text of the chunk.
Keeping prefixes and suffixes straight adds some complication. Newlines
are never printed at the end of a chunk, so that the suffix of the
calling chunk can be printed. Likewise, that portion of an input line
which precedes a chunk reference is passed to the lower-level chunk as
part of the prefix, so it is printed at the beginning of the first line
of the lower-level chunk.

To write the output, all that is necessary is to start the recursive
process of tangling the file by tangling the "*" chunk with no prefix.
A printf adds the newline that ends the "*" chunk.

<<output tangled file>>=
tangle("*", ""); printf "\n"

That's all there is. Unlike other literate programming systems, there
is no weave command. The input file *is* the woven output. Simple,
plain ascii. No formatting commands. No index of identifiers. No
cross-referencing of chunks. Nothing.

In fact, for programmers who are writing or maintaining a literate
program, nothing else is necessary. Most of the time, you only see your
code from within your favorite editor. Prettyprinting isn't available
there, and searching with your editor is more convenient than a printed
index of your program which is probably out of date anyway. The essence
of literate programming is rearranging chunks of code, and a dozen and a
half lines of awk is all you need for that.

Of course, with so little code it's not possible for everything to be
perfect. There is no checking for bad or missing chunks. Cycles in the
input will cause an infinite loop. And, worst of all, if your language
uses double less-than or greater-than signs as an operator, the program
can become confused. Even so, this microscopic system provides a useful
tool that encompasses the essence of literate programming.

I still use that program today, having modified it only to recognize the beginning of a top-level chunk as any line beginning with a left parenthesis or semi-colon, which makes it suitable for Scheme programming. I also have a short Scheme function lload (for “literate load”) that calls the awk program and loads the tangled code into a running Scheme listener. I started once to re-write tangle in Scheme, and quickly got a basic version working, but I never used it — perhaps I was too lazy to replace a working program.

You can see the assembled awk program at http://programmingpraxis.codepad.org/WneWNXBF.

About these ads

Pages: 1 2

3 Responses to “Literate Programming”

  1. [...] Praxis – Literate Programming By Remco Niemeijer In today’s Programming Praxis exercise, our task is to write a program that converts K&R-style literate [...]

  2. Remco Niemeijer said

    My Haskell solution (see http://bonsaicode.wordpress.com/2010/08/10/programming-praxis-literate-programming/ for a version with comments):

    import Control.Applicative hiding (many, (<|>))
    import Text.Parsec
    
    data Lit = Line String | Ref String String
    
    def = (,) <$> (many literate *> chunk <* char '=' <* literate)
              <*> sepEndBy1 line newline where
        rol = many (noneOf "\n")
        upto end = manyTill anyChar (try $ end)
        chunk = string "<<" *> many space *> upto (many space *> string ">>")
        literate = notFollowedBy chunk *> rol <* newline
        line = try (Ref <$> many space <*> chunk <* rol)
               <|> (Line <$> many1 (noneOf "\n"))
    
    tangle :: String -> String
    tangle src = maybe "" (unlines . (f =<<)) $ lookup "*" defs where
        defs = either (error . show) id $ parse (many def) "" src
        f (Line s)  = [s]
        f (Ref n s) = map (n ++) . maybe [] (f =<<) $ lookup s defs
    
    main :: IO ()
    main = putStrLn . tangle =<< readFile "tangle.txt"
    
  3. Aidan said

    Here’s my solution, in Python. Not tremendously elegant, but it’s simple. I was tempted to write classes and node traversers and whatnot… then I thought of this.

    import sys

    chunks = {}
    current = '*'

    filename = sys.argv[1]

    with open(filename,'r') as file:
        for line in file:
            line = line.rstrip()
            startdef = line.find('<<')
            enddef = line.find('>>=')
            if startdef != -1 and enddef != -1:
                current = line[startdef+2:enddef]
                chunks[current] = []
            elif line == '':
                current = ''
            else:
                pass
                chunks[current].append(line)

    def processNode(name):
        root = chunks[name]
        for i in range(len(root)):
            line = root[i]
            startdef = line.find('<<')
            enddef = line.find('>>')
            if startdef != -1 and enddef != -1:
                root[i] = processNode(line[startdef+2:enddef])
        return '\n'.join(root)

    print processNode('*')

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 630 other followers

%d bloggers like this: