Literate Programming

August 10, 2010

I wrote this fourteen years ago:

When Donald Knuth invented literate programming, he designed very big systems which included prettyprinting, macro processing and rudimentary version control systems, among other features. Even the more modern literate programming systems, which tend to be much smaller, include such features as automatic indexing of code chunks and identifiers. But the essence of literate programming is simply the rearranging of chunks of code, from an order which is convenient for the programmer to an order which is convenient for the compiler.
Described below is a complete literate programming system which is implemented in a dozen and a half lines of awk. It has almost no features. Input files consist of text, which is ignored, and chunks of code, which are rearranged as indicated by the programmer. This document is itself a literate program. The form of the input file is very simple. Code chunks are introduced by a line beginning with double less-than signs and ending with double greater-than signs and an equals sign; there may be no white space at the beginning or end of the line. Code chunks are referenced on any line within another code chunk by surrounding the name of the chunk, which must exactly match the name given on the definition line, with double less-than and greater-than signs; there may be only one reference per line. A code chunk ends at the first blank line following its beginning, or at the end of the file, whichever comes sooner. The action of the essential literate programming system is to read an input file, expand a root chunk named "*", and write the expanded file to standard output; that is, it performs the functions of the tangle program of most other literate programming systems. The program looks like this: <<*>>= <<read input and store chunks>> END { <<output tangled file>> } <<recursively tangle a chunk>> Reading the input file is simple. Lines are read and ignored until a chunk definition line is seen. Then, all of the following non-blank lines are saved. After the end of a chunk, lines are again read and ignored until the next chunk definition line is seen. <<read input and store chunks>>= /^<<.+>>=$/ { name = substr($0, 3, length($0) - 5) while (getline > 0) { if (length($0) == 0) next chunk[name, ++count[name]] = $0 } } It is necessary that the tangled output file preserve the indentation of the input, so that code included by reference is aligned at the same relative level of indentation as the reference itself. Function tangle takes two arguments: a chunk name to expand and a string containing the necessary white space to be prepended to each output line. Whenever tangle sees a code reference, it calls itself recursively to include the referenced code. <<recursively tangle a chunk>>= function tangle(name, prefix, i, tag, suffix) { for (i = 1; i <= count[name]; i++) { if (i == 2) gsub(/[^ \t]/, " ", prefix) if (match(chunk[name,i], /<<.+>>/)) { tag = substr(chunk[name,i], RSTART + 2, RLENGTH - 4) if (tag in count) { suffix = substr(chunk[name,i], RSTART + RLENGTH) tangle(tag, prefix substr(chunk[name,i], 1, RSTART - 1)) printf "%s", suffix } else printf "%s%s", prefix, chunk[name,i] } else printf "%s%s", prefix, chunk[name,i] if (i < count[name]) printf "\n" } } Basically, tangle encodes a depth-first search through the chunk-call graph, replacing each reference to a chunk with the text of the chunk. Keeping prefixes and suffixes straight adds some complication. Newlines are never printed at the end of a chunk, so that the suffix of the calling chunk can be printed. Likewise, that portion of an input line which precedes a chunk reference is passed to the lower-level chunk as part of the prefix, so it is printed at the beginning of the first line of the lower-level chunk. To write the output, all that is necessary is to start the recursive process of tangling the file by tangling the "*" chunk with no prefix. A printf adds the newline that ends the "*" chunk. <<output tangled file>>= tangle("*", ""); printf "\n" That's all there is. Unlike other literate programming systems, there is no weave command. The input file *is* the woven output. Simple, plain ascii. No formatting commands. No index of identifiers. No cross-referencing of chunks. Nothing. In fact, for programmers who are writing or maintaining a literate program, nothing else is necessary. Most of the time, you only see your code from within your favorite editor. Prettyprinting isn't available there, and searching with your editor is more convenient than a printed index of your program which is probably out of date anyway. The essence of literate programming is rearranging chunks of code, and a dozen and a half lines of awk is all you need for that.
Of course, with so little code it's not possible for everything to be perfect. There is no checking for bad or missing chunks. Cycles in the input will cause an infinite loop. And, worst of all, if your language uses double less-than or greater-than signs as an operator, the program can become confused. Even so, this microscopic system provides a useful tool that encompasses the essence of literate programming.

I still use that program today, having modified it only to recognize the beginning of a top-level chunk as any line beginning with a left parenthesis or semi-colon, which makes it suitable for Scheme programming. I also have a short Scheme function lload (for “literate load”) that calls the awk program and loads the tangled code into a running Scheme listener. I started once to re-write tangle in Scheme, and quickly got a basic version working, but I never used it — perhaps I was too lazy to replace a working program.

You can see the assembled awk program at http://programmingpraxis.codepad.org/WneWNXBF.

Posted by programmingpraxis

Filed in Exercises

4 Comments »

4 Responses to “Literate Programming”

Programming Praxis – Literate Programming « Bonsai Code said
August 10, 2010 at 3:01 PM
[…] Praxis – Literate Programming By Remco Niemeijer In today’s Programming Praxis exercise, our task is to write a program that converts K&R-style literate […]

Remco Niemeijer said

August 10, 2010 at 3:01 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2010/08/10/programming-praxis-literate-programming/ for a version with comments):

import Control.Applicative hiding (many, (<|>))
import Text.Parsec

data Lit = Line String | Ref String String

def = (,) <$> (many literate *> chunk <* char '=' <* literate)
          <*> sepEndBy1 line newline where
    rol = many (noneOf "\n")
    upto end = manyTill anyChar (try $ end)
    chunk = string "<<" *> many space *> upto (many space *> string ">>")
    literate = notFollowedBy chunk *> rol <* newline
    line = try (Ref <$> many space <*> chunk <* rol)
           <|> (Line <$> many1 (noneOf "\n"))

tangle :: String -> String
tangle src = maybe "" (unlines . (f =<<)) $ lookup "*" defs where
    defs = either (error . show) id $ parse (many def) "" src
    f (Line s)  = [s]
    f (Ref n s) = map (n ++) . maybe [] (f =<<) $ lookup s defs

main :: IO ()
main = putStrLn . tangle =<< readFile "tangle.txt"

Aidan said
August 16, 2010 at 4:39 AM
Here’s my solution, in Python. Not tremendously elegant, but it’s simple. I was tempted to write classes and node traversers and whatnot… then I thought of this.
import sys
chunks = {} current = '*' filename = sys.argv[1] with open(filename,'r') as file: for line in file: line = line.rstrip() startdef = line.find('<<') enddef = line.find('>>=') if startdef != -1 and enddef != -1: current = line[startdef+2:enddef] chunks[current] = [] elif line == '': current = '' else: pass chunks[current].append(line) def processNode(name): root = chunks[name] for i in range(len(root)): line = root[i] startdef = line.find('<<') enddef = line.find('>>') if startdef != -1 and enddef != -1: root[i] = processNode(line[startdef+2:enddef]) return '\n'.join(root) print processNode('*')
Chez Scheme Setup | Programming Praxis said
October 3, 2017 at 9:01 AM
[…] Also loaded at each run of Chez Scheme are my literate programming environment described at https://programmingpraxis.com/2010/08/10/literate-programming and my interactive editing environment described at […]

S	M	T	W	T	F	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Programming Praxis