Mini Markdown
January 21, 2020
The first step is to create an input file to be used for testing:
This is a text paragraph.

# Heading level 1

This is another text paragraph.

- List item 1

- List item 2

- List item 3

This is still another text paragraph.

### Heading level 3

This is the last text paragraph.
I’ll write the program in Awk, which is rapidly becoming my second-favorite language, because it has a very useful “paragraph mode.” From the POSIX specification, in the definition of RS:
If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
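For readers who don’t use Awk, the record-splitting rule quoted above can be approximated in Python. This is only a rough sketch under stated assumptions: the function name `paragraph_records` is made up for illustration, only truly empty lines count as blank, and each record is split into fields on newlines (as happens when FS is a newline).

```python
import re

def paragraph_records(text):
    """Split text into records separated by one or more empty lines.

    Leading and trailing blank lines produce no empty records, and each
    record is returned as a list of its lines (the "fields").
    """
    stripped = text.strip("\n")
    if not stripped:
        return []
    return [record.split("\n") for record in re.split(r"\n{2,}", stripped)]

sample = "\n\nfirst line\nsecond line\n\n\n# Heading\n\n- item\n"
for fields in paragraph_records(sample):
    print(fields)
# ['first line', 'second line']
# ['# Heading']
# ['- item']
```

The first field of each record corresponds to what the Awk program sees as $1.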
It pays to know the dark corners of your language. Here’s the code:
BEGIN { FS = OFS = "\n"; RS = "" } # paragraph mode

$1 ~ /^[#]{1,6} / {
    if (inlist) { inlist = 0; print "</ul>" }
    len = index($1, " ") - 1   # number of leading # characters
    print "<h" len ">" substr($1,len+2) "</h" len ">"
    next }

$1 ~ /^[-] / {
    if (! inlist) { inlist = 1; print "<ul>" }
    print "<li>" substr($1,3) "</li>"
    next }

{   if (inlist) { inlist = 0; print "</ul>" }
    print "<p>" $0 "</p>" }
The variable inlist tracks whether the input is currently inside a list, so the program can write the list’s opening and closing tags as needed. Here’s the output:
<p>This is a text paragraph.</p>
<h1>Heading level 1</h1>
<p>This is another text paragraph.</p>
<ul>
<li>List item 1</li>
<li>List item 2</li>
<li>List item 3</li>
</ul>
<p>This is still another text paragraph.</p>
<h3>Heading level 3</h3>
<p>This is the last text paragraph.</p>
You can run the program at https://ideone.com/lat9aL.
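The heading rule in the Awk program depends on the position of the first space: when a line starts with one to six # characters followed by a space, the number of # characters is the heading level. A minimal Python sketch of the same idea follows; the helper name `heading_to_html` is hypothetical, and returning None for non-headings is a choice made here for illustration.

```python
import re

# One to six '#' characters, a single space, then the heading text.
HEADING = re.compile(r"^(#{1,6}) (.*)$")

def heading_to_html(line):
    """Return an <hN> element for a heading line, or None otherwise."""
    match = HEADING.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    return f"<h{level}>{match.group(2)}</h{level}>"

print(heading_to_html("### Heading level 3"))  # <h3>Heading level 3</h3>
print(heading_to_html("####### too deep"))     # None
```

Because the pattern caps the run of # characters at six, a line with seven or more is not treated as a heading.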
Here’s a solution in Python.
@programmingpraxis, your solution seemingly does not add a closing </ul> for list items occurring as the last elements of the input text.

import os
import sys

assert len(sys.argv) == 2
with open(sys.argv[1]) as f:
    lines = [line for line in f.read().splitlines() if line]

snippets = []
for line in lines:
    if line.startswith('-'):
        # Reopen the list if the previous snippet closed it.
        if snippets and snippets[-1] == ' </ul>':
            snippets.pop()
        else:
            snippets.append(' <ul>')
        snippets.append(' <li>' + line[1:].strip() + '</li>')
        snippets.append(' </ul>')
    elif line.startswith('#'):
        # WARN: this approach emits <h7>, <h8>, etc.
        line_ = line.lstrip('#')
        level = len(line) - len(line_)
        snippets.append(f' <h{level}>{line_.strip()}</h{level}>')
    else:
        snippets.append(' <p>' + line + '</p>')

print('<html>' + os.linesep + '<body>')
print(os.linesep.join(snippets))
print('</body>' + os.linesep + '</html>')

Example Usage:
$ python3.7 markdown.py input.txt
<html>
<body>
 <p>This is a text paragraph.</p>
 <h1>Heading level 1</h1>
 <p>This is another text paragraph.</p>
 <ul>
 <li>List item 1</li>
 <li>List item 2</li>
 <li>List item 3</li>
 </ul>
 <p>This is still another text paragraph.</p>
 <h3>Heading level 3</h3>
 <p>This is the last text paragraph.</p>
</body>
</html>

Here’s my same comment included above, this time with HTML escaping to try to prevent dropped text.
@programmingpraxis, your solution seemingly does not add a closing </ul> for list items occurring as the last elements of the input text.
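The missing closing tag described in this comment arises when a list is closed only on the arrival of a later non-list record. One possible remedy, sketched here in Python rather than Awk, is to flush any open list once the input is exhausted (in Awk, an END block would be the analogous place). The function name `render` and its list-of-lines input are assumptions for illustration, and headings are left out to keep the sketch short.

```python
def render(lines):
    """Convert paragraphs and '- ' list items to HTML, tracking list state."""
    out = []
    in_list = False
    for line in lines:
        if line.startswith("- "):
            if not in_list:
                in_list = True
                out.append("<ul>")
            out.append("<li>" + line[2:] + "</li>")
            continue
        if in_list:
            in_list = False
            out.append("</ul>")
        out.append("<p>" + line + "</p>")
    if in_list:  # input ended while still inside a list
        out.append("</ul>")
    return out

print("\n".join(render(["Intro", "- one", "- two"])))
# <p>Intro</p>
# <ul>
# <li>one</li>
# <li>two</li>
# </ul>
```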