Mini Markdown

January 21, 2020

The first step is to create an input file to be used for testing:

This is a text paragraph.

# Heading level 1

This is another text paragraph.

- List item 1

- List item 2

- List item 3

This is still another text paragraph.

### Heading level 3

This is the last text paragraph.

I’ll write the program in Awk, which is rapidly becoming my second-favorite language, because it has a very useful “paragraph mode.” From the POSIX specification, in the definition of RS:

If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
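To make the quoted rule concrete, here is a small sketch of paragraph mode on its own. The function name paragraph_demo and the sample text are made up for illustration. FS is set to a newline so that each line of a paragraph becomes one field.

```shell
# Demonstrate awk paragraph mode: with RS = "", chunks separated by
# blank lines become records; with FS = "\n", each line is one field.
paragraph_demo() {
    # Leading, repeated, and trailing blank lines are deliberate here.
    printf '\n\nfirst line\nsecond line\n\n\n\nthird line\n\n' |
    awk 'BEGIN { FS = "\n"; RS = "" }   # paragraph mode
         { print "record " NR " has " NF " field(s); $1 = [" $1 "]" }'
}

paragraph_demo
```

With a POSIX-conforming awk this should print two lines, `record 1 has 2 field(s); $1 = [first line]` and `record 2 has 1 field(s); $1 = [third line]`. The extra blank lines at the start, middle, and end of the input produce no empty records.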

It pays to know the dark corners of your language. Here’s the code:

BEGIN { FS = OFS = "\n"; RS = "" } # paragraph mode
$1 ~ /^[#]{1,6} / {
    if (inlist) { inlist = 0; print "</ul>" }
    len = index($1, " ") - 1
    print "<h" len ">" substr($1,len+2) "</h" len ">"
    next }
$1 ~ /^[-] / {
    if (! inlist) { inlist = 1; print "<ul>" }
    print "<li>" substr($1,3) "</li>"
    next }
{ if (inlist) { inlist = 0; print "</ul>" }
  print "<p>" $0 "</p>" }

Variable inlist keeps track of whether or not the input is currently in a list, and writes list headers and trailers as needed. Here’s the output:

<p>This is a text paragraph.</p>
<h1>Heading level 1</h1>
<p>This is another text paragraph.</p>
<ul>
<li>List item 1</li>
<li>List item 2</li>
<li>List item 3</li>
</ul>
<p>This is still another text paragraph.</p>
<h3>Heading level 3</h3>
<p>This is the last text paragraph.</p>
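Because inlist is only reset when a later heading or paragraph arrives, a list that ends the input is never closed. One possible remedy, sketched here as an assumption rather than as part of the original program, is an END rule that checks inlist. The function name close_trailing_list is hypothetical, and the heading rule is left out to keep the sketch short.

```shell
# Sketch: close a list that is still open when the input ends.
# Only the list and paragraph rules are shown; headings are omitted.
close_trailing_list() {
    awk '
    BEGIN { FS = OFS = "\n"; RS = "" }            # paragraph mode
    $1 ~ /^[-] / {                                # list item
        if (!inlist) { inlist = 1; print "<ul>" } # open the list once
        print "<li>" substr($1, 3) "</li>"
        next
    }
    {                                             # ordinary paragraph
        if (inlist) { inlist = 0; print "</ul>" }
        print "<p>" $0 "</p>"
    }
    END { if (inlist) print "</ul>" }             # added: close a trailing list
    '
}

printf 'Intro.\n\n- One\n\n- Two\n' | close_trailing_list
```

For this input the last line printed should be `</ul>`, which the version without the END rule would omit.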

You can run the program at https://ideone.com/lat9aL.


2 Responses to “Mini Markdown”

  1. Daniel said

    Here’s a solution in Python.

    @programmingpraxis, your solution seemingly does not add a closing for list items occurring as the last elements of the input text.

    import os
    import sys
    
    assert len(sys.argv) == 2
    
    with open(sys.argv[1]) as f:
        lines = [line for line in f.read().splitlines() if line]
    snippets = []
    for line in lines:
        if line.startswith('-'):
            if snippets and snippets[-1] == '  </ul>':
                snippets.pop()
            else:
                snippets.append('  <ul>')
            snippets.append('    <li>' + line[1:].strip() + '</li>')
            snippets.append('  </ul>')
        elif line.startswith('#'):
            # WARN: this approach emits <h7>, <h8>, etc.
            line_ = line.lstrip('#')
            level = len(line) - len(line_)
            snippets.append(f'  <h{level}>{line_.strip()}</h{level}>')
        else:
            snippets.append('  <p>' + line + '</p>')
    
    print('<html>' + os.linesep + '<body>')
    print(os.linesep.join(snippets))
    print('</body>' + os.linesep + '</html>')
    

    Example Usage:

    $ python3.7 markdown.py input.txt
    <html>
    <body>
      <p>This is a text paragraph.</p>
      <h1>Heading level 1</h1>
      <p>This is another text paragraph.</p>
      <ul>
        <li>List item 1</li>
        <li>List item 2</li>
        <li>List item 3</li>
      </ul>
      <p>This is still another text paragraph.</p>
      <h3>Heading level 3</h3>
      <p>This is the last text paragraph.</p>
    </body>
    </html>
    
  2. Daniel said

    Here’s my same comment included above, this time with HTML escaping to try preventing dropped text.

    @programmingpraxis, your solution seemingly does not add a closing </ul> for list items occurring as the last elements of the input text.
