CSV To HTML

March 17, 2020

Here is my version of a CSV to HTML converter. I’m still experimenting with CSS formatting of the table for display and printing, so that part of the program is missing. The fieldlist is my attempt to create a mini-language for specifying formatting: an earlier version of the program used isnumeric to add commas to every numeric field, which meant that values such as the year 2020 and five-digit zip codes got commas even though they shouldn’t. The program is already useful in its present form, and it provides a solid base for improvements.
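To make the fieldlist mini-language concrete, here is a minimal Python sketch (not the author's code) of how one entry such as `13,R` breaks down, assuming the same rules the awk program applies: the leading digits select the input field, a comma anywhere asks for thousands separators, and an R or C picks right or center alignment, with left as the fallback. The names `parse_field_spec`, `add_commas` and `format_row` are hypothetical, and the sample row is made up. Because commas are requested per field instead of being inferred with isnumeric, the year and the zip code pass through unchanged.

```python
import html
import re


def parse_field_spec(spec):
    """Split one fieldlist entry such as "13,R" into its parts.

    As with the awk expression fs[i]+0, only the leading digits count as the
    field number; a comma anywhere asks for thousands separators, and an
    R or C anywhere picks right or center alignment (default: left).
    """
    m = re.match(r"\d+", spec)
    if m is None:
        raise ValueError(f"field spec must start with a number: {spec!r}")
    if "R" in spec:
        align = "right"
    elif "C" in spec:
        align = "center"
    else:
        align = "left"
    return int(m.group()), "," in spec, align


def add_commas(num):
    """Group the integer part of a numeric string in threes; keep the fraction."""
    whole, dot, frac = num.partition(".")
    sign = ""
    if whole[:1] in ("+", "-"):
        sign, whole = whole[0], whole[1:]
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    return sign + ",".join(groups) + dot + frac


def format_row(fields, fieldlist):
    """Render one record as an HTML row, keeping only the listed fields."""
    cells = []
    for spec in fieldlist.split():
        number, comma, align = parse_field_spec(spec)
        # Field numbers are 1-based; a missing field renders as empty.
        value = fields[number - 1] if 1 <= number <= len(fields) else ""
        if comma:
            value = add_commas(value)
        cells.append(f'<td align="{align}">{html.escape(value)}</td>')
    return "<tr>" + "".join(cells) + "</tr>"


# Made-up sample: a year, a zip code and an amount.
print(format_row(["2020", "85004", "1234567.5"], "1L 2L 3,R"))
# <tr><td align="left">2020</td><td align="left">85004</td><td align="right">1,234,567.5</td></tr>
```

Only the third field is marked with a comma, so only the amount is grouped.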

# CSVtoHTML -- neatly print CSV file using HTML markup
#
# call as:
#
#     gawk -f csvtohtml.awk \
#          -v nomenclature="..." \
#          -v title="..." \
#          -v subtitle="..." \
#          -v user="..." \
#          -v dbname="..." \
#          -v fieldlist="..." \
#          infile > outfile
#
#     fieldlist contains space-separated field definitions, each
#     consisting of an input field number, an optional comma (to
#     insert thousands separators) and an optional alignment,
#     where alignment may be L, C or R (the default is L)
#
# for example:
#
#     gawk -f csvtohtml.awk \
#          -v nomenclature="AZPSAWD" \
#          -v title="Scholarship Awards" \
#          -v subtitle="Fiscal Year = $(getparm 01)" \
#          -v user="${BAN9UID}" \
#          -v dbname="${ORACLE_SID}" \
#          -v fieldlist="1L 2L 4L 5L 6L 7L 8L 9L 10L 11L 12L 13,R" \
#          azpsawd_${ONE_UP}.csv > azpsawd_${ONE_UP}.html
#
#     note that column 3 is omitted from the output
#
# the only tricky bit is the expression arr[fs[i]+0] that looks
# up the field definition (for example: 13,R) from the field list
# fs, extracts the field number from the field definition (adding
# 0 forces the ,R to be ignored and the expression 13,R to be
# converted to the number 13), then looks up the appropriate value
# in the array of values arr; is awk bad-ass or what?

function addcomma(num,    arr) {
    split(num, arr, "."); arr[1] = arr[1] "."
    while (arr[1] ~ /[0-9][0-9][0-9][0-9]/) {
        sub(/[0-9][0-9][0-9][,.]/, ",&", arr[1]) }
    if (arr[2] != "") { return arr[1] arr[2] }
    else { sub(/\.$/, "", arr[1]); return arr[1] } }

function csvsplit(str, arr,     i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    #     doubled double-quotes represent a single embedded quote
    # embedded, quoted newlines are handled improperly
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
        else if (s == "MAYBEDOUBLE") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" }
            else if (substr(str,i,1) == qt) {
                # doubled quote: still inside the quoted field
                s = "INQUOTES" } } }
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }    

function isnumeric(x) {
    return x ~ /^[+-]?(([0-9]+([.][0-9]*)?)|([.][0-9]+))$/ }

BEGIN { "date +%m/%d/%Y" | getline today }

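BEGIN { print "<html>"
        print "<head>"
        print "<title>" title "</title>"
        # print css prelude
        print "</head>"
        print "<body>"
        # report banner: title and date, subtitle and database,
        # nomenclature and user
        print "<table width=\"100%\">"
        print "<tr><td align=\"left\"><b>" title "</b></td>"
        print "<td align=\"right\">" today "</td></tr>"
        print "<tr><td align=\"left\">" subtitle "</td>"
        print "<td align=\"right\">" dbname "</td></tr>"
        print "<tr><td align=\"left\">" nomenclature "</td>"
        print "<td align=\"right\">" user "</td></tr>"
        print "</table>"
        # the data table; rows are added by the rules below
        print "<table>"
        m = split(fieldlist, fs) }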

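NR == 1 { # print header row
    # escape & first, so the &lt; and &gt; added next are not re-escaped
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        print "<th>" arr[fs[i]+0] "</th>" }
    print "</tr>" }

NR > 1 { # print detail rows
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"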
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        if (fs[i] ~ /R/) { align = "align=\"right\"" }
        else if (fs[i] ~ /C/) { align = "align=\"center\"" }
        else { align = "align=\"left\"" }
        if (fs[i] ~ /,/) { val = addcomma(arr[fs[i]+0]) }
        else { val = arr[fs[i]+0] }
        print "<td " align ">" val "</td>" }
    print "</tr>" }

END { print "</table>"
      print "</body>"
      print "</html>" }

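The csvsplit function is a small state machine with four states: START, INFIELD, INQUOTES and MAYBEDOUBLE. As a sketch of the same idea (not the author's code), here is a Python transcription, `csv_split` (a hypothetical name), that collects characters in a buffer instead of tracking substr offsets. It assumes that a quote arriving in MAYBEDOUBLE is the second half of a doubled quote, which is what the awk version's `gsub(qt qt, qt, ...)` cleanup expects. Python's standard csv module is used only to cross-check a few single-line records.

```python
import csv


def csv_split(line, sep=",", quote='"'):
    """Split one CSV line using the same four states as the awk csvsplit.

    A quote seen in MAYBEDOUBLE is the second half of a doubled quote, so
    it is kept as one literal quote and we return to INQUOTES. Like the awk
    version, this works on a single line, so quoted newlines are not handled.
    """
    fields = []
    buf = []
    state = "START"
    for ch in line:
        if state == "START":
            if ch == sep:
                fields.append("")
            elif ch == quote:
                state = "INQUOTES"
            else:
                buf.append(ch)
                state = "INFIELD"
        elif state == "INFIELD":
            if ch == sep:
                fields.append("".join(buf))
                buf = []
                state = "START"
            else:
                buf.append(ch)
        elif state == "INQUOTES":
            if ch == quote:
                state = "MAYBEDOUBLE"
            else:
                buf.append(ch)
        else:  # MAYBEDOUBLE: just saw a quote inside a quoted field
            if ch == quote:
                buf.append(quote)
                state = "INQUOTES"
            elif ch == sep:
                fields.append("".join(buf))
                buf = []
                state = "START"
            else:
                buf.append(ch)  # malformed input: keep the stray character
    # Whatever state we end in, the last field is still pending.
    fields.append("".join(buf))
    return fields


# Cross-check against Python's csv module on a few single-line records.
for sample in ['a,b,c', '"x, y",z', '"say ""hi""",2', 'a,,c,', '"a"",b",c']:
    assert csv_split(sample) == next(csv.reader([sample])), sample

print(csv_split('"a"",b",c'))  # ['a",b', 'c']
```

Building each field in a buffer resolves a doubled quote as soon as it is read, rather than with a substitution after the field is complete.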

2 Responses to “CSV To HTML”

  1. Daniel said

    Here’s a solution in Python.

    import csv
    import html
    import os
    import sys
    
    assert len(sys.argv) == 2
    
    html_lines = ['<html>', '<head></head>', '<body>', '<table>']
    with open(sys.argv[1], newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        html_lines.append('  <tr>')
        for name in reader.fieldnames:
            html_lines.append('    <th>' + html.escape(name) + '</th>')
        html_lines.append('  </tr>')
        for row in reader:
            html_lines.append('  <tr>')
            for name in reader.fieldnames:
                html_lines.append('    <td>' + html.escape(row[name]) + '</td>')
            html_lines.append('  </tr>')
    html_lines.extend(('</table>', '</body>', '</html>'))
    print(os.linesep.join(html_lines))
    

    Example Output:

    <html>
    <head></head>
    <body>
    <table>
      <tr>
        <th>Year</th>
        <th>Make</th>
        <th>Model</th>
      </tr>
      <tr>
        <td>1997</td>
        <td>Ford</td>
        <td>E350</td>
      </tr>
      <tr>
        <td>2000</td>
        <td>Mercury</td>
        <td>Cougar</td>
      </tr>
    </table>
    </body>
    </html>
    


  2. Daniel said

    In my last comment, I tried pasting the HTML for the table itself, but it does not appear to work. Here’s another attempt, this time with the newlines removed.

    Year  Make     Model
    1997  Ford     E350
    2000  Mercury  Cougar
