CSV To HTML

March 17, 2020

Here is my version of a CSV to HTML converter. I’m still experimenting with CSS formatting of the table for display and printing, so that part of the program is missing. The fieldlist is my attempt to create a mini-language for specifying formatting. An earlier version of the program used isnumeric to add commas to all numeric fields, which meant that things like the year 2020 and five-digit zip codes got commas when they shouldn’t. The program is already useful in its present form, and provides a good base for improvements.
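
To make the fieldlist idea concrete before reading the awk listing, here is a minimal Python sketch of how a field definition such as 13,R could be interpreted: a field number, an optional comma flag, and an optional alignment letter. It is an illustration only, not part of the program; the names parse_fieldlist, add_commas and format_row are hypothetical, and add_commas is a stand-in for the awk addcomma function rather than a translation of it. The point it shows is that commas are requested per field, so a year or a zip code is left alone unless its definition asks for commas.

```python
import re

# One field definition: field number, optional comma, optional L/C/R.
FIELD_RE = re.compile(r"^(\d+)(,?)([LCR]?)$")
ALIGN = {"L": "left", "C": "center", "R": "right", "": "left"}

def parse_fieldlist(fieldlist):
    """Return a list of (field_number, want_commas, alignment) tuples."""
    specs = []
    for item in fieldlist.split():
        m = FIELD_RE.match(item)
        if m is None:
            raise ValueError("bad field definition: " + item)
        specs.append((int(m.group(1)), m.group(2) == ",", ALIGN[m.group(3)]))
    return specs

def add_commas(text):
    """Insert thousands separators into the part before the decimal point."""
    whole, dot, frac = text.partition(".")
    # a comma goes wherever a digit is followed by a multiple of three digits
    return re.sub(r"(?<=\d)(?=(\d{3})+$)", ",", whole) + dot + frac

def format_row(values, specs):
    """Select and format fields; field numbers are 1-based as in awk."""
    out = []
    for number, want_commas, align in specs:
        val = values[number - 1]
        out.append((add_commas(val) if want_commas else val, align))
    return out

# Assumed sample row: a year, a zip code, and a dollar amount.
row = ["2020", "85001", "1234567.50"]
print(format_row(row, parse_fieldlist("1L 3,R")))
# prints [('2020', 'left'), ('1,234,567.50', 'right')]
```

Applying add_commas blindly to every numeric-looking value would turn 2020 into 2,020, which is exactly the problem with the earlier isnumeric approach.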

# CSVtoHTML -- neatly print CSV file using HTML markup
#
# call as:
#
#     gawk -f csvtohtml.awk \
#          -v nomenclature="..." \
#          -v title="..." \
#          -v subtitle="..." \
#          -v user="..." \
#          -v dbname="..." \
#          -v fieldlist="..." \
#          infile > outfile
#
#     fieldlist contains space-separated field definitions with
#     input field number, optional comma, optional alignment
#     where alignment may be L, C or R
#
# for example:
#
#     gawk -f csvtohtml.awk \
#          -v nomenclature="AZPSAWD" \
#          -v title="Scholarship Awards" \
#          -v subtitle="Fiscal Year = $(getparm 01)" \
#          -v user="${BAN9UID}" \
#          -v dbname="${ORACLE_SID}" \
#          -v fieldlist="1L 2L 4L 5L 6L 7L 8L 9L 10L 11L 12L 13,R" \
#          azpsawd_${ONE_UP}.csv > azpsawd_${ONE_UP}.html
#
#     note that column 3 is omitted from the output
#
# the only tricky bit is the expression arr[fs[i]+0] that looks
# up the field definition (for example: 13,R) from the field list
# fs, extracts the field number from the field definition (adding
# 0 forces the ,R to be ignored and the expression 13,R to be
# converted to the number 13), then looks up the appropriate value
# in the array of values arr; is awk bad-ass or what?

function addcomma(num,    arr) {
    split(num, arr, "."); arr[1] = arr[1] "."
    while (arr[1] ~ /[0-9][0-9][0-9][0-9]/) {
        sub(/[0-9][0-9][0-9][,.]/, ",&", arr[1]) }
    if (arr[2] != "") { return arr[1] arr[2] }
    else { sub(/\.$/, "", arr[1]); return arr[1] } }

function csvsplit(str, arr,     i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    #     doubled double-quotes represent a single embedded quote
    # embedded, quoted newlines are handled improperly
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
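        else if (s == "MAYBEDOUBLE") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" }
            # a second quote is an embedded doubled quote: stay in the field
            else if (substr(str,i,1) == qt) { s = "INQUOTES" } } }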
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }    

function isnumeric(x) {
    return x ~ /^[+-]?(([0-9]+([.][0-9]*)?)|([.][0-9]+))$/ }

BEGIN { "date +%m/%d/%Y" | getline today }

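BEGIN { # NOTE: the HTML tags in the print statements of this program are
        # a minimal reconstruction; the original markup may have differed
        print "<html>"
        # print css prelude
        print "<body>"
        print "<table width=\"100%\">"
        print "<tr>"
        print "<td align=\"left\">" title "</td>"
        print "<td align=\"right\">" today "</td>"
        print "</tr>"
        print "<tr>"
        print "<td align=\"left\">" subtitle "</td>"
        print "<td align=\"right\">" dbname "</td>"
        print "</tr>"
        print "<tr>"
        print "<td align=\"left\">" nomenclature "</td>"
        print "<td align=\"right\">" user "</td>"
        print "</tr>"
        print "</table>"
        print "<table>"
        m = split(fieldlist, fs) }

NR == 1 { # print header row
    # escape & first so the entities produced for < and > are not re-escaped
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        print "<th>" arr[fs[i]+0] "</th>" }
    print "</tr>" }

NR > 1 { # print detail rows
    # escape & first so the entities produced for < and > are not re-escaped
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        if (fs[i] ~ /R/) { align = "align=\"right\"" }
        else if (fs[i] ~ /C/) { align = "align=\"center\"" }
        else { align = "align=\"left\"" }
        if (fs[i] ~ /,/) { val = addcomma(arr[fs[i]+0]) }
        else { val = arr[fs[i]+0] }
        print "<td " align ">" val "</td>" }
    print "</tr>" }

END { print "</table>"
      print "</body>"
      print "</html>" }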

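The csvsplit function above is a small state machine with four states: START, INFIELD, INQUOTES and MAYBEDOUBLE. As a cross-check of the idea, here is a Python sketch of the same kind of machine; it is an illustration under stated assumptions, not a translation of the awk code. The name csv_split is hypothetical, the sketch returns to INQUOTES when a second quote follows a quote inside a quoted field, and like the awk function it works on one line at a time, so embedded newlines are not handled. The loop at the end compares its output with Python's standard csv module on a few assumed sample lines.

```python
import csv

def csv_split(line, fs=",", qt='"'):
    """Split one CSV line into a list of fields using a four-state machine."""
    fields, state, start = [], "START", 0
    for i, ch in enumerate(line):
        if state == "START":
            if ch == fs:                 # empty field
                fields.append("")
            elif ch == qt:               # quoted field begins after the quote
                start, state = i + 1, "INQUOTES"
            else:                        # plain field begins here
                start, state = i, "INFIELD"
        elif state == "INFIELD":
            if ch == fs:
                fields.append(line[start:i])
                state = "START"
        elif state == "INQUOTES":
            if ch == qt:                 # closing quote, or first of a pair
                state = "MAYBEDOUBLE"
        elif state == "MAYBEDOUBLE":
            if ch == fs:                 # the quote closed the field
                fields.append(line[start:i - 1].replace(qt + qt, qt))
                state = "START"
            elif ch == qt:               # doubled quote: still inside the field
                state = "INQUOTES"
    # flush whatever is pending at the end of the line
    if state in ("INFIELD", "INQUOTES"):
        fields.append(line[start:])
    elif state == "MAYBEDOUBLE":
        fields.append(line[start:-1].replace(qt + qt, qt))
    else:                                # START: empty line or trailing separator
        fields.append("")
    return fields

# Assumed sample lines; each result should agree with the csv module.
samples = ['a,"b,c",d', '"say ""hi"", ok",2020', 'x,,y,', '1997,Ford,E350']
for sample in samples:
    assert csv_split(sample) == next(csv.reader([sample]))
print(csv_split('"say ""hi"", ok",2020'))
# prints ['say "hi", ok', '2020']
```

The second sample is the interesting one: a comma that follows a doubled quote is still inside the quoted field, so the machine must leave MAYBEDOUBLE and go back to INQUOTES when it sees the second quote.
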
Posted by programmingpraxis

Filed in Exercises

4 Responses to “CSV To HTML”

Daniel said

March 17, 2020 at 6:16 PM

Here’s a solution in Python.

import csv
import html
import sys

assert len(sys.argv) == 2  # expects exactly one argument: the CSV file

html_lines = ['<html>', '<head></head>', '<body>', '<table>']
# newline='' lets the csv module handle line endings inside quoted fields
with open(sys.argv[1], newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    # header row from the first line of the CSV file
    html_lines.append('  <tr>')
    for name in reader.fieldnames:
        html_lines.append('    <th>' + html.escape(name) + '</th>')
    html_lines.append('  </tr>')
    # one table row per remaining CSV record
    for row in reader:
        html_lines.append('  <tr>')
        for name in reader.fieldnames:
            html_lines.append('    <td>' + html.escape(row[name]) + '</td>')
        html_lines.append('  </tr>')
html_lines.extend(('</table>', '</body>', '</html>'))
# print already translates '\n' to the platform line ending
print('\n'.join(html_lines))

Example Output:

<html>
<head></head>
<body>
<table>
  <tr>
    <th>Year</th>
    <th>Make</th>
    <th>Model</th>
  </tr>
  <tr>
    <td>1997</td>
    <td>Ford</td>
    <td>E350</td>
  </tr>
  <tr>
    <td>2000</td>
    <td>Mercury</td>
    <td>Cougar</td>
  </tr>
</table>
</body>
</html>

Year
Make
Model

1997
Ford
E350

2000
Mercury
Cougar

Daniel said

March 17, 2020 at 6:20 PM

In my last comment, I tried pasting the HTML for the table itself, but it does not appear to work. Here’s another attempt, this time with the newlines removed.

YearMakeModel1997FordE3502000MercuryCougar

John Martin said

April 8, 2020 at 10:09 AM

Great post! I really learned a lot from it.

penpoe said

April 12, 2020 at 3:57 PM

Racket: https://github.com/xojoc/programming-challenges/blob/master/programming-praxis/2020_03_17.rkt
