CSV To HTML
March 17, 2020
Here is my version of a CSV to HTML converter. I’m still experimenting with various CSS formatting of the table for display and printing, so that part of the program is missing. The fieldlist is my attempt to create a mini-language for specifying formatting; an earlier version of the program used isnumeric to add commas to all numeric fields, which meant that things like the year 2020 and five-digit zip codes got commas when they shouldn’t have. The program is already useful in its present form, and it provides a good base for improvements.
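To make the fieldlist idea concrete, here is a minimal Python sketch of how one field definition such as 13,R could be read and applied. The names parse_field_spec and add_commas are hypothetical, and the parser is stricter than the awk program, which only tests whether a definition contains R, C or a comma. It is an illustration of the idea, not a port of the program.

```python
import re

# Alignment letters accepted in a field definition; left is the default.
ALIGNMENTS = {"L": "left", "C": "center", "R": "right"}

def parse_field_spec(spec):
    """Parse a definition like '13,R' into (field number, commas?, alignment)."""
    match = re.fullmatch(r"(\d+)(,?)([LCR]?)", spec)
    if match is None:
        raise ValueError(f"bad field definition: {spec!r}")
    number, comma, letter = match.groups()
    return int(number), comma == ",", ALIGNMENTS.get(letter, "left")

def add_commas(text):
    """Insert thousands separators into the integer part of a numeric string."""
    whole, dot, frac = text.partition(".")
    # Put a comma at every position followed by a multiple of three digits.
    whole = re.sub(r"(?<=\d)(?=(\d{3})+$)", ",", whole)
    return whole + dot + frac

# Commas are applied only where the definition asks for them,
# so a year in field 1 stays 2020 while an amount in field 13 is grouped.
for spec, value in [("1L", "2020"), ("13,R", "1234567.89")]:
    number, commas, align = parse_field_spec(spec)
    shown = add_commas(value) if commas else value
    print(number, align, shown)
```

Making the comma an explicit part of the definition is what keeps years and zip codes intact: add_commas("2020") would return 2,020 if it were applied to every numeric field.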
# CSVtoHTML -- neatly print CSV file using HTML markup
#
# call as:
#
#     gawk -f csvtohtml.awk \
#         -v nomenclature="..." \
#         -v title="..." \
#         -v subtitle="..." \
#         -v user="..." \
#         -v dbname="..." \
#         -v fieldlist="..." \
#         infile > outfile
#
# fieldlist contains space-separated field definitions with
# input field number, optional comma, optional alignment
# where alignment may be L, C or R
#
# for example:
#
#     gawk -f csvtohtml.awk \
#         -v nomenclature="AZPSAWD" \
#         -v title="Scholarship Awards" \
#         -v subtitle="Fiscal Year = $(getparm 01)" \
#         -v user="${BAN9UID}" \
#         -v dbname="${ORACLE_SID}" \
#         -v fieldlist="1L 2L 4L 5L 6L 7L 8L 9L 10L 11L 12L 13,R" \
#         azpsawd_${ONE_UP}.csv > azpsawd_${ONE_UP}.html
#
# note that column 3 is omitted from the output
#
# the only tricky bit is the expression arr[fs[i]+0] that looks
# up the field definition (for example: 13,R) from the field list
# fs, extracts the field number from the field definition (adding
# 0 forces the ,R to be ignored and the expression 13,R to be
# converted to the number 13), then looks up the appropriate value
# in the array of values arr; is awk bad-ass or what?
function addcomma(num,    arr) {
    # insert thousands separators into the integer part of num
    split(num, arr, "."); arr[1] = arr[1] "."
    while (arr[1] ~ /[0-9][0-9][0-9][0-9]/) {
        sub(/[0-9][0-9][0-9][,.]/, ",&", arr[1]) }
    if (arr[2] != "") { return arr[1] arr[2] }
    else { sub(/\.$/, "", arr[1]); return arr[1] } }
function csvsplit(str, arr,    i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    # doubled double-quotes represent a single embedded quote
    # embedded, quoted newlines are handled improperly
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
        else if (s == "MAYBEDOUBLE") {
            # a comma ends the field; a second quote is an embedded quote,
            # so the field continues and later commas stay inside it
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" }
            else if (substr(str,i,1) == qt) { s = "INQUOTES" } } }
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }
function isnumeric(x) {
return x ~ /^[+-]?(([0-9]+([.][0-9]*)?)|([.][0-9]+))$/ }
BEGIN { "date +%m/%d/%Y" | getline today }
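BEGIN { print "<html>"
    # print css prelude
    print "<body>"
    # page header: the variables printed are the program's own, but the
    # exact tags and attributes used here are an assumption
    print "<table width=\"100%\">"
    print "<tr>"
    print "<td align=\"left\">" nomenclature "</td>"
    print "<td align=\"center\">" title "</td>"
    print "<td align=\"right\">" today "</td>"
    print "</tr>"
    print "<tr>"
    print "<td align=\"left\">" user "</td>"
    print "<td align=\"center\">" subtitle "</td>"
    print "<td align=\"right\">" dbname "</td>"
    print "</tr>"
    print "</table>"
    print "<table>"
    m = split(fieldlist, fs) }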
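NR == 1 { # print header row
    # escape & first so the entities written for < and > are not escaped again
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        print "<th>" arr[fs[i]+0] "</th>" }
    print "</tr>" }
NR > 1 { # print detail rows
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        if (fs[i] ~ /R/) { align = "align=\"right\"" }
        else if (fs[i] ~ /C/) { align = "align=\"center\"" }
        else { align = "align=\"left\"" }
        if (fs[i] ~ /,/) { val = addcomma(arr[fs[i]+0]) }
        else { val = arr[fs[i]+0] }
        print "<td " align ">" val "</td>" }
    print "</tr>" }
END { # close the data table and the page (closing tags assumed)
    print "</table>"
    print "</body>"
    print "</html>" }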
4 Responses to “CSV To HTML”
Here’s a solution in Python.
import csv
import html
import os
import sys

assert len(sys.argv) == 2

html_lines = ['<html>', '<head></head>', '<body>', '<table>']
with open(sys.argv[1]) as csvfile:
    reader = csv.DictReader(csvfile)
    html_lines.append(' <tr>')
    for name in reader.fieldnames:
        html_lines.append(' <th>' + html.escape(name) + '</th>')
    html_lines.append(' </tr>')
    for row in reader:
        html_lines.append(' <tr>')
        for name in reader.fieldnames:
            html_lines.append(' <td>' + html.escape(row[name]) + '</td>')
        html_lines.append(' </tr>')
html_lines.extend(('</table>', '</body>', '</html>'))
print(os.linesep.join(html_lines))

Example Output:

<html>
<head></head>
<body>
<table>
 <tr>
 <th>Year</th>
 <th>Make</th>
 <th>Model</th>
 </tr>
 <tr>
 <td>1997</td>
 <td>Ford</td>
 <td>E350</td>
 </tr>
 <tr>
 <td>2000</td>
 <td>Mercury</td>
 <td>Cougar</td>
 </tr>
</table>
</body>
</html>

Year  Make     Model
1997  Ford     E350
2000  Mercury  Cougar
In my last comment, I tried pasting the HTML for the table itself, but it does not appear to work. Here’s another attempt, this time with the newlines removed.
YearMakeModel1997FordE3502000MercuryCougar
Great post! I really learned a lot from it.
Racket: https://github.com/xojoc/programming-challenges/blob/master/programming-praxis/2020_03_17.rkt