CSV To HTML
March 17, 2020
Here is my version of a CSV to HTML converter. I’m still experimenting with CSS formatting of the table for display and printing, so that part of the program is missing. The fieldlist variable is my attempt at a mini-language for specifying formatting: an earlier version of the program used isnumeric to add commas to every numeric field, which meant that values like the year 2020 and five-digit zip codes got commas when they shouldn’t have. The program is already useful in its present form, and provides a good base for improvements.
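To make the fieldlist mini-language concrete, here is a minimal sketch in Python rather than awk, purely for illustration. The names parse_fielddef, add_commas and format_row are hypothetical and do not appear in the awk program. The comma insertion uses a lookahead regular expression instead of the awk loop, and the default alignment of L when none is given is an assumption based on how the awk program falls through to left alignment.

```python
import re

def parse_fielddef(fielddef):
    """Split a definition such as '13,R' into (field number, add commas?, alignment)."""
    m = re.fullmatch(r"(\d+)(,?)([LCR]?)", fielddef)
    if m is None:
        raise ValueError(f"bad field definition: {fielddef!r}")
    number, comma, align = m.groups()
    # Assumption: a missing alignment letter means left-aligned.
    return int(number), comma == ",", align or "L"

def add_commas(text):
    """Insert thousands separators into the integer part of a numeric string."""
    whole, dot, frac = text.partition(".")
    # Put a comma at every position that follows a digit and is followed
    # by one or more complete groups of three digits up to the end.
    whole = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", whole)
    return whole + dot + frac

def format_row(values, fieldlist):
    """Select and format the fields named in fieldlist, in fieldlist order."""
    cells = []
    for fielddef in fieldlist.split():
        number, commas, align = parse_fielddef(fielddef)
        value = values[number - 1]  # field numbers are 1-based
        if commas:
            value = add_commas(value)
        cells.append((value, align))
    return cells
```

With a fieldlist of "1L 3,R", a row holding a zip code, a year and an amount keeps the zip code untouched, drops the year, and adds commas only to the amount, which is the behavior the isnumeric version could not provide.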
# CSVtoHTML -- neatly print CSV file using HTML markup
#
# call as:
#
#     gawk -f csvtohtml.awk \
#         -v nomenclature="..." \
#         -v title="..." \
#         -v subtitle="..." \
#         -v user="..." \
#         -v dbname="..." \
#         -v fieldlist="..." \
#         infile > outfile
#
# fieldlist contains space-separated field definitions with
# input field number, optional comma, optional alignment
# where alignment may be L, C or R
#
# for example:
#
#     gawk -f csvtohtml.awk \
#         -v nomenclature="AZPSAWD" \
#         -v title="Scholarship Awards" \
#         -v subtitle="Fiscal Year = $(getparm 01)" \
#         -v user="${BAN9UID}" \
#         -v dbname="${ORACLE_SID}" \
#         -v fieldlist="1L 2L 4L 5L 6L 7L 8L 9L 10L 11L 12L 13,R" \
#         azpsawd_${ONE_UP}.csv > azpsawd_${ONE_UP}.html
#
# note that column 3 is omitted from the output
#
# the only tricky bit is the expression arr[fs[i]+0] that looks
# up the field definition (for example: 13,R) from the field list
# fs, extracts the field number from the field definition (adding
# 0 forces the ,R to be ignored and the expression 13,R to be
# converted to the number 13), then looks up the appropriate value
# in the array of values arr; is awk bad-ass or what?

function addcomma(num,    arr) {
    # insert thousands separators into the integer part of num
    split(num, arr, "."); arr[1] = arr[1] "."
    while (arr[1] ~ /[0-9][0-9][0-9][0-9]/) {
        sub(/[0-9][0-9][0-9][,.]/, ",&", arr[1]) }
    if (arr[2] != "") { return arr[1] arr[2] }
    else { sub(/\.$/, "", arr[1]); return arr[1] } }

function csvsplit(str, arr,    i,j,n,s,fs,qt) {
    # split comma-separated fields into arr; return number of fields in arr
    # fields surrounded by double-quotes may contain commas;
    # doubled double-quotes represent a single embedded quote
    # embedded, quoted newlines are handled improperly
    delete arr; s = "START"; n = 0; fs = ","; qt = "\""
    for (i = 1; i <= length(str); i++) {
        if (s == "START") {
            if (substr(str,i,1) == fs) { arr[++n] = "" }
            else if (substr(str,i,1) == qt) { j = i+1; s = "INQUOTES" }
            else { j = i; s = "INFIELD" } }
        else if (s == "INFIELD") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j); j = 0; s = "START" } }
        else if (s == "INQUOTES") {
            if (substr(str,i,1) == qt) { s = "MAYBEDOUBLE" } }
        else if (s == "MAYBEDOUBLE") {
            if (substr(str,i,1) == fs) {
                arr[++n] = substr(str,j,i-j-1)
                gsub(qt qt, qt, arr[n]); j = 0; s = "START" }
            # a second quote is an embedded quote, so the field continues;
            # without this branch a comma right after "" ends the field early
            else if (substr(str,i,1) == qt) { s = "INQUOTES" } } }
    if (s == "INFIELD" || s == "INQUOTES") { arr[++n] = substr(str,j) }
    else if (s == "MAYBEDOUBLE") {
        arr[++n] = substr(str,j,length(str)-j); gsub(qt qt, qt, arr[n]) }
    else if (s == "START") { arr[++n] = "" }
    return n }

function isnumeric(x) {
    return x ~ /^[+-]?(([0-9]+([.][0-9]*)?)|([.][0-9]+))$/ }

BEGIN { "date +%m/%d/%Y" | getline today }

BEGIN { # the exact tags in this heading block are an assumption;
        # only the six variables and their order are certain
        print "<html>"
        # print css prelude
        print "<body>"
        print "<table width=\"100%\">"
        print "<tr>"
        print "  <td>" title "</td>"
        print "  <td align=\"right\">" today "</td>"
        print "</tr>"
        print "<tr>"
        print "  <td>" subtitle "</td>"
        print "  <td align=\"right\">" dbname "</td>"
        print "</tr>"
        print "<tr><td>" nomenclature "</td><td align=\"right\">" user "</td></tr>"
        print "</table>"
        print "<table>"
        m = split(fieldlist, fs) }

NR == 1 { # print header row
    # escape & first so the entities produced for < and > are not re-escaped
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        print "<th>" arr[fs[i]+0] "</th>" }
    print "</tr>" }

NR > 1 { # print detail rows
    gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
    print "<tr>"
    n = csvsplit($0, arr)
    for (i = 1; i <= m; i++) {
        if (fs[i] ~ /R/) { align = "align=\"right\"" }
        else if (fs[i] ~ /C/) { align = "align=\"center\"" }
        else { align = "align=\"left\"" }
        if (fs[i] ~ /,/) { val = addcomma(arr[fs[i]+0]) }
        else { val = arr[fs[i]+0] }
        print "<td " align ">" val "</td>" }
    print "</tr>" }

END { print "</table>"
      print "</body>"
      print "</html>" }

4 Responses to “CSV To HTML”
Here’s a solution in Python.
Example Output:
Year  Make     Model
1997  Ford     E350
2000  Mercury  Cougar
In my last comment, I tried pasting the HTML for the table itself, but it does not appear to work. Here’s another attempt, this time with the newlines removed.
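<table><tr><th>Year</th><th>Make</th><th>Model</th></tr><tr><td>1997</td><td>Ford</td><td>E350</td></tr><tr><td>2000</td><td>Mercury</td><td>Cougar</td></tr></table>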
Great post! I really learned a lot from it.
Racket: https://github.com/xojoc/programming-challenges/blob/master/programming-praxis/2020_03_17.rkt