CsvSplit
July 30, 2019
There was a question the other day on Reddit or Stack Overflow or someplace about handling CSV files with awk. We’ve done that in a previous exercise, but today I decided to handle CSV files in a different way. Specifically, I wrote an awk function csvsplit that works the same way as awk’s built-in split function except that it handles CSV strings instead of splitting on a regular expression:
n = csvsplit(str,arr)
Csvsplit takes a string and an array, deletes any current contents of the array, splits the string into fields using the normal CSV rules, stores the fields in arr[1] .. arr[n], and returns n. The splitting rules are: every comma separates two fields, except that double-quotes around a field protect any commas inside it, and a double-quote may appear inside a quoted field if it is doubled (two successive double-quotes).
Your task is to write a csvsplit function for awk. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
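For readers who want something concrete to compare against, here is a minimal sketch of one way the rules above could be turned into awk. It is not necessarily the suggested solution. The shell variable csvsplit_awk, the local variable names, and the small driver block that prints each field are assumptions made for illustration. The sketch is deliberately lenient: it does not reject malformed input such as a stray quote in the middle of an unquoted field, and it returns 0 for an empty string, as split does.

```shell
# The awk program is kept in a shell variable so the example is one script.
csvsplit_awk='
# csvsplit(str, arr): split str on CSV rules into arr[1..n] and return n.
# The extra parameters after arr are local variables.
function csvsplit(str, arr,    i, n, c, len, field, inq) {
    split("", arr)              # portable idiom for emptying the array
    if (str == "") return 0     # like split(), no fields for an empty string
    n = 0; field = ""; inq = 0
    len = length(str)
    for (i = 1; i <= len; i++) {
        c = substr(str, i, 1)
        if (inq) {
            if (c != "\"") {
                field = field c            # ordinary character, commas included
            } else if (substr(str, i + 1, 1) == "\"") {
                field = field "\""         # doubled quote is a literal quote
                i++
            } else {
                inq = 0                    # closing quote
            }
        } else if (c == "\"") {
            inq = 1                        # opening quote
        } else if (c == ",") {
            arr[++n] = field               # unprotected comma ends the field
            field = ""
        } else {
            field = field c
        }
    }
    arr[++n] = field                       # the last field has no trailing comma
    return n
}

# Hypothetical driver: print every field of every input line.
{
    n = csvsplit($0, f)
    for (i = 1; i <= n; i++) printf "%d: [%s]\n", i, f[i]
}
'

printf '%s\n' 'a,"b,c","say ""hi""",,d' | awk "$csvsplit_awk"
```

Walking the string one character at a time keeps the quoting logic in a single loop with one flag, at the cost of speed on very long lines. A regular-expression approach would be shorter, but it is harder to get right for doubled quotes.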
The only problem is that newlines are allowed within a double-quoted field, at least by some programs as well as by RFC 4180, the nearest thing to a standard. So awk’s line-by-line model really doesn’t work without great pain.
That’s correct. If you need that functionality, the previous exercise linked in the task description provides it. But the current exercise provides a function that is useful in a large percentage of cases.
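To make the pain concrete, here is a rough sketch of one common workaround. It is not the approach of the previous exercise, which is not reproduced here. The idea is to count the double-quotes in the line just read; an odd count means a quoted field is still open, so the next physical line belongs to the same record. The names join_awk, quotes, rec, and line are assumptions made for illustration, and the sketch assumes well-formed input in which every quoted field is eventually closed.

```shell
join_awk='
# Count the double-quote characters in s. gsub returns the number of
# replacements, and s is a local copy, so the caller string is untouched.
function quotes(s) { return gsub(/"/, "", s) }

{
    rec = $0
    # An odd number of quotes means a quoted field is still open,
    # so append the next physical line and check again.
    while (quotes(rec) % 2 == 1 && (getline line) > 0)
        rec = rec "\n" line
    nrec++
    printf "record %d: %s\n", nrec, rec
}
'

# Three physical lines, but only two CSV records.
printf '%s\n' 'a,"first' 'second",b' 'c,d' | awk "$join_awk"
```

Doubled quotes inside a field add two to the count, so they do not disturb the odd/even test. Each joined record could then be passed to a function such as csvsplit, which would need no changes to cope with the embedded newline.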