July 30, 2019

There was a question the other day on Reddit or Stack Overflow or someplace about handling CSV files with awk. We’ve done that in a previous exercise, but today I decided to handle CSV files in a different way. Specifically, I wrote an awk function csvsplit that works the same way as awk’s built-in split function except that it handles CSV strings instead of splitting on a regular expression:

n = csvsplit(str,arr)

Csvsplit takes a string and an array, deletes any current contents of the array, splits the string into fields using the normal CSV rules, stores the fields in arr[1] .. arr[n], and returns n. The splitting rules are: every comma splits a field, except that double-quotes around a field protect commas inside the field, and double-quotes may appear in a quoted field by doubling them (two successive double-quotes).

Your task is to write a csvsplit function for awk. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


Pages: 1 2

2 Responses to “CsvSplit”

  1. John Cowan said

    The only problem is that newlines are allowed within a double-quoted field, at least by some programs as well as by RFC 4180, the nearest thing to a standard. So awk’s line-by-line model really doesn’t work without great pain.

  2. programmingpraxis said

    That’s correct. If you need that functionality, the previous exercise linked in the task description provides it. But the current exercise provides a function that is useful in a large percentage of cases.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: