Grep-CSV
April 9, 2019
Regular readers of this blog know that, in my day job, I frequently process input files from vendors; almost always, they were created in Excel and arrive in CSV format. Sometimes I have to peek inside the files, looking for invalid data, and I have commonly used grep for that task. Sometimes grep gives me unwanted records, because there is a match in some field that is not the field of interest, and I just ignore the extra records. But the other day I had a mess, with lots of unwanted records, so I used awk to parse out the fields and find the records of interest.
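Field-by-field matching is harder than it looks, because a CSV field may itself contain commas inside quotes. The sketch below is in Python (the post gives no code) and uses a made-up vendor record. It assumes the fields are split on every comma, which is what a plain comma field separator in awk does. Under that assumption the quoted field breaks in two, while Python's standard csv module keeps it whole.

```python
import csv
import io

# Hypothetical vendor record: the second field contains a quoted comma.
line = 'Acme,"Widgets, large",42\n'

# Splitting on every comma (like a plain comma field separator) breaks the quoted field apart.
naive = line.rstrip("\n").split(",")

# The csv module honors the quotes and yields the intended three fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['Acme', '"Widgets', ' large"', '42']
print(parsed)  # ['Acme', 'Widgets, large', '42']
```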
I realized as I was performing that task that it would be useful to have a version of grep that understood the CSV file format. So I wrote grep-csv, which takes a field number (counting from 1, like awk) and a regular expression and returns the matching rows of a CSV file.
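Here is a minimal Python sketch of such a program, built on the standard csv, re, and argparse modules. It uses re.search semantics, so the pattern may match anywhere in the selected field, as with grep. The names grep_csv and main are hypothetical, chosen for illustration. This is not the author's suggested solution.

```python
import argparse
import csv
import re
import sys
from typing import Iterable, Iterator, List, TextIO


def grep_csv(field: int, pattern: str, rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield the rows whose field number `field` (1-based, like awk) matches `pattern`."""
    if field < 1:
        raise ValueError("field numbers count from 1")
    regex = re.compile(pattern)
    for row in rows:
        # Rows too short to have the requested field simply don't match.
        if len(row) >= field and regex.search(row[field - 1]):
            yield row


def main(argv: List[str], stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Parse `grep-csv FIELD PATTERN [FILE]` and write matching rows as CSV."""
    parser = argparse.ArgumentParser(prog="grep-csv")
    parser.add_argument("field", type=int, help="field number, counting from 1")
    parser.add_argument("pattern", help="regular expression to search for")
    parser.add_argument("file", nargs="?", help="CSV file (default: standard input)")
    args = parser.parse_args(argv)

    # newline="" lets the csv module handle line endings and quoted newlines itself.
    src = open(args.file, newline="", encoding="utf-8") if args.file else stdin
    try:
        writer = csv.writer(stdout, lineterminator="\n")
        for row in grep_csv(args.field, args.pattern, csv.reader(src)):
            writer.writerow(row)  # re-quotes fields where needed
    finally:
        if args.file:
            src.close()
```

To run it as a command, save it as, say, grep_csv.py and add the usual `if __name__ == "__main__": main(sys.argv[1:])` guard at the bottom. Then invoke it as `python3 grep_csv.py 2 'ban' vendor.csv`. Matching rows are written back out in CSV form, so a field containing a comma comes out quoted again. Rows too short to have the requested field count as non-matches. That is one design choice among several; reporting an error would be another.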
Your task is to write a grep-csv program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Quickie one in Ruby.
Given an example.csv file that looks like this:
When we run the program like so:
The output is:
A Haskell version. It uses the Cassava library for parsing and printing CSV
files, pcre-light along with the pcre-heavy front-end for regular expressions,
and optparse-applicative for argument parsing.
The program allows skipping a “header” record, and handles UTF-8 content and
fields that span multiple lines. (In the example below, we match on a word
that appears in the second line of a field.)
The data are lines from poems, taken from UTF-8 SAMPLER.
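The header-skipping and multi-line behavior described above can also be sketched in Python. This is not the commenter's Haskell code, only an analogue. It relies on the standard csv module, which treats a quoted field containing a newline as a single value. The function name grep_csv_records, its skip_header parameter, and the sample records (two well-known poem openings) are hypothetical.

```python
import csv
import io
import re
from typing import Iterator, List, TextIO


def grep_csv_records(src: TextIO, field: int, pattern: str,
                     skip_header: bool = False) -> Iterator[List[str]]:
    """Yield CSV records whose 1-based `field` matches `pattern`.

    Unlike line-oriented grep, this works per record: a quoted field may
    span several physical lines and is still treated as one value.
    """
    regex = re.compile(pattern)
    reader = csv.reader(src)
    if skip_header:
        next(reader, None)  # discard the header record, if there is one
    for record in reader:
        if len(record) >= field and regex.search(record[field - 1]):
            yield record


# Hypothetical sample: a header, then a record whose second field spans two
# lines. The pattern matches a word on the second line of that field.
data = (
    "poet,lines\n"
    'Blake,"Tyger Tyger, burning bright"\n'
    'Goethe,"Über allen Gipfeln\nIst Ruh"\n'
)
matches = list(grep_csv_records(io.StringIO(data), 2, r"\bRuh\b", skip_header=True))
print(matches)
```

Skipping the header matters for patterns that happen to match a column name. Here a search for "lines" in field 2 would otherwise return the header record itself.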