Grep-CSV
April 9, 2019
I used the CSV-processing code from the essay on text-file databases, and the regular-expression matcher from a previous exercise, to build this simple grep-csv
that reads from standard input and writes to standard output:
(define (grep-csv n regex)
  (for-each-port
    (filter-port read-csv-record
      (lambda (line) (trex regex (list-ref line (- n 1)))))
    (lambda (line) (write-csv-record line))))
Given an input like this:
Charles,Dickens,Great Expectations,1861
Mark,Twain,The Adventures of Tom Sawyer,1876
William,Shakespeare,Julius Caesar,1599
Isaac,Newton,Philosophiae Naturalis Principia Mathematica,1687
We get an output like this:
> (grep-csv 2 "e.*s")
Charles,Dickens,Great Expectations,1861
William,Shakespeare,Julius Caesar,1599
Note that The Adventures of Tom Sawyer and Philosophiae Naturalis Principia Mathematica also match the pattern, but in the title field rather than the second field, so those records are not returned. Plain grep matches anywhere in the line and would return them as unwanted records.
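To make the difference between field matching and whole-line matching concrete, here is a minimal Python sketch of the same idea. It is not the Scheme program above: the name `grep_csv` and its parameters are illustrative, and it assumes the pattern may match anywhere inside the chosen field, which is how the sample output suggests the Scheme matcher behaves.

```python
import csv
import io
import re


def grep_csv(n, pattern, infile, outfile):
    """Copy to outfile the CSV records whose n-th field (1-based) matches pattern."""
    regex = re.compile(pattern)
    writer = csv.writer(outfile, lineterminator="\n")
    for record in csv.reader(infile):
        # Search only the chosen field, not the whole line.
        # Assumption: records with fewer than n fields are silently skipped.
        if len(record) >= n and regex.search(record[n - 1]):
            writer.writerow(record)


SAMPLE = """\
Charles,Dickens,Great Expectations,1861
Mark,Twain,The Adventures of Tom Sawyer,1876
William,Shakespeare,Julius Caesar,1599
Isaac,Newton,Philosophiae Naturalis Principia Mathematica,1687
"""

out = io.StringIO()
grep_csv(2, "e.*s", io.StringIO(SAMPLE), out)
field_matches = out.getvalue().splitlines()

# Plain grep behaviour for comparison: match the pattern anywhere in the line.
line_matches = [line for line in SAMPLE.splitlines() if re.search("e.*s", line)]

print(field_matches)  # the Dickens and Shakespeare records only
print(line_matches)   # all four records
```

To use it as a filter, pass `sys.stdin` and `sys.stdout` as `infile` and `outfile`.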
It is handy to have CSV-splitting and regular-expression-matching code available when needed. You can see all of that code and run the program at https://ideone.com/fGHhRb.
Quickie one in Ruby.
Given an example.csv file that looks like this:
When we run the program like so:
The output is:
A Haskell version. It uses the Cassava library for parsing and printing CSV
files, pcre-light along with the pcre-heavy front-end for regular expressions,
and optparse-applicative for argument parsing.
The program allows skipping a “header” record, and handles UTF-8 content and
fields that span multiple lines. (In the example below, we match on a word
that appears in the second line of a field.)
The data are lines from poems, taken from UTF-8 SAMPLER.
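The header-skipping and multi-line-field behaviour described in this comment can be sketched in a few lines of Python. This is not the commenter's Haskell program. The function name `grep_csv_records`, the `skip_header` flag, and the `DEMO` data are all hypothetical, and the sketch assumes that a skipped header is passed through to the output without being tested against the pattern.

```python
import csv
import io
import re


def grep_csv_records(n, pattern, text, skip_header=False):
    """Return the records of CSV text whose n-th field (1-based) matches pattern."""
    regex = re.compile(pattern)
    # newline="" leaves line endings alone so the csv module can
    # handle newlines that occur inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    kept = []
    if skip_header:
        header = next(reader, None)
        if header is not None:
            kept.append(header)  # assumption: header is copied, never matched
    for record in reader:
        if len(record) >= n and regex.search(record[n - 1]):
            kept.append(record)
    return kept


# Hypothetical data: record 1 has a quoted field spanning two lines,
# and record 3 contains non-ASCII text.
DEMO = 'id,text\n1,"first line\nsecond word here"\n2,"single line"\n3,"naïve café"\n'

for row in grep_csv_records(2, "word", DEMO, skip_header=True):
    print(row)
```

Because the CSV reader reassembles the quoted field before the regular expression sees it, a word on the second line of a field is found just like any other.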