Cut
August 17, 2010
We begin with a function to expand ranges, which is reminiscent of a previous exercise:
(define (expand-ranges str)
  (define (make-range str)
    (let ((endpoints (map string->number (string-split #\- str))))
      (if (null? (cdr endpoints))
          (list (car endpoints))
          (range (car endpoints) (+ (cadr endpoints) 1)))))
  (apply append (map make-range (string-split #\, str))))
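As a quick check of the range expansion, here is a minimal, self-contained sketch. The definitions of range and string-split below are simplified stand-ins written for this example, assuming the Standard Prelude versions take the same arguments in the same order; they are not the Prelude's own code.

```scheme
;; Stand-in for the Standard Prelude's range (an assumption of this sketch):
;; (range lo hi) returns the integers lo, lo+1, ..., hi-1.
(define (range lo hi)
  (if (>= lo hi)
      '()
      (cons lo (range (+ lo 1) hi))))

;; Stand-in for the Standard Prelude's string-split (also an assumption):
;; splits str at every occurrence of the character c.
(define (string-split c str)
  (let loop ((chars (string->list str)) (field '()) (fields '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse field)) fields)))
          ((char=? (car chars) c)
           (loop (cdr chars) '()
                 (cons (list->string (reverse field)) fields)))
          (else
           (loop (cdr chars) (cons (car chars) field) fields)))))

;; expand-ranges as given above, repeated so the sketch runs on its own.
(define (expand-ranges str)
  (define (make-range str)
    (let ((endpoints (map string->number (string-split #\- str))))
      (if (null? (cdr endpoints))
          (list (car endpoints))
          (range (car endpoints) (+ (cadr endpoints) 1)))))
  (apply append (map make-range (string-split #\, str))))

(display (expand-ranges "1-3,5,7-8"))   ; prints (1 2 3 5 7 8)
(newline)
```

Each comma-separated item becomes either a single number or an inclusive range, and the pieces are appended in the order they were written, so the list keeps the order given on the command line.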
Cut operates in two modes. In character mode, it writes the characters corresponding to the expanded range (remember that character positions are counted from one, not zero), followed by a newline. Field mode is harder, because first the fields must be split on the delimiter, then the delimiter must be inserted between fields (but not at the end of the line):
(define (write-chars cs str)
  (do ((cs cs (cdr cs))) ((null? cs) (newline))
    (display (string-ref str (- (car cs) 1)))))
(define (write-fields fs str delim)
  (let ((fields (string-split delim str)))
    (do ((fs fs (cdr fs))) ((null? fs))
      (display (list-ref fields (- (car fs) 1)))
      (display (if (pair? (cdr fs)) delim #\newline)))))
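To see both modes on a single line of input, here is a small self-contained sketch. The string-split below is a simplified stand-in written for this example, assuming the Standard Prelude version takes the delimiter character first and the string second; the sample lines are made up for illustration.

```scheme
;; Stand-in for the Standard Prelude's string-split (an assumption of this
;; sketch): splits str at every occurrence of the character c.
(define (string-split c str)
  (let loop ((chars (string->list str)) (field '()) (fields '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse field)) fields)))
          ((char=? (car chars) c)
           (loop (cdr chars) '()
                 (cons (list->string (reverse field)) fields)))
          (else
           (loop (cdr chars) (cons (car chars) field) fields)))))

;; Character mode: positions are one-based, so subtract 1 for string-ref.
(define (write-chars cs str)
  (do ((cs cs (cdr cs))) ((null? cs) (newline))
    (display (string-ref str (- (car cs) 1)))))

;; Field mode: the delimiter follows every field except the last one,
;; which is followed by a newline instead.
(define (write-fields fs str delim)
  (let ((fields (string-split delim str)))
    (do ((fs fs (cdr fs))) ((null? fs))
      (display (list-ref fields (- (car fs) 1)))
      (display (if (pair? (cdr fs)) delim #\newline)))))

(write-chars '(1 2 3 5) "abcdefg")            ; prints abce
(write-fields '(1 3) "root:x:0:0:root" #\:)   ; prints root:0
```

Both procedures assume every requested position exists on the line; a position past the end of the string or beyond the last field signals an error from string-ref or list-ref rather than being skipped.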
Do-file handles a single file, regardless of character mode or field mode, leaving the task of setting the current input port to the caller. The two legs of the if each handle one mode, using a do loop to process each line individually:
(define (do-file opts)
  (if (assoc #\c opts)
      (let ((cs (expand-ranges (cdr (assoc #\c opts)))))
        (do ((line (read-line) (read-line)))
            ((eof-object? line))
          (write-chars cs line)))
      (let ((fs (expand-ranges (cdr (assoc #\f opts))))
            ; -d is optional; fall back to a tab, as Unix cut does
            (delim (if (assoc #\d opts)
                       (string-ref (cdr (assoc #\d opts)) 0)
                       #\tab)))
        (do ((line (read-line) (read-line)))
            ((eof-object? line))
          (write-fields fs line delim)))))
All that’s left is the main program, which extracts parameters from the command line, then calls do-file to handle the current input port if there are no files on the command line, or processes the files individually in a do loop if one or more files are named on the command line:
(let-values (((opts files) (getopt "c:d:f:"
                "usage: cut -clist [file ...] or cut -flist [-dchar] [file ...]"
                (cdr (command-line)))))
  (if (null? files) (do-file opts)
      (do ((files files (cdr files))) ((null? files))
        (with-input-from-file (car files) (lambda () (do-file opts))))))
Note that command-line is specific to Chez Scheme, and must change for other Scheme implementations. We used range, read-line, and string-split from the Standard Prelude, and getopt from an earlier exercise. You can see the program assembled at http://programmingpraxis.codepad.org/U3Z6l5bV.
This was pretty fun to write.
I couldn’t help but have a try at it in Elisp. Of course I think it’s useless, but there it is:
Here it is in Ruby …
This one prints the fields/columns in the order that they are given on the command line, unlike the Unix cut. I like it better this way, but if you don’t, then just sort the print_list on the return from parse_list(). The only other oddity is how the separator is printed in field mode. Basically, if we decide to print a field and we’re past the first element, we print the separator before the field. That way no separator is left hanging after the last field is printed.
Here is my complete implementation in C:
http://codepad.org/QZL317EK