Data Laundry
September 26, 2017
I often use Awk for data laundry tasks:
BEGIN { FS = "[ \t]+"; print "interface,inet,status" }

function printrec() {
    if (interface != "")
        printf "%s,%s,%s\n", interface, inet, status
}

$1 != "" {
    printrec()
    inet = status = ""
    gsub(/:/, "", $1)
    interface = $1
}

$2 == "inet"    { inet = $3 }
$2 == "status:" { status = $3 }

END { printrec() }
The trick here is the field separator [ \t]+, which makes any run of white space, including leading white space at the beginning of a line, a field separator; thus, a line like “status: inactive” will have “status:” as field 2 rather than field 1.
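As a quick check, the program can be run against a short, made-up ifconfig-style snippet (the interface names and addresses below are invented for illustration, not real output):

```shell
# Save the Awk program and a tiny fabricated ifconfig-style sample,
# then run one against the other.
cat > laundry.awk <<'AWK'
BEGIN { FS = "[ \t]+"; print "interface,inet,status" }
function printrec() {
    if (interface != "")
        printf "%s,%s,%s\n", interface, inet, status
}
$1 != "" { printrec(); inet = status = ""; gsub(/:/, "", $1); interface = $1 }
$2 == "inet"    { inet = $3 }
$2 == "status:" { status = $3 }
END { printrec() }
AWK

cat > sample.txt <<'EOF'
lo0: flags=8049<UP,LOOPBACK,RUNNING> mtu 16384
    inet 127.0.0.1 netmask 0xff000000
en0: flags=8863<UP,BROADCAST,RUNNING> mtu 1500
    inet 10.0.0.5 netmask 0xffffff00
    status: active
EOF

awk -f laundry.awk sample.txt
# interface,inet,status
# lo0,127.0.0.1,
# en0,10.0.0.5,active
```

The indented lines begin with white space, so their $1 is the empty leading field and the keyword lands in $2, exactly as described above.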
You can run the program at https://ideone.com/Z3KFhl.
AWK
BEGIN { print "interface,inet,status"; state = 0 }
{
    if (($1 ~ /:$/) && ($1 == substr($0, 1, length($1)))) {
        if (state > 0) { print ntwk "," ip "," status }
        n = split($1, arr, ":")
        ntwk = arr[1]; ip = ""; status = ""
        state = 1
    } else {
        if ($1 == "inet") { ip = $2 }
        else { if ($1 == "status:") status = $2 }
    }
}
END { print ntwk "," ip "," status }

steve@steve-Satellite-L555D:~$ awk -f data_laundry.awk data_laundry.txt
interface,inet,status
lo0,127.0.0.1,
gif0,,
en0,10.176.85.19,active
en1,,inactive
p2p0,,inactive
Perl 1-liner… (OK, it needed a “begin” block)

ifconfig | perl -Mfeature=say -e 'undef$/;say join"\n",q(interface,inet,status),map{join",",(/^(\w+)/)[0]||"",(/inet (\S+)/)[0]||"",(/status: (\w+)/)[0]||""}split/\n(?=\w)/,<>'

Probably not the most efficient, as it has to slurp stdin, but not too shabby, especially as ifconfig is a compact format. Unfortunately there isn’t a nice record ending, or I could have used $/ to define the separator; instead I had to use a positive lookahead to split on a newline followed by a non-whitespace character, breaking the output into chunks for each interface.