Data Laundry
September 26, 2017
We have a simple task today, converting a file from one format to another. Such tasks are often called data laundry, in the sense of washing or cleaning the data as it moves from one format to another. Here’s the input:
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
    inet 127.0.0.1 netmask 0xff000000
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether f4:0f:24:29:df:4d
    inet6 fe80::1cb5:1689:1826:cc7b%en0 prefixlen 64 secured scopeid 0x4
    inet 10.176.85.19 netmask 0xffffff00 broadcast 10.176.85.255
    nd6 options=201<PERFORMNUD,DAD>
    media: autoselect
    status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
    options=60<TSO4,TSO6>
    ether 06:00:58:62:a3:00
    media: autoselect <full-duplex>
    status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
    ether 06:0f:24:29:df:4d
    media: autoselect
    status: inactive
The desired output is a comma-separated values file with three fields:
interface,inet,status
lo0,127.0.0.1,
gif0,,
en0,10.176.85.19,active
en1,,inactive
p2p0,,inactive
Your task is to write a program that converts the input shown above to the desired output. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
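One way to approach the task, offered as a minimal sketch rather than the suggested solution: walk the input line by line, start a new record at every unindented line, and pick the inet and status values out of the indented lines that follow. The function name launder and the SAMPLE constant are hypothetical names chosen for illustration, and the sketch assumes that continuation lines are indented, as ifconfig normally prints them.

```python
# Sample input from the exercise; continuation lines are assumed to be indented.
SAMPLE = """\
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
    inet 127.0.0.1 netmask 0xff000000
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether f4:0f:24:29:df:4d
    inet6 fe80::1cb5:1689:1826:cc7b%en0 prefixlen 64 secured scopeid 0x4
    inet 10.176.85.19 netmask 0xffffff00 broadcast 10.176.85.255
    nd6 options=201<PERFORMNUD,DAD>
    media: autoselect
    status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
    options=60<TSO4,TSO6>
    ether 06:00:58:62:a3:00
    media: autoselect <full-duplex>
    status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
    ether 06:0f:24:29:df:4d
    media: autoselect
    status: inactive
"""


def launder(text):
    """Return CSV lines (interface,inet,status) for ifconfig-style text."""
    rows = []  # each row is [interface, inet, status]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # An unindented line such as "en0: flags=..." starts a new interface.
            rows.append([line.split(":", 1)[0], "", ""])
        elif rows:
            fields = line.split()
            # "inet6" lines are deliberately ignored; only "inet" is wanted.
            if fields[0] == "inet" and len(fields) > 1:
                rows[-1][1] = fields[1]
            elif fields[0] == "status:" and len(fields) > 1:
                rows[-1][2] = fields[1]
    return ["interface,inet,status"] + [",".join(row) for row in rows]


if __name__ == "__main__":
    print("\n".join(launder(SAMPLE)))
```

If an interface had more than one inet line, this sketch would keep the last one; the exercise does not say which should win, so that choice is an assumption.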
AWK
steve@steve-Satellite-L555D:~$ awk -f data_laundry.awk data_laundry.txt
interface,inet,status
lo0,127.0.0.1,
gif0,,
en0,10.176.85.19,active
en1,,inactive
p2p0,,inactive
Perl 1-liner… {OK needed a “begin” block}
Probably not the most efficient, as it has to slurp stdin… but not too shabby, especially as ifconfig output is compact. Unfortunately there isn’t a distinctive record terminator, or I could have used $/ to define the separator; instead I had to use a positive lookahead to split on a newline followed by a non-whitespace character, which breaks the output into one chunk per interface.
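The splitting idea described in this comment can be sketched in Python (not the commenter's Perl, which is not reproduced here): read the whole input, split at every newline that is followed by a non-whitespace character, and then search each chunk for its inet and status values. The names split_blocks and to_row and the abbreviated sample are hypothetical, chosen only for illustration.

```python
import re


def split_blocks(text):
    """Split slurped ifconfig-style text into one chunk per interface.

    The lookahead (?=\\S) requires a non-whitespace character after the
    newline but does not consume it, so each chunk keeps its first character.
    """
    return [block for block in re.split(r"\n(?=\S)", text) if block.strip()]


def to_row(block):
    """Build one CSV row (interface,inet,status) from a single chunk."""
    name = block.split(":", 1)[0]
    # The space after "inet" keeps "inet6" lines from matching.
    inet = re.search(r"^\s+inet (\S+)", block, re.MULTILINE)
    status = re.search(r"^\s+status: (\S+)", block, re.MULTILINE)
    return ",".join([
        name,
        inet.group(1) if inet else "",
        status.group(1) if status else "",
    ])


if __name__ == "__main__":
    # Abbreviated sample, assuming indented continuation lines.
    sample = (
        "lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384\n"
        "    inet 127.0.0.1 netmask 0xff000000\n"
        "    inet6 ::1 prefixlen 128\n"
        "gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280\n"
        "en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500\n"
        "    status: inactive\n"
    )
    print("interface,inet,status")
    for block in split_blocks(sample):
        print(to_row(block))
```

Unlike a line-by-line pass, this approach needs the whole input in memory before it can split, which is the slurping cost the comment mentions.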