Data Laundry

September 26, 2017

We have a simple task today: converting a file from one format to another. Such tasks are often called data laundry, in the sense of washing or cleaning the data as it moves from one format to another. Here’s the input:

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
	options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
	inet 127.0.0.1 netmask 0xff000000 
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
	nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	ether f4:0f:24:29:df:4d 
	inet6 fe80::1cb5:1689:1826:cc7b%en0 prefixlen 64 secured scopeid 0x4 
	inet 10.176.85.19 netmask 0xffffff00 broadcast 10.176.85.255
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether 06:00:58:62:a3:00 
	media: autoselect <full-duplex>
	status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
	ether 06:0f:24:29:df:4d 
	media: autoselect
	status: inactive

The desired output is a comma-separated values file with three fields:

interface,inet,status
lo0,127.0.0.1,
gif0,,
en0,10.176.85.19,active
en1,,inactive
p2p0,,inactive

Your task is to write a program that converts the input shown above to the desired output. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
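
For readers who want a starting point, here is a minimal sketch in Python. It is not the suggested solution, just one way to read the task. It assumes that every unindented line starts a new interface, that the interface name is whatever precedes the first colon, and that only the first word of each indented line matters. The names `launder` and `SAMPLE` are made up for this illustration, and `SAMPLE` is a shortened copy of the input above.

```python
import csv
import io

# A shortened copy of the input shown above (assumption: tabs indent detail lines).
SAMPLE = (
    "lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384\n"
    "\tinet 127.0.0.1 netmask 0xff000000 \n"
    "\tinet6 ::1 prefixlen 128 \n"
    "gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280\n"
    "en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n"
    "\tinet 10.176.85.19 netmask 0xffffff00 broadcast 10.176.85.255\n"
    "\tstatus: active\n"
)


def launder(text):
    """Convert ifconfig-style text to CSV with interface, inet and status columns."""
    rows = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not line[0].isspace():
            # An unindented line starts a new interface; the name precedes the colon.
            current = [fields[0].split(":")[0], "", ""]
            rows.append(current)
        elif current is not None and len(fields) > 1:
            if fields[0] == "inet":  # "inet6" lines do not match
                current[1] = fields[1]
            elif fields[0] == "status:":
                current[2] = fields[1]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["interface", "inet", "status"])
    writer.writerows(rows)
    return out.getvalue()


print(launder(SAMPLE), end="")
```

Collecting the rows first and writing them at the end avoids the usual "flush the last record" step that a streaming solution needs once the input runs out.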

4 Responses to “Data Laundry”

  1. AWK

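      # Print the CSV header; state becomes 1 once an interface has been seen.
      BEGIN { print "interface,inet,status";
              state = 0
              }
    
      # An interface line starts in column 1 and its first field ends in ":".
      { if (($1 ~ /:$/) && ($1 == substr($0,1,length($1)))) {
           # Flush the previous interface before starting a new one.
           if (state > 0) {
              print ntwk "," ip "," status
              }
           n = split($1,arr,":");
           ntwk   = arr[1];
           ip     = "";
           status = "";
           state  = 1
           }
        else {
          # Detail lines: remember the IPv4 address and the link status.
          if ($1 == "inet") { ip = $2 }
          else {
            if ($1 == "status:") status = $2
            }
          }
        }
    
      # Flush the last interface, if the input contained any.
      END {
        if (state > 0) print ntwk "," ip "," status
        }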
    

    steve@steve-Satellite-L555D:~$ awk -f data_laundry.awk data_laundry.txt
    interface,inet,status
    lo0,127.0.0.1,
    gif0,,
    en0,10.176.85.19,active
    en1,,inactive
    p2p0,,inactive

  2. Perl 1-liner… {OK needed a “begin” block}

    ifconfig | perl -Mfeature=say -e 'undef$/;say join"\n",q(interface,inet,status),map{join",",(/^(\w+)/)[0]||"",(/inet (\S+)/)[0]||"",(/status: (\w+)/)[0]||""}split/\n(?=\w)/,<>'
    

    Probably not the most efficient, as it has to slurp stdin, but not too shabby, especially as ifconfig output is compact. Unfortunately there isn’t a nice record terminator, or I could have used $/ to define the separator. Instead I had to split on a positive lookahead, a newline followed by a word character, to break the output into one chunk per interface.

  3. […] to another, or when external data must be checked for validity. We looked at data laundry in a previous exercise. We return to it today because I have been doing data laundry all week, handling data from a new […]

  4. […] of my time at work, so it’s an exercise worth examining. We looked at data laundry in two previous exercises. Today’s exercise in data laundry comes to us from Stack […]
