Data Laundry, Again
July 13, 2018
Data laundry is the act of cleaning data, as when it arrives in one format and must be translated to another, or when external data must be checked for validity. We looked at data laundry in a previous exercise. We return to it today because I have been doing data laundry all week, handling data from a new vendor. Today’s task is similar to one I have been doing this week: convert the input to the output shown below, changing every appearance of the string ABCDE to an incrementally numbered string with a prefix:
ABCDE This is some text. This is more text. ABCDE, ABCDE. ABCDE And this is [ABCDE] still more text.
X1 This is some text. This is more text. X2, X3. X4 And this is [X5] still more text.
Your task is to write a program to perform the data laundry shown above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Using Perl’s regular expressions….
g – every occurrence, e – evaluate replacement string, r – return string after replacements…
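For readers who don't use Perl, the same three ideas map fairly directly onto Python's `re.sub`. The following is only a sketch, not the commenter's code; the function name `launder` and its `target` and `prefix` parameters are hypothetical names chosen for illustration.

```python
import itertools
import re

def launder(text, target="ABCDE", prefix="X"):
    """Replace each occurrence of `target` with prefix + a running number."""
    counter = itertools.count(1)  # yields 1, 2, 3, ...
    # re.sub replaces every match by default (Perl's g), calls the function
    # once per match to build the replacement (the role e plays), and
    # returns a new string, leaving `text` untouched (like r).
    return re.sub(re.escape(target), lambda m: f"{prefix}{next(counter)}", text)

source = ("ABCDE This is some text. This is more text. "
          "ABCDE, ABCDE. ABCDE And this is [ABCDE] still more text.")
print(launder(source))
# X1 This is some text. This is more text. X2, X3. X4 And this is [X5] still more text.
```

Because the counter is created inside the function, numbering restarts at 1 on every call.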
Here’s a solution in Python.
Output:
It sounded like you need to do this a lot, so here is a Python function that takes a search pattern and a sequence number format and returns a function that can be used to clean up a text string. That way, you can make many cleanup functions with independent sequence numbers.
It can be used on the entire text:
It can also be applied to the text in chunks (e.g. line by line). The sequence number format can also be more elaborate.
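The factory described above could be sketched roughly as follows. This is an illustration under assumptions, not the commenter's original listing: the names `make_cleaner`, `fmt`, and `start`, and the sample text, are all hypothetical, and the sequence number format is assumed to be a `str.format` template.

```python
import itertools
import re

def make_cleaner(pattern, fmt="X{}", start=1):
    """Return a function that numbers every match of `pattern` in a string.

    Hypothetical helper: `fmt` is a str.format template that receives the
    sequence number; `start` is the first number handed out.
    """
    regex = re.compile(pattern)
    counter = itertools.count(start)  # private to this cleaner

    def clean(text):
        # The callable replacement is invoked once per match, left to right.
        return regex.sub(lambda m: fmt.format(next(counter)), text)

    return clean

text = "ABCDE one. ABCDE, ABCDE.\nABCDE two [ABCDE]."

# Whole text at once.
whole = make_cleaner(r"ABCDE")
print(whole(text))

# Line by line with a more elaborate format; the counter lives in the
# closure, so numbering continues from one call to the next.
by_line = make_cleaner(r"ABCDE", fmt="ID-{:03d}")
for line in text.splitlines():
    print(by_line(line))
```

Each call to `make_cleaner` creates its own counter, so `whole` and `by_line` number their matches independently of each other.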
I forgot to include the imports.
Kotlin at https://pastebin.com/rrAZM3gA