Date Conversion

September 3, 2019

I chose to write the program in sed for its regular expression and text editing functionality. First I wrote a program to list all of the rows of an input file that don’t contain three dates in the desired format:

sed '/,[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],/d'
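To see the check in action, here is a small sketch that runs it over three made-up rows. The row layout (an id, three dates, and a trailing field) is an assumption for illustration; the real vendor files are not shown in this article.

```shell
# Hypothetical sample rows: an id, three dates, and a trailing field.
sample='1001,09/03/2019,09/04/2019,09/05/2019,ok
1002,9/3/19,09/04/2019,09/05/2019,short date
1003,09/03/2019,4-Sep-19,09/05/2019,month abbreviation'

# Delete every row that has three well-formed dates in a row;
# whatever survives is a row that still needs work.
failures=$(printf '%s\n' "$sample" | sed '/,[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],/d')

printf '%s\n' "$failures"
```

Row 1001 is deleted because all three of its dates match; rows 1002 and 1003 are printed because each contains one date in another format.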

When that sed command is applied to an input file, rows with three valid dates are deleted and every other row is printed, so a file passes when nothing is printed. When a file fails, the offending rows appear in the output, so it is easy to see what needs to be done. The edit script below was built in stages, with one or more lines added to the script each time a date failed to match:

sed '
  s/,\([0-9]\)\/\([0-9]\)\/\([0-9][0-9]\),/,0\1\/0\2\/20\3,/
  s/,\([0-9][0-9]\)\/\([0-9]\)\/\([0-9][0-9]\),/,\1\/0\2\/20\3,/
  s/,\([0-9]\)\/\([0-9][0-9]\)\/\([0-9][0-9]\),/,0\1\/\2\/20\3,/
  s/,\([0-9][0-9]\)\/\([0-9][0-9]\)\/\([0-9][0-9]\),/,\1\/\2\/20\3,/
  s/,\([0-9]\)-\([0-9]\)-\([0-9][0-9]\),/,0\1\/0\2\/20\3,/
  s/,\([0-9][0-9]\)-\([0-9]\)-\([0-9][0-9]\),/,\1\/0\2\/20\3,/
  s/,\([0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\),/,0\1\/\2\/20\3,/
  s/,\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\),/,\1\/\2\/20\3,/
  s/,\([0-9]\)-[Jj][Aa][Nn]-\([0-9][0-9]\),/,01\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Jj][Aa][Nn]-\([0-9][0-9]\),/,01\/\1\/20\2,/
  s/,\([0-9]\)-[Ff][Ee][Bb]-\([0-9][0-9]\),/,02\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Ff][Ee][Bb]-\([0-9][0-9]\),/,02\/\1\/20\2,/
  s/,\([0-9]\)-[Mm][Aa][Rr]-\([0-9][0-9]\),/,03\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Mm][Aa][Rr]-\([0-9][0-9]\),/,03\/\1\/20\2,/
  s/,\([0-9]\)-[Aa][Pp][Rr]-\([0-9][0-9]\),/,04\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Aa][Pp][Rr]-\([0-9][0-9]\),/,04\/\1\/20\2,/
  s/,\([0-9]\)-[Mm][Aa][Yy]-\([0-9][0-9]\),/,05\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Mm][Aa][Yy]-\([0-9][0-9]\),/,05\/\1\/20\2,/
  s/,\([0-9]\)-[Jj][Uu][Nn]-\([0-9][0-9]\),/,06\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Jj][Uu][Nn]-\([0-9][0-9]\),/,06\/\1\/20\2,/
  s/,\([0-9]\)-[Jj][Uu][Ll]-\([0-9][0-9]\),/,07\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Jj][Uu][Ll]-\([0-9][0-9]\),/,07\/\1\/20\2,/
  s/,\([0-9]\)-[Aa][Uu][Gg]-\([0-9][0-9]\),/,08\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Aa][Uu][Gg]-\([0-9][0-9]\),/,08\/\1\/20\2,/
  s/,\([0-9]\)-[Ss][Ee][Pp]-\([0-9][0-9]\),/,09\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Ss][Ee][Pp]-\([0-9][0-9]\),/,09\/\1\/20\2,/
  s/,\([0-9]\)-[Oo][Cc][Tt]-\([0-9][0-9]\),/,10\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Oo][Cc][Tt]-\([0-9][0-9]\),/,10\/\1\/20\2,/
  s/,\([0-9]\)-[Nn][Oo][Vv]-\([0-9][0-9]\),/,11\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Nn][Oo][Vv]-\([0-9][0-9]\),/,11\/\1\/20\2,/
  s/,\([0-9]\)-[Dd][Ee][Cc]-\([0-9][0-9]\),/,12\/0\1\/20\2,/
  s/,\([0-9][0-9]\)-[Dd][Ee][Cc]-\([0-9][0-9]\),/,12\/\1\/20\2,/
  s/,\([0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9]\)T[0-9:.]*,/,\1,/
  s/,"January \([0-9]\), 20\([0-9][0-9]\)",/,01\/0\1\/20\2,/
  s/,"January \([0-9][0-9]\), 20\([0-9][0-9]\)",/,01\/\1\/20\2,/
  s/,"February \([0-9]\), 20\([0-9][0-9]\)",/,02\/0\1\/20\2,/
  s/,"February \([0-9][0-9]\), 20\([0-9][0-9]\)",/,02\/\1\/20\2,/
  s/,"March \([0-9]\), 20\([0-9][0-9]\)",/,03\/0\1\/20\2,/
  s/,"March \([0-9][0-9]\), 20\([0-9][0-9]\)",/,03\/\1\/20\2,/
  s/,"April \([0-9]\), 20\([0-9][0-9]\)",/,04\/0\1\/20\2,/
  s/,"April \([0-9][0-9]\), 20\([0-9][0-9]\)",/,04\/\1\/20\2,/
  s/,"May \([0-9]\), 20\([0-9][0-9]\)",/,05\/0\1\/20\2,/
  s/,"May \([0-9][0-9]\), 20\([0-9][0-9]\)",/,05\/\1\/20\2,/
  s/,"June \([0-9]\), 20\([0-9][0-9]\)",/,06\/0\1\/20\2,/
  s/,"June \([0-9][0-9]\), 20\([0-9][0-9]\)",/,06\/\1\/20\2,/
  s/,"July \([0-9]\), 20\([0-9][0-9]\)",/,07\/0\1\/20\2,/
  s/,"July \([0-9][0-9]\), 20\([0-9][0-9]\)",/,07\/\1\/20\2,/
  s/,"August \([0-9]\), 20\([0-9][0-9]\)",/,08\/0\1\/20\2,/
  s/,"August \([0-9][0-9]\), 20\([0-9][0-9]\)",/,08\/\1\/20\2,/
  s/,"September \([0-9]\), 20\([0-9][0-9]\)",/,09\/0\1\/20\2,/
  s/,"September \([0-9][0-9]\), 20\([0-9][0-9]\)",/,09\/\1\/20\2,/
  s/,"October \([0-9]\), 20\([0-9][0-9]\)",/,10\/0\1\/20\2,/
  s/,"October \([0-9][0-9]\), 20\([0-9][0-9]\)",/,10\/\1\/20\2,/
  s/,"November \([0-9]\), 20\([0-9][0-9]\)",/,11\/0\1\/20\2,/
  s/,"November \([0-9][0-9]\), 20\([0-9][0-9]\)",/,11\/\1\/20\2,/
  s/,"December \([0-9]\), 20\([0-9][0-9]\)",/,12\/0\1\/20\2,/
  s/,"December \([0-9][0-9]\), 20\([0-9][0-9]\)",/,12\/\1\/20\2,/'
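As a rough illustration of how individual rules behave, the sketch below applies a four-rule subset, written in the style of the script, to made-up rows. The sample data is hypothetical, and the long-month rule is written so that it emits a four-digit year, matching the MM/DD/20YY target format.

```shell
# Hypothetical rows, each with exactly one date that needs fixing.
sample='1001,9/3/19,09/04/2019,09/05/2019,a
1002,09/03/2019,4-Sep-19,09/05/2019,b
1003,09/03/2019,09/04/2019,"September 5, 2019",c
1004,09/03/2019T08:15:00.000,09/04/2019,09/05/2019,d'

# One rule per style: single-digit m/d/yy, d-Mon-yy,
# a trailing timestamp, and a quoted long-form date.
converted=$(printf '%s\n' "$sample" | sed '
  s/,\([0-9]\)\/\([0-9]\)\/\([0-9][0-9]\),/,0\1\/0\2\/20\3,/
  s/,\([0-9]\)-[Ss][Ee][Pp]-\([0-9][0-9]\),/,09\/0\1\/20\2,/
  s/,\([0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9]\)T[0-9:.]*,/,\1,/
  s/,"September \([0-9]\), 20\([0-9][0-9]\)",/,09\/0\1\/20\2,/')

printf '%s\n' "$converted"
```

Every row should come out with the dates 09/03/2019, 09/04/2019, 09/05/2019. Each rule here lacks the `g` flag, so it rewrites only the first match on a line; the sample rows are chosen to have one malformed date apiece.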

 

That’s just enough to handle the examples given; the actual script is several times larger, but it was built the same way, adding new pattern substitution rules any time a date conversion failed. In the end all of the files passed the test program without exceptions. It took about an hour of script building, then it was very quick to process the vendor files.
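The build-in-stages workflow described above might look something like the following sketch. The file names (`convert.sed`, `check.sed`, `vendor-a.csv`, `vendor-b.csv`) and the file contents are invented for the demonstration, and the edit script is deliberately left at an early stage with only one rule.

```shell
# Work in a scratch directory; file names and contents are made up for the demo.
work=$(mktemp -d)
printf '%s\n' '1,9/3/19,09/04/2019,09/05/2019,x' > "$work/vendor-a.csv"
printf '%s\n' '2,09/03/2019,4-Sep-19,09/05/2019,y' > "$work/vendor-b.csv"

# The edit script so far: only the single-digit month/day rule exists.
cat > "$work/convert.sed" <<'EOF'
s/,\([0-9]\)\/\([0-9]\)\/\([0-9][0-9]\),/,0\1\/0\2\/20\3,/
EOF

# The check from the start of the article, kept in its own script file.
cat > "$work/check.sed" <<'EOF'
/,[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9],/d
EOF

# Convert each file, then report any rows that still fail the check,
# prefixed with the name of the file they came from.
report=$(for f in "$work"/*.csv; do
  sed -f "$work/convert.sed" "$f" | sed -f "$work/check.sed" | sed "s|^|${f##*/}: |"
done)

printf '%s\n' "$report"
```

With only that one rule in place, `vendor-a.csv` passes silently and the row from `vendor-b.csv` is reported, which is the cue to add a rule for the `4-Sep-19` style and run the loop again.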

Sed isn't available at ideone.com, but you can see the code at https://ideone.com/gAfJ6I.

4 Responses to “Date Conversion”

  1. John Cowan said

    All of which is fine until someone sends you a date like “December 7th, 1941” or “die achte Mai, 1945”. Fortunately, there’s a Python library “dateparser” that knows about many such unusual date formats. You could translate it into Scheme if you had to.

  2. programmingpraxis said

    @JohnCowan: The real version of my program actually handled dates like December 7th, 1941 (but not your German date). I also had a few date formats that were so infrequently seen that I didn’t bother to write a conversion; instead, I edited the file manually. This was a quick-and-dirty program, I spent less than an afternoon on the entire project, and my primary goal was to get it done — pretty didn’t count. It does make a good exercise in sed, however.

  3. bookofstevegraham said

    The read link does not work

  4. programmingpraxis said

    @bookofstevegraham: Fixed. Thank you.
