Non-Breaking Hyphen
October 2, 2020
Today’s exercise tells the story of a problem I faced in my day job. A user apparently copy/pasted a field from Word, Excel or Outlook into our data processing system, which assumes plain ascii (the underlying database, Oracle, handles unicode properly, but the system on top of Oracle doesn’t). Unfortunately, although the field looked okay,
10001366650-1
the dash was actually a non-breaking hyphen, unicode 201110, which broke the system in a rather large way. The field in question was the vendor invoice number in an accounts payable system, and when the check paying that invoice was written, the check-writing program dropped the remittance advice from that check, so every subsequent check had the wrong remittance advice attached. The error wasn’t discovered immediately, so some of the checks were already mailed, making recovery difficult (we couldn’t just void the check run and start over because some of the checks were already mailed, and restarting the check run meant the check numbers wouldn’t match). So it was a grand old mess. In case you’re curious, I demonstrated the error with this SQL statement
select asciistr(fabinvh_vend_inv_code) from fabinvh where fabinvh_code = 'I2104519'
which returned
1000136650\20111
Unicode is nearly thirty years old. Users have the right to expect that their systems handle unicode characters properly.
Your task is to write a program that detects unicode/ascii error; you might also tell us about any unicode horror stories you have faced. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.