Non-Breaking Hyphen

October 2, 2020

Today’s exercise tells the story of a problem I faced in my day job. A user apparently copy/pasted a field from Word, Excel or Outlook into our data processing system, which assumes plain ascii (the underlying database, Oracle, handles unicode properly, but the system on top of Oracle doesn’t). Unfortunately, although the field looked okay,

10001366650-1

the dash was actually a non-breaking hyphen, unicode 201110, which broke the system in a rather large way. The field in question was the vendor invoice number in an accounts payable system, and when the check paying that invoice was written, the check-writing program dropped the remittance advice from that check, so every subsequent check had the wrong remittance advice attached. The error wasn’t discovered immediately, so some of the checks were already mailed, making recovery difficult (we couldn’t just void the check run and start over because some of the checks were already mailed, and restarting the check run meant the check numbers wouldn’t match). So it was a grand old mess. In case you’re curious, I demonstrated the error with this SQL statement

select asciistr(fabinvh_vend_inv_code)
from   fabinvh
where  fabinvh_code = 'I2104519'

which returned

1000136650\20111

Unicode is nearly thirty years old. Users have the right to expect that their systems handle unicode characters properly.

Your task is to write a program that detects unicode/ascii error; you might also tell us about any unicode horror stories you have faced. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisement

Pages: 1 2