## I Before E

### November 29, 2013

School children learning to spell English words are taught a series of rules. One of them is:

I before E except after C, or when sounded as AY as in NEIGHBOR and WEIGH.

Your task is to write a program that finds exceptions to that rule; you may want to look at the pronouncing dictionary at http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict.0.6d. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
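A minimal sketch of a crude approach, in Python. It is not the suggested solution. It assumes the cmudict 0.6d line format: a word, optionally with a `(1)`-style variant suffix, followed by space-separated ARPAbet phonemes, with comment lines starting with `;;;`. It treats the rule's AY sound as the ARPAbet phoneme EY and, crudely, accepts an EI as "sounded as AY" if EY appears anywhere in the pronunciation. The helper names are hypothetical.

```python
import re

def parse_entry(line):
    """Split a cmudict line such as 'ATHEISM(1)  EY1 TH IY0 IH2 Z AH0 M'
    into (word, phonemes), dropping any '(n)' variant suffix."""
    word, *phones = line.split()
    return re.sub(r"\(\d+\)$", "", word), phones

def is_exception(word, phones):
    """Crude test: flag CIE, or an EI not after C when no EY phoneme
    (the 'AY' sound of NEIGHBOR) appears anywhere in the pronunciation."""
    w = word.upper()
    if "CIE" in w:
        return True  # breaks "except after C"
    says_ay = any(p.rstrip("012") == "EY" for p in phones)
    return re.search(r"(?<!C)EI", w) is not None and not says_ay

def find_exceptions(lines):
    """Return the distinct words, in file order, that is_exception flags."""
    found = {}  # a dict keeps insertion order and drops duplicate words
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, phones = parse_entry(line)
        if is_exception(word, phones):
            found[word] = True
    return list(found)

if __name__ == "__main__":
    sample = [
        ";;; comment line",
        "ATHEISM  AH0 TH AY1 S AH0 M",
        "ATHEISM(1)  EY1 TH IY0 IH2 Z AH0 M",
        "NEIGHBOR  N EY1 B ER0",
        "RECEIVE  R IH0 S IY1 V",
        "SCIENCE  S AY1 AH0 N S",
        "WEIRD  W IH1 R D",
    ]
    print(find_exceptions(sample))  # ['ATHEISM', 'SCIENCE', 'WEIRD']
```

The ATHEISM lines are taken from the comments below; the other sample lines are written in the same format for illustration. Because the EY test looks at the whole pronunciation, the second ATHEISM entry passes only because of the EY1 of its leading A.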


### 4 Responses to “I Before E”

1. Paul said

A very interesting problem. I looked at your solution and I think I see problems. In the CMU list you find, for example:
ATHEISM AH0 TH AY1 S AH0 M (identified by your method)
ATHEISM(1) EY1 TH IY0 IH2 Z AH0 M
ATHEIST EY1 TH IY0 AH0 S T
ATHEISTIC EY2 TH IY0 IH1 S T IH0 K
It is clear that none of these entries obeys the rule, as in none of them does the EI map to EY1 or EY2. The second through fourth entries do contain an EY1 or EY2 sound, but it belongs to the leading character.
And what about IE sounded as EY0, EY1, or EY2? Probably no such combinations occur in the list, but I did not see this checked in your script.
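One way to address both points is to tie the EY check to the digraph's own syllable rather than to the whole pronunciation. The Python sketch below uses a heuristic that is an assumption of this sketch, not the method of the solution or of the commenter: the k-th run of vowel letters in the spelling (A, E, I, O, U, and Y) is paired with the k-th vowel phoneme, where a cmudict vowel phoneme is one ending in a stress digit 0, 1, or 2. It goes wrong whenever silent letters or syllabic consonants precede the digraph. The function names are hypothetical.

```python
import re

def digraph_sounds(word, phones, digraph):
    """For each occurrence of `digraph` ('EI' or 'IE'), return (index, phoneme),
    guessing the phoneme by pairing the k-th run of vowel letters with the
    k-th vowel phoneme. The phoneme is None if the pronunciation runs out."""
    w = word.upper()
    groups = [m.span() for m in re.finditer(r"[AEIOUY]+", w)]
    vowels = [p for p in phones if p[-1] in "012"]  # stress-marked vowels
    result = []
    for m in re.finditer(digraph, w):
        # the digraph's letters are vowels, so some group always contains it
        k = next(i for i, (s, e) in enumerate(groups) if s <= m.start() < e)
        result.append((m.start(), vowels[k] if k < len(vowels) else None))
    return result

def breaks_rule(word, phones):
    """True if some EI or IE violates the rhyme under the pairing heuristic.
    A None phoneme (alignment failed) is treated as not sounded as AY."""
    w = word.upper()
    for digraph in ("EI", "IE"):
        for i, sound in digraph_sounds(w, phones, digraph):
            after_c = i > 0 and w[i - 1] == "C"
            says_ay = sound is not None and sound.rstrip("012") == "EY"
            if digraph == "IE" and (after_c or says_ay):
                return True  # CIE, or IE sounded as AY
            if digraph == "EI" and not after_c and not says_ay:
                return True  # EI neither after C nor sounded as AY
    return False
```

On the two ATHEISM entries quoted above, the EI is paired with AY1 and IY0 respectively, so both are flagged, and the leading EY1 of the second entry is no longer mistaken for the sound of the EI.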

2. programmingpraxis said

@Paul: Obviously my approach is imperfect. The solution is for you to write a better program.

3. Paul said

This is IMO a very tough problem. The list contains a lot of English words, but also many foreign names (German, Scottish, etc.). My attempt can be found here, and the list of exceptions (2521 in total) is here.
First I tried to find the location of the “IE” or “EI” in the word and then compare it with the location of “EY” in the phonetics. This is not perfect, as the letters of the word often do not map one-to-one to the phonemes. Then I tried to convert both the word and the phonetics to a character string like “CVCVCSC”, where C, V, and S stand for consonant, vowel, and special (IE or EI), and then to align the two strings. That works somewhat better, but it is still not perfect.
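The encoding step of the second approach can be sketched as follows. This is a guess at the idea, not Paul's code: vowel letters are taken to be A, E, I, O, and U (Y is treated as a consonant, an assumption), and a cmudict phoneme counts as a vowel if it ends in a stress digit. How the two strings are then aligned is not described, so that step is left out. The function names are hypothetical.

```python
def encode_spelling(word):
    """Encode a word as C/V/S: S for an IE or EI digraph, V for any other
    vowel letter (A, E, I, O, U), C for everything else."""
    w = word.upper()
    out = []
    i = 0
    while i < len(w):
        if w[i:i + 2] in ("IE", "EI"):
            out.append("S")  # the digraph collapses to one symbol
            i += 2
        else:
            out.append("V" if w[i] in "AEIOU" else "C")
            i += 1
    return "".join(out)

def encode_phones(phones):
    """Encode a cmudict pronunciation as C/V: vowel phonemes end in a
    stress digit (0, 1, 2); every other phoneme is a consonant."""
    return "".join("V" if p[-1] in "012" else "C" for p in phones)

print(encode_spelling("ATHEISM"))                             # VCCSCC
print(encode_phones(["AH0", "TH", "AY1", "S", "AH0", "M"]))   # VCVCVC
```

The two strings for ATHEISM already differ in length and shape: the letters TH become a single phoneme, and the final AH0 has no vowel letter of its own, which shows why mapping one string onto the other is the hard part.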

4. programmingpraxis said

Agreed. This is a very tough problem. Like you, I had trouble matching the spelling with the phonetics, and finally I just gave up; sometimes a simple solution that is mostly right is better than a much more complicated solution that is better, but still only mostly right.

I was amused to find this exercise in a beginning programming course. I think they ignored the “when sounded as AY” part and just looked at the “except after C” part, which is easy enough.
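The beginning-course version, which ignores the “when sounded as AY” clause, reduces to a single regular expression: a word breaks the rule if it contains CIE, or an EI not preceded by C. A minimal Python sketch, matching on the lowercased word; the function name is hypothetical.

```python
import re

# "I before E except after C", with the AY clause dropped:
# CIE anywhere, or EI not preceded by C, breaks the rule.
VIOLATION = re.compile(r"cie|(?<!c)ei")

def breaks_spelling_rule(word):
    """True if the spelling alone violates "I before E except after C"."""
    return VIOLATION.search(word.lower()) is not None

for w in ["SCIENCE", "WEIRD", "RECEIVE", "FIELD", "NEIGHBOR"]:
    print(w, breaks_spelling_rule(w))
```

NEIGHBOR is flagged here precisely because the AY clause has been dropped.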