I Before E

November 29, 2013

School children learning to spell English words are taught a series of rules. One of them is:

I before E except after C, or when sounded as AY as in NEIGHBOR and WEIGH.

Your task is to write a program that finds exceptions to that rule; you may want to look at the pronouncing dictionary at http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict.0.6d. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
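As a starting point, here is a minimal Python sketch of a whole-word check. It is not the suggested solution. It assumes each dictionary line has the form `WORD  PH1 PH2 ...`, that alternate pronunciations are written `WORD(1)`, and that the ARPAbet phoneme `EY` (with a stress digit) is the AY sound of WEIGH. The names `parse_line`, `has_ay_sound` and `is_exception` are made up for illustration.

```python
import re

ALT_SUFFIX = re.compile(r"\(\d+\)$")  # alternate-pronunciation marker, e.g. ATHEISM(1)

def parse_line(line):
    """Return (word, phonemes), or None for blank, comment, or punctuation lines."""
    parts = line.split()
    if len(parts) < 2 or not parts[0][0].isalpha():
        return None
    return ALT_SUFFIX.sub("", parts[0]), parts[1:]

def has_ay_sound(phonemes):
    """True if any phoneme is EY, ignoring the trailing stress digit."""
    return any(p.rstrip("012") == "EY" for p in phonemes)

def is_exception(word, phonemes):
    """Naive test: the AY sound may occur anywhere in the word (no alignment)."""
    ay = has_ay_sound(phonemes)
    for i in range(len(word) - 1):
        pair = word[i:i + 2]
        after_c = i > 0 and word[i - 1] == "C"
        if pair == "EI" and not (after_c or ay):
            return True   # E before I with no excuse, e.g. WEIRD
        if pair == "IE" and after_c:
            return True   # I before E directly after C, e.g. SCIENCE
    return False

sample = [
    "WEIGH  W EY1",
    "WEIRD  W IH1 R D",
    "ATHEIST  EY1 TH IY0 AH0 S T",
]
for entry in filter(None, map(parse_line, sample)):
    print(entry[0], is_exception(*entry))
```

Because the check does not align letters with sounds, it accepts ATHEIST: the word contains an EY sound, but that sound comes from the leading A rather than from the EI.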


4 Responses to “I Before E”

  1. Paul said

    A very interesting problem. I looked at your solution and I think I see problems. In the CMU list you find, for example:
    ATHEISM AH0 TH AY1 S AH0 M (– identified by your method –)
    ATHEISM(1) EY1 TH IY0 IH2 Z AH0 M
    ATHEIST EY1 TH IY0 AH0 S T
    ATHEISTIC EY2 TH IY0 IH1 S T IH0 K
    It is clear that none of these entries obeys the rule, since none of the EI spellings maps to EY1 or EY2. Lines 2-4 do contain an EY1 or EY2 sound, but it belongs to the leading A, not to the EI.
    And what about IE spellings that sound like EY0, EY1 or EY2? Such combinations are probably not in the list, but I did not see your script check for them.

  2. programmingpraxis said

    @Paul: Obviously my approach is imperfect. The solution is for you to write a better program.

  3. Paul said

    This is IMO a very tough problem. The list contains a lot of English words, but also many foreign names (German, Scottish, etc.). My attempt can be found here and the list of exceptions (2521 in total) is here.
    First I tried to find the location of the “IE” or “EI” in the word and then compare that with the location of “EY” in the phonetics. This is not perfect, as the characters of the word often do not map one-to-one onto the phonetics. Then I tried to convert both the word and the phonetics to a character string like “CVCVCSC”, where C, V and S stand for consonant, vowel and special (IE or EI), and then match the two character strings against each other. That works somewhat better, but it is still not perfect.

  4. programmingpraxis said

    Agreed. This is a very tough problem. Like you, I had trouble matching the spelling with the phonetics, and finally I just gave up; sometimes a simple solution that is mostly right is better than a much more complicated solution that is better, but still only mostly right.

    I was amused to find this exercise in a beginning programming course. I think they ignored the “when sounded as AY” part and just looked at the “except after C” part, which is easy enough.
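For comparison, the spelling-only version mentioned in the last comment can be sketched in a few lines of Python. This is an illustration under the assumption that only the “except after C” part is checked; the name `breaks_c_rule` and the word list are made up for the example.

```python
import re

# EI not directly preceded by C, or IE directly preceded by C.
C_RULE_VIOLATION = re.compile(r"(?<!C)EI|CIE")

def breaks_c_rule(word):
    """True if the spelling alone violates "I before E except after C"."""
    return C_RULE_VIOLATION.search(word.upper()) is not None

words = ["BELIEVE", "RECEIVE", "WEIRD", "SCIENCE", "NEIGHBOR"]
flagged = [w for w in words if breaks_c_rule(w)]
print(flagged)
```

NEIGHBOR is flagged here even though the full rule excuses it, because the pronunciation is never consulted.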
