Rhyming Dictionary

April 24, 2012

One of the sites that I visit regularly is Reddit, and one of my favorite forums there is learnprogramming, which is often a source of ideas for the exercises that I write here. A recent question on learnprogramming came from a poet who wanted a rhyming dictionary. The poet says the dictionary must be the OED (won’t accept the CMU dictionary), isn’t a programmer but has a little bit of experience with HTML but expects to be able to write a rhyming dictionary program in a couple of days, doesn’t know that Java and JavaScript are two different languages, has already decided to use Python or Ruby, expects to be able to copy all the pronunciations from the OED without license, and seems to not have a clue what he really wants (“Basically I want to have multiple inputs (somewhere around a dozen) in different categories that when pooled together can give you what you’re looking for.” — whatever that means). The poet also has a foul mouth. Still, building a rhyming dictionary is a good exercise, and we can have some fun with it.

First we need a pronunciation dictionary. CMU has one, which we will use even though the poet found it “too choppy.” You’ll want the c06d.gz file, which after a brief commentary provides one word per line in the form PROGRAMMING P R OW1 G R AE2 M IH0 NG. The dictionary uses 39 phonemes, 15 vowels and 24 consonants, with three possible stresses for each vowel, indicated by a trailing number, 1 for primary stress, 2 for secondary stress, and 0 for unstressed; thus, there are 69 unique pronunciation symbols. We’ll say that two words rhyme if their final vowel (including stress) and any trailing consonants are identical.

Your task is to write a function that determines if two words rhyme and a function that takes one word and returns all the words that rhyme with it. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

About these ads

Pages: 1 2 3

4 Responses to “Rhyming Dictionary”

  1. Mike said

    Python 2.7 version

    Uses the pronunciation file to build two dictionaries:
    The first one maps a word to its ending phenomes (from the primary stress to the end of the word);
    The second one maps the ending phonemes to a list of words that have that ending.

    do_rhyme() and find_rhymes() then use the dictionaries.

    import gzip
    from collections import defaultdict
    
    db = defaultdict(set)
    rdb = defaultdict(set)
    
    with gzip.open('c06d.gz') as f:
      
      for line in (s.strip() for s in f if s[0].isalpha()):
        
        line = line.split()
        word = line[0]
    
        if word.endswith(')'):
            word = word[:word.index('(')]
    
        ending = []
        for phoneme in reversed(line[1:]):
            ending.append(phoneme)
            if phoneme.endswith('1'):
                break
        ending = ' '.join(reversed(ending))
    
        db[word].add(ending)
        rdb[ending].add(word)
    
    db  = {k:sorted(v) for k,v in db.items()}
    rdb = {k:sorted(v) for k,v in rdb.items()}
    
    def do_rhyme(word1, word2):
      return bool(set(db[word1]).intersection(db[word2]))
    
    def find_rhymes(word):
      return [rdb[sound] for sound in db[word]]
    
    
  2. Gray said

    Ruby style, matching every item after a vowel and building a map of maps of arrays hash[vowel][consonant] = [words with these stuff] (if I understood the rules well :-))

    f = File.new(ARGV[0])
    daMap = Hash.new() do |h, k|
      h[k] = Hash.new() do |hh, kk|
        hh[kk] = Array.new()
      end
    end
    
    while !f.eof()
      line = f.readline()
      trailingStr = /[A-Z]+[0-2] ?([A-Z ]+)*$/.match(line)
      next if trailingStr.nil?()
    
      daWord = /^.*? /.match(line).to_s().chop()
    
      trailingBits = trailingStr.to_s().split(' ')
      vowel = trailingBits.shift()
    
      trailingBits.each() do |bit|
        daMap[vowel][bit] << daWord
      end
    end
    
    daMap.each() do |vowel, trailingGuys|
      trailingGuys.each() do |guy, people|
        if people.size() > 1
          puts "#{people.join(', ')} do rhyme !"
        end
      end
    end
    
  3. robert said

    i like the programming job done at http://www.prime-rhyme.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 576 other followers

%d bloggers like this: