Automating Poetry, Pt. 2

So the first step gave us everything we needed in terms of our dictionary database, word searching, and phonetics.  That’s the better part of the foundation laid, but those by themselves are not quite all we need.  (Luckily, they should give us everything we need to figure the rest out for ourselves.)

So the question remaining is, of course, “how?”  Well, the first answer to that is to start figuring out what exactly “rhyming” means.

Disclaimer: I like to think of myself as a good student (back when I was one, and even now), but for some reason, with this project at least, I didn’t start by doing my homework.  There are a lot of reasons for that (impatience is probably the main one), but I started this project wanting to figure out things like “what is rhyme” or “what constitutes a syllable” for myself.  That’s the “puzzle” aspect I was talking about, and the fun of it is probably the second reason for skipping the required reading.

The point is, I know that nothing about this part of the project is even kind of groundbreaking research.  But part of the fun is diving right in, and using the existing research to check (and revise) your own work later.  Please, by all means, work smarter than I sometimes do.

So, what is a rhyme?

Well, according to the first stab I took at this (spoiler alert: this isn’t the right answer), a rhyme is when you can match up the tail ends of two words, counting their phonetic segments backwards from the end of the word, until you come to a vowel.


From the last section, the phonetic segments (according to ARPAbet and CMUdict) of the word “phonetic.”

(And how do we know when we’ve found our vowel?  That’s easy enough, we just need a regular old regular expression.)


preg_match() in this case will find any numerals in the $lastletter (a substring) of our word.

The lexical stress numbers, by ARPAbet/CMUdict’s notations, are the only numerals in the phonetics field.  So, we pattern match against the last character in that field and, if it’s a number, we’ve found our vowel.
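
To make the vowel check concrete, here is a minimal sketch of the same idea. It's written in Python rather than the PHP that preg_match() comes from, and the names `STRESS_DIGIT` and `is_vowel` are made up for illustration. It assumes the phonetics field is a space-separated string of ARPAbet phonemes, as in CMUdict.

```python
import re

# In CMUdict's ARPAbet notation, vowels end in a stress digit (0, 1, or 2);
# consonants contain no digits at all.
STRESS_DIGIT = re.compile(r"[0-2]$")

def is_vowel(phoneme: str) -> bool:
    """Return True if the phoneme's last character is a stress digit."""
    return STRESS_DIGIT.search(phoneme) is not None

# The segments of "phonetic" from the last section.
phones = "F AH0 N EH1 T IH0 K".split()
vowels = [p for p in phones if is_vowel(p)]
print(vowels)
```

Checking only the last character is enough here, since the stress digit always sits at the end of the vowel.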


(So, counting backwards to the first vowel, it would give us these two bits.)

It’s easy enough to take that result (a string) and just feed that into a query against our CMUdict database:
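
As a rough sketch of that step (in Python with SQLite, not the original PHP), the backwards walk and the query might look something like this. The table name `cmudict`, its columns `word` and `phonetics`, and the handful of sample rows are all assumptions for illustration; the pronunciations should be checked against the real CMUdict.

```python
import sqlite3

def tail_from_last_vowel(phonetics: str) -> str:
    """Walk backwards over the phonemes and stop at the first vowel found."""
    phones = phonetics.split()
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1].isdigit():      # a trailing stress digit marks a vowel
            return " ".join(phones[i:])
    return phonetics                     # no vowel at all: keep everything

# Hypothetical in-memory stand-in for the CMUdict table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cmudict (word TEXT, phonetics TEXT)")
conn.executemany(
    "INSERT INTO cmudict VALUES (?, ?)",
    [
        ("phonetic", "F AH0 N EH1 T IH0 K"),
        ("academic", "AE2 K AH0 D EH1 M IH0 K"),
        ("algebraic", "AE2 L JH AH0 B R EY1 IH0 K"),
        ("kinetic", "K AH0 N EH1 T IH0 K"),
        ("click", "K L IH1 K"),
    ],
)

def find_rhymes(word: str) -> list[str]:
    """Return other words whose phonetics end with this word's tail."""
    row = conn.execute(
        "SELECT phonetics FROM cmudict WHERE word = ?", (word,)
    ).fetchone()
    tail = tail_from_last_vowel(row[0])
    # "% " + tail anchors the match on a phoneme boundary; the second
    # condition catches words whose whole pronunciation is the tail.
    rows = conn.execute(
        "SELECT word FROM cmudict "
        "WHERE (phonetics LIKE ? OR phonetics = ?) AND word <> ? "
        "ORDER BY word",
        ("% " + tail, tail, word),
    ).fetchall()
    return [r[0] for r in rows]

print(find_rhymes("phonetic"))
```

Note that “click” drops out only because its vowel carries a different stress digit (IH1 rather than IH0), not because the rule is being clever.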


At the ugliest, most cringe-inducing level of rhyming (think bad pop music), this is at least the start of a usable rhyme finder.  Granted, if we used this as-is, it would tell you that our word “phonetic” is a rhyming match with, say, “academic,” but, you know, close enough.  (Yeah, not quite.)

So, to make that into a slightly-less-horrible rhyme match, we can continue counting (backwards) through more of the word, gathering up any consonants that occur immediately before that vowel, and feeding those into our query string as well.  This will at least get the entire last syllable sound to match.

I did that next, continuing to match backward until I found another vowel, and then recording everything after (and not including) that vowel.


This turned what was a search for the “-ic” of “phonetic” into a search for “-tic.”
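
A minimal sketch of that second pass, again in Python rather than PHP: keep walking backwards past the last vowel until a second vowel turns up, then keep everything after it. The function name is hypothetical, and the fallback for words with only one vowel (return the whole pronunciation) is my assumption, since the post hasn't settled that case yet.

```python
def last_syllable_tail(phonetics: str) -> str:
    """Return everything after (not including) the second-to-last vowel."""
    phones = phonetics.split()
    vowels_seen = 0
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1].isdigit():          # stress digit marks a vowel
            vowels_seen += 1
            if vowels_seen == 2:
                return " ".join(phones[i + 1:])
    return phonetics  # assumption: fewer than two vowels, keep the whole word

tail = last_syllable_tail("F AH0 N EH1 T IH0 K")
print(tail)

# Illustrative pronunciations; verify against CMUdict before relying on them.
aristocratic = "ER0 IH2 S T AH0 K R AE1 T IH0 K"
algebraic = "AE2 L JH AH0 B R EY1 IH0 K"
print(aristocratic.endswith(" " + tail), algebraic.endswith(" " + tail))
```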



This does kill off those cringeworthiest matches (like “algebraic”), but it still leaves us with pretty weak matches like “aristocratic,” since it’s really only matching on that “-tic” ending.

So, at that point, it seems like we ought to include the next vowel sound before this last syllable after all.  (Of course, this is assuming that there is a second vowel — but, hey, maybe that’s a good thing if this excludes words with just one vowel/syllable… after all, does “phonetic” actually rhyme with, say, “click”?)

After trying that, we get:


Now we’re getting somewhere.

It’s easy to see a huge quality jump in our rhyming here, with just this one more vowel.
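
To see that jump on a toy scale, here is a hedged Python sketch that starts the tail at the second-to-last vowel and filters a tiny word list. The `SAMPLE` dictionary and both function names are stand-ins I made up; the real thing would query the CMUdict table, and the pronunciations here are illustrative.

```python
# Tiny illustrative sample; not the real dictionary.
SAMPLE = {
    "phonetic": "F AH0 N EH1 T IH0 K",
    "athletic": "AE0 TH L EH1 T IH0 K",
    "kinetic": "K AH0 N EH1 T IH0 K",
    "aristocratic": "ER0 IH2 S T AH0 K R AE1 T IH0 K",
    "academic": "AE2 K AH0 D EH1 M IH0 K",
    "click": "K L IH1 K",
}

def tail_from_penultimate_vowel(phonetics: str):
    """Return the second-to-last vowel and everything after it, or None."""
    phones = phonetics.split()
    vowel_positions = [i for i, p in enumerate(phones) if p[-1].isdigit()]
    if len(vowel_positions) < 2:
        return None                      # single-vowel words are excluded here
    return " ".join(phones[vowel_positions[-2]:])

def rhymes_for(word: str) -> list[str]:
    """List the other sample words that end with this word's tail."""
    tail = tail_from_penultimate_vowel(SAMPLE[word])
    if tail is None:
        return []
    # Prefixing a space keeps the comparison on phoneme boundaries.
    return sorted(
        w for w, ph in SAMPLE.items()
        if w != word and (" " + ph).endswith(" " + tail)
    )

print(rhymes_for("phonetic"))
```

With this sample, “aristocratic” and “academic” fall away because their second-to-last vowels differ from the EH1 in “phonetic.”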

We could go a step further, of course, and add the preceding consonants to that vowel as well, at which point our list would drop to words like “kinetic,” etc.  That’s certainly a closer match than words like “athletic,” but I’m not sure if it’s a significantly better rhyme.  (Again, I’m following no formal definitions here, but doing this first round just on “my gut.”)

So my question (to any/everyone) is:  is “phonetic / kinetic” a significant (or necessary) step up in rhyme quality from “phonetic / athletic”?  (I should probably keep my vote out of this, but whatever, my own answer would be “meh, not really.”)

So, at least temporarily satisfied to leave that there, I’d say that at this point there are already a few conclusions that I can come to:

  1. This level of matching is probably a “good enough” rule for rhyming, or at least good enough to leave this as it is for now.  Obviously it’s not perfect, but it’s started to answer the task of the research question, providing passable rhyme suggestions to users who might not think of them on their own.
  2. This type of syllable-matching works great for this word, but has scaling issues with shorter or longer words.  Consider single syllable words; take “quirk,” for instance.  For this, we’d need to omit any consonants before the (only) vowel, so that “quirk” can find rhymes like “perk” or “lurk.”

    That would mean the rule for single syllable words would then be:

    “match the vowel and anything after”

    where, for two syllable words, it might be:

    “match the first vowel, and anything after”

    which we can obviously combine, as a rule, into:

    “find the first (even if it’s the only) vowel, and match that and anything after”

    I’m satisfied with that.  But consider three syllable words again.  “phonetic / kinetic” is great (and, at least by the CMUdict’s phonetics, an exact vowel match in both syllables).  But “athletic / kinetic” — that’s a reasonable rhyme, isn’t it?  (I’m actually asking.)

    So, we could say, then, that:

    “the penultimate vowel, and everything after it, are the only sounds that matter for rhyming in a 2+ syllable word”

    (Also, I just like the word “penultimate.”  …let’s see, “intimate,” “proximate,” “legitimate” — okay, I’ve been doing this too long now.)

    But, seriously, is this applicable to any 2+ syllable words, even if they’re a lot longer or shorter than one another?  Is “parthenogenetic” a reasonable rhyme for “phonetic”?  Or “hettick”?

    No conclusions here, sorry to say — just some questions to leave open in future steps.  Back to my last conclusion, so far:

  3. The way we’re matching (so far) is matching only the exact same emphasis per syllable.  While that definitely sounds the most graceful in terms of proper rhyming, should words that have different syllabic emphasis still be “good enough?”  And, if so, how do we revise that query?  (One solution is to remove the exact numerals and query instead with pattern matches that find those same phonetics with any numerals at the end of the vowels.  Regular expressions could handle that gracefully enough, I’d imagine.)


    Something like this, only with more flexibility and hopefully less manual “OR” clause building?
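
One way the stress-agnostic idea from the third point could be sketched without hand-built “OR” clauses is to turn the tail into a regular expression where every stress digit becomes “any stress digit.” This is a Python sketch, not the query from the post; the function name is hypothetical, and the second test string below is a made-up pronunciation used only to show a stress mismatch.

```python
import re

def stress_insensitive_pattern(tail: str) -> re.Pattern:
    """Build a regex that matches the tail with any stress on each vowel."""
    parts = []
    for phoneme in tail.split():
        if phoneme[-1].isdigit():
            # Vowel: keep the sound, accept stress 0, 1, or 2.
            parts.append(re.escape(phoneme[:-1]) + "[0-2]")
        else:
            parts.append(re.escape(phoneme))
    # (?:^| ) keeps the match on a phoneme boundary; $ pins it to the end.
    return re.compile(r"(?:^| )" + " ".join(parts) + "$")

pattern = stress_insensitive_pattern("EH1 T IH0 K")

print(bool(pattern.search("K AH0 N EH1 T IH0 K")))   # same stress
print(bool(pattern.search("N EH2 T IH1 K")))          # made-up, different stress
print(bool(pattern.search("K R AE1 T IH0 K")))        # different vowel sound
```

If the database supports regular expression matching, the same pattern string could be handed to the query instead of being applied in application code; whether that's worth the bigger (and weaker) result list is a separate question.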

Review of Goals 

I think at a moment like this, it’s important to review the actual goals of the project, since there are a lot of potential paths into some pretty scary forests, in terms of the time and effort that could go into making “good” into “perfect.”  (Especially considering that “perfect” might very well be impossible anyway.)

Since this project is meant to engage people’s interest in poetry (more than to create some ultimate rhyming authority application), I decided that, for now, our earlier solution was “good enough,” especially in that it gives fewer/higher-quality results, which I think is a nice way to lean when given the choice.  To a certain extent, opening our pattern up to weaker emphasis matching would only bloat the results list — and with weaker results, on top of that.

So our final rhyme rule, for now, is:

“Find the second-to-last vowel in any word, unless there’s only one vowel.  Take that vowel, and everything after it, and match it against other words that end with the same phoneme sequence, and the same emphasis per vowel.”
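
Spelled out as code, the rule above might look something like this minimal Python sketch. The names `rhyme_key` and `is_rhyme` are hypothetical, the pronunciations are illustrative, and the handling of a word with no vowels at all (use the whole pronunciation) is my own assumption.

```python
def rhyme_key(phonetics: str) -> str:
    """Second-to-last vowel onward, or the only vowel onward if just one."""
    phones = phonetics.split()
    vowel_positions = [i for i, p in enumerate(phones) if p[-1].isdigit()]
    if not vowel_positions:
        return phonetics                 # assumption: no vowels, use it all
    if len(vowel_positions) >= 2:
        start = vowel_positions[-2]
    else:
        start = vowel_positions[0]
    return " ".join(phones[start:])

def is_rhyme(source: str, candidate: str) -> bool:
    """True if the candidate ends with the source word's rhyme key.

    Stress digits are part of the key, so emphasis must match exactly.
    """
    # Prefixing a space keeps the comparison on phoneme boundaries.
    return (" " + candidate).endswith(" " + rhyme_key(source))

quirk, perk = "K W ER1 K", "P ER1 K"
phonetic, athletic, click = (
    "F AH0 N EH1 T IH0 K",
    "AE0 TH L EH1 T IH0 K",
    "K L IH1 K",
)

print(rhyme_key(quirk), "|", rhyme_key(phonetic))
print(is_rhyme(quirk, perk), is_rhyme(phonetic, athletic), is_rhyme(phonetic, click))
```

The check is deliberately one-directional (the source word's key against the candidate's ending), which follows the wording of the rule rather than any formal definition of rhyme.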

So… yeah.  That’s my rhyming solution.  (For now.)

One nice thing about applications like these is that it’s easy enough to call this the first version, working as intended (if not perfect), and revise/improve that bit later down the line.


And, from my World’s-Most-Glamorous-Website test script, this is what my test code/page looks like.


A few of the (50) matched words.

In terms of putting this into our application, we’ll Ajax-ify this (soon) to be something that we can call when the user clicks on a particular word.  That function (in JavaScript) can ask this script for any rhyming matches on that word, and get back something we can then present to the user as a list of words.

Next, we’ll need to start thinking about the meter.  Which means getting the syllable counts for all of these words…

(And, in the next episode, if you haven’t been anticipating it already, there’s a huge lurking question here of “so what about words that aren’t in this dictionary?” <cue dramatic musical swell>)

