So the first step gave us everything we needed in terms of our dictionary database, word searching, and phonetics. That’s the better part of the foundation laid, but those by themselves are not quite all we need.
The next task is to start figuring out what exactly “rhyming” means.
Disclaimer: For whatever stubborn reason, I really wanted to (and had a spare weekend to) try to figure out things like “what is rhyme” or “what constitutes a syllable” for myself. That’s part of the “puzzle” aspect I mentioned earlier. The pieces seem simpler than they are, and something about thinking through them seemed satisfying.
Nothing about this part of the project is groundbreaking research, I know. But at least for my own sake, this was probably the most rewarding part of the project.
So, what is a rhyme?
Well, as a first intuitive stab (spoiler: this isn’t the right answer), I tried the idea that a rhyme is, basically, a shared final syllable: the same vowel and, if there is one, the same consonant after it.
In more phonetic terms, that might translate to, roughly: “a rhyme is when you can match up the tail ends of two words, counting their phonetic segments backwards from the end of the word, until you come to a vowel.”
And how do we know when we’ve found our vowel? That’s easy enough, as a regular expression.
The lexical stress numbers, by ARPAbet/CMUdict’s notations, are the only numerals in the phonetics field. So, we pattern match against the last character of each segment in that field and, if it’s a number, we’ve found our vowel.
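As a rough illustration of that check, here is a minimal Python sketch (the project's own language isn't shown here, so Python is a stand-in). The function name `tail_from_last_vowel` is made up for this example, and the sample transcription is hand-typed in CMUdict's notation.

```python
import re

# A segment is a vowel if it ends in a stress digit (0, 1, or 2).
VOWEL = re.compile(r"\d$")

def tail_from_last_vowel(phones: str) -> str:
    """Return the last vowel plus any consonants that follow it."""
    segments = phones.split()
    # Count backwards from the end of the word until a vowel turns up.
    for i in range(len(segments) - 1, -1, -1):
        if VOWEL.search(segments[i]):
            return " ".join(segments[i:])
    return ""  # no vowel at all; not expected for real dictionary entries

print(tail_from_last_vowel("F AH0 N EH1 T IH0 K"))  # IH0 K
```

The result is a plain string, which makes it easy to hand to whatever does the lookup.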
It’s easy enough to take that result (a string) and just feed that into a query against our CMUdict database:
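A minimal sketch of such a query, using Python's built-in `sqlite3` module. Everything about the schema is an assumption for the sake of the example: the table name `words`, its two columns, and the handful of hand-typed sample rows all stand in for the real CMUdict database.

```python
import sqlite3

# Hypothetical schema: one row per word, with the phonetics stored as a
# space-separated ARPAbet string. The rows are hand-typed stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, phonetics TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [
        ("phonetic", "F AH0 N EH1 T IH0 K"),
        ("academic", "AE2 K AH0 D EH1 M IH0 K"),
        ("kinetic", "K AH0 N EH1 T IH0 K"),
        ("cat", "K AE1 T"),
    ],
)

def find_by_tail(conn, tail):
    """Return words whose phonetics end with `tail` on a segment boundary."""
    rows = conn.execute(
        "SELECT word FROM words "
        "WHERE phonetics = ? OR phonetics LIKE ? "
        "ORDER BY word",
        # The leading space keeps the match from starting mid-segment.
        (tail, "% " + tail),
    )
    return [row[0] for row in rows]

print(find_by_tail(conn, "IH0 K"))  # ['academic', 'kinetic', 'phonetic']
```

Note that the search word itself comes back in the results, so a real version would probably want to filter it out.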
On the simplest level, this is at least the beginning of a usable rhyme finder. Granted, if we used this as-is, it would tell you that our word “phonetic” is a rhyming match with, say, “academic,” but, you know, close enough. (Okay, not quite.)
So, to make that into a slightly-less-awful rhyme match (and to get it closer to the idea of what we’re defining here), we can continue counting (backwards) through more of the word, gathering up any consonants that occur immediately before that vowel, and feeding those into our query string as well. This should get us the entire last syllable sound to match.
I did that next, continuing to match backward until I found another vowel, and then recording everything after (and not including) that vowel.
This turned what was a search for the “-ic” of “phonetic” into a search for “-tic.”
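That backward walk can be sketched like this in Python. The function name is invented for the example, and returning the whole word when there is only one vowel is my own assumption about a case the step above doesn't cover.

```python
def tail_after_previous_vowel(phones: str) -> str:
    """Return everything after (not including) the second-to-last vowel."""
    segments = phones.split()
    # Vowels are the segments that end in a stress digit.
    vowels = [i for i, seg in enumerate(segments) if seg[-1].isdigit()]
    if len(vowels) < 2:
        # Assumption: with no earlier vowel to stop at, keep the whole word.
        return " ".join(segments)
    return " ".join(segments[vowels[-2] + 1:])

print(tail_after_previous_vowel("F AH0 N EH1 T IH0 K"))  # T IH0 K
```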
This does kill off the weakest matches (like “phonetic” and “algebraic”), but it still leaves us with weak matches like “aristocratic,” since, as we defined it, it’s really only matching on that final “-tic” syllable.
So, at that point, it seems like we need to go back to include the next vowel sound before this last syllable after all. (Of course, this is assuming that there is a second vowel. And at this point, it raised the question of whether a user might want to exclude result words with just one vowel/syllable. After all, it’s not necessarily obvious whether “phonetic” is a great rhyme with, say, “click.”)
Going back to include the previous vowel as well (and so, necessarily, matching only words of two or more syllables), we get:
It’s easy to see a huge quality jump in our rhyming here, with just this one more vowel.
We could go a step further, of course, and add the preceding consonants to that vowel as well, at which point our list of matches for “phonetic” would drop to words like “kinetic,” etc. That’s certainly a closer match than words like “athletic,” but I’m not sure if it’s a significantly better rhyme. (Again, I was following no formal definitions here, but doing this first round on essentially “gut feeling.”)
So my question (to anyone) at this point is: is “phonetic / kinetic” a “necessary” step up in rhyme quality from “phonetic / athletic”? (I should probably keep my vote out of this, but my own answer felt like “not really.”)
So, at least temporarily satisfied to leave that there, there were already a few conclusions that I could come to:
- This level of matching is probably a “good enough” rule for rhyming, or at least good enough to use for now. Obviously it’s not perfect, but it started to answer part of the research question, providing passable rhyme suggestions to users that might be able to think of them on their own.
- This type of syllable-matching works in this case, but scales poorly to shorter or longer words. Consider single-syllable words; take “quirk,” for instance. For this, we’d need to omit any consonants before the (only) vowel, so that “quirk” can find rhymes like “perk” or “lurk.” That would mean the rule for single-syllable words would be “match the vowel and anything after” and, for two-syllable words, “match the first vowel and anything after,” which we could collapse into the single rule: “find the first (even if it’s the only) vowel, and match that and anything after.”
I’m satisfied with that. But consider three syllable words again. “phonetic / kinetic” is great (and, at least by the CMUdict’s phonetics, an exact vowel match in both syllables). But “athletic / kinetic” — that’s a reasonable rhyme, isn’t it? (I’m actually asking.)
So, we could say, then, that:
“the penultimate vowel, and everything after it, are the only sounds that matter for rhyming in a 2+ syllable word”
(Also, I just like the word “penultimate.” Which, let’s see, by this definition should rhyme with “intimate,” “proximate,” “legitimate.” This is where you realize you’ve been playing with this too long.)
But, does this apply to any 2+ syllable words, even if they’re a lot longer or shorter than one another? Is “parthenogenetic” a reasonable rhyme for “phonetic”? Or “hettick”? It would be easy to be generous and say “yes,” and probably equally easy to be strict and say, “no.” No conclusions here, really.
- The way we’re matching (so far) is matching only the exact same emphasis per syllable. While that definitely sounds the most graceful in terms of proper rhyming, should words that have different syllabic emphasis still be “good enough”? And, if so, how do we revise that query? (One solution is to remove the exact numerals and query instead with pattern matches that find those same phonetics with any numeral at the end of each vowel. Regular expressions could handle that easily enough.) Something like this (but generalized to handle any such case):
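One way the generalized version might look, sketched in Python with the standard `re` module. The function name is hypothetical, and the second test string below is a made-up transcription used only to show a stress mismatch still matching.

```python
import re

def stress_insensitive_pattern(tail: str) -> re.Pattern:
    """Build a regex that matches `tail` at the end of a phonetics string,
    accepting any stress digit (0, 1, or 2) on each vowel."""
    parts = []
    for seg in tail.split():
        if seg[-1].isdigit():
            # Vowel: keep the letters, allow any stress numeral.
            parts.append(re.escape(seg[:-1]) + "[0-2]")
        else:
            # Consonant: match it exactly.
            parts.append(re.escape(seg))
    # Anchor at a segment boundary on the left and the word end on the right.
    return re.compile(r"(?:^| )" + " ".join(parts) + "$")

pattern = stress_insensitive_pattern("EH1 T IH0 K")
print(pattern.pattern)  # (?:^| )EH[0-2] T IH[0-2] K$
print(bool(pattern.search("K AH0 N EH2 T IH1 K")))  # True
```

How a pattern like this gets into the database query depends on the database; applying it in application code after a broader query is one option.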
Review of Goals
After getting to this point, it felt like time to step back and review the actual goals of the project. There are a lot of paths into some pretty scary forests, in terms of the time and effort one could spend chasing “perfect” rhyming rules. (Especially since we haven’t even started to consider slant rhymes, etc.)
Since this project is meant to engage people’s interest in poetry (rather than to create any kind of rhyming authority), it felt like this solution was “good enough.” When considering that this is meant to offer suggestions, it also felt more manageable and hopefully more helpful to offer a few strong matches, rather than a potential flood of weaker ones.
So our final rhyme rule, for now, is:
“Find the second-to-last vowel in any word, unless there’s only one vowel. Then take that vowel, and everything after it, and match it against other words that end with the same phoneme sequence, and the same emphasis per vowel.”
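Written out as code, the rule might look something like the Python sketch below. The names `rhyme_key` and `ends_with_key` are invented for this example, and the transcriptions in the usage lines are hand-typed in CMUdict's notation rather than pulled from the database.

```python
def rhyme_key(phones: str) -> list:
    """Return the second-to-last vowel (or the only vowel) and everything
    after it, as a list of segments with stress digits left intact."""
    segments = phones.split()
    vowels = [i for i, seg in enumerate(segments) if seg[-1].isdigit()]
    if not vowels:
        return []
    start = vowels[-2] if len(vowels) > 1 else vowels[0]
    return segments[start:]

def ends_with_key(phones: str, key: list) -> bool:
    """True if `phones` ends with exactly the segments in `key`."""
    segments = phones.split()
    return bool(key) and segments[-len(key):] == key

key = rhyme_key("F AH0 N EH1 T IH0 K")                    # "phonetic"
print(key)                                                # ['EH1', 'T', 'IH0', 'K']
print(ends_with_key("AE0 TH L EH1 T IH0 K", key))         # "athletic": True
print(ends_with_key("P ER1 K", rhyme_key("K W ER1 K")))   # "perk" / "quirk": True
```

Because the stress digits are compared as-is, this version keeps the “same emphasis per vowel” requirement.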
One nice thing about applications like these is that it’s easy enough to call this the first version, where later versions can revise/improve all of this further down the line.
A few of the (50) matched words:
In terms of putting this into the web application, the next step would be to Ajax-ify this, so that it becomes something we can call when the user enters a particular word. A function (in JavaScript) can ask this script for any rhyming matches based on that word, and then hand back something that we can present to the user as a list of suggested rhymes.
In terms of the overall project, the more general next step is to start thinking about the meter. Which means it’s time to start counting syllables.
And, if you haven’t been wondering about it already, there’s still the lurking question of: “what about words that aren’t in this dictionary?” More on that next.