So the first step gave us everything we needed in terms of our dictionary database, word searching, and phonetics. That’s the better part of the foundation laid, but those by themselves are not quite all we need. (Luckily, they should give us everything we need to figure the rest out for ourselves.)
So the question remaining is, of course, “how?” Well, the first answer to that is to start figuring out what exactly “rhyming” means.
Disclaimer: I like to think of myself as a good student (back when I was one, and even now), but for some reason, with this project at least, I didn’t start by doing my homework. There are a lot of reasons for that (impatience is probably the main one), but for whatever reason I started this project wanting to figure out things like “what is rhyme” or “what constitutes a syllable” for myself. That’s the “puzzle” aspect I was talking about, and the fun of it is probably the second reason for skipping the required reading.
The point of this is, nothing about this part of the project is even kind of groundbreaking research, I know. But part of the fun is diving right in, and using the existing research to check (and revise) your own work later. Please, by all means, work smarter than I sometimes do.
So, what is a rhyme?
Well, according to the first stab I took at this (spoiler alert: this isn’t the right answer), a rhyme is when you can match up the tail ends of two words, counting their phonetic segments backwards from the end of the word, until you come to a vowel.
From the last section, the phonetic segments (according to ARPAbet and CMUdict) of the word “phonetic.”
(And how do we know when we’ve found our vowel? That’s easy enough, we just need a regular old regular expression.)
preg_match() in this case will find any numerals in the $lastletter (a substring) of our word.
The lexical stress numbers, by ARPAbet/CMUdict’s notations, are the only numerals in the phonetics field. So, we pattern match against the last character of each segment in that field and, if it’s a number, we’ve found our vowel.
(So, counting backwards to the first vowel, it would give us these two bits.)
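For anyone following along without the screenshots, here is a minimal sketch of that backwards walk. It is written in Python rather than the PHP behind preg_match(), the function name is made up for illustration, and the transcription of “phonetic” is typed from memory in CMUdict style, so check it against the real dictionary.

```python
import re

# In CMUdict notation only vowels carry a stress digit (0, 1, or 2),
# so a segment that ends in a digit is a vowel.
VOWEL = re.compile(r"\d$")

def last_vowel_tail(phonetics: str) -> list[str]:
    """Return the segments from the last vowel to the end of the word."""
    phones = phonetics.split()
    # Walk backwards from the end until a segment ends in a digit.
    for i in range(len(phones) - 1, -1, -1):
        if VOWEL.search(phones[i]):
            return phones[i:]
    return []  # no vowel found at all

# Assumed transcription of "phonetic" (not copied from the database).
print(last_vowel_tail("F AH0 N EH1 T IH0 K"))  # ['IH0', 'K']
```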
It’s easy enough to take that result (a string) and just feed that into a query against our CMUdict database:
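A rough sketch of what such a query could look like, using Python’s built-in sqlite3 as a stand-in for the real database. The table name `cmudict`, its `word` and `phonetics` columns, and the three sample rows are all assumptions for illustration, not the project’s actual schema or data.

```python
import sqlite3

def find_by_ending(conn, tail):
    """Return words whose phonetics end with the given segment string."""
    # The leading "% " keeps the match on a segment boundary, so "IH0 K"
    # cannot match the tail end of some longer segment name. (A word whose
    # entire transcription equals the tail would need a separate check.)
    rows = conn.execute(
        "SELECT word FROM cmudict WHERE phonetics LIKE ? ORDER BY word",
        ("% " + tail,),
    )
    return [row[0] for row in rows]

# Hypothetical in-memory table with a few assumed transcriptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cmudict (word TEXT, phonetics TEXT)")
conn.executemany("INSERT INTO cmudict VALUES (?, ?)", [
    ("academic", "AE2 K AH0 D EH1 M IH0 K"),
    ("kinetic", "K AH0 N EH1 T IH0 K"),
    ("click", "K L IH1 K"),
])

print(find_by_ending(conn, "IH0 K"))  # ['academic', 'kinetic']
```

Passing the tail as a bound parameter, instead of pasting it into the SQL string, also keeps user input out of the query text.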
On the ugliest, most cringe-inducing rhyming level (think bad pop music), this is at least the start of a usable rhyme finder. Granted, if we used this as-is, it would tell you that our word “phonetic” is a rhyming match with, say, “academic,” but, you know, close enough. (Yeah, not quite.)
So, to make that into a slightly-less-horrible rhyme match, we can continue counting (backwards) through more of the word, gathering up any consonants that occur immediately before that vowel, and feeding those into our query string as well. This will at least get the entire last syllable sound to match.
I did that next, continuing to match backward until I found another vowel, and then recording everything after (and not including) that vowel.
This turned what was a search for the “-ic” of “phonetic” into a search for “-tic.”
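Sketched in Python (again a stand-in for the PHP, with a made-up function name and transcriptions typed from memory), that second pass might look like the following. What to do with a one-vowel word is not settled at this point, so returning the whole word there is purely an assumption.

```python
import re

VOWEL = re.compile(r"\d$")  # vowels are the segments ending in a stress digit

def last_syllable_tail(phonetics):
    """Return everything after (and not including) the second-to-last vowel."""
    phones = phonetics.split()
    vowel_positions = [i for i, p in enumerate(phones) if VOWEL.search(p)]
    if not vowel_positions:
        return []
    if len(vowel_positions) == 1:
        # Assumption: with no earlier vowel to stop at, keep the whole word.
        return phones
    # Start just after the second-to-last vowel.
    return phones[vowel_positions[-2] + 1:]

print(last_syllable_tail("F AH0 N EH1 T IH0 K"))          # ['T', 'IH0', 'K']
print(last_syllable_tail("AE2 L JH AH0 B R EY1 IH0 K"))   # ['IH0', 'K']
```

With these assumed transcriptions, “algebraic” has no consonant in front of its last vowel, so it can no longer end in the “T IH0 K” we are now searching for.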
This does kill off those cringeworthiest matches (like “algebraic”), but it still leaves us with pretty weak matches like “aristocratic,” since it’s really only matching on that “-tic” ending.
So, at that point, it seems like we ought to include the next vowel sound before this last syllable after all. (Of course, this is assuming that there is a second vowel. But, hey, maybe that’s a good thing if this excludes words with just one vowel/syllable… after all, does “phonetic” actually rhyme with, say, “click”?)
After trying that, we get:
Now we’re getting somewhere.
It’s easy to see a huge quality jump in our rhyming here, with just this one more vowel.
We could go a step further, of course, and add the preceding consonants to that vowel as well, at which point our list would drop to words like “kinetic,” etc. That’s certainly a closer match than words like “athletic,” but I’m not sure if it’s a significantly better rhyme. (Again, I’m following no formal definitions here, but doing this first round just on “my gut.”)
So my question (to any/everyone) is: is “phonetic / kinetic” a significant (or necessary) step up in rhyme quality from “phonetic / athletic”? (I should probably keep my vote out of this, but whatever, my own answer would be “meh, not really.”)
So, at least temporarily satisfied to leave that there, I’d say that at this point there are already a few conclusions that I can come to:
- This level of matching is probably a “good enough” rule for rhyming, or at least good enough to leave this as it is for now. Obviously it’s not perfect, but it’s started to answer the task of the research question, providing passable rhyme suggestions to users that might not think of them on their own.
- This type of syllable-matching works great for this word, but has scaling issues at shorter or longer words.
Consider single syllable words; take “quirk”, for instance. For this, we’d need to omit any consonants before the (only) vowel, so that “quirk” can find rhymes like “perk” or “lurk.”
That would mean the rule for single syllable words would then be:
“match the vowel and anything after”
where, for two syllable words, it might be:
“match the first vowel, and anything after”
which we can obviously combine, as a rule, into:
“find the first (even if it’s the only) vowel, and match that and anything after”
I’m satisfied with that. But consider three syllable words again. “phonetic / kinetic” is great (and, at least by the CMUdict’s phonetics, an exact vowel match in both syllables). But “athletic / kinetic”: that’s a reasonable rhyme, isn’t it? (I’m actually asking.)
So, we could say, then, that:
“the penultimate vowel, and everything after it, are the only syllables that matter for rhyming in a 2+ syllable word”
(Also, I just like the word “penultimate.” …let’s see, “intimate,” “proximate,” “legitimate”… okay, I’ve been doing this too long now.)
But, seriously, is this applicable to any 2+ syllable words, even if they’re a lot longer or shorter than one another? Is “parthenogenetic” a reasonable rhyme for “phonetic”? Or “hettick”?
No conclusions here, sorry to say, just some questions to leave open for future steps. Back to my last conclusion, so far:
- The way we’re matching (so far) is matching only the exact same emphasis per syllable. While that definitely sounds the most graceful in terms of proper rhyming, should words that have different syllabic emphasis still be “good enough?” And, if so, how do we revise that query? (One solution is to remove the exact numerals and query instead with pattern matches that find those same phonetics with any numerals at the end of the vowels. Regular expressions could handle that gracefully enough, I’d imagine.)
Something like this, only with more flexibility and hopefully less manual “OR” clause building?
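One way the “any numeral” idea could be sketched, in Python and with a hypothetical helper name: swap each stress digit in the tail for a small character class, then test candidates against the resulting pattern. This is only an illustration of the approach, not something the project actually runs. A database-side regular expression operator (MySQL has REGEXP, for instance) could presumably take a similar pattern.

```python
import re

def stress_agnostic_pattern(tail):
    """Build a regex matching the tail's segments with any stress digit."""
    parts = []
    for phone in tail.split():
        if phone[-1].isdigit():
            # Vowel: keep the letters, accept stress 0, 1, or 2.
            parts.append(re.escape(phone[:-1]) + "[0-2]")
        else:
            # Consonant: must match exactly.
            parts.append(re.escape(phone))
    # Anchor at the end of the field and on a segment boundary at the start.
    return re.compile(r"(?:^| )" + " ".join(parts) + r"$")

pattern = stress_agnostic_pattern("EH1 T IH0 K")
print(bool(pattern.search("K AH0 N EH1 T IH0 K")))  # True: same stresses
print(bool(pattern.search("X EH2 T IH1 K")))        # True: different stresses
print(bool(pattern.search("EH1 M IH0 K")))          # False: different consonant
```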
Review of Goals
I think at a moment like this, it’s important to review the actual goals of the project, since there are a lot of potential paths into some pretty scary forests, in terms of the time and effort that could go into making “good” into “perfect.” (Especially considering that “perfect” might very well be impossible anyway.)
Since this project is meant to engage people’s interest in poetry (more than to create some ultimate rhyming authority application), I decided that, for now, our earlier solution was “good enough,” especially in that it gives fewer/higher-quality results, which I think is a nice way to lean when given the choice. To a certain extent, opening our pattern up to weaker emphasis matching would only bloat the results list, and with weaker results on top of that.
So our final rhyme rule, for now, is:
“Find the second-to-last vowel in any word, unless there’s only one vowel. Take that vowel, and everything after it, and match it against other words that end with the same phoneme sequence, and the same emphasis per vowel.”
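Put together, the rule could be sketched like this in Python. The function names are hypothetical, the transcriptions are typed from memory in CMUdict style, and the real version runs as a database query rather than a pairwise check.

```python
import re

VOWEL = re.compile(r"\d$")  # vowels are the segments ending in a stress digit

def rhyme_key(phonetics):
    """Segments from the second-to-last vowel (or the only vowel) to the end."""
    phones = phonetics.split()
    vowels = [i for i, p in enumerate(phones) if VOWEL.search(p)]
    if not vowels:
        return []
    start = vowels[-2] if len(vowels) >= 2 else vowels[-1]
    return phones[start:]

def is_rhyme(query, candidate):
    """True if the candidate ends with the query's key, stress digits included."""
    key = rhyme_key(query)
    cand = candidate.split()
    # Compare whole segments so matches stay on segment boundaries.
    return bool(key) and cand[-len(key):] == key

phonetic = "F AH0 N EH1 T IH0 K"
print(is_rhyme(phonetic, "K AH0 N EH1 T IH0 K"))      # kinetic  -> True
print(is_rhyme(phonetic, "AE0 TH L EH1 T IH0 K"))     # athletic -> True
print(is_rhyme(phonetic, "AE2 K AH0 D EH1 M IH0 K"))  # academic -> False
print(is_rhyme("K W ER1 K", "P ER1 K"))               # quirk / perk -> True
```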
So… yeah. That’s my rhyming solution. (For now.)
One nice thing about applications like these is that it’s easy enough to call this the first version, working as intended (if not perfect), and revise/improve that bit later down the line.
And, from my World’s-Most-Glamorous-Website test script, this is what my test code/page looks like.
A few of the (50) matched words.
Next, we’ll need to start thinking about the meter. Which means getting the syllable counts for all of these words…
(And, in the next episode, if you haven’t been anticipating it already, there’s a huge lurking question here of “so what about words that aren’t in this dictionary?” <cue dramatic musical swell>)