A question was posted on Stack Overflow asking whether Mac OS can match a talking head or face to the phonemes spoken by the built-in speech synthesizer. The answer: of course it can, it’s a Mac! I’ll show you how it’s done.

The source code and demo app have been moved to their own page.

Cocoa & Speech Synthesis

The two key terms to search for in the documentation are NSSpeechSynthesizer and NSSpeechSynthesizerDelegate (the protocol). It’s so ridiculously simple to do in code that the hardest part is making the images for each of the phonemes. About that, a disclaimer: my phoneme images are free to use but they’re horribly done. In fact, they’re first-season-of-South-Park bad. Maybe worse. Okay, definitely worse. Have a look for yourself.

Talking Mouth Demo

Awful, right? Just awful. But funny and a whole lot of fun. With a little artistic talent and a well-designed Quartz Composition, you could create a good-looking face with a speech-synthesizer-controlled mouth and maybe even realistically moving cheeks, etc. Of course I won’t be going that far.

The Design

It’s a simple demo app. Nothing much to it. Everything you care about is in the App delegate class. Let’s ignore the UI and initialization-related code – we’re beyond that, aren’t we? Let’s take a high-level look.

First, the speech synthesizer class’s delegate methods let you know when a phoneme is about to be spoken (with an “op code” to identify the phoneme). They also let you know when speaking has stopped (handy for not leaving your face’s mouth dangling rudely). And of course there’s a simple call to tell the synthesizer to start speaking a string. That’s really all you need.
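
As a rough sketch of those three pieces (this is not the demo’s actual source), here is how they might fit together in Swift. The class name MouthController, the say(_:) helper, and the currentOpcode property are hypothetical names chosen for illustration; only NSSpeechSynthesizer, NSSpeechSynthesizerDelegate, and the two delegate callbacks come from AppKit.

```swift
import AppKit

// Hypothetical controller; names are illustrative, not from the demo.
final class MouthController: NSObject, NSSpeechSynthesizerDelegate {
    private let synthesizer = NSSpeechSynthesizer()

    // Most recent phoneme op code, or nil when the mouth should be at rest.
    private(set) var currentOpcode: Int16?

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    // Asks the synthesizer to start speaking the given string.
    // Returns false if the request was not accepted.
    @discardableResult
    func say(_ text: String) -> Bool {
        return synthesizer.startSpeaking(text)
    }

    // Delegate callback: a phoneme is about to be spoken.
    func speechSynthesizer(_ sender: NSSpeechSynthesizer,
                           willSpeakPhoneme phonemeOpcode: Int16) {
        currentOpcode = phonemeOpcode
        // Swap in the mouth image for this op code here.
    }

    // Delegate callback: speech has ended (finished or stopped early).
    func speechSynthesizer(_ sender: NSSpeechSynthesizer,
                           didFinishSpeaking finishedSpeaking: Bool) {
        currentOpcode = nil
        // Put the resting mouth image back here.
    }
}
```

The callbacks are delivered while the app’s run loop is running, so in a normal Cocoa app nothing more is needed than keeping the controller alive and calling say(_:).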

I want to talk a little bit about the phoneme array approach I took. I couldn’t see any obvious means of translating the phoneme “op code” to the string-based identifiers listed in the Speech Synthesis Programming Guide’s Phonemes section, but then again, I didn’t really look too hard. I made (or duplicated) an image for each item in that list. Since the op codes seem to coincide (more or less) with the phoneme list, I simply made an array with each of the images in order of appearance in the phoneme list. To get the corresponding phoneme, I pass the op code as the array index. It works well enough, but some phrases ask for phonemes with op codes higher than the highest index, suggesting there are some missing from the documentation (tsk, tsk).
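
The array lookup described above might look something like this in Swift. The image names are placeholders rather than the demo’s real file names, and falling back to a resting mouth for out-of-range op codes is an assumption made for this sketch; it is just one way to avoid indexing past the end of the array.

```swift
// Placeholder image names, one per entry in the documented phoneme list,
// stored in the same order as that list. These names are hypothetical.
let phonemeImageNames = ["mouth0", "mouth1", "mouth2", "mouth3"]

// Maps a phoneme op code to an image name by using the op code as an
// array index. Op codes outside the array (negative or too large) get
// the fallback image instead of crashing.
func imageName(forOpcode opcode: Int16,
               in names: [String],
               fallback: String) -> String {
    let index = Int(opcode)
    guard names.indices.contains(index) else { return fallback }
    return names[index]
}

// In range: the op code selects the matching image.
let known = imageName(forOpcode: 2, in: phonemeImageNames, fallback: "mouthRest")
// known == "mouth2"

// Out of range: an undocumented op code falls back to the resting mouth.
let unknown = imageName(forOpcode: 99, in: phonemeImageNames, fallback: "mouthRest")
// unknown == "mouthRest"
```

A dictionary keyed by op code would work just as well and would make gaps in the list explicit, at the cost of a little more setup.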

Have Fun!

The source, license, and a code snippet are available on the TalkingMouthDemo page.

Conclusion

That’s all there is to it! It’s easy to see how this could be refined (assuming you have an ounce of artistic ability). Enjoy.

 

I released Transcriva 2.014 this morning. Transcriva is transcription software for your Mac.

The changes include:

  • Fixed bug where adding a person to a transcript then undoing would cause an error.
  • Fixed bug where the end-of-clip chime would not play when requested.
  • Fixed bug where media controls in transcript properties didn’t update when a clip became available or unavailable.
  • Added automatic, one-click crash reporting.

It’s a free update for all registered 2.x users. Comments, questions, and suggestions are welcome; just send an e-mail to support@bartastechnologies.com.

© 2011 Joshua Nozzi. Joshua Nozzi is a Cocoa developer for hire.