Talking Face Speech Synthesizer Demo

Someone asked on Stack Overflow whether it’s possible on the Mac to match a talking head or face to the phonemes spoken by the built-in speech synthesizer. The answer is: of course it is, it’s a Mac! I’ll show you how it’s done.

Cocoa & Speech Synthesis

The two key terms to search for in the documentation are NSSpeechSynthesizer and NSSpeechSynthesizerDelegate (the protocol). It’s so ridiculously simple to do in code that the hardest part is making the images for each of the phonemes. About that, a disclaimer: my phoneme images are free to use but they’re horribly done. In fact, they’re first-season-of-South-Park bad. Maybe worse. Okay, definitely worse. Have a look for yourself.

Talking Mouth In Action

Awful, right? Just awful. But funny and a whole lot of fun. With a little artistic talent and a well-designed Quartz Composition, you could create a good-looking face with a speech-synthesizer-controlled mouth and maybe even realistically moving cheeks, etc. Of course I won’t be going that far.

The Design

It’s a simple demo app. Nothing much to it. Everything you care about is in the App delegate class. Let’s ignore the UI and initialization-related code – we’re beyond that, aren’t we? Let’s take a high-level look.

First, the speech synthesizer class’s delegate methods let you know when a phoneme is about to be spoken (with an “op code” to identify the phoneme). They also let you know when speaking has stopped (handy for not leaving your face’s mouth dangling rudely). Beyond that, there’s a simple call to tell the synthesizer to start speaking a string. That’s really all you need.
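Here’s a minimal sketch of that delegate wiring, written in Swift against AppKit. It isn’t the demo’s actual source. The class name TalkingFaceController and the two callback properties are made up for illustration, while the two delegate methods and startSpeaking(_:) come from NSSpeechSynthesizer and NSSpeechSynthesizerDelegate.

```swift
import AppKit

/// Hypothetical helper (not from the demo): forwards synthesizer events to closures.
final class TalkingFaceController: NSObject, NSSpeechSynthesizerDelegate {
    private let synthesizer = NSSpeechSynthesizer()

    /// Called with the op code just before each phoneme is spoken.
    var onPhoneme: ((Int16) -> Void)?
    /// Called when speech ends, so the mouth can be closed.
    var onFinished: (() -> Void)?

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    /// Starts speaking; returns false if the synthesizer refused the request.
    @discardableResult
    func speak(_ text: String) -> Bool {
        return synthesizer.startSpeaking(text)
    }

    // MARK: NSSpeechSynthesizerDelegate

    func speechSynthesizer(_ sender: NSSpeechSynthesizer,
                           willSpeakPhoneme phonemeOpcode: Int16) {
        onPhoneme?(phonemeOpcode)
    }

    func speechSynthesizer(_ sender: NSSpeechSynthesizer,
                           didFinishSpeaking finishedSpeaking: Bool) {
        onFinished?()
    }
}
```

In an app you’d set onPhoneme to swap the mouth image and onFinished to put the closed-mouth image back, then call speak(_:) with whatever text you like. The delegate callbacks only arrive while the app’s run loop is running.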

I want to talk a little bit about the phoneme array approach I took. I couldn’t see any obvious means of translating the phoneme “op code” to the string-based identifiers listed in the Speech Synthesis Programming Guide’s Phonemes section, but then again, I didn’t look too hard. I made (or duplicated) an image for each item in that list. Since the op codes seem to coincide (more or less) with the phoneme list, I simply made an array with each of the images in order of appearance in the phoneme list. To get the image for a phoneme, I use the op code as the array index. It works well enough, but some phrases ask for phonemes with op codes higher than the highest index, suggesting there are some missing from the documentation (tsk, tsk).
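Because some op codes run past the end of the array, the lookup needs a bounds check. Here’s a minimal Swift sketch of one way to do it. The function name, the image names and the choice to fall back to a closed-mouth image are all assumptions for illustration, not what the demo’s source necessarily does.

```swift
/// Returns the image name for a phoneme op code, or `fallback` when the
/// op code is negative or past the end of the array.
/// (Hypothetical helper; the image names below are placeholders.)
func mouthImageName(forOpcode opcode: Int16,
                    in imageNames: [String],
                    fallback: String) -> String {
    let index = Int(opcode)
    guard imageNames.indices.contains(index) else { return fallback }
    return imageNames[index]
}

// Placeholder names standing in for the real per-phoneme images,
// listed in the same order as the documented phoneme list.
let mouthImages = ["mouth-0", "mouth-1", "mouth-2"]

let known = mouthImageName(forOpcode: 1, in: mouthImages, fallback: "mouth-closed")
let unknown = mouthImageName(forOpcode: 40, in: mouthImages, fallback: "mouth-closed")
print(known, unknown)
```

Falling back to a neutral image means an undocumented op code shows up as a brief closed mouth instead of an out-of-range crash.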

Have Fun!

That’s all there is to it! It’s easy to see how this could be refined (assuming you have an ounce of artistic ability). The source is available here. Enjoy.
