Talking Face Speech Synthesizer Demo

A question was posted on StackOverflow about whether Mac OS can match a talking head or face to the phonemes spoken by the built-in speech synthesizer. The answer is: of course it can, it's a Mac! I'll show you how it's done.

The two key terms to search for in the documentation are NSSpeechSynthesizer and NSSpeechSynthesizerDelegate (the protocol). It's so ridiculously simple to do in code that the hardest part is making the images for each of the phonemes. About that, a disclaimer: my phoneme images are free to use but they're horribly done. In fact, they're first-season-of-South-Park bad. Maybe worse. Okay, definitely worse. Have a look for yourself.

Watch the Talking Mouth Demo

Awful, right? Just awful. But funny and a whole lot of fun. With a little artistic talent and a well-designed Quartz Composition, you could create a good-looking face with a speech-synthesizer-controlled mouth and maybe even realistically moving cheeks, etc. Of course I won't be going that far.

The Design

It's a simple demo app. Nothing much to it. Everything you care about is in the app delegate class. Let's ignore the UI and initialization-related code - we're beyond that, aren't we? Let's take a high-level look.

First, the speech synthesizer class's delegate methods let you know when a phoneme is about to be spoken (with an "op code" to identify the phoneme). They also let you know when speaking has stopped (handy for not leaving your face's mouth dangling rudely). And of course there's a simple call to tell the synthesizer to start speaking a string. That's really all you need.
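For anyone who does want the setup spelled out, here's a minimal sketch of the wiring described above. It is not the demo's actual code: the TMSpeaker class name and its -speak: method are made up for illustration (the demo keeps all of this in the app delegate), and it assumes ARC. It only logs the callbacks rather than swapping images.

```objective-c
#import <Cocoa/Cocoa.h>

// Hypothetical class for illustration; not part of the demo project.
@interface TMSpeaker : NSObject <NSSpeechSynthesizerDelegate>
@property (strong) NSSpeechSynthesizer *synthesizer;
- (BOOL)speak:(NSString *)text;
@end

@implementation TMSpeaker

- (instancetype)init
{
    self = [super init];
    if (self) {
        // Passing nil for the voice selects the default system voice.
        _synthesizer = [[NSSpeechSynthesizer alloc] initWithVoice:nil];
        // Register for the phoneme and finished-speaking callbacks.
        [_synthesizer setDelegate:self];
    }
    return self;
}

- (BOOL)speak:(NSString *)text
{
    // Returns NO if the synthesizer could not start speaking.
    return [self.synthesizer startSpeakingString:text];
}

// Called just before each phoneme is spoken.
- (void)speechSynthesizer:(NSSpeechSynthesizer *)sender
         willSpeakPhoneme:(short)phonemeOpcode
{
    NSLog(@"About to speak phoneme opcode %d", phonemeOpcode);
}

// Called when speech ends, whether it finished normally or was stopped early.
- (void)speechSynthesizer:(NSSpeechSynthesizer *)sender
        didFinishSpeaking:(BOOL)finishedSpeaking
{
    NSLog(@"Speech %@", finishedSpeaking ? @"finished" : @"was stopped");
}

@end
```

Keep in mind that the delegate callbacks are only delivered while a run loop is running, which is a given in a regular Cocoa app but not in a bare command-line tool.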

I want to talk a little bit about the phoneme array approach I took. I couldn't see any obvious means of translating the phoneme "op code" to the string-based identifiers listed in the Speech Synthesis Programming Guide's Phonemes section, but then again, I didn't really look too hard. I made (or duplicated) an image for each item in that list. Since the op codes seem to coincide (more or less) with the phoneme list, I simply made an array with each of the images in order of appearance in the phoneme list. To get the corresponding phoneme, I pass the op code as the array index. It works well enough, but some phrases ask for phonemes with op codes higher than the highest index, suggesting there are some missing from the documentation (tsk, tsk).
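If you'd rather not rely on the array order, one avenue that might be worth exploring is NSSpeechPhonemeSymbolsProperty, which asks the synthesizer for its own phoneme table. The sketch below is not what the demo does, and I haven't checked how every voice responds to this property, so treat it as an assumption to verify. The TMSymbolsByOpcode and TMPhonemeSymbols helper names are hypothetical.

```objective-c
#import <Cocoa/Cocoa.h>

// Hypothetical helper: turn a phoneme table (an array of dictionaries)
// into an opcode -> symbol map. Entries missing either key are skipped.
static NSDictionary<NSNumber *, NSString *> *TMSymbolsByOpcode(NSArray<NSDictionary *> *phonemeInfo)
{
    NSMutableDictionary<NSNumber *, NSString *> *map = [NSMutableDictionary dictionary];
    for (NSDictionary *entry in phonemeInfo) {
        NSNumber *opcode = entry[NSSpeechPhonemeInfoOpcode];
        NSString *symbol = entry[NSSpeechPhonemeInfoSymbol];
        if (opcode != nil && symbol != nil) {
            map[opcode] = symbol;
        }
    }
    return map;
}

// Hypothetical helper: ask a synthesizer for its phoneme table.
// Returns an empty dictionary if the voice does not provide one.
static NSDictionary<NSNumber *, NSString *> *TMPhonemeSymbols(NSSpeechSynthesizer *synthesizer)
{
    NSError *error = nil;
    id info = [synthesizer objectForProperty:NSSpeechPhonemeSymbolsProperty
                                       error:&error];
    if (![info isKindOfClass:[NSArray class]]) {
        NSLog(@"No phoneme table available: %@", error);
        return @{};
    }
    return TMSymbolsByOpcode(info);
}
```

With a map like this, the delegate callback could look up symbols[@(phonemeOpcode)] and pick an image by symbol name, falling back to the closed-mouth image whenever the lookup returns nil.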

The Code

So, on to the code. Only the relevant parts are listed. The full code, including the example phoneme images, is available in the project. Click the zip icon to the right to download this example.

- (IBAction)speak:(id)sender
{
    // Start speaking whatever is in the text field
    [synthesizer startSpeakingString:[textField stringValue]];
}

- (void)speechSynthesizer:(NSSpeechSynthesizer *)sender
         willSpeakPhoneme:(short)phonemeOpcode
{
    // Set the right image for the incoming phoneme opcode.
    // Opcodes outside the image array are ignored (the previous image stays).
    if (phonemeOpcode >= 0 && phonemeOpcode < [[self phonemes] count]) {
        [imageView setImage:[self imageForPhoneme:phonemeOpcode]];
    }
}

- (void)speechSynthesizer:(NSSpeechSynthesizer *)sender
        didFinishSpeaking:(BOOL)success
{
    // Finished speaking, so silence is golden:
    // index 0 holds the closed-mouth (silence) image
    [imageView setImage:[[self phonemes] objectAtIndex:0]];
}

- (NSImage *)imageForPhoneme:(short)phonemeOpcode
{
    // The caller is responsible for checking that the opcode is in range
    return [[self phonemes] objectAtIndex:phonemeOpcode];
}

Conclusion

That's all there is to it! It's easy to see how this could be refined (assuming you have an ounce of artistic ability). Enjoy.