Researchers: Michael Frydrych, Mikko Sams, Andrej Krylov, Jari Kätsyri, Laura Laitinen, Pertti Palo
In face-to-face communication speech perception is both visual and auditory. Visible speech is particularly effective when the auditory speech is degraded because of noise, bandwidth filtering or hearing impairment. There is evidence that speech perception also improves significantly with computer-animated audio-visual speech synthesizers. Moreover, facial expressions during speech carry additional important information - they add emphasis, reveal emotions, and support the interaction during dialog. Audio-visual speech synthesizers, so-called talking heads, currently exist for at least English, Swedish, Japanese and French.
Figure 41: Our first-generation head is shown on the left. It is a relatively inflexible derivative of "Parke's model". The new head model, still lacking the eyes and the tongue, is on the right. It is much more accurate; its expressions and articulation are based on data measured from an actor's face.
We are developing a 3D talking head capable of faithfully imitating facial expressions and other gestures that accompany speech and interaction in a dialog. We call our head the "Artificial Person". The Artificial Person is the next generation of the Finnish talking head that our group has developed during previous years. We are now paying special attention to the quality of audio-visual speech, improving the coarticulation model and modelling prosodic features in both speech modalities. The initial target language is Finnish, but the modular approach taken permits the inclusion of other languages as well.
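To illustrate what a coarticulation model for visual speech involves, the sketch below blends per-phoneme articulatory targets with exponential dominance functions in the style of the well-known Cohen-Massaro model. This is not our group's actual model; the function names, parameter values, and the lip-opening targets are illustrative assumptions only.

```python
import math

def dominance(t, center, alpha=1.0, theta=4.0, power=1.0):
    """Dominance of a phoneme segment at time t (Cohen-Massaro style):
    peaks at the segment's temporal center and decays exponentially."""
    return alpha * math.exp(-theta * abs(t - center) ** power)

def blend(t, segments):
    """Dominance-weighted average of articulatory targets at time t.
    Each segment is (center_time, target_value), e.g. a lip-opening
    parameter; neighbouring phonemes pull the value toward their own
    targets, producing smooth coarticulated trajectories."""
    weights = [dominance(t, c) for c, _ in segments]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, segments)) / total

# Hypothetical lip-opening targets (arbitrary units) for three phonemes
# centered at 0.0 s, 0.15 s and 0.30 s:
segments = [(0.0, 0.2), (0.15, 0.9), (0.30, 0.3)]

# Even at the middle phoneme's center, its neighbours keep the blended
# value below the phoneme's own target of 0.9:
mid = blend(0.15, segments)
```

The point of the weighting is that an articulatory parameter never jumps between targets; each phoneme's influence rises and falls smoothly, so adjacent phonemes shape each other's realization, which is the essence of coarticulation.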
Numerous application areas benefit from high-quality visual speech synthesis, including human-computer interfaces, basic research on audio-visual speech perception, speech therapy and telecommunication. Our group will use the Artificial Person in research on speech perception and dialog systems.