Researchers: Kimmo Kaski, Janne Kulju, Jan-Mikael Lepistö, and Mikko Sams
We have started the development of visual speech synthesis, one of the main objectives of the Multimodality project launched by the Academy of Finland. This means constructing a "talking head", an animated face on a computer display that articulates Finnish and other languages as naturally as possible. We have developed a real-time parametrized 3D model which can speak text entered from the keyboard. The model is driven by about 50 parameters, 14 of which are currently used in producing the actual speech. Acoustic speech is produced by an external speech synthesizer synchronized with the articulation.
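As a rough illustration of how such a parameter-driven model can be animated, the sketch below maps a phoneme sequence to per-frame parameter vectors by interpolating between viseme targets. The parameter names, the phoneme-to-viseme table, and the interpolation scheme are all hypothetical stand-ins, not the actual parameters or control logic of the model described above.

```python
# Hypothetical sketch of driving a parametrized face model.
# The real model uses about 50 parameters (14 for speech); here we
# use three made-up ones for illustration only.
PARAMS = ("jaw_open", "lip_round", "lip_width")

# Illustrative viseme targets for a few Finnish phonemes (invented values).
VISEMES = {
    "a": (0.8, 0.1, 0.5),   # open jaw, spread lips
    "u": (0.3, 0.9, 0.2),   # rounded lips
    "m": (0.0, 0.3, 0.4),   # closed lips
}

def interpolate(start, end, steps):
    """Linearly interpolate between two parameter vectors."""
    frames = []
    for i in range(1, steps + 1):
        t = i / steps
        frames.append(tuple(s + (e - s) * t for s, e in zip(start, end)))
    return frames

def articulate(phonemes, frames_per_phoneme=5):
    """Turn a phoneme sequence into a list of per-frame parameter vectors."""
    current = VISEMES[phonemes[0]]
    trajectory = [current]
    for ph in phonemes[1:]:
        target = VISEMES[ph]
        trajectory.extend(interpolate(current, target, frames_per_phoneme))
        current = target
    return trajectory

traj = articulate(["m", "a", "u"])
```

A real system would also need coarticulation handling and timing information from the acoustic synthesizer to keep the face and the audio in sync; this sketch only shows the keyframe-interpolation idea.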
Numerous application areas will benefit from high-quality visual speech synthesis, including human-computer interfaces, basic research on audiovisual speech perception, speech therapy, and telecommunication. The model is easy to port to several hardware platforms; currently there are versions for SGI, DEC Alpha, and PC (Windows) environments. While speech quality will be the major concern in the future, we will also work on adjustable facial geometry and on dialogue capabilities made possible by speech recognition. In improving speech quality, a careful mapping of Finnish visemes - the basic units of visual speech - will be crucial.