Researchers: Jean-Luc Olivés, Janne Kulju, Riikka Möttönen, and Mikko Sams
In this project we are developing an Artificial Person (AP) that can communicate and interact with humans via natural communication channels. Over the last two years we have worked on an animated character based on audiovisual speech synthesis, which gives the AP a more tangible personality (see Figure 35). We have evaluated the intelligibility of our synthesizer and developed a suitable user interface for controlling it.
|Figure 35: Set of expressions of the Artificial Person, based on parametrized facial animation and a synchronized speech synthesizer.|
The first version of the audiovisual synthesizer combines an acoustic synthesizer (MikroPuhe 4.1 by TimeHouse Ltd) with a dynamic animated facial model. A key aspect of the development is continuous evaluation of synthesis quality, so we have created tools for running evaluation experiments. Producing appropriate stimuli for perception experiments requires a high degree of controllability, and we have therefore put extra effort into the synthesizer's user interface. The user interface has been developed in collaboration with Professor Kari-Jouko Räihä's research group at the University of Tampere.
We have also carried out an intelligibility study of the synthesizer. The test corpus consisted of 39 VCV (vowel-consonant-vowel) words presented under six conditions: natural audiovisual, synthetic audiovisual, natural audio only, synthetic audio only, natural audio with synthetic visual speech, and synthetic audio with natural visual speech. The stimuli were presented at signal-to-noise ratios (SNRs) of 0, -6, -12 and -18 dB. The subjects were 11 male and 9 female native speakers of Finnish. The overall intelligibility results are shown in Figure 36.
The facial animation improved the intelligibility of both synthetic and natural acoustic speech. The average improvement was about 15%, and it was somewhat larger at lower SNRs.
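To make the SNR conditions concrete: an SNR of -6 dB means the noise has roughly four times the power of the speech (10^0.6 ≈ 3.98). The sketch below shows one common way to mix a stimulus with noise at a target SNR by rescaling the noise according to mean power, SNR = 10·log10(P_speech / P_noise). It is illustrative only. The text above does not say what kind of noise was used or how it was mixed, so additive noise with power-based scaling is an assumption. The function names (`power`, `mix_at_snr`, `snr_db`) and the sine-tone and Gaussian placeholder signals are hypothetical and do not represent the study's stimuli.

```python
import math
import random


def power(x):
    """Mean power (mean squared amplitude) of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_speech / P_noise)."""
    return 10 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Add noise to speech after rescaling the noise so that the
    speech-to-noise power ratio equals target_snr_db.

    Assumption (not stated in the source): plain additive noise,
    scaled by mean power over the whole stimulus.
    Returns (mixed_signal, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Noise power required for the target SNR.
    target_noise_power = power(speech) / (10 ** (target_snr_db / 10))
    # Amplitude gain that brings the noise to that power.
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Hypothetical placeholder signals: a 440 Hz tone and Gaussian noise,
# one second each at a 16 kHz sampling rate.
random.seed(0)
fs = 16000
speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

# The four SNR levels listed in the study.
for target in (0, -6, -12, -18):
    mixed, scaled = mix_at_snr(speech, noise, target)
    print(target, snr_db(speech, scaled))
```

Scaling the noise rather than the speech keeps the speech level constant across conditions, so only the masker level changes between SNR steps. This is a common design choice, but the source does not say whether the study used it.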
Our future objectives are to improve the quality of the synthesis and to use the synthesizer as a stimulus generator for speech perception experiments. We are also developing applications for the synthesizer, e.g., for teaching lip-reading.