Artificial Person

Researchers: Martin Dobsik, Michael Frydrych, Andrej Krylov, Jari Kätsyri, Pertti Palo, Mikko Sams

In social interaction, speech is both heard and seen. Visible articulatory movements significantly improve speech perception, especially when the acoustic signal is degraded, for example by hearing impairment or environmental noise. Facial expressions are an important dimension of face-to-face communication: they may accentuate or emphasize spoken information, convey additional information, or regulate conversation between several speakers. Facial expressions also reveal information about a speaker's emotional state and intentions, which may even contradict what the speech itself conveys.

Figure 37: Artificial Person, a new generation of Finnish-speaking talking head.

We have developed computer-animated 3-D audiovisual speech synthesizers, or talking heads, which can produce visual speech as well as facial expressions (see Fig. 37). Talking heads enhance speech perception and have also been utilized in facial expression research. We have been developing Finnish-speaking talking heads; the present version is called the "Artificial Person". We have paid special attention to improving the quality of audiovisual speech, and an improved coarticulation model has been developed. We have also improved the expressions of the new talking head: a facial expression model has been created that is roughly based on human facial muscle anatomy (see Fig. 38). The Artificial Person can express six basic emotions (anger, disgust, fear, happiness, sadness and surprise) and their combinations. This work will be continued by modelling expressions based on expression data collected from real people.
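The coarticulation model itself is not detailed here. A formulation often used in audiovisual speech synthesis is dominance-function blending (in the style of Cohen and Massaro), where each speech segment's articulatory target influences nearby frames with a weight that peaks at the segment's center and decays over time. The sketch below is purely illustrative: the function names, the exponential dominance shape, and the example segment track are assumptions, not the Artificial Person's actual implementation.

```python
import math

def dominance(t, center, magnitude=1.0, rate=4.0):
    """Illustrative exponential dominance function: a segment's
    influence peaks at its temporal center and decays symmetrically."""
    return magnitude * math.exp(-rate * abs(t - center))

def blend_targets(t, segments):
    """Dominance-weighted average of per-segment articulatory targets
    (e.g. lip opening) at time t; segments are (center, target) pairs."""
    num = den = 0.0
    for center, target in segments:
        d = dominance(t, center)
        num += d * target
        den += d
    return num / den

# Hypothetical segment track: (center time in seconds, lip-opening 0..1).
segments = [(0.00, 0.2), (0.15, 0.9), (0.30, 0.4)]
# At the middle segment's center the trajectory is pulled toward 0.9
# but remains influenced by the neighbouring segments.
opening = blend_targets(0.15, segments)
```

Because every segment contributes at every instant, the resulting trajectory never jumps between targets: neighbouring segments smooth each mouth shape, which is the essence of coarticulation.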

Figure 38: Facial expression model based on human facial muscle anatomy.
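A muscle-based expression model of this kind can represent each basic emotion as a vector of muscle activations and form combinations by weighted blending. The sketch below is a minimal illustration under assumed names: the muscle labels and activation values are invented for the example, and the clamped additive blend is only one plausible combination rule, not the model's documented behaviour.

```python
# Hypothetical activation vectors for two basic emotions.
# Keys name simplified facial muscles; values are activations in [0, 1].
HAPPINESS = {"zygomaticus_major": 0.9, "orbicularis_oculi": 0.5}
SURPRISE = {"frontalis": 0.8, "levator_palpebrae": 0.7, "jaw_drop": 0.6}

def blend(weighted_expressions):
    """Combine weighted basic expressions into a single activation
    vector, clamping each muscle's total activation to [0, 1]."""
    out = {}
    for expression, weight in weighted_expressions:
        for muscle, activation in expression.items():
            out[muscle] = min(1.0, out.get(muscle, 0.0) + weight * activation)
    return out

# A blended "happy surprise": 60 % happiness plus 70 % surprise.
combined = blend([(HAPPINESS, 0.6), (SURPRISE, 0.7)])
```

Driving the animation through muscle activations rather than raw vertex positions keeps combined expressions anatomically plausible: muscles shared by two emotions simply sum (up to saturation) instead of producing conflicting vertex targets.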

Figure 39: Texture mapping and head topology morphing of the Artificial Person. The upper left picture shows the non-morphed topology of the model. The lower left picture shows the original front and side views of the photographs used in modifying the model. The upper middle picture shows the new morphed head topology extracted from the photographs. The lower middle picture shows the original photographs merged together. The picture on the right shows the final result, with the facial texture added to the morphed head topology. The head is rotated slightly to the left.

The realism of the Artificial Person is markedly better than that of our previous talking head. The head topology is much more detailed and also includes improved eyes and teeth. An important new feature is the possibility to texture-map photographs of real persons onto the Artificial Person: its three-dimensional head shape can be morphed to that of a real person (see Fig. 39). An animated three-dimensional talking head capable of producing high-quality audiovisual text-to-speech synthesis and facial expressions may benefit various application areas, from human-computer interfaces to telecommunication. Talking heads have also been applied in basic research on audiovisual speech and facial expression perception. In both areas, talking heads can be used to generate fully controllable research stimuli that contain none of the superfluous events which are difficult to avoid with natural stimuli. Especially in facial expression research, stimuli with different head orientations and gaze directions are easily created with a talking head. Our group will use the Artificial Person mostly in studies concentrating on facial expression research.
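At its simplest, morphing a generic head topology toward a real person's shape can be viewed as moving each vertex of the generic mesh a fraction of the way toward the corresponding vertex recovered from the front and side photographs. The sketch below shows only this linear interpolation idea; the function name, the toy vertex data, and the assumption of a one-to-one vertex correspondence are illustrative, and the actual pipeline (feature extraction from photographs, texture-coordinate assignment) is not reproduced here.

```python
def morph(generic, target, alpha):
    """Linearly interpolate each generic vertex toward the target shape.
    Vertices are (x, y, z) tuples in corresponding order; alpha = 0
    keeps the generic head, alpha = 1 matches the target shape."""
    return [
        tuple(g + alpha * (t - g) for g, t in zip(gv, tv))
        for gv, tv in zip(generic, target)
    ]

# Toy two-vertex "meshes" standing in for the full head topologies.
generic = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.5, 0.0), (2.0, 0.0, 0.5)]
halfway = morph(generic, target, 0.5)
```

Because the morph preserves the generic mesh's topology (only vertex positions change), the same animation machinery, coarticulation targets, and muscle model continue to work on the personalized head, and the photograph texture can then be mapped onto the morphed surface.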