Researchers: Martin Dobšík, Michael Frydrych, Andrej Krylov, Jari Kätsyri, Pertti Palo, Mikko Sams
In social interaction, speech is both heard and seen. Visible articulatory movements significantly improve speech perception, especially when the acoustic speech is degraded by, for example, hearing impairment or environmental noise. There is evidence that speech perception also improves significantly with computer-animated audiovisual speech synthesizers, so-called talking heads. Facial expressions are an important dimension of face-to-face communication. They may accentuate spoken information, convey additional information, or regulate conversation between several speakers. Non-verbal body language, which includes facial expressions, has been claimed to make up as much as 65% of human communication. Understanding how people process, recognize, and interpret each other's faces and facial motion is a challenging task that has attracted hundreds of scientists in the social science, computer vision, and psychology communities.
Figure 49: Artificial Person. Expressions from left: neutral, sad and surprised.
We have developed a toolkit for real-time animation of a Finnish-speaking 3D talking head, the "Artificial Person". We have paid special attention to improving the quality of audiovisual speech. Synchronized auditory and visual speech is produced automatically from input text, which can be enriched with user-definable commands that trigger specific gestures, for example facial expressions (Fig. 49). The Artificial Person is able to express six basic emotions (anger, disgust, fear, happiness, sadness and surprise) and their combinations.
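How combined expressions are computed is not described here. One common approach in facial animation is to blend the parameter displacements of the basic expressions linearly from the neutral face. The following Python sketch illustrates that idea only; the parameter names, their values, and the function `blend_expressions` are hypothetical and are not taken from the toolkit.

```python
from typing import Dict

# Hypothetical facial parameters and values, for illustration only.
NEUTRAL: Dict[str, float] = {
    "brow_raise": 0.0,
    "mouth_corner": 0.0,
    "jaw_open": 0.0,
}

# Hypothetical target poses for two of the six basic expressions.
BASIC_EXPRESSIONS: Dict[str, Dict[str, float]] = {
    "sadness": {"brow_raise": 0.3, "mouth_corner": -0.6, "jaw_open": 0.0},
    "surprise": {"brow_raise": 0.9, "mouth_corner": 0.0, "jaw_open": 0.7},
}


def blend_expressions(weights: Dict[str, float]) -> Dict[str, float]:
    """Blend basic expressions as weighted displacements from the neutral face.

    Weights must be non-negative. If they sum to more than 1, they are
    rescaled to sum to 1 so the result stays within the range spanned by
    the target poses.
    """
    unknown = set(weights) - set(BASIC_EXPRESSIONS)
    if unknown:
        raise ValueError(f"unknown expressions: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")

    total = sum(weights.values())
    scale = 1.0 / total if total > 1.0 else 1.0

    face = dict(NEUTRAL)
    for name, weight in weights.items():
        target = BASIC_EXPRESSIONS[name]
        for param in face:
            # Add this expression's share of its displacement from neutral.
            face[param] += weight * scale * (target[param] - NEUTRAL[param])
    return face


if __name__ == "__main__":
    # An equal mix of sadness and surprise.
    print(blend_expressions({"sadness": 0.5, "surprise": 0.5}))
```

In this sketch, an empty weight dictionary yields the neutral face, and a single expression with weight 1 reproduces its target pose.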
Figure 50: Identification percentages for expressions of actors in different datasets.
We have started to create a digital database of emotion-related facial movements, the first of its kind in Finland. At the moment, the database contains static pictures and short video sequences of the six basic expressions performed by two actors and the Artificial Person. Both actors are certified FACS (Facial Action Coding System) coders. FACS is an objective and comprehensive system for recognizing, describing and coding facial expressions. The database has been evaluated, and preliminary results indicate that the expressions of the Artificial Person, except fear, were identified as expected (Fig. 50).