Neurocognitive mechanisms of multisensory perception

Researchers: Tobias Andersen, Toni Auranen, Iiro Jääskeläinen, Vasily Klucharev, Riikka Möttönen, Ville Ojanen, Johanna Pekkola, Mikko Sams, Kaisa Tiippana

Effect of preceding audiovisual context on auditory perception

We studied the representations underlying audiovisual integration using a priming paradigm. Audiovisual primes, preceding auditory targets, were either incongruent (auditory /ba/ & visual /va/) or congruent (auditory /va/ & visual /va/, auditory /ba/ & visual /ba/). The targets were /ba/ or /va/. The intensity of the prime’s auditory component was either 50 dB or 60 dB. Identification speed of the target /ba/ was strongly affected by the nature of the prime. The effect of the incongruent audiovisual prime depended on the intensity of its acoustic component. Our results can be explained by assuming that some properties of the visual representation were mapped into the auditory representation.

Processing of sine-wave speech in the human brain

According to previous neuroimaging studies, neural mechanisms for speech perception are localized in the left posterior temporal cortex. However, since speech sounds are acoustically different from other sounds, it is possible that the assumed speech-specific activity reflects sensitivity to the complex acoustic structure of speech sounds. "Sine-wave speech" (SWS) provides a tool to study neural speech-specificity using identical acoustic stimuli that can be perceived as speech or non-speech, depending on previous experience with the stimuli. We scanned subjects using 3T functional MRI in two sessions, each including SWS and control stimuli, with an intervening period of speech training. In the pre-training session, subjects perceived the SWS stimuli as non-speech, and in the post-training session, the identical stimuli were perceived as speech. Activity elicited by SWS stimuli was significantly greater in the post- vs. pre-training session within the left posterior superior temporal sulcus (STS) (see Figure 33). Importantly, activity elicited by the control stimuli, which were always perceived as non-speech, did not change over the course of the experiment (see Figure 33). We conclude that the left posterior STS subserves neural processing specific to speech perception. This study was done in collaboration with the FMRIB Centre at the University of Oxford.

Figure 33: Speech-specific activation in the left posterior STS. The left side of the figure shows the region that was activated more in the post- than in the pre-training session for the SWS stimuli. The analysis was carried out within a left superior temporal ROI (shown in blue). Statistical images were thresholded using clusters determined by Z > 2.3 and a cluster significance threshold of P < 0.05, corrected for multiple comparisons across the ROI. The right side of the figure depicts the mean (SEM) BOLD signal changes in the left posterior STS for all stimulus types in the pre- and post-speech training sessions (n = 16). Statistical significance levels are indicated. Modified from Möttönen et al. (submitted).
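The cluster-forming threshold quoted in the caption can be translated into an uncorrected voxel-wise probability. The following is a minimal sketch, assuming a one-tailed test on a standard normal statistic. The function name `z_to_one_tailed_p` is ours and is not part of the analysis software used in the study. The sketch does not reproduce the cluster-level correction (P < 0.05), which depends on image smoothness and on the size of the ROI.

```python
import math


def z_to_one_tailed_p(z: float) -> float:
    """Upper-tail probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Cluster-forming threshold quoted in the Figure 33 caption.
p_voxel = z_to_one_tailed_p(2.3)
print(f"Z > 2.3 corresponds to uncorrected one-tailed p < {p_voxel:.4f}")
```

Under these assumptions, Z > 2.3 is roughly equivalent to an uncorrected voxel-wise p of about 0.011, which is why the additional cluster-level correction is needed before activations are reported.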

Perception of matching and conflicting audiovisual speech in dyslexic and fluent readers as assessed using functional MRI

As a clinical research application of our audiovisual speech research, we presented phonetically matching and conflicting audiovisual vowels to ten dyslexic and ten fluent-reading young adults during "clustered volume acquisition" functional magnetic resonance imaging (fMRI) at 3 Tesla. We further assessed co-variation between the dyslexic readers’ phonological processing abilities, as indexed by neuropsychological test scores, and BOLD signal changes within visual cortex, auditory cortex, and Broca’s area. Both dyslexic and fluent readers showed increased activation during observation of phonetically conflicting compared to matching vowels within the classical motor speech regions (Broca’s area and the left premotor cortex), this activation difference being more extensive and bilateral in the dyslexic group. The between-groups activation difference in the conflicting > matching contrast reached significance in the motor speech regions and in the left inferior parietal lobule, with dyslexic readers exhibiting stronger activation than fluent readers. The dyslexic readers’ BOLD signal change co-varied with their phonological processing abilities within the visual cortex and Broca’s area, and to a lesser extent within the auditory cortex. We suggest that the findings reflect dyslexic readers’ greater use of motor-articulatory and visual strategies during phonetic processing of audiovisual speech, possibly to compensate for their difficulties in auditory speech perception.

Modulation of auditory cortex activation by sound presentation rate and attention

We studied the effects of sound presentation rate and attention on supratemporal cortex (STC) activation with 3-Tesla functional magnetic resonance imaging (fMRI) in 12 healthy adults. The sounds were presented at steady rates of 0.5, 1, 1.7, 2, or 4 Hz while subjects either focused their attention on the sounds or ignored the sounds and attended to visual stimuli presented at a mean rate of 1 Hz in all conditions. Consistent with previous results obtained in separate studies, we found that both an increase in the stimulation rate and attention to sounds enhanced activity in bilateral STC. Further, we observed larger attention effects with higher stimulation rates. Our results separate the rate-dependent and attention-related modulation of STC activation and indicate that both factors should be controlled in fMRI studies on auditory processing.
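The design described above crosses five presentation rates with two attention conditions. The following is a minimal sketch of how such a design and the reported rate-by-attention pattern could be expressed. It is not the authors' analysis: the condition labels, the function names, and the use of a simple least-squares slope are our assumptions, and no measured values are included. A positive slope of the attention effect over rate would correspond to larger attention effects at higher stimulation rates.

```python
from itertools import product

RATES_HZ = (0.5, 1.0, 1.7, 2.0, 4.0)            # presentation rates given in the text
ATTENTION = ("attend_sounds", "ignore_sounds")  # hypothetical condition labels


def design_cells():
    """All rate x attention combinations of the factorial design."""
    return list(product(RATES_HZ, ATTENTION))


def attention_effect_by_rate(activation):
    """Attend-minus-ignore difference per rate.

    activation maps (rate_hz, attention_label) to a mean STC signal value.
    """
    return {
        rate: activation[(rate, "attend_sounds")] - activation[(rate, "ignore_sounds")]
        for rate in RATES_HZ
    }


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def attention_by_rate_slope(activation):
    """Change in the attention effect per 1 Hz increase in presentation rate."""
    effects = attention_effect_by_rate(activation)
    return least_squares_slope(list(RATES_HZ), [effects[r] for r in RATES_HZ])


print(len(design_cells()), "design cells")
```

Keeping the rate factor and the attention factor as separate dimensions mirrors the point made in the text: the two sources of modulation can only be told apart when both are varied within the same experiment.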

Processing of audiovisual speech in Broca’s area

We investigated neural mechanisms underlying processing of audiovisual phonetic information in humans using functional magnetic resonance imaging (fMRI) (see Figure 34). Ten healthy volunteers were scanned with a "clustered volume acquisition" paradigm at 3T during presentation of phonetically congruent and incongruent audiovisual vowels /a/, /o/, /i/ and /y/. Comparing activations to congruent and incongruent audiovisual vowels enabled us to specifically map the cerebral areas participating in audiovisual speech processing at the phonetic level. Phonetic incongruency (e.g., visual /a/ and auditory /y/), as compared with congruency (e.g., visual and auditory /y/), significantly activated Broca’s area, the prefrontal cortex and the superior parietal lobule in the left hemisphere. In contrast, we found no enhanced activity to phonetically congruent stimulation in comparison to incongruent stimulation. Our results highlight the role of Broca’s area in the processing of audiovisual speech and suggest that it might provide a common representational space for auditory and visual speech.

Figure 34: Across-subjects (N=10) z-statistic maps overlaid on an anatomical template. Congruent audiovisual speech activated the auditory and the visual cortical areas, as well as the inferior frontal, the premotor and the visual-parietal areas bilaterally (upper panel). Incongruent audiovisual speech caused a similar but more extensive pattern of brain activity (middle panel). The difference reached significance in three left hemisphere areas: Broca’s area (BA44/45), superior parietal lobule (BA7) and prefrontal cortex (BA10) (lower panel). In the contrast "Congruent > Incongruent" no statistically significant voxels were detected. Activation images were thresholded using clusters determined by voxel-wise Z>3.0 and a cluster significance threshold of p<0.05, corrected for multiple comparisons. (Ojanen et al., in press, NeuroImage)

Auditory and visual speech perception activate the speech motor regions

We investigated the neural basis of auditory and visual speech processing using a "clustered volume acquisition" functional magnetic resonance imaging (fMRI) pulse sequence at 3T (see Figure 35). Areas activated in common by auditory and visual vowels were observed in the left insula, Broca’s area, the lateral premotor cortex, and the inferior parietal area, as well as in the right superior temporal gyrus/sulcus. Significantly stronger activation for visual than auditory speech was observed in the left motor and sensory areas, inferior parietal lobule, posterior cingulate gyrus and visual sensory-specific areas. Significantly stronger activation for auditory speech, in turn, was observed in the left lingual gyrus, the left insula, the anterior cingulate bilaterally and auditory sensory-specific areas. Our results suggest that the speech motor areas provide a common representational space for auditory and visual speech.

Figure 35: Speech motor regions, including Broca’s area and the premotor cortex, are activated in common during auditory and visual speech perception.

Effects of lip-reading in the auditory cortex

The workings of the auditory cortex are generally less well understood than, for example, those of the visual cortex. Only recently has evidence emerged of active information processing and possible multisensory engagement in the auditory areas. For example, lip-reading is known to activate secondary auditory areas, and, in deaf people, even simple visual stimuli (like moving dots) have been shown to activate "auditory" temporal lobe areas.

Using functional magnetic resonance imaging (fMRI), we studied which areas of the auditory cortex are activated by silent lip-reading, focusing especially on the primary auditory cortex (see Figure 36). During fMRI scanning, the subjects were intermittently shown either a face silently uttering vowels or a still image of the same face.

Figure 36: Example of three subjects showing activation in the primary auditory cortex by visual speech. The yellow line outlines the brain area accommodating the primary auditory cortex, and the loci of statistically significant activations are marked with red. (Pekkola et al. in press NeuroReport)

We found secondary auditory cortex activation by visual speech in all subjects and primary auditory cortex activation in seven out of ten subjects. This suggests that the primary auditory cortex could actually receive visual input, or that its function may be modulated by attentional mechanisms (whereby visual speech cues would "sensitize" the auditory cortex to listening).

In a related study, we used 306-channel magnetoencephalography (MEG) in 8 healthy volunteers to test whether seeing speech modulates the responsiveness of auditory-cortex neurons tuned to phonetic stimuli. Specifically, we hypothesized that seeing a visual articulation causes adaptation of auditory cortex MEG responses to a subsequently presented phonetic sound. Auditory "test" stimuli (Finnish vowels /ä/ and /ö/) were preceded (500-ms lag) by auditory (/ä/, /ö/, and the F2-midpoint between /ä/ and /ö/) or visual articulatory (/ä/ and /ö/) "adaptor" stimuli. As a separate control, the auditory /ä/ and /ö/ stimuli were presented without the adaptors. The subjects’ task was to behaviorally discriminate between the /ä/ and /ö/ test stimuli. The amplitude of the left-hemisphere N1m response to test stimuli was significantly suppressed with auditory (P<0.001) and visual (P<0.05) adaptors, this effect being significantly greater with the auditory adaptors (P<0.01) (see Figure 37). These findings suggest that seeing the articulatory gestures of a speaker influences auditory speech perception by modulating the responsiveness of auditory cortex feature-detector neurons tuned to phonetic sound features. This may relate to recent animal studies suggesting that tuning properties of auditory cortex neurons are modulated by the attentional/motivational state of the organism. The fact that adaptation was significantly greater when auditory as compared to visual adaptors preceded the test stimuli can be explained by additional adaptation to acoustic stimulus features.
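The adaptor paradigm above can be summarized as a set of adaptor-test pairings plus a measure of how much each adaptor type reduces the N1m amplitude relative to the no-adaptor control. The following is a minimal sketch under stated assumptions: that every adaptor was paired with both test vowels (the text does not say so explicitly), and that suppression is expressed as a percentage of the control amplitude (our choice of metric, not necessarily the one used in the study). All names are hypothetical and no measured amplitudes are included.

```python
AUDITORY_ADAPTORS = ("ä", "ö", "F2-midpoint")  # auditory adaptors listed in the text
VISUAL_ADAPTORS = ("ä", "ö")                   # silently articulated vowels
TEST_VOWELS = ("ä", "ö")                       # auditory test stimuli
ADAPTOR_TEST_LAG_MS = 500                      # adaptor-to-test lag from the text


def conditions():
    """Adaptor-test pairings, assuming a fully crossed design (our assumption)."""
    cells = [("none", None, test) for test in TEST_VOWELS]
    cells += [("auditory", a, test) for a in AUDITORY_ADAPTORS for test in TEST_VOWELS]
    cells += [("visual", a, test) for a in VISUAL_ADAPTORS for test in TEST_VOWELS]
    return cells


def suppression_percent(control_amplitude, adapted_amplitude):
    """Percent reduction of the response relative to the no-adaptor control."""
    if control_amplitude == 0:
        raise ValueError("control amplitude must be non-zero")
    return 100.0 * (control_amplitude - adapted_amplitude) / control_amplitude


print(len(conditions()), "adaptor-test pairings")
```

Expressing suppression relative to the no-adaptor control makes the auditory and visual adaptor effects directly comparable, which is the comparison the paragraph reports.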

We also investigated integration of audiovisual speech (i.e., speech sounds and seen articulatory gestures) and non-speech (i.e., simple tones and seen expanding ellipsoids) objects in the human auditory cortex using EEG. We found that the auditory N100 response was suppressed when both the acoustic and the visual component of an object were speech. However, when either the acoustic or the visual component (or both) was non-speech, the N100 response was not suppressed. The results suggest that the human auditory cortex is involved in integration of speech-specific features of audiovisual speech objects.

Figure 37: The effects of auditory and visual adaptor stimuli on auditory cortex N100m responses to subsequently presented auditory phonemes. Auditory phonemes preceding the target phonemes caused a significant decrease in response amplitudes. Visual phonemes (articulations) presented before the auditory phonetic stimuli also caused significant suppression of the auditory responses, although significantly less than the auditory adaptors did. (Jääskeläinen et al., NeuroReport 2004)