Abstract
Viewing articulatory lip movements can improve the detection and comprehension of congruent auditory speech. However, the specific audiovisual correspondences that underlie these crossmodal facilitation effects are largely unknown. We hypothesized that the perceptual system may exploit reliable natural relationships between articulatory lip motion and speech acoustics. In particular, an analysis of four speakers (two female) revealed a strong correlation (all R² > .9) between the horizontal width of the oral aperture and the frequency of the second formant (F2) as the speakers expanded and contracted their lips to pronounce the syllables /wi/ and /yu/ (APA transcription). To test whether this dynamic relationship between mouth aspect ratio and second-formant frequency underlies crossmodal facilitation of speech perception, we constructed artificial stimuli that reproduced the audiovisual relationship. Visual stimuli were dark ellipses on a light background whose width expanded or contracted over 350 ms, approximating the sigmoidal change in mouth width when people pronounce /wi/ and /yu/. We verified that these ellipses were not perceived as mouths. Auditory stimuli were 100-Hz-wide bands of bandpass-filtered white noise whose center frequency rose or fell between 500 and 3000 Hz, approximating the second-formant frequency change. Using a bias-free two-interval forced-choice (2IFC) auditory-detection task in noise with interleaved QUEST staircases, we estimated 18 participants' detection thresholds for the frequency sweeps while they viewed ellipses with correlated horizontal motion, anti-correlated horizontal motion, or no motion. To rule out a more general, non-speech-specific mechanism, we also included conditions in which the ellipses expanded or contracted vertically. As predicted, only correlated horizontal motion significantly decreased auditory detection thresholds relative to the static control; anti-correlated and vertical motion produced no reliable changes. Together, these results suggest that the perceptual system exploits natural relationships between articulatory lip motion and vocal acoustics to facilitate speech perception.
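As a rough illustration of the stimulus construction described above, the Python sketch below (not the authors' code) drives both the ellipse width and the center frequency of a narrow noise band from a single sigmoidal trajectory, so that a 500-to-3000 Hz sweep accompanies horizontal expansion over 350 ms. The sample rate, sigmoid steepness, ellipse dimensions, and the heterodyne method used to approximate the 100-Hz-wide noise band are assumptions, not details reported in the abstract.

import numpy as np
from scipy.signal import butter, sosfiltfilt

# Hypothetical parameters (not specified in the abstract): sample rate,
# sigmoid steepness, and ellipse start/end widths.
fs = 44100                        # audio sample rate (Hz)
dur = 0.350                       # stimulus duration from the abstract (350 ms)
t = np.arange(int(fs * dur)) / fs

# Sigmoidal trajectory from 0 to 1, mirroring the sigmoid-like change in
# mouth width; the steepness constant (12) is an assumption.
traj = 1.0 / (1.0 + np.exp(-12.0 * (t / dur - 0.5)))

# Visual stimulus: horizontal ellipse width grows while height stays fixed
# (reverse traj for the contracting version; swap axes for vertical motion).
ellipse_width = 1.0 + 1.0 * traj          # arbitrary units, assumed endpoints
ellipse_height = np.ones_like(ellipse_width)

# Auditory stimulus: noise band whose center frequency rises 500 -> 3000 Hz.
f_center = 500.0 + (3000.0 - 500.0) * traj
phase = 2.0 * np.pi * np.cumsum(f_center) / fs   # integrate frequency -> phase

# Approximate a ~100-Hz-wide band by modulating ~50-Hz lowpass noise onto the
# sweeping carrier (double-sideband, so roughly 100 Hz total bandwidth).
sos = butter(4, 50.0 / (fs / 2), output="sos")
envelope_noise = sosfiltfilt(sos, np.random.randn(t.size))
sweep = envelope_noise * np.cos(phase)
sweep /= np.max(np.abs(sweep))                   # normalize before embedding in masking noise

Reversing the trajectory yields the falling sweep and the contracting ellipse; in the experiment the sweeps would additionally be embedded in continuous masking noise at levels governed by the interleaved QUEST staircases.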
Meeting abstract presented at VSS 2016