Abstract
Audiovisual integration (AVI) is known to be an important mechanism in speech perception, but evidence for AVI in person recognition has been less clear. We demonstrate AVI in the recognition of familiar (but not unfamiliar) speakers. Specifically, systematic behavioural benefits in recognising a familiar voice occur when it is combined with an articulating face of corresponding speaker identity, whereas costs occur when voices are combined with precisely time-synchronised but noncorresponding faces. In a series of behavioural experiments, we show that AVI in person recognition (a) depends on familiarity with a speaker, (b) is sensitive to the temporal synchronisation of facial and vocal articulations (with an integration time window that is qualitatively similar to, but quantitatively more extended than, that of AVI in speech perception), and (c) occurs in both voice recognition and face recognition tasks. Subsequent experiments with event-related brain potentials (ERPs) compared unimodal conditions (face only, voice only) with audiovisual conditions (AV corresponding, AV noncorresponding). The results suggest that audiovisual speaker identity correspondence influences ERPs only beyond 250 ms, with increased right frontotemporal negativity for noncorresponding identities. In two separate experiments, these findings were demonstrated both for face-voice combinations using sentence stimuli and for McGurk-type syllable utterances. Remarkably, when compared with the summed responses from the two unimodal conditions, both audiovisual conditions elicited a much earlier onset of frontocentral negativity, with maximal differences around 50-80 ms. Overall, the electrophysiological data demonstrate AVI at multiple levels of neuronal processing. Although the perception of a voice together with a time-synchronised articulating face triggers surprisingly early and mandatory mechanisms of audiovisual processing, audiovisual speaker identity may be computed only about 200 ms later.
Meeting abstract presented at VSS 2012