August 2012
Volume 12, Issue 9
Vision Sciences Society Annual Meeting Abstract  |   August 2012
Face-voice integration in person recognition
Author Affiliations
  • Stefan R. Schweinberger
    DFG Research Unit Person Perception, Friedrich Schiller University of Jena
  • Nadine Kloth
    School of Psychology, University of Western Australia
  • David M.C. Robertson
    DFG Research Unit Person Perception, Friedrich Schiller University of Jena
Journal of Vision August 2012, Vol.12, 1183. doi:
      Stefan R. Schweinberger, Nadine Kloth, David M.C. Robertson; Face-voice integration in person recognition. Journal of Vision 2012;12(9):1183.

      © ARVO (1962-2015); The Authors (2016-present)

Audiovisual integration (AVI) is known to be an important mechanism in speech perception, but evidence for AVI in person recognition has been less clear. We demonstrate AVI in the recognition of familiar (but not unfamiliar) speakers. Specifically, systematic behavioural benefits in recognising a familiar voice occur when it is combined with an articulating face of corresponding speaker identity, whereas costs occur for voices combined with precisely time-synchronised noncorresponding faces. In a series of behavioural experiments, we show that AVI in person recognition (a) depends on familiarity with a speaker, (b) is sensitive to temporal synchronisation of facial and vocal articulations (with a qualitatively similar but quantitatively extended integration time window compared to AVI in speech perception), and (c) occurs in both voice and face recognition tasks. Subsequent experiments with event-related brain potentials (ERPs) compared unimodal (face only, voice only) conditions with audiovisual conditions (AV corresponding, AV noncorresponding). The results suggest that audiovisual speaker identity correspondence influences only later ERPs, beyond 250 ms, with increased right frontotemporal negativity for noncorresponding identities. In two separate experiments, these findings were demonstrated both for face-voice combinations using sentence stimuli and for McGurk-type syllable utterances. Remarkably, when compared with the summed responses from both unimodal conditions, both audiovisual conditions elicited a much earlier onset of frontocentral negativity, with maximal differences around 50-80 ms. Overall, the electrophysiological data demonstrate AVI at multiple levels of neuronal processing. Although the perception of a voice and a time-synchronised articulating face triggers surprisingly early and mandatory mechanisms of audiovisual processing, audiovisual speaker identity may only be computed about 200 ms later.

Meeting abstract presented at VSS 2012

