Vision Sciences Society Annual Meeting Abstract  |  September 2024
Open Access
Dynamic Synthetic Faces Improve the Intelligibility of Noisy Speech, But Not As Much As Real Faces
Author Affiliations & Notes
  • Yingjia Yu
    University of Pennsylvania
  • Anastasia Lado
    University of Pennsylvania
  • Yue Zhang
    Baylor College of Medicine
  • John Magnotti
    University of Pennsylvania
  • Michael S. Beauchamp
    University of Pennsylvania
  • Footnotes
    Acknowledgements  The research was funded by NIH NS065395 and NS113339.
Journal of Vision, September 2024, Vol. 24, 1159. https://doi.org/10.1167/jov.24.10.1159
Abstract

Seeing the face of a talker aids speech perception, especially for noisy speech. Advances in computer graphics have made encounters with synthetic faces more frequent, but little is known about their perceptual properties. We examined the benefit for noisy speech perception of two types of synthetic faces: one that used the facial action coding system (FACS) to simulate the musculature underlying jaw and lip movements during speech production, and one generated with a deep neural network (DNN). Audiovisual recordings of 64 single words were combined with pink noise at a signal-to-noise ratio of -12 dB. The words were presented in four formats: noisy auditory-only (An), noisy audiovisual with a real face (AnV:Real), and noisy audiovisual with a synthetic face (AnV:FACS or AnV:DNN). Sixty participants recruited from Amazon Mechanical Turk attempted to identify each word. Within participants, each word was presented in only a single format, and counterbalancing across participants ensured that every word was presented in every format. Seeing the real talker's face improved the intelligibility of noisy auditory words (accuracy of 59% for AnV:Real vs. 10% for An). Synthetic faces also improved intelligibility, but by a smaller amount (accuracy of 29% for AnV:FACS and 30% for AnV:DNN vs. 10% for An). A mixed-effects model showed that real faces provided more benefit than synthetic faces (p < 10⁻¹⁶), but there was no difference between synthetic face types (t = 0.2, p = 0.99). The accuracy difference between real and synthetic faces was more pronounced for some speech tokens than others, and was largest for /th/ and /f/ tokens. These data show that synthetic faces may provide a useful experimental tool for studying audiovisual integration during speech perception and suggest ways to improve the verisimilitude of synthetic faces.
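For context on the stimulus construction, the sketch below shows one way to combine a speech waveform with pink noise at a -12 dB signal-to-noise ratio. It is a minimal illustration under stated assumptions, not the authors' pipeline: the frequency-domain pink-noise generator, the function names, and the stand-in tone are all illustrative choices, since the abstract does not specify the implementation.

```python
import numpy as np

def pink_noise(n_samples, rng=None):
    """Approximate pink (1/f) noise by shaping white noise in the frequency domain."""
    rng = np.random.default_rng(rng)
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]          # avoid dividing by zero at the DC bin
    spectrum /= np.sqrt(freqs)   # 1/f power spectrum -> 1/sqrt(f) amplitude
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / np.std(noise)

def mix_at_snr(speech, noise, snr_db=-12.0):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_speech / P_noise), so solve for the target noise power
    p_target = p_speech / (10 ** (snr_db / 10.0))
    return speech + noise * np.sqrt(p_target / p_noise)

# Example: embed a 1 s, 440 Hz tone (a stand-in for a recorded word) in pink noise
fs = 16000
t = np.arange(fs) / fs
speech = 0.1 * np.sin(2 * np.pi * 440 * t)
noisy = mix_at_snr(speech, pink_noise(fs), snr_db=-12.0)
```

At -12 dB the noise power is roughly sixteen times the speech power, which is consistent with the low auditory-only accuracy (10%) reported above.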
