September 2024 | Volume 24, Issue 10 | Open Access
Vision Sciences Society Annual Meeting Abstract
Efficient Inverse Graphics with Differentiable Generative Models Explains Trial-level Face Discriminations and Robustness of Face Perception to Unusual Viewing Angles
Author Affiliations
  • Hakan Yilmaz
    Yale University
  • Matthew Muellner
    Yale University
  • Joshua B. Tenenbaum
    Massachusetts Institute of Technology
  • Katharina Dobs
    Justus-Liebig University Giessen
  • Ilker Yildirim
    Yale University
Journal of Vision September 2024, Vol. 24, 1354. https://doi.org/10.1167/jov.24.10.1354
Abstract

At a glance, we not only recognize the category or identity of objects but also perceive their rich three-dimensional (3D) structure. Critically, this richness of perception is not brittle: our percepts may degrade under unusual viewing conditions, but they do so gracefully, remaining far above chance even when the best computer vision systems fail. What renders human perception so distinct, with efficiently inferred, rich representations that are nevertheless robust? Here, we present a new computational architecture of visual perception, Efficient Differentiable Inverse Graphics (EDIG), that integrates discriminative and generative computations to achieve fast and robust inference of rich 3D scenes. In a bottom-up pass, EDIG uses a discriminatively trained deep neural network (DNN) to initialize a percept by mapping an observed real-world image to its underlying 3D scene. Crucially, EDIG can further refine this initial estimate via iterative, optimization-based inference over a differentiable graphics-based generative model. In a case study of face perception, we train EDIG on a dataset of upright face images to learn, in a weakly supervised fashion, to map these images to 3D scenes. We also train an architecture-matched DNN with a standard supervised classification objective on the same training dataset. We test EDIG, EDIG's bottom-up component alone, and this alternative on a behavioral dataset of 2AFC identity-matching tasks with upright and inverted face conditions, consisting of 1,560 unique trials per condition. We show that although EDIG and the bottom-up-only alternatives match average human accuracy on upright faces, only EDIG achieves human-level accuracy on inverted faces. Moreover, EDIG explains significantly more variance in trial-level human accuracy than the alternatives. EDIG and humans also match qualitatively: both require extended processing for inverted faces relative to upright faces. These results suggest that human face perception integrates discriminative and generative computations, and they provide a blueprint for building humanlike perception systems.
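To make the two-stage inference pattern concrete, the sketch below illustrates the general scheme the abstract describes: a feedforward encoder proposes scene parameters from an image, and gradient descent through a differentiable generative model refines that initial estimate. This is a minimal illustration under stated assumptions, not the authors' implementation: the Encoder and Renderer modules, latent dimensionality, reconstruction loss, and optimizer settings are all hypothetical placeholders (in EDIG the generative model is a graphics-based renderer of 3D face scenes).

import torch
import torch.nn as nn

# Minimal sketch of an EDIG-style two-stage inference.
# Everything here is a hypothetical stand-in: in the actual model the
# encoder is a trained DNN and the generative model is a differentiable
# graphics renderer, not a single linear layer.

LATENT_DIM, H, W = 64, 32, 32

class Encoder(nn.Module):
    """Bottom-up pass: map an image to initial 3D-scene latents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(H * W, LATENT_DIM))

    def forward(self, img):
        return self.net(img)

class Renderer(nn.Module):
    """Generative model: map scene latents back to an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, H * W)

    def forward(self, z):
        return self.net(z).view(-1, 1, H, W)

def edig_infer(img, encoder, renderer, steps=50, lr=1e-2):
    """Initialize latents with the discriminative network, then refine
    them by gradient descent through the differentiable generator."""
    z = encoder(img).detach().requires_grad_(True)  # fast initial percept
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):  # iterative, optimization-based refinement
        opt.zero_grad()
        loss = ((renderer(z) - img) ** 2).mean()  # reconstruction error
        loss.backward()
        opt.step()
    return z.detach()

if __name__ == "__main__":
    img = torch.rand(1, 1, H, W)  # stand-in for an observed face image
    z = edig_infer(img, Encoder(), Renderer())
    print(z.shape)  # torch.Size([1, 64])

In this pattern, the encoder alone corresponds to the bottom-up-only alternative, while the refinement loop adds the generative computation that, on the abstract's account, supports robustness under unusual viewing conditions such as inversion, at the cost of extended processing time.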
