September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
Inverse graphics explains population responses in body-selective regions of the IT cortex
Author Affiliations
  • Hakan Yilmaz
    Yale University
  • Aalap D Shah
    Yale University
  • Ariadne Letrou
    Yale University
  • Satwant Kumar
    University of Texas at Austin
  • Rufin Vogels
    KU Leuven
  • Joshua B Tenenbaum
  • Ilker Yildirim
    Yale University
Journal of Vision September 2021, Vol.21, 2409. doi:

      Hakan Yilmaz, Aalap D Shah, Ariadne Letrou, Satwant Kumar, Rufin Vogels, Joshua B Tenenbaum, Ilker Yildirim; Inverse graphics explains population responses in body-selective regions of the IT cortex. Journal of Vision 2021;21(9):2409.

      © ARVO (1962-2015); The Authors (2016-present)

Vision does not merely detect and recognize patterns and contours, but makes rich inferences about objects and agents, including their three-dimensional (3D) shapes and configurations. Current modeling approaches based on deep convolutional neural network (DCNN) classifiers can explain aspects of neural processing, but they do not address how perception can be so rich, and they typically do not provide an interpretable functional account of neural computation. To address these shortcomings, we take a different approach based on “efficient inverse graphics” (EIG), instantiating the hypothesis that the goal of visual processing is to invert generative models of how 3D scenes form and project to images. Instead of classification, we use DCNNs to build inference networks that invert generative scene models. EIG meets the functional goal of quickly computing rich 3D percepts and provides an interpretable reverse-engineering account of biological computation in the language of objects and generative models. We tested this approach in body perception: two macaques, EIG, and state-of-the-art DCNN classifiers were shown images of monkey bodies that varied in shape, posture, and viewpoint. EIG is designed to recover these variables, as defined in an articulated 3D generative model, from the images, whereas the classifiers discriminate between object identities or body postures. Using representational similarity analysis, we compared layer activations of EIG to population-level activity obtained from single-cell recordings in body-selective regions of the inferotemporal cortex. Similarity matrices arising from the EIG layers were highly correlated with the neural similarity matrices. EIG explained neural activity significantly better than the classification networks (p < .05), which additionally failed to reproduce the key qualitative patterns observed in the data. These results provide an integrated account, spanning the neural and cognitive levels of analysis, of how raw sensory inputs are transformed in the ventral stream into percepts of objects and agents through the computation of inverse graphics.
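
The representational similarity analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state which similarity measure or which matrix-comparison statistic was used, so Pearson correlation is assumed for both steps, and the function names (`similarity_matrix`, `rsa_score`) and the toy data are hypothetical. The idea is to build one stimulus-by-stimulus similarity matrix from a model layer and one from the neural population, then correlate their off-diagonal entries.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def similarity_matrix(responses):
    """Pairwise similarity between stimuli (assumed metric: Pearson).

    responses[i] is the response vector (model units or neurons) to stimulus i.
    """
    n = len(responses)
    return [[pearson(responses[i], responses[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Entries above the diagonal, flattened row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def rsa_score(model_responses, neural_responses):
    """Correlate the model and neural similarity matrices.

    Only the upper triangle is used: the matrices are symmetric and the
    diagonal is trivially 1.
    """
    model_sim = upper_triangle(similarity_matrix(model_responses))
    neural_sim = upper_triangle(similarity_matrix(neural_responses))
    return pearson(model_sim, neural_sim)


# Toy data (hypothetical, not from the study): 6 stimuli, 20 model units.
rng = random.Random(0)
model = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(6)]
# Simulated "neurons": noisy copies of the first 15 model units.
neural = [[row[k] + rng.gauss(0, 0.1) for k in range(15)] for row in model]

score = rsa_score(model, neural)
print(f"model-to-neural similarity correlation: {score:.3f}")
```

Note that the two response sets need not have the same number of units; only the number of stimuli must match, since the comparison happens between stimulus-by-stimulus matrices. Comparing such scores across models (for example, an inverse-graphics network versus a classifier) would additionally require a significance test, which this sketch omits.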

