Journal of Vision, September 2024, Volume 24, Issue 10 (Open Access)
Vision Sciences Society Annual Meeting Abstract
Building better models of biological vision by searching for more ecological data diets and learning objectives
Author Affiliations & Notes
  • Drew Linsley
    Brown University
  • Akash Nagaraj
    Brown University
  • Alekh Ashok
    Brown University
  • Francis Lewis
    Brown University
  • Peisen Zhou
    Brown University
  • Thomas Serre
    Brown University
  • Footnotes
    Acknowledgements  This work was supported by ONR (N00014-19-1-2029), NSF (IIS-1912280 and EAR-1925481), DARPA (D19AC00015), and NIH/NINDS (R21 NS 112743), Cloud TPU hardware through the TensorFlow Research Cloud (TFRC) program as well as computing hardware supported by NIH Office of the Director grant S10OD025181.
Journal of Vision September 2024, Vol.24, 1493. doi:https://doi.org/10.1167/jov.24.10.1493
      Drew Linsley, Akash Nagaraj, Alekh Ashok, Francis Lewis, Peisen Zhou, Thomas Serre; Building better models of biological vision by searching for more ecological data diets and learning objectives. Journal of Vision 2024;24(10):1493. https://doi.org/10.1167/jov.24.10.1493.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

The many successes of deep neural networks (DNNs) over the past decade have been driven by data and computational scale rather than biological insights. However, as DNNs have continued to improve on benchmarks like ImageNet, they have worsened as models of biological brains and behavior. For instance, recent DNNs with human-level object classification accuracy are no better at predicting human perception or image-evoked responses in primate inferotemporal (IT) cortex than DNNs from a decade ago (e.g., Linsley et al., 2023). Here, we build better DNN models of biological vision by finding data diets and objective functions that more closely resemble those that shape biological brains. We began by building a platform for searching through naturalistic data diets and objective functions for training a standardized DNN architecture at scale. Each DNN’s data diet was sampled from our rendering engine, which generates life-like videos of objects in real-world scenes. In parallel, each model’s objective function was sampled from a parametrized space of image reconstruction objectives, which made it possible to train models to learn combinations of causal and acausal recognition strategies over space or space and time. We evaluated the ability of hundreds of DNNs trained on this platform to predict human performance on a novel “Greebles” object recognition task (Ashworth et al., 2008). We found that DNNs trained to capture the causal structure of data were significantly more predictive of human decisions and reaction times than any other DNN tested. Moreover, these causal DNNs learned strong equivariance to out-of-plane variations in pose, recapitulating classical theory on the foundations of object constancy (Sinha & Poggio, 1996) despite no explicit constraints to do so. Our work identifies key limitations in how DNNs are trained today and introduces a better approach for building DNN-based models of human vision that can ultimately advance perceptual science.
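To make the distinction between causal and acausal reconstruction objectives concrete, the following is a minimal sketch, not the authors' implementation: the function name, shapes, and masking scheme are all assumptions. The idea is that a causal objective lets the model reconstruct each video frame from past frames only, while an acausal objective also exposes future frames.

```python
import numpy as np

def reconstruction_targets(frames, mode="causal"):
    """Toy illustration of causal vs. acausal reconstruction targets.

    frames: array of shape (T, ...) representing a video clip of T frames.
    Returns a list of (visible_frames, target_frame) pairs: the frames a
    model may condition on when reconstructing each target frame.
    """
    T = len(frames)
    pairs = []
    for t in range(1, T - 1):
        if mode == "causal":
            # Causal: only the past is visible; the model must predict forward.
            visible = frames[:t]
        else:
            # Acausal: both past and future frames are visible (target held out).
            visible = np.concatenate([frames[:t], frames[t + 1:]])
        pairs.append((visible, frames[t]))
    return pairs
```

For a 6-frame clip, the causal objective at the first target frame exposes a single past frame, whereas the acausal objective exposes the other five frames; interpolating between such masking schemes is one simple way to parametrize a space of reconstruction objectives.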
