August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Invariant object recognition in deep neural networks: impact of visual diet and learning goals
Author Affiliations & Notes
  • Haider Al-Tahan
    Western University
    Meta AI
  • Farzad Shayanfar
  • Ehsan Tousi
    Western University
  • Marieke Mur
    Western University
  • Footnotes
    Acknowledgements  We thank Jeremy Schwartz and Seth Alter for assisting in resolving various issues in regard to ThreeDWorld.
Journal of Vision August 2023, Vol.23, 5979. doi:

      Haider Al-Tahan, Farzad Shayanfar, Ehsan Tousi, Marieke Mur; Invariant object recognition in deep neural networks: impact of visual diet and learning goals. Journal of Vision 2023;23(9):5979.

      © ARVO (1962-2015); The Authors (2016-present)

Invariant object recognition is a hallmark of human vision. Humans recognize objects across a wide range of rotations, positions, and scales. A good model of human object recognition should, like humans, be able to generalize across real-world object transformations. Deep neural networks are currently the most popular computational models of the human ventral visual stream. Prior studies reported that these models show signatures of invariant object recognition but showed mixed results on how closely the models match human performance. Inconsistencies across studies in the ability of deep neural networks to recognize objects across transformations may be due to differences in the tested model architectures or training regimes. Here we test object recognition performance for different families of pretrained feedforward deep neural networks across object rotation, position, and scale. We included 95 models and defined model families based on three dimensions: model architecture, visual diet, and learning objective. Along the model architecture dimension, we tested convolutional neural networks and visual transformers. For each architecture, we tested models trained on relatively poor and relatively rich visual diets, ranging from 1.2 to 14 million training images, and models trained with supervised and unsupervised learning objectives. We created test images using ThreeDWorld, a 3D virtual world simulation platform that includes 583 3D objects from 58 ImageNet categories. We found that all tested model families show a drop in object recognition performance after applying object transformations, with lowest performance for object rotation and scale. Model architecture did not noticeably affect model performance, but models trained with rich visual diets and unsupervised generative learning objectives outperformed the other model families in our set. 
Our results suggest that, while different models agree on which object transformations are most challenging, visual diet and learning goals affect their ability to match human performance at invariant object recognition.
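
The comparison described in the abstract (recognition performance per model family, before and after an object transformation) can be illustrated with a minimal sketch. It is not the authors' analysis code. The record layout, the family names, the `canonical` baseline label, and the toy values are all assumptions made for illustration only, not data or terminology from the study.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Return {(family, condition): accuracy} from (family, condition, correct) records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for family, condition, correct in records:
        key = (family, condition)
        totals[key] += 1
        hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


def accuracy_drop(acc, baseline="canonical"):
    """Return baseline accuracy minus accuracy for each transformed condition."""
    drops = {}
    for (family, condition), value in acc.items():
        if condition == baseline:
            continue  # the baseline itself has no drop
        drops[(family, condition)] = acc[(family, baseline)] - value
    return drops


# Hypothetical toy records (model family, test condition, prediction correct?).
# These values are invented for illustration and are not results from the study.
records = [
    ("family_a", "canonical", True),
    ("family_a", "canonical", True),
    ("family_a", "rotation", True),
    ("family_a", "rotation", False),
    ("family_a", "position", True),
    ("family_a", "position", True),
    ("family_b", "canonical", True),
    ("family_b", "canonical", True),
    ("family_b", "rotation", False),
    ("family_b", "rotation", False),
]

acc = accuracy_by_condition(records)
drops = accuracy_drop(acc)
```

In this sketch a larger drop means a model family is less invariant to that transformation, so families can be ranked per transformation by sorting `drops`.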

