September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
Unsupervised object learning explains face but not animate category structure in human visual cortex
Author Affiliations & Notes
  • Ehsan Tousi
    Brain and Mind Institute, Western University
    Department of Psychology, Western University
  • Marieke Mur
    Brain and Mind Institute, Western University
    Department of Psychology, Western University
    Department of Computer Science, Western University
  • Footnotes
    Acknowledgements  This work is supported by the Natural Sciences and Engineering Research Council of Canada.
Journal of Vision September 2021, Vol.21, 2501. doi:https://doi.org/10.1167/jov.21.9.2501

      Ehsan Tousi, Marieke Mur; Unsupervised object learning explains face but not animate category structure in human visual cortex. Journal of Vision 2021;21(9):2501. https://doi.org/10.1167/jov.21.9.2501.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Deep convolutional neural networks (DCNNs) are currently the best computational models of human vision. However, DCNNs cannot fully explain the representation of natural object categories in high-level human visual cortex. DCNNs are classically trained to recognize objects using supervised learning, while humans rely heavily on unsupervised learning. Here, we test whether unsupervised learning yields an object representation that more strongly emphasizes natural categories and better explains human brain activity than supervised learning. We trained ResNet50 on the ImageNet database, using both supervised and contrastive unsupervised learning. For both types of learning, we characterized the network’s internal representation of 96 real-world object images over the course of 200 training epochs. We fitted a category model to the resulting learning trajectories using ordinary least squares to measure the strength of category clustering for faces, animate objects, and inanimate objects. We then compared the networks’ learning trajectories and clustering strengths with the object representation in high-level visual cortex, measured with fMRI in adult human observers. We focused our analysis on the deepest convolutional layer and used bootstrap resampling for statistical inference. We found that the unsupervised network explains the human object representation better than the supervised network (FDR-corrected p < 0.05 for 80% of epochs). This difference emerges relatively early in training and increases as learning progresses. The better performance of the unsupervised network is partly driven by its ability to discover natural face category structure in the input images. Importantly, both the supervised and the unsupervised model fall short of predicting the category clusters of animate and inanimate objects in the human brain data (FDR-corrected p < 0.05 for all epochs), suggesting that these categories are difficult to learn from static images alone. Our findings suggest that the natural category structure in human high-level visual cortex may arise from unsupervised learning during development.
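
The category-model fit mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's exact form, so everything here is an assumption made for illustration: the representation is summarized as a representational dissimilarity matrix (RDM), each category contributes one "both images belong to this category" predictor, and clustering strength is taken as the negated ordinary-least-squares coefficient of that predictor. The function name `fit_category_model` and the toy data are hypothetical.

```python
import numpy as np


def fit_category_model(rdm, memberships):
    """Fit a simple category model to an RDM with ordinary least squares.

    rdm         : (n, n) symmetric matrix of pairwise dissimilarities.
    memberships : dict mapping a category name to a boolean array of length n.
    Returns a dict mapping each category name to a clustering strength, here
    ASSUMED to be minus the OLS coefficient of the within-category predictor
    (larger values mean within-category pairs are less dissimilar).
    """
    rdm = np.asarray(rdm, dtype=float)
    n = rdm.shape[0]
    rows, cols = np.triu_indices(n, k=1)  # each image pair counted once
    y = rdm[rows, cols]

    names = list(memberships)
    columns = [np.ones_like(y)]  # intercept: baseline dissimilarity
    for name in names:
        member = np.asarray(memberships[name], dtype=bool)
        # 1.0 when both images of the pair belong to the category, else 0.0
        columns.append((member[rows] & member[cols]).astype(float))
    design = np.column_stack(columns)

    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {name: -coef for name, coef in zip(names, beta[1:])}


# Toy data (hypothetical): 8 images, faces nested within animate objects.
faces = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
animate = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
inanimate = ~animate
memberships = {"faces": faces, "animate": animate, "inanimate": inanimate}

# Noise-free RDM built from known within-category reductions in dissimilarity.
n_images = 8
rows, cols = np.triu_indices(n_images, k=1)
pair_values = (
    1.0
    - 0.5 * (faces[rows] & faces[cols])
    - 0.2 * (animate[rows] & animate[cols])
    - 0.1 * (inanimate[rows] & inanimate[cols])
)
toy_rdm = np.zeros((n_images, n_images))
toy_rdm[rows, cols] = pair_values
toy_rdm = toy_rdm + toy_rdm.T

strengths = fit_category_model(toy_rdm, memberships)
print({name: round(value, 3) for name, value in strengths.items()})
```

Under these assumptions, the same fit could be repeated for the RDM at each training epoch to trace how clustering strength develops, and recomputed on bootstrap resamples of the images for inference. Neither step is shown here, and the authors' actual procedure may differ.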
