Vision Sciences Society Annual Meeting Abstract  |  August 2023
Journal of Vision, Volume 23, Issue 9
Open Access
Unsupervised contrastive learning and supervised classification training have opposite effects on the human-likeness of CNNs during occluded object processing
Author Affiliations & Notes
  • David Coggan
    Vanderbilt University
  • Frank Tong
    Vanderbilt University
  • Footnotes
    Acknowledgements  This research was supported by grants from the National Institutes of Health R01-EY029278 (to F.T.) and P30-EY008126 to the Vanderbilt Vision Research Center (Director Dr. Calkins).
Journal of Vision August 2023, Vol. 23, 5448. doi: https://doi.org/10.1167/jov.23.9.5448
© ARVO (1962-2015); The Authors (2016-present)
Abstract

Human observers can readily perceive and recognize visual objects even when occluding stimuli obscure much of the object from view. By contrast, state-of-the-art convolutional neural network (CNN) models of primate vision perform poorly at classifying occluded objects (Coggan and Tong, VSS, 2022). A key difference between biological and artificial visual systems is how they learn from visual examples. CNNs are typically trained using supervised methods to classify images by object category based on labelled data. By contrast, humans learn about objects with a broader range of learning objectives and fewer opportunities for supervised feedback. Here, we asked whether a more naturalistic approach to training CNNs might yield more occlusion-robust models that better predict human neural and behavioural responses to occluded objects. To address this question, we trained an array of CORnet-S model instances with either supervised classification or unsupervised contrastive learning. We also augmented the standard ImageNet dataset by superimposing artificial occluders onto the images. The contrastive learning objective was to produce similar unit activations in the highest layer in response to differently occluded instances of the same underlying object image. Once training was complete, each model was tested for occlusion robustness and compared to human behavioural and neural responses to occluded objects. For supervised models, we found that training on the occluded dataset led to substantial improvements in classification accuracy for novel occluded objects, relative to the standard dataset. Despite these improvements, occlusion-trained models performed worse at predicting both human behavioural and neural responses to occluded objects, suggesting that these supervised models learned a different type of occlusion-robust mechanism. By contrast, the layer-wise activity patterns found in the unsupervised, contrastively trained models exhibited stronger occlusion robustness and greater human-likeness than those of any other model, suggesting that human robustness to occlusion may be attributable in part to a natural, unsupervised visual learning environment.
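To make the contrastive objective described above concrete, the sketch below shows one training step in which two differently occluded views of the same image are pushed toward similar top-layer activations. This is not the authors' code: the occluder generator, the SimCLR-style NT-Xent loss, the temperature and patch-size values, and the assumption that the encoder (e.g., a CORnet-S instance with its classification readout removed) returns a flat feature vector are all illustrative assumptions.

    # Minimal sketch (assumed implementation, PyTorch) of an occlusion-based
    # contrastive step: matched occluded views of the same image are positives,
    # all other images in the batch serve as negatives.
    import torch
    import torch.nn.functional as F

    def occlude(images, coverage=0.5, patch=32):
        """Superimpose random square occluders covering roughly `coverage` of
        each image (a stand-in for the study's artificial occluders)."""
        x = images.clone()
        b, _, h, w = x.shape
        n_patches = int(coverage * h * w / patch ** 2)
        for i in range(b):
            for _ in range(n_patches):
                top = torch.randint(0, h - patch, (1,)).item()
                left = torch.randint(0, w - patch, (1,)).item()
                x[i, :, top:top + patch, left:left + patch] = 0.0
        return x

    def nt_xent_loss(z1, z2, temperature=0.1):
        """SimCLR-style loss over top-layer activations of the two views."""
        z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2B, D)
        sim = z @ z.t() / temperature                      # cosine similarities
        sim.fill_diagonal_(float('-inf'))                  # exclude self-pairs
        n = z.shape[0]
        targets = torch.arange(n, device=z.device).roll(n // 2)
        return F.cross_entropy(sim, targets)               # positive = matched view

    def contrastive_step(encoder, images, optimizer):
        """One unsupervised step: encode two occluded views, pull them together."""
        v1, v2 = occlude(images), occlude(images)
        z1, z2 = encoder(v1), encoder(v2)                  # highest-layer features
        loss = nt_xent_loss(z1, z2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In this sketch the only supervisory signal is image identity across occlusion patterns, in contrast to the supervised models, which receive object-category labels for each (occluded or unoccluded) image.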
