September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2024
Naturalistic dataset augmentation and self-supervised learning lead to more human-like recognition of occluded objects in convolutional neural networks
Author Affiliations & Notes
  • David Coggan
    Vanderbilt University
  • Frank Tong
    Vanderbilt University
  • Footnotes
    Acknowledgements: Supported by NIH grants R01EY029278 and R01EY035157 to FT
Journal of Vision September 2024, Vol.24, 1335. doi:https://doi.org/10.1167/jov.24.10.1335
      David Coggan, Frank Tong; Naturalistic dataset augmentation and self-supervised learning lead to more human-like recognition of occluded objects in convolutional neural networks. Journal of Vision 2024;24(10):1335. https://doi.org/10.1167/jov.24.10.1335.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Human observers can readily perceive and recognize visual objects, even when occluding stimuli obscure much of the object from view. By contrast, state-of-the-art convolutional neural networks (CNNs) perform poorly at classifying occluded objects. In previous work, we evaluated 30 humans and various CNNs on an occluded object benchmark containing nine occluder types (e.g., mud splashes, bars, polka dots) and six visibility levels. We showed that augmenting CNN training datasets with artificial occluders led to higher classification accuracy, but a less human-like pattern of accuracy across the different occluder types. In the present study, we explored whether a human-like form of occlusion robustness could be acquired through more naturalistic modifications to the learning environment that better reflect human visual experience. First, we trained three CNNs with identical architectures to classify differently augmented versions of the ImageNet dataset: unaltered (baseline), occlusion by uniformly coloured, computer-generated shapes (artificial), and occlusion by other objects extracted from photographs (natural). After training each model, we measured classification accuracy and human-likeness using the occluded object benchmark. Human-likeness was measured both through image-wise error consistency and by correlating the profile of accuracies across conditions between models and humans. Both types of occlusion training increased accuracy relative to baseline. However, natural occlusion training increased human-likeness on both measures, while artificial occlusion training showed mixed results (higher image-wise, lower condition-wise). Improvements in both accuracy and human-likeness from natural occlusion training were partially reduced when the natural occluder texture was replaced by uniform colour during training, suggesting that both the shape and texture of natural occluders play a role in human occlusion robustness.
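
The two human-likeness measures described above can be illustrated with a minimal sketch. The abstract does not give formulas, so this assumes that image-wise error consistency takes the common Cohen's kappa form (agreement in correct/incorrect responses beyond what the two accuracies alone would predict) and that the condition-wise measure is a Pearson correlation between human and model accuracy profiles. The function names and the toy values in the usage note are hypothetical, not taken from the study.

```python
from math import sqrt


def error_consistency(correct_a, correct_b):
    """Cohen's kappa over per-image correctness of two observers.

    Assumed form of the image-wise measure; inputs are equal-length
    sequences of booleans (True = image classified correctly).
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n
    # Observed agreement: both correct or both incorrect on the same image.
    observed = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    # Agreement expected by chance given only the two accuracies.
    expected = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


def profile_correlation(acc_human, acc_model):
    """Pearson correlation between two per-condition accuracy profiles."""
    if len(acc_human) != len(acc_model) or len(acc_human) < 2:
        raise ValueError("need two equal-length profiles with >= 2 conditions")
    n = len(acc_human)
    mean_h = sum(acc_human) / n
    mean_m = sum(acc_model) / n
    cov = sum((h - mean_h) * (m - mean_m) for h, m in zip(acc_human, acc_model))
    var_h = sum((h - mean_h) ** 2 for h in acc_human)
    var_m = sum((m - mean_m) ** 2 for m in acc_model)
    if var_h == 0 or var_m == 0:
        return float("nan")  # correlation is undefined for a flat profile
    return cov / sqrt(var_h * var_m)
```

For example, with made-up responses, `error_consistency([True, True, False, False], [True, False, False, True])` yields a kappa of zero: the two observers agree on half the images, which is exactly what their 50% accuracies predict by chance. A model can therefore score well on one measure and poorly on the other, as reported for artificial occlusion training.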
Finally, for each dataset augmentation, substituting supervised classification training with a more naturalistic, self-supervised task (contrastive learning) led to equal or better human-likeness. Taken together, these results indicate that occlusion-robust object recognition in humans emerges in part from unsupervised engagement with the specific forms of occlusion that occur in nature.
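
To make the self-supervised objective concrete, the sketch below shows one widely used contrastive loss, NT-Xent (as in SimCLR). The abstract does not name the specific contrastive method, so this choice, the function names, and the temperature value are assumptions for illustration only. The idea is that two augmented views of the same image (for instance, two differently occluded versions) should have more similar embeddings than views of different images, with no class labels involved.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def nt_xent(view1, view2, temperature=0.5):
    """Mean NT-Xent loss over N positive pairs (view1[i], view2[i]).

    Assumed stand-in for the contrastive task; view1 and view2 are
    equal-length lists of embedding vectors (lists of floats).
    """
    if len(view1) != len(view2) or len(view1) == 0:
        raise ValueError("need two equal-length, non-empty lists of embeddings")
    n = len(view1)
    z = list(view1) + list(view2)  # all 2N embeddings in one batch
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Temperature-scaled similarity to every other embedding in the batch.
        logits = {
            j: cosine(z[i], z[j]) / temperature
            for j in range(2 * n)
            if j != i
        }
        denominator = sum(exp(s) for s in logits.values())
        # Cross-entropy of picking the positive among all other embeddings.
        total += log(denominator) - logits[positive]
    return total / (2 * n)
```

The loss is low when each embedding is closest to its own paired view and high when the pairing is scrambled, which is the property that lets a network learn from augmented images without supervised labels.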
