August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
The role of scene context in object recognition by humans and convolutional neural networks
Author Affiliations & Notes
  • Haley G. Frey
    Vanderbilt University
  • Hojin Jang
    Vanderbilt University
  • Hui-Yuan Miao
    Vanderbilt University
  • Frank Tong
    Vanderbilt University
  • Footnotes
    Acknowledgements: Supported by NIH grant R01EY029278
Journal of Vision August 2023, Vol.23, 5698. doi:

      Haley G. Frey, Hojin Jang, Hui-Yuan Miao, Frank Tong; The role of scene context in object recognition by humans and convolutional neural networks. Journal of Vision 2023;23(9):5698.

      © ARVO (1962-2015); The Authors (2016-present)

It is rare that humans are required to recognize objects without a surrounding context. Previous research has shown that modifying scene information can decrease the speed and accuracy of object recognition in human observers. Although convolutional neural networks (CNNs) can attain near-human-level performance on simple object recognition tasks, it remains unclear whether these models of biological vision continue to reflect human abilities when objects occur in complex scenes. Here, we investigated the impact of visual clutter and semantic incongruence on object recognition accuracy in humans and CNNs. Eighteen undergraduate students and four CNNs implemented in PyTorch were shown 384 greyscale images, each consisting of a target object superimposed on a background scene. We manipulated the level of visual clutter, defined as the amount of texture, pattern, or excess information in an image, and the semantic congruency, defined as whether the object-scene pairing was realistic. The eight target categories consisted of animals (bear, bison, elephant, owl) and common indoor objects (lamp, teapot, vacuum, vase), which were presented in either outdoor nature scenes or indoor scenes. The scenes were rated for degree of clutter by a separate group of participants and sorted into low- and high-clutter sets. We found that human observers performed significantly worse with increased clutter, whereas CNN performance was unaffected by clutter. Interestingly, the CNNs showed significantly better classification accuracy for congruent than for incongruent object-scene pairings, while the human observers did not. However, human participants did show a congruency bias effect, choosing a congruent category over an incongruent category in a significant proportion of trials on which they reported low confidence. Our findings reveal notable deviations between human and CNN object classification performance and indicate that CNN models do not process background scene context in the same way that humans do.
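The stimulus construction described above (a greyscale target object superimposed on a background scene) can be sketched roughly as follows. This is an illustrative assumption, not the authors' actual pipeline: the function name, image sizes, and the simple masked-paste compositing are placeholders, since the abstract does not specify how the stimuli were generated.

```python
import numpy as np

def superimpose(scene, obj, top, left, mask=None):
    """Paste a greyscale object patch onto a background scene.

    scene, obj: 2-D float arrays with values in [0, 1].
    mask: optional boolean array (same shape as obj) selecting the
    object's pixels; if omitted, the whole rectangular patch is pasted.
    Returns a new array; the input scene is left unchanged.
    """
    out = scene.copy()
    h, w = obj.shape
    if mask is None:
        mask = np.ones((h, w), dtype=bool)  # paste the full patch
    out[top:top + h, left:left + w][mask] = obj[mask]
    return out

# Hypothetical example: a dark 8x8 "object" centered on a uniform
# (i.e., minimally cluttered) 64x64 background.
scene = np.full((64, 64), 0.5)
obj = np.zeros((8, 8))
stim = superimpose(scene, obj, 28, 28)
print(stim.shape)  # (64, 64)
```

A composite like `stim` could then be fed to both human observers and to pretrained CNN classifiers; varying the background array would implement the clutter and congruency manipulations.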

