September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2017
Comparing human and convolutional neural network performance on scene segmentation
Author Affiliations
  • Noor Seijdel
    Brain and Cognition, Psychology, University of Amsterdam
  • Max Losch
    Brain and Cognition, Psychology, University of Amsterdam
  • Edward de Haan
    Brain and Cognition, Psychology, University of Amsterdam
  • Steven Scholte
    Brain and Cognition, Psychology, University of Amsterdam
Journal of Vision August 2017, Vol.17, 1344. doi:10.1167/17.10.1344
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

The most recent variations of convolutional neural networks (ConvNets) have managed to match and surpass human performance on classification of objects in images (Russakovsky et al., 2015; He et al., 2015). It remains an open question whether humans and ConvNets process visual information in a similar fashion. It is known that humans perform object recognition best under certain conditions, e.g., when the object is shown in the canonical view (often a three-quarter view) and when the object is presented on a homogeneous background. In the current study we compare human and computer model performance under these conditions. Carrying forward the object classification task, we manipulate the relationship between object and background by presenting 3D models of objects A) in isolation, B) with a congruent background and C) with an incongruent background. Finally, we manipulate the viewpoint by presenting these 3D models of objects from different angles. We compare the performance of 40 human subjects with that of ConvNets of different depth and complexity. Preliminary results indicate that network depth serves an important, implicit function in segregating the object from the scene. Overall, comparing the performance of humans and computer models on these more specific or detailed tasks will give a more fine-grained view of the similarity between the two and could link more cognitive descriptions of behavior to ConvNets.

Meeting abstract presented at VSS 2017
