September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | August 2017
Of Human Observers and Deep Neural Networks: A Detailed Psychophysical Comparison
Author Affiliations
  • Robert Geirhos
    Neural Information Processing Group, University of Tübingen, Germany
  • David Janssen
    Neural Information Processing Group, University of Tübingen, Germany
    Graduate Training Centre of Neuroscience, University of Tübingen, Germany
  • Heiko Schütt
    Neural Information Processing Group, University of Tübingen, Germany
    Graduate Training Centre of Neuroscience, University of Tübingen, Germany
  • Matthias Bethge
    Centre for Integrative Neuroscience, University of Tübingen, Germany
    Bernstein Center for Computational Neuroscience, Tübingen, Germany
  • Felix Wichmann
    Neural Information Processing Group, University of Tübingen, Germany
    Bernstein Center for Computational Neuroscience, Tübingen, Germany
Journal of Vision August 2017, Vol.17, 806. doi:10.1167/17.10.806
Abstract

Deep Neural Networks (DNNs) have recently been put forward as computational models of feedforward processing in the human and monkey ventral streams. Not only do they achieve human-level performance in image classification tasks; recent studies have also found striking similarities between DNNs and ventral stream processing in terms of the learned representations (e.g. Cadieu et al., 2014, PLOS Comput. Biol.) and the spatial and temporal stages of processing (Cichy et al., 2016, arXiv). To obtain a more precise understanding of the similarities and differences between current DNNs and the human visual system, here we investigate how classification accuracy depends on image properties such as colour, contrast and the amount of additive visual noise, as well as on image distortions produced by the Eidolon Factory. We report results from a series of image classification (object recognition) experiments on both human observers and three DNNs (AlexNet, VGG-16, GoogLeNet). We used experimental conditions favouring single-fixation, purely feedforward processing in human observers (a short presentation time of t = 200 ms followed by a high-contrast mask), and we presented exactly the same images from 16 basic-level categories to human observers and DNNs. On non-manipulated images the DNNs indeed outperformed human observers (96.2% versus 88.5% correct; colour, full-contrast, noise-free images). However, human observers clearly outperformed the DNNs under all image-degrading manipulations: most strikingly, DNN performance breaks down severely with even small amounts of random visual noise. Our findings reinforce how robust the human visual system is against various image degradations, and indicate that there may still be marked differences in the way the human visual system and the three tested DNNs process visual information.
We discuss which differences between known properties of the early and higher visual system and DNNs may be responsible for the behavioural discrepancies we find.
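The contrast and noise manipulations described above can be sketched in a few lines of numpy. The parameter values below (contrast scale, noise width) are illustrative assumptions for demonstration only, not the exact stimulus parameters used in the study:

```python
import numpy as np

def degrade_image(img, contrast=0.3, noise_width=0.35, rng=None):
    """Reduce contrast and add uniform pixel noise to a grayscale image.

    img: float array with values in [0, 1].
    contrast: scale factor applied around mid-grey (1.0 = unchanged).
    noise_width: half-width of the additive uniform noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Scale pixel values towards mid-grey to lower contrast
    out = 0.5 + contrast * (img - 0.5)
    # Add independent uniform noise to every pixel
    out = out + rng.uniform(-noise_width, noise_width, size=img.shape)
    # Clip back into the valid pixel range
    return np.clip(out, 0.0, 1.0)

# Example: degrade a random "image" and verify it stays in range
img = np.random.default_rng(0).random((224, 224))
deg = degrade_image(img)
print(deg.min() >= 0.0 and deg.max() <= 1.0)  # True
```

Presenting the same degraded image arrays to both human observers and pretrained networks is what allows the per-condition accuracy comparison reported in the abstract.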

Meeting abstract presented at VSS 2017
