October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
Deep neural networks are “doing the categorization task,” but how?
Author Affiliations & Notes
  • Christoph Daube
    University of Glasgow
  • Tian Xu
  • Jiayu Zhan
  • Andrew Webb
  • Robin A A Ince
  • Oliver G B Garrod
  • Philippe Schyns
  • Footnotes
    Acknowledgements  This work was funded by a Multidisciplinary University Research Initiative/Engineering and Physical Sciences Research Council grant (USA, UK; 172046-01) awarded to PGS.
Journal of Vision October 2020, Vol.20, 1411. doi:https://doi.org/10.1167/jov.20.11.1411

Christoph Daube, Tian Xu, Jiayu Zhan, Andrew Webb, Robin A A Ince, Oliver G B Garrod, Philippe Schyns; Deep neural networks are “doing the categorization task,” but how? Journal of Vision 2020;20(11):1411. https://doi.org/10.1167/jov.20.11.1411.


      © ARVO (1962-2015); The Authors (2016-present)


Deep neural networks (DNNs) are appealing as models of human vision because they categorize complex images with human-like performance — i.e. they “do the categorization task” with unprecedented accuracy. However, there is a considerable gap between doing the task with similar performance and doing it like the human brain, using similar stimulus representations. Thus, before evaluating the potential contribution of DNNs as models of the brain, we first need to ensure that they perform the same task with the same features (e.g. to categorize a happy face with its diagnostic mouth and wrinkled eyes). Here, we exploited the tight experimental control over such stimulus features afforded by an interpretable and realistic generative model of face information (GMF, SuppMatA), which generated 3.5M images with varying categorical factors (2,004 identities, 2 genders, 3 ages, 2 ethnicities and 7 expressions of emotion, and 3 vertical and horizontal angles of pose and illumination). We trained (> 99% accuracy) two ResNet10s to classify 2,004 identities across image variations (ResNetId) plus the multiple other factors of the GMF causing these variations (ResNetMulti). Following training, we reverse correlated the internal representations of 4 familiar faces in humans and the ResNet models. All faithfully (and unfaithfully) represented the 4 identities using similar shape features (e.g. a forehead or chin, SuppMatC&D). However, face noise testing (i.e. GMF shape vs. texture noise, SuppMatB) revealed that ResNetId generalization behavior was hyposensitive to shape but hypersensitive to texture, whereas ResNetMulti was less extreme. These generalization biases were reflected in the hidden layers, as shown in a comprehensive analysis of activation variance in each layer up to decision (SuppMatE). 
In sum, we advanced the understanding of how DNNs “do the task” by first establishing the similarity of their decision features to those of humans, and then tracing how the DNNs’ layers represent the factors of the GMF.
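The reverse-correlation step described above can be illustrated with a minimal toy sketch: a hypothetical observer (standing in for a human or a ResNet) makes binary responses to white-noise stimuli, and averaging the noise conditioned on the response recovers the observer's internal template (a "classification image"). The observer, template, stimulus size, and trial count here are all illustrative assumptions, not the study's actual GMF pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical internal template: the diagnostic feature an observer
# (human or DNN) is assumed to use, e.g. a mouth-like region.
template = np.zeros((8, 8))
template[2:6, 3:5] = 1.0

def observer(stimulus, threshold=0.0):
    """Toy linear observer: responds 1 if the stimulus matches its template."""
    return 1 if np.sum(stimulus * template) > threshold else 0

# Present many white-noise stimuli and record the binary responses.
n_trials = 20000
noise = rng.standard_normal((n_trials, 8, 8))
responses = np.array([observer(s) for s in noise])

# Classification image: mean noise on "yes" trials minus mean on "no" trials.
classification_image = (noise[responses == 1].mean(axis=0)
                        - noise[responses == 0].mean(axis=0))

# The recovered image should correlate strongly with the hidden template.
r = np.corrcoef(classification_image.ravel(), template.ravel())[0, 1]
```

For a linear observer in Gaussian noise, this response-conditioned average converges on the template itself, which is why the same logic can be applied to both human judgments and a DNN's decisions over the same stimulus set.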

