August 2016
Volume 16, Issue 12
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2016
A deep neural net trained for person categorization develops both detailed local features and broad contextual specificities
Author Affiliations
  • Stella Yu
    Computer Science Department, UC Berkeley / ICSI
  • Karl Zipser
    Helen Wills Neuroscience Institute, UC Berkeley
Journal of Vision September 2016, Vol.16, 411. doi:
      Stella Yu, Karl Zipser; A deep neural net trained for person categorization develops both detailed local features and broad contextual specificities. Journal of Vision 2016;16(12):411.

      © ARVO (1962-2015); The Authors (2016-present)


A deep convolutional neural network was trained to classify person categories from digital photographs. We studied the tuning properties of nodes at different hierarchical levels of this network and explored the properties that emerge when the network is connected as a recurrent system that optimizes its visual input with respect to various high-level criteria. The network design was borrowed from the GoogLeNet architecture and trained from scratch on 584 diverse person categories (e.g., economist, violinist, grandmother, child) from the ImageNet dataset. With the trained model, we used gradient descent to generate preferred stimulus images for each convolutional node class in the network. At the first convolutional layer, these preferred images were simple grating patterns, monochrome or color, varying in orientation and spatial frequency, and indistinguishable from the preferred images of the same network architecture trained on non-human categories. Nodes in the next two convolutional layers showed increasingly complex pattern specificity, without any recognizable person-specific features. Person-specific nodes were apparent in all subsequent convolutional layers. In the first layers where person-specific features arise, they appear to be generic face detectors. In subsequent layers, detailed eye, nose, and mouth detectors begin to appear, as well as nodes selective for more restrictive person categories (selective for age and gender, for example). At still higher levels, some nodes became selective for complete human bodies. The larger receptive fields of these higher-level nodes provide the spatial scope for increasing degrees of context specificity: in particular, there are nodes which prefer a main human figure to one or more smaller figures in the background. These modeling results emphasize the importance of both facial features and figure ensembles in recognition of person categories.
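
The preferred-stimulus procedure mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' GoogLeNet-based code: it assumes a single linear node whose weights are a hypothetical Gabor-like filter (`gabor_filter`), and it assumes a unit-norm constraint on the image so that the optimization is bounded. The abstract states neither detail. Maximizing the node's response by gradient ascent is equivalent to gradient descent on the negated response.

```python
import math
import random


def gabor_filter(size=9, theta=math.pi / 4, wavelength=4.0, sigma=2.0):
    """Hypothetical stand-in for one learned first-layer filter (flattened)."""
    half = size // 2
    weights = []
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            weights.append(envelope * math.cos(2.0 * math.pi * along / wavelength))
    return weights


def response(weights, image):
    """Pre-activation response of the node: dot product of filter and image."""
    return sum(w * p for w, p in zip(weights, image))


def normalize(image):
    """Rescale the image to unit L2 norm (an assumed constraint, not from the abstract)."""
    norm = math.sqrt(sum(p * p for p in image))
    return [p / norm for p in image]


def preferred_stimulus(weights, steps=200, step_size=0.1, seed=0):
    """Gradient ascent on the input image, starting from random noise."""
    rng = random.Random(seed)
    image = normalize([rng.gauss(0.0, 1.0) for _ in weights])
    for _ in range(steps):
        # For a linear node, d(response)/d(image) is simply the filter weights.
        image = [p + step_size * w for p, w in zip(image, weights)]
        image = normalize(image)  # project back onto the unit-norm sphere
    return image


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened images."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


weights = gabor_filter()
stimulus = preferred_stimulus(weights)
print("cosine similarity to filter:", round(cosine_similarity(stimulus, weights), 4))
```

For a linear node under a norm constraint, the optimized image converges toward the filter itself, which is one way to see why a Gabor-like first-layer filter would yield a grating as its preferred image. For nodes deeper in a network, the gradient with respect to the input would instead be obtained by backpropagation through the intervening layers, typically with an automatic-differentiation library.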

Meeting abstract presented at VSS 2016

