August 2016
Volume 16, Issue 12
Open Access
Vision Sciences Society Annual Meeting Abstract | September 2016
Generating the features for category representation using a deep convolutional neural network
Author Affiliations
  • Chen-Ping Yu
    Department of Computer Science
  • Justin Maxfield
    Department of Psychology
  • Gregory Zelinsky
    Department of Computer Science
Journal of Vision September 2016, Vol.16, 251. doi:
      Chen-Ping Yu, Justin Maxfield, Gregory Zelinsky; Generating the features for category representation using a deep convolutional neural network. Journal of Vision 2016;16(12):251. doi:
      © ARVO (1962-2015); The Authors (2016-present)

What are the visual feature representations of common object categories, and how are these representations used in goal-directed behavior? Our previous work on this topic (VSS2015) introduced the idea of category-consistent features (CCFs): those maximally informative features that appear both frequently and with low variability across category exemplars. Here we extend this idea by using a deep (6-layer) convolutional neural network (CNN) to learn CCFs. The four convolutional layers corresponded to visual processing in the ventral stream (V1, V2, V4, and IT), with the number of filters (neurons) at each layer reflecting relative estimates of neural volume in these brain areas. The CNN was trained on 48,000 images of closely cropped objects, divided evenly into 48 subordinate-level categories (1,000 images per category). Filter responses (firing rates) for 16 basic-level and 4 superordinate-level categories were also obtained from this trained network (68 categories in total), with the filters having the least variable and highest responses indicating a given category's CCFs. We tested the CNN-CCF model against data from 26 subjects searching for targets (cued by name) from the same 68 superordinate/basic/subordinate-level categories. Category exemplars used for model training and target exemplars appearing in the search displays were disjoint sets. A categorical search task was used to explore model predictions of both attention guidance (time between search display onset and first fixation of the target) and object verification/detection (time between first fixating the target and the present/absent target judgment). Using the CNN-CCF model, we were able to predict not just differences in mean guidance and verification behavior across hierarchical levels, but also mean guidance and verification times across individual categories. We conclude that the features used to represent common object categories, and the behaviorally meaningful similarity relationships between categories (effects of hierarchical level), can be well approximated by features learned by CNNs reflecting ventral stream visual processing.
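
The CCF selection rule described in the abstract (filters with the highest and least variable responses across a category's exemplars) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_ccfs`, the mean-to-standard-deviation score, the `min_mean` cutoff, the `top_k` parameter, and the toy response values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def select_ccfs(responses, top_k=2, min_mean=0.5, eps=1e-9):
    """Pick filters that respond strongly and consistently to one category.

    responses: list of exemplars, each a list of per-filter responses.
    Hypothetical rule: drop filters whose mean response is below min_mean,
    then rank the rest by mean / standard deviation (highest first).
    """
    n_filters = len(responses[0])
    scored = []
    for j in range(n_filters):
        column = [exemplar[j] for exemplar in responses]
        mu = mean(column)
        sigma = pstdev(column)
        if mu >= min_mean:
            # eps avoids division by zero for perfectly constant filters
            scored.append((mu / (sigma + eps), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:top_k]]


# Toy data: 4 exemplars of one category, 4 filters (values are made up).
# filter 0: high and stable, filter 1: high but variable,
# filter 2: stable but weak,  filter 3: fairly high and stable.
toy_responses = [
    [0.9, 0.1, 0.10, 0.8],
    [1.0, 2.0, 0.12, 0.7],
    [1.1, 0.2, 0.08, 0.9],
    [1.0, 1.7, 0.10, 0.8],
]

ccfs = select_ccfs(toy_responses, top_k=2, min_mean=0.5)
print(ccfs)  # → [0, 3]
```

In this toy case the variable filter (1) and the weak filter (2) are rejected, leaving filters 0 and 3 as the category's consistent features. The abstract does not specify how response strength and variability were combined, so the scoring rule here should be read as one plausible choice.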

Meeting abstract presented at VSS 2016