September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | August 2017
Modeling categorical search guidance using a convolutional neural network designed after the ventral visual pathway
Author Affiliations
  • Gregory Zelinsky
    Departments of Psychology and Computer Science, Stony Brook University
  • Chen-Ping Yu
    Department of Psychology, Harvard University
Journal of Vision August 2017, Vol.17, 88. doi:

      © ARVO (1962-2015); The Authors (2016-present)

Category-consistent features (CCFs) are those features that occur both frequently and consistently across the exemplars of an object category. Recently, we showed that a CCF-based generative model captured the overt attentional guidance of people searching for name-cued target categories from a 68-category subordinate/basic/superordinate-level hierarchy (Yu, Maxfield, & Zelinsky, 2016, Psychological Science). Here we extend this work by selecting CCFs for the same 68 target categories using VsNet, an 8-layer Convolutional Neural Network (CNN) designed to reflect the areas, receptive field (kernel) sizes, and bypass connections of the primate ventral stream. We replicated our previously reported finding that the number of CCFs, averaged over categories at each hierarchical level, explains the subordinate-level guidance advantage observed in gaze time-to-target. Beyond this level-averaged result, we now show stronger guidance to individual categories for which more CNN-CCFs were extracted (r=0.29, p=0.01). We also found that CCFs extracted from VsNet's V1-V2 layers were most important for guidance to subordinate-cued targets (e.g., police cars); CCFs from TEO-V4 and V4-TE contributed most to guidance at the basic (car) and superordinate (vehicle) levels, respectively. This pattern suggests a broad coding of priority throughout the ventral stream that varies with cue specificity: early visual areas contribute more to subordinate-level guidance, while less specific cues engage later areas coding representative parts of the target category. Converging evidence for this suggestion was obtained by finding the image patches eliciting the strongest filter responses and showing that units from areas V4 and higher had receptive fields tuned to highly category-specific parts (e.g., police car sirens). VsNet also better captured observed attentional guidance behavior, and achieved higher classification accuracy, than comparable CNN models (AlexNet, Deep-HMAX), despite having fewer convolutional filters.
We conclude that CCFs are important for explaining search guidance, and that the best model for extracting CCFs is a deep neural network inspired by the primate visual system.
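
The definition of a CCF given above (a feature that occurs both frequently and consistently across a category's exemplars) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that "frequent and consistent" can be approximated by a high ratio of mean response to standard deviation across exemplars. The function name `select_ccfs`, the threshold value, and the toy responses are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def select_ccfs(responses, ratio_threshold=2.0):
    """Return indices of features with a high mean/SD ratio across exemplars.

    responses: list of exemplars, each a list of non-negative feature
        responses (for example, filter activations). Hypothetical input.
    ratio_threshold: hypothetical cutoff, not a value from the abstract.
    """
    n_features = len(responses[0])
    selected = []
    for j in range(n_features):
        column = [exemplar[j] for exemplar in responses]
        mu = mean(column)        # high mean: the feature occurs frequently
        sigma = pstdev(column)   # low SD: the feature occurs consistently
        if sigma > 0:
            ratio = mu / sigma
        elif mu > 0:
            ratio = float("inf")  # constant, non-zero response
        else:
            ratio = 0.0           # feature never responds
        if ratio > ratio_threshold:
            selected.append(j)
    return selected


# Toy data (assumed): 4 exemplars of one category, 3 candidate features.
toy_responses = [
    [0.9, 0.1, 0.5],
    [1.0, 0.0, 0.0],
    [0.8, 0.9, 0.6],
    [0.9, 0.0, 0.1],
]
ccfs = select_ccfs(toy_responses)
print(ccfs)  # feature 0 is both strong and stable across exemplars
```

Under this assumed criterion, the number of CCFs for a category is simply `len(ccfs)`, which is the kind of per-category count that the abstract relates to guidance strength.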

Meeting abstract presented at VSS 2017

