Gregory Zelinsky, Chen-Ping Yu; Modeling categorical search guidance using a convolutional neural network designed after the ventral visual pathway. Journal of Vision 2017;17(10):88. doi: 10.1167/17.10.88.
Category-consistent features (CCFs) are those features occurring both frequently and consistently across the exemplars of an object category. Recently, we showed that a CCF-based generative model captured the overt attentional guidance of people searching for name-cued target categories from a 68-category subordinate/basic/superordinate-level hierarchy (Yu, Maxfield, & Zelinsky, 2016, Psychological Science). Here we extend this work by selecting CCFs for the same 68 target categories using an 8-layer Convolutional Neural Network (CNN) designed to reflect areas, receptive field (kernel) sizes, and bypass connections in the primate ventral stream (VsNet). We replicated our previously reported finding that the number of CCFs, averaged over categories at each hierarchical level, explains the subordinate-level guidance advantage observed in gaze time-to-target. However, we now show stronger guidance to individual categories for which more CNN-CCFs were extracted (r=0.29, p=0.01). We also found that CCFs extracted from VsNet's V1-V2 layers were most important for guidance to subordinate-cued targets (police cars); CCFs from TEO-V4 and V4-TE contributed most to guidance at the basic (car) and superordinate (vehicle) levels, respectively. This pattern suggests a broad coding of priority throughout the ventral stream that varies with cue specificity: early visual areas contribute more to subordinate-level guidance, while less specific cues engage later areas coding representative parts of the target category. Converging evidence for this suggestion was obtained by finding the image patches eliciting the strongest filter responses and showing that these units from areas V4 and higher had receptive fields tuned to highly category-specific parts (police car sirens). VsNet also better captured observed attentional guidance behavior, and achieved higher classification accuracy, than comparable CNN models (AlexNet, Deep-HMAX), despite VsNet having fewer convolutional filters.
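The definition of CCFs given above, features that respond both frequently (high mean activation) and consistently (low variability) across a category's exemplars, can be illustrated with a minimal sketch. The mean-to-standard-deviation ranking used here is an assumption for illustration, not the exact selection criterion of the model; the function name `select_ccfs` and the `top_k` parameter are likewise hypothetical.

```python
import statistics

def select_ccfs(exemplar_features, top_k=3):
    """Rank features by a frequency/consistency score across exemplars.

    exemplar_features: list of per-exemplar feature-activation vectors,
    all of equal length. A feature scores highly when its activation is
    large on average (frequent) and stable across exemplars (consistent).
    NOTE: the mean/SD score is an illustrative assumption, not the
    published CCF criterion.
    """
    n_feats = len(exemplar_features[0])
    scores = []
    for j in range(n_feats):
        vals = [ex[j] for ex in exemplar_features]
        mean = statistics.fmean(vals)       # how frequently the feature fires
        sd = statistics.pstdev(vals)        # how much it varies across exemplars
        scores.append((j, mean / (sd + 1e-6)))
    scores.sort(key=lambda pair: -pair[1])
    return [j for j, _ in scores[:top_k]]

# Toy example: feature 0 is strong and stable, feature 1 is noisy,
# feature 2 is weak but stable.
exemplars = [[1.0, 0.9, 0.1],
             [1.0, 0.1, 0.1],
             [1.0, 0.5, 0.1]]
print(select_ccfs(exemplars, top_k=1))  # → [0]
```

Under this scoring, a feature that fires strongly on every exemplar of a category outranks one that fires strongly but inconsistently, which matches the intuition that the number of such surviving features could differ systematically across levels of a category hierarchy.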
We conclude that CCFs are important for explaining search guidance, and that the best model for extracting CCFs is a deep neural network inspired by the primate visual system.
Meeting abstract presented at VSS 2017