September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | September 2019
Modeling voxel visual selectivities through convolutional neural network clustering
Author Affiliations & Notes
  • Daniel D Leeds
    Computer and Information Sciences, Fordham University
  • Amy Feng
    Computer and Information Sciences, Fordham University
Journal of Vision September 2019, Vol.19, 115b. doi:

      Daniel D Leeds, Amy Feng; Modeling voxel visual selectivities through convolutional neural network clustering. Journal of Vision 2019;19(10):115b.

      © ARVO (1962-2015); The Authors (2016-present)

Visual properties used in cortical perception are subject to ongoing study, and features of intermediate complexity are particularly elusive. Recent work has used layers of convolutional neural networks (CNNs) to predict cortical activity in visual regions of the brain (e.g., Yamins 2014). Understanding the visual properties captured by CNN models can suggest similar structures represented in the brain. We use layers 2 through 5 of AlexNet (Krizhevsky 2012, trained on ImageNet) to identify candidate visual groupings. At each layer, we group image patches from ImageNet (Deng 2009) based on the corresponding pattern of CNN unit responses (Leeds 2017). We study the image patches in the resulting clusters for similarity in unit responses and for intuitive visual/semantic consistency, based on labels from five subjects. We additionally assess whether the clusters improve prediction of single-voxel responses to visual stimuli, measured from separate subjects (Kay 2008). For each CNN layer, we use each cluster's average unit response pattern as a candidate set of weights to predict voxel activity from the activity of all CNN units. We correlate cluster-based stimulus responses with voxel responses across ventral temporal cortex. For all four CNN layers studied, cluster-based stimulus responses strongly correlate (r > 0.3) with voxels in mid-level visual regions: V4, LO, and IT. Correlations are larger at higher CNN layers. Within each layer, there is significant correlation between cluster density (similarity of CNN responses to patches within the cluster) and voxel correlation magnitude. However, subjects consistently agree less on the reported image patch qualities for high-correlation clusters than for low-correlation clusters. Frequently occurring "properties" include texture, color, and full objects.
In intermediate cortical vision, voxels may tune for complex mixtures of shade and texture properties less intuitive to human observers, but still uncovered through trained computer vision models.
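
The cluster-based prediction described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or give a formula for cluster density, so plain k-means and mean pairwise Pearson correlation are used here as stand-ins, and random arrays replace the AlexNet unit responses and the Kay (2008) voxel data. All function and variable names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every patch to every center: (n, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers

def cluster_density(X, labels, j):
    """Assumed density measure: mean pairwise Pearson r within cluster j."""
    members = X[labels == j]
    if len(members) < 2:
        return np.nan
    c = np.corrcoef(members)
    upper = np.triu_indices(len(members), k=1)
    return c[upper].mean()

def cluster_voxel_correlations(centers, stim_units, voxels):
    """Pearson r between each cluster-weighted response and each voxel."""
    # Each cluster's average unit pattern acts as a weight vector over units.
    pred = stim_units @ centers.T                      # (n_stimuli, k)
    pz = (pred - pred.mean(axis=0)) / pred.std(axis=0)
    vz = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
    return pz.T @ vz / len(pred)                       # (k, n_voxels)

# Synthetic placeholders for one CNN layer (not real data).
rng = np.random.default_rng(0)
patch_units = rng.normal(size=(300, 20))  # image patches x CNN units
stim_units = rng.normal(size=(120, 20))   # stimuli x CNN units
voxels = rng.normal(size=(120, 8))        # stimuli x voxels

k = 5
labels, centers = kmeans(patch_units, k)
r = cluster_voxel_correlations(centers, stim_units, voxels)
density = np.array([cluster_density(patch_units, labels, j) for j in range(k)])

# Strongest voxel correlation per cluster, to compare against density.
peak_r = np.abs(r).max(axis=1)
density_vs_peak = np.corrcoef(density, peak_r)[0, 1]
```

With real data, `patch_units` would hold one layer's responses to ImageNet patches and `stim_units` the same layer's responses to the stimuli shown in the scanner. Repeating the analysis per layer would allow the layer-wise comparison the abstract reports.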

Acknowledgement: Fordham University Faculty Research Grant to DDL in 2016 Clare Boothe Luce Scholarship to AF 
