December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract  |   December 2022
Interpretable mid-level encoding models of human visual cortex reveal associations between feature and semantic tuning for natural scene images
Author Affiliations & Notes
  • Margaret Henderson
    Machine Learning Department, Carnegie Mellon University
    Neuroscience Institute, Carnegie Mellon University
    Psychology Department, Carnegie Mellon University
    Center for the Neural Basis of Cognition (CNBC), Carnegie Mellon University
  • Michael Tarr
    Neuroscience Institute, Carnegie Mellon University
    Psychology Department, Carnegie Mellon University
    Center for the Neural Basis of Cognition (CNBC), Carnegie Mellon University
  • Leila Wehbe
    Machine Learning Department, Carnegie Mellon University
    Neuroscience Institute, Carnegie Mellon University
    Center for the Neural Basis of Cognition (CNBC), Carnegie Mellon University
  • Footnotes
    Acknowledgements  This work was supported by the Carnegie Mellon Neuroscience Institute. Collection of the NSD dataset was supported by NSF IIS-1822683 and NSF IIS-1822929.
Journal of Vision December 2022, Vol.22, 4118. doi:https://doi.org/10.1167/jov.22.14.4118
Margaret Henderson, Michael Tarr, Leila Wehbe; Interpretable mid-level encoding models of human visual cortex reveal associations between feature and semantic tuning for natural scene images. Journal of Vision 2022;22(14):4118. https://doi.org/10.1167/jov.22.14.4118.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

Populations of neurons in the ventral visual stream show preferential activation for specific categories, such as faces and buildings, as well as tuning for low- and mid-level visual features, such as spatial frequency, orientation, curvature, and color. Because these visual features tend to co-vary with semantic content in the statistics of natural images, one hypothesis is that the visual system uses the extraction of lower-level visual features as a mechanism for separating the representations of images with different high-level semantic meaning. Here, we investigate this question using a publicly available fMRI dataset in which participants (n=8) viewed a large number of naturalistic scene images (Natural Scenes Dataset; Allen et al., 2021). We constructed several voxel-wise encoding models that explicitly model sets of low- and mid-level visual features, including a Gabor model, a model of texture statistics based on a steerable pyramid representation (Portilla & Simoncelli, 2000), and a contour model (Sketch Tokens; Lim, Zitnick, & Dollár, 2013), as well as a semantic model based on high-level image properties (e.g., animacy). Our encoding models accurately predicted held-out voxel responses in a range of early and high-level visual cortical areas, and exhibited a substantial amount of shared variance with AlexNet, a deep neural network (DNN) model that has commonly been used to model ventral stream areas. In addition, the high degree of interpretability of our models permitted us to investigate voxels' selectivity for particular feature values, and how these feature preferences relate to the semantic information carried by each visual feature. Overall, our results suggest a framework in which the low- and mid-level feature tuning of visual cortical populations supports the separation of images according to their semantic meaning, and this separation increases with progressive stages of processing in the ventral visual stream.
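To illustrate the general form of a voxel-wise encoding analysis like the one described above, the sketch below fits a ridge-regression model mapping image feature values to voxel responses and evaluates prediction accuracy on held-out images. This is a minimal, hypothetical example on simulated data, not the authors' actual pipeline: the feature matrix, regularization choice, and train/test split are all illustrative assumptions.

```python
import numpy as np

def fit_ridge_encoding_model(X_train, Y_train, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y.
    Columns of W are per-voxel feature weights."""
    n_features = X_train.shape[1]
    return np.linalg.solve(
        X_train.T @ X_train + alpha * np.eye(n_features),
        X_train.T @ Y_train,
    )

# Simulated stand-in data (hypothetical; real analyses use image-computable
# features, e.g. Gabor or texture statistics, and measured fMRI responses).
rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 200, 12, 5
X = rng.standard_normal((n_images, n_features))       # feature values per image
W_true = rng.standard_normal((n_features, n_voxels))  # ground-truth voxel tuning
Y = X @ W_true + 0.1 * rng.standard_normal((n_images, n_voxels))

# Split into training images and held-out test images.
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

W = fit_ridge_encoding_model(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W

# Per-voxel prediction accuracy: Pearson r between predicted and
# measured responses on held-out images.
r = [np.corrcoef(Y_test[:, v], Y_pred[:, v])[0, 1] for v in range(n_voxels)]
```

Because the model is linear in interpretable features, the fitted weights in `W` can be inspected directly to characterize each voxel's feature selectivity.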
