September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
Quantifying the latent semantic content of visual representations
Author Affiliations
  • Chihye Han
    Department of Cognitive Science, Johns Hopkins University
  • Caterina Magri
    Department of Cognitive Science, Johns Hopkins University
  • Michael Bonner
    Department of Cognitive Science, Johns Hopkins University
Journal of Vision September 2021, Vol.21, 2578. doi:
      Chihye Han, Caterina Magri, Michael Bonner; Quantifying the latent semantic content of visual representations. Journal of Vision 2021;21(9):2578.

      © ARVO (1962-2015); The Authors (2016-present)

How does visual cortex extract semantic meaning from images? We hypothesize that visual cortex leverages the natural covariance between perceptual features and semantic properties, and that it does so by representing perceptual features that support the efficient decoding of large numbers of semantic properties. Using convolutional neural networks (CNNs) and word embeddings (e.g., word2vec), we developed a statistical measure called semantic dimensionality that quantifies the number of language-derived semantic properties that can be decoded from a set of image-computable perceptual features. We combined this method with fMRI encoding models to estimate the semantic dimensionality of perceptual-feature tuning in the ventral visual stream. We first fit image-computable encoding models to object-evoked fMRI responses using mid-level features from pre-trained CNNs. Encoding-model performance reached the noise ceiling and thus fully accounted for explainable variance throughout much of occipitotemporal cortex. We then performed in silico experiments on the trained encoding models in regions of interest to estimate their semantic dimensionality. We found that encoding models of the ventral stream were tuned to subspaces of CNNs that yielded higher semantic dimensionality scores than other statistically matched subspaces of the same CNNs, and we found that semantic dimensionality increased from low-level to high-level regions in the cortical visual hierarchy. Thus, the efficiency of decoding language-derived semantic properties from perceptual features increases along the ventral stream and far exceeds what would be expected by chance. In ongoing work, we are investigating whether tuning for semantic dimensionality develops separately or comes “for free” when optimizing for object recognition. Overall, our approach provides a rigorous statistical assessment of the latent semantic content of perceptual feature representations, and, in doing so, it may help to resolve longstanding questions about the nature of perceptual and semantic tuning in visual cortex.
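
The abstract does not spell out how semantic dimensionality is computed, so the following is only a minimal sketch of one plausible reading: count how many orthogonal dimensions of a word-embedding space can be linearly decoded from a set of perceptual features under cross-validation. Every name and setting here (`semantic_dimensionality`, `ridge_weights`, the ridge penalty `alpha`, the correlation `threshold`, the use of principal components as the semantic dimensions) is an assumption made for illustration and is not taken from the authors' method.

```python
import numpy as np


def ridge_weights(X, Y, alpha):
    """Closed-form ridge solution W minimizing ||XW - Y||^2 + alpha * ||W||^2."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def semantic_dimensionality(features, embeddings, alpha=1.0, n_folds=5,
                            threshold=0.1, seed=0):
    """Count embedding dimensions decodable from perceptual features.

    features:   (n_items, n_features) array, e.g. CNN activations per image.
    embeddings: (n_items, n_dims) array, e.g. a word vector per image label.
    Returns (count, scores), where scores holds one cross-validated Pearson
    correlation per principal component of the embeddings.
    Hypothetical definition, not the authors' exact measure.
    """
    n_items = features.shape[0]

    # Rotate the embeddings onto their principal components so that the
    # decoded semantic dimensions are mutually orthogonal. For simplicity
    # the rotation is estimated on all items (assumption of this sketch).
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    targets = centered @ vt.T

    # K-fold cross-validated linear decoding of every component at once.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_items), n_folds)
    predictions = np.zeros_like(targets)
    for test_idx in folds:
        train = np.ones(n_items, dtype=bool)
        train[test_idx] = False
        x_mean = features[train].mean(axis=0)
        y_mean = targets[train].mean(axis=0)
        W = ridge_weights(features[train] - x_mean,
                          targets[train] - y_mean, alpha)
        predictions[test_idx] = (features[test_idx] - x_mean) @ W + y_mean

    # Score each component by the correlation between held-out predictions
    # and true values, then count the components above the threshold.
    scores = np.array([np.corrcoef(predictions[:, k], targets[:, k])[0, 1]
                       for k in range(targets.shape[1])])
    return int(np.sum(scores > threshold)), scores


if __name__ == "__main__":
    # Synthetic stand-ins for CNN features and word embeddings.
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(300, 20))
    decodable = feats @ rng.normal(size=(20, 3))   # driven by the features
    unrelated = rng.normal(size=(300, 7))          # independent of them
    emb = np.hstack([decodable, unrelated])
    count, scores = semantic_dimensionality(feats, emb, threshold=0.3)
    print(count, np.round(scores, 2))
```

Under this reading, the comparison against statistically matched subspaces described in the abstract would amount to computing the same score on control feature sets and comparing the results; how those controls are constructed is not specified in the abstract and is not sketched here.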