Abstract
How does visual cortex extract semantic meaning from images? We hypothesize that visual cortex leverages the natural covariance between perceptual features and semantic properties, and that it does so by representing perceptual features that support the efficient decoding of large numbers of semantic properties. Using convolutional neural networks (CNNs) and word embeddings (e.g., word2vec), we developed a statistical measure called semantic dimensionality that quantifies the number of language-derived semantic properties that can be decoded from a set of image-computable perceptual features. We combined this method with fMRI encoding models to estimate the semantic dimensionality of perceptual-feature tuning in the ventral visual stream. We first fit image-computable encoding models to object-evoked fMRI responses using mid-level features from pre-trained CNNs. Encoding-model performance reached the noise ceiling and thus fully accounted for the explainable variance throughout much of occipitotemporal cortex. We then performed in silico experiments on the trained encoding models in regions of interest to estimate their semantic dimensionality. We found that encoding models of the ventral stream were tuned to subspaces of CNNs that yielded higher semantic dimensionality scores than other statistically matched subspaces of the same CNNs, and that semantic dimensionality increased from low-level to high-level regions in the cortical visual hierarchy. Thus, the efficiency of decoding language-derived semantic properties from perceptual features increases along the ventral stream and far exceeds what would be expected by chance. In ongoing work, we are investigating whether tuning for semantic dimensionality develops separately or comes “for free” when optimizing for object recognition. Together, our approach provides a rigorous statistical assessment of the latent semantic content of perceptual feature representations and, in doing so, may help to resolve longstanding questions about the nature of perceptual and semantic tuning in visual cortex.
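
The idea of counting how many semantic properties can be decoded from a set of perceptual features can be illustrated with a minimal sketch. This is not the estimator used in the study, whose details are not given in this abstract. The sketch assumes cross-validated ridge regression as the decoder, treats each embedding column as one semantic property, and counts a property as decodable when its out-of-fold Pearson correlation exceeds an arbitrary threshold. The function names, the threshold of 0.3, the ridge penalty, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Fit a ridge decoder on the training split and predict the test split."""
    x_mean = x_train.mean(axis=0)
    y_mean = y_train.mean(axis=0)
    xc = x_train - x_mean
    yc = y_train - y_mean
    n_features = xc.shape[1]
    # Closed-form ridge solution: (X'X + alpha * I)^-1 X'Y
    weights = np.linalg.solve(xc.T @ xc + alpha * np.eye(n_features), xc.T @ yc)
    return (x_test - x_mean) @ weights + y_mean


def decodable_dimension_count(features, embeddings, n_folds=5, alpha=1.0,
                              r_threshold=0.3, seed=0):
    """Count embedding columns decodable from features (illustrative proxy only).

    Returns the count and the per-column out-of-fold Pearson correlations.
    """
    rng = np.random.default_rng(seed)
    n_samples = features.shape[0]
    folds = np.array_split(rng.permutation(n_samples), n_folds)

    # Collect out-of-fold predictions so every sample is predicted exactly once.
    predictions = np.zeros_like(embeddings, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n_samples, dtype=bool)
        train_mask[test_idx] = False
        predictions[test_idx] = ridge_fit_predict(
            features[train_mask], embeddings[train_mask], features[test_idx], alpha
        )

    # Pearson correlation between predicted and true values, per column.
    pc = predictions - predictions.mean(axis=0)
    ec = embeddings - embeddings.mean(axis=0)
    r = (pc * ec).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (ec ** 2).sum(axis=0))
    return int((r > r_threshold).sum()), r


# Synthetic stand-ins (assumption): 300 "images", 20 perceptual features,
# and 10 embedding columns of which only the first 5 depend on the features.
demo_rng = np.random.default_rng(0)
demo_features = demo_rng.standard_normal((300, 20))
demo_mixing = demo_rng.standard_normal((20, 5))
demo_signal = demo_features @ demo_mixing + 0.1 * demo_rng.standard_normal((300, 5))
demo_noise = demo_rng.standard_normal((300, 5))
demo_embeddings = np.hstack([demo_signal, demo_noise])

demo_count, demo_r = decodable_dimension_count(demo_features, demo_embeddings)
print("decodable columns:", demo_count)
print("per-column r:", np.round(demo_r, 2))
```

In an actual analysis, the feature matrix would hold CNN activations (or an encoding model's tuned subspace of them) and the embedding matrix would hold word-embedding vectors for the depicted objects. A comparison against statistically matched control subspaces, as described above, would require additional machinery that this sketch omits.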