Abstract
Building quantitative models of neural activity in the visual system is a long-standing goal in neuroscience. This research program has historically been limited by the small scale and low signal-to-noise ratio of most existing datasets, but with the advent of large-scale datasets it has become possible to build, test, and discriminate among increasingly expressive competing models of neural representation. In this talk I will describe how the scale of the 7T fMRI Natural Scenes Dataset (NSD) has made possible novel insights into the mechanisms underlying scene perception. We harnessed recent advances in natural language processing to construct models that capture progressively richer semantic information, ranging from object categories to word embeddings to scene captions. Our findings reveal a positive correlation between a model's capacity to capture semantic information and its ability to predict NSD data, a result we then replicated with recurrent convolutional networks trained to predict sentence embeddings from visual inputs. Together, this evidence suggests that the visual system as a whole is better characterized as extracting rich semantic information from visual inputs than as merely cataloging object inventories. Given the substantial statistical power of NSD, it is highly appealing to collect additional neuroimaging and behavioral data using the same image set. We are expanding NSD through the development of two new datasets: an electroencephalography dataset, NSD-EEG, and a mental imagery vividness-ratings dataset, NSD-Vividness. Datasets like NSD not only provide fresh insights into the visual system but also inspire the development of new datasets in the field.
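For readers unfamiliar with the encoding-model framework implied above, the following is a minimal sketch, not the actual analysis pipeline: it regresses voxel responses onto semantic embeddings with cross-validated ridge regression and scores held-out prediction accuracy per voxel. All data here are synthetic stand-ins; in practice, NSD betas and real caption embeddings (e.g., from a sentence encoder) would take their place.

```python
# Minimal encoding-model sketch: predict per-voxel responses from
# semantic (caption) embeddings via ridge regression, then score
# held-out prediction accuracy per voxel. Synthetic data throughout.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, embed_dim, n_voxels = 1000, 512, 200

X = rng.standard_normal((n_images, embed_dim))       # stand-in caption embeddings
W_true = rng.standard_normal((embed_dim, n_voxels))  # hypothetical ground-truth weights
Y = X @ W_true + 5.0 * rng.standard_normal((n_images, n_voxels))  # noisy "voxel" data

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# Ridge with cross-validated regularization; one weight vector per voxel.
model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Per-voxel Pearson correlation between predicted and held-out responses,
# a common summary of encoding-model performance.
r = [np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"median held-out correlation: {np.median(r):.3f}")
```

Under this framework, the abstract's central claim corresponds to richer embedding spaces (captions rather than object labels) yielding higher held-out correlations across visual cortex.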