Abstract
Previous studies investigating semantic representations in the human brain present conflicting results regarding the localisation of such representations and whether they are distributed or modular. This discrepancy is possibly due to the limited sensitivity and/or limited stimulus sampling of past experiments. We conducted a large-scale, high-field fMRI experiment (7T, whole-brain, gradient-echo EPI, 1.8 mm, 1.6 s) in which 8 subjects each viewed 9,000–10,000 colour natural scenes (taken from Microsoft COCO) while fixating and performing a continuous recognition task. We exploited this unprecedented dataset, termed the Natural Scenes Dataset (NSD), to investigate semantic representations in the brain using representational similarity analysis (RSA). Capitalising on the large array of naturalistic scenes, we obtained human-derived semantic labels (sentence descriptions) from COCO and applied smooth inverse frequency sentence embeddings to construct semantic representational dissimilarity matrices (RDMs). We then used searchlight-based RSA in NSD to identify brain regions whose representational geometry closely matches the semantic RDMs. Significance was assessed using permutation testing. In each subject, we measured highly significant and reliable (t-values reaching 88) semantic representations in a widely distributed network including the ventral visual stream, the angular gyrus and temporoparietal junction, the medial and anterior temporal lobes (anterior hippocampus, perirhinal cortex), and the inferior frontal gyrus. In a separate analysis, we constructed RDMs based on a recurrent convolutional neural network (RCNN) model of object recognition and found that these RDMs mapped to a strikingly similar network of brain regions, albeit with weaker correlations. The immense scale of NSD allows precise quantification of semantic representations in individual subjects across the entire human brain. The promising performance levels achieved by the RCNN indicate that NSD can support further detailed investigations of the types of semantic brain representations that can and cannot be captured by current computational models of visual scene interpretation.
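The core analysis step described above can be summarised as follows: a semantic RDM is computed from pairwise distances between sentence embeddings, a neural RDM is computed from pairwise distances between voxel response patterns within a searchlight, and the two RDMs are rank-correlated, with significance assessed by permuting stimulus labels. The sketch below illustrates this logic under stated assumptions; it is not the authors' implementation, the array names (`embeddings`, `voxel_patterns`) are hypothetical, and the SIF embedding step and the searchlight loop over brain locations are omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def rsa_with_permutation_test(embeddings, voxel_patterns, n_perm=1000):
    """Correlate a semantic RDM with one searchlight's neural RDM and assess
    significance with a stimulus-label permutation test (illustrative sketch).

    embeddings      : (n_stimuli x d) array of sentence embeddings (hypothetical input)
    voxel_patterns  : (n_stimuli x n_voxels) array of responses from one searchlight
    """
    # Model RDM: pairwise cosine distances between sentence embeddings.
    semantic_rdm = pdist(embeddings, metric="cosine")
    # Neural RDM: pairwise correlation distances between voxel response patterns.
    brain_rdm = pdist(voxel_patterns, metric="correlation")
    # RSA statistic: Spearman rank correlation between the two condensed RDMs.
    observed, _ = spearmanr(semantic_rdm, brain_rdm)

    # Null distribution: jointly permute rows and columns of the semantic RDM,
    # which is equivalent to shuffling the stimulus labels.
    semantic_square = squareform(semantic_rdm)
    n = semantic_square.shape[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)
        permuted = squareform(semantic_square[np.ix_(perm, perm)], checks=False)
        null[i], _ = spearmanr(permuted, brain_rdm)

    # One-sided p-value with the standard +1 correction.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

In the full analysis, a comparison of this kind would be repeated for every searchlight centre across the brain to produce the subject-specific maps of semantic representation described above.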