Abstract
During visual search, selective attention shifts the representation of visual semantic information towards the attended category (Cukur et al., 2013). Here we sought to determine whether semantic tuning shifts occur intrinsically during naturalistic tasks. We used fMRI to record brain activity from five participants while they performed a taxi-driver task in a large virtual world (110-180 minutes of data per participant). Participants freely viewed the stimulus while eye movements were recorded at 60 Hz. The video game engine provided ground-truth semantic segmentation of video frames. Attended semantics were operationally defined as the semantic content of the video frame within 2.5° of fixation, which was used as a proxy for top-down attention. Global semantics were operationally defined as the semantic content of the entire video frame. Banded ridge regression (Nunez-Elizalde et al., 2019; Dupré la Tour et al., 2022) was used to estimate voxelwise encoding models simultaneously for attended semantics and for global semantics, along with 31 additional feature spaces that captured other aspects of the taxi-driver task. A held-out dataset was used to test statistical significance, prediction accuracy, and generalization. The participants’ visual behavior caused the content of attended semantics to differ dramatically from that of global semantics. Analysis of the voxelwise encoding models shows that attended semantics accounts for 12 ± 4% (mean ± std across participants) of the total explained variance in well-predicted voxels. In contrast, global semantics accounts for only 1 ± 0.2% of the total explained variance. These results suggest that visual behavior in a naturalistic task focuses attention on task-relevant semantic categories, and that visual semantic representations in the brain are further biased to favor task-relevant categories at the expense of categories that are not important for the task.
Thus, attentional effects must be accounted for in studies of naturalistic tasks.
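
The operational definitions of attended and global semantics can be illustrated with a minimal sketch. It assumes each frame is an integer label map from the segmentation, that fixation is given in pixel coordinates, and that each feature is the proportion of pixels per category; the function name `semantic_features`, the `px_per_deg` conversion factor, the category IDs, and the toy frame are all hypothetical and are not taken from the study.

```python
import numpy as np

def semantic_features(labels, fix_xy, px_per_deg, n_categories, radius_deg=2.5):
    """Return (attended, global) category-proportion vectors for one frame.

    Hypothetical feature encoding: proportion of pixels per semantic category.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fx, fy = fix_xy
    # Circular window around fixation; radius converted from degrees to pixels.
    radius_px = radius_deg * px_per_deg
    mask = (xs - fx) ** 2 + (ys - fy) ** 2 <= radius_px ** 2
    attended = np.bincount(labels[mask], minlength=n_categories).astype(float)
    global_ = np.bincount(labels.ravel(), minlength=n_categories).astype(float)
    # Guard against an empty window (e.g. fixation off-screen).
    return attended / max(attended.sum(), 1.0), global_ / global_.sum()

# Toy frame (assumed IDs): 0 = road on the left, 1 = building on the right,
# 2 = a small pedestrian patch at the center of the frame.
frame = np.zeros((100, 100), dtype=int)
frame[:, 50:] = 1
frame[45:55, 45:55] = 2

attended, global_ = semantic_features(
    frame, fix_xy=(50, 50), px_per_deg=4.0, n_categories=3
)
print("attended:", attended)
print("global:  ", global_)
```

In this toy frame the fixated pedestrian patch covers 1% of the whole image but a much larger share of the 2.5° window, which is the kind of divergence between the two feature spaces that the abstract describes.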