Abstract
In vision science, a topic of great interest and controversy is the processing of objects that are (in)consistent with the overall meaning of the scene in which they occur. Two interrelated questions are at the core of this debate: first, how much time is needed to determine the meanings and identities of objects in a scene, and second, whether foveal vision is required to access object-scene semantics. Results from eye-movement studies have been mixed, ranging from rapid processing of semantic information in peripheral vision to a lack of evidence for extrafoveal semantic processing. Here, we aimed to contribute to this debate by co-registering eye movements and EEG during unconstrained viewing of naturalistic scenes. Participants inspected photographs of real-world scenes (e.g. a bathroom) in which the consistency of a target object (e.g. toothpaste vs. flashlight) was systematically manipulated in preparation for a later change detection task. The EEG was analyzed using linear deconvolution modelling to examine the fixation-related brain potentials (FRPs) evoked by the first fixation on the target object (t), the preceding fixation (t-1), as well as all other non-target fixations on the scene. The eye-movement data showed that inconsistent objects were prioritized over consistent objects (looked at earlier) and were more effortful to process (looked at longer). The FRPs demonstrated that the frontocentral N400 effect for scenes generalizes from steady-fixation paradigms to natural viewing. Crucially, this fixation-related N400 emerged during the pre-target fixation t-1 and continued throughout fixation t, which suggests that the extraction of object semantics already began in extrafoveal vision. However, consistency had no significant effect on the probability of immediate target fixation or on the ERP aligned to scene onset, which suggests that object-scene semantics was not accessed immediately. Taken together, these results emphasize the usefulness of combined EEG/eye-movement recordings for understanding high-level scene processing.
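To illustrate the analysis approach named in the abstract, the sketch below shows the basic logic of linear deconvolution of overlapping fixation-related potentials: every EEG sample is modelled as the sum of all temporally overlapping responses via a time-expanded design matrix, and the per-condition waveforms are recovered by least squares. This is a minimal, hypothetical Python sketch for a single channel with simple dummy coding; the function and parameter names are illustrative assumptions, not the toolbox or settings used in the study.

```python
import numpy as np

def deconvolve_frps(eeg, fixation_onsets, conditions, srate=500, window=(-0.2, 0.8)):
    """Estimate overlap-corrected fixation-related potentials (FRPs) for one
    EEG channel by time-expanded linear regression (deconvolution).

    eeg             : 1-D array, continuous EEG signal for a single channel
    fixation_onsets : sample indices of fixation onsets
    conditions      : integer condition label per fixation (0 .. n_cond-1)
    window          : estimation window around each fixation onset, in seconds
    """
    n_samples = eeg.shape[0]
    lags = np.arange(int(window[0] * srate), int(window[1] * srate))
    n_lags = len(lags)
    n_cond = int(np.max(conditions)) + 1

    # Time-expanded design matrix: one column per (condition, time-lag) pair.
    X = np.zeros((n_samples, n_cond * n_lags))
    for onset, cond in zip(fixation_onsets, conditions):
        for j, lag in enumerate(lags):
            t = onset + lag
            if 0 <= t < n_samples:
                X[t, cond * n_lags + j] = 1.0

    # Ordinary least squares: overlapping responses are disentangled because
    # each sample is expressed as the sum of all FRPs that overlap at that time.
    betas, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    return betas.reshape(n_cond, n_lags)  # one regression-ERP per condition
```

In practice, such models typically also include predictors for saccade properties and spline-coded covariates, which the sketch omits for brevity.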
Acknowledgement: This research was supported by the Leverhulme Trust (award number: ECF-014-205) and Fundação para a Ciência e Tecnologia (award number: PTDC/PSI-ESP/30958/2017) to MIC.