There has been ongoing debate concerning how quickly we can detect and process objects that do not fit the global gist of a scene, and whether initial eye movements can be modulated by the computation of such object–scene inconsistencies. Ever since the influential “octopus in farmyard” study by Loftus and Mackworth (1978) showed that scene inconsistencies can be detected early enough to affect initial eye movements, various research groups have either replicated this finding (e.g., Becker, Pashler, & Lubin, 2007; Bonitz & Gordon, 2008; Underwood & Foulsham, 2006; Underwood, Humphreys, & Cross, 2007; Underwood, Templeman, Lamming, & Foulsham, 2008) or found evidence against an early impact of scene inconsistencies on eye movement control (e.g., De Graef, Christiaens, & d'Ydewalle, 1990; Gareze & Findlay, 2007; Henderson, Weeks, & Hollingworth, 1999; Rayner, Castelhano, & Yang, 2009). The debate rests on the paradox that the gist of a scene can be perceived within a very short glance, while in the same amount of time only a few objects can be identified (e.g., Castelhano & Henderson, 2005; Henderson & Hollingworth, 1999a, 2003; Tatler, Gilchrist, & Rusted, 2003). The question therefore is whether object–scene inconsistencies can influence eye movement control prior to foveal processing of the inconsistent object, which, given the limited visual acuity in the periphery, would imply a semantic pop-out of such inconsistencies independent of foveal processing. Thus, finding an effect of object–scene inconsistency on early eye movements during scene viewing would argue that attention and gaze can be attracted without the need for full object identification, whereas finding effects only upon fixation of an inconsistent object would strengthen the claim that computing the inconsistency between the object and the scene requires a higher degree of object identification by means of foveal processing.