Despite the substantial influence of the saliency-map approach on research in scene perception, it is well established that the semantic content of a scene and the viewer's task also influence viewing (Buswell,
1935; Yarbus,
1967). Indeed, when directly tested, image salience often does a poor job of accounting for attention in real-world scene viewing (Einhäuser, Rutishauser, & Koch,
2008; Henderson, Brockmole, Castelhano, & Mack,
2007; Henderson, Malcolm, & Schandl,
2009; Tatler, Hayhoe, Land, & Ballard,
2011; Underwood, Foulsham, & Humphrey,
2009). To account for these observations, cognitive-guidance models place primary emphasis on cognitive control of attention. In this view, attention is pushed by the cognitive system to scene regions that are semantically informative and cognitively relevant in the current situation (Henderson,
2007). For example, in the cognitive-relevance model (Henderson et al.,
2007; Henderson et al.,
2009), attention is guided by semantic representations that code the meaning of the scene and its local regions (objects, surfaces, and other interpretable entities) with respect to the viewer's current goals and task (Buswell,
1935; Hayhoe & Ballard,
2005; Hayhoe, Shrivastava, Mruczek, & Pelz,
2003; Henderson,
2003;
Henderson, 2007;
Henderson, 2017; Henderson & Hollingworth,
1999; Land & Hayhoe,
2001; Rothkopf, Ballard, & Hayhoe,
2007; Tatler et al.,
2011; Turano, Geruschat, & Baker,
2003; Võ & Wolfe,
2013; Yarbus,
1967). The cognitive-relevance model posits that the representations used to assign task relevance and meaning for attentional priority encode knowledge about the world itself (world knowledge) as well as knowledge about the general scene concept (scene schema knowledge) and the current scene instance (episodic scene knowledge; Henderson & Ferreira,
2004; Henderson & Hollingworth,
1999).