Eye movements are also influenced by many other cognitive factors not predicted by feature saliency. For example, in the presence of a task, eye movements depend on the task demands and the observer's internal goals (Buswell,
1935; Hayhoe & Ballard,
2005; Land,
2009; Land & Hayhoe,
2001; Noton & Stark,
1971; Rothkopf et al.,
2007; Turano, Geruschat, & Baker,
2003; Yarbus,
1967). Contextual knowledge based on the co-occurrence of objects (e.g., a plate on a dining table) and semantic content of the scene can facilitate the selection of attentional targets and bias gaze strategy (Eckstein, Drescher, & Shimozaki,
2006; Henderson et al.,
1999; Neider & Zelinsky,
2006; Torralba et al.,
2006). In fact, it has been argued that bottom-up saliency does not necessarily drive eye movements causally, as the local image statistics underlying saliency are also correlated with higher-level scene content (such as semantic informativeness; Einhäuser & König,
2003; Einhäuser, Spain, & Perona,
2008; Henderson, Brockmole, Castelhano, & Mack,
2007). Several models for predicting fixation locations incorporate both bottom-up and top-down elements. In some implementations, saliency maps are selectively modulated by information that reflects top-down control or prior expectations (e.g., about the location or features of a target object), based on knowledge of a task or an understanding of scene gist (Navalpakkam & Itti,
2005; Oliva, Torralba, Castelhano, & Henderson,
2003; Peters & Itti,
2007; Torralba et al.,
2006). In other implementations, a probabilistic model learns preattentive targets from scene statistics, thereby combining both bottom-up saliency and top-down biases (Butko, Zhang, Cottrell, & Movellan,
2008; Kanan, Tong, Zhang, & Cottrell,
2009; Yamada & Cottrell,
1995; Zhang, Tong, & Cottrell,
2009). Alternatively, some models integrate both saliency and top-down information at the level of object representation (Sun, Fisher, Wang, & Gomes,
2008; Wischnewski, Belardinelli, Schneider, & Steil,
2010), reflecting the hypothesis that the “proto-object” (i.e., the position and a cluster of features relevant to an object) represents the basic unit for prioritizing attention (Einhäuser et al.,
2008; Hollingworth & Henderson,
2002; Scholl,
2001). The combination of bottom-up and top-down information outperforms purely bottom-up models when fixations are of immediate behavioral relevance, such as during search tasks (Kanan et al.,
2009; Navalpakkam & Itti,
2005; Oliva et al.,
2003; Torralba et al.,
2006) or tasks involving interactive viewing (e.g., video game playing; Peters & Itti,
2007). Finally, socially relevant cues not predicted by saliency models, such as faces, gaze direction, and body movement, also serve as powerful predictors of eye movements (Birmingham et al.,
2008; Friesen & Kingstone,
1998; Shepherd et al.,
2010).
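As a schematic illustration only (not the implementation of any model cited above), the multiplicative modulation of a bottom-up saliency map by a top-down spatial prior can be sketched as a pointwise product followed by renormalization; the function name, toy maps, and normalization constant below are assumptions for the example:

```python
import numpy as np

def combined_priority_map(saliency, prior, eps=1e-12):
    """Combine a bottom-up saliency map with a top-down spatial
    prior by pointwise multiplication, renormalized to sum to 1.
    This is an illustrative sketch, not a cited model's code."""
    combined = saliency * prior
    return combined / (combined.sum() + eps)

# Toy 3x3 example: uniform bottom-up saliency, top-down prior
# (e.g., from scene gist) favoring the center row of the image.
saliency = np.full((3, 3), 1.0 / 9)
prior = np.array([[0.1] * 3, [0.8] * 3, [0.1] * 3]) / 3.0

priority = combined_priority_map(saliency, prior)
# The predicted fixation falls where the combined map peaks.
fix_row, fix_col = np.unravel_index(priority.argmax(), priority.shape)
```

Even with uniform bottom-up saliency, the top-down prior shifts the peak of the combined map toward the expected region, which is the qualitative behavior the hybrid models above share.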