Humans have access to high-level scene information at very brief presentation times (Fei-Fei, Iyer, Koch, & Perona, 2007; Potter & Levy, 1969), even when attention is engaged with another task (Li, VanRullen, Koch, & Perona, 2002). This fast extraction of high-level scene information, sometimes referred to as "gist," is purported to serve the subsequent biasing of visual attention for tasks such as navigation (Greene & Oliva, 2009b), avoidance of threats (Righart & de Gelder, 2008), and visual search (Castelhano & Henderson, 2007; Torralba, Oliva, Castelhano, & Henderson, 2006; Wolfe, Võ, Evans, & Greene, 2011), as well as providing context for more detailed visual exploration (Bar, 2004) and memory encoding (Brockmole, Castelhano, & Henderson, 2006; Chun & Jiang, 1998; Hollingworth, Williams, & Henderson, 2001). Torralba et al. (2006) explored the interaction of physical salience, global scene properties, and specific task in visual search, showing that combining the three contributions in a Bayesian framework predicts eye movements within novel scenes.
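Schematically, this kind of Bayesian combination scores each image location $\mathbf{x}$ as the product of a bottom-up salience term (the rarity of local features $l_{\mathbf{x}}$ given global gist features $G$) and a task-dependent prior on where a target $O$ tends to appear in scenes with that gist; the notation here is an illustrative sketch rather than a quotation of the original formulation:

\[
S(\mathbf{x}) \;\propto\; \underbrace{\frac{1}{p(l_{\mathbf{x}} \mid G)}}_{\text{bottom-up salience}} \;\cdot\; \underbrace{p(\mathbf{x} \mid O = 1, G)}_{\text{contextual prior on target location}}
\]

Under this scheme, locations that are both locally distinctive and contextually plausible for the target receive the highest predicted fixation priority.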
In visual search tasks, neither visual (Castelhano & Henderson, 2007) nor semantic (Castelhano & Heaven, 2010) scene category cues facilitate search, although scene category does facilitate the transfer of contextual cues between semantically related scenes (Brockmole & Võ, 2010).