When scene grammar is left intact, it can be used to guide visual search in naturalistic environments. A large body of evidence has shown that eye movements during visual search are guided both by the physical properties of an image (Bruce & Tsotsos, 2009; Itti & Koch, 2000, 2001) and by features of the target object (Castelhano & Heaven, 2010; Malcolm & Henderson, 2009, 2010). Although models that consider both of these inputs do fairly well at predicting eye movements during visual search in random displays (Adeli, Vitu, & Zelinsky, 2016), understanding how humans search through naturalistic scenes is more nuanced. Wolfe and colleagues (Wolfe, Alvarez, Rosenholtz, Kuzmova, & Sherman, 2011) set out to define the visual set size of complex scenes—a metric common to basic visual search research—by annotating the nontarget objects within a particular scene. They found that visual search within scenes was much more efficient than search for arbitrary objects on a blank background and proposed that scene-guided attention effectively eliminated most regions from the "functional set size." This guidance has repeatedly been shown to drive eye movements during visual search (Henderson, Malcolm, & Schandl, 2009; Torralba, Oliva, Castelhano, & Henderson, 2006; Võ & Henderson, 2010; Wolfe, Võ, Evans, & Greene, 2011; for a review, see Malcolm et al., 2016).