A large body of studies deals with the allocation of gaze as a measure of so-called "overt" attention. In a typical setting, dating at least back to Buswell's (1935) seminal work, observers are shown photographs on paper or on a computer screen while their eye position is tracked. Many studies show that fixation probability on natural images correlates with low-level features such as luminance contrast (Reinagel & Zador, 1999), edge density (Baddeley & Tatler, 2006; Mannan, Ruddock, & Wooding, 1996, 1997), and texture contrast (Einhäuser & König, 2003; Parkhurst & Niebur, 2004). Bottom-up models of attention, such as Koch and Ullman's (1985) saliency map, can to some extent predict gaze allocation from stimulus features alone (Parkhurst, Law, & Niebur, 2002; Peters, Iyer, Itti, & Koch, 2005). The causal role of low-level features in guiding attention has, however, been challenged (Einhäuser & König, 2003), suggesting a decisive role of higher-order correlations (Krieger, Rentschler, Hauske, Schill, & Zetzsche, 2000), contextual cues (Torralba, Oliva, Castelhano, & Henderson, 2006), and objects such as faces (Cerf, Harel, Einhäuser, & Koch, 2008). Along similar lines, informative image regions are preferentially fixated (Kayser, Nielsen, & Logothetis, 2006), and interesting objects in turn correlate with low-level saliency in natural scenes (Elazary & Itti, 2008). Similarly, early studies already demonstrated the influence of task on eye movements (Buswell, 1935; Yarbus, 1967), to the extent that bottom-up models may lose all their predictive power (Henderson, Brockmole, Castelhano, & Mack, 2006; Rothkopf, Ballard, & Hayhoe, 2007) and bottom-up cues may be immediately overruled or even reversed (Einhäuser, Rutishauser, & Koch, 2008). Recently, combining bottom-up saliency with top-down biases has improved model performance for search tasks (Hamker, 2006; Navalpakkam & Itti, 2007; Pomplun, 2006; Rutishauser & Koch, 2007). Although extensions to interactive scenarios have been proposed (Peters & Itti, 2008), for technical reasons such models are typically tested in head-restrained laboratory settings, which are restricted to eye-in-head movements, involve a potentially biased choice of stimuli, and present stimuli in a limited field of view. This demands that models of gaze allocation be tested in less restrained settings.
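The core computation behind the bottom-up models discussed above can be illustrated with a minimal sketch: a single luminance-contrast channel obtained as a center-surround difference of local means, where regions that differ strongly from their surround are assigned high salience. This is a drastic simplification of the cited saliency-map architectures (no multi-scale pyramids, no color or orientation channels, no winner-take-all stage), and the function names and radii are illustrative choices, not any published model's parameters.

```python
import numpy as np

def box_mean(img, radius):
    """Local mean over a (2*radius+1)^2 box, computed via an integral image."""
    padded = np.pad(img, radius + 1, mode="edge")
    ii = padded.cumsum(axis=0).cumsum(axis=1)
    r = 2 * radius + 1
    h, w = img.shape
    # Box sum around each pixel from four corner lookups in the integral image.
    s = ii[r:r + h, r:r + w] - ii[:h, r:r + w] - ii[r:r + h, :w] + ii[:h, :w]
    return s / (r * r)

def saliency(img, center_radius=2, surround_radius=8):
    """Single-channel center-surround contrast map, normalized to [0, 1]."""
    center = box_mean(img, center_radius)      # fine-scale local luminance
    surround = box_mean(img, surround_radius)  # coarse-scale local luminance
    contrast = np.abs(center - surround)
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast

# A small bright patch on a uniform background attracts the saliency peak.
img = np.zeros((64, 64))
img[30:34, 40:44] = 1.0
sal = saliency(img)
peak_row, peak_col = np.unravel_index(sal.argmax(), sal.shape)
```

In a full model of this family, several such feature maps (luminance, color opponency, orientation) would be normalized and summed into a master saliency map whose maxima predict fixation locations.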