Pioneering work on scene viewing by Buswell (1935) and Yarbus (1967) produced early qualitative findings describing differences in scan patterns related to the properties of the scene stimulus, the scene task, and the viewer. Since this early work, a great deal of research has investigated how image properties such as color, intensity, and orientation contribute to bottom-up scene saliency (Itti & Koch, 2000, 2001; Koch & Ullman, 1985; Torralba, 2003), and how visual saliency influences where people look in a given scene (Bruce & Tsotsos, 2009; Koehler, Guo, Zhang, & Eckstein, 2014; Parkhurst, Law, & Niebur, 2002). More recently, there has been renewed interest in quantifying the extent to which eye movements reflect top-down cognitive relevance as a function of the scene viewing task. For instance, scene task can be predicted at above-chance levels from fixation number, fixation duration, and saccade amplitude distributions (Borji & Itti, 2014; Henderson, Shinkareva, Wang, Luke, & Olejarczyk, 2013; Kardan, Berman, Yourganov, Schmidt, & Henderson, 2015). In addition, target objects in semantically appropriate but low-salience scene locations are quickly located and fixated, whereas visually salient but cognitively irrelevant locations are rarely fixated (Henderson, Malcolm, & Schandl, 2009). These findings highlight that gaze control is also strongly modulated by top-down task demands during scene viewing. Together, both lines of research are theoretically important, as they identify and quantify how bottom-up stimulus properties and top-down cognitive relevance each contribute to gaze control during scene viewing (Koehler et al., 2014; Torralba, Oliva, Castelhano, & Henderson, 2006).
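
To make the task-decoding finding concrete, the sketch below shows one way such an analysis could be set up: a simple classifier trained on per-trial gaze summary statistics (fixation count, mean fixation duration, mean saccade amplitude). The data are synthetic and the pipeline (scikit-learn logistic regression with feature scaling) is a hypothetical stand-in, not the specific features or models used in the cited studies; cross-validated accuracy reliably above chance (0.5 for two tasks here) would correspond to the above-chance task prediction those studies report.

```python
# Minimal, hypothetical sketch of decoding viewing task from gaze summary features.
# Synthetic data; not the features or classifiers used in the cited studies.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials = 200

# Hypothetical task labels: 0 = memorization, 1 = search.
tasks = rng.integers(0, 2, size=n_trials)

# Per-trial gaze summaries: fixation count, mean fixation duration (ms),
# mean saccade amplitude (deg). Values are fabricated for illustration only.
fix_count = rng.normal(14 + 2 * tasks, 3, n_trials)
fix_duration = rng.normal(280 - 20 * tasks, 40, n_trials)
sacc_amplitude = rng.normal(4.0 + 0.8 * tasks, 1.0, n_trials)
X = np.column_stack([fix_count, fix_duration, sacc_amplitude])

# Cross-validated decoding accuracy; values above 0.5 indicate
# above-chance prediction of task from these three gaze features.
clf = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(clf, X, tasks, cv=5)
print(f"Mean decoding accuracy: {scores.mean():.2f}")
```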