Although the neural circuitry underlying gaze shifts once a target is chosen has been studied intensively, the control mechanisms that specify which location should be chosen as a target in the first place remain much less well understood (Gottlieb,
2012). While it has long been recognized that behavioral goals of the observer play a central role in target selection (e.g., Buswell,
1935; Kowler,
1990), obtaining a detailed understanding of exactly how gaze targets are chosen on the basis of cognitive state has proved very difficult. Attempts to formalize this problem have typically explained the effect of top-down factors by weighting a feature-based saliency map (a minimal sketch of this weighting scheme is given at the end of this section). For example, several models weight the stimulus saliency computations by factors that reflect likely gaze locations, such as sidewalks or horizontal surfaces, or introduce a specific task such as searching for a particular object (Kanan, Tong, Zhang, & Cottrell,
2009; Oliva & Torralba,
2006; Torralba, Oliva, Castelhano, & Henderson,
2006). Other models base top-down guidance on associations learned from the image features observed at locations that humans fixated while performing the tasks (Borji, Sihite, & Itti,
2011; Itti & Baldi,
2006; L. Zhang, Tong, Marks, Shan, & Cottrell,
2008). These models reflect the consensus that saccadic target selection is determined by activity in a neural priority map of some kind in areas such as the lateral intraparietal cortex and frontal eye fields (Bichot & Schall,
1999; Bisley & Goldberg,
2010; Findlay & Walker,
1999). However, the critical limitation of this kind of modeling is that it applies to situations in which the subject inspects an image on a computer monitor, a situation that does not make the same demands on vision as active behavior, where visual information is used to inform ongoing actions. A broad range of natural tasks has been investigated over the last two decades, and it is clear that gaze is tightly linked, in time and location, to momentary task requirements; indeed, task demands can often explain all but a few percent of fixations (see Tatler, Hayhoe, Ballard, & Land,
2011, for a review). A central obstacle to understanding these top-down effects is the lack of a formal representation of the task being performed. Although there have been successful attempts to model specific behaviors such as reading or visual search (Najemnik & Geisler,
2008; Reichle, Rayner, & Pollatsek,
2003), we need to develop a general understanding of how the priority map actively transitions from one target to the next as behavior evolves in time. The problem that we address here is how to capture the underlying principles that control these gaze transitions.
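To make the weighting scheme discussed above concrete, the following is a minimal illustrative sketch (in Python, assuming NumPy) of how a bottom-up saliency map might be modulated by a top-down task prior, with the maximum of the resulting priority map taken as the next gaze target. The box-filter contrast measure, the multiplicative combination, and the "favor the lower half of the scene" prior are placeholder assumptions for illustration only; they are not the mechanism of any of the cited models.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box filter; a crude stand-in for the Gaussian
    smoothing used in published saliency models."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)
    return out

def bottom_up_saliency(image):
    """Center-surround contrast: |fine-scale blur - coarse-scale blur|."""
    return np.abs(box_blur(image, 1) - box_blur(image, 6))

def select_gaze_target(image, task_prior):
    """Weight bottom-up saliency by a top-down task prior (multiplicatively)
    and return the row/column of the maximum of the resulting priority map."""
    s = bottom_up_saliency(image)
    s = s / (s.max() + 1e-9)          # normalize to [0, 1]
    priority = s * task_prior         # top-down weighting of the saliency map
    return np.unravel_index(np.argmax(priority), priority.shape)

# Toy example: a random "scene" with a hypothetical prior favoring the
# lower half of the image (e.g., sidewalks or other walkable surfaces).
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
task_prior = np.ones((64, 64))
task_prior[32:, :] = 3.0
print(select_gaze_target(scene, task_prior))
```

The multiplicative combination corresponds to the common assumption that top-down factors gate, rather than add to, bottom-up salience; an additive combination would be an equally plausible variant of this sketch.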