Numerous experiments monitoring eye movements in natural tasks have shown that human gaze is tightly linked to ongoing task demands (Droll, Hayhoe, Triesch, & Sullivan,
2005; Hayhoe, Bensinger, & Ballard,
1998; Jovancevic & Hayhoe,
2009; Jovancevic-Misic, Hayhoe, & Sullivan,
2006; Land & Hayhoe,
2001; Land, Mennie, & Rusted,
1999; Pelz, Hayhoe, & Loeber,
2001; Tatler, Hayhoe, Land, & Ballard,
2011). However, unlike the study of the bottom-up control of vision, few computational models of top-down control have been proposed to explain gaze in natural tasks. In part this is because it is unclear how task structure should be represented, in contrast to the more straightforward image-processing algorithms often found in bottom-up models. However, given the pervasive influence of task goals on gaze behavior, it is important to develop ‘a theory of tasks’ to understand how sensory information can be used to guide motor output towards some set of desired states. Theoretically, there are several ways one might approach this; in particular, Sprague and Ballard (2003; Sprague, Ballard, & Robinson, 2007) have proposed a modular architecture for goal-oriented visuomotor control and suggested that eye movements may be driven by two key parameters: reward and uncertainty.

Within animal learning, the terms positive and negative reinforcement refer, respectively, to presenting a learner with an appetitive reward or withholding that reward. Similarly, positive and negative punishment refer to presenting or withholding an aversive stimulus. Note, however, that Sprague and Ballard use reward as a blanket term for a numerical representation of an external learning signal that can be numerically positive (appetitive), negative (aversive), or zero (neutral), and that encompasses all of the above distinctions. This follows the naming convention of Markov decision processes, the mathematical framework underlying reinforcement learning, which uses the generic term reward function for a mapping between a state of the world and a learning signal. In this article, for brevity, we use reward in this general sense of a utility function, although the term lacks precision.

Their model uses a set of context-specific task modules, each of which represents the state variables, and the uncertainty about those variables, for its respective task. Over time these uncertainties increase and can only be reduced by obtaining a sensory measurement through an eye movement. By tracking the respective uncertainties of the state variables in each module and using the individual tasks' rewards, one can compute an expected value of obtainable reward. If the expected value of reward for updating a particular task module is high, then gaze is allocated to update this module. A central premise of the model is that complex behavior can be broken down into a set of independent subtasks, with visual attention allocated sequentially between these different tasks. Importantly, the model allows flexible prioritization of visual tasks via reward weighting.
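To make this arbitration scheme concrete, the Python sketch below implements a toy version of it: each module's uncertainty grows every time step and is reset when that module receives a fixation, and gaze goes to the module whose update has the highest expected value. The task names, the linear uncertainty growth, and the expected-value proxy are illustrative simplifications, not the published implementation (which uses learned value functions and probabilistic state estimates):

```python
class TaskModule:
    """Toy stand-in for one context-specific task module (names hypothetical)."""

    def __init__(self, name, reward, growth):
        self.name = name          # task label
        self.reward = reward      # reward available from acting on this task
        self.growth = growth      # per-step growth of state uncertainty
        self.uncertainty = 0.0

    def step(self):
        # Uncertainty about the tracked state variable grows over time.
        self.uncertainty += self.growth

    def update(self):
        # A fixation yields a fresh measurement and resolves the uncertainty.
        self.uncertainty = 0.0

    def expected_value(self):
        # Toy proxy: reward recoverable by an update scales with uncertainty.
        return self.reward * self.uncertainty


def allocate_gaze(modules, n_steps):
    """Sequentially give gaze to the module whose update is worth the most."""
    fixations = []
    for _ in range(n_steps):
        for m in modules:
            m.step()
        target = max(modules, key=lambda m: m.expected_value())
        target.update()
        fixations.append(target.name)
    return fixations


modules = [TaskModule("avoid_obstacles", reward=1.0, growth=0.3),
           TaskModule("follow_path", reward=0.5, growth=0.3)]
print(allocate_gaze(modules, 6))
# ['avoid_obstacles', 'avoid_obstacles', 'follow_path',
#  'avoid_obstacles', 'avoid_obstacles', 'follow_path']
```

With these toy numbers, the higher-reward task is fixated twice as often as the lower-reward one, even though both accumulate uncertainty at the same rate.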
Once the modules have been trained, their respective reward tables are normalized and each can be weighted, with the weights across modules summing to one. The reward weight on a module is proportional to its task priority and directly influences how often that visual task receives new sensory information. On the face of it, this seems to describe the task selectivity of a wide range of natural behaviors and has the potential to guide our understanding of how gaze is allocated between competing task demands. While this model has been further developed for new visuomotor control scenarios (Nunez-Varela, Ravindran, & Wyatt, 2012; Rothkopf & Ballard, 2010; Sullivan, Johnson, Ballard, & Hayhoe, 2011), there has been little work addressing whether and how the human visual system incorporates reward and uncertainty to control gaze in natural tasks. In this study, our goal was to further our understanding of how these variables might be used by the visual system for eye movement control and to provide behavioral observations for further modeling.
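As a small illustration of the reward-weighting scheme described above, the helper below (a hypothetical sketch, not part of any published implementation) normalizes raw task priorities into per-module weights that sum to one; a module with three times the priority of another receives three times the weight, and correspondingly more frequent sensory updates:

```python
def normalize_weights(priorities):
    """Scale raw task priorities so the module weights sum to one.

    Hypothetical helper: the model specifies only that per-module
    reward weights are normalized and reflect task priority.
    """
    total = sum(priorities.values())
    if total <= 0:
        raise ValueError("at least one task needs a positive priority")
    return {task: p / total for task, p in priorities.items()}


# A high-priority task (e.g., obstacle avoidance) gets a proportionally
# larger weight, and hence a larger share of gaze updates.
weights = normalize_weights({"avoid_obstacles": 3.0, "follow_path": 1.0})
print(weights)  # {'avoid_obstacles': 0.75, 'follow_path': 0.25}
```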