How do we decide where to look next? During natural, active vision, we move our eyes to gather task-relevant information from the visual scene. Information theory provides an elegant framework for investigating how visual stimulus information combines with prior knowledge and task goals to plan an eye movement. We measured eye movements as observers performed a shape-learning and -matching task, for which the task-relevant information was tightly controlled. Using computational models, we probe the underlying strategies used by observers when planning their next eye movement. One strategy is to move the eyes to locations that maximize the total information gained about the shape, which is equivalent to reducing global uncertainty. Observers' behavior may appear highly similar to this strategy, but a rigorous analysis of sequential fixation placement reveals that observers may instead be using a local rule: fixate only the *most* informative locations, that is, reduce local uncertainty.

*f* noise at various eccentricities. Using these measurements, they implemented a search model for the Gabor target in noise that adopts one of three strategies: (1) move to random locations, (2) move to locations that reduce global uncertainty about target location, and (3) move only to locations that are most likely to contain the target (reduce local uncertainty). The second strategy will collect information optimally, and the third is the maximum *a posteriori* (MAP) strategy. The probability of target presence is monitored at every location, and the target is “found” when probability at one location exceeds a predetermined threshold. The authors demonstrate that their optimal and MAP searchers locate the target with roughly the same number of fixations as human observers. Although the aggregate behavior of human fixations qualitatively resembled their model fixations, the landing of individual saccades during the task was not examined. We have yet to unravel what decision strategies underlie the choice of human fixation locations.

*Individual fixations* are compared against strategy predictions using a signal detection theory approach. At first inspection, human eye movements appear “optimal” (reduce global uncertainty); however, our rigorous analysis of individual fixation placement reveals that an approximate, local rule may actually govern eye-movement decisions.

*r* = −.90). That is, observers who made large saccades tended to fixate for shorter periods and vice versa. Subjects typically made three to five fixations around the object in the viewing time allowed.

*first* fixations to the object. First fixations do not have the same donut distribution as the other object-exploring fixations and are biased in the preview direction. This clustering of the first fixation for very different shapes suggests that it may simply be a localizing saccade that is mostly independent of detailed shape information. The absolute scale of the donut distribution might suggest that observers are making fixations within object boundaries. Further analysis revealed that although fixations may cluster near boundaries on average, they often fall outside of the boundary; 8.4% to 27.6% of fixations landed outside object boundaries, depending on the observer. Next, we evaluate different eye-movement decision strategies by examining the placement of *individual fixation locations*.

*next* fixation. Predicting the fixation *sequence* that maximized information gain is more computationally intensive, although there is some evidence that humans may indeed plan more than one fixation at a time.

*f*) is overlaid on the current strategy map, which is updated using the previous series of fixations (1, …, *f* − 1). The map is rescaled from 0 to 1, and the prediction value is taken as the maximum value that falls within 1° of the human fixation (Figure 6A), following the approach outlined by Tatler, Baddeley, and Gilchrist (2005). A criterion window of 1° allows some wiggle room for natural fixation error and imprecision in our sampling of the global prediction (we interpolate a grid with 0.25° spaced samples). Because it is unlikely that information for eye-movement planning is processed in less than 100 ms (Araujo, Kowler, & Pavel, 2001; Caspi, Beutter, & Eckstein, 2004), the prediction map is only updated by a fixation if its dwell time exceeds this value.
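The scoring step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `prediction_value`, the `(x, y)` coordinate convention, and the assumption that map sample `[0, 0]` sits at (0°, 0°) are all hypothetical; only the 0.25° grid spacing and the 1° criterion window come from the text.

```python
import numpy as np

def prediction_value(strategy_map, fixation_deg, spacing_deg=0.25, window_deg=1.0):
    """Maximum of the rescaled strategy map within window_deg of a fixation.

    strategy_map : 2-D array sampled on a regular grid (rows = y, cols = x).
    fixation_deg : (x, y) fixation position in degrees (assumed convention).
    """
    m = np.asarray(strategy_map, dtype=float)
    span = m.max() - m.min()
    # Rescale the map to [0, 1]; a flat map carries no prediction.
    scaled = (m - m.min()) / span if span > 0 else np.zeros_like(m)

    # Position of every grid sample in degrees.
    rows, cols = np.indices(m.shape)
    x = cols * spacing_deg
    y = rows * spacing_deg

    # Keep only samples inside the circular criterion window.
    fx, fy = fixation_deg
    inside = np.hypot(x - fx, y - fy) <= window_deg
    return float(scaled[inside].max()) if inside.any() else 0.0
```

Under these assumptions, a fixation that lands within 1° of the map's peak receives a prediction value of 1, and a fixation whose window covers only the map's minimum receives 0.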

*not* fixated by the observer. We determine “not-fixated” locations by simply evaluating locations predicted by the random strategy. Hits and false alarms are plotted with changing threshold, sweeping out the ROC curve. If the global prediction is no better than random at predicting human fixations, the ROC curve should lie along the positive diagonal (AUC = 0.5). If the global strategy is a good predictor of human fixations, it will tend toward the upper left-hand corner of the plot (AUC = 1.0). To assess the significance of the AUC, we resampled the hits and false alarms in our ROC analysis with replacement to produce bootstrapped estimates. A prediction is considered significantly better than chance if the 95% confidence interval for the AUC does not include 0.5. Figure 6B shows that for all of our observers, the global model is significantly better than chance at predicting the next fixation.
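The ROC test can be illustrated with a short sketch. Two choices here are assumptions rather than details taken from the text: the AUC is computed with the rank (Mann-Whitney) identity, which gives the same area as sweeping a threshold over the empirical ROC curve, and the bootstrap resamples the prediction values of the fixated and not-fixated locations separately. The function names and the default of 2,000 resamples are hypothetical.

```python
import numpy as np

def auc(fixated, not_fixated):
    """Area under the ROC curve via the rank (Mann-Whitney) identity."""
    f = np.asarray(fixated, dtype=float)[:, None]
    n = np.asarray(not_fixated, dtype=float)[None, :]
    # Probability that a fixated location outscores a not-fixated one;
    # ties count as one half.
    return float(np.mean((f > n) + 0.5 * (f == n)))

def bootstrap_auc_ci(fixated, not_fixated, n_boot=2000, seed=0):
    """95% percentile confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    f = np.asarray(fixated, dtype=float)
    n = np.asarray(not_fixated, dtype=float)
    samples = [
        auc(rng.choice(f, size=f.size, replace=True),
            rng.choice(n, size=n.size, replace=True))
        for _ in range(n_boot)
    ]
    lo, hi = np.percentile(samples, [2.5, 97.5])
    return float(lo), float(hi)
```

With this sketch, a strategy would be called significantly better than chance when the lower bound returned by `bootstrap_auc_ci` lies above 0.5.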

*uniform* random model. The simple fact that the global model produces a donut-shaped distribution of fixations may be enough to align it with human fixation patterns. A much more stringent test is one that compares the performance of a strategy against a “smarter” random strategy that knows shape information is near the edges in this task.

*not fixated*, from which we generate false alarms for the ROC analysis. Using this much stricter test, how well does the global strategy predict human fixations? Figure 7 demonstrates that the fixation error is lower for the smart random strategy than for the uniform random strategy, but the global strategy still has a significantly smaller error. When the smart random strategy serves as a baseline comparison, ROC curves shift toward the diagonal but the AUC is still significantly greater than 0.5. The magnitude of the AUC values demonstrates that, although far from perfect, the global strategy has some power to predict human eye movements.

*expected* information rather than the actual information. It is unclear how the visual system would do this without complex computation. Is there a simpler, more efficient strategy that produces similar fixation behavior?

*next* fixation.

*f* are biased toward the centroid by a weight *w*:

*C* is the centroid and

*w*, we can calculate the observer's *intended* fixation and superimpose it on our local uncertainty strategy map. Using the prediction values from these maps, we again compute ROC curves. Figure 12 plots the AUC as a function of centroid weighting for each subject. The straight lines indicate the baseline AUC for the global and local uncertainty strategies (i.e., without centroid weighting). The 95% confidence intervals attained with bootstrapping allow us to determine which points are significant. For all subjects, the local uncertainty strategy with centroid weighting provides the best prediction of human fixation locations.

*within* the overall fixation distribution. This stringent test allows us to compare the microstructure of different strategies and to better discriminate between them.

*expected* information gain must still be computed globally across the visual field.

*task-relevant information* changes. Defining and quantifying information for a variety of tasks remains one of the great challenges of vision research.

*edgelets* or small straight-line segments that approximate the continuous shape boundary. Each edgelet can assume any one of eight possible orientations, which is a discretization of all possible orientations from 0° to 180°. There are a total of $n$ edgelet orientations along the boundary, labeled $x_i$, where $i = 1, 2, \ldots, n$, and $x_i = 1, 2, \ldots, 8$ for each $i$. We set $n$ to be equal to the number of boundary or edge pixels. The edgelet orientations are unknown to the observer and need to be inferred from visual information.

$r(E)$ are represented as a histogram over eight orientations. We choose $r(E)$ to be equal in size to a “perceptive hypercolumn,” as described by Levi et al. (1985) for vernier acuity in the periphery. Specifically, $r(E)$ is the distance at which small flankers begin to elevate thresholds for a vernier acuity stimulus. It is thought that these flankers encroach on the orientation-selective cells that are analyzing the vernier stimulus, so this distance is a rough measure of the size of an orientation hypercolumn. As this is a perceptual finding, Levi et al. coin it the perceptive hypercolumn. Quantitatively,

$E_2$ is the eccentricity at which acuity drops to half its value in the fovea and $s$ is the slope. We further interpret $r(E)$ as an effective radius over which the visual system spatially pools orientation information (Figure A1). Unpublished data from our laboratory support the Levi et al. parameters.

Let $E_i(\mathbf{F})$ denote the eccentricity of location $i$ relative to fixation $\mathbf{F}$. Thus, we write $r(E_i(\mathbf{F}))$ for the radius of the histogram at edgelet $i$ given fixation $\mathbf{F}$. The histogram is normalized by the total number of edgelets within the radius so that all the histogram entries sum to 1. For each edgelet $i$ viewed from fixation $\mathbf{F}$, we denote the histogram by $\mathbf{h}_i(\mathbf{F})$, where the boldface indicates that it is a vector with eight components (see Figure A1).

$\mathbf{h}_i(\mathbf{F})$ provides a summary of the shape boundary near edgelet $i$. If the boundary is perfectly straight within the receptive field radius, then the histogram will show the presence of only one orientation in the entire local population, which uniquely determines all the edgelet orientations in that population. Conversely, a *flatter*, higher entropy histogram indicates that the local shape is more complex.

$\mathbf{h}_i(\mathbf{F})$ provides about edgelet orientation $x_i$ using a simple likelihood model: $P(\mathbf{h}_i(\mathbf{F}) \mid x_i, E_i(\mathbf{F})) = h_{i,x_i}(\mathbf{F})/Z$, where $h_{i,x_i}(\mathbf{F})$ is the $x_i$th component of $\mathbf{h}_i(\mathbf{F})$, that is, the fraction of edgelets within the pooling neighborhood with orientation $x_i$. $Z$ is a normalization constant. For an intuitive interpretation of the likelihood function, notice that if $\mathbf{h}_i(\mathbf{F})$ is 0 for some component $x_i = z$, then no edgelet in the local population has orientation $z$; thus, the likelihood function $P(\mathbf{h}_i(\mathbf{F}) \mid x_i, E_i(\mathbf{F}))$ equals 0 for $x_i = z$, which rules out the possibility that $x_i = z$. Conversely, the higher the value of $P(\mathbf{h}_i(\mathbf{F}) \mid x_i, E_i(\mathbf{F}))$ for any component value $x_i = z$, the more likely that the true value of $x_i$ is actually $z$.

$P(x_i) = 1/8$, which means that all orientations are a priori equally likely. Using Bayes' rule, we obtain the posterior distribution

*Z*′ is a normalization constant.
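For reference, the Bayes' rule step can be written out from the likelihood model and the uniform prior stated in this appendix. The following is a reconstruction from those definitions, not the typeset equation, and it assumes that the likelihood constant $Z$ does not depend on $x_i$:

$$
P\bigl(x_i \mid \mathbf{h}_i(\mathbf{F}), E_i(\mathbf{F})\bigr)
= \frac{P\bigl(\mathbf{h}_i(\mathbf{F}) \mid x_i, E_i(\mathbf{F})\bigr)\, P(x_i)}
       {\sum_{z=1}^{8} P\bigl(\mathbf{h}_i(\mathbf{F}) \mid x_i = z, E_i(\mathbf{F})\bigr)\, P(x_i = z)}
= \frac{h_{i,x_i}(\mathbf{F})}{\sum_{z=1}^{8} h_{i,z}(\mathbf{F})}
$$

Under this reading, the factors $1/Z$ and $1/8$ cancel between numerator and denominator, and because the histogram entries sum to 1, the posterior over the eight orientations is simply the normalized histogram itself.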

$\mathbf{F}_1$ and $\mathbf{F}_2$ as follows:

$Z_2$ is a normalization constant. Although this approximation has some undesirable properties (such as making the marginal distribution more peaked if the same fixation is made repeatedly), it provides a simple mechanism for combining histogram evidence from multiple, distinct fixations.
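A small sketch may help make the histogram model concrete. The histogram and uniform-prior posterior follow the definitions in this appendix; the combination rule in `combine` is an assumption (a normalized product of the two histograms), chosen only because it reproduces the property noted above that repeating the same fixation makes the distribution more peaked. All function names are hypothetical, and entropy is reported in bits as an arbitrary choice of units.

```python
import numpy as np

N_ORIENTATIONS = 8  # eight discrete edgelet orientations, labeled 1..8

def orientation_histogram(orientations):
    """Normalized histogram over the orientation labels in a pooling region."""
    labels = np.asarray(orientations, dtype=int)
    if labels.size == 0:
        raise ValueError("pooling region contains no edgelets")
    counts = np.bincount(labels - 1, minlength=N_ORIENTATIONS).astype(float)
    return counts / counts.sum()

def posterior(hist):
    """Posterior over orientations under the uniform prior P(x_i) = 1/8."""
    post = np.asarray(hist, dtype=float) * (1.0 / N_ORIENTATIONS)
    return post / post.sum()

def combine(hist_1, hist_2):
    """ASSUMED rule: normalized product of evidence from two fixations."""
    prod = np.asarray(hist_1, dtype=float) * np.asarray(hist_2, dtype=float)
    total = prod.sum()
    if total == 0:
        raise ValueError("histograms share no orientation with nonzero mass")
    return prod / total

def entropy_bits(p):
    """Shannon entropy in bits, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())
```

For example, a region with three edgelets of orientation 1 and one of orientation 2 gives a histogram of (0.75, 0.25, 0, …); combining that histogram with itself under the assumed product rule gives (0.9, 0.1, 0, …), which has lower entropy, while a perfectly straight boundary has zero entropy.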

$i$ is the sum of entropies corresponding to all edgelets within radius $r(E_i(\mathbf{F}))$ of pixel $i$ and is equal to 0 if there are no edgelets within this radius. In other words, the RDE is defined at every pixel location $i$ (given fixation $\mathbf{F}$) as follows:

$H_j$ is the entropy of edgelet $j$; that is,

$P(x_j)$ on histogram data (as in Equations A2 and A3) for simplicity.

$\mathbf{F}$ to minimize the total entropy of all $n$ edgelets:

*everywhere* in the image! The visual system would, instead, need to compute the *expected* information gain through the use of priors or heuristics.

$\mathbf{F}$ to be the pixel location $i$ that is the maximum of $\mathrm{RDE}_i$. Because the RDE map is updated with each new fixation, determining its maximum is straightforward, and thus, this strategy could be implemented easily in the human visual system.
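The local rule can be sketched in a few lines, following the verbal definition of the RDE given earlier (the sum of edgelet entropies within the pooling radius of each pixel, and 0 where no edgelets fall inside it). Several details are assumptions made only for illustration: all distances are in pixels, positions are `(x, y)` pairs, the pooling radius is supplied by the caller as `radius_fn` because the parameters of $r(E)$ are not reproduced here, and the edgelet entropies passed in are taken to be already updated by previous fixations.

```python
import numpy as np

def rde_map(shape, edgelet_xy, edgelet_entropy, fixation_xy, radius_fn):
    """Sum of edgelet entropies within the pooling radius of every pixel.

    radius_fn maps an array of eccentricities to pooling radii (hypothetical
    stand-in for r(E)); it must return an array of the same shape.
    """
    rows, cols = np.indices(shape)
    fx, fy = fixation_xy
    eccentricity = np.hypot(cols - fx, rows - fy)  # distance from fixation
    radius = radius_fn(eccentricity)               # pooling radius per pixel

    rde = np.zeros(shape, dtype=float)
    for (ex, ey), h in zip(edgelet_xy, edgelet_entropy):
        # Add this edgelet's entropy to every pixel whose pooling region
        # contains it; pixels with no edgelets in range stay at 0.
        rde += h * (np.hypot(cols - ex, rows - ey) <= radius)
    return rde

def next_fixation(rde):
    """Local strategy: fixate the pixel with the largest RDE value."""
    row, col = np.unravel_index(np.argmax(rde), rde.shape)
    return int(col), int(row)
```

Calling `rde_map` again after each fixation, with the updated entropies and the new fixation position, and then taking `next_fixation` of the result gives one step of the local strategy under these assumptions.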