Most models of visual search are based on the intuition that humans choose fixation locations containing features that best match the features of the target. The optimal version of this feature-based strategy is what we term “maximum *a posteriori* (MAP) search.” Alternatively, humans could choose fixations that maximize information gained about the target's location. We term this information-based strategy “ideal search.” Here we compare eye movements of human, MAP, and ideal searchers in tasks where known targets are embedded at unknown locations within random backgrounds having the spectral characteristics of natural scenes. We find that both human and ideal searchers preferentially fixate locations in a donut-shaped region around the center of the circular search area, with a high density of fixations at top and bottom, while MAP searchers distribute their fixations more uniformly, with low density at top and bottom. Our results argue for a sophisticated search mechanism that maximizes the information collected across fixations.

*f* noise, which have Fourier amplitude spectra similar to natural images (see the target and background texture in Figure 2A). We also showed that, to achieve this nearly ideal performance in our task, humans must have sufficient memory for past fixation locations to reduce the probability of revisiting them and must perform efficient parallel detection on each fixation (for additional evidence of efficient parallel detection, see Eckstein, Beutter, & Stone, 2001; Eckstein, Thomas, Palmer, & Shimozaki, 2000; Palmer, Verghese, & Pavel, 2000).

*f* noise background region 15 deg in diameter. An example of the noise background with an added target is shown in Figure 2A (the 25 circles indicate target locations for a separate target detection experiment; see below). The observer was required to press a button as soon as the target was located and then to indicate the location by fixating the target and pressing the button again; the trial was counted as correct if the eye was closer to the true target location than any other potential target location. The observer's eye position was continuously monitored with a high-precision eye tracker. The first step in evaluating models for this search task is to measure detection sensitivity across the visual field (visibility maps) for the stimulus conditions under which the search performance was measured. To determine the visibility maps, detection accuracy as a function of target contrast (a psychometric function) was measured at each of the 25 locations indicated by the circles in Figure 2A, for four different levels of noise contrast (see Najemnik & Geisler, 2005).

*f* noise at a mean luminance of 40 cd/m²; the display outside the background region was set to the mean luminance. The 1/*f* noise was created by filtering Gaussian white noise in the Fourier domain, truncating the filtered noise waveform to ±2 *SD* (0.46% of the noise pixels clipped), scaling to obtain the desired RMS amplitude, and then adding a constant to obtain the mean luminance. Eye position was measured with a Forward-Technologies SRI Version-6 dual Purkinje eye tracker. The eye-position signals were sampled from the eye tracker at 100 Hz by a National Instruments data acquisition board. Head position was maintained with a bite bar and headrest. To calibrate the eye tracker, the observer was asked to fixate points on a 3 × 3 calibration grid, and the results were then used to establish the transformation between the output voltages of the eye tracker and the position of the observer's gaze on the computer display. The algorithm for computing fixation points from eye positions was a modified version of one in the Applied Science Laboratory Series 5000 data analysis software.
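The noise-generation recipe described above (filter white noise in the Fourier domain, clip, scale, add the mean) can be sketched roughly as follows. This is only an illustration and not the authors' code: it assumes NumPy is available, and the function name `make_one_over_f_noise`, the image size, and the seed are hypothetical. It also assumes that "RMS amplitude" means rms contrast (standard deviation divided by mean luminance).

```python
import numpy as np

def make_one_over_f_noise(size=256, rms_contrast=0.20, mean_lum=40.0,
                          clip_sd=2.0, seed=0):
    """Return a size x size luminance image with a 1/f amplitude spectrum."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((size, size))      # Gaussian white noise

    # Radial spatial frequency of every Fourier coefficient.
    f = np.fft.fftfreq(size)
    radius = np.hypot(f[None, :], f[:, None])
    radius[0, 0] = 1.0                             # avoid dividing by zero at DC

    spectrum = np.fft.fft2(white) / radius         # impose 1/f amplitude falloff
    spectrum[0, 0] = 0.0                           # drop the DC term
    noise = np.fft.ifft2(spectrum).real

    noise /= noise.std()                           # unit standard deviation
    noise = np.clip(noise, -clip_sd, clip_sd)      # truncate to +/- clip_sd SD
    noise -= noise.mean()                          # re-center after clipping
    noise *= rms_contrast / noise.std()            # scale to the desired rms contrast
    return mean_lum * (1.0 + noise)                # add the mean luminance

image = make_one_over_f_noise()
```

The two background conditions in the search experiment would correspond to `rms_contrast=0.05` and `rms_contrast=0.20` under these assumptions.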

$c_T$ is a position parameter and $s$ is a steepness parameter. We interpolated the psychometric functions measured at the 25 locations to obtain a continuous map of human detection accuracy across the visual field. Specifically, we used a surface interpolation technique that assumes the interpolant is the sum of 25 radial linear basis functions centered on the 25 locations where we made measurements (Carr, Fright, & Beatson, 1997). This visibility map was then used to constrain the ideal and MAP search models.
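As an illustration of interpolation with linear radial basis functions, the sketch below fits weights so that a sum of distance terms reproduces the measured values at the sample points. It is a minimal version under stated assumptions: the 5 × 5 grid, the sample values, and all function names are hypothetical, and any additional polynomial term used in the cited technique is omitted.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_linear_rbf(points, values):
    """Weights w such that sum_j w[j] * |p - points[j]| matches values at points."""
    A = [[math.dist(p, q) for q in points] for p in points]
    return solve(A, values)

def evaluate_rbf(points, weights, p):
    """Evaluate the interpolant at an arbitrary position p."""
    return sum(w * math.dist(p, q) for w, q in zip(weights, points))

# Hypothetical 5 x 5 grid of measurement locations (deg) and made-up values.
points = [(x, y) for x in (-6, -3, 0, 3, 6) for y in (-6, -3, 0, 3, 6)]
values = [math.exp(-math.hypot(x, y) / 4.0) for x, y in points]
weights = fit_linear_rbf(points, values)
value_between_nodes = evaluate_rbf(points, weights, (1.5, -2.0))
```

In practice one would interpolate each psychometric-function parameter (or the accuracy at each contrast) separately in this way.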

*f* noise at one of 85 locations densely tiling the 15 deg display in a triangular array. The observer's task was to find the target as rapidly as possible without making errors. As soon as the target was located, the observer pressed a button, and a small dot was rendered at the observer's point of gaze in real time. The observer then indicated the judged location of the target by fixating that location and pressing the button again. The response was considered correct if the eye position (at the time of the second button press) was closer to the target location than to any other potential target location. Prior to each block, the eye tracker was calibrated and the search target was shown on a uniform background at the center of the display. Each observer completed at least twelve 40-minute sessions. Each session consisted of 6 blocks of 32 search trials. The trials in the session were blocked by target contrast (six contrast levels per session). The rms contrast of the background noise was fixed within each session. The six target contrast levels were set based on the results of the detection experiment described above so that they corresponded to six levels of foveal detectability, $d'$ = 3, 3.5, 4, 5, 6, and 7 ($d'$ is monotonically related to detection accuracy; see later for more details). Two contrast levels for the background noise were tested, 0.05 and 0.20 rms. The eye-tracker calibration was repeated prior to each block. The observer had an option to recalibrate after each trial by pressing a recalibration button.

$d'(\mathbf{p}; c, e_n)$, where $\mathbf{p}$ is the retinal position, $c$ is the rms contrast of the target, and $e_n$ is the contrast power (rms contrast squared) of the background noise. The signal-to-noise ratio is monotonically related to detection accuracy and is obtained by taking the inverse normal integral of the accuracy function, $f(\mathbf{p}; c, e_n)$, measured in the visibility-map experiment:

$$d'(\mathbf{p}; c, e_n) = \sqrt{2}\,\Phi^{-1}[f(\mathbf{p}; c, e_n)], \tag{2}$$

where $\Phi^{-1}[x]$ is the inverse of the standard normal integral. The √2 factor takes into account that there were two intervals in the forced choice detection task described above, but (effectively) only a single interval in each fixation of the search task (Green & Swets, 1966). In the auxiliary material we show that $\alpha e_n$ is the effective 1/*f* noise power passing through the sine wave template and $\beta(\mathbf{p}; c, e_n)$ is the effective noise from the observer's own sources of inefficiency, which might include internal variability, reduced spatial resolution in the periphery, criterion variations, and so on (Burgess & Ghandeharian, 1984; Lu & Dosher, 1999; Pelli & Farell, 1999). The value of $\alpha$ (= 0.0218) was determined by measuring the template responses to the target and by measuring the variance of the template responses to a very large number of samples of the actual background noise used in the experiments. Once $\alpha$ was determined, the values of $\beta(\mathbf{p}; c, e_n)$ were obtained directly from the psychophysically measured visibility map using Equations 2 and 3. We also note that the signal-to-noise ratio for an ideal detector limited only by external noise is given by $d'_E(c, e_n)^2 = c^2/(\alpha e_n)$ and the signal-to-noise ratio of an ideal detector limited only by the internal inefficiencies is given by $d'_I(\mathbf{p}; c, e_n)^2 = c^2/\beta(\mathbf{p}; c, e_n)^2$.
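To make the conversion from detection accuracy to signal-to-noise ratio concrete, here is a small sketch. It assumes the standard two-interval relation implied by the text (inverse normal integral of accuracy, multiplied by √2) and uses the reported value of α for the external-noise-limited case. The function names and the example numbers are illustrative, not taken from the article.

```python
import math
from statistics import NormalDist

ALPHA = 0.0218  # value of alpha reported in the text

def dprime_from_accuracy(accuracy):
    """Single-interval d' from two-interval forced-choice proportion correct.

    Assumes d' = sqrt(2) * Phi^{-1}(accuracy); accuracy must lie strictly
    between 0 and 1, otherwise inv_cdf raises an error.
    """
    return math.sqrt(2.0) * NormalDist().inv_cdf(accuracy)

def external_dprime(c, noise_rms):
    """d'_E for a detector limited only by external noise.

    Uses d'_E^2 = c^2 / (alpha * e_n), with e_n the squared rms noise contrast.
    """
    e_n = noise_rms ** 2
    return c / math.sqrt(ALPHA * e_n)

# Illustrative (made-up) inputs: 82% correct, target rms 0.1, noise rms 0.2.
d_measured = dprime_from_accuracy(0.82)
d_external = external_dprime(0.1, 0.2)
```

Chance performance (accuracy 0.5) maps to $d'$ = 0 under this relation, and $d'$ grows without bound as accuracy approaches 1, which is why measured accuracies near ceiling give unstable estimates.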

Let $W_{ik(t)}$ be the template response from potential target location $i$ on the $t$th fixation, where the fixation location is $k(t)$. Similarly, let $d'_I[i, k(t)]$ and $d'_E$ be the corresponding signal-to-noise ratios. Here we have simplified the notation by dropping $c$ and $e_n$. The retinal position $\mathbf{p}$ is given implicitly by the values of $i$ and $k(t)$. The steps for each simulated search trial were implemented as follows:

1. Fixation began at the center of the display (as in the search experiment).
2. One of the $n$ possible target locations was selected at random ($\mathrm{prior}(i) = 1/n$); 0.5 was added to that location and −0.5 to all other locations. This gives the template responses an expected value of 0.5 when the target is at that location and −0.5 otherwise. This was done for mathematical convenience; it has no effect on the predictions as long as we add noise to the simulated template responses so that the values of $d'$ exactly match those given by the visibility map.
3. A static Gaussian noise sample $X_i$ was generated for each potential target location. These samples were statistically independent with mean zero and variance $1/d_E'^2$. Note that these numbers represent the template responses to the external 1/*f* noise, not the external noise itself. The static noise samples remained the same throughout the search trial, reflecting the fact that the 1/*f* noise was static.
4. On each fixation $t$, an independent Gaussian noise sample $N_{ik(t)}$ was generated for each potential target location and added to the static noise sample. These noise samples had mean zero and variance $1/d'_I[i, k(t)]^2$. Thus, at the target location the template response is $W_{ik(t)} = X_i + N_{ik(t)} + 0.5$, and at all other locations $W_{jk(t)} = X_j + N_{jk(t)} - 0.5$.
5. The posterior probability at each potential target location $i$ after $T$ fixations was computed using the formula

   $$p_i(T) = \frac{\mathrm{prior}(i)\,\exp\left[\sum_{t=1}^{T} g_T[i,k(t)]\,W_{ik(t)}\right]}{\sum_{j=1}^{n} \mathrm{prior}(j)\,\exp\left[\sum_{t=1}^{T} g_T[j,k(t)]\,W_{jk(t)}\right]}, \tag{4}$$

   where

   $$g_T[i,k(t)] = \frac{d_E'^2\, d'_I[i,k(t)]^2}{d_E'^2 + \sum_{x=1}^{T} d'_I[i,k(x)]^2}. \tag{5}$$

   In other words, optimal integration of information across fixations is achieved by keeping an appropriately weighted sum of the template responses from each potential target location (the sums over $t$ in Equation 4), where the weights, $g_T[i, k(t)]$, are determined by the visibility map. For a derivation, see the auxiliary material. (The icon for this article at the *JOV* Web site illustrates how the posterior probabilities are updated across fixations.)
6. If the maximum posterior probability exceeded a criterion, the search stopped. The stopping criterion for each condition was picked so that the error rate of the ideal searcher approximately matched that of the human observers (this was done by trial and error).
7. To compute the optimal next fixation point, $k_{\mathrm{opt}}(T+1)$, the ideal searcher considers each possible next fixation and picks the location that, given its knowledge of the data from previous fixations and the visibility map, will maximize the probability of correctly identifying the location of the target after the next fixation:

   $$k_{\mathrm{opt}}(T+1) = \arg\max_{k(T+1)} \left\{ \sum_{i=1}^{n} p_i(T)\, p[C \mid i, k(T+1)] \right\}. \tag{6}$$

   Explicit expressions that allow Equation 6 to be evaluated efficiently in simulations are derived in the auxiliary material. The MAP searcher always fixates the location with maximum posterior probability:

   $$k_{\mathrm{MAP}}(T+1) = \arg\max_{i}\,[p_i(T)]. \tag{7}$$

8. The process jumps back to step 4 and repeats until the criterion posterior probability is exceeded at some location.
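The simulation steps above can be sketched for the MAP searcher as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the visibility map `internal_dprime` is a made-up smooth falloff with eccentricity, the values of `dE`, `criterion`, and `max_fix` are arbitrary, and the potential target locations form a simple square grid rather than the triangular array used in the experiment. The ideal searcher's fixation rule (Equation 6) is not implemented because it relies on expressions given only in the auxiliary material.

```python
import math
import random

def internal_dprime(loc, fix, d0=6.0, scale=3.0):
    """Hypothetical visibility map: d'_I falls with distance (deg) from fixation."""
    return d0 / (1.0 + math.dist(loc, fix) / scale)

def posterior(W, dI, dE, prior):
    """Equations 4 and 5. W[t][i] and dI[t][i] hold one row per fixation t."""
    log_post = []
    for i in range(len(prior)):
        denom = dE ** 2 + sum(row[i] ** 2 for row in dI)
        # Equation 5 weight times template response, summed over fixations.
        s = sum(dE ** 2 * d[i] ** 2 / denom * w[i] for d, w in zip(dI, W))
        log_post.append(math.log(prior[i]) + s)
    m = max(log_post)                       # subtract max for numerical stability
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def map_search_trial(locs, dE=8.0, criterion=0.95, max_fix=200, rng=None):
    """Simulate one trial; returns (chosen location, true target, fixation count)."""
    rng = rng or random.Random()
    n = len(locs)
    prior = [1.0 / n] * n
    target = rng.randrange(n)                                   # step 2
    mean = [0.5 if i == target else -0.5 for i in range(n)]
    X = [rng.gauss(0.0, 1.0 / dE) for _ in range(n)]            # step 3: static noise
    fix = (0.0, 0.0)                                            # step 1: start at center
    W, dI = [], []
    for t in range(1, max_fix + 1):
        d_row = [internal_dprime(loc, fix) for loc in locs]
        # Step 4: fresh internal noise with standard deviation 1/d'_I.
        W.append([mean[i] + X[i] + rng.gauss(0.0, 1.0 / d_row[i])
                  for i in range(n)])
        dI.append(d_row)
        p = posterior(W, dI, dE, prior)                         # step 5
        best = max(range(n), key=p.__getitem__)
        if p[best] > criterion:                                 # step 6
            break
        fix = locs[best]                                        # Equation 7
    return best, target, t

# Hypothetical square grid of locations inside a 15 deg diameter circle.
locs = [(x, y) for x in range(-7, 8, 2) for y in range(-7, 8, 2)
        if math.hypot(x, y) <= 7.5]
found, target, n_fix = map_search_trial(locs, rng=random.Random(1))
```

Recomputing the posterior from all stored responses on every fixation is wasteful but keeps the correspondence with Equations 4 and 5 explicit, since the weights $g_T$ change whenever a new fixation is added.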

*d*′ values, which are determined in the preliminary experiment. If the external noise is static (as in the current experiment), then the mechanism of target detection affects the predictions of search performance because it affects the estimated ratio of external to internal noise. However, the effect is small. For example, predicted search performance for the purely dynamic noise case is parallel to and only modestly superior to that predicted for the static external noise case reported here.

$d'$) for any combination of target contrast and background noise contrast, at all possible target locations (see Methods). Figure 2E shows how the visibility ($d'$) of the target varies across the display, for one combination of target contrast and background contrast. The visibility peaks at the center of the fovea and falls off smoothly with retinal eccentricity; it falls fastest in the vertical direction, and thus visibility is poorest in the upper and lower visual fields. There is an anatomical correlate of this asymmetry in the topography of human photoreceptors (Curcio, Sloan, Kalina, & Hendrickson, 1990) and ganglion cells (Curcio & Allen, 1990).

*f* noise texture. Humans and the ideal searcher start with relatively small saccade lengths. Humans have an initial bias for fixations along the horizontal meridian and a later bias for fixations along the vertical meridian, whereas the ideal searcher has an initial bias for the vertical meridian and a later bias for the horizontal meridian.
