Various models have been proposed to explain the interplay between bottom-up and top-down mechanisms in driving saccades rapidly to one or a few isolated targets. We investigate this relationship using eye-tracking data from subjects viewing natural scenes to test attentional allocation to high-level objects within a mathematical decision-making framework. We show the existence of two distinct types of bottom-up saliency to objects within a visual scene, which disappear within a few fixations, and modification of this saliency by top-down influences. Our analysis reveals a subpopulation of early saccades, which are capable of accurately fixating salient targets after prior fixation within the same image. These data can be described quantitatively in terms of bottom-up saliency, including an explicit face channel, weighted by top-down influences, determining the mean rate of rise of a decision-making model to a threshold that triggers a saccade. These results are compatible with a rapid subcortical pathway generating accurate saccades to salient targets after analysis by cortical mechanisms.

$S$ rises linearly from a starting value, $S_0$, at a rate $r$ until some threshold level, $S_T$, is reached, at which point in time, $T$, a saccade is triggered. $S_0$ represents any initial bias, with $S_0 = 0$ implying none. If $r$ varies randomly from saccade to saccade with a Gaussian distribution, then the result will be a latency histogram with a tail skewed to longer latencies, as is commonly observed. More specifically, the saccadic latency distribution can be reflected as a straight line when plotted cumulatively on a *probit* ordinate and *reciprocal* abscissa, a *reciprobit* plot (Figures 1a and 1b).
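As a minimal sketch of this rise-to-threshold account, the following Python simulation draws a Gaussian rate of rise on each trial, converts it to a latency, and computes the coordinates of a reciprobit plot. The parameter values (mean rate 5 units/s, standard deviation 1, threshold distance 1) and the function names are illustrative assumptions, not values from this study.

```python
import random
from statistics import NormalDist

def simulate_latencies(mu, sigma, delta_s, n, seed=0):
    """Draw n latencies T = delta_s / r with r ~ Normal(mu, sigma).

    Trials with r <= 0 never reach threshold and are dropped.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(n):
        r = rng.gauss(mu, sigma)
        if r > 0:
            latencies.append(delta_s / r)
    return latencies

def reciprobit_points(latencies):
    """Return (1/T, probit of cumulative fraction) pairs, shortest latency first."""
    std_normal = NormalDist()
    ordered = sorted(latencies)
    n = len(ordered)
    points = []
    for rank, t in enumerate(ordered, start=1):
        # (rank - 0.5) / n keeps the fraction strictly inside (0, 1)
        z = std_normal.inv_cdf((rank - 0.5) / n)
        points.append((1.0 / t, z))
    return points

# Assumed example parameters: median latency is delta_s / mu = 0.2 s.
example_latencies = simulate_latencies(mu=5.0, sigma=1.0, delta_s=1.0, n=5000)
example_points = reciprobit_points(example_latencies)
```

Because latency is the reciprocal of a symmetric variable, the simulated mean latency exceeds the median, which is the skew toward long latencies described above. Plotting `example_points` gives an approximately straight line.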

$r$, would predict. These early responses can be fitted by a separate trend line that intersects the $T = \infty$ axis at 50%. This trend line has a shallower slope than the main distribution that it intersects, and it is more pronounced when the target is expected or when the task carries a high degree of urgency (Reddi & Carpenter, 2000). These early responses may include express saccades. However, express saccades form a distinctly bimodal distribution, whereas these early responses are apparent only on a reciprobit plot (Carpenter, 2001; Fischer & Ramsperger, 1986). Early saccades are thought to represent relatively automatic responses, mediated by a subcortical structure such as the superior colliculus (Carpenter, 1994; Schiller, Sandell et al., 1987).
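One way to picture such an early population is as a second unit racing against the main one. The sketch below assumes, as a modeling choice not stated in this section, that the early unit's rate of rise has zero mean (which places its intercept with $T = \infty$ at 50%) and a larger standard deviation (which gives the shallower slope). All parameter values and names are hypothetical.

```python
import math
import random

def race_latencies(n, mu=5.0, sigma=1.0, sigma_early=3.0, delta_s=1.0, seed=1):
    """Simulate a main unit racing an assumed 'early' unit with zero mean rate.

    Returns (main_only, combined): latencies from the main unit alone and
    from whichever unit reaches threshold first on each trial.
    """
    rng = random.Random(seed)
    main_only, combined = [], []
    for _ in range(n):
        r_main = rng.gauss(mu, sigma)
        r_early = rng.gauss(0.0, sigma_early)  # mean 0: half the trials never finish
        t_main = delta_s / r_main if r_main > 0 else math.inf
        t_early = delta_s / r_early if r_early > 0 else math.inf
        main_only.append(t_main)
        combined.append(min(t_main, t_early))
    return main_only, combined

def fraction_faster_than(latencies, cutoff):
    """Fraction of trials with latency below cutoff (seconds)."""
    return sum(t < cutoff for t in latencies) / len(latencies)

main_only, combined = race_latencies(20000)
early_excess = (fraction_faster_than(combined, 0.125)
                - fraction_faster_than(main_only, 0.125))
```

A positive `early_excess` means the combined distribution contains more very short latencies than the main unit alone would predict, which is the signature described above.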

$S$ represents the log likelihood of a hypothesis (“there's something to look at here”) being correct at any given time. The initial value $S_0$ represents the logarithm of the prior probability. $S$ rises linearly at a rate $r$ with incoming confirmatory sensory information until it reaches a threshold $S_T$. This threshold reflects the probability that justifies the initiation of a saccade. Because $S$ rises linearly from $S_0$ to $S_T$ in time $T$, we have $S_T = S_0 + rT$. Thus, the reciprocal of latency, $1/T$, can be described by the following equation:

$$\frac{1}{T} = \frac{r}{S_T - S_0}.$$

If $r$ is a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$, the distribution of $1/T$ will also vary in a Gaussian manner with a mean of $\mu/(S_T - S_0)$ and variance of $\sigma^2/(S_T - S_0)^2$. Only the difference between $S_T$ and $S_0$ enters these expressions, so any equal change in both will have no effect on the distribution. Thus, they can be combined as $\Delta S = S_T - S_0$. Equally, the distribution would be unchanged by a proportional change in $\mu$, $\sigma$, and $\Delta S$. As such, the system has two degrees of freedom; determination of any two of these three parameters defines the system completely.
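The two invariances can be checked numerically. The short sketch below uses hypothetical parameter values and a helper name of our own choosing. It computes the mean and variance of $1/T$ and confirms that they are unchanged when the same constant is added to both levels, or when $\mu$, $\sigma$, and the threshold distance are all scaled by the same factor.

```python
def reciprocal_latency_stats(mu, sigma, s_t, s_0):
    """Mean and variance of 1/T = r / (s_t - s_0) for r ~ Normal(mu, sigma)."""
    delta_s = s_t - s_0
    if delta_s <= 0:
        raise ValueError("threshold must lie above the starting level")
    return mu / delta_s, (sigma / delta_s) ** 2

# Hypothetical baseline.
base = reciprocal_latency_stats(mu=5.0, sigma=1.0, s_t=1.0, s_0=0.0)
# Same constant added to both levels: the difference is unchanged.
shifted = reciprocal_latency_stats(mu=5.0, sigma=1.0, s_t=1.5, s_0=0.5)
# mu, sigma and the threshold distance all doubled.
scaled = reciprocal_latency_stats(mu=10.0, sigma=2.0, s_t=2.0, s_0=0.0)
```

All three calls return the same mean and variance, which is why only two of the three quantities can be identified from latency data.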

$T$ = $\mu/\Delta S$. The gradient is given by $\Delta S/\sigma$ and the $y$ intercept (the intercept with $T = \infty$) is defined by $\mu/\sigma$. This intercept corresponds to the probability that no saccade will be generated in finite time (i.e., $r \le 0$).
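To illustrate how the gradient and intercept relate to the underlying parameters, the sketch below simulates latencies from a single unit and recovers both quantities with an ordinary least-squares fit on reciprobit coordinates. The parameter values, the use of least squares, and the function names are assumptions for illustration; the source does not state how its lines were fitted.

```python
import random
from statistics import NormalDist, fmean

def predicted_line(mu, sigma, delta_s):
    """Gradient, intercept (in std units) and P(no saccade) for one unit."""
    gradient = delta_s / sigma
    intercept = mu / sigma
    p_no_saccade = NormalDist().cdf(-intercept)  # P(r <= 0)
    return gradient, intercept, p_no_saccade

def fit_reciprobit(latencies):
    """Least-squares fit of z = intercept - gradient * (1/T)."""
    ordered = sorted(latencies)
    n = len(ordered)
    std_normal = NormalDist()
    xs = [1.0 / t for t in ordered]
    zs = [std_normal.inv_cdf((k + 0.5) / n) for k in range(n)]
    x_bar, z_bar = fmean(xs), fmean(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    slope = sxz / sxx  # negative: the probit falls as 1/T grows
    return -slope, z_bar - slope * x_bar

# Assumed parameters: mu = 5, sigma = 1, delta_s = 1.
rng = random.Random(0)
rates = [rng.gauss(5.0, 1.0) for _ in range(5000)]
latencies = [1.0 / r for r in rates if r > 0]
fitted_gradient, fitted_intercept = fit_reciprobit(latencies)
```

With these assumed values the fitted gradient and intercept should land close to the predicted $\Delta S/\sigma = 1$ and $\mu/\sigma = 5$, up to sampling noise.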

$S$, we change the gradient and the median but not the $y$ intercept, thus causing a swivel of the curve around the intercept with the $T = \infty$ axis. Increases in prior probability, $S_0$ (Carpenter & Williams, 1995), and decreases in threshold probability, $S_T$ (Reddi & Carpenter, 2000), reduce $\Delta S$ and thus reduce the gradient, causing a clockwise swivel around the intercept. Decreases in $S_0$ and increases in $S_T$ have the opposite effect (Figure 1c). If we alter the *mean rate of rise*, $\mu$, we move both the $y$ intercept and the intercept with the horizontal 50% axis (the median), but the gradient remains unchanged. Thus, we shift the curve in parallel (Reddi et al., 2003). An increase in $\mu$ corresponds to a leftward shift in the curve, a decrease to a rightward shift (Figure 1d).
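The swivel and the parallel shift can be seen directly from the three quantities that define the line. The sketch below uses hypothetical parameter values. It compares a baseline unit with one whose threshold distance is halved and one whose mean rate of rise is increased.

```python
def reciprobit_line(mu, sigma, delta_s):
    """Return (gradient, y_intercept, median_latency) for one unit."""
    return delta_s / sigma, mu / sigma, delta_s / mu

base = reciprobit_line(mu=5.0, sigma=1.0, delta_s=1.0)
swivel = reciprobit_line(mu=5.0, sigma=1.0, delta_s=0.5)  # smaller threshold distance
shift = reciprobit_line(mu=6.0, sigma=1.0, delta_s=1.0)   # larger mean rate of rise

# Swivel: gradient and median change, intercept does not.
# Shift: intercept and median change, gradient does not.
```

Halving the threshold distance leaves the intercept at 5 std while the gradient and median latency both fall. Raising the mean rate keeps the gradient at 1 while the intercept rises and the median latency shortens.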

$\delta$, by a second, competing target (Leach & Carpenter, 2001). Where $\delta$ is large, the first target is usually fixated first. As $\delta$ becomes smaller, the first target is fixated proportionally less, and at $\delta = 0$ ms, the fixations are split 50/50 to each target. As well as demonstrating competitive racing, this paradigm also demonstrates independent randomness between units. If the randomness were correlated, the target that appears first would always win.
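The race described above can be sketched as two units with identical parameters, the second starting $\delta$ seconds after the first. The parameter values and names below are hypothetical. The `correlated` flag gives both units the same rate on each trial, to illustrate why fully correlated noise would let the first target always win.

```python
import math
import random

def finish_time(rate, delta_s):
    """Time to reach threshold; units with a non-positive rate never finish."""
    return delta_s / rate if rate > 0 else math.inf

def first_target_win_rate(delta, n=20000, mu=5.0, sigma=1.0, delta_s=1.0,
                          correlated=False, seed=0):
    """Fraction of decided trials on which the first target is fixated first."""
    rng = random.Random(seed)
    wins = decided = 0
    for _ in range(n):
        r1 = rng.gauss(mu, sigma)
        r2 = r1 if correlated else rng.gauss(mu, sigma)
        t1 = finish_time(r1, delta_s)          # first target appears at time 0
        t2 = delta + finish_time(r2, delta_s)  # second target appears at time delta
        if t1 == t2:
            continue  # ties (including both never finishing) stay undecided
        decided += 1
        wins += t1 < t2
    return wins / decided if decided else float("nan")

simultaneous = first_target_win_rate(0.0)
long_delay = first_target_win_rate(0.2)
correlated_case = first_target_win_rate(0.01, correlated=True)
```

With independent rates, `simultaneous` is close to one half and `long_delay` is close to one. With identical rates, even a 10-ms head start is enough for the first target to win every decided trial.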

$x$ and $y$ positions at any given millisecond. These are the absolute velocities, measured as the Euclidean sum of the $x$ and $y$ components. The EyeLink 1000 parser computes velocity using a 9-sample moving filter. For each data sample, the parser computes instantaneous velocity and acceleration and compares these to the velocity and acceleration thresholds. If either is above threshold, a saccade signal is generated. The parser checks that the saccade signal is on or off for a critical time before deciding that a saccade has begun or ended (Cerf, Harel et al., 2008).

Following a calibration process, subjects initiated the experiment. Prior to each stimulus presentation, the subjects were instructed to look at a black fixation cross at the center of the screen. If the calculated gaze position was not at the center of the screen, the calibration process was repeated to ensure that position was consistent throughout the experiment. Images were presented on a CRT2 screen (120 Hz), using the Psychophysics Toolbox for Matlab with the EyeLink Toolbox extension. Stimulus luminance was linear in pixel values. The distance between the screen and the subject was 80 cm, giving a total visual angle for each image of 28° × 21°. Subjects used a chin rest to stabilize their head. Eye movement data were acquired from the right eye alone.

All subjects had normal or corrected-to-normal eyesight and were naive to the purpose of the experiment. The experiment was undertaken with the understanding and written consent of each subject. All experimental procedures were approved by Caltech's Institutional Review Board.
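The velocity and acceleration thresholding described for the parser can be illustrated with a simplified sketch. This is not the EyeLink implementation, whose details are not given here. The threshold values (30°/s and 8000°/s²), the 4-sample persistence criterion, the centered moving average, and all names are assumptions chosen only to make the example concrete.

```python
import math

def moving_average(values, window=9):
    """Centered moving average; the window shrinks at the ends of the trace."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def detect_saccades(x, y, rate_hz=1000.0, v_thresh=30.0, a_thresh=8000.0,
                    min_samples=4):
    """Return (start, end) sample indices of detected saccades.

    x, y are gaze positions in degrees. All thresholds are assumed values.
    """
    dt = 1.0 / rate_hz
    # Absolute speed: Euclidean combination of the x and y components.
    speed = [0.0] + [math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) / dt
                     for i in range(1, len(x))]
    speed = moving_average(speed, window=9)
    accel = [0.0] + [abs(speed[i] - speed[i - 1]) / dt
                     for i in range(1, len(speed))]
    signal = [v > v_thresh or a > a_thresh for v, a in zip(speed, accel)]

    events, in_saccade, start, run = [], False, None, 0
    for i, on in enumerate(signal):
        if on == in_saccade:
            run = 0
            continue
        run += 1                    # samples disagreeing with the current state
        if run >= min_samples:      # the change has persisted long enough
            change_at = i - run + 1
            if in_saccade:
                events.append((start, change_at - 1))
            else:
                start = change_at
            in_saccade, run = not in_saccade, 0
    if in_saccade:
        events.append((start, len(signal) - 1))
    return events

# Synthetic trace: fixation, a 10-degree rightward movement over 40 ms, fixation.
trace_x = [0.0] * 100 + [10.0 * (k + 1) / 40 for k in range(40)] + [10.0] * 100
trace_y = [0.0] * len(trace_x)
saccades = detect_saccades(trace_x, trace_y)
```

On the synthetic trace the sketch flags a single event that brackets the movement. The detected interval is slightly wider than the true movement because the moving average spreads the velocity step over neighboring samples.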

$p > 0.05$, Wilcoxon rank sum). We also found a significant increase in the percentage of fixations landing on the face starting 10 ms prior to the onset of main distribution saccades ($p < 5.6 \times 10^{-43}$, Wilcoxon rank sum). This increased proportion of facial fixations is maintained throughout the main distribution, though it declines after 100 ms. The increase that begins 10 ms before the onset of the main distribution is attributed to the fastest saccades of the main distribution falling below the cutoff latency and to the discrete nature of the 20-ms bins. There are two facets to the main distribution: an early peak in facial saliency (20 ms in Figure 3a), followed by a general decline in facial saliency with increasing saccadic latency (40 to 200 ms in Figure 3a). Overall, 63.2% ± 1.3% (mean ± 95% confidence interval) of all MSs are to faces, well above chance ($p < 10^{-15}$, Wilcoxon rank sum).

*gradient or steepness* is an inverse measure of prior probability for face viewing (Carpenter & Williams, 1995). High gradients correspond to low prior probability, or low expectation, of fixating a face (per unit time) and vice versa. Given our hypothesis about differences in image viewing between the first and subsequent fixations (unpredictable vs. predictable image), we can look at the gradients. These should be high for the first fixation and lower for subsequent ones.

$p < 10^{-13}$, 2-sample Kolmogorov–Smirnov). Across all subjects, the ratio of the first fixation to second fixation gradients is 2.3 ± 0.8 and the change is significant for all ($p < 10^{-3}$, 2-sample Kolmogorov–Smirnov).
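For readers unfamiliar with the two-sample Kolmogorov–Smirnov comparison used here, the sketch below computes the statistic (the largest gap between two empirical cumulative distributions) and an asymptotic p-value. The source does not say which software or p-value approximation was used. The small-sample correction in `ks_p_value` is a commonly used textbook approximation and is an assumption, as are the function names and the toy data.

```python
import math

def ks_statistic(a, b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both CDFs past every observation equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value for a two-sample KS statistic d."""
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d  # assumed small-sample correction
    if lam < 0.3:
        return 1.0  # the series converges poorly here; the p-value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))

# Toy data: two identical samples, and one shifted by half its range.
sample_a = [i / 100 for i in range(100)]
sample_c = [v + 0.5 for v in sample_a]
d_same = ks_statistic(sample_a, sample_a)
d_shifted = ks_statistic(sample_a, sample_c)
p_shifted = ks_p_value(d_shifted, len(sample_a), len(sample_c))
```

Identical samples give a statistic of zero, while the shifted sample gives a statistic near 0.5 and a very small p-value. Applied to reciprocal latencies from two fixations, the same comparison tests whether their reciprobit curves differ.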

$T = \infty$, as in evoked saccade tasks, marking them out as a distinct population (Carpenter & Williams, 1995). There is an increase in the proportion of ES in second fixations (7.9% in face-containing images and 11.1% in text-containing images) as compared to the first fixation (2.9% in face-containing images and 3.4% in text-containing images; see Figure 3). The ESs made at the second fixation are also more selective for faces and text (77.9% of ESs are to faces and 66.2% to text) than those made at the first fixation (10.3% of ESs are to faces, while no ESs are to text; Figure 3). Thus, ESs evoked by image onset are not selective for faces or text, but ESs made at the second and subsequent fixations are highly selective.

*intercept* with the $T = \infty$ axis as a measure of the *mean rate of rise* of the decision signal of the face unit. The higher the *intercept*, or equivalently the more left-shifted the curve, the higher the *mean rate of rise* (Reddi & Carpenter, 2000). There is a change in the intercept, and hence in the mean rate of rise, from fixations one to two (Figure 4a). For the first fixation, the intercept is at 6.4 standard deviations (std) from the mean; it drops to 4.5 std by the second fixation and settles at 3.0 std and 2.6 std by fixations three and four, respectively. From fixations two to three, we see a distinct rightward shift in the reciprobit, corresponding to a change in the mean rate of rise, independent of any change in prior probability ($p < 0.01$ for all subjects, 2-sample Kolmogorov–Smirnov). This shows that in the absence of any changes in the image or in the behavioral goal, dynamic changes occur in the mean rate of rise and therefore in the speed, and in the outcome, of saccadic decision.

$n$th and the $(n + 1)$th fixations (Figure 4a), we show that the saliency histogram for faces changes over the first four fixations, becoming progressively flatter as it loses the early peak in saliency. The attractiveness or saliency of faces for MS with shorter latencies (0–200 ms; Figure 4b), or short-latency saccades (SL-MSs), lessens and becomes equivalent to that for MS with longer latencies (>200 ms; Figure 4b), or long-latency saccades (LL-MSs). The difference between the saliency histograms for each fixation is significant (Figure 4b). This demonstrates a progressive change in the outcome of saccadic decision over the course of 4 saccades.

$p < 0.05$, 2-sample Kolmogorov–Smirnov) for 14 of 19 subjects for the 1st fixation, 15 subjects for the 2nd fixation, 10 subjects for the 3rd fixation, and only 4 subjects for the 4th fixation.

$p > 0.05$, 2-sample Kolmogorov–Smirnov in all 19 subjects between all quartiles). However, when a standard face detection algorithm (Viola & Jones, 2001) was incorporated into the Itti–Koch saliency map, there was a significant separation of the curves (Figure 5a; $p = 3.2 \times 10^{-4}$ first to third quartiles, $p = 2.2 \times 10^{-5}$ first to fourth quartiles, 2-sample Kolmogorov–Smirnov). The most salient quartile produces a curve with the highest mean rate of rise. The mean rate of rise then reduces as the saliency is reduced. This separation in mean rate of rise is also present with later fixations. However, after the second fixation it occurs to a lesser degree, with little or no separation by the fifth fixation (separation between $y$ intercepts of first and fourth quartiles, where the linear unit of the ordinate is standard deviations of a normalized Gaussian $N(0, 1)$; 1st fixation, 2.7 std; 2nd fixation, 0.7 std; 3rd fixation, 0.3 std; 4th fixation, 0.6 std; 5th fixation, 0.03 std) for the subject in Figure 4. This pattern is observed across all subjects (mean $y$ intercept difference between first and fourth quartiles; 1st fixation, 4.5 std; 2nd fixation, 0.2 std; 3rd fixation, −0.04 std; 4th fixation, −0.2 std; 5th fixation, −0.2 std).

$p < 0.01$, Wilcoxon rank sum; Figure 6a). When subjects were told to look for a face, faces were fixated in 70.5% (59.1–80.3%, 95% confidence interval here and following) of early distribution saccades and in 62.7% (59.1–66.1%) of main distribution saccades. When subjects were told to look at an object, the corresponding figures were 16.8% (9.7–26.0%) and 22.1% (19.7–24.8%). The differences between the two search tasks are highly significant for both early and main saccades ($p < 1 \times 10^{-16}$ and $p < 1 \times 10^{-85}$, respectively, Wilcoxon rank sum).

*utility* signal (Good, 1952).