**A core question underlying neurobiological and computational models of behavior is how individuals learn environmental statistics and use them to make predictions. Most investigations of this issue have relied on reactive paradigms, in which inferences about predictive processes are derived by modeling responses to stimuli that vary in likelihood. Here we deployed a novel anticipatory oculomotor metric to determine how input statistics impact anticipatory behavior that is decoupled from target-driven responses. We implemented transition constraints between target locations, so that the probability of a target being presented on the same side as the previous trial was 70% in one condition (pret70) and 30% in the other (pret30). Rather than focus on responses to targets, we studied subtle endogenous anticipatory fixation offsets (AFOs) measured while participants fixated the screen center, awaiting a target. These AFOs were small (<0.4° from center on average), but strongly tracked global-level statistics. Speaking to learning dynamics, trial-by-trial fluctuations in AFO were well-described by a learning model, which identified a lower learning rate in pret70 than pret30, corroborating prior suggestions that pret70 is subjectively treated as more regular. Most importantly, direct comparisons with saccade latencies revealed that AFOs: (a) reflected similar temporal integration windows, (b) carried more information about the statistical context than did saccade latencies, and (c) accounted for most of the information that saccade latencies also contained about input statistics. Our work demonstrates how strictly predictive processes reflect learning dynamics, and presents a new direction for studying learning and prediction.**

*SEM* is the measure of spread throughout unless noted otherwise.) They were recruited from the local student population and reimbursed 20 Euro for their time. The Institutional Ethical Review Board approved the study. The sample size was predetermined based on a pilot study with a similar design but that used images of real-life objects rather than abstract shapes (*N* = 22).

*pret*70, *pret*30, respectively). There were 10 series in each condition, for a total of 1,000 trials per participant per condition. Transition probabilities were fixed (stationary) within each series. Although the transition probabilities were experimentally manipulated, the proportions of presentations on the left and right screen sides were identical and set at 50% in both conditions. Thus, behavioral effects could only be attributed to differences in transition structure. To compare learning indices for the first and second half of each series, we constructed the series so that the intended transition constraints and screen-side frequencies were exactly maintained across trials 1–50 and trials 51–100.
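One way such a constrained series could be built is sketched below. This is an illustration only, not the authors' generation procedure: the function name `make_series`, the use of shuffling with rejection, and the convention of counting the first trial's transition relative to a notional "trial 0" are all assumptions introduced here.

```python
import numpy as np

def make_series(p_return, n_trials=100, rng=None, max_tries=100_000):
    """Sketch: build one series whose return proportion and left/right
    balance hold exactly within each half (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    half = n_trials // 2
    n_ret = round(p_return * half)
    # 1 = return (same side as previous trial), 0 = alternation
    template = np.array([1] * n_ret + [0] * (half - n_ret))
    for _ in range(max_tries):
        # Shuffle transitions independently within each half, so the
        # return proportion is exact in trials 1-50 and 51-100.
        trans = np.concatenate([rng.permutation(template),
                                rng.permutation(template)])
        start = rng.choice([-1, 1])  # side of the notional trial 0 (assumption)
        # Each alternation flips the side; -1 = left, +1 = right.
        sides = start * np.cumprod(np.where(trans == 1, 1, -1))
        # Accept only if left and right are balanced in both halves.
        if sides[:half].sum() == 0 and sides[half:].sum() == 0:
            return sides, trans
    raise RuntimeError("no balanced series found within max_tries")

sides, trans = make_series(0.7, rng=0)  # a pret70-like series
```

Rejection sampling is used here only because it is short to write; any constructive method that satisfies the same two constraints would serve equally well.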

*x* direction, coded as positive if to the side of the last target and negative otherwise. Because we were interested in comparing predictive to reactive behavior, we compared AFOs to saccade latencies (SL) but excluded anticipatory saccades (∼2% of total trials) and considered only trials where saccade latencies exceeded 80 ms (Fischer & Ramsperger, 1984). We excluded anticipatory saccades (those faster than 80 ms) because our main interest when comparing AFOs and SLs was to determine whether AFO provides more information about the statistical context than stimulus-guided saccades. We then further restricted valid trials to those in which participants made saccades to both the fixation symbol and subsequent target, within a tolerance of 2° from their edge on the horizontal axis. Valid trials accounted for 87% ± 2% of the data. Catch trials and the trials immediately following them were also excluded, leaving 78% ± 2% of all trials. Following the definition of AFO, we further restricted the set of valid trials to ones that were preceded by a correct saccade to the prior target. In total, following this procedure, we retained 69% ± 3% of trials. Finally, we implemented an AFO eccentricity constraint, which limited the analysis to trials where AFO was within a radius of 3° from center (*rad*_{AFO} = 3°). This was done to reduce the possibility that AFO statistics would be driven by a few outliers. This produced a mean percentage of 65% ± 2% of total trials (i.e., an average of 650 trials per condition per participant). We verified (see Appendix) that the choice of *rad*_{AFO} did not alter the main findings for the AFO analysis.
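The sign convention and the eccentricity constraint can be sketched as follows. The function name `signed_afo`, the ±1 coding of target side, and the use of a two-dimensional distance for the radius check are assumptions made for illustration; the source specifies only the sign rule and the 3° radius.

```python
import numpy as np

def signed_afo(gaze_x, last_target_side, gaze_y=None, rad_afo=3.0):
    """Sketch of the AFO coding described above (hypothetical helper).

    gaze_x, gaze_y   : gaze offset from screen center, in degrees
                       (positive x = right side of the screen).
    last_target_side : +1 if the last target was on the right, -1 if left.
    Returns the horizontal offset, positive when gaze sits toward the
    side of the last target; trials outside rad_afo become NaN.
    """
    gaze_x = np.asarray(gaze_x, dtype=float)
    side = np.asarray(last_target_side, dtype=float)
    gaze_y = np.zeros_like(gaze_x) if gaze_y is None else np.asarray(gaze_y, dtype=float)
    afo = gaze_x * side                  # flip sign for left-side targets
    ecc = np.hypot(gaze_x, gaze_y)       # distance from center (assumed 2D)
    return np.where(ecc <= rad_afo, afo, np.nan)

afo = signed_afo([0.3, -0.2, 4.0], [1, 1, -1])  # third trial exceeds 3 deg
```

Marking excluded trials as NaN rather than dropping them keeps the trial index aligned with the transition history, which matters for the lagged analyses.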

*returns* that were trials where the screen side of the last-presented target was the same as the one that preceded it, and *alternations* where the screen side of the last-presented target was opposite to the one preceding it. This split allowed us to determine how AFOs were impacted by whether the last transition was a return or alternation. Saccade latencies were analyzed according to the same schema, with each saccade categorized as a return or alternation saccade.

*S* = 1 if the trial is a return, and 0 if alternation. This information was coded for each of the last *k* transitions (*k* = 1, ..., 6).

*β*_{1} to *β*_{6} indicate that a return at lag *k* prior transitions was associated with larger AFO values. Negative coefficients indicate reduced AFO values. The intercept *c* is the expected AFO for six consecutive alternations and is not considered further. When analyzing AFO data, we fit these regression models to each participant, predicting the current AFO value separately for the pret70 and pret30 conditions.
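A minimal sketch of such a lagged regression, using ordinary least squares, is given below. The regression equation itself is not reproduced in this section, so the design matrix (an intercept plus the six most recent return/alternation codes), the function name `lagged_regression`, and the indexing convention are assumptions.

```python
import numpy as np

def lagged_regression(afo, is_return, n_lags=6):
    """Regress the current AFO on the last n_lags transitions (sketch).

    afo[t]       : AFO measured before the target of trial t (NaN if excluded).
    is_return[t] : 1 if the target of trial t was a return, else 0.
    Returns (c, betas), where betas[k-1] is the lag-k coefficient.
    """
    afo = np.asarray(afo, dtype=float)
    s = np.asarray(is_return, dtype=float)
    t = np.arange(n_lags, len(afo))      # trials with a full lag history
    # Columns: intercept, S at lag 1, ..., S at lag n_lags.
    X = np.column_stack([np.ones(len(t))] +
                        [s[t - k] for k in range(1, n_lags + 1)])
    y = afo[t]
    ok = ~np.isnan(y)                    # skip excluded trials
    coef, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
    return coef[0], coef[1:]
```

With this coding, the intercept is the predicted AFO when all six lagged transitions are alternations (all *S* = 0), matching the interpretation of *c* given above.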

*α* is the *learning rate*, *K* is a *scaling factor* transforming internal probability estimates to overt behavior and *P*_{0} is a *probability equilibrium point* reflecting an internal estimate of probability of return above which a participant shows a gaze bias towards the return side.

*α* and *P*_{0} were bounded in the interval [0,1]. We fit the *P*_{0} parameter because it is known that in binary contexts, subjective points of equilibrium significantly deviate from 50%; a truly random binary series is subjectively perceived as containing too many streaks (see Falk & Konold, 1997). The reduced model where *P*_{0} was fixed at 50% offered a significantly poorer fit as evaluated by the Bayesian information criterion (BIC) and is not discussed further; Δ*BIC* = 18 ± 5 in pret30 and Δ*BIC* = 16 ± 5 in pret70, both above zero with *p* < 0.001, bootstrap test.

*w* (see Appendix for validation procedure details). Although the RW model is heuristic in nature, it performs similarly to more complex generative models when the target statistics are stationary (Mengotti, Dombert, Fink, & Vossel, 2017).
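To make the roles of the three parameters concrete, the sketch below implements a Rescorla-Wagner style delta rule with a linear read-out. The model equations are not reproduced in this section, so the specific forms used here, namely the update P_ret ← P_ret + *α*(S − P_ret) and the read-out AFO = *K*(P_ret − *P*_{0}), as well as the starting value `p_init`, are assumptions that merely agree with the parameter descriptions above.

```python
import numpy as np

def rw_predict_afo(is_return, alpha, K, p0, p_init=0.5):
    """Predict trial-by-trial AFO from the transition history (sketch).

    is_return[t] : 1 if the target of trial t was a return, else 0.
    alpha        : learning rate in [0, 1].
    K            : scaling from internal probability to AFO (degrees).
    p0           : equilibrium point in [0, 1]; estimates above p0
                   yield a gaze bias toward the return side.
    p_init       : assumed starting estimate of the return probability.
    """
    p_ret = p_init
    pred = np.empty(len(is_return))
    for t, s in enumerate(is_return):
        # AFO on trial t is measured before target t appears, so it can
        # only reflect outcomes up to trial t - 1.
        pred[t] = K * (p_ret - p0)
        # Delta-rule update once the outcome of trial t is known.
        p_ret += alpha * (s - p_ret)
    return pred

def sse(params, is_return, afo):
    """Sum of squared errors, ignoring excluded (NaN) trials."""
    alpha, K, p0 = params
    resid = np.asarray(afo, dtype=float) - rw_predict_afo(is_return, alpha, K, p0)
    return np.nansum(resid ** 2)
```

An objective such as `sse` could be handed to any bounded optimizer, keeping *α* and *P*_{0} within [0, 1]. A larger *α* weights the most recent transitions more heavily, which corresponds to a narrower integration window.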

*R*, which quantifies the linear relationship (see Equation 3).

*H*(*x*) is the entropy of the variable *x* (here, the experimental condition *pret*), and *H*(*x*|*w*) is the entropy of *x* given *w* (the specific known behavioral response). Because the two stochastic processes (pret70, pret30) were equally probable, the entropy related to which condition participants were observing (*pret* equal to 70 or 30) on any given trial was 1 bit. We used MI to quantify the degree of uncertainty about the variable *pret* removed by considering several oculomotor information sources and their joint distribution. First, we calculated the entropy reduction separately achieved by AFO or SL, *I*(*pret*; *AFO*) and *I*(*pret*; *SL*). Because saccade latencies on any given trial likely depend on whether the saccade was an alternate or a return (due to inhibition of return), we also partialized by this factor in the MI formulation (see Appendix). This allowed us to determine whether AFO and saccade latency were differentially informative with respect to the experimental conditions. We could also evaluate whether AFO and SL provided redundant information about the transition structure (negative synergy; Schneidman, Bialek, & Berry, 2003), in which case the MI provided by the joint distribution would be lower than the sum of the two former terms, as indicated in Equation 4:

*pret* carried by separate oculomotor contributions to AFO, namely drift and small saccade instabilities. We calculated all these quantities per participant, which licensed statistical tests at the group level.
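The quantities described above can be illustrated with a simple plug-in (histogram) estimator. This is a sketch under stated assumptions: the quantile binning in `discretize`, the number of bins, and the function names are hypothetical, and synergy is taken as the joint MI minus the sum of the two single-source terms, following the verbal description. The authors' actual estimation procedure is in their Appendix and may differ, for example in bias correction.

```python
import numpy as np

def entropy(labels):
    """Plug-in Shannon entropy (bits) of discrete labels (rows = samples)."""
    _, counts = np.unique(np.asarray(labels), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_info(x, w):
    """I(x; w) = H(x) - H(x | w), computed as H(x) + H(w) - H(x, w)."""
    x = np.asarray(x)
    w = np.asarray(w)
    if w.ndim == 1:
        w = w[:, None]
    return entropy(x) + entropy(w) - entropy(np.column_stack([x, w]))

def discretize(values, n_bins=4):
    """Quantile-bin a continuous measure such as AFO or SL (assumed scheme)."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

def synergy(cond, afo_bins, sl_bins):
    """Joint MI minus the sum of single-source MIs; negative = redundancy."""
    joint = mutual_info(cond, np.column_stack([afo_bins, sl_bins]))
    return joint - (mutual_info(cond, afo_bins) + mutual_info(cond, sl_bins))
```

With two equiprobable conditions, `entropy(cond)` is 1 bit, so every MI value returned here lies between 0 and 1 bit. Plug-in estimates are biased upward for small samples, which is one reason a validated procedure is preferable for real data.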

*AFO*) was around 0.3°, *t*(20) = 10.10, *p* < 0.001, *d* = 1.87.^{1}

*F*(1, 20) = 75.10, *p* < 0.001, and a main effect of Condition, *F*(1, 20) = 18.10, *p* < 0.001, as AFO was higher in pret70, and there was no interaction (*F* < 1). Regression models probed for the impact of any of the last six transitions and indicated that returns in any of the last five trials (for pret70) or the last three trials (for pret30) contributed positively to AFO, though with a decaying impact (Figure 2C).^{2} To summarize, AFO reflected learning of global statistics but was also impacted by the immediately preceding transition, and (more weakly) by the preceding three to five transitions.

*K* that reflected the mapping from subjective internal probabilities to AFO magnitudes. The RW model was successfully validated on the single participant level. As detailed in the Methods section, in each condition we estimated the model parameters from nine of the 10 series and applied those parameters to predict trial-by-trial AFO values for the left-out series. The variance accounted for by the model for the left-out series exceeded permutation-derived chance (*p* < 0.05) for 19 of the 20 participants, in both conditions (see Appendix for methods). Figure 3A shows the model's predicted AFO values for a left-out AFO series, based on parameters estimated from independent data.

*M* = 0.71 ± 0.05) than in pret70 (*M* = 0.55 ± 0.05), *t*(20) = 2.44, *p* = 0.025, *d* = 0.67, indicating narrower integration windows in pret30. Because we bounded the *P*_{ret} parameter within the interval [0,1], the range of AFO was determined by the scaling factor *K*. We found that *K* was greater in pret70 (*M* = 0.87 ± 0.12) than in pret30 (*M* = 0.49 ± 0.09), *t*(20) = 3.32, *p* = 0.0036, *d* = 0.81. This indicates that subjective probabilities translated into larger behavioral signatures for pret70. Finally, the mean equilibrium point, *P*_{0}, was greater in pret30 (*M* = 0.49 ± 0.06) than in pret70 (*M* = 0.32 ± 0.05), *t*(20) = 2.92, *p* = 0.0088, *d* = 0.69, and differed from 0.5 only for the latter, *t*(20) = 4.06, *p* = 0.0015, *d* = 0.96. An extended RW model that reflected a weighted combination of two independent learning rates (see Methods) did not produce a better fit than the simpler model reported here (Δ*BIC* not different from zero, *p* > 0.01).

*F*(1, 20) = 8.99, *p* = 0.0036. Importantly, this effect was concomitant with an independent effect of Last trial, *F*(1, 20) = 41.31, *p* < 0.001, because AFO was larger after returns. In summary, for these 20 random trials, we found a strong effect of the most recent trial, which summed linearly with a longer term impact of the transition structure in the series that preceded the random trials.

*t*(20) = 4.20, *p* < 0.001, *d* = 0.85. When quantified in information theoretic terms, a Mutual Information analysis (see Methods) revealed that AFO carried more information about the experimental condition in the last 50 trials (0.0527 ± 0.0063 bits) than in the first 50 trials (0.033 ± 0.0029 bits), *t*(20) = 4.11, *p* < 0.001, *d* = 1.10 (Figure 4C).

*t*(20) = 5.44, *p* < 0.001, *d* = 0.48, and alternations were faster in pret30 than pret70, *t*(20) = 5.03, *p* < 0.001, *d* = 0.72.

*μ*) (see Appendix). We found a robust signature of statistical learning in SL, because within each condition, values of the threshold parameter (ϑ) were lower for the more frequent type of saccade. Specifically, for ϑ, a 2 (Condition: pret30, pret70) × 2 (Current Trial: alternate, return) ANOVA revealed a significant two-way interaction, *F*(1, 20) = 14.37, *p* < 0.001. In pret30, thresholds were lower for alternate saccades than returns (difference = 0.10 ± 0.048, *t*(20) = 2.13, *p* = 0.025, *d* = 0.90). Conversely, in pret70, thresholds were greater for alternate saccades than returns (difference = 0.061 ± 0.031, *t*(20) = 1.98, *p* = 0.041, *d* = 0.78). For the accumulation rate parameter (*μ*), a similar ANOVA identified only a main effect of current trial (return vs. alternation), *F*(1, 20) = 7.79, *p* = 0.0066, indicating more rapid accumulation for alternate saccades (as in Kim, Gabir, & Gold, 2017). We used regression models to determine the impact of recent transitions on SL. Because return and alternation trials reverse their status as high- versus low-probability events in pret70 and pret30, we fit separate regression models for latencies of return and alternate saccades. We found mixed and modest signatures for the impact of recent transitions on SL. Alternation saccades in pret30 were not impacted by any of the prior six transitions. A similar null finding held for return saccades in pret70. For return saccades in pret30, impact was limited to the immediately preceding transition: return saccades were faster when preceded by a return, *β*_{1} = –7.46 ± 1.84 ms, *t*(20) = 4.04, *p* = 0.0019, *d* = 0.88. For alternate saccades in pret70, the coefficients from lag-1 to lag-4 were significantly positive, indicating that alternation saccades were slowed down by a return saccade in any of the four prior transitions.^{3}

*max lag*). For pret30 the average max lag value as estimated from the AFO data was 1.2 ± 0.1 transitions versus 0.14 ± 0.08 transitions as estimated from the SL analysis (return trials). This indicates that for pret30, AFOs reflected a larger integration window, *t*(20) = 6.49, *d* = 2.13, *p* < 0.001. For pret70 the mean max lag as estimated from AFO was 2.0 ± 0.3 transitions versus 1.5 ± 0.2 transitions as estimated from SL (alternate trials), and the difference was not statistically significant (*p* > 0.1). We conclude that saccade latencies showed sensitivity to recent transition structure, but only for less likely events, and in any case were never associated with larger temporal integration windows as compared to AFO.

*Z*-transformed Pearson's *R* = −0.046 ± 0.021, *t*(20) = 2.10, *p* = 0.048, *d* = 0.46), and a positive correlation for alternate saccades (across participants, mean *Z*-transformed Pearson's *R* = 0.125 ± 0.023, *t*(20) = 5.20, *p* < 0.001, *d* = 1.13). For pret30, the AFO/SL correlation was negative for return saccades (across participants, mean *Z*-transformed Pearson's *R* = –0.060 ± 0.022, *t*(20) = 2.69, *p* = 0.014, *d* = 0.59), and no significant correlation was found for alternate saccades. Overall, these findings show that AFOs prior to a saccade contain information about saccade latencies in a manner consistent with anticipatory predictions.

^{4} we found that AFOs conveyed around twice as much information about the statistical process compared to SL: 0.0527 ± 0.0063 bits for AFO versus 0.0245 ± 0.0050 bits for SL, *t*(20) = 4.34, *p* < 0.001, *d* = 1.09. We also found that AFO provided an information gain of 85% with respect to *I*(*pret*; *SL*|*S*) = 0.0285 ± 0.0056 bits, *t*(20) = 2.79, *p* = 0.011, *d* = 0.90. Since AFOs were more informative than SLs, and preceded them temporally, we could determine whether AFOs accounted for some of the information that SLs carried about the statistical process (pret70, pret30). In that case, there would be negative synergy (see Methods) between AFO and SL. Evaluating this quantity, we found *Syn*(*AFO*, *SL*) = –0.0135 ± 0.0039 bits, *t*(20) = 3.44, *p* = 0.0026, *d* = 0.75, which was about 55% of *I*(*pret*; *SL*). That said, saccade latencies did carry some independent information about the statistical process: the quantity *I*(*pret*; *SL*|(*S* & *AFO*)), that is, the information carried by SL conditioned on the joint occurrence of AFO and S, was 0.02735 ± 0.0064 bits, significantly greater than zero, *t*(20) = 4.25, *p* < 0.001, *d* = 0.93. These results suggest that some redundancy notwithstanding, SL and AFO do convey substantially different information about the target location statistic.
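As a quick arithmetic check on the proportions quoted in this paragraph, the ratios below are recomputed from the reported group means. They are ratios of means rather than per-participant ratios, so they are approximate by construction.

```latex
\begin{align*}
\frac{I(pret;\,AFO)}{I(pret;\,SL)} &= \frac{0.0527}{0.0245} \approx 2.15
  && \text{(around twice as much information)} \\
\frac{I(pret;\,AFO)}{I(pret;\,SL \mid S)} &= \frac{0.0527}{0.0285} \approx 1.85
  && \text{(a gain of about 85\%)} \\
\frac{\lvert Syn(AFO,\,SL)\rvert}{I(pret;\,SL)} &= \frac{0.0135}{0.0245} \approx 0.55
  && \text{(about 55\%)}
\end{align*}
```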

*x*-direction), within a trial. We begin the description relative to the time point at which participants saccaded from the target stimulus in the periphery back to the center of the display, prior to the start of a new trial. This analysis is time-locked to *saccade landing* in the vicinity of the fixation symbol, which tended to occur approximately 10 ms in advance of the presentation of the fixation symbol. Figure 6 presents the timelines of mean gaze location relative to landing position, continuing temporally through the presentation of the fixation symbol and the subsequent blank screen, in 10 ms time bins (negative *y* values indicate left screen side; positive values indicate the right side).

*t* = 0 in the *x* axis) was on the same screen side as the prior target. This was followed by an adjustment toward the screen center during the next ∼200 ms. As we detail below, these adjustments reflected both drifts and small corrective saccadic movements during the presentation of the fixation symbol and the subsequent blank screen. After this, gaze trajectories further diverged based on experimental condition; in pret70 (lighter shading in Figure 6), gaze remained on the side of the prior target (plateauing during the presentation of the fixation symbol), whereas in pret30, the gaze continued a trajectory toward the alternate side (darker shading). For all time bins we found a significant difference between the mean gaze location in the two conditions (*p* < 0.01, Bonferroni corrected). Importantly, however, as expressed by Cohen's effect size *d* (Figure 6, red lines), the difference in gaze position between pret70 and pret30 increased continuously during the fixation symbol presentation (∼400 ms) and during the blank screen (∼400–560 ms; Spearman's *R* = 0.99, *p* < 10^{–6}). In the Appendix we present density plots of group-level gaze locations in pret70 and pret30, in trials following a target on the left or right screen side (Figure A1). These demonstrate the tight clustering of gaze locations at screen center during the window where AFO was quantified, as well as the offsets induced by transition structure.

*R* = 1.1 ± 0.3, significantly positive, *t*(20) = 17.73, *p* < 0.001, *d* = 3.87). On average, SIs occurred during presentation of the fixation symbol (233 ± 6 ms after landing). Although the magnitude of SIs was similar for pret70 and pret30 (∼2° in both conditions), they occurred more frequently in pret30 (SI frequency for pret30: *M* = 1.374 ± 0.068 Hz; for pret70: *M* = 1.305 ± 0.075 Hz; *t*(20) = 2.76, *p* = 0.012, *d* = 0.22). Note that targets were presented at a rate of 0.93 Hz.

^{5}. Although drift magnitudes were small (< 0.1°), drift patterns strongly dissociated between pret70 and pret30: in pret70 drifts were generally toward the prior target, whereas in pret30 they were toward the opposite screen side. A statistical analysis based on a 2 (Condition: pret30, pret70) × 2 (Last trial: return, alternate) ANOVA revealed a main effect of condition, *F*(1, 20) = 9.57, *p* = 0.0027. Follow-up (nonindependent) contrasts showed that drifts were significantly negative for pret30, and (not significantly) positive for pret70 (for pret30: *M* = −0.0223 ± 0.0054°, *t*(20) = −4.13, *p* < 0.001, *d* = 0.90; for pret70: *M* = 0.0112 ± 0.0057°, *t*(20) = 1.96, *p* = 0.064, *d* = 0.43). Independently, drift values were slightly positive after return trials and slightly negative after alternate trials, resulting in a significant main effect of the last trial, *F*(1, 20) = 38.82, *p* < 0.001.

*I*(*pret*; *landing*) = 0.0203 ± 0.0034 bits, *t*(20) = 5.91, *p* < 0.001, *d* = 1.40; *I*(*pret*; *SI*) = 0.0370 ± 0.0047 bits, *t*(20) = 2.17, *p* = 0.042, *d* = 0.62; *I*(*pret*; *drift*) = 0.0326 ± 0.0065 bits, *t*(20) = 2.78, *p* = 0.012, *d* = 0.69). Interestingly, like AFO, drift demonstrated signatures of knowledge consolidation, since the information drift carried about the experimental condition increased from the first to the second half of trials [*t*(20) = 3.06, *p* = 0.0062, *d* = 0.38].

*K*, which reflects the transformation from subjective probability to AFO, was larger for pret70 than pret30. This means that, all else being equal, the transformation from the subjective probability estimate to anticipatory behavior was associated with a larger scaling effect in pret70. It remains to be determined whether this reflects different levels of confidence in the internal distributional estimations (as captured, e.g., by hyperparameters in Dirichlet distributions), or a difference in how distributional information translates into oculomotor commands. Finally, the findings for the equilibrium point *P*_{0} only partially confirmed expectation. Because prior work suggests that series are perceived as random when the proportion of returns is around 30%, we expected *P*_{0} to be in that range for both conditions. Although *P*_{0} differed between the conditions, the distribution in pret30 was qualitatively wider (encompassing almost the entire [0,1] interval), and more work is needed to resolve this issue. Future work could improve the modeling of AFO dynamics by considering, for example, nonlinear transformations of subjective probabilities, Bayesian models that formally consider the parameters' distribution via hyperparameters, potential hysteresis effects, and the superimposition of stochastic resonant mechanisms.

*repetition expectation* on which the next trial should repeat the features of the prior one. We note that a core finding in the current study is supportive of an expectation account: specifically, location of gaze after return saccades was more likely to be positioned toward the direction of the prior target, in both pret70 and pret30. Notwithstanding, AFO was still substantially larger in pret70, and in both conditions AFO was independently impacted by the prior three to five transitions, speaking to longer-term learning effects beyond the most recent trial.

*all* oculomotor processes that took place between the offset of one target and the onset of the subsequent target, including both drifts and fixational saccades. The relative contribution of these oculomotor components is likely to be task dependent, as for instance longer gap periods could be conducive to more frequent saccade instabilities. We found that once a target was removed from the screen, the difference in gaze patterns and AFOs between pret70 and pret30 developed gradually over the subsequent 560 ms (i.e., the combined period of the fixation screen and 160 ms of blank screen), culminating in the differences documented in the main analysis.

*Vision Research*, 44, 2675–2690, https://doi.org/10.1016/j.visres.2004.05.009.

*Trends in Cognitive Sciences*, 16 (8), 437–443, https://doi.org/10.1016/j.tics.2012.06.010.

*Journal of Neuroscience*, 34 (38), 12701–12715, https://doi.org/10.1523/JNEUROSCI.0229-14.2014.

*Proceedings of the National Academy of Sciences, USA*, 103 (2), 449–454, https://doi.org/10.1073/pnas.0507062103.

*Brain and Cognition*, 68 (3), 309–326, https://doi.org/10.1016/j.bandc.2008.08.020.

*Journal of Neuroscience*, 18 (18), 7519–7534.

*Science*, 299 (5603), 81–86, https://doi.org/10.1126/science.1077395.

*European Journal of Neuroscience*, 35 (7), 1011–1023, https://doi.org/10.1111/j.1460-9568.2011.07920.

*Spatial Vision*, 10 (4), 433–436. PMID:9176952.

*Neural Networks*, 21 (9), 1247–1260, https://doi.org/10.1016/j.neunet.2008.08.007.

*Cognitive Psychology*, 57 (3), 153–178, https://doi.org/10.1016/j.cogpsych.2007.12.002.

*Philosophical Transactions of the Royal Society B*, 372 (1718), https://doi.org/10.1098/rstb.2016.0542.

*Nature*, 377 (6544), 59–62, https://doi.org/10.1038/377059a0.

*Journal of Vision*, 12 (6): 31, 1–16, https://doi.org/10.1167/12.6.31.

*Journal of Neurophysiology*, 76 (5), 2841–2852, https://doi.org/10.1152/jn.1996.76.5.2841.

*Neuron*, 21 (4), 761–773, https://doi.org/10.1016/S0896-6273(00)80593-0.

*Elements of information theory*. Hoboken, NJ: John Wiley & Sons, Inc.

*Advances in Cognitive Psychology*, 8 (2), 196–209, https://doi.org/10.2478/v10053-008-0115-z.

*Journal of Vision*, 18 (11): 14, 1–18, https://doi.org/10.1167/18.11.14.

*Frontiers in Systems Neuroscience*, 10, https://doi.org/10.3389/fnsys.2016.00034.

*Journal of Neuroscience*, 30 (9), 3210–3219, https://doi.org/10.1523/JNEUROSCI.4458-09.2010.

*Journal of Neurophysiology*, 72 (5), 2532–2537, https://doi.org/10.1152/jn.1994.72.5.2532.

*Frontiers in Psychology*, 5, 1001, https://doi.org/10.3389/fpsyg.2014.01001.

*Frontiers in Psychology*, 5, 1247, https://doi.org/10.3389/fpsyg.2014.01247.

*Vision Research*, 43 (9), 1035–1045, https://doi.org/10.1016/S0042-6989(03)00084-1.

*Psychological Review*, 104 (2), 301.

*Proceedings of the National Academy of Sciences, USA*, 107 (2), 929–934, https://doi.org/10.1073/pnas.0906845107.

*Experimental Brain Research*, 57 (1), 191–195.

*Trends in Cognitive Sciences*, 13 (7), 293–301, https://doi.org/10.1016/j.tics.2009.04.005.

*Journal of Experimental Psychology: Human Perception and Performance*, 28 (5), 1039, https://doi.org/10.1037//0096-1523.28.5.1039.

*Journal of Neuroscience*, 32 (31), 10627–10636, https://doi.org/10.1523/JNEUROSCI.0696-12.2012.

*Cognitive Science*, 11 (1), 23–63, https://doi.org/10.1016/S0364-0213(87)80025-3.

*Journal of Neurophysiology*, 103 (4), 1988–2001, https://doi.org/10.1152/jn.00771.2009.

*Vision Research*, 42 (22), 2533–2545, https://doi.org/10.1016/S0042-6989(02)00263-8.

*Journal of Neuroscience*, 28 (32), 8124–8137, https://doi.org/10.1523/JNEUROSCI.1317-08.2008.

*Journal of Neuroscience*, 36 (10), 3007–3015, https://doi.org/10.1523/JNEUROSCI.3245-15.2016.

*Frontiers in Human Neuroscience*, 5, 37, https://doi.org/10.3389/fnhum.2011.00037.

*Frontiers in Psychology*, 6, https://doi.org/10.3389/fpsyg.2015.00012.

*Journal of Neurophysiology*, 86 (5), 2543–2558, https://doi.org/10.1152/jn.2001.86.5.2543.

*Neurology*, 32 (1), 31–36, https://doi.org/10.1212/WNL.32.1.31.

*Nature Neuroscience*, 7 (1), 56, https://doi.org/10.1038/nn1169.

*Journal of Experimental Psychology: Human Perception and Performance*, 40 (3), 1161, https://doi.org/10.1037/a0035961.

*Journal of Neuroscience*, 37 (13), 3632–3645, https://doi.org/10.1523/JNEUROSCI.3078-16.2017.

*Eye movements and Visual Cognition* (pp. 46–65). New York, NY: Springer.

*Proceedings of the National Academy of Sciences, USA*, 201705652, https://doi.org/10.1073/pnas.1705652114.

*Vision Research*, 29 (9), 1049–1057, https://doi.org/10.1016/0042-6989(89)90052-7.

*Philosophical Transactions of the Royal Society B*, 372 (1718), 20160205, https://doi.org/10.1098/rstb.2016.0205.

*Annual Review of Neuroscience*, 36, 165–182, https://doi.org/10.1146/annurev-neuro-062012-170249.

*Nature*, 384 (6604), 74, https://doi.org/10.1038/384074a0.

*Nature*, 418 (6896), 413–417, https://doi.org/10.1038/nature00892.

*BMC Neuroscience*, 10 (1), 81, https://doi.org/10.1186/1471-2202-10-81.

*Journal of Neuroscience*, 37 (22), 5419–5428, https://doi.org/10.1523/JNEUROSCI.3683-16.2017.

*Neuropsychologia*, 99, 64–80, https://doi.org/10.1016/j.neuropsychologia.2017.02.023.

*Vision Research*, 36 (9), 1341–1348, https://doi.org/10.1016/0042-6989(95)00218-9.

*Neuroimage*, 11 (3), 210–216, https://doi.org/10.1006/nimg.2000.0539.

*Neuroscience & Biobehavioral Reviews*, 64, 229–251, https://doi.org/10.1016/j.neubiorev.2016.02.018.

*Behavior Research Methods*, 42 (1), 188–204, https://doi.org/10.3758/BRM.42.1.188.

*Proceedings of the National Academy of Sciences, USA*, 110 (38), E3660–E3669, https://doi.org/10.1073/pnas.1305373110.

*Journal of Neurophysiology*, 98 (3), 1064–1072, https://doi.org/10.1152/jn.00559.2007.

*Computation in neural systems*, 7 (1), 87–107.

*Quarterly Journal of Experimental Psychology*, 32 (1), 3–25, https://doi.org/10.1080/00335558008248231.

*Journal of Experimental Psychology: Human Perception and Performance*, 15 (4), 673–685.

*Classical conditioning II: current research and theory* (pp. 64–99). New York, NY: Appleton Century Crofts.

*Trends in Neurosciences*, 38 (4), 195–206, https://doi.org/10.1016/j.tins.2015.01.005.

*Journal of Vision*, 17 (13): 13, 1–16, https://doi.org/10.1167/17.13.13.

*Journal of Neuroscience*, 23 (37), 11539–11553.

*Trends in Cognitive Sciences*, 10 (1), 38–45, https://doi.org/10.1016/j.tics.2005.11.008.

*Vision Research*, 25 (1), 83–98, https://doi.org/10.1016/0042-6989(85)90083-5.

*Neuropsychologia*, 69, 9–21, https://doi.org/10.1016/j.neuropsychologia.2015.01.024.

*Experimental Brain Research*, 121 (1), 92–98, https://doi.org/10.1007/s002210050.

*Vision Research*, 76, 31–42, https://doi.org/10.1016/j.visres.2012.10.012.

*Journal of Cognitive Neuroscience*, 13 (2), 256–271, https://doi.org/10.1162/089892901564306.

*Journal of Neuroscience*, 30 (33), 11177–11187, https://doi.org/10.1523/JNEUROSCI.0858-10.2010.

*The Quarterly Journal of Experimental Psychology*, 70 (3), 579–589, https://doi.org/10.1080/17470218.2016.1172095.

*Cerebral Cortex*, 24 (6), 1436–1450, https://doi.org/10.1093/cercor/bhs418.

*Journal of Neuroscience*, 37 (47), 11424–11430, https://doi.org/10.1523/JNEUROSCI.2186-17.2017.

*Journal of Neurophysiology*, 110 (2), 522–535, https://doi.org/10.1152/jn.01096.2012.

*Journal of Experimental Psychology: Learning, Memory, and Cognition*, 39 (5), 1473, https://doi.org/10.1037/a0032397.

^{1} We performed three validation and robustness analyses of Δ*AFO*. First, we determined split-half reliability by deriving two separate Δ*AFO* values per participant: one from odd trials and one from even trials. Split-half reliability was very robust (0.90 after correction). Second, we evaluated to what extent Δ*AFO* depended on the specific trial inclusion criteria. We found that Δ*AFO* was robust across a range of trial inclusion values, including trials where AFO was restricted to 1.2° from screen center (see Appendix). Third, we verified whether Δ*AFO* was driven by transition structure or the number of returns and alternate trials in each series. We used bootstrapping to construct synthetic series from the pret70 and pret30 data, but where the number of alternation and return trials were equated (see Appendix). We found statistically significant Δ*AFO* values in these cases.

^{2} Group-level *t* tests of beta values against zero. For pret70: *β*_{1}: *t*(20) = 7.40, *p* < 0.001, *d* = 1.61; *β*_{2}: *t*(20) = 6.82, *p* < 0.001, *d* = 1.49; *β*_{3}: *t*(20) = 3.39, *p* = 0.0088, *d* = 0.74; *β*_{4}: *t*(20) = 3.30, *p* = 0.010, *d* = 0.72; *β*_{5}: *t*(20) = 2.72, *p* = 0.039, *d* = 0.59. For pret30: *β*_{1}: *t*(20) = 7.34, *p* < 0.001, *d* = 1.60; *β*_{2}: *t*(20) = 3.66, *p* = 0.0046, *d* = 0.80; *β*_{3}: *t*(20) = 4.70, *p* < 0.001, *d* = 1.03. All Bonferroni corrected within condition. We note that for some lags, a few participants *did* show negative beta values for *lags* > 1; but there were only 18 such cases out of 147 beta values estimated.

^{3} The four beta values were: *β*_{1} = 10.87 ± .69 ms, *t*(20) = 6.64, *p* < 0.001, *d* = 1.45; *β*_{2} = 6.98 ± 1.27 ms, *t*(20) = 5.33, *p* < 0.001, *d* = 1.16; *β*_{3} = 4.74 ± 1.46 ms, *t*(20) = 3.17, *p* = 0.012, *d* = 0.69; *β*_{4} = 4.04 ± 1.39 ms, *t*(20) = 4.41, *p* = 0.026, *d* = 0.96. When not partitioning the trials into alternations and returns, we found much weaker effects of recent trials on saccade latencies. There was no impact of recent trials for pret70, while for pret30, there was a lag-1 effect where a recent return produced faster saccades.

^{4} For trials 1–50, *I*(*pret*; *SL* | *S*) = 0.0137 ± 0.0043 bits; for trials 51–100, *I*(*pret*; *SL* | *S*) = 0.0285 ± 0.0056 bits, *t*(20) = 4.42, *p* < 0.001, *d* = 0.66. Without conditioning SL on the type of transition, we did not observe a significant increase in information between the two halves of the series: in trials 1–50, *I*(*pret*; *SL*) = 0.0184 ± 0.0030 bits; in trials 51–100, *I*(*pret*; *SL*) = 0.0245 ± 0.0050 bits (*p* > 0.1).

^{2}), we combined data across participants (for a total of 32,752 points). For each condition, we partitioned the fixation data based on the screen side of the prior target. The figure shows that: (a) the area of maximal density was always at the center (0,0), demonstrating participants' success in maintaining fixation near the center of the fixation symbol; (b) gaze density decreased steeply in surrounding areas, to 1/5 of maximum density; and (c) the mean gaze position (red crosses) was qualitatively shifted in the direction of the most likely next target location.

*p* < 0.001) from the first half of the series (trials 1–50) to the second half of the series (trials 51–100), indicating consolidation of learning over time, as documented in the main analysis (*rad*_{AFO} = 1.2 deg, *t*(20) = 8.24, *p* < 0.001, *d* = 1.80; *rad*_{AFO} = 3 deg, *t*(20) = 10.10, *p* < 0.001, *d* = 1.87; *rad*_{AFO} = 5 deg, *t*(20) = 9.62, *p* < 0.001, *d* = 2.10).

*AFO* = (*AFO*_{pret70}) − (*AFO*_{pret30}) as a measure of sensitivity to global statistics. However, this grand-average quantity could also reflect the different proportions of alternate and return trials in the two conditions: given that return trials induced positive AFO (in both conditions), a greater proportion of returns could bias the overall statistic even if returns had the same impact on AFO in both conditions. This concern applies only to the grand-average measure; other analyses that quantify trial-by-trial effects or partial out the impact of the last transition are not affected. To evaluate this issue, we used bootstrapping to create surrogate series for each condition, each containing an equal number of alternate and return trials, and evaluated Δ*AFO* in those.

*nalt*_{70}). We then generated 100 surrogate distributions of 2 × *nalt*_{70} elements, with all elements sampled from the pret70 condition: *nalt*_{70} elements were sampled with replacement from the return trials and *nalt*_{70} elements were sampled with replacement from the alternate trials. This produced 100 *bootAFO*_{pret70} distributions. Similarly, we calculated the number of returns in the pret30 condition (*nret*_{30}) and derived 100 *bootAFO*_{pret30} distributions with an equal number of alternate and return trials. We could then derive Δ*AFO* for these bootstrapped series as in the main analysis: *boot*Δ*AFO* = mean *bootAFO*_{pret70} − mean *bootAFO*_{pret30}. Averaging across participants, we obtained mean *boot*Δ*AFO* = 0.27 ± 0.03° in the first 50 trials and mean *boot*Δ*AFO* = 0.39 ± 0.04° in the second half of trials. These values were significantly different, *t*(20) = 2.57, *p* = 0.018, *d* = 0.53. This analysis demonstrates that it is the distribution of AFO values in alternate and return trials that drives Δ*AFO*, rather than the proportions of the two types of trials.
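The balanced resampling described above can be sketched as follows for a single participant. This is a minimal illustration and not the authors' code: the data layout (a dict with `"return"` and `"alternate"` lists of AFO values per condition) and the function names are assumptions made for the example.

```python
import random
from statistics import mean

def balanced_bootstrap_means(afo_return, afo_alternate, n_per_type,
                             n_boot=100, seed=0):
    """Draw n_boot surrogate series, each with n_per_type return trials and
    n_per_type alternate trials sampled with replacement; return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = (rng.choices(afo_return, k=n_per_type)
                  + rng.choices(afo_alternate, k=n_per_type))
        means.append(mean(sample))
    return means

def boot_delta_afo(pret70, pret30, n_boot=100, seed=0):
    """pret70 / pret30: dicts with 'return' and 'alternate' AFO lists
    (hypothetical layout). Returns the bootstrapped delta AFO."""
    n70 = len(pret70["alternate"])  # nalt_70: the rarer trial type in pret70
    n30 = len(pret30["return"])     # nret_30: the rarer trial type in pret30
    m70 = balanced_bootstrap_means(pret70["return"], pret70["alternate"],
                                   n70, n_boot, seed)
    m30 = balanced_bootstrap_means(pret30["return"], pret30["alternate"],
                                   n30, n_boot, seed + 1)
    return mean(m70) - mean(m30)
```

Because each surrogate series contains equal numbers of the two trial types, any remaining difference between conditions cannot be attributed to their different proportions of returns.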

*n* is the number of points in the validating series and *k* is the number of free parameters; *SSE* is the sum of squared fit errors and *SST* is the sum of squared deviations from the mean of the series to predict. The reported variance reduction per participant was the mean of the adjusted coefficients of determination calculated for each of the 10 validations. To determine whether the individuals' variance reductions were significantly greater than would be expected by chance, we constructed synthetic series of alternations and returns that were used, via the estimated RW model parameters, to predict the left-out AFO data. This was done by permuting the sequence of screen-side transitions in the left-out series (1,000 times). The participant's mean variance reduction was then ranked relative to the distribution of mean variance reductions obtained from the permuted series.
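To make the two quantities above concrete, the sketch below computes an adjusted coefficient of determination and ranks an observed value against a permutation distribution. The adjusted-R² expression is the common textbook form and is assumed here, since the exact expression is not reproduced in this passage; the function names are hypothetical.

```python
def adjusted_r2(observed, predicted, k):
    """Adjusted R^2 in its common textbook form (an assumption here):
    1 - (SSE / (n - k - 1)) / (SST / (n - 1))."""
    n = len(observed)
    m = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - m) ** 2 for o in observed)
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

def permutation_p_value(observed_stat, null_stats):
    """Proportion of permuted statistics at least as large as the observed
    one, with the usual +1 correction so the estimate is never zero."""
    n_ge = sum(1 for s in null_stats if s >= observed_stat)
    return (n_ge + 1) / (len(null_stats) + 1)
```

In use, `null_stats` would hold the mean variance reduction obtained for each of the 1,000 permuted transition sequences, and `observed_stat` the participant's actual mean variance reduction.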

*I*(*pret*; *SL* | *S*), where *S* simply identifies whether a trial is an alternation or a return (Equation 5):

**R** = (AFO, SL, S). For each participant, we calculated *I*(*pret*; **R**), implementing bias corrections for limited sampling (Panzeri & Treves, 2006) using the Information Toolbox described in Magri, Whittingstall, Singh, Logothetis, and Panzeri (2009), where relevant technical details can be found. To calculate the information carried by the joint response of two variables (e.g., AFO and SL), we randomized the trial labels of the third variable (e.g., S), thereby obtaining Rperm; we then calculated *I*(*pret*; Rperm). In this example, *I*(*pret*; *AFO* & *SL*) is the mean over one hundred such permutations. The same procedure was applied for one-dimensional responses, with the other two variables randomized independently. We also independently randomized all three variables to obtain an estimate of residual bias, and we subtracted this quantity from all the calculated values.

*I*(*pret*; *SL* | *S*) = *I*(*pret*; *SL* & *S*) − *I*(*pret*; *S*). Information values were calculated using the direct method, that is, by discretizing the AFO and SL responses into six equi-populated bins (variable S was already binary). The number of bins was the maximum that allowed at least four trials per joint response, even when considering half of each series (see Panzeri et al., 2007). Unless explicitly stated otherwise, we report the values calculated in the second half of trials of each series.
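The decomposition above can be illustrated with a plain plug-in estimator. This sketch deliberately omits the bias corrections and the Information Toolbox mentioned in the text, so its values would be biased upward on real data; the function names are hypothetical and ties in the binning step are broken arbitrarily by sort order.

```python
from collections import Counter
from math import log2

def equipopulated_bins(values, n_bins=6):
    """Assign each value a bin label 0..n_bins-1 so that bins hold
    (as nearly as possible) equal numbers of trials."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels

def mutual_information(x, y):
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def conditional_information(pret, sl_binned, s):
    """I(pret; SL | S) = I(pret; SL & S) - I(pret; S), no bias correction."""
    joint = list(zip(sl_binned, s))
    return mutual_information(pret, joint) - mutual_information(pret, s)
```

Here `pret` would be the condition label of each trial, `sl_binned` the output of `equipopulated_bins` applied to saccade latencies, and `s` the binary alternation/return label.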

*r*) is considered to be normally distributed (mean rate *μ* and standard deviation *σ*), and these are the only independent parameters of the model. Nevertheless, as implemented in prior studies, it is useful to explicitly derive a threshold parameter (*θ*), since it may reflect the a priori state before the appearance of the target (Noorani & Carpenter, 2016). We applied this model to our SL data, first dividing trials according to the last transition, where *r*_{j} = 𝒩(*μ*, *σ*):

*θ* and *μ* by minimizing the likelihood function,

*ϕ* is the standard normal probability density function (following Kim et al., 2017). Each parameter was then normalized by its mean across the different conditions. In five participants, we excluded a small subset (about 5%) of trials whose values on the recinormal plot lay on a line with a smaller slope than the other points, suggestive of express saccade dynamics (see Carpenter & Williams, 1995).
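The core idea of a rise-to-threshold model with a normally distributed rate can be sketched with a small simulation. This is an illustrative toy under stated assumptions, not the fitting procedure used in the study: latency is taken to be *θ*/*r* with *r* drawn from 𝒩(*μ*, *σ*), units are arbitrary, and the function names are hypothetical. Under these assumptions the reciprocal latency is approximately normal with mean *μ*/*θ* and standard deviation *σ*/*θ*.

```python
import random
from statistics import mean, stdev

def simulate_later(mu, sigma, theta, n_trials, seed=0):
    """Simulate latencies as theta / r, with r ~ Normal(mu, sigma).
    Draws with r <= 0 never reach threshold and are redrawn."""
    rng = random.Random(seed)
    latencies = []
    while len(latencies) < n_trials:
        r = rng.gauss(mu, sigma)
        if r > 0:
            latencies.append(theta / r)
    return latencies

def reciprocal_summary(latencies):
    """Mean and SD of reciprocal latency, which should approximate
    mu / theta and sigma / theta in this toy model."""
    recip = [1.0 / t for t in latencies]
    return mean(recip), stdev(recip)

latencies = simulate_later(mu=5.0, sigma=1.0, theta=2.0, n_trials=5000)
m, s = reciprocal_summary(latencies)
```

Since only the ratios *μ*/*θ* and *σ*/*θ* are recoverable from reciprocal latencies in this toy version, it also shows why the threshold is a derived parameter and not an independent one.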