Open Access
Article  |   February 2019
Predictions as a window into learning: Anticipatory fixation offsets carry more information about environmental statistics than reactive stimulus-responses
Author Affiliations
  • Giuseppe Notaro
    Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy
    giuseppe.notaro@unitn.it
  • Wieske van Zoest
    Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy
  • Magda Altman
    Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy
    Basque Center on Cognition, Brain and Language (BCBL), San Sebastián, Spain
  • David Melcher
    Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy
  • Uri Hasson
    Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy
    Center for Practical Wisdom, The University of Chicago, Chicago, USA
    http://www.hasson.org/
Journal of Vision February 2019, Vol.19, 8. doi:10.1167/19.2.8
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

A core question underlying neurobiological and computational models of behavior is how individuals learn environmental statistics and use them to make predictions. Most investigations of this issue have relied on reactive paradigms, in which inferences about predictive processes are derived by modeling responses to stimuli that vary in likelihood. Here we deployed a novel anticipatory oculomotor metric to determine how input statistics impact anticipatory behavior that is decoupled from target-driven responses. We implemented transition constraints between target locations, so that the probability of a target being presented on the same side as the previous trial was 70% in one condition (pret70) and 30% in the other (pret30). Rather than focus on responses to targets, we studied subtle endogenous anticipatory fixation offsets (AFOs) measured while participants fixated the screen center, awaiting a target. These AFOs were small (<0.4° from center on average), but strongly tracked global-level statistics. Speaking to learning dynamics, trial-by-trial fluctuations in AFO were well-described by a learning model, which identified a lower learning rate in pret70 than pret30, corroborating prior suggestions that pret70 is subjectively treated as more regular. Most importantly, direct comparisons with saccade latencies revealed that AFOs: (a) reflected similar temporal integration windows, (b) carried more information about the statistical context than did saccade latencies, and (c) accounted for most of the information that saccade latencies also contained about input statistics. Our work demonstrates how strictly predictive processes reflect learning dynamics, and presents a new direction for studying learning and prediction.

Introduction
Humans are skilled at learning the temporally unfolding statistical regularities of their environment. This capacity is thought to minimize potential surprise (e.g., Friston, 2009) by predicting future events. There is growing and converging evidence that predictions take place: neurobiologically, statistically regular inputs produce anticipatory activity in brain systems involved in sensory processing (e.g., Kok, Mostert, & de Lange, 2017) and memory (e.g., Turk-Browne, Scholl, Johnson, & Chun, 2010), and such inputs also help produce anticipatory actions (e.g., Damasse, Perrinet, Madelain, & Montagnini, 2018; Santos & Kowler, 2017; Watamaniuk, Bal, & Heinen, 2017). To date, advances in understanding how humans learn and adapt to environmental statistics have been based on studies of behavioral or neurobiological responses to stimuli with different statistical features. Our departure point is that although statistical learning has been shown to optimize perception and behavior, previous studies looking at the relationship between stimulus and responses may nonetheless provide only a partial view of statistical learning. This is because the relation between a stimulus and responses is determined not only by prior knowledge, but also by low-level perception, accumulation of evidence, surprise, decisions, and response initiation (see Bar et al., 2006; Grossberg, 1987; O'Reilly et al., 2013; Vossel et al., 2014). Recent work further suggests that computations related to the initiation of responses after stimulus presentation are independent of mechanisms that determine response preparation prior to stimulus appearance (Haith, Pakpoor, & Krakauer, 2016). This constrains the use of reaction times in the study of preparatory processes (see Haith et al., 2016, for a discussion). For these reasons, stimulus response metrics constitute an important, but only indirect measure of what people know or expect. 
More information can potentially be gleaned by understanding the state of the system prior to arrival of the stimulus, in relation to the external environmental statistics. 
To this end, we focused here on statistical learning as manifested in anticipatory predictive behavior, independent of stimulus-related responses. At the most general level, by differentiating anticipatory behaviors from (reactive) stimulus-driven responses we aimed to quantify, compare, and relate learning signatures expressed in anticipatory and reactive behaviors. We had three specific aims. First, given that this was the initial application of this approach, we intended to document what aspects of learning are manifested in anticipatory behavior. Of particular interest here were temporal integration constants (learning rates) and measurable signatures of knowledge consolidation over longer temporal epochs. Second, we evaluated whether anticipatory and reactive behaviors convey different sorts (and amounts) of information about environmental statistics. Third, using trial-level correlation and joint distribution analyses, we examined whether anticipatory behaviors can account for learning signatures manifested via (reactive) stimulus responses. 
To address these aims, we used a statistical learning paradigm (as in, e.g., Brodersen et al., 2008; Kim, Kabir, & Gold, 2017) where, over multiple series of 100 trials each, participants saccaded to targets presented to the left or right of fixation. These series of left/right target locations were determined by a first-order Markov process, so that the probability of a target being presented on the same screen side as the previous target was 70% in one condition (pret70) and 30% in the other (pret30). We used these two conditions because although they are formally equally predictable, they are very likely associated with different learning trajectories. A well-documented finding (e.g., Falk & Konold, 1997) is that pret70 is typically perceived as highly regular, whereas pret30 is perceived as random. This provides an opportunity to apply a formal learning model to anticipatory behavior, to determine whether pret70 and pret30 are associated with different learning trajectories. 
Departing from prior studies, our main dependent variable of interest was not the latencies of saccades to targets (reaction times). Instead, we studied subtle anticipatory fixation offsets (AFOs) that we measured with an eye tracker while participants fixated the center of the screen, prior to target appearance. These offsets were small (around 0.4° from center on average), and were endogenously driven since no cues had yet been presented. However, AFOs were strongly influenced by statistical features of the input in a way that allowed quantifying (a) how learning impacted anticipatory behavior, and (b) how trial-by-trial anticipatory behaviors related to latencies for the subsequent saccade. 
We found that AFOs tracked input statistics, as seen in a bias toward the return side in pret70, and a bias toward the alternate side in pret30. A Rescorla-Wagner model suggested that AFOs reflect a trial-level estimate of input statistics. The differences in AFO between pret70 and pret30 were larger when computed from later trials in each series than from earlier trials, reflecting knowledge buildup. Not surprisingly, saccade latencies also reflected learning. However, formal analyses indicated that AFO distributions carried two times more information about statistical context than did saccade latencies, and that a considerable portion (∼50%) of the information that saccade latencies carried about the statistical context could be accounted for when considering the AFOs that preceded each saccade. 
Methods
Participants
Twenty-one volunteers participated in the study (mean age = 23.8 ± 0.9; SEM is the measure of spread throughout unless noted otherwise). They were recruited from the local student population and reimbursed 20 Euro for their time. The Institutional Ethical Review Board approved the study. The sample size was predetermined based on a pilot study (N = 22) with a similar design that used images of real-life objects rather than abstract shapes. 
Design
Each participant observed 10 series of 100 trials each, per condition, for a total of 1,000 trials per participant per condition. Each trial consisted of a fixation symbol appearing at center, followed by a target that appeared to the right or left of center (Figure 1). The design consisted of one factor with two levels: high or low probability of return to the screen side of the last target. Specifically, in one condition, the probability of returning to the same side (probability of return) was 70% and in the other condition it was 30% (pret70, pret30, respectively). Transition probabilities were fixed (stationary) within each series. Although the transition probabilities were experimentally manipulated, the proportion of presentations on the left and right screen sides was identical and set at 50% in both conditions. Thus, behavioral effects could only be attributed to differences in transition structure. To compare learning indices for the first and second half of each series, we constructed the series so that the intended transition constraints and screen-side frequencies were exactly maintained across trials 1–50 and trials 51–100. 
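The transition structure described above can be illustrated with a minimal sketch of a first-order Markov generator. The function names (`generate_series`, `return_rate`) and the random seed are hypothetical, introduced only for illustration. Note one important difference from the study: this sketch samples transitions at random, so the empirical return rate and left/right balance fluctuate around their targets, whereas the series used in the experiment were constructed so that these constraints held exactly within each half of the series.

```python
import random


def generate_series(n_trials, p_return, rng):
    """Sample target sides ('L'/'R') from a first-order Markov process.

    p_return is the probability that a target appears on the same side
    as the previous target (0.7 for pret70, 0.3 for pret30).
    """
    sides = [rng.choice("LR")]  # the first side is chosen at random
    for _ in range(n_trials - 1):
        prev = sides[-1]
        if rng.random() < p_return:
            sides.append(prev)  # return: same side as previous target
        else:
            sides.append("R" if prev == "L" else "L")  # alternation
    return sides


def return_rate(sides):
    """Fraction of transitions in which the side repeats."""
    transitions = list(zip(sides, sides[1:]))
    return sum(a == b for a, b in transitions) / len(transitions)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
series70 = generate_series(100, 0.7, rng)
series30 = generate_series(100, 0.3, rng)
print(return_rate(series70), return_rate(series30))
```

Because the transition rule is symmetric for left and right, the long-run proportion of targets on each side is 50% in both conditions, which is why only the transition structure distinguishes pret70 from pret30.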
Figure 1
 
Trial structure and fixation locations in Experiment 1. (A) Trial timing. Anticipatory fixation offset (AFO) was defined as the mean gaze location during the last 10 ms of the blank screen that followed the fixation symbol and that preceded the target. AFO was coded as positive if to the side of the last target, negative otherwise. (B) Spatial features of fixation and targets. Targets were positioned on an invisible arc that extended 10° above and below the fixation symbol, at 12° eccentricity. The exact location on the arc was set randomly on each trial. The fixation symbol consisted of an inner gray circle (radius = 0.4°) within an outer black circle (radius = 1.2°).
To each 100-trial series we appended 20 trials whose screen side was randomly determined. These were included to evaluate the impact of the prior series' transition structure on responses to random trials (a transfer measure) and to aid clearing memory of the current stochastic process before beginning the next series. 
Procedure
Eye-tracking
Stimuli were displayed on a CRT display (Diamond Pro 2070SB, Mitsubishi Electric Corporation, Tokyo, Japan) with a spatial resolution of 1,280×1,024 pixels, and a 75 Hz refresh rate. We generated the experimental software using MATLAB (MathWorks, Natick, MA) and the Psychophysics Toolbox extensions (Brainard, 1997). Participants' eyes were set at the same height as the screen center and at a distance of 58 cm. Eye position signals were recorded by a separate computer with a tower-mounted, video-based eye tracker (EyeLink 1000 Tower mount, SR Research Ltd, Mississauga, Canada) and were sampled monocularly at 1000 Hz. We performed a nine-point calibration procedure during which the eye tracker calculated a mapping between sensor and display positions. To increase the accuracy of this mapping we performed calibration only within a display region that was slightly larger than the area used in the study (960×718 pixels around the center). We performed calibration after each break. Before beginning the experiment we identified each participant's dominant eye using the Dolman method. 
Trial structure
Participants were instructed to saccade to a target presented after the fixation symbol disappeared. The timeline of each trial (see Figure 1A) was as follows: a fixation symbol appeared for 400 ms; a postfixation blank screen for 160 ms; the target for 360 ms; and a posttarget blank screen for 160 ms. The fixation symbol consisted of an inner gray circle with a radius of 0.4° (same color as background) within an outer black circle with a radius of 1.2°. We chose this fixation symbol as it has been shown to allow some variance in eye movement during fixation (Thaler, Schutz, Goodale, & Gegenfurtner, 2013). Targets consisted of black circles with a 1° radius that appeared to the left or right of the screen center, at 12° eccentricity (Figure 1B). The target centers were located on a virtual (imaginary) arc extending 10° vertically above and below the horizontal midline. On any given trial, the target's specific position on the arc was set randomly. Participants could therefore potentially anticipate the screen side of the next target but not its exact location. The specific instructions were to saccade rapidly to the target and to the fixation symbol when each appeared. 
Instructions and training
To maintain participants' alertness, we included catch trials that consisted of target symbols with a white line through them. These appeared every 16–20 trials following a uniform distribution. Participants were told that catch trials would appear infrequently and that they were to press the mouse button when they saw them. Following each series, participants were presented with performance indicators for that series, which included the number of targets and fixation symbols saccaded to within an allowed spatial and temporal tolerance (see the following), the number of correct catch trials and eye blinks, as well as their overall mean performance to that point. This was done to motivate participants to perform well and to provide a buffer between the stochastic contexts of the just completed and the following series. 
Before beginning the proper study, participants underwent training where they viewed series of 20 trials each, until they were comfortable with the procedure (typically within two to seven series). The training session differed in some respects from the main experiment. In the training series there were no transition constraints (probability of return = 50%) so that participants could not develop experience with the transition structure used in the study proper. In addition, during training (but not the study proper) we provided real-time positive auditory feedback when participants' gaze hit the target or the fixation circle within 200 ms from appearance and with a maximum deviation of 1° from their borders, and whenever participants correctly responded to catch trials with a mouse click. We provided negative auditory feedback whenever participants failed to hit the target, failed to respond to catch trials, or blinked. Although participants were instructed and trained to arrive at fixation within 200 ms from appearance, we did not penalize participants for saccading to the screen center prior to stimulus appearance. Here and in the rest of the text, the term “anticipatory saccades” refers to saccades to the lateralized targets (which were of main interest) rather than saccades to central fixation. A summary of the positive and negative scores was presented at the end of each training session. 
Analysis
Saccade classification
During the study, participants made large saccades towards the targets, departing from screen center (fixation). They also made smaller saccades, typically during fixation. To detect saccades across this range, we applied a method (Nyström & Holmqvist, 2010) for detecting saccades by adaptively determining speed thresholds relative to saccade onset, offset, and peak. We defined saccade onset (offset) time as the time of the first local minimum with speed below an adaptive threshold, preceding (following) a saccade peak. When a saccade was followed by a postsaccadic oscillation (glissade), the time of saccade offset was taken to be the end of the glissade. 
AFO definition and trial selection
We defined AFO as the mean gaze location measured during the final 10 ms of the postfixation blank screen, prior to target presentation. For clarity we considered only the component in the x direction, coded as positive if to the side of the last target and negative otherwise. Because our main interest when comparing AFOs and saccade latencies (SLs) was to determine whether AFO provides more information about the statistical context than stimulus-guided saccades, we excluded anticipatory saccades (those faster than 80 ms; ∼2% of total trials) and considered only trials where saccade latencies exceeded 80 ms (Fischer & Ramsperger, 1984). We then further restricted valid trials to those in which participants made saccades to both the fixation symbol and subsequent target, within a tolerance of 2° from their edge on the horizontal axis. Valid trials accounted for 87% ± 2% of the data. Catch trials and the trials immediately following them were also excluded, leaving 78% ± 2% of all trials. Following the definition of AFO, we further restricted the set of valid trials to ones that were preceded by a correct saccade to the prior target. In total, following this procedure, we retained 69% ± 3% of trials. Finally, we implemented an AFO eccentricity constraint, which limited the analysis to trials where AFO was within a radius of 3° from center (radAFO = 3°). This was done to reduce the possibility that AFO statistics would be driven by a few outliers. This left a mean of 65% ± 2% of total trials (i.e., an average of 650 trials per condition per participant). We verified (see Appendix) that the choice of radAFO did not alter the main findings for the AFO analysis. 
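As a rough illustration of the AFO definition and two of the selection criteria above, the following sketch computes a signed horizontal AFO from a gaze trace. It assumes one gaze sample per millisecond (consistent with the 1000 Hz sampling rate), gaze expressed in degrees, and a trace that ends at target onset. The function names, the `last_target_side` coding (+1 right, -1 left), and the use of the horizontal component alone for the eccentricity check are assumptions made for this example, not details taken from the study's analysis code.

```python
def compute_afo(gaze_x, center_x, last_target_side, window=10):
    """Signed horizontal AFO (deg) from the last `window` samples before target onset.

    gaze_x: horizontal gaze samples in degrees, one per ms, ending at target onset
    last_target_side: +1 if the last target was on the right, -1 if on the left
    """
    samples = gaze_x[-window:]
    mean_offset = sum(samples) / len(samples) - center_x
    # Positive values mean the gaze was displaced toward the last target's side.
    return mean_offset * last_target_side


def passes_selection(afo, saccade_latency_ms, min_latency_ms=80, rad_afo=3.0):
    """Keep trials with a non-anticipatory saccade and an AFO inside rad_afo.

    Simplification: only the horizontal AFO component is checked here.
    """
    return saccade_latency_ms > min_latency_ms and abs(afo) <= rad_afo


# Hypothetical trace: gaze sits at center, then drifts 0.3 deg to the left.
trace = [0.0] * 150 + [-0.3] * 10
afo = compute_afo(trace, center_x=0.0, last_target_side=-1)
print(afo, passes_selection(afo, saccade_latency_ms=145))
```

In this example the last target was on the left and the gaze drifted left, so the AFO is positive, that is, biased toward the return side.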
Impact of recent trials
To determine the impact of previous trials on current oculomotor behavior (as captured by AFO) we defined two types of trials: returns, in which the screen side of the last-presented target was the same as that of the target preceding it, and alternations, in which the screen side of the last-presented target was opposite to that of the target preceding it. This split allowed us to determine how AFOs were impacted by whether the last transition was a return or an alternation. Saccade latencies were analyzed according to the same schema, with each saccade categorized as a return or alternation saccade. 
In a separate analysis we modeled the impact of each of the last six transitions on AFO in each trial. We used a regression model in which dummy variables coded the status of each of the last six transitions as a return or alternation. This approach has been successfully used in prior work on statistical learning of transition probabilities (e.g., Bornstein & Daw, 2012). The complete regression model is presented in the expression \(AFO = \sum\nolimits_{k = 1}^{6} {\beta _k}{S_k} + c + \varepsilon\), where \(S_k\) = 1 if the transition is a return, and 0 if it is an alternation. This information was coded for each of the last k transitions (k = 1, ..., 6). 
In this model, a positive coefficient for any of the regressors β1 to β6 indicates that a return k transitions back was associated with larger AFO values. Negative coefficients indicate reduced AFO values. The intercept c is the expected AFO for six consecutive alternations and is not considered further. When analyzing AFO data, we fit these regression models to each participant, predicting the current AFO value separately for the pret70 and pret30 conditions. 
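The lagged dummy coding and regression can be sketched as follows. This is a minimal standard-library illustration, not the study's analysis code: `lagged_returns` and `ols` are hypothetical helper names, the least-squares solver uses the normal equations with Gauss-Jordan elimination, and the simulated AFO values and "true" coefficients are invented purely to check that the procedure recovers them.

```python
import random


def lagged_returns(sides, n_lags=6):
    """For each trial t, code the last n_lags transitions as return (1) or alternation (0).

    The AFO on trial t is measured before target t appears, so lag k = 1 refers to
    the transition between targets t-2 and t-1, lag 2 to targets t-3 and t-2, etc.
    """
    coded = []
    for t in range(n_lags + 1, len(sides)):
        s = [int(sides[t - k] == sides[t - k - 1]) for k in range(1, n_lags + 1)]
        coded.append((t, s))
    return coded


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Simulated example with invented coefficients (not values from the study).
rng = random.Random(1)
sides = [rng.choice("LR") for _ in range(500)]
coded = lagged_returns(sides)
true_betas = [0.30, 0.15, 0.0, 0.0, 0.0, 0.0]
true_c = 0.1
X = [s + [1] for _, s in coded]  # last column is the intercept c
y = [true_c + sum(b * v for b, v in zip(true_betas, s)) for _, s in coded]
*betas, c = ols(X, y)
print([round(b, 3) for b in betas], round(c, 3))
```

With noise-free simulated data the fitted coefficients match the generating ones up to floating-point error; with real AFO data the residual term ε would absorb trial-level variability.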
For SL, we similarly fit regressions separately for the two conditions, but constructed separate models for return and alternation saccades. This was done because (a) returns and alternations reversed their status as high- versus low-probability targets across the two conditions, and (b) return saccades are known to be strongly impacted by inhibition of return (IOR, e.g., Rafal, Calabresi, Brennan, & Sciolto, 1989), and for this reason could provide less information about the impact of recent trials. The resulting Beta coefficients estimated from the AFO or SL analyses were then analyzed on the group level to compute group level statistics. They were also analyzed on the individual level to identify the largest lag, for each participant, for which the associated Beta was statistically significant. These were used in comparing lags in the AFO and SL data. 
Estimation of learning rate from Rescorla-Wagner model applied to AFO data
To compute and compare learning rates for pret70 and pret30, we used a Rescorla-Wagner (RW) model (Rescorla & Wagner, 1972). We first applied an RW model to the data and validated it on sample data to determine its effectiveness. We then studied the parameter values. The basic model we constructed fit the AFO data according to transition probabilities estimated from an RW process, implemented as in Equation 1:  
\begin{equation}\tag{1}\left\{ {\matrix{ {{P_{ret}}(t + 1) = {P_{ret}}(t) + \alpha (1 - {P_{ret}}(t))} \hfill&{{\rm{after\ a\ return}}} \hfill \cr {{P_{ret}}(t + 1) = {P_{ret}}(t) - \alpha {P_{ret}}(t)} \hfill&{{\rm{after\ an\ alternation}}} \hfill \cr {AFO(t + 1) = K({P_{ret}}(t + 1) - {P_0})} \hfill&{} \hfill \cr } } \right.\end{equation}
This is a standard RW model, with the exception that it fits anticipatory behavior captured by AFO rather than a response to a stimulus. The third line presents the response model that maps a participant's belief about the transition distribution to the observed AFOs: it is a simple linear relationship between the internal probability and AFO. In Equation 1, α is the learning rate, K is a scaling factor transforming internal probability estimates to overt behavior, and P0 is a probability equilibrium point reflecting an internal estimate of probability of return above which a participant shows a gaze bias towards the return side. α and P0 were bounded in the interval [0,1]. We fit the P0 parameter because it is known that in binary contexts, subjective points of equilibrium significantly deviate from 50%; a truly random binary series is subjectively perceived as containing too many streaks (see Falk & Konold, 1997). The reduced model where P0 was fixed at 50% offered a significantly poorer fit as evaluated by the Bayesian information criterion (BIC) and is not discussed further; ΔBIC = 18 ± 5 in pret30 and ΔBIC = 16 ± 5 in pret70, both above zero with p < 0.001, bootstrap test.  
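A minimal sketch of the forward model in Equation 1 may make the update rule concrete. The function name `rw_afo` and the starting value `p_init = 0.5` are assumptions made for this illustration (the initial value of Pret is not specified in this section), and no parameter fitting is shown.

```python
def rw_afo(transitions, alpha, K, p0, p_init=0.5):
    """Predicted AFO after each observed transition, following Equation 1.

    transitions: sequence of 1 (return) or 0 (alternation), in presentation order
    alpha: learning rate in [0, 1]
    K: scaling factor from internal probability to gaze offset
    p0: equilibrium probability above which gaze is biased toward the return side
    p_init: assumed starting estimate of the probability of return
    """
    p_ret = p_init
    predicted = []
    for was_return in transitions:
        if was_return:
            p_ret = p_ret + alpha * (1 - p_ret)  # move estimate toward 1
        else:
            p_ret = p_ret - alpha * p_ret        # move estimate toward 0
        predicted.append(K * (p_ret - p0))       # linear response model
    return predicted


# Three returns followed by an alternation, with illustrative parameter values.
print(rw_afo([1, 1, 1, 0], alpha=0.2, K=1.0, p0=0.5))
```

Both update branches can be written as a single delta rule, Pret(t + 1) = Pret(t) + α(S − Pret(t)), with S = 1 after a return and S = 0 after an alternation, so a larger α weights recent transitions more heavily.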
To evaluate whether AFO is a product of two learning processes with different learning rates we also fit an extended model in which probabilities were updated based on two processes with different learning rates (see Bornstein & Daw, 2012). In this model, two estimations of the transition probability, \(P_{ret}^{(1)}({\alpha _1})\) and \(P_{ret}^{(2)}({\alpha _2})\), are updated independently as in Equation 1, and an overall summary statistic is defined as their weighted average as in Equation 2:  
\begin{equation}\tag{2}{P_{ret}}(t) = w\,P_{ret}^{(1)}(t,{\alpha _1}) + (1 - w)\,P_{ret}^{(2)}(t,{\alpha _2})\end{equation}
Compared to the simpler model in Equation 1, this model has two additional parameters; an additional learning rate parameter and a weighting coefficient, w (see Appendix for validation procedure details). Although the RW model is heuristic in nature, it performs similarly to more complex generative models when the target statistics are stationary (Mengotti, Dombert, Fink, & Vossel, 2017).  
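The weighted combination in Equation 2 can be sketched in a few lines. As before, the function names and the starting value `p_init = 0.5` are assumptions for illustration only, and the parameter values in the example are arbitrary.

```python
def rw_track(transitions, alpha, p_init=0.5):
    """Running estimate of the probability of return under a single learning rate."""
    p, estimates = p_init, []
    for s in transitions:          # s is 1 for a return, 0 for an alternation
        p += alpha * (s - p)       # delta-rule form of the updates in Equation 1
        estimates.append(p)
    return estimates


def two_rate_p_ret(transitions, alpha1, alpha2, w, p_init=0.5):
    """Weighted average of two independently updated estimates (Equation 2)."""
    est1 = rw_track(transitions, alpha1, p_init)
    est2 = rw_track(transitions, alpha2, p_init)
    return [w * a + (1 - w) * b for a, b in zip(est1, est2)]


# A fast process (alpha1) and a slow process (alpha2), equally weighted.
print(two_rate_p_ret([1, 1, 0, 1], alpha1=0.6, alpha2=0.05, w=0.5))
```

Setting w = 1 (or α1 = α2) reduces this to the single-rate estimate, which is why the extended model adds exactly two free parameters to the simpler one.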
Information provided about transition structure by AFOs and SLs
To evaluate whether AFO and saccade latencies provided complementary or independent information about the transition structure in the series, we used a mutual information (MI) analysis. MI captures the amount of knowledge one variable provides about another, or equivalently, the uncertainty about one variable that is reduced by knowing another (Cover & Thomas, 1991). MI does not assume any particular relationship between two variables and captures all orders of correlations, as opposed to Pearson's correlation coefficient R, which quantifies the linear relationship (see Equation 3).  
\begin{equation}\tag{3}I(x;w) = H(x) - H(x|w) = \sum\limits_{x \in X} \, \sum\limits_{w \in W} \, p(x,w)\,{\rm log}\left( {{{p(x,w)} \over {p(x)p(w)}}} \right)\end{equation}
 
In Equation 3, H(x) is the entropy of the variable x (here, the experimental condition pret), and H(x|w) is the entropy of x given w (the specific known behavioral response). Because the two stochastic processes (pret70, pret30) were equally probable, the entropy related to which condition participants were observing (pret equal to 70 or 30) on any given trial was 1 bit. We used MI to quantify the degree of uncertainty about the variable pret removed by considering several oculomotor information sources and their joint distribution. First, we calculated the entropy reduction separately achieved by AFO or SL, I(pret;AFO) and I(pret;SL). Because saccade latencies on any given trial likely depend on whether the saccade was an alternation or a return (due to inhibition of return), we also partialized by this factor in the MI formulation (see Appendix). This allowed us to determine whether AFO and saccade latency were differentially informative with respect to the experimental conditions. We could also evaluate whether AFO and SL provided redundant information about the transition structure (negative synergy; Schneidman, Bialek, & Berry, 2003), in which case the MI provided by the joint distribution would be lower than the sum of the two former terms, as indicated in Equation 4:  
\begin{equation}\tag{4}Syn\left( {SL,AFO} \right) = I\left( {pret;AFO\ \&\ SL} \right) - \left( {I\left( {pret;SL} \right) + I\left( {pret;AFO} \right)} \right) \lt 0\end{equation}
 
Finally, we calculated the information about pret carried by separate oculomotor contributions to AFO, namely drift and small saccade instabilities. We calculated all these quantities per participant, which licensed statistical tests at the group level. 
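Equations 3 and 4 can be illustrated with a simple plug-in estimator for discrete variables. This sketch makes several assumptions that are not taken from the study: AFO and SL are taken to be already discretized into bins (the binning scheme is not described in this section), the logarithm is base 2 so that results are in bits (matching the 1-bit entropy stated for pret), and no bias correction or partialization by return/alternation is applied. The toy data are invented to show the redundant case.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ws):
    """Plug-in estimate of I(x;w) in bits for two discrete sequences (Equation 3)."""
    n = len(xs)
    joint = Counter(zip(xs, ws))
    count_x = Counter(xs)
    count_w = Counter(ws)
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_w[w] / n)))
        for (x, w), c in joint.items()
    )


def synergy(cond, afo_bins, sl_bins):
    """Equation 4: joint information minus the sum of the individual terms."""
    joint = list(zip(afo_bins, sl_bins))
    return mutual_information(cond, joint) - (
        mutual_information(cond, sl_bins) + mutual_information(cond, afo_bins)
    )


# Toy data: both measures perfectly (and redundantly) identify the condition.
cond = [70, 70, 30, 30]
afo_bins = ["high", "high", "low", "low"]
sl_bins = ["fast", "fast", "slow", "slow"]
print(mutual_information(cond, afo_bins), synergy(cond, afo_bins, sl_bins))
```

In the toy example each measure alone removes the full 1 bit of uncertainty, the joint distribution cannot remove more than that, and so the synergy is negative, which is the signature of redundancy described above.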
Eye-movement sources underlying AFO
This analysis quantified the types of oculomotor movements that may underlie AFO. To this end we identified different types of eye movements in the period encompassing the presentation of the fixation symbol and the subsequent pretarget blank screen and evaluated their direction. We implemented the same coding as for AFO: positive/negative values for movements made towards/away from the direction of the last target. We evaluated whether AFO was driven by small involuntary saccadic movements in the range 0.1°–4.0° observed during fixation (Abadi & Gowen, 2004), as well as small drifts during fixation (Hartmann, Mast, & Fischer, 2015; Cherici, Kuang, Poletti, & Rucci, 2012). To avoid contamination of the drift measurement due to the oscillation following the saccade to fixation symbol, we quantified drift only when saccades did not occur. We quantified drift assuming a linear trend; that is, we estimated the initial and terminal eye positions of each drift period via linear regression. 
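The linear-trend estimate of drift can be sketched as an ordinary least-squares line fit to gaze position over a saccade-free period, with the fitted values at the first and last time points taken as the initial and terminal positions. The function name `drift_endpoints` and the example trace are hypothetical; the sketch assumes horizontal gaze in degrees and time in milliseconds.

```python
def drift_endpoints(times_ms, gaze_x):
    """Fit gaze_x = slope * t + intercept and return fitted (start, end) positions."""
    n = len(times_ms)
    mean_t = sum(times_ms) / n
    mean_x = sum(gaze_x) / n
    sxx = sum((t - mean_t) ** 2 for t in times_ms)
    sxy = sum((t - mean_t) * (x - mean_x) for t, x in zip(times_ms, gaze_x))
    slope = sxy / sxx
    intercept = mean_x - slope * mean_t
    start = slope * times_ms[0] + intercept
    end = slope * times_ms[-1] + intercept
    return start, end


# Hypothetical 100 ms drift period with a steady rightward drift of 0.001 deg/ms.
times = list(range(100))
trace = [0.2 + 0.001 * t for t in times]
start, end = drift_endpoints(times, trace)
print(start, end, end - start)
```

The difference between the fitted end and start positions gives the drift displacement, which could then be sign-coded relative to the last target in the same way as AFO.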
Results
AFOs: Stochastic context and impact of preceding trials
Evidence of statistical learning was seen in that AFOs (here defined as horizontal offset toward the last target) were greater in pret70 than in pret30 (Figure 2A). AFO values were ∼0.4° from center, within the spatial zone of the just-removed fixation symbol. The mean difference between the two conditions (ΔAFO) was around 0.3°, t(20) = 10.10, p < 0.001, d = 1.87.¹ 
Figure 2
 
The impact of statistical structure on AFO. (A) Mean AFO values were significantly greater in pret70 than pret30, and the pattern held for all participants (each participant marked via single line). (B) Partitioning AFO values by most recent transition indicates an effect of statistical structure as well as an impact of most recent transition, as AFO was greater following returns than following alternate trials. Crosses above each bar indicate significant differences from zero. Asterisks above/below bar-pairs indicate a significant difference. (C) Beta weights estimated via regression models indicate that AFO was impacted by a return in each of the last five transitions in the pret70 condition and in each of the last three transitions for pret30.
Because we only manipulated first-order Markov properties, any effect of prior transitions would indicate AFO was sensitive to local trial history. To specifically investigate the impact of the immediately preceding transition, we partitioned the AFO data according to whether the last saccade was a return or alternation. As shown in Figure 2B, AFO was greater after return saccades in both conditions, but there was also an independent effect of statistical structure. This was confirmed by a two-way analysis of variance (ANOVA; Condition: pret70, pret30 × Last trial: alternation, return) that produced the aforementioned main effect of Last trial, F(1, 20) = 75.10, p < 0.001, and a main effect of Condition, F(1, 20) = 18.10, p < 0.001, as AFO was higher in pret70; there was no interaction (F < 1). Regression models probed for the impact of any of the last six transitions and indicated that returns in any of the last five trials (for pret70) or the last three trials (for pret30) contributed positively to AFO, though with a decaying impact (Figure 2C).² To summarize, AFO reflected learning of global statistics but was also impacted by the immediately preceding transition, and (more weakly) by the preceding three to five transitions. 
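The lagged regression can be sketched as an ordinary least-squares fit of trial-level AFO on indicators for the last six transitions. The 0/1 coding of returns, the indexing convention, and the function name `lagged_return_betas` are assumptions made for this illustration; the exact specification is the one given in the Methods.

```python
import numpy as np


def lagged_return_betas(afo, is_return, n_lags=6):
    """Regress AFO on whether each of the last n_lags transitions was a return.

    afo[t]       : offset measured before the target of trial t appears.
    is_return[t] : 1 if the target on trial t was on the same side as on
                   trial t-1, else 0. The most recent transition known when
                   afo[t] is measured is therefore is_return[t-1] (lag 1).
    Returns (intercept, betas), with betas[k-1] the weight for lag k.
    """
    afo = np.asarray(afo, dtype=float)
    ret = np.asarray(is_return, dtype=float)
    trials = np.arange(n_lags, len(afo))  # trials with a full history
    lags = np.column_stack([ret[trials - k] for k in range(1, n_lags + 1)])
    design = np.column_stack([np.ones(len(trials)), lags])
    coef, *_ = np.linalg.lstsq(design, afo[trials], rcond=None)
    return coef[0], coef[1:]


# Synthetic example: AFO depends on the last two transitions only.
rng = np.random.default_rng(0)
ret = rng.integers(0, 2, size=300)
afo = np.zeros(300)
for t in range(6, 300):
    afo[t] = 0.1 + 0.3 * ret[t - 1] + 0.1 * ret[t - 2]
intercept, betas = lagged_return_betas(afo, ret)
print(np.round(betas, 3))  # weights decay with lag and vanish beyond lag 2
```

Fitting one such model per participant and condition would yield per-lag weights that can then be tested against zero at the group level.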
We applied a Rescorla-Wagner (RW) model to assess learning rates and examine the scaling parameter K that reflected the mapping from subjective internal probabilities to AFO magnitudes. The RW model was successfully validated on the single participant level. As detailed in the Methods section, in each condition we estimated the model parameters from nine of the 10 series and applied those parameters to predict trial-by-trial AFO values for the left-out series. The variance accounted for by the model for the left-out series exceeded permutation-derived chance (p < 0.05) for 19 of the 20 participants, in both conditions (see Appendix for methods). Figure 3A shows the model's predicted AFO values for a left-out AFO series, based on parameters estimated from independent data. 
Figure 3
 
Rescorla-Wagner model of AFOs. (A) AFO data from a sample series in pret70 condition (dashed line) and the matched model prediction (continuous line) that was derived from parameter values estimated from independent series. Asterisks on abscissa mark alternate (side-switch) trials. Data are concatenated to exclude missing or invalid values. (B) Distributions of model parameters in the two conditions. From the left: learning rate, scaling factor, and equilibrium point P0. The equilibrium point significantly departed from 0.5 only in pret70. Asterisks above/below bar pairs indicate significant differences.
The main findings for the RW model are presented in Figure 3B. The learning rate α was higher in pret30 (M = 0.71 ± 0.05) than in pret70 (M = 0.55 ± 0.05), t(20) = 2.44, p = 0.025, d = 0.67, indicating narrower integration windows in pret30. Because we bound the Pret parameter within the interval [0,1], the range of AFO was determined by the scaling factor K. We found that K was greater in pret70 (M = 0.87 ± 0.12) than in pret30 (M = 0.49 ± 0.09), t(20) = 3.32, p = 0.0036, d = 0.81. This indicates that subjective probabilities translated into larger behavioral signatures for pret70. Finally, the mean equilibrium point, P0, was greater in pret30 (M = 0.49 ± 0.06) than in pret70 (M = 0.32 ± 0.05), t(20) = 2.92, p = 0.0088, d = 0.69, and differed from 0.5 only for the latter, t(20) = 4.06, p = 0.0015, d = 0.96. An extended RW model that reflected a weighted combination of two independent learning rates (see Methods) did not produce a better fit than the simpler model reported here (ΔBIC not different from zero, p > 0.01). 
In summary, the RW model identified different learning rates for pret30 and pret70, and a greater scaling factor for pret70. Neither of these findings is compatible with normative accounts of learning, as pret70 and pret30 depart equally from randomness in transition constraints. However, as we detail in the Discussion, these results are consistent with models of the subjective perception of randomness. 
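To make the roles of α, K, and P0 concrete, the following minimal sketch implements a standard delta-rule update. The linear read-out AFO = K(P − P0), the starting value of 0.5, and the function names are assumptions made for illustration; the model equations actually fitted are those given in the Methods.

```python
import numpy as np


def rw_predict_afo(is_return, alpha, K, P0, p_init=0.5):
    """Predict trial-by-trial AFO from a Rescorla-Wagner estimate of P(return).

    Assumed read-out: AFO_t = K * (P_t - P0), so the predicted offset is zero
    when the subjective return probability equals the equilibrium point P0.
    """
    p = p_init
    afo = np.empty(len(is_return))
    for t, outcome in enumerate(is_return):
        afo[t] = K * (p - P0)          # anticipatory offset before target t
        p = p + alpha * (outcome - p)  # delta-rule update after the transition
    return afo


def rw_lag_weights(alpha, n_lags=6):
    """Weight that the current estimate places on the transition k trials back:
    alpha * (1 - alpha)**(k - 1). A higher alpha concentrates weight on lag 1."""
    k = np.arange(1, n_lags + 1)
    return alpha * (1.0 - alpha) ** (k - 1)


print(rw_predict_afo([1, 1, 0], alpha=0.5, K=1.0, P0=0.5))
print(np.round(rw_lag_weights(0.71, 3), 3))  # learning rate reported for pret30
print(np.round(rw_lag_weights(0.55, 3), 3))  # learning rate reported for pret70
```

Under this update rule the weight on older transitions falls off geometrically, which is why the higher learning rate in pret30 corresponds to a narrower integration window.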
AFO reflects knowledge consolidation
After participants had completed a series of 100 trials, the statistical structure of that series continued to exert an impact on AFO during the 20 random trials appended to it. This demonstrates an impact of previously consolidated knowledge. As shown in Figure 4A, within these 20 trials, AFO values were greater following pret70 series. A 2 (Condition: pret30, pret70) × 2 (Last trial: return, alternate) ANOVA applied to AFO values in these 20 trials showed a main effect of Condition, F(1, 20) = 8.99, p = 0.0036. Importantly, this effect was concomitant with an independent effect of Last trial, F(1, 20) = 41.31, p < 0.001, because AFO was larger after returns. In summary, for these 20 random trials, we found a strong effect of the most recent trial, which summed linearly with a longer term impact of the transition structure in the series that preceded the random trials. 
Figure 4
 
Long-term learning signatures in AFO. (A) AFO values in the 20 random trials (pret = 50%) appended to each experimental series. Average AFO magnitudes indicate confinement to the area of the fixation symbol (<0.4° eccentricity). There was a strong impact of the statistical structure of the series presented prior to the random trials, and independently, a strong impact of the immediately preceding trial. Crosses above each bar indicate significant differences from zero. Asterisks above/below bar pairs indicate significant difference (also in following panels). (B) ΔAFO was defined as the difference between AFO values in the pret70 and pret30 conditions. Its values significantly increased from the first half to the second half of the experimental series. (C) Similar results when quantified via Mutual Information. In all panels, measures of spread indicate variance within condition and are provided for completeness; they are not indicative of effect sizes in within-participant contrasts.
Gradual consolidation of statistical knowledge was seen in that differences in AFO between pret70 and pret30 (ΔAFO) increased over the 100 trials within each series (Figure 4B). Specifically, ΔAFO increased from 0.25 ± 0.03° in trials 1–50 to 0.39 ± 0.04° in trials 51–100, t(20) = 4.20, p < 0.001, d = 0.85. When quantified in information theoretic terms, a Mutual Information analysis (see Methods) revealed that AFO carried more information about the experimental condition in the last 50 trials (0.0527 ± 0.0063 bits) than in the first 50 trials (0.033 ± 0.0029 bits), t(20) = 4.11, p < 0.001, d = 1.10 (Figure 4C). 
Saccade latencies: Relation to AFO and sensitivity to input statistics
Saccade latencies are impacted by statistical structure, but more weakly than AFO
We first report an analysis of SLs because identifying signatures of learning in SL is a necessary precursor for addressing one of our main aims, which is to relate AFO to SL data on a trial-by-trial basis. An analysis of raw saccade latencies demonstrated an impact of the Markov process, because return and alternate saccades were faster in the condition in which they were more frequent (Figure 5A). Specifically, return saccades were faster in pret70 than pret30, t(20) = 5.44, p < 0.001, d = 0.48, and alternations were faster in pret30 than pret70, t(20) = 5.03, p < 0.001, d = 0.72. 
Figure 5
 
The impact of statistical structure on saccade latency. (A) Saccade latencies indicate learning of statistical structure in addition to an effect of whether a saccade is a return or alternation. Asterisks above bar pairs indicate significant difference. (B) Trial-level correlations between AFOs and saccade latency. Distributions are plotted for pret70 and pret30, partitioned according to whether the saccades were return or alternate saccades. Asterisks above bars mark significant difference from 0, which held in all conditions apart from alternate trials in pret30.
Because the analysis of raw saccade latencies does not address the impact of input statistics on evidence accumulation rate or thresholds, we fit a LATER (linear approach to threshold with ergodic rate) model (Carpenter & Williams, 1995) to SL data for return and alternation saccades in pret70 and pret30, solving for threshold (ϑ) and accumulation rate (μ) (see Appendix). We found a robust signature of statistical learning in SL, because within each condition, values of the threshold parameter (ϑ) were lower for the more frequent type of saccade. Specifically, for ϑ, a 2 (Condition: pret30, pret70) × 2 (Current Trial: alternate, return) ANOVA revealed a significant two-way interaction, F(1, 20) = 14.37, p < 0.001. In pret30, thresholds were lower for alternate saccades than returns (difference = 0.10 ± 0.048, t(20) = 2.13, p = 0.025, d = 0.90). Conversely, in pret70, thresholds were greater for alternate saccades than returns (difference = 0.061 ± 0.031, t(20) = 1.98, p = 0.041, d = 0.78). For the accumulation rate parameter (μ), a similar ANOVA identified only a main effect of current trial (return vs. alternation), F(1, 20) = 7.79, p = 0.0066, indicating more rapid accumulation for alternate saccades (as in Kim, Gabir, & Gold, 2017). We used regression models to determine the impact of recent transitions on SL. Because return and alternation trials reverse their status as high- versus low-probability events in pret70 and pret30, we fit separate regression models for latencies of return and alternate saccades. We found mixed and modest signatures for the impact of recent transitions on SL. Alternation saccades in pret30 were not impacted by any of the prior six transitions. A similar null finding held for return saccades in pret70. For return saccades in pret30, impact was limited to the immediately preceding transition: return saccades were faster when preceded by a return, β1 = –7.46 ± 1.84 ms, t(20) = 4.04, p = 0.0019, d = 0.88. 
For alternate saccades in pret70, the coefficients from lag-1 to lag-4 were significantly positive, indicating that alternation saccades were slowed down by a return saccade in any of the four prior transitions.³ 
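The threshold and accumulation-rate parameters of the LATER model reported above can be illustrated with a minimal sketch. In LATER, a decision signal rises linearly to a threshold at a rate drawn from a normal distribution, so reciprocal latency is normally distributed. The sketch assumes the rate variability σ is fixed at 1 (which makes ϑ and μ separately identifiable) and uses a simple moment-based fit; both are illustrative assumptions, and the fitting procedure actually used is described in the Appendix.

```python
import numpy as np


def simulate_later(theta, mu, n, rng, sigma=1.0):
    """Simulate latencies (s): latency = theta / rate, with rate ~ N(mu, sigma^2).
    Non-positive rates never reach threshold and are discarded."""
    rate = rng.normal(mu, sigma, size=n)
    rate = rate[rate > 0]
    return theta / rate


def fit_later(latencies_s, sigma=1.0):
    """Moment-based fit: 1/latency ~ N(mu / theta, (sigma / theta)^2)."""
    recip = 1.0 / np.asarray(latencies_s, dtype=float)
    theta = sigma / recip.std(ddof=1)  # a higher threshold narrows the reciprocal spread
    mu = recip.mean() * theta
    return theta, mu


rng = np.random.default_rng(0)
latencies = simulate_later(theta=1.0, mu=5.0, n=20000, rng=rng)
theta_hat, mu_hat = fit_later(latencies)
print(round(theta_hat, 2), round(mu_hat, 2))
```

In this parameterization a lower threshold shortens latencies and widens the reciprocal-latency distribution, whereas a change in accumulation rate shifts that distribution without changing its width.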
In summary, saccade latencies were not more sensitive than AFO to the structure of recent transitions: for pret30, SL was only impacted by the immediately prior transition (as compared to three prior transitions for AFO). For pret70, SL was impacted by the prior four transitions (as compared to the five prior transitions for AFO). To directly compare the size of integration windows as determined from AFO and SL analyses, we used bootstrapping to determine, for each participant, the maximal lag (1…6) for which the regression weight was statistically significant (which we termed max lag). For pret30 the average max lag value as estimated from the AFO data was 1.2 ± 0.1 transitions versus 0.14 ± 0.08 transitions as estimated from the SL analysis (return trials). This indicates that for pret30, AFOs reflected a larger integration window, t(20) = 6.49, p < 0.001, d = 2.13. For pret70 the mean max lag as estimated from AFO was 2.0 ± 0.3 transitions versus 1.5 ± 0.2 transitions as estimated from SL (alternate trials), and the difference was not statistically significant (p > 0.1). We conclude that saccade latencies showed sensitivity to recent transition structure, but only for less likely events, and in any case were never associated with larger temporal integration windows as compared to AFO. 
AFOs predict SLs and provide more information about statistical structure (on single-trial level)
If AFO reflects prediction, then at the single-trial level, larger AFO values (a stronger bias toward the side of the prior target) should precede faster return saccades, but slower alternation saccades. These trial-level predictive signatures between anticipatory behavior and stimulus-driven responses should therefore produce negative AFO/SL correlations for return saccades, and positive AFO/SL correlations for alternate saccades. Correlation patterns between AFO and SL at the single trial level confirmed these patterns (correlations were computed per participant and statistically evaluated at group level; Figure 5B). For pret70, we found a negative AFO/SL correlation for return saccades (across participants, mean Z-transformed Pearson's R = −0.046 ± 0.021, t(20) = 2.10, p = 0.048, d = 0.46), and a positive correlation for alternate saccades (across participants, mean Z-transformed Pearson's R = 0.125 ± 0.023, t(20) = 5.20, p < 0.001, d = 1.13). For pret30, the AFO/SL correlation was negative for return saccades (across participants, mean Z-transformed Pearson's R = –0.060 ± 0.022, t(20) = 2.69, p = 0.014, d = 0.59), and no significant correlation was found for alternate saccades. Overall, these findings show that AFOs prior to a saccade contain information about saccade latencies in a manner consistent with anticipatory predictions. 
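The two-level procedure described above (a correlation per participant, then a group-level test on the Z-transformed values) can be sketched as follows. The function names are hypothetical, and the effect size is assumed to be the group mean divided by the between-participant standard deviation.

```python
import numpy as np


def fisher_z_correlation(x, y):
    """Pearson correlation between two trial-level measures, Fisher Z-transformed."""
    r = np.corrcoef(x, y)[0, 1]
    return np.arctanh(r)


def one_sample_t(values):
    """One-sample t-test against zero. Returns (t, degrees of freedom, Cohen's d),
    with d computed as mean / standard deviation (assumed definition)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    sd = values.std(ddof=1)
    t = values.mean() / (sd / np.sqrt(n))
    return t, n - 1, values.mean() / sd


# One participant: AFO and latency on four return trials (made-up numbers).
z = fisher_z_correlation([1, 2, 3, 4], [1, 3, 2, 4])
print(round(z, 4))  # r = 0.8, so z = arctanh(0.8)

# Group level: one Z value per participant would be passed to one_sample_t.
t, df, d = one_sample_t([1.0, 2.0, 3.0])
print(round(t, 3), df, d)
```

Under this definition d equals t/√n, which with 21 participants reproduces the pairs reported above (for example, t = 2.10 gives d ≈ 0.46 and t = 5.20 gives d ≈ 1.13).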
As indicated in the Introduction, a related issue is whether SL and AFO are similarly sensitive to statistical features of the stochastic process. We used a mutual information analysis to quantify the impact of their linear correlations and more general relationships on the amount of information that the entire response distribution of SL and AFO carried about the experimental conditions. First, considering trials 51–100 where both AFO and SL showed greater sensitivity to statistics,⁴ we found that AFOs conveyed around twice as much information about the statistical process compared to SL: 0.0527 ± 0.0063 bits for AFO versus 0.0245 ± 0.0050 bits for SL, t(20) = 4.34, p < 0.001, d = 1.09. We also found that AFO provided an information gain of 85% with respect to I(pret;SL|S) = 0.0285 ± 0.0056 bits, t(20) = 2.79, p = 0.011, d = 0.90. Since AFOs were more informative than SLs, and preceded them temporally, we could determine whether AFOs accounted for some of the information that SLs carried about the statistical process (pret70, pret30). In that case, there would be negative synergy (see Methods) between AFO and SL. Evaluating this quantity, we found Syn(AFO,SL) = –0.0135 ± 0.0039 bits, t(20) = 3.44, p = 0.0026, d = 0.75, which was about 55% of I(pret;SL). That said, saccade latencies did carry some independent information about the statistical process: the quantity I(pret;SL|(S&AFO)), that is, the information carried by SL conditioned on the joint occurrence of AFO and S, was 0.02735 ± 0.0064 bits, significantly greater than zero, t(20) = 4.25, p < 0.001, d = 0.93. These results suggest that, some redundancy notwithstanding, SL and AFO do convey substantially different information about the target location statistics. 
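As a rough consistency check, the three proportions quoted above can be recomputed from the reported group means. This assumes the percentages were derived from those means; if they were instead computed per participant and then averaged, small discrepancies would be expected.

\begin{equation}\frac{I\left( {pret;AFO} \right)}{I\left( {pret;SL} \right)} = \frac{0.0527}{0.0245} \approx 2.15\end{equation}

\begin{equation}\frac{I\left( {pret;AFO} \right) - I\left( {pret;SL|S} \right)}{I\left( {pret;SL|S} \right)} = \frac{0.0527 - 0.0285}{0.0285} \approx 0.85\end{equation}

\begin{equation}\frac{\left| {Syn\left( {AFO,SL} \right)} \right|}{I\left( {pret;SL} \right)} = \frac{0.0135}{0.0245} \approx 0.55\end{equation}

These values agree with the statements that AFO carried around twice the information of SL, provided an 85% gain over the conditioned SL term, and accounted for about 55% of I(pret;SL).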
Fixation offsets develop within a trial and are co-determined by gaze drifts and saccade instabilities
We have shown that AFO tracks statistical context, manifests a multiple-trial temporal integration window, and is predictive of SL, while showing greater sensitivity to the statistical context than SL. AFO, however, is a summary descriptive measure that does not shed light on underlying mechanisms. The oculomotor mechanisms underlying AFO are of interest as both minor saccade instabilities (including micro-saccades) and gaze drifts have been linked to covert attention. Although our main interest in this study was the computational properties captured in anticipatory behavior, we present here a preliminary analysis of the oculomotor mechanism involved. 
We first qualitatively present the trajectories of gaze movements (on the horizontal, x-direction) within a trial. We begin the description relative to the time point at which participants saccaded from the target stimulus in the periphery back to the center of the display, prior to the start of a new trial. This analysis is time-locked to saccade landing in the vicinity of the fixation symbol, which tended to occur approximately 10 ms in advance of the presentation of the fixation symbol. Figure 6 presents the timelines of mean gaze location relative to landing position, continuing temporally through the presentation of the fixation symbol and the subsequent blank screen, in 10 ms time bins (negative y values indicate left screen side; positive values indicate the right side). 
Figure 6
 
Mean Gaze locked to the time of landing at the fixation symbol. In both panels, negative values on the y axis indicate gaze to the left of screen center, and positive values indicate gaze to the right of screen center. The x axis marks time lapsed from the saccade to center. (A) Gaze locations on trials following a target presented on the left. Plots are time-locked to the time at which the saccade to center occurred. Each time point is an average of gaze values in 10 ms bins; shaded areas represent ±SEM. The dashed vertical line indicates the temporal onset of the blank screen (∼410 ms from landing at center). Superimposed (red line; second y axis) is Cohen's effect size in each time bin for the difference in gaze locations between the two conditions. (B) Same as Panel A but for trials following a target on the right.
The process depicted in Figure 6 is straightforward: in both pret30 and pret70, the landing position (marked t = 0 in the x axis) was on the same screen side as the prior target. This was followed by an adjustment toward the screen center during the next ∼200 ms. As we detail below, these adjustments reflected both drifts and small corrective saccadic movements during the presentation of the fixation symbol and the subsequent blank screen. After this, gaze trajectories further diverged based on experimental condition; in pret70 (lighter shading in Figure 6), gaze remained on the side of the prior target (plateauing during the presentation of the fixation symbol), whereas in pret30, the gaze continued a trajectory toward the alternate side (darker shading). For all time bins we found a significant difference between the mean gaze location in the two conditions (p < 0.01, Bonferroni corrected). Importantly, however, as expressed by Cohen's effect size d (Figure 6, red lines), the difference in gaze position between pret70 and pret30 increased continuously during the fixation symbol presentation (∼400 ms) and during the blank screen (∼400–560 ms; Spearman's R = 0.99, p < 10⁻⁶). In the Appendix we present density plots of group-level gaze locations in pret70 and pret30, in trials following a target on the left or right screen side (Figure A1). These demonstrate the tight clustering of gaze locations at screen center during the window where AFO was quantified, as well as the offsets induced by transition structure. 
After the gaze arrived at the fixation symbol, we observed relatively frequent saccadic intrusions (SI), which are relatively small saccade instabilities in the range 0.1°–4.0° (Abadi & Gowen, 2004). Microsaccades (amplitude below 0.33°) fall within this range, but in the current study these occurred with a mean frequency of 0.03 ± 0.04 Hz and were therefore too infrequent to be analyzed separately (microsaccades constituted ∼2% of all SI cases). In both pret30 and pret70, SIs were oriented away from the last target and toward fixation. These very likely reflected corrections of the modest landing undershoot, which is corroborated by a strong correlation between SI magnitudes and gaze position during landing, indicating more extensive corrections for stronger undershoots (across participants, mean Z-transformed Pearson's R = 1.1 ± 0.3, significantly positive, t(20) = 17.73, p < 0.001, d = 3.87). On average, SIs occurred during presentation of the fixation symbol (233 ± 6 ms after landing). Although the magnitude of SIs was similar for pret70 and pret30 (∼2° in both conditions), they occurred more frequently in pret30 (SI frequency for pret30: M = 1.374 ± 0.068 Hz; for pret70: M = 1.305 ± 0.075 Hz; t(20) = 2.76, p = 0.012, d = 0.22). Note that targets were presented at a rate of 0.93 Hz. 
We then examined eye drifts that occurred during the blank period.⁵ Although drift magnitudes were small (< 0.1°), drift patterns strongly dissociated between pret70 and pret30: in pret70 drifts were generally toward the prior target, whereas in pret30 they were toward the opposite screen side. A statistical analysis based on a 2 (Condition: pret30, pret70) × 2 (Last trial: return, alternate) ANOVA revealed a main effect of condition, F(1, 20) = 9.57, p = 0.0027. Follow-up (nonindependent) contrasts showed that drifts were significantly negative for pret30, and (not significantly) positive for pret70 (for pret30: M = −0.0223 ± 0.0054°, t(20) = −4.13, p < 0.001, d = 0.90; for pret70: M = 0.0112 ± 0.0057°, t(20) = 1.96, p = 0.064, d = 0.43). Independently, drift values were slightly positive after return trials and slightly negative after alternate trials, resulting in a significant main effect of the last trial, F(1, 20) = 38.82, p < 0.001. 
Finally, we determined the information that was carried by landing positions, SI and drifts about the experimental condition, and compared those to the information carried by AFO. Landing position, drifts and SI amplitudes provided significantly less information than AFO (I(pret;landing) = 0.0203 ± 0.0034 bits, t(20) = 5.91, p < 0.001, d = 1.40; I(pret;SI) = 0.0370 ± 0.0047 bits, t(20) = 2.17, p = 0.042, d = 0.62; I(pret;drift) = 0.0326 ± 0.0065 bits, t(20) = 2.78, p = 0.012, d = 0.69). Interestingly, like AFO, drift demonstrated signatures of knowledge consolidation, since the information drift carried about the experimental condition increased from the first to the second half of trials [t(20) = 3.06, p = 0.0062, d = 0.38]. 
Discussion
There exists extensive literature on how statistical learning impacts response components (e.g., Kim et al., 2017; O'Reilly et al., 2013; Vossel et al., 2014) and the brain regions that are associated with these responses (e.g., Mengotti et al., 2017). Despite these advances, and related demonstrations that strong predictability can produce anticipatory motor behaviors (e.g., Dale, Duran, & Morehead, 2012; Vakil, Bloch, & Cohen, 2017), the impact of learning on predictive processes per se remains an open question. Consequently, theory development has largely been informed by analyses of behavioral or neurobiological responses to stimuli that vary in predictability (e.g., den Ouden et al., 2010; Kim et al., 2017; Vossel et al., 2014). Our findings directly address three core questions on the interface of learning and prediction: (a) the prevalence of predictions, (b) the temporal integration-constants that govern anticipatory activity, and (c) the information carried by predictive versus stimulus-linked behavior. We addressed these questions using a novel approach based on the analysis of subtle AFOs recorded while anticipating targets, in a design where predictions could only be based on transition structure. Critically, the AFO measure captures prediction prior to the arrival of the stimulus, rather than a reaction to it. 
AFO: Intertrial effects and learning
In absolute magnitude, AFOs were subtle, with 90% of all gazes falling within 1.6° from screen center. AFOs significantly differed between pret70 and pret30, with pret70 linked to a stronger bias toward the screen side of the last target. AFO was also strongly impacted by the most recent trial: returns induced a significant offset toward the last screen side, though more strongly for pret70. The regression models identified temporally extended effects, revealing an independent effect of each of the last five transitions for pret70 and each of the last three transitions for pret30. These results accord with previous studies showing that learning of transition structure is associated with a rapidly decreasing effect of recent trials (e.g., Bornstein & Daw, 2012; Harrison, Bestmann, Rosa, Penny, & Green, 2011). 
We also found signatures of learning over longer time scales. Differences between AFOs for pret70 and pret30 (ΔAFO) were larger when computed for trials 51–100 than for trials 1–50 within each series. Second, during the 20 random trials appended to each series, AFO was impacted by the statistical structure inherent in the preceding 100-trial series. Specifically, when the random trials were appended to the pret70 series there was still a greater bias toward the last target location, but when they were appended to the pret30 series, there was still a greater bias toward the alternate screen side. This is consistent with prior demonstrations of long-lasting memory effects in implicit learning (e.g., Jiang, Won, & Swallow, 2014). Interestingly, during these random trials, this continuous impact of prior statistical structure coexisted with a second, independent effect that depended on whether the last trial was an alternation or return. These findings demonstrate that the impact of prior long-term statistical structure, which at that point was not reinforced but memory dependent, was maintained above and beyond the separate strong modulation induced by each prior trial. 
Although this study constitutes an initial examination of AFOs and how they inform models of learning and prediction, the data also produced specific findings that shed new light on the relation between formal uncertainty, subjective uncertainty, and prediction. Formally, the two Markov processes we used, pret70 and pret30, were equally uncertain: they had identical marginal frequencies and the same first-order Markov entropy. Notwithstanding this similarity, pret30 and pret70 produced different learning signatures. Decades of research have shown that humans manifest a specific bias when evaluating the randomness of binary series such as the ones used here: they judge random series as overly regular in that they misperceive them as containing more streaks (repetitions) than one expects from chance (e.g., Falk & Konold, 1997; Williams & Griffiths, 2013). Conversely, they judge binary series as random only once a series exceeds 60%–70% alternations. Our results show that such biases are not limited to deliberative judgment or reasoning. Instead, they are strongly reflected in anticipatory behaviors during online learning, consistent with the idea that pret30 series are indeed treated as more random than pret70 series. We applied a Rescorla-Wagner model to AFO data, which was successfully validated on out-of-sample data for almost all participants. The parameter fits revealed that pret30 was associated with a higher learning rate α, indicative of a narrower temporal integration window (this was consistent with the regression model results, which showed a weaker impact of recent trials in pret30). The estimated parameter K, which reflects the transformation from subjective probability to AFO, was larger for pret70 than pret30. This means that, all else being equal, the transformation from the subjective probability estimate to anticipatory behavior was associated with larger scaling effect in pret70. 
It remains to be determined whether this reflects different levels of confidence in the internal distributional estimations (as captured, e.g., by hyperparameters in Dirichlet distributions), or a difference in how distributional information translates into oculomotor commands. Finally, the findings for the equilibrium point P0 only partially confirmed our expectations. Because prior work suggests that series are perceived as random when the proportion of returns is around 30%, we expected P0 to be in that range for both conditions. Although P0 differed between the conditions, the distribution in pret30 was qualitatively broader (encompassing almost the entire [0,1] interval), and more work is needed to resolve this issue. Future work could improve the modeling of AFO dynamics by considering, for example, nonlinear transformations of subjective probabilities, Bayesian models that formally consider the parameters' distribution via hyperparameters, potential hysteresis effects, and the superimposition of stochastic resonant mechanisms. 
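Two quantitative points in the preceding discussion, the equal formal uncertainty of the two conditions and the roles of α, K, and P0, can be made concrete with a short sketch. It is illustrative only: it assumes a standard Rescorla-Wagner (delta-rule) update of the subjective probability of a return and a linear read-out, AFO = K(P − P0), which is one reading of the parameter descriptions above rather than a reproduction of the fitted model. The function names, the starting value of 0.5, and all parameter values are hypothetical.

```python
import math
import random


def transition_entropy_bits(p_return):
    """Entropy per trial (bits) of a binary chain whose only parameter is
    the probability of returning to the previous side."""
    probs = (p_return, 1.0 - p_return)
    return -sum(p * math.log2(p) for p in probs if p > 0)


def simulate_returns(p_return, n_trials, seed=0):
    """Draw a series of trials coded 1 = return, 0 = alternation."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_return else 0 for _ in range(n_trials)]


def rescorla_wagner_afo(returns, alpha, K, P0, p_init=0.5):
    """Predicted AFO before each trial (positive = toward the last target).

    ASSUMPTION: linear read-out AFO = K * (P - P0), where P is the
    subjective probability of a return and P0 is the point of zero bias.
    """
    p = p_init
    predicted = []
    for outcome in returns:
        predicted.append(K * (p - P0))  # anticipation precedes the target
        p += alpha * (outcome - p)      # delta-rule update after the target
    return predicted


if __name__ == "__main__":
    # The two conditions have the same formal uncertainty.
    print(transition_entropy_bits(0.7), transition_entropy_bits(0.3))

    # Hypothetical parameters, chosen only to mirror the qualitative
    # pattern reported above (lower alpha and larger K in pret70).
    afo70 = rescorla_wagner_afo(simulate_returns(0.7, 100),
                                alpha=0.1, K=1.0, P0=0.3)
    afo30 = rescorla_wagner_afo(simulate_returns(0.3, 100),
                                alpha=0.3, K=0.5, P0=0.3)
    print(sum(afo70[50:]) / 50, sum(afo30[50:]) / 50)
```

Under this update rule the subjective probability is an exponentially weighted average of past outcomes, so a higher α gives more weight to the most recent trials, which corresponds to the narrower integration window described for pret30.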
Impact of statistical regularity on SLs and relation to AFOs
There is considerable recent interest in perceptual inferences underlying saccades and the time scales governing them. Rise-to-threshold models (e.g., Brown & Heathcote, 2008; Carpenter & Williams, 1995) identify factors that impact different parameters of SL models. These include expectation, response urgency, and stimulus features (reviewed in Noorani & Carpenter, 2016). Our analysis of saccade latencies using the LATER model produced findings highly consistent with prior work. Using a saccade sequencing paradigm, Farrell, Ludwig, Ellis, and Gilchrist (2010) manipulated the probability of return to the same location, and modeled SL with a competitive race-to-threshold model (Brown & Heathcote, 2008). They identified two factors that impact saccade initiation: (a) the evidence threshold for initiating a saccade, which was only impacted by probability of return, and (b) the accumulation rate, which was impacted by whether a saccade returned to the same location, but not by the probability of return. Our results confirm those findings. We obtained the same dissociation: thresholds were impacted by transition probability, whereas accumulation rate was only impacted by whether the saccade returned to the same location. Kim et al. (2017) also showed that prior probability for a particular left or right saccade was reflected in the rise-to-threshold parameter of a LATER model. Consistent with several studies (Farrell et al., 2010; Kim et al., 2017; Vossel et al., 2014), we also found that saccade latencies indicated learning of statistical structure (transition probabilities): return saccades were faster in pret70 than in pret30, and conversely, alternate saccades were faster in pret30 than in pret70. In all, our findings for saccade latencies dovetail with recent conclusions about factors that impact threshold and accumulation rates in nonrandom environments. 
It is notable that while saccades demonstrated a strong effect of inhibition of return (IOR), there was no indication of IOR in the AFO data. Specifically, in the pret70 condition, AFOs were associated with a strong bias toward the Return side (as licensed by global statistics), as opposed to the subsequent stimulus-guided saccades, which showed IOR costs. 
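The dissociation between threshold and accumulation rate can be illustrated with a minimal simulation of a LATER-style unit, in which a decision signal rises linearly to a threshold at a rate drawn from a normal distribution on each trial, so that latency = threshold / rate. This is a sketch under stated assumptions: the function name and all parameter values are hypothetical, the starting level is fixed at zero, and non-positive rates are simply redrawn. It is not the fitting procedure used in the study.

```python
import random
import statistics


def simulate_later(mu_rate, sigma_rate, threshold, n_trials, seed=0):
    """Simulate latencies (s) from a LATER-style rise-to-threshold unit.

    The rate of rise (threshold units per second) is redrawn on every
    trial from Normal(mu_rate, sigma_rate); non-positive draws are
    discarded and redrawn (a simplifying assumption).
    """
    rng = random.Random(seed)
    latencies = []
    while len(latencies) < n_trials:
        rate = rng.gauss(mu_rate, sigma_rate)
        if rate > 0:
            latencies.append(threshold / rate)
    return latencies


if __name__ == "__main__":
    # Hypothetical values: raising the threshold (as might follow from a
    # less expected transition) lengthens every latency proportionally.
    base = simulate_later(5.0, 1.0, threshold=1.0, n_trials=2000)
    high_threshold = simulate_later(5.0, 1.0, threshold=1.2, n_trials=2000)

    # Raising the mean rate of rise shortens latencies instead.
    fast_rate = simulate_later(6.0, 1.0, threshold=1.0, n_trials=2000)

    for name, lat in (("base", base),
                      ("high threshold", high_threshold),
                      ("fast rate", fast_rate)):
        print(name, round(statistics.median(lat) * 1000), "ms")
```

The two manipulations leave different signatures on the latency distribution: a threshold change rescales all latencies, whereas a change in mean rate shifts the distribution of reciprocal latencies, which is what allows the two parameters to be separated when fitting.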
The configuration of recent trials impacted SL. This was most strongly evident in pret70, where return saccades in each of the last four transitions impacted latencies on a subsequent alternation saccade, though interestingly these did not impact a subsequent return saccade. This shows that although pret70 was associated with cumulative integration of recent past trials, the behavioral expression of this integration depended on the behavioral response required on any given trial (Awh, Belopolsky, & Theeuwes, 2012). 
As indicated in the Introduction, saccade latencies reflect a combination of anticipatory and reactive computations, whereas AFOs, by definition, provide information restricted to anticipatory processes. This allows evaluating the relationship between anticipatory and poststimulus behaviors, as detailed below. In studies of visual attention, probabilistic directional cues are often used to induce a shift in attention, which impacts subsequent stimulus responses, depending on whether the target features are consistent with the information provided by the cue (e.g., Klein, Kingstone, & Pontefract, 1992; Posner, 1980). Departing from studies of cued attention, our study avoided external cues to orient attention, so that eventual attention shifts were endogenously produced in response to environmental features. Our findings suggest that AFO reflects an endogenous shift in attention (and potentially, motor preparation as well), seen in that larger AFO values (a bias toward the prior target) preceded both faster return saccades and slower alternation saccades. These trial-by-trial correlations between SL and AFO were modest, though statistically significant (on the group level) in most cases. The relatively modest correlation can be explained by stimulus-driven (exogenous) components of SL that override endogenous information (e.g., Godijn & Theeuwes, 2002; Klein et al., 1992). Modest correlations notwithstanding, AFO and SL provided different information about the stochastic context. First, compared to SL, AFO was impacted by a more extended trial history in both conditions, consistent with a longer integration window. Second, when examining trials in the second halves of the series, for which the impact of experimental condition was more robust, we found that AFO conveyed more information about the experimental condition (pret30, pret70) than did SL. 
Furthermore, around 55% of the information carried by SL about the statistical process was already accounted for by the preceding AFOs. Taken together with the fact that SLs also carried unique information about the series statistics, this suggests that only a fraction of the information carried by AFO is effectively used when the saccades are initiated. In all, AFO carried more information about the statistical context than SL, carried a substantial portion of nonredundant information about the statistical process, and partially accounted for the impact of the statistical process on SL. 
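The comparison of how much information AFO and SL carry about the condition rests on the mutual information between a behavioral measure and the condition label. A minimal plug-in estimate for already-discretized data might look as follows. The function name and the toy data are hypothetical, and the sketch omits the limited-sampling bias corrections that such estimates usually require (see Panzeri & Treves, 1996; Panzeri et al., 2007), so it should be read as an illustration of the quantity rather than as the analysis pipeline.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    xs could be a binned behavioral measure (e.g., AFO or SL) and ys the
    condition label; no bias correction is applied.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Toy data: a two-bin measure and the condition it was recorded in.
    condition = ["pret70"] * 4 + ["pret30"] * 4
    informative = [1, 1, 1, 0, 0, 0, 0, 1]    # mostly tracks the condition
    uninformative = [1, 0, 1, 0, 1, 0, 1, 0]  # unrelated to the condition
    print(mutual_information_bits(informative, condition))
    print(mutual_information_bits(uninformative, condition))
```

With continuous measures such as AFO or SL, the number of bins and the number of trials per bin strongly affect the plug-in estimate, which is why bias-corrected estimators are preferred in practice.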
We did not analyze anticipatory saccades (i.e., saccades planned prior to stimulus appearance; 0 < SL < 80 ms), as our main interest in comparing AFO and SL was in contrasting predictive and reactive behaviors. Technically, inclusion of anticipatory saccades could increase the information carried by SL. Because anticipatory saccades were very rare (∼2% of trials), we were unable to study them separately. However, in a separate analysis we found their inclusion did not qualitatively impact any of the conclusions we report. Furthermore, it may be that anticipatory saccades are an extreme manifestation of the mechanisms that produce AFOs, in which case they should not be combined with other saccades. We consider this an important topic for future research on the boundaries between fixation and saccade control (see Krauzlis, Goffart, & Hafed, 2017). 
Most generally, AFOs may be related to measures used in prior studies of attention that have not used exogenous cues. Most notable is the large literature on sequential effects in attentional tasks, where responses on a given trial depend on features of the prior trial. There are multiple explanations for these sequential effects (for reviews, see Duthoo, Abrahamse, Braem, Boehler, & Notebaert, 2014; Egner, 2014). Some emphasize online adjustments of control (impacting, e.g., response selection or stimulus attention) whereas others emphasize proactive, anticipatory processes that encapsulate a cognitive repetition expectation, according to which the next trial should repeat the features of the prior one. We note that a core finding in the current study is supportive of an expectation account: specifically, location of gaze after return saccades was more likely to be positioned toward the direction of the prior target, in both pret70 and pret30. Notwithstanding, AFO was still substantially larger in pret70, and in both conditions AFO was independently impacted by the prior three to five transitions, speaking to longer-term learning effects beyond the most recent trial. 
AFOs: Intratrial dynamics and relation to oculomotor bases
Fixation is an active and demanding process (Carpenter & Noorani, 2017), in which signals from fovea and periphery dynamically interact (e.g., Krauzlis et al., 2017). Despite typical requirements to keep gaze still, subtle eye movements are continuously generated during fixation (e.g., Cherici et al., 2012; Snodderly & Kurtz, 1985), resulting in eye drifts and small fixational saccades. These are thought to provide optimal retinal input for downstream visual processing (e.g., Rucci & Victor, 2015). 
Ocular drifts have been associated with compensation for small head movements (Aytekin, Victor, & Rucci, 2014), but here we find these contain predictive information: in pret30 drifts were toward a direction opposite to the last target, whereas in pret70 drifts were in the direction of the last target. In some studies, eye drifts that anticipate a certain target movement are referred to as anticipatory pursuits (Barnes, 2008). These anticipatory pursuits (APs) may share several features with AFOs, since they predict target velocities (Moschner, Zangemeister, & Demer, 1996), are impacted by trial history (Kowler, 1989) and probability of presentation (Santos & Kowler, 2017; Damasse et al., 2018), and can modulate subsequent saccades to target (Tanaka, Yoshida, & Fukushima, 1998). That said, a requirement to fixate is thought to inhibit generation of APs (Watamaniuk et al., 2017), and whether the drift component we observe and APs reflect outputs of the same neural control circuit is an open question for future work. 
We also identified small fixational saccades that occurred during or around presentation of the fixation symbol. These are thought to allow sampling of areas of high variability (foveation; Guerrasio, Quinet, Büttner, & Goffart, 2010) and can anticipate volitional saccades (Watanabe, Matsuo, Zha, Munoz, & Kobayashi, 2013). When attention is oriented via a cue, subsequent microsaccades are more likely to occur in the cued direction (e.g., Engbert & Kliegl, 2003; Hafed & Clark, 2002; Meyberg, Sommer, & Dimigen, 2017). The fixational saccades we identified might be partially related to similar processes, as we found that the pret30 condition was associated with more fixational saccades toward the alternate side, consistent with an expectation effect. 
AFOs are a summary measure that is impacted by all oculomotor processes that took place between the offset of one target and the onset of the subsequent target, including both drifts and fixational saccades. The relative contribution of these oculomotor components is likely to be task dependent, as, for instance, longer gap periods could be conducive to more frequent saccade instabilities. We found that once a target was removed from the screen, the difference in gaze patterns and AFOs between the pret70 and pret30 conditions developed gradually over the subsequent 560 ms (i.e., the combined period of the fixation screen and 160 ms of blank screen), culminating in the differences documented in the main analysis. 
One possibility is that the superior colliculus (SC) is a hub that is responsible for these stochastic-driven AFO effects. The SC serves as a crucial hub between cortical processing centers and the saccade motor circuits (Trappenberg, Dorris, Munoz, & Klein, 2001) and is important for visual attention (Krauzlis, Lovejoy, & Zénon, 2013). Studies in nonhuman primates show that SC neurons track the goal of a gaze (Hafed, Goffart, & Krauzlis, 2008), anticipate the direction of a cued target (Horwitz & Newsome, 2001), increase firing when visual attention is engaged (Ignashchenkova, Dicke, Haarmeier, & Thier, 2004; Kustov & Robinson, 1996), and that SC activity scales with the probability that a target will be presented in a neuron's receptive field (Basso & Wurtz, 1998; Dash, Nazari, Yan, Wang, & Crawford, 2016). One possibility supported by computational models (Trappenberg et al., 2001) is that AFOs are a consequence of dynamic competitions between neural populations involved in saccade buildup and fixation or alternatively between the left and right colliculi (Krauzlis et al., 2017; Goffart, Hafed, & Krauzlis, 2012). 
Other systems that are involved in expectation and motor programming (Corbetta et al., 1998; Nobre, Gitelman, Dias, & Mesulam, 2000) may also determine AFOs through SC modulation. In particular, the caudate nucleus (Lauwereyns, Watanabe, Coe, & Hikosaka, 2002) and LIP (Bisley & Goldberg, 2003) code for expected locations prior to target appearance. During fixation LIP neurons increase their activity if attention is engaged in the periphery (Colby, Duhamel, & Goldberg, 1996). Activity in frontal eye fields determines disengagement from fixation (Dias & Bruce, 1994), and codes for the current locus of spatial attention independently of motor plans and overt movements (Serences & Yantis, 2006). The cerebellum is involved in fixation control (Hotson, 1982) and in attention, even in the absence of ocular movements (Striemer, Chouinard, Goodale, & de Ribaupierre, 2015); notably, chemical impairment of the fastigial oculomotor region causes fixation offsets (Guerrasio et al., 2010). All these areas project to SC directly or indirectly. 
Conclusions
Our study shows that a proactive oculomotor metric, quantified via subtle AFOs, is strongly impacted by input statistics. These biases were on average less than 1° in magnitude, and were measured while participants were fixating the screen center. Although AFOs were moderately predictive of subsequent saccade latencies on a trial-by-trial level, they captured more information about input statistics than did saccade latencies. These results show that strictly anticipatory behavior is impacted by learning on multiple scales, and that AFOs offer a unique and sensitive avenue for understanding learning and prediction in a way that is decoupled from the direct relationship between stimulus and responses. These developments could pave the way for future work that separately quantifies what is learned from how learning impacts behavior, and ultimately provide a better understanding of the relation between prediction and action. 
Acknowledgments
We thank Leonardo Chelazzi for his comments. UH's work was conducted in part while serving at and with support of the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF. The study was partially funded by a European Research Council grant to UH (ERC-STG 263318). 
GN, DM, and UH conceived the study, GN conducted the study, GN and UH analyzed the data. WVZ, MA, and DM contributed methods. All authors reviewed the manuscript. 
Commercial relationships: none. 
Corresponding author: Giuseppe Notaro. 
Address: Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy. 
References
Abadi, R. V., & Gowen, E. (2004). Characteristics of saccadic intrusions. Vision Research, 44, 2675–2690, https://doi.org/10.1016/j.visres.2004.05.009.
Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive Sciences, 16 (8), 437–443, https://doi.org/10.1016/j.tics.2012.06.010.
Aytekin, M., Victor, J. D., & Rucci, M. (2014). The visual input to the retina during natural head-free fixation. Journal of Neuroscience, 34 (38), 12701–12715, https://doi.org/10.1523/JNEUROSCI.0229-14.2014.
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Dale, A. M.,… Halgren, E. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences, USA, 103 (2), 449–454, https://doi.org/10.1073/pnas.0507062103.
Barnes, G. R. (2008). Cognitive processes involved in smooth pursuit eye movements. Brain and Cognition, 68 (3), 309–326, https://doi.org/10.1016/j.bandc.2008.08.020.
Basso, M. A., & Wurtz, R. H. (1998). Modulation of neuronal activity in superior colliculus by changes in target probability. Journal of Neuroscience, 18 (18), 7519–7534.
Bisley, J. W., & Goldberg, M. E. (2003). Neuronal activity in the lateral intraparietal area and spatial attention. Science, 299 (5603), 81–86, https://doi.org/10.1126/science.1077395.
Bornstein, A. M., & Daw, N. D. (2012). Dissociating hippocampal and striatal contributions to sequential prediction learning. European Journal of Neuroscience, 35 (7), 1011–1023, https://doi.org/10.1111/j.1460-9568.2011.07920.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10 (4), 433–436. PMID:9176952.
Brodersen, K. H., Penny, W. D., Harrison, L. M., Daunizeau, J., Ruff, C. C., Duzel, E.,… Stephan, K. E. (2008). Integrated Bayesian models of learning and decision making for saccadic eye movements. Neural Networks, 21 (9), 1247–1260, https://doi.org/10.1016/j.neunet.2008.08.007.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57 (3), 153–178, https://doi.org/10.1016/j.cogpsych.2007.12.002.
Carpenter, R., & Noorani, I. (2017). Movement suppression: Brain mechanisms for stopping and stillness. Philosophical Transactions of the Royal Society B, 372 (1718), https://doi.org/10.1098/rstb.2016.0542.
Carpenter, R. H., & Williams, M. L. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377 (6544), 59–62, https://doi.org/10.1038/377059a0.
Cherici, C., Kuang, X., Poletti, M., & Rucci, M. (2012). Precision of sustained fixation in trained and untrained observers. Journal of Vision, 12 (6): 31, 1–16, https://doi.org/10.1167/12.6.31.
Colby, C. L., Duhamel, J. R., & Goldberg, M. E. (1996). Visual, presaccadic, and cognitive activation of single neurons in monkey lateral intraparietal area. Journal of Neurophysiology, 76 (5), 2841–2852, https://doi.org/10.1152/jn.1996.76.5.2841.
Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A.,… Shulman, G. L. (1998). A common network of functional areas for attention and eye movements. Neuron, 21 (4), 761–773, https://doi.org/10.1016/S0896-6273(00)80593-0.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Hoboken, NJ: John Wiley & Sons, Inc.
Dale, R., Duran, N. D., & Morehead, J. R. (2012). Prediction during statistical learning, and implications for the implicit/explicit divide. Advances in Cognitive Psychology, 8 (2), 196–209, https://doi.org/10.2478/v10053-008-0115-z.
Damasse, J-B., Perrinet, L. U., Madelain, L., & Montagnini, A. (2018). Reinforcement effects in anticipatory smooth eye movements. Journal of Vision, 18 (11): 14, 1–18, https://doi.org/10.1167/18.11.14.
Dash, S., Nazari, S. A., Yan, X., Wang, H., & Crawford, J. D. (2016). Superior colliculus responses to attended, unattended, and remembered saccade targets during smooth pursuit eye movements. Frontiers in Systems Neuroscience, 10, https://doi.org/10.3389/fnsys.2016.00034.
den Ouden, H. E., Daunizeau, J., Roiser, J., Friston, K. J., & Stephan, K. E. (2010). Striatal prediction error modulates cortical coupling. Journal of Neuroscience, 30 (9), 3210–3219, https://doi.org/10.1523/JNEUROSCI.4458-09.2010.
Dias, E. C., & Bruce, C. J. (1994). Physiological correlate of fixation disengagement in the primate's frontal eye field. Journal of Neurophysiology, 72 (5), 2532–2537, https://doi.org/10.1152/jn.1994.72.5.2532.
Duthoo, W., Abrahamse, E. L., Braem, S., Boehler, C. N., & Notebaert, W. (2014). The heterogeneous world of congruency sequence effects: an update. Frontiers in Psychology, 5, 1001, https://doi.org/10.3389/fpsyg.2014.01001.
Egner, T. (2014). Creatures of habit (and control): A multi-level learning perspective on the modulation of congruency effects. Frontiers in Psychology, 5, 1247, https://doi.org/10.3389/fpsyg.2014.01247.
Engbert, R., & Kliegl, R. (2003). Microsaccades uncover the orientation of covert attention. Vision Research, 43 (9), 1035–1045, https://doi.org/10.1016/S0042-6989(03)00084-1.
Falk, R., & Konold, C. (1997). Making sense of randomness: Implicit encoding as a basis for judgment. Psychological Review, 104 (2), 301.
Farrell, S., Ludwig, C. J., Ellis, L. A., & Gilchrist, I. D. (2010). Influence of environmental statistics on inhibition of saccadic return. Proceedings of the National Academy of Sciences, USA, 107 (2), 929–934, https://doi.org/10.1073/pnas.0906845107.
Fischer, B., & Ramsperger, E. (1984). Human express saccades: Extremely short reaction times of goal directed eye movements. Experimental Brain Research, 57 (1), 191–195.
Friston, K. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13 (7), 293–301, https://doi.org/10.1016/j.tics.2009.04.005.
Godijn, R., & Theeuwes, J. (2002). Programming of endogenous and exogenous saccades: Evidence for a competitive integration model. Journal of Experimental Psychology: Human Perception and Performance, 28 (5), 1039, https://doi.org/10.1037//0096-1523.28.5.1039.
Goffart, L., Hafed, Z. M., & Krauzlis, R. J. (2012). Visual fixation as equilibrium: Evidence from superior colliculus inactivation. Journal of Neuroscience, 32 (31), 10627–10636, https://doi.org/10.1523/JNEUROSCI.0696-12.2012.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11 (1), 23–63, https://doi.org/10.1016/S0364-0213(87)80025-3.
Guerrasio, L., Quinet, J., Büttner, U., & Goffart, L. (2010). Fastigial oculomotor region and the control of foveation during fixation. Journal of Neurophysiology, 103 (4), 1988–2001, https://doi.org/10.1152/jn.00771.2009.
Hafed, Z. M., & Clark, J. J. (2002). Microsaccades as an overt measure of covert attention shifts. Vision Research, 42 (22), 2533–2545, https://doi.org/10.1016/S0042-6989(02)00263-8.
Hafed, Z. M., Goffart, L., & Krauzlis, R. J. (2008). Superior colliculus inactivation causes stable offsets in eye position during tracking. Journal of Neuroscience, 28 (32), 8124–8137, https://doi.org/10.1523/JNEUROSCI.1317-08.2008.
Haith, A. M., Pakpoor, J., & Krakauer, J. W. (2016). Independence of movement preparation and movement initiation. Journal of Neuroscience, 36 (10), 3007–3015, https://doi.org/10.1523/JNEUROSCI.3245-15.2016.
Harrison, L. M., Bestmann, S., Rosa, M. J., Penny, W., & Green, G. G. (2011). Time scales of representation in the human brain: Weighing past information to predict future events. Frontiers in Human Neuroscience, 5, 37, https://doi.org/10.3389/fnhum.2011.00037.
Hartmann, M., Mast, F. W., & Fischer, M. H. (2015). Spatial biases during mental arithmetic: Evidence from eye movements on a blank screen. Frontiers in Psychology, 6, https://doi.org/10.3389/fpsyg.2015.00012.
Horwitz, G. D., & Newsome, W. T. (2001). Target selection for saccadic eye movements: Prelude activity in the superior colliculus during a direction-discrimination task. Journal of Neurophysiology, 86 (5), 2543–2558, https://doi.org/10.1152/jn.2001.86.5.2543.
Hotson, J. R. (1982). Cerebellar control of fixation eye movements. Neurology, 32 (1), 31–36, https://doi.org/10.1212/WNL.32.1.31.
Ignashchenkova, A., Dicke, P. W., Haarmeier, T., & Thier, P. (2004). Neuron-specific contribution of the superior colliculus to overt and covert shifts of attention. Nature Neuroscience, 7 (1), 56, https://doi.org/10.1038/nn1169.
Jiang, Y. V., Won, B. Y., & Swallow, K. M. (2014). First saccadic eye movement reveals persistent attentional guidance by implicit learning. Journal of Experimental Psychology: Human Perception and Performance, 40 (3), 1161, https://doi.org/10.1037/a0035961.
Kim, T. D., Kabir, M., & Gold, J. I. (2017). Coupled decision processes update and maintain saccadic priors in a dynamic environment. Journal of Neuroscience, 37 (13), 3632–3645, https://doi.org/10.1523/JNEUROSCI.3078-16.2017.
Klein, R., Kingstone, A., & Pontefract, A. (1992). Orienting of visual attention. In Eye movements and Visual Cognition (pp. 46–65). New York, NY: Springer.
Kok, P., Mostert, P., & De Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, USA, 201705652, https://doi.org/10.1073/pnas.1705652114.
Kowler, E. (1989). Cognitive expectations, not habits, control anticipatory smooth oculomotor pursuit. Vision Research, 29 (9), 1049–1057, https://doi.org/10.1016/0042-6989(89)90052-7.
Krauzlis, R. J., Goffart, L., & Hafed, Z. M. (2017). Neuronal control of fixation and fixational eye movements. Philosophical Transactions of the Royal Society B, 372 (1718), 20160205, https://doi.org/10.1098/rstb.2016.0205.
Krauzlis, R. J., Lovejoy, L. P., & Zénon, A. (2013). Superior colliculus and visual spatial attention. Annual Review of Neuroscience, 36, 165–182, https://doi.org/10.1146/annurev-neuro-062012-170249.
Kustov, A. A., & Robinson, D. L. (1996). Shared neural control of attentional shifts and eye movements. Nature, 384 (6604), 74, https://doi.org/10.1038/384074a0.
Lauwereyns, J., Watanabe, K., Coe, B., & Hikosaka, O. (2002). A neural correlate of response bias in monkey caudate nucleus. Nature, 418 (6896), 413–417, https://doi.org/10.1038/nature00892.
Magri, C., Whittingstall, K., Singh, V., Logothetis, N. K., & Panzeri, S. (2009). A toolbox for the fast information analysis of multiple-site LFP, EEG and spike train recordings. BMC Neuroscience, 10 (1), 81, https://doi.org/10.1186/1471-2202-10-81.
Mengotti, P., Dombert, P. L., Fink, G. R., & Vossel, S. (2017). Disruption of the right temporoparietal junction impairs probabilistic belief updating. Journal of Neuroscience, 37 (22), 5419–5428, https://doi.org/10.1523/JNEUROSCI.3683-16.2017.
Meyberg, S., Sommer, W., & Dimigen, O. (2017). How microsaccades relate to lateralized ERP components of spatial attention: A co-registration study. Neuropsychologia, 99, 64–80, https://doi.org/10.1016/j.neuropsychologia.2017.02.023.
Moschner, C., Zangemeister, W. H., & Demer, J. L. (1996). Anticipatory smooth eye movements of high velocity triggered by large target steps: normal performance and effect of cerebellar degeneration. Vision Research, 36 (9), 1341–1348, https://doi.org/10.1016/0042-6989(95)00218-9.
Nobre, A. C., Gitelman, D. R., Dias, E. C., & Mesulam, M. M. (2000). Covert visual spatial orienting and saccades: overlapping neural systems. Neuroimage, 11 (3), 210–216, https://doi.org/10.1006/nimg.2000.0539.
Noorani, I., & Carpenter, R. H. S. (2016). The LATER model of reaction time and decision. Neuroscience & Biobehavioral Reviews, 64, 229–251, https://doi.org/10.1016/j.neubiorev.2016.02.018.
Nyström, M., & Holmqvist, K. (2010). An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42 (1), 188–204, https://doi.org/10.3758/BRM.42.1.188.
O'Reilly, J. X., Schüffelgen, U., Cuell, S. F., Behrens, T. E. J., Mars, R. B., & Rushworth, M. F. S. (2013). Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proceedings of the National Academy of Sciences, USA, 110 (38), E3660–E3669, https://doi.org/10.1073/pnas.1305373110.
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S. (2007). Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98 (3), 1064–1072, https://doi.org/10.1152/jn.00559.2007.
Panzeri, S., & Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7 (1), 87–107.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32 (1), 3–25, https://doi.org/10.1080/00335558008248231.
Rafal, R. D., Calabresi, P. A., Brennan, C. W., & Sciolto, T. K. (1989). Saccade preparation inhibits reorienting to recently attended locations. Journal of Experimental Psychology: Human Perception and Performance, 15 (4), 673–685.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York, NY: Appleton Century Crofts.
Rucci, M., & Victor, J. D. (2015). The unsteady eye: An information-processing stage, not a bug. Trends in Neurosciences, 38 (4), 195–206, https://doi.org/10.1016/j.tins.2015.01.005.
Santos, E. M., & Kowler, E. (2017). Anticipatory smooth pursuit eye movements evoked by probabilistic cues. Journal of Vision, 17 (13): 13, 1–16, https://doi.org/10.1167/17.13.13.
Schneidman, E., Bialek, W., & Berry, M. J. (2003). Synergy, redundancy, and independence in population codes. Journal of Neuroscience, 23 (37), 11539–11553.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10 (1), 38–45, https://doi.org/10.1016/j.tics.2005.11.008.
Snodderly, D. M., & Kurtz, D. (1985). Eye position during fixation tasks: comparison of macaque and human. Vision Research, 25 (1), 83–98, https://doi.org/10.1016/0042-6989(85)90083-5.
Striemer, C. L., Chouinard, P. A., Goodale, M. A., & de Ribaupierre, S. (2015). Overlapping neural circuits for visual attention and eye movements in the human cerebellum. Neuropsychologia, 69, 9–21, https://doi.org/10.1016/j.neuropsychologia.2015.01.024.
Tanaka, M., Yoshida, T., & Fukushima, K. (1998). Latency of saccades during smooth-pursuit eye movement in man. Directional asymmetries. Experimental Brain Research, 121 (1), 92–98, https://doi.org/10.1007/s002210050.
Thaler, L., Schutz, A. C., Goodale, M. A., & Gegenfurtner, K. R. (2013). What is the best fixation target? The effect of target shape on stability of fixational eye movements. Vision Research, 76, 31–42, https://doi.org/10.1016/j.visres.2012.10.012.
Trappenberg, T. P., Dorris, M. C., Munoz, D. P., & Klein, R. M. (2001). A model of saccade initiation based on the competitive integration of exogenous and endogenous signals in the superior colliculus. Journal of Cognitive Neuroscience, 13 (2), 256–271, https://doi.org/10.1162/089892901564306.
Turk-Browne, N. B., Scholl, B. J., Johnson, M. K., & Chun, M. M. (2010). Implicit perceptual anticipation triggered by statistical learning. Journal of Neuroscience, 30 (33), 11177–11187, https://doi.org/10.1523/JNEUROSCI.0858-10.2010.
Vakil, E., Bloch, A., & Cohen, H. (2017). Anticipation measures of sequence learning: manual versus oculomotor versions of the serial reaction time task. The Quarterly Journal of Experimental Psychology, 70 (3), 579–589, https://doi.org/10.1080/17470218.2016.1172095.
Vossel, S., Mathys, C., Daunizeau, J., Bauer, M., Driver, J., Friston, K. J., & Stephan, K. E. (2014). Spatial attention, precision, and Bayesian inference: A study of saccadic response speed. Cerebral Cortex, 24 (6), 1436–1450, https://doi.org/10.1093/cercor/bhs418.
Watamaniuk, S. N., Bal, J., & Heinen, S. J. (2017). A subconscious interaction between fixation and anticipatory pursuit. Journal of Neuroscience, 37 (47), 11424–11430, https://doi.org/10.1523/JNEUROSCI.2186-17.2017.
Watanabe, M., Matsuo, Y., Zha, L., Munoz, D. P., & Kobayashi, Y. (2013). Fixational saccades reflect volitional action preparation. Journal of Neurophysiology, 110 (2), 522–535, https://doi.org/10.1152/jn.01096.2012.
Williams, J. J., & Griffiths, T. L. (2013). Why are people bad at detecting randomness? A statistical argument. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39 (5), 1473, https://doi.org/10.1037/a0032397.
Footnotes
1  We performed three validation and robustness analyses of ΔAFO. First, we determined split-half reliability by deriving two separate ΔAFO values per participant: one from odd trials and one from even trials. Split-half reliability was very robust (0.90 after correction). Second, we evaluated to what extent ΔAFO depended on the specific trial inclusion criteria. We found that ΔAFO was robust across a range of trial inclusion values, including trials where AFO was restricted to 1.2° from screen center (see Appendix). Third, we verified whether ΔAFO was driven by transition structure or the number of returns and alternate trials in each series. We used bootstrapping to construct synthetic series from the pret70 and pret30 data, but where the number of alternation and return trials were equated (see Appendix). We found statistically significant ΔAFO values in these cases.
2  Group-level t tests of beta values against zero. For pret70: β1, t(20) = 7.40, p < 0.001, d = 1.61; β2, t(20) = 6.82, p < 0.001, d = 1.49; β3, t(20) = 3.39, p = 0.0088, d = 0.74; β4, t(20) = 3.30, p = 0.010, d = 0.72; β5, t(20) = 2.72, p = 0.039, d = 0.59. For pret30: β1, t(20) = 7.34, p < 0.001, d = 1.60; β2, t(20) = 3.66, p = 0.0046, d = 0.80; β3, t(20) = 4.70, p < 0.001, d = 1.03. All p values are Bonferroni corrected within condition. We note that a few participants did show negative beta values for lags > 1, but there were only 18 such cases out of 147 beta values estimated.
3  The four Beta values were: β1 = 10.87 ± .69 ms, t(20) = 6.64, p < 0.001, d = 1.45; β2 = 6.98 ± 1.27 ms, t(20) = 5.33, p < 0.001, d = 1.16 ; β3 = 4.74 ± 1.46 ms, t(20) = 3.17, p = 0.012, d = 0.69; β4 = 4.04 ± 1.39 ms, t(20) = 4.41, p = 0.026, d = 0.96. When not partitioning the trials into alternations and returns, we found much weaker effects of recent trials on saccade latencies. There was no impact of recent trials for pret70, while for pret30, there was a lag-1 effect where a recent return produced faster saccades.
4  For trials 1–50, I(pret;SL|S) = 0.0137 ± 0.0043 bits; for trials 51–100, I(pret;SL|S) = 0.0285 ± 0.0056 bits, t(20) = 4.42, p < 0.001, d = 0.66. Without conditioning SL on the type of transition, we did not observe a significant increase in information between the two halves of trials: in trials 1–50, I(pret;SL) = 0.0184 ± 0.0030 bits; in trials 51–100, I(pret;SL) = 0.0245 ± 0.0050 bits (p > 0.1).
5  This was done to maximize sensitivity as an initial analysis indicated that combining over the fixation symbol and blank period carried substantially less information about the experimental condition.
Appendix
Group-level gaze-location density maps
Figure A1 presents gaze location patterns during the pretarget blank interval in which we measured AFO. To sample gaze with sufficient spatial resolution (0.1° × 0.1°), we combined data across participants (for a total of 32,752 points). For each condition, we partitioned the fixation data based on the screen side of the prior target. The figure communicates that: (a) the area with maximal density was always at center (0,0), demonstrating participants' success in maintaining fixation near the center of the fixation symbol, (b) gaze density steeply decreased in surrounding areas to 1/5 of maximum density, and (c) mean gaze (red crosses) was qualitatively shifted in the direction of the most likely next target location. 
Figure A1
 
Fixation location during the last 10 ms of the pretarget blank screen. To present the effect of stochastic context, fixation locations are presented as a function of prior target location. Densities were calculated at a resolution of 0.1° × 0.1°, merging data points from all participants and normalizing to the maximum value for each condition. The single dark point marks maximal density and is always at the screen center; red points indicate mean values for each condition; inner/outer circles mark areas encompassing 50% and 90% of all fixations. In pret70, gaze locations are slightly but visibly shifted toward the side of the last presented target.
Impact of eccentricity criteria on ΔAFO
In the main analysis, we considered trials as valid for AFO analysis if the gaze location during the 10 ms prior to target presentation was within 3° from center. To evaluate whether the results reproduce independently of this criterion, we also examined (a) a restricted set of trials where the limit was reduced to less than 1.2° (i.e., the eye location was within the area of the just presented fixation symbol; analyzed trials: 59 ± 3 %) and (b) an extended set of trials where gaze was within 5° of fixation (analyzed trials: 66 ± 3%). In all cases we found that the resulting ΔAFO was significantly above zero, indicating an impact of statistical structure on AFO (see Figure A2). In all cases ΔAFO significantly increased (p < 0.001) from the first half of the series (trials 1–50) to the second half of the series (trials 51–100), indicating consolidation of learning over time, as documented in the main analysis (radAFO = 1.2 deg, t(20) = 8.24, p < 0.001, d = 1.80; radAFO = 3 deg, t(20) = 10.10, p < 0.001, d = 1.87; radAFO = 5 deg, t(20) = 9.62, p < 0.001, d = 2.10). 
Figure A2
 
Δ AFO calculated in first half of trials (dark gray bars) and in the second half of trials (gray bars) for three different limit settings to the admitted AFO. Crosses above each bar indicate significant differences from zero. Asterisks above bar pairs indicate significant differences.
Bootstrap ΔAFO
As reported in the main text, we considered the quantity ΔAFO = (AFOpret70) − (AFOpret30) as a measure of sensitivity to global statistics. However, this grand-average quantity could also reflect the different proportions of alternate and return trials in the two conditions: given that return trials induced positive AFO (in both conditions), a greater proportion of returns could bias the overall statistic even if returns had the same impact on AFO in both conditions. This concern only applies to the grand-average measure; other analyses that quantify trial-by-trial effects or partial out the impact of the last transition are not affected. To evaluate this issue, we used bootstrapping to create surrogate series for each condition, each containing an equal number of alternate and return trials, and evaluated ΔAFO in those. 
These were constructed as follows. For each participant, we counted the number of alternate trials in the pret70 condition (nalt70). We then generated 100 surrogate distributions of 2 × nalt70 elements with all the elements sampled from the pret70 condition: nalt70 elements were sampled with replacement from the return trials and nalt70 elements were sampled with replacement from the alternate trials. This produced 100 bootAFOpret70 distributions. Similarly we calculated the number of returns in the pret30 condition (nret30) and we derived 100 bootAFOpret30 distributions with an equal number of alternate and returns. We could then derive ΔAFO for these bootstrapped series as in the main analysis, boot ΔAFO = mean bootAFOpret70 – mean bootAFOpret30. Averaging across participants we obtained mean boot ΔAFO = 0.27 ± 0.03° in the first 50 trials and mean bootΔAFO = 0.39 ± 0.04° in the second half of trials. These values were significantly different with t(20) = 2.57, p = 0.018, d = 0.53. This analysis demonstrates that it is the distribution of AFO values in alternate and return trials that drives ΔAFO rather than the proportions of the two types of trials. 
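The balanced resampling described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the seed handling, and the choice to average the 100 surrogate means into a single value per condition are assumptions made for the example.

```python
import random
from statistics import mean

def balanced_bootstrap_mean(ret_afo, alt_afo, n_per_type, n_boot=100, rng=None):
    """Mean AFO over surrogate series holding equal numbers of
    return and alternate trials (hypothetical helper)."""
    rng = rng or random.Random(0)
    boot_means = []
    for _ in range(n_boot):
        # Sample with replacement, n_per_type trials from each trial type.
        sample = rng.choices(ret_afo, k=n_per_type) + rng.choices(alt_afo, k=n_per_type)
        boot_means.append(mean(sample))
    return mean(boot_means)

def boot_delta_afo(ret70, alt70, ret30, alt30, n_boot=100, seed=0):
    """boot ΔAFO = mean bootAFOpret70 - mean bootAFOpret30 for one participant."""
    rng = random.Random(seed)
    n_alt70 = len(alt70)  # number of alternate trials in pret70
    n_ret30 = len(ret30)  # number of return trials in pret30
    boot70 = balanced_bootstrap_mean(ret70, alt70, n_alt70, n_boot, rng)
    boot30 = balanced_bootstrap_mean(ret30, alt30, n_ret30, n_boot, rng)
    return boot70 - boot30
```

Because each surrogate series weights returns and alternations equally, any remaining difference between conditions reflects the AFO values within each trial type rather than the trial-type proportions.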
RW model validation
To evaluate the validity of the RW models, we used a leave-one-series-out validation scheme on the single-participant level. For each condition, we fit the model parameters from nine of the 10 series, and the resulting parameter set was then evaluated against the left-out series. Specifically, model-derived series were generated by applying the updating scheme of Equation 1 to the true sequence of screen-side transitions in the left-out series. Since every series was left out once, and the left-out time series could have a different number of valid trials in each fold, we evaluated the goodness of fit (percentage of explained variance) for each series using the adjusted coefficient of determination, \(R_{adj}^2 = 1 - \frac{(n - 1)\,SSE}{(n - k)\,SST},\) where n is the number of points of the validating series and k is the number of free parameters; SSE is the sum of squared fit errors and SST is the sum of squared deviations from the mean of the series to predict. The reported variance reduction per participant was the mean of the adjusted coefficients of determination calculated for each of the 10 validations. To determine whether the individuals' variance reductions were significantly greater than would be expected by chance, we constructed synthetic series of alternations and returns that predict, through the estimated RW model parameters, the left-out AFO data. This was done by permuting the sequence of screen-side transitions in the left-out series (1,000 times). The participant's mean variance reduction was then ranked in relation to the distribution of the mean permuted variance reductions. 
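A rough sketch of the scoring and permutation steps for a single left-out series is given below. Equation 1 is not reproduced in this section, so `rw_predict` is a hypothetical stand-in: a delta-rule update with learning rate `alpha`, scaling factor `k`, and equilibrium point `p0`, named after the parameters in Figure 3. The coding of transitions (1 = return, 0 = alternation) and the starting value `p_init` are likewise assumptions.

```python
import random

def adjusted_r2(observed, predicted, n_params):
    """Adjusted R^2 = 1 - ((n - 1) * SSE) / ((n - k) * SST)."""
    n = len(observed)
    m = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - m) ** 2 for o in observed)
    return 1.0 - ((n - 1) * sse) / ((n - n_params) * sst)

def rw_predict(transitions, alpha, k, p0, p_init=0.5):
    """Hypothetical stand-in for Equation 1 (assumed delta-rule form)."""
    p, out = p_init, []
    for x in transitions:         # x = 1 for a return, 0 for an alternation
        out.append(k * (p - p0))  # prediction made before the outcome is seen
        p += alpha * (x - p)      # move the estimate toward the outcome
    return out

def permutation_p(afo, transitions, params, n_perm=1000, seed=0):
    """Score the true transition sequence against shuffled sequences."""
    rng = random.Random(seed)
    true_fit = adjusted_r2(afo, rw_predict(transitions, *params), len(params))
    shuffled = list(transitions)
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        fit = adjusted_r2(afo, rw_predict(shuffled, *params), len(params))
        n_ge += fit >= true_fit
    return true_fit, (n_ge + 1) / (n_perm + 1)
```

In the analysis described above, the ranking is applied to the participant's mean across the 10 folds; the sketch ranks one fold only, to keep the example short.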
Mutual information
We used Mutual Information (MI) to quantify the amount of information that is conveyed by AFO and SL about the overall statistics of the target locations. Since SL on any given trial depends on whether it is an alternation or a return (due to IOR), we considered this factor in the MI calculation and computed the quantity I(pret;SL|S), where S identifies whether a trial is an alternation or a return (Equation 5):  
\begin{equation}\tag{5}I(pret;SL|S) = \sum\limits_{S \in \{0,1\}} p(S)\,\sum\limits_{pret \in \{30,70\}} \, \sum\limits_{sl \in SL} \, p(pret,sl|S)\,{\rm log}\left( {{{p(pret,sl|S)} \over {p(pret|S)\,p(sl|S)}}} \right)\end{equation}
 
To calculate joint information and to condition SL on the variable S, we considered three-dimensional joint responses: R = (AFO, SL, S). For each participant, we calculated I(pret; R), implementing bias corrections for limited sampling (Panzeri & Treves, 2006) by using the Information Toolbox described in Magri, Whittingstall, Singh, Logothetis, and Panzeri (2009), where relevant technical details can be found. To calculate the information carried by the joint response of two variables (e.g., AFO and SL), we randomized the trial labels of the third variable (i.e., S), obtaining in this way Rperm; we then calculated I(pret; Rperm). In this example, I(pret; AFO&SL) is the mean of one hundred such permutations. The same procedure was applied for one-dimensional responses, with the other two variables randomized independently. We also independently randomized all three variables to obtain an estimate of residual bias, and we removed this quantity from all the calculated values. 
We calculated conditional information by applying the chain rule (Cover & Thomas, 1991), for instance: I(pret;SL|S) = I(pret;SL&S) − I(pret;S). Information values were calculated by using the direct method, that is, by discretizing the AFO and SL responses using six equi-populated bins (variable S was already binary). The number of bins was the maximum that allowed having at least four trials per joint response, even when considering half of each series (see Panzeri et al., 2007). Unless explicitly stated otherwise, we report the values calculated from the second half of the trials in each series. 
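The direct method and the chain rule can be illustrated with a small plug-in estimator. This sketch is an assumption-laden simplification: it uses only the Python standard library, it omits the limited-sampling bias correction and the permutation-based residual-bias removal described above, and it is not a substitute for the Information Toolbox. Function names are hypothetical.

```python
import math
from collections import Counter

def equipopulated_bins(values, n_bins=6):
    """Label each value 0..n_bins-1 so that bins hold (nearly) equal counts.
    Tied values are split by their original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from two label sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def conditional_information(pret, sl_bins, s):
    """Chain rule: I(pret;SL|S) = I(pret;SL&S) - I(pret;S)."""
    joint = list(zip(sl_bins, s))  # the joint response SL&S
    return mutual_information(pret, joint) - mutual_information(pret, s)
```

For example, `conditional_information(pret, equipopulated_bins(sl), s)` takes one condition label, one saccade latency, and one alternation/return flag per trial, pooled over both conditions.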
Table A1 summarizes all the information quantities discussed in the text as well as several other quantities that are of potential interest. 
Table A1
 
Summary of calculated information about experimental condition (pret) and synergies among variables. Notes: *p < 0.05; **p < 0.01.
LATER model applied to saccade latencies
The LATER model of saccade latency (Carpenter & Williams, 1995) considers the saccade latency as the time needed for a linear evidence accumulator to reach threshold. The accumulation rate (r) is considered to be normally distributed (mean rate μ and standard deviation σ), and these are the only independent parameters of the model. Nevertheless, as implemented in prior studies, it is useful to explicitly derive a threshold parameter (θ), since it may reflect the a priori state before the appearance of the target (Noorani & Carpenter, 2016). We applied this model to our SL data, first dividing trials according to the last transition, where rj ∼ 𝒩(μ, σ):  
\begin{equation}\tag{6}S{L_{alt/pret30}} = {{\theta _1} \over {r_1}},S{L_{alt/pret70}} = {{\theta _2} \over {r_2}},S{L_{ret/pret30}} = {{\theta _3} \over {r_3}},S{L_{ret/pret70}} = {{\theta _4} \over {r_4}},\end{equation}
For each condition and for each participant, we then estimated the parameters θ and μ by minimizing the likelihood function, \(L = \sum\nolimits_i {\rm log}\left[ \phi \left( {\textstyle{\theta \over {S{L_i}}}} - \mu \right) \right],\) where ϕ is the standard normal probability density function (following Kim et al., 2017). Each parameter was then normalized by its mean across different conditions. In five participants we excluded a small subset (about 5%) of trials whose values on the recinormal plot lay on a line with a smaller slope with respect to the other points, suggestive of express saccade dynamics (see Carpenter & Williams, 1995).  
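As an illustration of how θ and μ relate to the latency distribution, the sketch below uses a closed-form shortcut rather than a numerical optimizer. It assumes σ is fixed at 1, so that θ/SL follows 𝒩(μ, 1) and the reciprocal latency 1/SL is normal with mean μ/θ and standard deviation 1/θ. This reparameterization, the function names, and the simulation helper are assumptions for the example and may differ from the fitting procedure actually used.

```python
import random
from statistics import fmean, pstdev

def fit_later(latencies):
    """Estimate (theta, mu) with sigma fixed at 1, assuming theta/SL ~ N(mu, 1).

    Under that assumption 1/SL ~ N(mu/theta, 1/theta), so the maximum
    likelihood estimates follow from the mean and SD of the reciprocals."""
    recip = [1.0 / sl for sl in latencies]
    sd = pstdev(recip)       # population SD (the ML estimate)
    theta = 1.0 / sd
    mu = fmean(recip) / sd
    return theta, mu

def simulate_later(theta, mu, n, seed=0):
    """Draw n latencies from a LATER unit with unit-variance rate."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        r = rng.gauss(mu, 1.0)
        if r > 0:            # non-positive rates never reach threshold
            out.append(theta / r)
    return out
```

A quick sanity check is to simulate latencies with known parameters, for instance `simulate_later(1.0, 5.0, 5000)`, and confirm that `fit_later` recovers values close to θ = 1 and μ = 5.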
Changed March 19, 2019: Basque Center on Cognition, Brain and Language (BCBL), San Sebastián, Spain, was added to Magda Altman's affiliation, which was originally Center for Mind/Brain Sciences (CIMeC), The University of Trento, Trento, Italy, only. 
Figure 1
 
Trial structure and fixation locations in Experiment 1. (A) Trial timing. Anticipatory fixation offset (AFO) was defined as the mean gaze location during the last 10 ms of the blank screen that followed the fixation symbol and that preceded the target. AFO was coded as positive if to the side of the last target, negative otherwise. (B) Spatial features of fixation and targets. Targets were positioned on an invisible arc that extended 10° above and below the fixation symbol, at 12° eccentricity. The exact location on the arc was set randomly on each trial. The fixation symbol consisted of an inner gray circle (radius = 0.4°) within an outer black circle (radius = 1.2°).
Figure 2
 
The impact of statistical structure on AFO. (A) Mean AFO values were significantly greater in pret70 than pret30, and the pattern held for all participants (each participant marked via single line). (B) Partitioning AFO values by most recent transition indicates an effect of statistical structure as well as an impact of most recent transition, as AFO was greater following returns than following alternate trials. Crosses above each bar indicate significant differences from zero. Asterisks above/below bar-pairs indicate a significant difference. (C) Beta weights estimated via regression models indicate that AFO was impacted by a return in each of the last five transitions in the pret70 condition and in each of the last three transitions for pret30.
Figure 3
 
Rescorla-Wagner model of AFOs. (A) AFO data from a sample series in the pret70 condition (dashed line) and the matched model prediction (continuous line) that was derived from parameter values estimated from independent series. Asterisks on the abscissa mark alternate (side-switch) trials. Data are concatenated to exclude missing or invalid values. (B) Distributions of model parameters in the two conditions. From the left: learning rate, scaling factor, and equilibrium point P0. The equilibrium point significantly departed from 0.5 only in pret70. Asterisks above/below bar pairs indicate significant differences.
Figure 4
 
Long-term learning signatures in AFO. (A) AFO values in the 20 random trials (pret = 50%) appended to each experimental series. Average AFO magnitudes indicate confinement to the area of the fixation symbol (<0.4° eccentricity). There was a strong impact of the statistical structure of the series presented prior to the random trials, and independently, a strong impact of the immediately preceding trial. Crosses above each bar indicate significant differences from zero. Asterisks above/below bar pairs indicate significant difference (also in following panels). (B) ΔAFO was defined as the difference between AFO values in the pret70 and pret30 conditions. Its values significantly increased from the first half to the second half of the experimental series. (C) Similar results when quantified via Mutual Information. In all panels, measures of spread indicate variance within condition and are provided for completeness; they are not indicative of effect sizes in within-participant contrasts.
Figure 5
 
The impact of statistical structure on saccade latency. (A) Saccade latencies indicate learning of statistical structure in addition to an effect of whether a saccade is a return or alternation. Asterisks above bar pairs indicate significant difference. (B) Trial-level correlations between AFOs and saccade latency. Distributions are plotted for pret70 and pret30, partitioned according to whether the saccades were return or alternate saccades. Asterisks above bars mark significant difference from 0, which held in all conditions apart from alternate trials in pret30.
Figure 6
 
Mean gaze locked to the time of landing at the fixation symbol. In both panels, negative values on the y axis indicate gaze to the left of screen center, and positive values indicate gaze to the right of screen center. The x axis marks time elapsed from the saccade to center. (A) Gaze locations on trials following a target presented on the left. Plots are time-locked to the time at which the saccade to center occurred. Each time point is an average of gaze values in 10 ms bins; shaded areas represent ±SEM. The dashed vertical line indicates the temporal onset of the blank screen (∼410 ms from landing at center). Superimposed (red line; second y axis) is Cohen's effect size in each time bin for the difference in gaze locations between the two conditions. (B) Same as Panel A but for trials following a target on the right.