**Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package.**

^{60,000}for 1 min of data sampled at 1 kHz), we use a suitable search algorithm from the class of Markov-chain Monte Carlo (MCMC) sampling algorithms.

*T*. We assume that at time

*t*, the eye is in a hidden state

*C*, which is 0 for a drift/tremor state and 1 for a microsaccade state (Figure 2). As long as the eye remains in the same state, its two-dimensional velocity

_{t}*v*remains constant; when the eye switches state, its velocity changes. Of note, velocity here does not represent the derivative of the measured eye position, but rather an unobservable underlying variable.

_{t}*z*gets augmented with the velocity and Gaussian motor noise with covariance matrix Σ

_{t}*. This eye position is augmented with Gaussian measurement noise with covariance matrix Σ*

_{z}*(independent across time points), yielding the measured eye position*

_{x}*x*.

_{t}*t*where

*C*≠

_{t}*C*

_{t}_{−1}, and denote the

*i*th change point as

*τ*. The duration for which the eye stays in a given state is then Δ

_{i}*τ*=

_{i}*τ*

_{i}_{+1}−

*τ*. We assume that C is a semi-Markov process, which means that these durations are independent. In a hidden Markov model (Bishop, 2006), the probability of

_{i}*C*depends only on the previous state,

_{t}*C*

_{t}_{−1}; however, in a hidden semi-Markov model (also called an explicit-duration hidden Markov model; Yu, 2010), the durations over which the state remains unchanged are independent. Then the prior probability of C is where

*p*(Δ

*τ*|

_{i}*Cτ*) is the state-specific probability of the duration. Specifically, we use a gamma distribution with shape parameter 2 and scale parameter

_{i}*k*: where

*k*=

*k*

_{0}if

*C*= 0 (drift/tremor) and

*k*=

*k*

_{1}if

*C*= 1 (microsaccade). We choose this distribution because it makes very short and very long durations unlikely, consistent with previously reported distributions of durations for drift and microsaccades (Engbert, 2006). Assumptions about the frequency and duration of microsaccades are reflected in the choices of parameters

*k*

_{0}and

*k*

_{1}. We chose

*k*

_{0}= 4 s

^{−1}and

*k*

_{1}= 100 s

^{−1}, corresponding to median durations of 260 ms for drift and 10 ms for microsaccades (Figure 3A, B), which are realistic (Engbert, 2006). We will later examine the robustness of our results to variations in

*k*

_{0}and

*k*

_{1}(Figure A6).

*τ*, we draw the velocity

_{i}*v*from a state-specific probability distribution

_{τi}*τ*

_{i}_{+1}. The distribution of the velocity time series v given an eye state time series C is

*v*in polar coordinates,

*v*= (

*r*cos

*θ*,

*r*sin

*θ*)

^{T}, and assume that in both states, the direction of the velocity

*θ*is uniformly distributed and its magnitude

*r*follows a generalized gamma distribution where

*d*=

*d*

_{0}and

*σ*=

*σ*

_{0}if

*C*= 0, and

*d*=

*d*

_{1}and

*σ*=

*σ*

_{1}if

*C*= 1. Note that our definition of the generalized gamma distribution differs from that of Stacy (1962) by a reparametrization

*d*→

*d*+ 1,

*σ*→

*σ*

*d*

_{0}to 1, which is equivalent to assuming that the distribution of the two-dimensional velocity in the drift/tremor state is a circularly symmetric Gaussian with standard deviation

*σ*

_{0}. The other parameters

*d*

_{1}and

*σ*

_{1}control the shape and scale, respectively, of the distribution of microsaccade velocities. Figure 3B shows examples of these velocity distributions.

*=*

_{z}*=*

_{x}*p*(C) and the likelihood

*p*(x|C):

*p*(C|x). First we need to evaluate the likelihood in Equation 9. This is difficult, because we need to integrate both over the velocities at all change points,

*σ*

_{0},

*σ*

_{1},

*d*

_{1},

*σ*, and

_{z}*σ*. A fully Bayesian algorithm would require priors over these parameters to jointly infer the parameters together with the eye state time series C. This too is intractable.

_{x}*x*

_{0}and

*x*

_{T}_{+1}. For these, we choose

*x*

_{0}=

*x*

_{1}−

*ε*and

*x*

_{T}_{+1}=

*x*+

_{T}*ε*, where

*ε*= (10

^{−4}, 10

^{−4})deg. We need to include the offset

*ε*to avoid numerical instabilities in our implementation. Finally, we subtract the resulting value of

*x*

_{0}from every point in the time series, so that

*x*

_{0}= 0; this has no effect on the detected microsaccades.

_{1}, d

_{1}, σ

_{0}

*d*

_{0}to 1. We initialize

*σ̂*

_{0},

*σ̂*

_{1}, and

*d̂*

_{1}to random values drawn from reasonable ranges (

*σ̂*

_{0}: [0.1, 5] deg/s,

*σ̂*

_{1}: [5, 100] deg/s, and

*d̂*

_{1}: [1.1, 5]). We initialize C by setting

*C*to 1 for time points

_{t}*t*where ||

*x*−

_{t}*x*

_{t}_{−1}|| is in the highest percentile, and to 0 otherwise.

*σ*and

_{x}*σ*given a measured eye position time series x and an estimated eye state time series Ĉ. As stated before, we can disentangle motor and measurement noise because, in our generative model, motor noise accumulates over time while measurement noise does not. Specifically, the autocovariance function of x conditioned on v at time lag

_{z}*s*is

*s*; this gives a slope and a y-intercept. Our estimates of the motor noise and measurement noise are

*p*(x|C) in Equation 9, because the integral over z is both analytically and numerically intractable. However, the integral over

*v̄*is the average velocity between the change points; its standard deviation is of the order of

*σ*. Therefore, if either

_{z}*v̄*or

*τ*

_{i}_{+1}−

*τ*is sufficiently large (we expect the former to hold for microsaccades and the latter for drift), we can neglect the uncertainty in

_{i}*p*(C|ẑ,

*σ*

_{0},

*σ*

_{1},

*d*

_{1},

*σ*) using MCMC sampling with Metropolis–Hastings acceptance probabilities. Using the prior over velocities, Equation 5, and the property of the delta function, we can compute the posterior as

_{z}*τ*

_{i}_{+1}−

*τ*, and implicitly the eye state

_{i}*d*and

*σ*in the prior

*I*

_{0}the modified Bessel function of the first kind. (For details, see the Appendix under From Equation 16 to Equation 17.) We solve this integral analytically in the limits

*a*→ 0 and

*a*→ ∞, which correspond to ||Δ

*z*|| → ∞ and ||Δ

*z*|| → 0, respectively. For intermediate

*α*, we solve the integral numerically.

*p*(C|ẑ) are presented in the Appendix under MCMC sampling. The MCMC algorithm returns a set of 40 samples Ĉ

*. On the last iteration, we convert these samples into a probability time series by averaging them. For some applications, we subsequently transform from probabilities to binary values by thresholding at 0.5. This operation minimizes the absolute-value cost function*

^{j}*σ*

_{0},

*σ*

_{1}, and

*d*

_{1}by maximizing

*p*(ẑ|Ĉ

*,*

^{j}*σ*

_{0},

*σ*

_{1},

*d*

_{1},

*σ*) with a grid search for each sample Ĉ

_{z}*and then taking the median across samples. The grid ranges are [0.1, 100] deg/s for*

^{j}*σ*

_{0}and

*σ*

_{1}and [1.1, 5] for

*d*

_{1}.

*η*for the horizontal dimension and

_{x}*η*for the vertical dimension for a sufficiently long duration. The thresholds are adaptively set as a multiple of the standard deviation of the eye movement velocity, using a median-based estimate of the standard deviation:

_{y}*λ*, and the minimum duration are free parameters set by the user. Of these,

*λ*tends to have the largest effect on the detected microsaccades.

*λ*= 6. This value is used in most subsequent studies. Other studies have used a more liberal threshold (Engbert & Mergenthaler, 2006). We consider two particular cases of

*λ*= 3 and

*λ*= 6, which we will refer to as EK3 and EK6, respectively.

*k*-means, to group putative events obtained from the EK algorithm into clusters of microsaccades or drift. The algorithm separates drift and microsaccade events using three features: peak velocity, initial acceleration peak, and final acceleration peak. Here we use the implementation provided by Otero-Milan et al. as obtained from their website (http://smc.neuralcorrelate.com/sw/microsaccade-detection/).

^{2}). An Apple iMac computer running MATLAB 7.1 (MathWorks, Natick, MA) with the Psychtoolbox (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997) and EyeLink (Cornelissen, Peters, & Palmer, 2002) extensions controlled stimulus presentation and response collection. We recorded eye movements using a remote infrared video-oculographic system (EyeLink 1000; SR Research, Ltd., Mississauga, Ontario, Canada) with a 1-kHz sampling rate, precision of 0.01 deg, and average accuracy of 0.25°–0.5 deg, according to the manual (but see Holmqvist et al., 2011; Poletti & Rucci, 2016). We acquired eye position data with the EyeLink software. We set the “Heuristic filter” option to off to obtain the raw data.

*T*= 60,000 ms according to the generative model. We created every combination of six chosen values of motor noise and six values of measurement noise. We fixed the velocity distribution parameters at

*σ*

_{0}= 0.3°/s,

*d*

_{1}= 4.4, and

*σ*

_{1}= 30°/s, to approximate realistic microsaccade kinematics (Engbert, 2006). We inferred the eye state time series with the BMD algorithm and the standard EK algorithm, which uses a velocity threshold multiplier of 6 (referred to as EK6). After thresholding the BMD inferences, we evaluated their performance in terms of the hit rate (defined as the proportion of 1s correctly identified in the C time series) and the false-alarm rate (the proportion of 1s wrongly identified in the C time series; Figure 4). While the velocity distribution parameters were not perfectly recovered (Figure A3), the BMD hit rates were very high (Figure 4A). The hit rate of the BMD algorithm decreases with increased motor noise, as in standard signal detection theory, but it is remarkably robust to increased measurement noise. By contrast, the hit rate of EK6 is lower and more affected by the noise level. In EK6, the false-alarm rate decreases with increasing noise because the threshold adapts to the noise level. Across the board, BMD has false-alarm rates comparable to EK6's but much higher hit rates, especially at high noise.

*λ*= 3 (EK3; Figure 5). As performance metrics, we use the error rate in identifying the eye state at every time point, the number of microsaccades per unit time, and the hit and false-alarm rates. BMD has a lower error rate than all alternative algorithms in 30 out of 36 noise levels. As in Figure 4, the improvement of BMD over alternative algorithms is larger for higher noise. BMD has a hit rate close to 1 in all but the highest level of motor noise, whereas the false-alarm rate is comparable to those of other algorithms. The BMD algorithm is more robust than all other algorithms: Its hit rate and microsaccade rate vary only weakly with increasing measurement noise.

*k*

_{0}and

*k*

_{1}by an order of magnitude and show that the inferred microsaccade rate is approximately constant (Figure A6), making the BMD algorithm robust to the choice of the prior in a plausible range.

*p*(C|ẑ).

_{reduced}+ threshold, starts with Steps 0–2 of the BMD algorithm. However, instead of sampling from the posterior

*p*(C|ẑ) in Step 3, it estimates C by applying a Kalman smoother (after the Kalman smoother of Step 2) to ẑ to obtain a smoothed eye position time series, differentiating that to obtain eye velocities, and thresholding the velocity time series (Figure 13). We fix the window size of the Kalman smoother to 5.32 ms and use a threshold which scales linearly with the inferred motor noise level: threshold =

*a*σ̂

*+*

_{z}*b*, with

*a*= 32

*b*= 1 deg/s. We chose these values to approximately match the output of BMD and BMD

_{reduced}+ threshold in real and simulated data. This method performs about as well as the full inference algorithm. However, it is unprincipled, does not return a probabilistic estimate, and cannot be directly extended to more sophisticated generative models.

_{reduced}+ threshold, can sometimes approach BMD's performance.

_{reduced}+ threshold, with a Kalman filter (only the forward filter) instead of the Kalman smoother.

*σ*

_{0}and underestimates

*d*

_{1}, which makes the drift and microsaccade velocity distributions less separable. We could solve the incorrect microsaccade inference with some post hoc processing; however, this would introduce arbitrariness. Instead, we accept this as a failure mode of our algorithm: rare, exclusively at low noise, and easily detectable.

*, 44 (23), 2675–2690, doi:10.1016/j.visres.2004.05.009.*

*Vision Research**. Mineola, NY: Dover.*

*Handbook of mathematical functions*

*Journal of Eye Movement Research**,*4 (2), doi:10.16910/jemr.4.2.2.

*Journal of Eye Movement Research**,*3 (5), doi:10.16910/jemr.3.5.1.

*. Secaucus, NJ: Springer-Verlag.*

*Pattern recognition and machine learning**, 10 (4), 433–436.*

*Spatial Vision**, 33 (12), 5375–5386, doi:10.1523/jneurosci.3703-12.2013.*

*The Journal of Neuroscience**. St. Louis, MO: Mosby.*

*Eye movement basics for the clinician**, 34 (4), 613–617, doi:10.3758/bf03195489.*

*Behavior Research Methods, Instruments, & Computers**, 46 (11), 987–988.*

*Journal of the Optical Society of America**, 24 (4), 527–537, doi:10.1364/ao.24.000527.*

*Applied Optics**, 235, 157–168, doi:10.1016/j.jneumeth.2014.06.020.*

*Journal of Neuroscience Methods**, 119 (1), 1–17.*

*The Journal of Physiology**, 7 (23), 3910–3916.*

*Physical Chemistry Chemical Physics**, 154, 177–192.*

*Progress in Brain Research**, 43 (9), 1035–1045.*

*Vision Research**, 103 (18), 7192–7197, doi:10.1073/pnas.0509557103.*

*Proceedings of the National Academy of Sciences, USA**, 108 (39), 16149–16150, doi:10.1073/pnas.1102730108.*

*Proceedings of the National Academy of Sciences, USA**, 42 (22), 2533–2545, doi:10.1016/s0042-6989(02)00263-8.*

*Vision Research**, 107 (7), 1904–1916, doi:10.1152/jn.01125.2011.*

*Journal of Neurophysiology**, 31 (43), 15219–15230, doi:10.1523/jneurosci.3106-11.2011.*

*The Journal of Neuroscience**, 8 (1), http://dx.doi.org/10.16910/jemr.8.1.1.*

*Journal of Eye Movement Research**Oxford, UK: Oxford University Press*.

*, 18 (4), 356–363.*

*Psychological Science**, 82 (1), 35–45.*

*Journal of Basic Engineering**, 36, ECVP Abstract Supplement.*

*Perception**, 13 (12), 1549–1553, doi:10.1038/nn.2663.*

*Nature Neuroscience**, 45 (6), 721–730, doi:10.1016/j.visres.2004.09.029.*

*Vision Research**, 18 (4), 364–366, doi:10.1111/j.1467-9280.2007.01904.x.*

*Psychological Science**, 49 (2), 297–305, doi:10.1016/j.neuron.2005.11.033.*

*Neuron**, 21, 1087–1092.*

*Journal of Chemical Physics**. Oxford, UK: Clarendon Press.*

*Monte Carlo methods in statistical physics**, 118, 17–24, doi:10.1016/j.visres.2014.11.007.*

*Vision Research**, 10 (4), 437–442, doi:10.1163/156856897x00366.*

*Spatial Vision**, 25 (24), 3253–3259, doi:10.1016/j.cub.2015.11.004.*

*Current Biology**, 23 (17), 1691–1695, doi:10.1016/j.cub.2013.07.007.*

*Current Biology**, 118, 83–97, doi:10.1016/j.visres.2015.01.018.*

*Vision Research**, 40 (6), 687–701.*

*Journal of Experimental Psychology**, 3 (8), 1445–1450.*

*AIAA Journal**, 49 (20), 2415–2441, doi:10.1016/j.visres.2009.08.010.*

*Vision Research**, 15 (10), 705–707, doi:10.1111/j.0956-7976.2004.00744.x.*

*Psychological Science**, 447 (7146), 852–855, doi:10.1038/nature05866.*

*Nature**(pp. 923–928). Hillsdale, NJ: Lawrence Erlbaum Associates.*

*Proceedings of the Twentieth Annual Conference of the Cognitive Science Society**(pp. 71–78 ). New York: ACM.*

*In Proceedings of the symposium on eye-tracking research & applications**(pp. NN–NN), doi:10.1145/285749/2857512. Location: Publisher.*

*Proceedings of the ninth biennial ACM Symposium on Eye Tracking Research & Applications**, 19 (6), 675–683, doi:10.1016/0042-6989(79)90243-8.*

*Vision Research**A generalization of the gamma distribution*. The Annals of Mathematical Statistics, 33(3), 1187–1192.

*, 25 (2), 137–142, doi:10.3758/bf03204486.*

*Behavior Research Methods, Instruments, & Computers**, 243 (8), 791–797, doi:10.1007/s00417-005-1148-3.*

*Graefe's Archive for Clinical and Experimental Ophthalmology**Retrieved from http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf*

*, 174 (2), 215–243, doi:10.1016/j.artint.2009.11.011.*

*Artificial Intelligence**, 34 (41), 13693–13700, doi:10.1523/jneurosci.0582-14.2014.*

*The Journal of Neuroscience**, 150 (3702), 1459–1460, doi:10.1126/science.150.3702.1459.*

*Science**z*=

_{t}*z*−

_{t}*z*

_{t}_{−1}. We then expand the product

*v*into the second integral:

_{t}*ζ*and

*ξ*represent independent Gaussian noise with standard deviation

*σ*and

_{z}*σ*, respectively. These equations represent a special case of the Kalman update equations (Kalman, 1960; Welch & Bishop, 2006); therefore, the maximum-likelihood estimate of z given x is a special case of a Kalman smoother. In a Kalman filter, the goal would be to predict a future state

_{x}*z*based on the observations

*x*so far. However, since we have access to the entire time series, the correct inference of

*z*is given by a Kalman smoother. This can be implemented using the RTS algorithm (Rauch et al., 1965; Terejanu, 2008). In our case, this takes the form of a Kalman filter forward in time, followed by another Kalman filter backward in time. The forward Kalman filter is:

*P*is the variance of the posterior over

_{t}*ŷ*, and

_{t}*K*is the Kalman gain. If

_{t}*P*is large, we expect large changes in the states, so we need to be able to update the estimates with new incoming measurements

_{t}*x*. Higher weighting of the incoming measurements is achieved with increased Kalman gains. However, if the measurement noise

_{t}*σ*is high, the observation

_{x}*x*is less reliable and the Kalman gain will decrease accordingly, weighting the observation less when estimating the state

_{t}*ŷ*.

_{t}*P*does not depend on the observations x, only on

_{t}*σ*and

_{z}*σ*, and on

_{x}*P*

_{t}_{−1}through a recurrence relation that follows from Equation 23:

*P*at each time point given a choice for

_{t}*P*

_{0}, the variance of the prior over the first time point. The choice of

*P*

_{0}affects the variance

*P*at early times, but not for

_{t}

_{t}_{→∞}

*P*

_{t}_{−1}= lim

_{t}_{→∞}

*P*=

_{t}*P*. Plugging this into the forward-update equations yields the quadratic equation

*P*

_{0}=

*P*, which implies

*P*=

_{t}*P*for each time point. Therefore, the Kalman gain

*K*is also constant across time. Plugging this Kalman gain into Equation 12 allows us to express the state estimation equation for

_{t}*ŷ*in terms of the previous estimate

_{t}*ŷ*

_{t}_{−1}, current observation

*x*, and process and noise standard deviations:

_{t}*ẑ*to be equal to

_{T}*ŷ*and again set the prior variance of

_{T}*ẑ*equal to the asymptotic limit

_{T}*P*. This backwards filter has a different Kalman gain than the first filter; the RTS update equations in our case yield

*I*

_{0}is the modified Bessel function of the first kind of order zero. Next we change variables from

*r*to

*I*(Δ

*z*, Δ

*τ*,

*d*,

*σ*) by evaluating the integral and plugging in

*α*, we can replace the Bessel function with asymptotic approximations and solve the resulting integrals. Specifically, we define upper and lower bounds

*α*

_{∞}(

*d*) and

*α*

_{0}(

*d*). For

*α*<

*α*

_{0}(

*d*), we use the large-

*s*approximation to the Bessel function (Abramowitz & Stegun, 1965),

*α*>

*α*

_{∞}(

*d*), we approximate

*I*

_{0}(

*s*) by its Taylor series around

*s*= 0 (Abramowitz & Stegun, 1965):

*α*and

*d*, and the corresponding value of log

*A*(

*α*,

*d*), which we compute numerically using MATLAB's integral command. For

*α*

_{0}(

*s*) <

*α*<

*α*

_{∞}(

*s*), we evaluate log

*A*(

*α*,

*d*) by linearly interpolating between entries in the table. Interpolation is a slow operation, so we replace

*I*

_{0}(

*s*) with asymptotic approximations in the limit of small and large

*α*. This causes some error, which grows as

*α*deviates from 0 or ∞. We choose

*α*

_{∞}(

*d*) and

*α*

_{0}(

*d*) such that the total error in log

*A*(

*α*,

*d*) is less than 0.003 (Figure A1).

*p*(C|ẑ,

*σ̂*,

_{x}*σ̂*,

_{z}*σ̂*

_{0},

*σ̂*

_{1},

*d̂*

_{1}). We use an MCMC method (Newman & Barkema, 1999), which performs a biased random walk in the space of all such time series. On each step, we generate a new time series C

^{new}by randomly mutating the current C in one of six possible steps (Figure A2). To concisely express these steps, we reparametrize each time series C in terms of its change points

*τ*, and separately keep track of time points where the eye state changes from drift to microsaccade (

*τ*

_{01}) and from microsaccade back to drift (

*τ*

_{10}). The six steps in our MCMC sampling scheme are as follows:

*g*(C → C

^{new}), which in general does not necessarily equal

*g*(C

^{new}→ C). The MCMC algorithm accepts any of these steps with an acceptance probability

*A*(C → C

^{new}). To sample from the correct posterior distribution, the Markov chain in a Monte Carlo algorithm has to satisfy detailed balance, which ensures that the system makes transitions in and out of every state with compatible probabilities:

*β*, which modify the acceptance probability to where we have split up the posterior into a prior and a likelihood. The lower the temperature (increased

*β*), the less likely the Markov chain will accept steps which reduce the likelihood. Therefore, low-temperature chains are strongly attracted by likelihood maxima (local or global), whereas high-temperature chains explore the space more widely. In the infinite-temperature limit, the Markov chain samples from the prior

*p*(C). The strength of parallel tempering consists in allowing neighboring chains to exchange information by attempting to swap their configurations and accepting swaps with a probability

*β*= 1 sample from the correct posterior probability distribution

*p*(C|z,

*σ*,

_{z}*σ*

_{1},

*d*

_{1},

*σ*

_{0},

*d*

_{0}). We choose the set of temperatures in our simulation to span the full range between

*β*= 0 and

*β*= 1, with significant overlap in the distribution of posterior values between successive chains, so that swaps are accepted with a nonzero probability.