**Although the velocity-threshold method is widely applied, microsaccade detection remains a challenging problem because of gaze-tracking inaccuracy and the transient nature of microsaccades. Important parameters of a saccadic event, e.g., saccade duration, amplitude, and maximum velocity, are sometimes imprecisely estimated, which may bias inferences about the roles of microsaccades in perception and cognition. To overcome these biases and obtain a better detection algorithm for microsaccades, we propose a novel statistical model of the tracked gaze positions during eye fixations. The model incorporates a parametrization previously applied to model saccades, which allows us to veridically capture the velocity profile of saccadic eye movements. Based on this model, we derive the Neyman-Pearson detector (NPD) for saccadic events. Implemented in conjunction with maximum likelihood estimation, our NPD can detect a saccadic event and estimate all of its parameters simultaneously. Because of its adaptive nature and statistical optimality, our NPD method was able to better detect microsaccades in some datasets when compared with a recently proposed state-of-the-art method based on convolutional neural networks. The NPD also yielded performance comparable to that of a recently developed Bayesian algorithm, with the added benefit of modeling a more biologically veridical velocity profile of the saccade. Unlike these algorithms, the NPD lends itself to online saccade detection, and thus has potential for human-computer interaction applications. Our algorithm is publicly available at https://github.com/hz-zhu/NPD-micro-saccade-detection.**

The microsaccade is a brief, high-velocity eye movement that occurs a few times per second, with durations ranging from about 5–6 ms to 20–30 ms (Engbert, 2006; see also Deubel & Bridgeman, 1995; Rolfs, 2009). Microsaccades and saccades generally form a continuum of oculomotor behaviors (Otero-Millan et al., 2013; Zuber, Stark, & Cook, 1965). In recent years, microsaccades have commonly been differentiated from saccades as events with amplitude below 1° of visual angle (Sinn & Engbert, 2016). Some microsaccades can be much smaller than that, sometimes approaching the measurement noise of many eye-tracking devices (for instance, the EyeLink II video-based infrared eye tracker reports a precision of 0.01° of visual angle; Engbert, 2006; Martinez-Conde, Macknik, Troncoso, & Hubel, 2009). Their small size combined with their relatively short duration partly accounts for the difficulty of microsaccade detection, which remains an ongoing research problem.

Let *k* denote the discrete time index for gaze position measurements, where *k* ∈ ℤ^{+} and ℤ^{+} denotes the set of positive integers, i.e., {1, 2, 3, ...}; *X*_{k} denotes the actual (noise-free) gaze position in visual angle at time *k*; and *Z*_{k} denotes the measured eye-gaze position in visual angle at time *k*. It has been shown that the correlated random walk constitutes an accurate model for drifting eye-gaze movement (Engbert & Kliegl, 2004; Matin, Matin, & Pearce, 1970). However, the correlation factors in Engbert and Kliegl (2004) are not trivial to estimate from the dataset itself. Thus, we propose a simplified yet more straightforward model based on Engbert and Kliegl (2004), in which only the first-order correlation is considered. More specifically, we propose the following model for the state:

where *W*_{k} and *V*_{k} are independent; *σ*_{w} and *σ*_{v} are the standard deviations of *W*_{k} and *V*_{k}, respectively; and the initial state is drawn from a distribution with mean *β* and variance *γ*.
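To make the drift-plus-noise structure concrete, the sketch below simulates a first-order correlated random walk observed through additive measurement noise. The function name, the correlation parameter `rho`, and all numeric defaults are illustrative assumptions, not values from the paper; the exact state equations are those of Equation 1.

```python
import random

def simulate_fixation(n_samples, sigma_w=0.01, sigma_v=0.05, rho=0.5, seed=0):
    """Simulate drifting gaze X_k as a first-order correlated random walk,
    plus noisy measurements Z_k = X_k + V_k (an illustrative stand-in for
    the paper's Equation 1; rho, sigma_w, sigma_v are assumed values)."""
    rng = random.Random(seed)
    x, w_prev = 0.0, 0.0
    xs, zs = [], []
    for _ in range(n_samples):
        # process noise with first-order correlation to the previous step
        w = rho * w_prev + rng.gauss(0.0, sigma_w)
        x += w
        w_prev = w
        xs.append(x)
        zs.append(x + rng.gauss(0.0, sigma_v))  # measurement noise V_k
    return xs, zs
```

Fixing the seed makes a run reproducible, which is convenient when comparing detector settings on the same simulated fixation.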

at time *t* = 0:

where ^{*T*} denotes the transpose operation. We require *θ*_{2} > 0 and *θ*_{3} > 1. As saccades and microsaccades are thought to share the same generating neural mechanism (Martinez-Conde, Otero-Millan, & Macknik, 2013), we adopt the same generalized exponential function to model a microsaccade. Figure 1 shows three examples of *h*(*t*; **θ**) with different parameters. We can see that the parameter *θ*_{1} controls the amplitude of the function, while *θ*_{2} and *θ*_{3} control its shape. The velocity profile of a microsaccade can also be captured by Equation 2,

which yields the velocity profile *v*(*t*; **θ**) of a microsaccade, as studied in Van Opstal and Van Gisbergen (1987). From Equation 2 and Figure 1, we also note the following facts: (a) *h*(*t*; **θ**) converges to *θ*_{1} as *t* → +∞; and (b) the time at which *v*(*t*; **θ**) reaches its maximum depends on **θ**. Therefore, we need to define the beginning and end of a microsaccade event explicitly, based on Equation 2. We define the duration of a microsaccade that follows Equation 2, denoted *d*, as the time span [*t*_{1}, *t*_{2}] over which its amplitude rises from 0.5*θ*_{1}(1 − *d*_{m}) to 0.5*θ*_{1}(1 + *d*_{m}), where *d*_{m} ∈ (0, 1) is a scaling factor. Therefore, during the defined duration of a microsaccade, *d*_{m} × 100% of the total amplitude *θ*_{1} is completed. Figure 1 helps demonstrate the relations among *θ*_{1}, *d*, *d*_{m}, *t*_{1}, and *t*_{2}. It can be calculated that:

The result depends on the choice of *d*_{m}, and in this article we choose *d*_{m} = 0.95. Figure 1 can also help demonstrate the relation between the parameters *t*_{1}, *t*_{2}, *d*, *h*(*t*; **θ**), and *v*(*t*; **θ**).
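To make the duration definition concrete, the sketch below evaluates a candidate position profile and recovers *t*_{1}, *t*_{2}, and *d* numerically for *d*_{m} = 0.95. The specific closed form used for *h* is an assumption standing in for the paper's Equation 2, chosen only because it rises from 0 and converges to *θ*_{1} as *t* grows.

```python
import math

def h(t, theta1, theta2, theta3):
    """Assumed generalized-exponential position profile (illustrative
    stand-in for Equation 2): 0 at t <= 0, converging to theta1."""
    if t <= 0.0:
        return 0.0
    return theta1 * (1.0 - math.exp(-((t / theta2) ** theta3)))

def duration(theta1, theta2, theta3, d_m=0.95, dt=0.01, t_max=200.0):
    """Find [t1, t2] where h rises from 0.5*theta1*(1 - d_m) to
    0.5*theta1*(1 + d_m); the duration is d = t2 - t1."""
    lo, hi = 0.5 * theta1 * (1 - d_m), 0.5 * theta1 * (1 + d_m)
    t1 = t2 = None
    t = 0.0
    while t < t_max:
        value = h(t, theta1, theta2, theta3)
        if t1 is None and value >= lo:
            t1 = t
        if value >= hi:
            t2 = t
            break
        t += dt
    if t1 is None or t2 is None:
        raise ValueError("profile did not cross both thresholds before t_max")
    return t1, t2, t2 - t1
```

With *d*_{m} = 0.95 the window covers the central 95% of the amplitude rise, so the tails where the profile only asymptotically approaches 0 and *θ*_{1} are excluded by construction.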

We define the shifted function *h*_{s}(*t*; **θ**), based on the defined start and end of a microsaccade, as:

where *h*_{s}(*t*; **θ**) is the shifted version of *h*(*t*; **θ**), such that the defined starting time of a microsaccade coincides with the time *t* = 0. For sampled data with a given sampling interval, we can likewise obtain the time *t*_{v} at which a microsaccade reaches its maximum velocity, *v*_{max}:

*H*(*k*; **θ**) can capture the amplitude and velocity profile of a microsaccade parametrically. Therefore, we can build the statistical model as:

where the process noise *W*_{k} and measurement noise *V*_{k} follow the same distributions as specified by Equation 1, and the starting value is *X*_{ω} = 0. Comparing Equation 11 with Equation 1, we note that the only difference comes from the term *H*_{d}(*k*; **θ**, *ω*); the essence of *H*_{d}(*k*; **θ**, *ω*) is that it models a deterministic displacement representing the microsaccade. It is not hard to see that:

**θ**. Then, we derive the test statistic for the detector under the Neyman-Pearson framework. Lastly, we unify the parameter estimation method with the NPD to perform the task of detecting saccadic events, where some postprocessing steps are applied to further enhance the performance of the detector.

The difference *Z*_{k} − *Z*_{k−n} at lag *n* follows a different distribution under each of the two states. Based on the distributions of *Z*_{k} − *Z*_{k−n} under the two states, we propose Algorithm 1 to estimate the noise parameters:

**Algorithm 1** Noise parameter estimation

1. Get a vector of measured gaze positions *Z*_{k0:k1} = [*Z*_{k0}, *Z*_{k0+1}, ..., *Z*_{k1}]^{T}
2. Section *Z*_{k0:k1} into *m* ∈ ℤ^{+} chunks, such that [**Z**_{1}, ..., **Z**_{m}]^{T} = *Z*_{k0:k1}
3. Pick a value *n*_{0} ∈ ℤ^{+}. Construct *n*_{0} empty sets **V**_{1:n0} and an all-zero vector *G*_{1:n0}
4. **for** *i* = 1, 2, ..., *m* **do**
5. Let *Y*_{k} denote the *k*^{th} element of **Z**_{i}, where **Z**_{i} has *N* elements in total
6. **for** *j* = 1, 2, ..., *n*_{0} **do**
7. Calculate the vector **D** = *Y*_{1+j:N} − *Y*_{1:N−j}, and calculate σ_{D}^{2} = VAR[**D**]
8. Append σ_{D}^{2} to the set **V**_{j}
9. **for** *j* = 1, 2, ..., *n*_{0} **do**
10. Let *G*_{j} = **min**[**V**_{j}]
11. Perform the linear regression *G*_{j} = *αj* + *β* with respect to *j*

Let the difference vector be denoted as **D**, where *D*_{i} is its *i*th element.

for each lag *n*. The key point in Algorithm 1 is Step 10, where we calculate the minimum *G*_{j} = **min**[**V**_{j}] of the elements of **V**_{j}. From Equations 14 through 18, we know that the empirical variance will be larger if (*Z*_{k} − *Z*_{k−n}) follows the state containing a saccadic displacement than if (*Z*_{k} − *Z*_{k−n}) follows the saccade-free state; taking the minimum across chunks therefore favors saccade-free segments. We estimated the variance for different *n* values using Algorithm 1 with the real gaze data provided in Mihali et al. (2017). The dotted line in Figure 3a depicts the estimated variance from the dataset, and the dashed line in Figure 3a shows the theoretical variance from the fitted data points. We note from Figure 3a that, as *n* increases, the estimated value does not always exhibit a linear increment of the same rate. That is because the measured gaze position is in units of visual angle, which cannot exceed 180°. Therefore, the random walk model can only fit the data when *n* is small; in our case, *n* ≤ 30. Also, we notice that the estimated value at *n* = 1 is an obvious outlier relative to the rest of the data points. Its existence is due to a limitation of the **min**[**V**_{j}] operation when *n* is very small. Therefore, in practice, we only use *n* values ranging from 2 to 10 to perform the estimation. Figure 3b shows the linear regression results for *n* = 2, 3, ..., 30. The resulting *R*^{2} score is 0.9942, suggesting that our linear model fits the data very well.
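A minimal sketch of the lag-variance procedure of Algorithm 1 is given below. The function name, chunk count, and lag range are illustrative defaults; the exact relation between the fitted line and (*σ*_{w}, *σ*_{v}) is the one given by the paper's Equations 14 through 18.

```python
from statistics import pvariance

def lag_variance_slope(z, n_chunks=4, lags=range(2, 11)):
    """Sketch of Algorithm 1: for each lag n, compute the variance of the
    lag-n differences within each chunk, keep the minimum across chunks
    (to favor saccade-free segments), then fit G_n = alpha*n + beta by
    ordinary least squares. Under the random-walk model the slope relates
    to the process noise and the intercept to the measurement noise."""
    size = len(z) // n_chunks
    chunks = [z[i * size:(i + 1) * size] for i in range(n_chunks)]
    g = []
    for n in lags:
        variances = [pvariance([c[k] - c[k - n] for k in range(n, len(c))])
                     for c in chunks]
        g.append(min(variances))  # Step 10: minimum across chunks
    xs = list(lags)
    xbar, ybar = sum(xs) / len(xs), sum(g) / len(g)
    alpha = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, g))
             / sum((x - xbar) ** 2 for x in xs))
    beta = ybar - alpha * xbar
    return alpha, beta
```

On a pure random walk the fitted slope is positive, since the variance of lag-*n* differences grows roughly linearly in *n*, mirroring the linear trend seen in Figure 3b.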

**Estimation of θ_{1}, θ_{2}, and θ_{3}**

These parameters form the vector **θ**. Suppose that, from *k*_{0} to *k*_{1}, the measured gaze data,

**Σ** takes the form in Equation 24 because, in our model, only the first-order correlation is considered; thus, all off-diagonal elements are zero except for the elements directly adjacent to the diagonal. Mathematically, we can see that *C*_{k} is correlated with its immediate neighbors, whereas *C*_{k} and *C*_{k+j}, for *j* > 1, are uncorrelated, as they share no correlated factor. Let *k*_{1} − *k*_{0} + 1 = *k*_{2}; then we can write the probability density function of **C** given **θ** as:

with *θ*_{1}, from *k*_{1} to *k*_{2}. Hence the detection of a microsaccade in the window amounts to deciding between the two states, where **μ** and **Σ** are the same as in Equation 22. The log-likelihood functions for **C** under each of the two states together define the test statistic *T*. Neyman-Pearson detection theory then tells us that the test can be performed as:

where *ρ* is the threshold based on the significance value *α*. We can calculate *ρ* as:

such that the probability of false alarm, *P*_{FA}, equals *α*. Some common choices of *α* are 0.05 and 0.01. We can also calculate the probability of detection at significance value *α* as:

which is expressed in terms of *P*_{FA}. The detector is then applied by sliding a window of length *L* over the data vector.
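The threshold-from-significance step can be sketched as follows. The exact null distribution of *T* is given by the paper's Equation 34; here we assume a Gaussian null purely for illustration, so that *ρ* is the (1 − *α*) quantile of that assumed distribution.

```python
from statistics import NormalDist

def np_threshold(alpha, mu0=0.0, sigma0=1.0):
    """Choose the Neyman-Pearson threshold rho for significance alpha,
    assuming the test statistic T is Gaussian under the no-saccade state
    (an illustrative stand-in for the paper's Equation 34)."""
    return NormalDist(mu0, sigma0).inv_cdf(1.0 - alpha)

def detection_probability(rho, mu1, sigma1):
    """P_D: probability that T exceeds rho under the saccade-present
    state, again under an assumed Gaussian distribution."""
    return 1.0 - NormalDist(mu1, sigma1).cdf(rho)
```

For *α* = 0.05 under a standard Gaussian null, *ρ* ≈ 1.645, and *P*_{D} then depends on how far the saccade-present distribution of *T* sits above the null.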

**Algorithm 2** The NPD for microsaccades

1. Get a sequence of tracked eye-gaze positions *Z*_{k0:k1}
2. Pick a value for *L* ∈ ℤ^{+}, which is the length used to section the vector *Z*_{k0:k1}
3. **for** *i* = *k*_{0}, *k*_{0} + 1, ..., *k*_{1} − *L* + 1 **do**
4. Let **Z** = *Z*_{i:i+L−1}, which is a subsection of *Z*_{k0:k1}
5. Calculate *ρ* using Equation 34
6. **if** *T* > *ρ* **then** *J*_{i} = 1, **else** *J*_{i} = 0
7. **if** θ̂_{1ML}^{(i)} < *T*_{θ1} **median**[|θ̂_{1ML}|] **then** *J*_{i} = 0
8. The NPD returns the vector *J*_{k0:k1−L+1} and the matrix θ̂_{k0:k1−L+1}
9. Post-process *J*_{k0:k1−L+1} and θ̂_{k0:k1−L+1} to yield the detected saccadic events
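The sliding-window sweep of Algorithm 2 can be sketched as below. The caller supplies the test statistic and the amplitude estimate (standing in for *T* and the maximum likelihood θ̂_{1}); the function name and the parameter `t_theta1` (playing the role of *T*_{θ1}) are illustrative.

```python
from statistics import median

def npd_sweep(z, L, test_stat, threshold, t_theta1=0.1):
    """Sketch of Algorithm 2's window sweep. test_stat(window) returns
    (T, theta1_hat) for a window of length L; threshold is the
    precomputed rho. The amplitude-median test mirrors the theta_1
    thresholding step of Algorithm 2."""
    stats = [test_stat(z[i:i + L]) for i in range(len(z) - L + 1)]
    med = median(abs(th1) for _, th1 in stats)
    flags = []
    for t, th1 in stats:
        # flag only if the test passes AND the fitted amplitude is large
        # relative to the median amplitude estimate across windows
        flags.append(1 if (t > threshold and th1 >= t_theta1 * med) else 0)
    return flags
```

With a toy range-based statistic, windows straddling a step in the signal are flagged while flat windows are not, which is the qualitative behavior the detector relies on.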

When *θ*_{1} is very close to 0, the difference between the two states nearly vanishes, and at *θ*_{1} = 0 the proposed detection method reduces to a random guess. To avoid this overfitting and rectify the increased probability of false alarm, a threshold is needed for the parameter *θ*_{1}. Figures A1b and A1d show the estimated *θ*_{1} for a real gaze signal. Since *θ*_{1} is estimated under the saccade-present state model, *θ*_{1} is large when a true saccadic event has occurred and small otherwise; Algorithm 2 therefore compares each estimate against the median of the estimated *θ*_{1} values of the input sequence, in addition to comparing *T* with *ρ* for the *i*th element. We choose *L* to be around the maximum possible duration of a microsaccade. Yet we cannot choose a value that is too large; otherwise, two or more microsaccades could be entirely enclosed in one window vector. Four cases can arise:

- An entire microsaccade is enclosed in *Z*_{i:i+L−1}, and the microsaccade starts at time *j* = *i*
- An entire microsaccade is enclosed in *Z*_{i:i+L−1}, and the microsaccade starts at time *j* > *i*
- A section of a microsaccade is enclosed in *Z*_{i:i+L−1}, and the microsaccade starts at time *j* < *i*
- A section of a microsaccade is enclosed in *Z*_{i:i+L−1}, and the microsaccade ends at time *j* > *i* + *L* − 1

a median filter of strength *M* to the binary detection sequence. To estimate **θ** efficiently, the search over candidate parameter pairs (*θ*_{2}, *θ*_{3}) can be accelerated by imposing additional constraints on (*θ*_{2}, *θ*_{3}). One such constraint is that the parameter pair must yield a maximum velocity and a duration within a specific range. These constraints on (*θ*_{2}, *θ*_{3}) significantly accelerate the computation.

We first evaluate the estimation accuracy for **θ**, as well as for *v*_{max}, *t*_{v}, and *d*, using simulated data for which we know the ground truth. The receiver operating characteristic (ROC) curve of the NPD on simulated data is also presented. Then, we compare the proposed detector with the state-of-the-art CNN method, in terms of both the area under the curve (AUC) of the ROC plot and the sensitivity index. We then compare the NPD and BIM methods to evaluate parameter estimation accuracy in terms of the linearity of the main sequence.

The model involves the parameters *θ*_{1}, *θ*_{2}, and *θ*_{3}. A thorough test of all combinations of these parameters is not practical. Therefore, we use the signal-to-noise ratio (SNR) to capture the joint behavior of all parameters; in most scenarios, the SNR directly affects the performance of statistical estimation and detection.

We set *k*_{0} = 0, *k*_{1} = 100, *θ*_{1} = 1, *θ*_{2} = 40, *θ*_{3} = 40, with *σ*_{w} and *σ*_{v} as defined earlier in Equation 1. As a reminder, it makes sense to fix the parameter *σ*_{w}, as it relates to an intrinsic oculomotor process and this value captures it well. Note that the value of *σ*_{w} is fixed only in this section, for simulation, and is estimated from the real data in the detection tasks. However, we can change the value of *σ*_{v} to vary the SNR; this alters the distributions of *T* under the two states, and hence the resulting detection performance for *α* ∈ [0, 0.025].

We evaluate the estimation accuracy for **θ**, as well as for *d*, *t*_{v}, and *v*_{max}. Only trials under the saccade-present state are used to estimate **θ**. Then, based on the estimated **θ**, we employ Equations 9, 10, and 6 to estimate *t*_{v}, *v*_{max}, and *d*, respectively. We measure the performance of the estimator via the mean squared error (MSE). Table 1 shows the parameter estimation results, where the MSEs for *θ*_{1}, *θ*_{2}, and *θ*_{3} are unit free, the MSEs for *d* and *t*_{v} are in units of ms^{2}, and the MSE for *v*_{max} is in units of (°/ms)^{2}. We can see from Table 1 that the estimation of *θ*_{3} has the largest inaccuracy. However, despite the large inaccuracy in the estimated *θ*_{3}, the estimated *d*, *v*_{max}, and *t*_{v} are still accurate.

_{v}*for each windowed subsection of the input data points; (b) detection result (a binary sequence, where 0 indicates the absence of a saccadic event, and 1 indicates the existence of a saccadic event); (c) the best fitted*

**θ***θ*

_{1},

*θ*

_{2}, and

*θ*

_{3}for each detected event and the corresponding parameters, i.e.,

*t*

_{1}: the starting point of the event;

*t*

_{2}: the end point of the event;

*d*: duration of the event;

*v*: maximum velocity of the saccadic eye movement; and

_{max}*t*: the time index of reaching

_{v}*v*.

_{max}*. The circles and crosses in Figure A1a and A1c indicate the starting and ending points of the event based on the definition discussed in the Statistical modeling of gaze data section. The stars indicate the positions of maximum velocity and their corresponding saccadic events. We also calculate the value of the maximum velocity, and we indicate*

**θ***v*in Figure A1a and A1c. Figure A1b and A1d show the estimated

_{max}*θ*

_{1}. We can see that the estimated

*θ*

_{1}does not depend on the drifting movement of the eye, as it reverts around the zero line when no saccadic event has occurred.

We randomly selected values of *L*, *M*, *α*, and recorded the resulting (*P*_{D}, *P*_{FA}) value pairs. Based on the NPD performance obtained with the randomly selected parameters, we acquired insights regarding the proper choices of these parameters. The proper choices of parameters for all the datasets share the same properties, which we discuss in the Discussion and conclusion section.

_{FA}*P*∈ [0, 0.2] and

_{FA}*P*∈ [0.48, 1] on the right) of the detection results for each dataset, and Table 3 summarizes the AUC for each detectors. We can see that other than Dataset B, the NPD yields larger AUC value than that of the CNN method. Therefore, we conclude that our NPD outperforms the CNN method when the dataset does not contain catch-up saccades. Besides AUC, we also use the measurement sensitivity index, denoted as d′, to compare the detector performance. d′ reflects the discriminating ability of the target signal with contamination noise (Macmillan & Creelman, 2004). A larger d′ represents a stronger discriminating power of the detector. Table 3 shows the sensitivity index of each detector for all datasets. The results suggested by the sensitivity index agree with the measurement AUC in Table 3.

_{D}*σ*,

_{w}*σ*, and

_{v}*before producing the detection outcome.*

**θ***R*denotes the

*R*

^{2}score for linear regression. A detailed detection result is summarized in Table 5, where

*N*stands for the number of events detected under different detection methods, and

*N*is the adjusted number of detected events for NPD.

_{a}*N*is calculated as the total number of microsaccades of both eyes minus the number of binocularly occurred microsaccades of one individual, such that it reflects the same quantity as calculated by

_{a}*N*(BIM). Due to the high noise level of Dataset 4

*X*through 5

*Y*, as reflected by large

*σ*and

_{v}*σ*from each dataset, we can notice that NPD can detect more events when noise is small. That is due to the false alarm probability constraint imposed within NPD once we set

_{w}*α*. When noise is huge (

*σ*and

_{v}*σ*are large), even the true saccadic events may not pass the test, thus reducing the total amount of detected events for NPD. The advantage of this mechanism is that it guarantees that all detected events are reliable at a significance level

_{w}*α*. However, this may lead to the decrease of the probability of detection as noise level increases. To counteract the decrease of

*P*, we can increase the value of

_{D}*α*, at a cost of increasing

*P*. BIM, on the contrary, does not encounter this kind of problem, as it can detect more or less the same amount of events regardless of the noise level. But, we cannot know to what extent that a detected event from BIM is a false alarm.
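The adjusted count *N*_{a} described above is simple arithmetic; a one-line helper (the function name is ours) makes the bookkeeping explicit:

```python
def adjusted_count(n_left, n_right, n_binocular):
    """Adjusted NPD event count N_a: monocular detections from both eyes,
    minus the binocular duplicates, so the total matches the per-individual
    quantity reported by BIM's N."""
    return n_left + n_right - n_binocular
```

Each binocular microsaccade appears once in each eye's monocular count, so subtracting the binocular tally removes the double counting.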

These parameters include *M* [the strength of the median filter], *L* [the section length for detection], and *α*. When the sampling rate differs, *M* and *L* can be adjusted proportionally to the sampling-rate difference. Finally, when dealing with data containing missing values, or when catch-up saccades are present in the dataset, the NPD is less reliable; to extend the NPD to a method capable of dealing with more application scenarios, a more complex statistical model needs to be developed. The length of the dataset is also worth mentioning: since we estimate *σ*_{v} and *σ*_{w} from the data itself, a longer trial of gaze signal (more than 1,000 data points) is preferred for parameter estimation accuracy.

_{w}*α*. The second is the online applicability. Different from the BIM or CNN methods, which cannot be implemented online, our detection method is strictly causal, and the detection can be made once a microsaccade has finished. One modification needed based on our proposed algorithm is that we need to introduce a calibration phase at the beginning for the estimation of

*σ*,

_{v}*σ*, and

*Archives of Ophthalmology*, 12 (4), 475–483.

*Journal of Neurophysiology*, 121 (2), 646–661.

*Medical and Biological Engineering*, 6 (2), 171–179.

*Statistical inference*(vol. 2). Pacific Grove, CA: Duxbury.

*arXiv:*1312.6410, https://arxiv.org/ftp/arxiv/papers/1312/1312.6410.pdf.

*Eye movement basics for the clinician*. Maryland Heights, MO: Mosby.

*Journal of Neuroscience Methods*, 235, 157–168.

*Vision Research*, 35 (4), 529–538.

*Vision: How it works and what can go wrong*. Cambridge, MA: MIT Press.

*Progress in Brain Research*, 154, 177–192.

*Psychological Science*, 15 (6), 431–431.

*Journal of Vision*, 13 (8): 27, 1–13, https://doi.org/10.1167/13.8.27. [PubMed] [Article]

*Quantum detection theory*. Reproduced by the National Technical Information Service, Springfield, VA.

*Vision Research*, 122, 93–104.

*SIAM Journal on optimization*, 9 (1), 112–147.

*Detection theory: A user's guide*. Psychology Press.

*Trends in Neurosciences*, 32 (9), 463–475.

*Nature Reviews Neuroscience*, 14 (2), 83.

*Vision Research*, 10 (9), 837–857.

*Journal of Vision*, 17 (1): 13, 1–23, https://doi.org/10.1167/17.1.13. [PubMed] [Article]

*Journal of Vision*, 14 (2): 18, 1–17, https://doi.org/10.1167/14.2.18. [PubMed] [Article]

*Proceedings of the National Academy of Sciences*, 110 (15), 6175–6180.

*Vision Research*, 49 (20), 2415–2441.

*Computer graphics forum*(vol. 34, pp. 299–326).

*Journal of Vision*, 18 (6): 19, 1–16, https://doi.org/10.1167/18.6.19. [PubMed] [Article]

*Vision Research*, 118, 132–143.

*Vision Research*, 27 (5), 731–745.

*Detection, estimation, and modulation theory, Part I: detection, estimation, and linear modulation theory*. John Wiley & Sons.

*Foundations of vision*(vol. 8). Sunderland, MA: Sinauer Associates.

*Statistical modeling and tracking of artery wall motion in ultrasound images*(Unpublished doctoral dissertation). McGill University, Montreal, Canada.

*International Journal of Computer Assisted Radiology and Surgery*, 14 (7), 1107–1115.

*Science*, 150 (3702), 1459–1460.

Since, in **μ**, the parameter *θ*_{1} is a scaling factor, we can define the corresponding quantity normalized by *θ*_{1}:

where the index denotes the *i*^{th} element in the corresponding vector.