**An increasing number of researchers record binocular eye-tracking signals from participants viewing moving stimuli, but the majority of event-detection algorithms are monocular and do not consider smooth pursuit movements. The purposes of the present study are to develop an algorithm that discriminates between fixations and smooth pursuit movements in binocular eye-tracking signals and to evaluate its performance using an automated video-based strategy. The proposed algorithm uses a clustering approach that takes both spatial and temporal aspects of the binocular eye-tracking signal into account, and is evaluated using a novel video-based evaluation strategy based on automatically detected moving objects in the video stimuli. The binocular algorithm detects 98% of fixations in image stimuli compared to 95% when only one eye is used, while for video stimuli, both the binocular and monocular algorithms detect around 40% of smooth pursuit movements. The present article shows that using binocular information for discrimination of fixations and smooth pursuit movements is advantageous in static stimuli, without impairing the algorithm's ability to detect smooth pursuit movements in video and moving-dot stimuli. With an automated evaluation strategy, time-consuming manual annotations are avoided and a larger amount of data can be used in the evaluation process.**

Eye movements are commonly divided into *fixations* and *saccades*. A fixation occurs when the eye is more or less still and visual information is taken in. A saccade, in contrast, is a fast eye movement that redirects the eye from one position to the next. When the eye follows a smoothly moving target, the eye movement is called a *smooth pursuit*. To see a moving object clearly during smooth pursuit, the object must be aligned with the direction of gaze. When the eye does not follow the object perfectly, small corrective saccades realign the direction of gaze with the moving object. Smooth pursuit is divided into two stages: open-loop and closed-loop (Leigh & Zee, 2006, p. 219). The open-loop stage is the initial stage, in which smooth pursuit is initiated by the movement of an object. The second, closed-loop stage is a feedback system in which the velocity of the eye is controlled to keep the eye on the moving object. In natural tasks, the upper velocity limit of smooth pursuit is above 100°/s (Hayhoe, McKinney, Chajka, & Pelz, 2012). No lower limit for smooth pursuit velocity seems to exist, and the pursuit system can operate in the same velocity range as fixational eye movements (Martins, Kowler, & Palmer, 1985).

The velocity $v(n)$ is calculated from the position signals $x(n)$ and $y(n)$ in the $x$ and $y$ directions, respectively, and the sampling frequency $F_s$ of the signals.
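The velocity computation can be sketched as follows. This is a minimal sketch that assumes a backward (sample-to-sample) difference of the position signals scaled by $F_s$; the exact difference scheme is an assumption, and the function name `velocity` is hypothetical.

```python
import math


def velocity(x, y, fs):
    """Sample-to-sample speed of a gaze signal.

    Assumption: v(n) = Fs * sqrt((x(n) - x(n-1))^2 + (y(n) - y(n-1))^2),
    with v(0) set to 0 so that the output has the same length as the input.
    Units are position units per second, e.g. deg/s for x, y in degrees.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    v = [0.0] * len(x)
    for n in range(1, len(x)):
        v[n] = fs * math.hypot(x[n] - x[n - 1], y[n] - y[n - 1])
    return v


# At 500 Hz, a step of 0.02 deg per sample corresponds to 10 deg/s
v = velocity([0.0, 0.02, 0.04], [0.0, 0.0, 0.0], fs=500)
```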

The velocity $v^e(n)$ of each eye is compared to $\tilde{v}^e(n)$, which is $v^e(n)$ filtered with a median filter of length 3, where $e \in \{\mathrm{L}, \mathrm{R}\}$ denotes the left and right eye. The filter length is chosen as 3 in order to filter out one-sample spikes that could otherwise be confused with saccades. The residual signal

$$r^e(n) = v^e(n) - \tilde{v}^e(n),$$

where

$$\tilde{v}^e(n) = \operatorname{median}\{v^e(n-1),\, v^e(n),\, v^e(n+1)\},$$

is calculated to contain the spike. An index $\mathrm{SI}^e$ is calculated for each eye as the ratio between the residual signal $r^e(n)$ and the original signal $v^e(n)$ over the $M$ samples of the saccade (see Figure 2 for an example). If $\mathrm{SI}^e > \eta_{\mathrm{SI}}$ for both eyes, the detected saccade is classified as noise, and the two intersaccadic intervals that were split by the putative saccade are merged into one. A saccade is considered to be correctly classified if $\mathrm{SI}^e \le \eta_{\mathrm{SI}}$ for both eyes. Moreover, if $\mathrm{SI}^e \le \eta_{\mathrm{SI}}$ for only one of the eyes, the amplitude of the saccade in that eye is determined, and if the amplitude is larger than $\eta_{\mathrm{SA}}$, the saccade is considered to be correctly classified.
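The spike check can be sketched as follows. The length-3 median filter, the residual, and the decision rule follow the text; the exact formula for $\mathrm{SI}^e$ is not given here, so the sketch assumes it is the mean of $r^e(n)/v^e(n)$ over the $M$ saccade samples. It also assumes that a saccade passing the spike test in only one eye, but with an amplitude at or below $\eta_{\mathrm{SA}}$, is rejected. All function names and threshold values in the tests are hypothetical.

```python
import statistics


def median3(v):
    """Median filter of length 3; the two end samples are kept unchanged."""
    out = list(v)
    for n in range(1, len(v) - 1):
        out[n] = statistics.median(v[n - 1:n + 2])
    return out


def spike_index(v, start, stop):
    """Assumed SI: mean of r(n)/v(n) over saccade samples start..stop-1."""
    v_med = median3(v)
    ratios = []
    for n in range(start, stop):
        r = v[n] - v_med[n]  # residual r(n) = v(n) - median-filtered v(n)
        ratios.append(r / v[n] if v[n] > 0 else 0.0)  # guard against v(n) = 0
    return sum(ratios) / len(ratios)


def saccade_is_valid(si_left, si_right, amp_left, amp_right, eta_si, eta_sa):
    """Decision rule for one putative binocular saccade."""
    ok_left = si_left <= eta_si
    ok_right = si_right <= eta_si
    if ok_left and ok_right:
        return True   # no spike in either eye
    if not ok_left and not ok_right:
        return False  # noise in both eyes: merge the neighboring intervals
    # Only one eye passes the spike test: require a large enough amplitude
    amp = amp_left if ok_left else amp_right
    return amp > eta_sa
```

A one-sample spike such as `[10, 10, 300, 10, 10]` is almost entirely removed by the median filter, so its index is close to 1, whereas a genuine saccade profile such as `[10, 200, 300, 200, 10]` keeps most of its energy and yields a small index.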

The high-frequency noise-content index $I_{hf}^e(i)$, representing the energy of the differential signal in window $i$, is calculated from the coordinates $x^e(n)$ and $y^e(n)$ of the eye-tracking signal of each eye $e$. The number of samples in the sliding window is denoted $N$, and $i = 1, \ldots, M$, where $M$ is the number of windows in the intersaccadic interval. For each intersaccadic interval and for each eye separately, the maximum value of the high-frequency noise-content indices is calculated:

$$a^e = \max_{i = 1, \ldots, M} I_{hf}^e(i).$$
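The noise assessment for one eye, including the generalized logistic function that the next paragraph introduces, can be sketched as follows. The equation for $I_{hf}(i)$ is not given here, so the sketch assumes it is the sum of squared sample-to-sample differences of $x^e(n)$ and $y^e(n)$ in a window of `n_win` differences sliding one sample at a time. The logistic function uses the parameter values stated in the text ($A = 0$, $K = 1$, $Q = 0.001$, $P = 3000$, $B = 0.001$, $\nu = Q$), with $P$ assumed to act as the horizontal shift. The threshold `eta_s` is hypothetical, and the scale of $P$ depends on the units of the coordinates.

```python
import math


def hf_noise_indices(x, y, n_win):
    """Assumed I_hf(i): energy of the differential signal in sliding window i."""
    d2 = [(x[n] - x[n - 1]) ** 2 + (y[n] - y[n - 1]) ** 2 for n in range(1, len(x))]
    if len(d2) < n_win:
        return [sum(d2)]  # interval shorter than one window: use all of it
    return [sum(d2[i:i + n_win]) for i in range(len(d2) - n_win + 1)]


def generalized_logistic(a, A=0.0, K=1.0, Q=0.001, P=3000.0, B=0.001, nu=0.001):
    """Richards-type curve S(a) = A + (K - A) / (1 + Q*exp(-B*(a - P)))**(1/nu)."""
    base = 1.0 + Q * math.exp(-B * (a - P))
    # Evaluate base**(1/nu) in log space to avoid overflow for small a
    log_denominator = math.log(base) / nu
    if log_denominator > 700:  # exp would overflow; S is indistinguishable from A
        return A
    return A + (K - A) / math.exp(log_denominator)


def interval_passes(x, y, n_win, eta_s):
    """True if the high-frequency noise content of one eye is below eta_s."""
    a = max(hf_noise_indices(x, y, n_win))  # a^e = max_i I_hf(i)
    return generalized_logistic(a) < eta_s
```

With these parameters, $S(0)$ is of the order of $10^{-9}$, $S(P) = 1.001^{-1000} \approx 0.37$, and $S(a)$ approaches 1 for large $a$, which matches the stated range from 0 (little high-frequency noise) to 1 (much noise).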

The maximum value $a^e$ of each eye is then passed through the generalized logistic function

$$S(a) = A + \frac{K - A}{\left(1 + Q\,e^{-B(a - P)}\right)^{1/\nu}},$$

where $A = 0$, $K = 1$, $Q = 0.001$, $P = 3000$, $B = 0.001$, and $\nu = Q$. The parameters of the function $S(a)$ are determined by visual inspection of the complete database to best separate high-frequency noise from data of good quality. Three examples of signals with their corresponding spectra are shown in Figure 3. The three signals have low, medium, and high levels of noise, and the parameters of the generalized logistic function were chosen to pass only the signal with the lowest noise level (see Figure 4). The range of the generalized logistic function $S(a)$ is from 0 to 1, where 0 indicates a low level and 1 a high level of high-frequency noise content. Therefore, all intersaccadic intervals where $S(a^{\mathrm{L}}) < \eta_S$ or $S(a^{\mathrm{R}}) < \eta_S$ are considered to have high enough quality to be further classified into events.

For each intersaccadic interval and eye where $S(a^e) < \eta_S$, a classifier is applied. The classifier consists of the following steps: directional clustering, binary filters, and classification. In directional clustering, each consecutive pair of samples is clustered based on its direction. The clustering is later used by the binary filters, which are filters that enhance fixations or smooth pursuit movements. In the classification step, the output signals from the binary filters are summed, and the final classification into fixations and smooth pursuit movements is based on the summation signal.

From the $x$- and $y$-coordinates, the sample-to-sample direction $\alpha(n)$ is calculated. It is defined as the angle between the $x$-axis and the line connecting consecutive pairs of $x$- and $y$-coordinates. The sample-to-sample directions are mapped onto the unit circle and clustered using the iterative minimum-squared-error clustering algorithm (Duda, Hart, & Stork, 2001); see Figure 5 for examples of directional clustering for a fixation and a smooth pursuit movement. The clustering procedure is described in the following. First, the threshold for the maximum angular span of a cluster, that is, the maximum size of the sector covered by one cluster, is initialized to $\gamma_{\max}$. The value of $\gamma_{\max}$ was chosen to ignore small directional changes but capture significant ones in the signal. Each cluster $i$ is described by its angular span $\gamma_i$ and its mean direction $m_i$. In the initial iteration, all $\alpha(n)$ are placed into cluster $i = 1$, and the mean $m_1$ and angular span $\gamma_1$ are calculated.

With $L$ denoting the current number of clusters, each following iteration starts by determining which cluster $j$ has the maximum angular span. If $\gamma_j > \gamma_{\max}$, cluster $j$ is split into two clusters, $j$ and $L + 1$. The mean direction $m_{L+1}$ is initialized to the sample $\alpha(n)$ of cluster $j$ that has the largest angle $\beta_j(n)$ to $m_j$. The mean direction $m_j$ is recalculated as the mean of the remaining directions that belong to cluster $j$. The mean values of the clusters are saved, and the cluster affiliations of the samples are removed.

By taking one sample $\alpha(k)$ at a time and measuring the angles $\beta_i(k)$ between $\alpha(k)$ and the mean direction of each cluster, $\alpha(k)$ is assigned to the cluster with the closest mean direction, where $k$ ranges from 1 to the number of samples in the intersaccadic interval. Each time an $\alpha(k)$ is assigned to a cluster, the mean direction and angular span of that cluster are updated. When all $\alpha(n)$ have been reassigned to a cluster, the maximum angular span $\gamma_j$ is again calculated and compared to $\gamma_{\max}$. The procedure continues until all clusters have an angular span smaller than $\gamma_{\max}$. When the clustering process has converged, all samples are assigned to a cluster. A short pseudocode of the algorithm is shown in Algorithm 1.
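The split-and-reassign procedure of Algorithm 1 can be approximated by the following self-contained sketch; it is not the authors' implementation. Assumptions: directions are in radians, the mean direction is the circular mean of the unit vectors, the angular span is the spread of the signed deviations from the mean, a cluster's mean is rebuilt from its members as they are reassigned one at a time, and `max_iter` is a hypothetical safeguard against non-convergence.

```python
import math


def wrap(angle):
    """Wrap an angle to (-pi, pi]."""
    a = math.fmod(angle + math.pi, 2 * math.pi)
    return (a + 2 * math.pi if a <= 0 else a) - math.pi


def circ_mean(alphas):
    """Mean direction of angles mapped onto the unit circle."""
    return math.atan2(sum(map(math.sin, alphas)), sum(map(math.cos, alphas)))


def span(alphas, mean):
    """Angular width covered by `alphas`, from signed deviations to `mean`."""
    if not alphas:
        return 0.0
    devs = [wrap(a - mean) for a in alphas]
    return max(devs) - min(devs)


def cluster_directions(x, y, gamma_max, max_iter=100):
    """Directional clustering sketch; returns (labels, mean directions)."""
    alpha = [math.atan2(y[n] - y[n - 1], x[n] - x[n - 1]) for n in range(1, len(x))]
    members = [list(range(len(alpha)))]  # initial iteration: one cluster
    means = [circ_mean(alpha)]
    for _ in range(max_iter):
        spans = [span([alpha[k] for k in m], mu) for m, mu in zip(members, means)]
        j = max(range(len(spans)), key=spans.__getitem__)
        if spans[j] <= gamma_max:
            break  # every cluster is narrow enough
        # Split cluster j: the new mean is its member farthest from m_j
        far = max(members[j], key=lambda k: abs(wrap(alpha[k] - means[j])))
        means[j] = circ_mean([alpha[k] for k in members[j] if k != far])
        means.append(alpha[far])
        # Remove affiliations and reassign the samples one at a time
        members = [[] for _ in means]
        for k, a in enumerate(alpha):
            i = min(range(len(means)), key=lambda c: abs(wrap(a - means[c])))
            members[i].append(k)
            means[i] = circ_mean([alpha[m] for m in members[i]])  # running update
    labels = [0] * len(alpha)
    for i, m in enumerate(members):
        for k in m:
            labels[k] = i
    return labels, means
```

A straight movement produces a single cluster, while a zigzag path that alternates between 45° and −45° is split into two clusters whose labels alternate, which is the kind of frequent cluster transition that the binary filters associate with fixations.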

A transition rate is calculated from the number of transitions between clusters whose mean directions $m_i$ differ by an angle larger than $\alpha_T$. Transitions between clusters are more frequent in samples belonging to fixations than in samples belonging to smooth pursuit movements. Therefore, a high transition rate more likely represents a fixation.

Other filters count the number of samples that belong to the same cluster or to a neighboring cluster at most $\alpha_T$ away. A large number of samples in the same cluster represents samples that are heading in the same direction, which is a typical feature of a smooth pursuit movement.

Further filters are based on the total distance $d_S$ that the samples in the filter have moved. To determine how far the samples $x(n)$ and $y(n)$ in each cluster $i$ have moved, a distance $d_i$ is calculated over the $M$ samples covered by the filter. Each $d_i$ is mapped onto the mean direction $m_i$ of its cluster, and the total distance $d_S$, which represents the actual movement of the samples in the filter, is obtained by combining the mapped distances of all $N$ clusters. The total distance $d_S$ is compared to the criterion of the filter: a short distance is representative of a fixation, and a longer distance is representative of a smooth pursuit movement.

For the combination of the two eyes, the filters consider whether the directions $\alpha(n)$ of the two eyes are in the same cluster or in a neighboring cluster, at most $\alpha_T$ away, at the same time.

In total, the filters produce $K$ binary responses $r_l(n)$, $l = 1, 2, \ldots, K$, for each intersaccadic interval. If the signal from only one eye has passed the quality assessment, $K = 7 + 9 = 16$; that is, filters F1–F7 and S1–S9 are used. When the signals from both eyes have passed the quality assessment, $K = 2(7 + 9) + 2 = 34$; that is, filters F1–F7 and S1–S9 are used for each eye separately, and filters F8 and F9 are used for the combination of the two eyes. Finally, the responses are added together into one summation signal $s(n)$:

$$s(n) = \sum_{l=1}^{K} r_l(n).$$
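Two building blocks of this step can be sketched as follows: the total distance $d_S$ within one filter window, and the summation signal $s(n)$. The formulas for $d_i$ and $d_S$ are not given here, so the sketch assumes that $d_i$ is the summed step length of the samples in cluster $i$ and that $d_S$ is the length of the vector sum of the $d_i$ placed along their mean directions $m_i$. It also assumes that each binary response $r_l(n)$ takes the value +1 for a pursuit-like and −1 for a fixation-like decision, which is consistent with the sign rule used in the classification step but is not stated in the text. All function names are hypothetical.

```python
import math


def total_distance(steps, labels, means):
    """Assumed d_S for one filter window.

    steps  -- sample-to-sample step lengths in the window (M values)
    labels -- cluster index of each step
    means  -- mean direction m_i (radians) of each cluster
    """
    d = [0.0] * len(means)
    for step, i in zip(steps, labels):
        d[i] += step  # d_i: distance moved within cluster i
    # Map each d_i onto its mean direction and take the length of the sum
    sx = sum(di * math.cos(m) for di, m in zip(d, means))
    sy = sum(di * math.sin(m) for di, m in zip(d, means))
    return math.hypot(sx, sy)


def summation_signal(responses):
    """s(n) = sum over l of r_l(n); `responses` holds K equal-length lists."""
    return [sum(values) for values in zip(*responses)]


def n_filters(both_eyes_pass):
    """K = 7 + 9 for one eye, 2(7 + 9) + 2 when both eyes pass."""
    return 2 * (7 + 9) + 2 if both_eyes_pass else 7 + 9
```

Under these assumptions, steady movement in one direction accumulates into a large $d_S$, whereas back-and-forth jitter in opposite directions largely cancels, which matches the text's association of short distances with fixations.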

Based on $s(n)$, the samples are classified into fixations and smooth pursuit movements. In general, sample $n$ is classified as a smooth pursuit movement when $s(n) \ge 0$ and as a fixation when $s(n) < 0$. To prevent the samples in the intersaccadic interval from being divided into small segments of smooth pursuit movements and fixations, the dominant type of eye movement in the intersaccadic interval is estimated. The estimate is based on the sign of the mean value of $s(n)$ and is used to filter out nonmatching candidate fixations or smooth pursuit movements that are shorter than $t_{\mathrm{minFix}}$ or $t_{\mathrm{minSmp}}$, respectively. When the dominant event is fixation (that is, when the mean of $s(n)$ is negative), smooth pursuit movements shorter than $t_{\mathrm{minSmp}}$ are converted to fixations. When the dominant event is smooth pursuit (that is, when the mean of $s(n)$ is nonnegative), fixations shorter than $t_{\mathrm{minFix}}$ are converted to smooth pursuit movements.
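The classification step, including the relabeling of short nonmatching segments, can be sketched as follows. Durations are expressed in samples (for example $t_{\mathrm{minFix}} \cdot F_s$), the input `s` is assumed to be non-empty, and the function name and test values are hypothetical.

```python
def classify(s, min_fix, min_smp):
    """Label samples 'F' (fixation) or 'SP' (smooth pursuit) from s(n).

    min_fix, min_smp -- t_minFix and t_minSmp expressed in samples.
    """
    labels = ["SP" if v >= 0 else "F" for v in s]
    dominant = "SP" if sum(s) / len(s) >= 0 else "F"
    minority = "F" if dominant == "SP" else "SP"
    min_len = min_fix if minority == "F" else min_smp
    # Find runs of the minority label and relabel the short ones
    n = 0
    while n < len(labels):
        if labels[n] != minority:
            n += 1
            continue
        start = n
        while n < len(labels) and labels[n] == minority:
            n += 1
        if n - start < min_len:
            labels[start:n] = [dominant] * (n - start)
    return labels
```

For example, with a negative mean of $s(n)$, a single pursuit-like sample inside a fixation is absorbed into the fixation, while a nonmatching run at least as long as the corresponding minimum duration is kept.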

For each track $p$, the velocity $v_p(n)$ and direction $\delta_p(n)$ between two consecutive frames are calculated from $d_{xp}(n)$ and $d_{yp}(n)$, the frame-to-frame displacements in the $x$- and $y$-directions for track $p$ in frame $n$, and from $\Delta t$, the time between two frames in the video.
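A minimal sketch of the track velocity and direction, assuming that $d_{xp}(n)$ and $d_{yp}(n)$ are frame-to-frame displacements so that $v_p(n) = \sqrt{d_{xp}(n)^2 + d_{yp}(n)^2}/\Delta t$ and $\delta_p(n) = \operatorname{atan2}(d_{yp}(n), d_{xp}(n))$; this form, the function name, and the 25 frames/s example are assumptions.

```python
import math


def track_velocity(dx, dy, dt):
    """Speed and direction of one feature track between two consecutive frames.

    dx, dy -- displacement of the track between the frames (e.g. in pixels)
    dt     -- time between the frames in seconds (1 / frame rate)
    """
    speed = math.hypot(dx, dy) / dt
    direction = math.atan2(dy, dx)  # radians, measured from the x-axis
    return speed, direction


# Hypothetical 25 frames/s video: a (3, 4) pixel displacement gives 125 pixels/s
speed, direction = track_velocity(3.0, 4.0, 1 / 25)
```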

The tracks are then clustered in the velocity domain using the $k$-means method, as shown in Figure 7. Since the number of objects in each frame is unknown, $k = 1, 2, \ldots, 6$ clusters are tested, and the optimal number of clusters is selected using the Calinski–Harabasz criterion (Calinski & Harabasz, 1974). The tracks belonging to one cluster form a detected object. Figure 6c shows one frame with the clustered tracks marked according to the clusters in Figure 7.
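Selecting the number of objects with $k$-means and the Calinski–Harabasz criterion can be sketched as follows. The sketch assumes that each track is represented by a two-dimensional velocity vector, for example $(v_p\cos\delta_p,\, v_p\sin\delta_p)$, and uses plain Lloyd iterations with deterministic farthest-point seeding rather than any particular library's initialization. It scores only $k = 2, \ldots, 6$, because the Calinski–Harabasz index divides by $k - 1$ and is undefined for a single cluster; how the single-object case is handled in the original method is not covered by this sketch. All function names are hypothetical.

```python
def _d2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def _mean(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def _assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: _d2(p, centers[c])) for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_d2(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = _assign(points, centers)
        new = [_mean([p for p, l in zip(points, labels) if l == c]) if c in labels
               else centers[c] for c in range(k)]
        if new == centers:
            break  # assignments are stable
        centers = new
    return _assign(points, centers), centers


def calinski_harabasz(points, labels, centers):
    """CH = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n, k = len(points), len(centers)
    grand = _mean(points)
    between = sum(labels.count(c) * _d2(centers[c], grand) for c in range(k))
    within = sum(_d2(p, centers[l]) for p, l in zip(points, labels))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def best_clustering(points, k_max=6):
    """Return (score, k, labels) for the k in 2..k_max with the largest CH."""
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels, centers = kmeans(points, k)
        score = calinski_harabasz(points, labels, centers)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best
```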

Requirement a is satisfied if $v_p(n) > 0.25$°/s. Requirement b is satisfied if $\bar{v}^e > 0.35$°/s, where $\bar{v}^e$ is the mean velocity of the eye-tracking signal, calculated over a window of 100 ms. For Requirement c, the cluster closest to the velocity of the eye-tracking signal in the velocity domain is identified (see Figure 7). The feature points that belong to that cluster are mapped back to the frame, and their positions are compared to the positions of the eye-tracking signal in the frame (Requirement d). A rectangular region of interest is centered on the most recent eye coordinate; its height and width are 30% of the corresponding dimensions of the frame. The gaze coordinates together with the region of interest are shown in Figure 6d. When all four requirements are fulfilled, a video-gaze movement (VGM) is indicated. Intervals indicated as VGMs are likely to contain smooth pursuit movements but may also contain small proportions of other types of eye movements. If a VGM is not indicated, we presume that no smooth pursuit movement was performed.

The percentage of smooth pursuit movements $P_{SP}$ is calculated as

$$P_{SP} = \frac{N_{SP}}{N_{ISI}} \cdot 100,$$

where $N_{SP}$ is the total number of samples detected as smooth pursuit movements and $N_{ISI}$ is the total number of samples in the intersaccadic intervals. The percentage of fixations $P_F$ is calculated as

$$P_F = \frac{N_F}{N_{ISI}} \cdot 100,$$

where $N_F$ is the total number of samples detected as fixations. The percentage $P_A$ of smooth pursuit movements that are in agreement with the video-gaze model is calculated from $N_C$, the total number of samples detected as smooth pursuit movements and indicated as VGMs by the video-gaze model. The percentage $P_{NA}$ of smooth pursuit movements that are not in agreement with the video-gaze model is calculated from $N_{IC}$, the total number of samples detected as smooth pursuit movements but not indicated as VGMs by the video-gaze model.
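The video-gaze model's decision for one frame and the sample-based percentages $P_{SP}$ and $P_F$ can be sketched as follows. The thresholds (0.25°/s, 0.35°/s) and the 30% region of interest come from the text; Requirement c (selecting the cluster closest to the eye velocity) is assumed to be done by the caller, and the tests for Requirements a and d (at least one track faster than 0.25°/s, and at least one feature point of the selected cluster inside the region of interest) are assumptions, as are all function names. $P_A$ and $P_{NA}$ are omitted because their normalization is not given here.

```python
def roi_contains(point, gaze, frame_w, frame_h, frac=0.30):
    """True if `point` lies in a rectangle centered on `gaze` with sides frac*frame size."""
    half_w, half_h = frac * frame_w / 2, frac * frame_h / 2
    return abs(point[0] - gaze[0]) <= half_w and abs(point[1] - gaze[1]) <= half_h


def is_vgm(track_speeds, eye_speed_mean, cluster_points, gaze, frame_w, frame_h):
    """Video-gaze movement for one frame.

    track_speeds   -- v_p(n) of the tracks in the selected cluster, deg/s
    eye_speed_mean -- mean eye velocity over a 100 ms window, deg/s
    cluster_points -- feature-point positions of the selected cluster in the frame
    gaze           -- most recent gaze coordinate in the frame
    """
    req_a = any(v > 0.25 for v in track_speeds)  # assumed: any track is moving
    req_b = eye_speed_mean > 0.35
    req_d = any(roi_contains(p, gaze, frame_w, frame_h) for p in cluster_points)
    return req_a and req_b and req_d


def percentages(labels):
    """P_SP and P_F over all intersaccadic-interval samples labeled 'SP' or 'F'."""
    n_isi = len(labels)
    p_sp = 100.0 * labels.count("SP") / n_isi
    p_f = 100.0 * labels.count("F") / n_isi
    return p_sp, p_f
```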

The balanced performance measure $B$ is calculated as the mean of $P_F$ for image stimuli and $P_A$ for moving-dot stimuli:

$$B = \frac{P_F + P_A}{2}.$$

The balanced performance measure indicates the algorithm's ability to detect few smooth pursuit movements in image stimuli while achieving as high an agreement as possible between smooth pursuit detections and VGMs in moving-dot stimuli. A value close to 100 is desired.

*SD* = 7) years. Written informed consent was provided by the participants, and the experiment was conducted in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). Binocular eye movements were recorded at 500 Hz with the Hi-speed 1250 system and iView X (v. 2.8.26) from SMI. Stimuli were presented with Experiment Center v. 3.5.101 on an Asus VG248QE screen (53.2 × 30.0 cm) with a resolution of 1920 × 1080 pixels and a refresh rate of 144 Hz.

$v_i(n)$ must be larger than 10 pixels/s. Figures 10a and b show the percentages of time that the video-gaze model has marked as VGMs, together with the percentages of smooth pursuit movements detected by the binocular version of the proposed algorithm and by the Larsson et al. (2015) algorithm. Figure 9a shows the percentages for video clips with static stimuli, and Figure 9b shows the percentages for video clips with a moving camera or background. For a majority of the video clips, regardless of whether the camera is moving, the percentage of time indicated as VGMs by the video-gaze model is larger than the percentage of smooth pursuit movements detected by the event-detection algorithms. In general, the proposed binocular algorithm detects higher percentages of smooth pursuit movements than the Larsson et al. (2015) algorithm (see also Tables 7 and 8). For video stimuli, the percentage of smooth pursuit detections that are in agreement with the video-gaze model is around 40% when the camera is static and around 35% when the camera is moving. In videos with a moving camera, the proportion of detected smooth pursuit movements that is not in agreement with the video-gaze model is larger than for other types of stimuli. The monocular versions of the proposed algorithm detect the largest amount of smooth pursuit movements, but at the cost of more detected smooth pursuit movements that are not in agreement with the video-gaze model.

*IEEE Transactions on Biomedical Engineering*, 43 (11), 1073–1082.

*Intel Corporation*, 5, 1–10.

*Communications in Statistics*, 3 (1), 1–27.

*PLoS ONE*, 9 (10), e111197.

*Behavior Research Methods, Instruments, & Computers*, 34 (4), 573–591.

*Pattern classification*. New York: Wiley-Interscience.

*Vision Research*, 34 (10), 1345–1358.

*Vision Research*, 43 (9), 1035–1045.

*Proceedings of SICE annual conference 2010* (pp. 3304–3307). Taipei: IEEE.

*Psychological Medicine*, 27, 1411–1419.

*Experimental Brain Research*, 217 (1), 125–136.

*Eye tracking: A comprehensive guide to methods and measures*. Oxford, UK: Oxford University Press.

*Psychological Bulletin*, 134 (5), 742–763.

*IEEE Transactions on Biomedical Engineering*, 57 (11), 2635–2645.

*Behavior Research Methods*, 45 (1), 203–215.

*Proceedings of the fifth workshop on beyond time and errors: novel evaluation methods for visualization* (pp. 54–60). Paris: ACM.

*Biomedical Signal Processing and Control*, 18, 145–152.

*IEEE Transactions on Biomedical Engineering*, 60 (9), 2484–2493.

*The neurology of eye movements*. Oxford, UK: Oxford University Press.

*Journal of the Optical Society of America A*, 2 (2), 234–242.

*Vision Research*, 25 (4), 561–563.

*The Quarterly Journal of Experimental Psychology*, 62 (8), 1457–1506.

*Scientific Studies of Reading*, 10 (3), 241–255.

*Proceedings of the IEEE conference on computer vision and pattern recognition* (pp. 593–600). IEEE.

*Vision Research*, 16 (12), 1371–1376.

*Behavior Research Methods*, 43 (1), 239–257.

*Brain*, 119 (6), 1933–1950.