Open Access
Methods  |   December 2016
Smooth pursuit detection in binocular eye-tracking data with automatic video-based performance evaluation
Journal of Vision December 2016, Vol.16, 20. doi:10.1167/16.15.20

      Linnéa Larsson, Marcus Nyström, Håkan Ardö, Kalle Åström, Martin Stridh; Smooth pursuit detection in binocular eye-tracking data with automatic video-based performance evaluation. Journal of Vision 2016;16(15):20. doi: 10.1167/16.15.20.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

An increasing number of researchers record binocular eye-tracking signals from participants viewing moving stimuli, but the majority of event-detection algorithms are monocular and do not consider smooth pursuit movements. The purposes of the present study are to develop an algorithm that discriminates between fixations and smooth pursuit movements in binocular eye-tracking signals and to evaluate its performance using an automated video-based strategy. The proposed algorithm uses a clustering approach that takes both spatial and temporal aspects of the binocular eye-tracking signal into account, and is evaluated using a novel video-based evaluation strategy based on automatically detected moving objects in the video stimuli. The binocular algorithm detects 98% of fixations in image stimuli compared to 95% when only one eye is used, while for video stimuli, both the binocular and monocular algorithms detect around 40% of smooth pursuit movements. The present article shows that using binocular information for discrimination of fixations and smooth pursuit movements is advantageous in static stimuli, without impairing the algorithm's ability to detect smooth pursuit movements in video and moving-dot stimuli. With an automated evaluation strategy, time-consuming manual annotations are avoided and a larger amount of data can be used in the evaluation process.

Introduction
Eye tracking is an important research tool which measures the movements of the eyes. It is an established technique to investigate the comprehension and understanding of, for example, a text (Rayner, Chace, Slattery, & Ashby, 2009) or an image (Rayner, 2009). Research using eye tracking also includes clinical applications—for example, examination of eye-movement dysfunctions in individuals with schizophrenia (Flechtner, Steinacher, Sauer, & Mackert, 1997) or dyslexia (Eden, Stein, Wood, & Wood, 1994) and in the human vestibular system (Allison, Eizenman, & Cheung, 1996). The two most common types of events in eye-tracking data are fixations and saccades. A fixation is when the eye is more or less still and visual information is taken in. A saccade is instead a fast eye movement that redirects the eye from one position to the next. When the eye follows a smoothly moving target, the eye movement is called a smooth pursuit. In order to see a moving object clearly during smooth pursuit, the object must be aligned with the direction of gaze. When the object is not perfectly followed by the eye, small corrective saccades are used to realign the direction of the gaze to that of the moving object. A smooth pursuit is divided into two stages: open-loop and closed-loop (Leigh & Zee, 2006, p. 219). The open-loop stage is the initial stage when the smooth pursuit is initiated by a movement of an object. The second, closed-loop stage is a feedback system where the velocity of the eye is controlled in order to keep the eye on the moving object. The upper limit for the velocity of a smooth pursuit movement in a natural task is above 100°/s (Hayhoe, McKinney, Chajka, & Pelz, 2012). No lower limit for smooth pursuit velocity seems to exist, and the pursuit system can operate in the same velocity range as fixational eye movements (Martins, Kowler, & Palmer, 1985). 
A majority of eye-tracking studies are performed using monocular recordings (Holmqvist et al., 2011, p. 59)—that is, only one eye is recorded. The popularity of monocular recordings is partly due to the common belief that the two eyes are performing the same movements at the same time, which is not always the case (Holmqvist et al., 2011, p. 59; Kirkby, Webster, Blythe, & Liversedge, 2008; van der Lans, Wedel, & Pieters, 2011). In addition, monocular eye trackers are less expensive than binocular ones, which contributes to their popularity (Holmqvist et al., 2011, p. 59). Studies where binocular aspects are important are most often clinical, where binocular coordination and control are investigated, for example, in children with dyslexia (Eden et al., 1994) and in individuals with cerebellar dysfunction (Versino, Hurko, & Zee, 1996). 
Since a majority of studies are based on monocular recordings, event-detection algorithms, which classify the eye-tracking signal into different types of eye movements, are typically also developed for monocular data. An exception is the Binocular-Individual Threshold algorithm, which uses both eyes and is developed to adapt its internal settings to each specific task and participant (van der Lans et al., 2011). This is a velocity-based algorithm that uses minimum-determinant covariance estimates and control-chart procedures in order to detect fixations, saccades, and blinks. 
Other algorithms have indirectly used binocular information, for example by averaging the data from the two eyes in order to reduce the level of noise (Duchowski et al., 2002). Another strategy was used by Engbert and Kliegl (2003), where the detection algorithm separately analyzed the data from the two eyes and at a later stage combined the two series of events into one. The algorithm was proposed for the detection of microsaccades which occurred simultaneously in both eyes. None of the algorithms that use binocular information (Duchowski et al., 2002; Engbert & Kliegl, 2003; van der Lans et al., 2011) detect smooth pursuit movements. 
Accurately differentiating between smooth pursuit movements and fixations in eye-tracking data recorded during dynamic scene viewing is still a major challenge. Inclusion of binocular information may improve classification robustness and make it easier to distinguish between smooth pursuit movements and vergence movements, where the eyes move in opposite directions. Therefore, in this article, we address these issues by proposing a new event-detection algorithm to discriminate between fixations and smooth pursuit movements, which takes both spatial and temporal aspects of binocular eye-tracking signals into account. The algorithm is developed and evaluated for signals recorded with a high-speed eye tracker and restricted head movements. 
Another novelty of this article is a video-based performance-evaluation strategy that is based on automatic detection of moving objects in the video stimuli. Evaluation of event-detection algorithms has previously mainly been performed through manual annotations (Larsson, Nyström, Andersson, & Stridh, 2015), by using simulated eye-tracking data (Otero-Millan, Castro, Macknik, & Martinez-Conde, 2014), and by recording eye-tracking data for artificial stimuli such as moving dots (Komogortsev, Gobert, Jayarathna, Hyong Koh, & Gowda, 2010; Komogortsev & Karpov, 2013). None of these methods is completely satisfactory or practical for dynamic scene viewing. Manual annotations are time consuming, and suffer from subjectivity and often large interrater variability. Building simulation models that mimic the complexity and individuality of eye-tracking signals is difficult for all types of eye movements. When artificial stimuli are used, it is not only the algorithm's performance that is evaluated but also the viewer's ability to follow the presented stimuli. An evaluation method based on artificial stimuli has been proposed by Komogortsev and colleagues (Komogortsev et al., 2010; Komogortsev & Karpov, 2013). The speed and position of the moving dots are compared to the corresponding characteristics of the eye-tracking signals, and a set of scores are calculated. The method is limited, however, to stimuli where the coordinates of the moving dots are known. To evaluate the performance of a smooth pursuit detection algorithm when eye-tracking data are recorded for complex videos, automatic tracking of the trajectories of the moving objects is needed. Knowledge about the trajectories of all moving objects opens up new possibilities for evaluating the performance of event detectors. 
The proposed video-based evaluation strategy relates the eye-tracking signal to the trajectories of the moving objects and compares them to the smooth pursuit movements detected by the proposed algorithm. The main assumption of the video-based evaluation strategy is that a detected smooth pursuit movement is correct only when the eye-tracking signal is aligned with a moving object in terms of position, velocity, and direction. 
Compared to manual annotation, the video-based evaluation strategy does not provide as detailed information about the detected events, but it has three important advantages: It is more objective, it is faster, and it relates the eye-tracking signal to the content of the video. This is practical for future studies where longer sequences of dynamic stimuli may be used in combination with a larger number of participants. It is important to note, however, that in the current work, the video data are used only for evaluation purposes, not as part of the event detector as is proposed by Essig et al. (2010). 
The article is structured as follows: The proposed algorithm and the video-based evaluation strategy are described in Section 2, and in Section 3, a description of the eye-tracking recording procedure and the database is given. In Section 4, the results are presented, and finally, in Section 5, the results are discussed. 
Methods
The following section is divided into two parts, respectively describing the proposed binocular event-detection algorithm and the video-based evaluation strategy. 
Binocular event-detection algorithm
The proposed algorithm contains four stages; an overview is shown in Figure 1. The algorithm is applied to intersaccadic intervals derived, for example, from Larsson, Nyström, and Stridh (2013), which are intervals between detected saccades or postsaccadic oscillations and blinks. In the first stage, the intersaccadic intervals are preprocessed; in the second stage, the intersaccadic intervals from the two eyes are compared and intervals that occur in both eyes simultaneously are selected. The third stage contains quality assessment of the selected intersaccadic intervals, and in the final stage, the samples in the intervals are classified into fixations and smooth pursuit movements based on the directionality of the data from the two eyes. 
Figure 1
 
Overview of the structure for the proposed binocular event-detection algorithm.
Preprocessing
The main objective of the preprocessing stage is to remove samples that do not belong to fixations or smooth pursuit movements. Since smooth pursuit movements in this study are assumed not to exceed 100°/s (Meyer, Lasker, & Robinson, 1985), all samples in the beginning and end of the intersaccadic interval with corresponding velocities above 100°/s are removed and assumed to belong to adjacent saccades or postsaccadic oscillations. The velocity v(n) is calculated as

v(n) = Fs √{[x(n) − x(n − 1)]² + [y(n) − y(n − 1)]²},

where x(n) and y(n) are the position signals in the x and y direction, respectively, and Fs is the sampling frequency of the signals.
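As an illustration, the preprocessing step might be sketched in Python as follows; the 100°/s cutoff comes from the text, while the function names and the edge-trimming loop structure are assumptions:

```python
import math

def sample_velocity(x, y, fs):
    """Sample-to-sample velocity v(n) = Fs * sqrt(dx^2 + dy^2), in deg/s
    when x and y are gaze positions in degrees and fs is in Hz."""
    return [fs * math.hypot(x[n] - x[n - 1], y[n] - y[n - 1])
            for n in range(1, len(x))]

def trim_interval(x, y, fs, v_max=100.0):
    """Remove samples at the start and end of an intersaccadic interval
    whose velocities exceed v_max; those samples are assumed to belong
    to adjacent saccades or postsaccadic oscillations."""
    v = sample_velocity(x, y, fs)
    start, end = 0, len(x)
    while start < len(v) and v[start] > v_max:
        start += 1
    while end - 2 >= start and v[end - 2] > v_max:
        end -= 1
    return x[start:end], y[start:end]
```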
Selection of intersaccadic intervals
In this stage, the intersaccadic intervals that occur in both eyes simultaneously are determined—that is, they are aligned in time to contain the same number of samples. If a saccade is detected in both eyes, the on- and offset of the saccade with the longest duration determine the end of the previous intersaccadic interval and the beginning of the next. Saccades that are not detected in both eyes are classified as true monocular saccades or noise. The main difference between a true saccade and noise is seen in the velocity signal, where noise is several orders of magnitude larger and most often contains spikes. In order to determine whether a detected saccade contains spikes, the sample-to-sample velocity ve(n) for each eye separately, where e ∈ {L, R} for the left and right eye, is compared to ṽe(n), which is ve(n) filtered with a median filter of length 3. The length of the median filter is chosen as 3 in order to filter out one-sample spikes that are confused with saccades. The residual signal re(n) = ve(n) − ṽe(n) contains the spike. The spike index SIe, the ratio between the residual signal and the original velocity, is calculated for each eye as

SIe = (Σn=1..M |re(n)|) / (Σn=1..M ve(n)),

where M is the number of samples in the saccade (see Figure 2 for an example). If SIe > ηSI for both eyes, the detected saccade is classified as noise and the two intersaccadic intervals that were split by the putative saccade are merged into one. A saccade is considered to be correctly classified if SIe ≤ ηSI for both eyes. Moreover, if SIe ≤ ηSI for only one of the eyes, the amplitude of that saccade is determined, and if the amplitude is larger than ηSA the saccade is considered to be correctly classified.
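A minimal sketch of the spike-index computation in Python; the sum-over-sum form of the ratio and the replicated-edge handling of the median filter are assumptions:

```python
import statistics

def spike_index(v):
    """Spike index SI for one eye: the residual between the velocity v
    and its median-filtered version (filter length 3), relative to the
    original velocity, summed over the saccade's samples."""
    # Length-3 median filter with edge replication (an assumption)
    med = [statistics.median([v[max(n - 1, 0)], v[n],
                              v[min(n + 1, len(v) - 1)]])
           for n in range(len(v))]
    residual = [abs(v[n] - med[n]) for n in range(len(v))]
    return sum(residual) / sum(v)
```

A one-sample spike produces a large residual and hence a high index, while a smooth saccadic velocity profile is largely preserved by the median filter and produces a low index.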
Figure 2
 
(a) Position signals of a saccade and (b) its corresponding sample-to-sample velocity (gray) and median filtered (black). A true saccade results in a low spike index. (c) Position signals of a spike incorrectly detected as a saccade and (d) its corresponding sample-to-sample velocity (gray) and median filtered (black). A spike results in a high spike index.
Quality assessment
The quality of the eye-tracking signal may vary, especially in terms of undesired high-frequency noise. Through calculation of the high-frequency content of the recorded signal, an estimate of the amount of high-frequency noise is obtained. For each eye separately, a nonoverlapping sliding window of 50 ms is applied to the intersaccadic intervals. Within each window, a differential filter of length 2 is applied. A high-frequency noise-content index Ihf(i), representing the energy of the differential signal for each window i, is calculated as

Iehf(i) = Σn∈window i {[xe(n) − xe(n − 1)]² + [ye(n) − ye(n − 1)]²},

where xe(n) and ye(n) represent the respective coordinates of the eye-tracking signals for each eye e. The number of samples in the sliding window is denoted N, and i = 1, …, M, where M is the number of windows in the intersaccadic interval. For each intersaccadic interval and for each eye separately, the maximum values of the high-frequency noise-content indices are calculated:

ÎLhf = maxi ILhf(i),  ÎRhf = maxi IRhf(i).
The maximum high-frequency contents ÎLhf and ÎRhf are mapped onto the generalized logistic function S,

S(a) = A + (K − A) / [1 + Q e^(−B(a − P))]^(1/ν),

where A = 0, K = 1, Q = 0.001, P = 3000, B = 0.001, and ν = Q. The parameters of the function S(a) are determined by visual inspection of the complete database to best separate high-frequency noise from data with good quality. Three examples of signals with their corresponding spectra are shown in Figure 3. The three signals have low, medium, and high levels of noise, and the parameters of the generalized logistic function were chosen to pass only the signal with the lowest noise level (see Figure 4). The range of the generalized logistic function S(a) is from 0 to 1, where 0 indicates a low level of high-frequency noise content and 1 indicates a high level. Therefore, all intersaccadic intervals where S(ÎLhf) < ηS or S(ÎRhf) < ηS are considered to have high enough quality to be further classified into events.
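The quality check can be sketched in Python as follows. The 25-sample window assumes the 500-Hz sampling rate used in this study, and the exact functional form of the generalized logistic is a reconstruction from the stated parameter values:

```python
import math

def hf_noise_index(x, y, win=25):
    """Maximum high-frequency noise content over nonoverlapping windows
    (win = 25 samples corresponds to 50 ms at 500 Hz); within each
    window the energy of a length-2 differential filter is summed."""
    best = 0.0
    for start in range(0, max(len(x) - 1, 1), win):
        stop = min(start + win, len(x) - 1)
        energy = sum((x[n + 1] - x[n]) ** 2 + (y[n + 1] - y[n]) ** 2
                     for n in range(start, stop))
        best = max(best, energy)
    return best

def logistic_s(a, A=0.0, K=1.0, Q=0.001, P=3000.0, B=0.001, nu=0.001):
    """Generalized logistic mapping S(a) with the parameter values from
    the text; the exact functional form is an assumption."""
    return A + (K - A) / (1.0 + Q * math.exp(-B * (a - P))) ** (1.0 / nu)
```

With these parameters, S(a) rises from near 0 for low-noise indices to near 1 for high-noise indices, so intervals with S below the threshold pass the quality check.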
Figure 3
 
Three examples of signals with different noise levels—(a) low, (c) medium, and (e) high—and their respective spectra (b), (d), and (f).
Figure 4
 
The generalized logistic function S(a) used in this article. The high-frequency noise indices of the three signal examples shown in Figure 3 are marked with arrows.
Classifier
For each intersaccadic interval that occurs in both eyes simultaneously and for which the signal has S(a) < ηS, a classifier is applied. The classifier consists of the following steps: directional clustering, binary filters, and classification. In directional clustering, each consecutive pair of samples is clustered based on their direction. The clustering is later used in the algorithm by binary filters, which are filters that enhance fixations or smooth pursuit movements. In the classification step, the output signals from the binary filters are summed, and the final classification into fixations and smooth pursuit movements is based on the summation signal. 
Directional clustering:
For each consecutive pair of x- and y-coordinates, the sample-to-sample direction α(n) is calculated. It is defined as the angle between the line connecting consecutive pairs of x- and y-coordinates and the x-axis. The sample-to-sample directions are mapped onto the unit circle and clustered using the iterative minimum-squared-error clustering algorithm (Duda, Hart, & Stork, 2001); see Figure 5 for examples of directional clustering for a fixation and a smooth pursuit movement. The procedure of the clustering algorithm is described in the following. First, the threshold for the maximum angular span of a cluster is initialized to γmax, which is the maximum size of the sector for one cluster. The value of γmax was chosen to ignore small directional changes but capture significant ones in the signal. Each cluster i is described by its angular span γi and its mean direction mi. In the initial iteration, all α(n) are placed into cluster i = 1, and the mean m1 and angular span γ1 are calculated. 
Figure 5
 
(a) A fixation and (b) the corresponding directional clustering. (c) A smooth pursuit movement and (d) the corresponding directional clustering. Note the spread in all directions during a fixation (b) and a clear direction during a smooth pursuit movement (d).
Assuming that the number of clusters is L, each following iteration starts by determining which cluster j has the maximum angular span. If γj > γmax, cluster j is split into two clusters j and L + 1. The mean direction mL+1 is initialized to the sample of α(n) belonging to cluster j which has the largest angle βj(n) to mj. The mean direction mj is recalculated as the mean value of the remaining directions that belong to cluster j. The mean values of the clusters are saved and the affiliations of the samples are removed. 
By randomly selecting one α(k) at a time and measuring the angles βi(k) between α(k) and the mean direction of each cluster, α(k) is assigned to the cluster with the closest mean direction, where k ranges from 1 to the number of samples in the intersaccadic interval. Each time an α(k) is assigned to a cluster, the mean direction and angular span of that cluster are updated. When all α(n) have been reassigned to a cluster, the maximum angular span γj is again calculated and compared to γmax. The procedure continues until all clusters have an angular span that is smaller than γmax. When the clustering process has converged, all samples are assigned to a cluster. Pseudocode of the algorithm is shown in Algorithm 1.
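The clustering procedure can be sketched in Python as follows. This is a simplified sketch: cluster means are held fixed during each reassignment pass rather than updated per sample as described above, γmax is set to an illustrative 45°, and termination is assumed for well-behaved input:

```python
import math
import random

def circ_dist(a, b):
    """Smallest absolute angular difference between two directions."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def circ_mean(angles):
    """Mean direction of a set of angles on the unit circle."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))

def cluster_directions(alpha, gamma_max=math.radians(45), seed=0):
    """Iteratively split clusters until every cluster's angular span
    (maximum distance of a member from its mean direction) is below
    gamma_max; returns one cluster label per sample."""
    rng = random.Random(seed)
    means = [circ_mean(alpha)]
    labels = [0] * len(alpha)
    while True:
        spans = [max((circ_dist(a, means[i])
                      for a, l in zip(alpha, labels) if l == i), default=0.0)
                 for i in range(len(means))]
        j = max(range(len(means)), key=lambda i: spans[i])
        if spans[j] <= gamma_max:
            return labels
        # Split cluster j: seed the new mean at its most distant member
        far = max((a for a, l in zip(alpha, labels) if l == j),
                  key=lambda a: circ_dist(a, means[j]))
        means.append(far)
        means[j] = circ_mean([a for a, l in zip(alpha, labels)
                              if l == j and a != far])
        # Reassign every sample, in random order, to the nearest mean
        order = list(range(len(alpha)))
        rng.shuffle(order)
        for k in order:
            labels[k] = min(range(len(means)),
                            key=lambda i: circ_dist(alpha[k], means[i]))
```

Directions spread over the circle (a fixation) end up in many clusters, while a consistent heading (a pursuit) stays in one.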
Binary filters:
In the next step, the clustered signal is applied to a set of binary filters. In this article, a binary filter refers to a filter that has a length and a criterion. If the samples in the filter satisfy the criterion, the output for the central sample of the filter is 1 or −1, depending on the purpose of the filter. If the samples do not satisfy the criterion, the output is 0. The purpose of having different types of binary filters is that each filter emphasizes either typical properties of fixations or typical properties of smooth pursuit movements. The contributions from all filters are then summed together and used for discrimination between fixations and smooth pursuit movements. In this article four types of filters are used: Transition, Directional consistency, Total distance, and Synchronization. A Transition filter counts the number of transitions between the clusters within the filter length—for example, if the clustered signal is 2111, there is one transition in the signal, between clusters 2 and 1. A transition occurs when two consecutive samples belong to clusters whose mean directions mi differ by an angle larger than αT. Transitions between clusters are more frequent in samples belonging to fixations than in samples belonging to smooth pursuit movements. Therefore, a high transition rate more likely represents a fixation.
The Directional consistency filter counts the number of samples that are in the same cluster or in a neighboring cluster maximally αT away. A large number of samples in the same cluster represents samples that are heading in the same direction, which is a typical feature of a smooth pursuit movement. 
The Total distance filter determines the distance dS that the samples in the filter have moved in total. In order to determine the distance that the samples x(n) and y(n) in each cluster i have moved, the distance di is calculated as

di = Σn∈cluster i √{[x(n) − x(n − 1)]² + [y(n) − y(n − 1)]²},

where the sum is taken over the M samples that are covered by the filter. Each cluster i has mean direction mi, onto which the corresponding di is mapped in order to calculate the total distance. The distance dS represents the actual movement of the samples in the filter:

dS = √{[Σi=1..N di cos(mi)]² + [Σi=1..N di sin(mi)]²},

where N is the number of clusters. The total distance dS is compared to the criterion of the filter. A small distance is representative of a fixation, and a longer distance is representative of a smooth pursuit movement.
Finally, the Synchronization filter measures the synchronization between the eye-tracking signals from the two eyes, and is therefore active only if signals from both eyes are present. The filter counts the number of samples where the sample-to-sample directions α(n) from the two eyes are in the same or a neighboring cluster, maximally αT away, at the same time. 
The filters and their lengths and criteria are shown in Tables 1 and 2. The number of filters that emphasize fixations is balanced against the number of filters that emphasize smooth pursuit movements. The criteria and lengths of the filters are set individually for each filter to optimize the separation between the two movements. The sensitivity of the parameters is low, and this is the only combination of parameters that has been evaluated. When the criterion of a filter is fulfilled, the central sample receives the output −1 for filters emphasizing fixations (Table 1) and 1 for filters emphasizing smooth pursuit movements (Table 2), resulting in K binary responses rl(n), l = 1, 2, …, K, for each intersaccadic interval. If only the signal from one eye has passed the quality assessment, K = (7 + 9) = 16—that is, filters F1–F7 and S1–S9 are used. When the signals from both eyes have passed the quality assessment, K = 2(7 + 9) + 2 = 34—that is, filters F1–F7 and S1–S9 are used for both eyes separately and filters F8 and F9 for the combination of the two eyes. Finally, the responses are added together into one summation signal s(n):

s(n) = Σl=1..K rl(n).
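As an illustration, a Transition-type binary filter and the summation signal s(n) might be sketched as follows in Python; the filter length, transition count, and αT value are illustrative, not the settings from Tables 1 and 2:

```python
import math

def circ_diff(a, b):
    """Smallest absolute difference between two directions (radians)."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def transition_response(labels, means, length=5, min_trans=3, alpha_t=0.6):
    """Transition filter: the central sample of each window receives -1
    (fixation-like) when the window contains at least min_trans
    transitions between clusters whose mean directions differ by more
    than alpha_t; 0 otherwise. All parameter values are illustrative."""
    half = length // 2
    out = [0] * len(labels)
    for c in range(half, len(labels) - half):
        win = labels[c - half:c + half + 1]
        trans = sum(1 for a, b in zip(win, win[1:])
                    if a != b and circ_diff(means[a], means[b]) > alpha_t)
        if trans >= min_trans:
            out[c] = -1
    return out

def summation_signal(responses):
    """s(n): the sum over the K binary filter responses at sample n."""
    return [sum(r[n] for r in responses) for n in range(len(responses[0]))]
```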
Table 1
 
Settings for binary filters F1–F9, which emphasize fixations. Notes: The filter length for the Total distance filter is described as the percentage of the current intersaccadic interval.
Table 2
 
Settings for binary filters S1–S9, which emphasize smooth pursuit movements. Notes: The filter length for the Total distance filter is described as the percentage of the current intersaccadic interval.
Classification:
Based on the summation signal s(n), the samples are classified into fixations and smooth pursuit movements. In general, when s(n) ≥ 0, sample n is classified as a smooth pursuit movement, and when s(n) < 0, sample n is classified as a fixation. In order to prevent the samples in the intersaccadic interval from being divided into many small segments of smooth pursuit movements and fixations, the dominant type of eye movement in the intersaccadic interval is estimated. The estimation is based on the sign of the mean value of s(n) and is used to filter out nonmatching candidate fixations or smooth pursuit movements that are shorter than tminFix or tminSmp, respectively. When the dominant event is fixation—that is, the mean of s(n) < 0—smooth pursuit movements shorter than tminSmp are converted to fixations. When the dominant event is smooth pursuit—that is, the mean of s(n) ≥ 0—fixations shorter than tminFix are converted to smooth pursuit movements.
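The classification step can be sketched in Python as follows; the minimum durations are in samples and the default values are illustrative, not the thresholds used in the article:

```python
def classify_interval(s, t_min_fix=40, t_min_smp=40):
    """Sample-wise classification from the summation signal s(n):
    s(n) >= 0 -> smooth pursuit ('P'); s(n) < 0 -> fixation ('F').
    Runs of the non-dominant event (dominance given by the sign of the
    mean of s) shorter than the minimum duration are converted to the
    dominant type."""
    events = ['P' if value >= 0 else 'F' for value in s]
    dominant = 'P' if sum(s) >= 0 else 'F'
    minority = 'F' if dominant == 'P' else 'P'
    t_min = t_min_fix if minority == 'F' else t_min_smp
    i = 0
    while i < len(events):
        j = i
        while j < len(events) and events[j] == events[i]:
            j += 1
        if events[i] == minority and j - i < t_min:
            events[i:j] = [dominant] * (j - i)
        i = j
    return events
```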
Video-based performance-evaluation strategy
The performance of the proposed algorithm is evaluated by a video-based evaluation strategy, which comprises three parts. First, the positions of the objects in the stimuli are detected. In the second part, a model is proposed in which the coordinates of the eye-tracking signal are related to the movements of the detected objects. Finally, performance measures are calculated by comparing the intervals where the eye-tracking signal moves close to and in alignment with a moving object to the intervals where the proposed algorithm detects smooth pursuit movements. Thus, smooth pursuit movements detected by the proposed algorithm while the eye-tracking data move in alignment with moving objects are marked as correctly detected smooth pursuit movements. These three parts are described in detail in the following.
Automatic detection of objects in video
In this step, the positions of the moving objects in the video stimuli are detected and tracked for each frame. In order to determine the trajectories of moving objects, and possibly also of the background of the video, feature points are extracted (Shi & Tomasi, 1994); see Figure 6a for an example of a frame and Figure 6b for extracted feature points. The extracted feature points are then connected between frames into tracks (Bouguet, 2001). The implementation from OpenCV (http://opencv.org/) was used. New feature points were extracted every fifth frame. They were required to be at least 7 pixels apart and more than 5 pixels from existing tracks. In addition, the total number of detected points was never allowed to exceed 5,000, and points with a quality measure less than 0.01 times the highest quality measure observed for the frame were discarded. The detected points were then tracked using a window size of 15 × 15 and a scale-space pyramid with three levels. The velocity v and direction δ of a track between two consecutive frames are calculated as

vp(n) = √[dxp(n)² + dyp(n)²] / Δt,  δp(n) = arctan[dyp(n)/dxp(n)],

where dxp(n) and dyp(n) are the sample-to-sample displacements in the x- and y-directions for track p in frame n, and Δt is the time between two frames in the video.
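The per-track motion computation can be sketched in Python as follows (the track format, a list of per-frame positions, is an assumption; the feature extraction and tracking themselves are done with OpenCV as described above):

```python
import math

def track_motion(track, dt):
    """Per-frame speed v and direction delta for one feature-point
    track, given positions [(x, y), ...] and the inter-frame time dt."""
    v, delta = [], []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        v.append(math.hypot(dx, dy) / dt)
        delta.append(math.atan2(dy, dx))
    return v, delta
```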
Figure 6
 
An example of the clustering of the detected feature points in a frame. The tracks are clustered with respect to their speeds and directions (see Figure 7).
Figure 7
 
Overview of the clustering for the sample-to-sample velocities of the tracks into six clusters. The red circle marks the velocity of the eye.
Tracks that move in similar directions and at similar speeds are grouped together into clusters using the k-means method, as shown in Figure 7. Since the number of objects in each frame is unknown, the number of clusters k = 1, 2, …, 6 is tested using the Calinski–Harabasz criterion in order to find the optimal number of clusters (see Calinski & Harabasz, 1974). The tracks belonging to one cluster form a detected object. Figure 6c shows one frame with the clustered tracks marked according to the clusters in Figure 7.
Video-gaze model
In this section a video-gaze model is introduced to indicate whether the positions of the eye-tracking signals are moving close to and in alignment with a detected object in the video. The model consists of four requirements that need to be satisfied for both eyes: 
  • (a) The detected object is classified as moving.
  • (b) The eye-tracking signal has moved.
  • (c) The velocity and direction of the eye-tracking signal match the velocity and direction of a detected object.
  • (d) The position of the eye-tracking signal is close to the area covered by the detected object.
The detected objects in the video are classified as moving if vp(n) > 0.25°/s (Requirement a). Requirement b is satisfied if ve > 0.35°/s, where ve is the mean velocity of the eye-tracking signal calculated for a window with length 100 ms. For Requirement c, the cluster that is closest to the velocity of the eye-tracking signal in the velocity domain is identified (see Figure 7). The feature points that belong to that cluster are mapped back to the frame, and the positions of the feature points are compared to the positions of the eye-tracking signal in the frame (Requirement d). A rectangular region of interest is centered around the most recent eye coordinate. The height and width of the region of interest are 30% of the dimensions of the frame. The gaze coordinates together with the region of interest are shown in Figure 6d. When the four requirements are fulfilled, a video-gaze movement (VGM) is indicated. Intervals indicated as a VGM are likely to contain smooth pursuit movements but may also contain small proportions of other types of eye movements. If a VGM is not indicated, we presume that a smooth pursuit movement was not performed. 
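The four requirements can be sketched in Python as a single predicate. The 0.25°/s and 0.35°/s thresholds and the 30% region of interest come from the text; dir_tol and speed_ratio are illustrative stand-ins for the nearest-cluster matching in the velocity domain:

```python
import math

def vgm_indicated(obj_speed, obj_dir, obj_points, eye_speed, eye_dir,
                  eye_xy, frame_size, dir_tol=0.35, speed_ratio=2.0):
    """True when all four requirements of the video-gaze model hold,
    i.e., a video-gaze movement (VGM) is indicated."""
    width, height = frame_size
    # (a) the detected object is classified as moving
    if obj_speed <= 0.25:
        return False
    # (b) the eye-tracking signal has moved (mean velocity, 100-ms window)
    if eye_speed <= 0.35:
        return False
    # (c) velocity and direction of eye and object match
    direction_gap = abs((eye_dir - obj_dir + math.pi) % (2 * math.pi)
                        - math.pi)
    if direction_gap > dir_tol:
        return False
    if not (1.0 / speed_ratio <= eye_speed / obj_speed <= speed_ratio):
        return False
    # (d) a feature point of the object lies inside a region of interest
    # of 0.3 * width by 0.3 * height centered on the latest eye sample
    ex, ey = eye_xy
    return any(abs(px - ex) <= 0.15 * width and abs(py - ey) <= 0.15 * height
               for px, py in obj_points)
```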
Performance-evaluation measures
In order to evaluate the performance of the proposed algorithm, the following parameters are calculated: percentage of smooth pursuit movements, percentage of fixations, percentage of smooth pursuit movements in agreement with the video-gaze model, percentage of smooth pursuit movements not in agreement with the video-gaze model, and a balanced performance measure. The parameters are calculated for the intersaccadic intervals during which the proposed binocular algorithm uses both eyes. The percentage of smooth pursuit movements PSP is calculated as

PSP = (NSP / NISI) × 100,

where NSP is the total number of samples detected as smooth pursuit movements and NISI is the total number of samples of the intersaccadic intervals. The percentage of fixations PF is calculated as

PF = (NF / NISI) × 100,

where NF is the total number of samples detected as fixations. The percentage PA of smooth pursuit movements that are in agreement with the video-gaze model is calculated as

PA = (NC / NSP) × 100,

where NC is the total number of samples detected as smooth pursuit movements and indicated as VGMs by the video-gaze model. The percentage PNA of smooth pursuit movements that are not in agreement with the video-gaze model is calculated as

PNA = (NIC / NSP) × 100,

where NIC is the total number of samples detected as smooth pursuit movements and not indicated as VGMs by the video-gaze model.
The balanced performance measure B is calculated as the mean value between PF for image stimuli and PA for moving-dot stimuli. The balanced performance measure indicates the algorithm's ability to detect few smooth pursuit movements in image stimuli and at the same time achieve as high an agreement as possible between smooth pursuit detections and VGMs during moving-dot stimuli. A value close to 100 is desired. 
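The performance measures can be computed from per-sample labels as in the following sketch. The label encoding ('SP', 'FIX') and the boolean VGM flags are assumptions for illustration only, not the paper's data format.

```python
import numpy as np

def evaluation_measures(labels, vgm):
    """Percentage-based measures over the intersaccadic intervals.

    labels : per-sample classification, 'SP' for smooth pursuit and
             'FIX' for fixation (hypothetical encoding)
    vgm    : per-sample flags, True where the video-gaze model
             indicates a video-gaze movement
    """
    labels = np.asarray(labels)
    vgm = np.asarray(vgm, dtype=bool)
    n_isi = labels.size
    sp = labels == 'SP'
    n_sp = sp.sum()
    p_sp = 100.0 * n_sp / n_isi
    p_f = 100.0 * (labels == 'FIX').sum() / n_isi
    # The agreement measures are fractions of the detected SP samples.
    p_a = 100.0 * (sp & vgm).sum() / n_sp if n_sp else 0.0
    p_na = 100.0 * (sp & ~vgm).sum() / n_sp if n_sp else 0.0
    return p_sp, p_f, p_a, p_na
```

Note that P_A and P_NA sum to 100 whenever any smooth pursuit samples are detected, since every detected sample is either in agreement with the video-gaze model or not.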
Experiment and database
The eye-tracking signals used for the evaluation of the proposed algorithm were recorded during two separate experiments. In both experiments a Hi-speed eye tracker from SMI (SensoMotoric Instruments, Teltow, Germany) with a sampling frequency of 500 Hz was used. A detailed description of the first experiment is given by Larsson et al. (2013). In this work, the eye-tracking signals recorded with image and moving-dot stimuli by Larsson et al. (2013) were used. A subset of the signals was manually annotated. 
The second experiment had 21 participants (four women, 17 men), with a mean age of 32.9 (SD = 7) years. Written informed consent was provided by the participants, and the experiment was conducted in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). Binocular eye movements were recorded at 500 Hz with the Hi-speed 1250 system and iView X (v. 2.8.26) from SMI. Stimuli were presented with Experiment Center v. 3.5.101 on an Asus VG248QE screen (53.2 × 30.0 cm) with a resolution of 1920 × 1080 pixels and a refresh rate of 144 Hz. 
The participants were seated and asked to place their head in the eye tracker. The head was supported by a chin and forehead rest. The viewing distance to the screen was 70 cm. A 13-point calibration was performed followed by a four-point validation of the calibration. The average accuracy reported by Experiment Center was 0.25° and 0.38°, respectively, for the horizontal and vertical directions. 
In this experiment, short video clips were used as stimuli. The stimuli comprised both material from the benchmark data described by Kurzhals, Bopp, Bässler, Ebinger, and Weiskopf (2014) and video clips downloaded from http://pi4.informatik.uni-mannheim.de/~kiess/test_sequences/download/. The video clips contained both static and moving cameras and backgrounds. All video clips contained objects that moved most of the time (see Table 3). The objects had both linear and nonlinear trajectories. The participants were instructed to follow the moving objects as closely as possible. 
Table 3
 
Proportion of moving objects in video clips with static camera (Videos 1–7) and moving camera (Videos 8–14).
The total database of eye-tracking data consisted of 23.9 min recorded during image stimuli, 41.7 min recorded during moving-dot stimuli, 126.5 min recorded during video clips with a static camera, and 72.8 min recorded during video clips with a moving camera. The data were randomly split into a development database and a test database of equal size. The development database was used during the development and implementation of the algorithm, while the test database was used only for evaluation. 
Results
The parameters of the proposed algorithm are found in Table 4. They were adjusted based on the development database. The intersaccadic intervals were generated using the algorithm from Larsson et al. (2013). An example of fixations and smooth pursuit movements detected by the proposed binocular algorithm is shown in Figure 8. 
Table 4
 
Settings for intrinsic parameters for the proposed binocular event-detection algorithm.
Figure 8
 
Example of fixation and smooth pursuit detection performed by the proposed binocular algorithm. (a) Velocity signal and (b) x- and y-coordinate for the right eye. In both panels, red indicates fixation and blue indicates smooth pursuit.
Binocular event-detection algorithm
The performance of the proposed algorithm is evaluated by calculating the percentages of detected fixations and smooth pursuit movements in image, video, and moving-dot stimuli. In addition, the percentages of detected smooth pursuit movements that are and are not in agreement with the video-gaze model are calculated. Three versions of the proposed algorithm (binocular, monocular right eye, and monocular left eye) are compared to the algorithm of Larsson et al. (2015), a state-of-the-art algorithm previously shown to outperform earlier smooth pursuit detection algorithms. The results of the four compared algorithms are shown in Tables 5 through 8. For image stimuli, the ideal results are 0% detected smooth pursuit movements and 100% detected fixations in the intersaccadic intervals. The binocular version of the proposed algorithm outperforms the other algorithms, with 1.7% detected smooth pursuit movements and 98.3% detected fixations, compared to around 5%–9% detected smooth pursuit movements and 91%–95% detected fixations for the three other algorithms. Since there are no moving objects in image stimuli, all smooth pursuit movements detected during image stimuli are considered not in agreement with the video-gaze model. 
Table 5
 
Results for the eye-tracking signals recorded with image stimuli for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
For moving-dot stimuli, a dot is moving 100% of the time. Figure 9c shows the percentage of time marked as VGMs by the video-gaze model for four different speeds of the moving dot. The video-gaze model indicates VGMs between 80% and 90% of the time, which can be compared to the results in Table 6, where both the monocular and binocular versions of the proposed algorithm detect a larger proportion of smooth pursuit movements than the Larsson et al. (2015) algorithm (83%–85% compared to 81%). The percentage of smooth pursuit movements in agreement with the video-gaze model is 80% for the proposed binocular algorithm; see Figure 9c for a comparison between different speeds of the moving target. The balanced performance is shown in Figure 10. In summary, the binocular version of the proposed algorithm detects a large proportion of smooth pursuit movements for moving-dot stimuli while at the same time decreasing the percentage of false smooth pursuit detections for image stimuli. 
Figure 9
 
Percentage of time indicated as video-gaze movements by the video-gaze model and percentage of time in smooth pursuit movements for the proposed algorithm and the algorithm of Larsson et al. (2015).
Table 6
 
Results for the eye-tracking signals recorded with moving-dot stimuli for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
Figure 10
 
The balanced performance measure—mean value between the percentage of detected smooth pursuit movements in moving-dot and percentage of fixations in image stimuli—for the proposed binocular (Bin), proposed monocular right (R), and proposed monocular left (L) algorithms and the algorithm of Larsson et al. (2015).
For video stimuli, the maximal percentage of detected smooth pursuit movements depends on the percentage of time that moving objects are present in the video stimuli. For the video clips used in this study, the percentage of time with moving objects varies between 72% and 100%, where the calculation also includes moving backgrounds and cameras (see Table 3). In order for an object to be considered as moving, the sample-to-sample velocity vi(n) must be larger than 10 pixels/s. Figures 9a and 9b show the percentages of time that the video-gaze model has marked as VGMs, together with the percentages of smooth pursuit movements detected by the binocular version of the proposed algorithm and the Larsson et al. (2015) algorithm; Figure 9a shows the percentages for video clips with a static camera, and Figure 9b for video clips with a moving camera or background. For a majority of the video clips, independent of whether the camera is moving or not, the percentage of time indicated as VGMs by the video-gaze model is larger than that of the event-detection algorithms. In general, the proposed binocular algorithm detects higher percentages of smooth pursuit movements than the Larsson et al. (2015) algorithm (see also Tables 7 and 8). For video stimuli, the percentage of smooth pursuit detections in agreement with the video-gaze model is around 40% when the camera is static and around 35% when the camera is moving. For videos with a moving camera, the proportion of detected smooth pursuit movements that is not in agreement with the video-gaze model is larger than for other types of stimuli. The monocular versions of the proposed algorithm detect the largest proportion of smooth pursuit movements, but at the cost of a larger number of detections that are not in agreement with the video-gaze model. 
Table 7
 
Results for the eye-tracking signals recorded with video stimuli with static camera for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
In order to remove the effect of the division into the two databases, a leave-one-out cross validation was performed on the signals recorded with video stimuli, with both static and moving cameras. The results show small variation between the sets, with values lying in the range of the results presented in Tables 7 and 8 (see Figure 11). 
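The cross-validation scheme can be sketched as follows; the recording identifiers and the evaluate callback are hypothetical placeholders, since the paper does not specify how folds are formed.

```python
def leave_one_out(recordings, evaluate):
    """Leave-one-out scheme: each recording is held out once for testing
    while the remaining recordings serve as the development set.

    recordings : list of recording identifiers
    evaluate   : function(test_set, development_set) -> performance value
    """
    results = []
    for i, held_out in enumerate(recordings):
        development = recordings[:i] + recordings[i + 1:]
        results.append(evaluate([held_out], development))
    return results
```

The spread of the returned per-fold results indicates how sensitive the reported performance is to the particular development/test split.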
Table 8
 
Results for the eye-tracking signals recorded with video stimuli with moving camera for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
Figure 11
 
Results of a leave-one-out cross validation for the proposed binocular algorithm and the algorithm of Larsson et al. (2015).
Synchronization between the two eyes
An advantage of using binocular information in the event-detection algorithm is that the temporal alignment between the position signals of the two eyes can be measured and compared. An example of an intersaccadic interval recorded during image viewing is shown in Figure 12a. This example shows how different the eye-tracking signals from the left and the right eyes may be over shorter periods of time. Figure 12b shows the outputs from the two synchronization filters used in the proposed binocular algorithm. The filters measure whether the position signals from the two eyes are temporally aligned: an output of −1 means that the signals are unsynchronized, while an output of 0 means that the signals are synchronized. The two output signals show that the eye-tracking signals are not synchronized until after the first 30–40 samples. In the proposed binocular algorithm, this feature is used to promote fixations and to prevent vergence-like movements, similar to the case shown in Figure 12a, from being confused with smooth pursuit movements. 
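The exact filters are specified in Table 1 of the paper (not reproduced in this excerpt). The following is only a plausible sketch of a sliding-window synchronization check based on the agreement of displacement directions between the two eyes; the angle threshold and the direction-based criterion are illustrative assumptions, while the 500-Hz sampling rate and the 50-ms window length are taken from the text.

```python
import numpy as np

def synchronization_filter(left_xy, right_xy, fs=500, length_ms=50,
                           angle_thresh_deg=45.0):
    """Illustrative synchronization check (hypothetical criterion).

    Outputs 0 where the displacement directions of the two eyes agree
    within angle_thresh_deg over the trailing window, and -1 otherwise.
    """
    n = int(round(length_ms * fs / 1000))
    left = np.asarray(left_xy, float)
    right = np.asarray(right_xy, float)
    out = np.full(len(left), -1.0)
    for i in range(n, len(left)):
        dl = left[i] - left[i - n]    # left-eye displacement over window
        dr = right[i] - right[i - n]  # right-eye displacement over window
        nl, nr = np.linalg.norm(dl), np.linalg.norm(dr)
        if nl == 0 or nr == 0:
            continue
        cos = np.dot(dl, dr) / (nl * nr)
        if np.degrees(np.arccos(np.clip(cos, -1, 1))) <= angle_thresh_deg:
            out[i] = 0.0
    return out
```

With two such filters of different lengths (50 and 80 ms in the paper), a short filter reacts quickly to loss of alignment while a longer filter is more robust to noise.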
Figure 12
 
(a) Binocular eye-tracking data of an intersaccadic interval recorded during image stimuli. The right eye is shown in gray and the left eye in black. The black circles show the start of the intersaccadic interval. In the beginning of the interval, the signals from the two eyes move differently and are deemed to be moving unsynchronized. (b) Output signals from the two synchronization filters. The solid line is the output from the 50-ms-long filter, and the dash-dotted line is the output from the 80-ms-long filter. Both filters are described in Table 1.
Quality assessment
In the first part of the proposed algorithm, quality assessment of the intersaccadic intervals was performed. Due to poor quality in both eyes, 4.6%, 9.7%, 4.6%, and 2.3% of the data were rejected for image, moving-dot, static-camera, and moving-camera stimuli, respectively. The rejected data were not used in the evaluation of any of the algorithms. 
Discussion
Experiments where binocular eye-tracking signals are recorded from participants viewing dynamic stimuli have become increasingly common. Few event-detection algorithms, however, take into account the fact that such signals contain smooth pursuit movements. Moreover, information from both eyes is rarely used to make robust decisions about when a specific event occurs. We propose a binocular event-detection algorithm based on directional clustering of the eye-tracking signal using both spatial and temporal filters. The proposed algorithm was developed with the idea that signals from two eyes contain more information about the type of eye movement performed than a signal from only one eye. Tables 5 and 6, together with Figure 10, show that the binocular version of the proposed algorithm provides the best balance between the percentage of detected smooth pursuit movements that are in agreement with the video-gaze model and the percentage of detected fixations. The monocular versions of the proposed algorithm detect a larger number of smooth pursuit movements but also a much larger number of smooth pursuit movements that are not in agreement with the video-gaze model. Since the binocular version of the proposed algorithm requires that the two eyes be synchronized during smooth pursuit movements, its performance is more robust, with fewer detected smooth pursuit movements that are not in agreement with the video-gaze model. This is especially true for the image stimuli, as seen in Table 5, where the binocular version of the proposed algorithm detects 1.7% smooth pursuit movements. These detections are smooth-pursuit-like movements that could, for instance, be due to postsaccadic drift that occurs in both eyes at the same time or to changes in pupil size causing drift in the eye-tracking signal (Drewes, Zhu, Hu, & Hu, 2014). 
It should be noted that the difference between the monocular and binocular algorithms is small, and further research is needed to investigate the advantage of using binocular eye-tracking signals in event-detection algorithms. 
It should also be pointed out that movements were taking place 72%–100% of the time in the videos (see Table 3), and that the participants were asked to follow the moving objects to the largest extent possible. The results in Figure 9 verify that the instruction was followed; the video-gaze model indicates VGMs 60%–80% of the time. In Figure 9, some cases show a large gap between the VGMs indicated by the video-gaze model and the percentage of smooth pursuit movements detected by the algorithms based solely on the eye-tracking signals. One reason is that some videos, for example, Video 11, contain slow-moving objects; the eyes then move so slowly, in alignment with the moving object, that the interval is considered a VGM by the video-gaze model but not a smooth pursuit movement by the proposed algorithm (see Figure 13). 
Figure 13
 
Example of fixation and smooth pursuit detection performed by the proposed binocular algorithm. (a) Velocity signal, (b) x- and y-coordinate for the right eye, and (c) the final detection based on the signal from the two eyes, where gray indicates fixation and black indicates smooth pursuit movement or video-gaze movement. (d) The results from the video-gaze model, which detects a large amount of video-gaze movement due to slow-moving objects in the video stimulus.
The proposed algorithm is evaluated using a novel video-based evaluation strategy, which uses information automatically extracted from the stimulus videos. The logic behind the strategy is that smooth pursuit is only possible when there is a moving object to follow (Steinbach, 1976); if the eye-tracking signal is aligned with a moving object in terms of speed, direction, and position, it is therefore very likely that the participant is pursuing the object. This type of automatic evaluation strategy may not be as accurate as manual annotations, but it gives a general and objective picture of the performance of the evaluated algorithm. The main advantage of automatic evaluation compared to manual annotation is that significantly larger amounts of data can be used in the evaluation process. Manual annotations are very time consuming and are not a practical solution for large data sets. With the proposed video-based evaluation strategy, longer videos can be used and a larger number of participants can be included when the performance of a new algorithm is evaluated. 
In order to show that there is a reasonable match between manual annotations and the proposed binocular algorithm, sensitivity and specificity were calculated to be 0.72 and 0.85, respectively, for eye-tracking data recorded during moving-dot stimuli. 
Despite its clear logic, the proposed video-based evaluation strategy is new and lacks objective validation. To address this issue, the VGMs indicated by the video-gaze model are compared to smooth pursuit movements annotated manually in data recorded during moving-dot stimuli. In this comparison, the sensitivity and specificity were determined to be 0.87 and 0.58, respectively. The rather low specificity may indicate that the video-gaze model cannot distinguish between fixations and smooth pursuit movements when the eye-tracking signal is close to a slowly moving object. 
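Sensitivity and specificity of this kind follow from per-sample agreement counts between a reference annotation and the detections under test; a minimal sketch:

```python
def sensitivity_specificity(reference, detected):
    """Per-sample sensitivity and specificity of smooth pursuit detections
    against a reference annotation (both boolean sequences, True = pursuit)."""
    tp = sum(r and d for r, d in zip(reference, detected))          # hits
    tn = sum((not r) and (not d) for r, d in zip(reference, detected))
    fn = sum(r and (not d) for r, d in zip(reference, detected))    # misses
    fp = sum((not r) and d for r, d in zip(reference, detected))    # false alarms
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

A low specificity, as observed for the video-gaze model, corresponds to many false-alarm samples among those the reference marks as non-pursuit.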
The proposed algorithm is evaluated for eye-tracking data recorded while stimuli are viewed on a computer screen and the participant's head is not moving. Future studies will investigate the generalizability of the algorithm in terms of parameter settings for different types of eye trackers. Future studies will also show whether the algorithm is suitable for data recorded from a mobile eye tracker, where participants are free to move their heads and objects in the stimuli also move in depth. 
In order to use the proposed video-based evaluation strategy with eye trackers other than the one used in this article, an evaluation study of the velocity parameters needs to be performed to confirm the thresholds. The velocity thresholds may be set in relation to the noise level of the eye tracker used. 
In this work, the video-gaze model is used to confirm or reject smooth pursuit detections for the purpose of performance evaluation. It would be possible to instead incorporate such information in the event detector in order to provide even better fixation and smooth pursuit discrimination. Such a strategy may be particularly well suited for mobile eye trackers, where a scene camera records the scene the user is looking at. This is, however, outside the scope of this article. 
In the present article, the eye-tracking signals were recorded from participants without known problems with binocular coordination. The proposed binocular algorithm has therefore not been tested on participants with poor binocular control. Future studies will show whether the synchronization requirements also hold for this group of participants. 
Conclusions
An algorithm for the detection of fixations and smooth pursuit movements using binocular eye-tracking data is proposed. Using binocular information is most advantageous in image stimuli, where vergence or drift-like movements otherwise may be confused with smooth pursuit movements. The proposed binocular algorithm detects a larger number of smooth pursuit movements in moving-dot stimuli than previous algorithms, without increasing the percentage of false smooth pursuit detections for image stimuli. The proposed algorithm is evaluated using a novel video-based evaluation strategy based on automatically detected moving objects in video stimuli. Compared to manual annotation of data, this makes it practically feasible to evaluate larger amounts of data. 
Acknowledgments
This work was supported by the Strategic Research Project eSSENCE, funded by the Swedish Research Council. Data were recorded in the Lund University Humanities Laboratory. 
Corresponding author: Linnéa Larsson. 
Email: linnea.larsson@bme.lth.se. 
Address: Department of Biomedical Engineering, Lund University, Lund, Sweden. 
References
Allison, R. S., Eizenman, M., & Cheung, B. S. (1996). Combined head and eye tracking system for dynamic testing of the vestibular system. IEEE Transactions on Biomedical Engineering, 43(11), 1073–1082.
Bouguet, J.-Y. (2001). Pyramidal implementation of the affine Lucas Kanade feature tracker: Description of the algorithm. Intel Corporation, 5, 1–10.
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1–27.
Drewes, J., Zhu, W., Hu, Y., & Hu, X. (2014). Smaller is better: Drift in gaze measurements due to pupil dynamics. PLoS ONE, 9(10), e111197.
Duchowski, A., Medlin, E., Cournia, N., Murphy, H., Gramopadhye, A., Nair, S., & Melloy, B. (2002). 3-D eye movement analysis. Behavior Research Methods, Instruments, & Computers, 34(4), 573–591.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York: Wiley-Interscience.
Eden, G. F., Stein, J. F., Wood, H. M., & Wood, F. B. (1994). Differences in eye movements and reading problems in dyslexic and normal children. Vision Research, 34(10), 1345–1358.
Engbert, R., & Kliegl, R. (2003). Microsaccades uncover the orientation of covert attention. Vision Research, 43(9), 1035–1045.
Essig, K., Sand, N., Schack, T., Künsemöller, J., Weigelt, M., & Ritter, H. (2010). Fully-automatic annotation of scene videos: Establish eye tracking effectively in various industrial applications. In Proceedings of SICE Annual Conference 2010 (pp. 3304–3307). Taipei: IEEE.
Flechtner, K.-M., Steinacher, B., Sauer, R., & Mackert, A. (1997). Smooth pursuit eye movements in schizophrenia and affective disorder. Psychological Medicine, 27, 1411–1419.
Hayhoe, M. M., McKinney, T., Chajka, K., & Pelz, J. B. (2012). Predictive eye movements in natural vision. Experimental Brain Research, 217(1), 125–136.
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & van de Weijer, J. (2011). Eye tracking: A comprehensive guide to methods and measures. Oxford, UK: Oxford University Press.
Kirkby, J. A., Webster, L. A. D., Blythe, H. I., & Liversedge, S. P. (2008). Binocular coordination during reading and non-reading tasks. Psychological Bulletin, 134(5), 742–763.
Komogortsev, O., Gobert, D., Jayarathna, S., Hyong Koh, D., & Gowda, S. (2010). Standardization of automated analyses of oculomotor fixation and saccadic behaviors. IEEE Transactions on Biomedical Engineering, 57(11), 2635–2645.
Komogortsev, O., & Karpov, A. (2013). Automated classification and scoring of smooth pursuit eye movements in the presence of fixations and saccades. Behavior Research Methods, 45(1), 203–215.
Kurzhals, K., Bopp, C. F., Bässler, J., Ebinger, F., & Weiskopf, D. (2014). Benchmark data for evaluating visualization and analysis techniques for eye tracking for video stimuli. In Lam, H., Isenberg, P., Isenberg, T., & Sedlmair, M. (Eds.), Proceedings of the Fifth Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization (pp. 54–60). Paris: ACM.
Larsson, L., Nyström, M., Andersson, R., & Stridh, M. (2015). Detection of fixations and smooth pursuit movements in high-speed eye-tracking data. Biomedical Signal Processing and Control, 18, 145–152.
Larsson, L., Nyström, M., & Stridh, M. (2013). Detection of saccades and post-saccadic oscillations in the presence of smooth pursuit. IEEE Transactions on Biomedical Engineering, 60(9), 2484–2493.
Leigh, R., & Zee, D. (2006). The neurology of eye movements. Oxford, UK: Oxford University Press.
Martins, A. J., Kowler, E., & Palmer, C. (1985). Smooth pursuit of small-amplitude sinusoidal motion. Journal of the Optical Society of America A, 2(2), 234–242.
Meyer, C., Lasker, A., & Robinson, D. (1985). The upper limit of human smooth pursuit. Vision Research, 25(4), 561–563.
Otero-Millan, J., Castro, J. L. A., Macknik, S. L., & Martinez-Conde, S. (2014). Unsupervised clustering method to detect microsaccades. Journal of Vision, 14(2):18, 1–17, doi:10.1167/14.2.18.
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.
Rayner, K., Chace, K., Slattery, T., & Ashby, J. (2009). Eye movements as reflections of comprehension processes in reading. Scientific Studies of Reading, 10(3), 241–255.
Shi, J., & Tomasi, C. (1994). Good features to track. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 593–600). IEEE.
Steinbach, M. J. (1976). Pursuing the perceptual rather than the retinal stimulus. Vision Research, 16(12), 1371–1376.
van der Lans, R., Wedel, M., & Pieters, R. (2011). Defining eye-fixation sequence across individuals and tasks: The binocular-individual threshold (BIT) algorithm. Behavior Research Methods, 43(1), 239–257.
Versino, M., Hurko, O., & Zee, D. S. (1996). Disorders of binocular control of eye movements in patients with cerebellar dysfunction. Brain, 119(6), 1933–1950.
Figure 1
 
Overview of the structure for the proposed binocular event-detection algorithm.
Figure 2
 
(a) Position signals of a saccade and (b) its corresponding sample-to-sample velocity (gray) and median filtered (black). A true saccade results in a low spike index. (c) Position signals of a spike incorrectly detected as a saccade and (d) its corresponding sample-to-sample velocity (gray) and median filtered (black). A spike results in a high spike index.
Figure 3
 
Three examples of signals with different noise levels—(a) low, (c) medium, and (e) high—and their respective spectra (b), (d), and (f).
Figure 4
 
The generalized logistic function S(a) used in this article. The high-frequency noise indices of the three signal examples shown in Figure 3 are marked with arrows.
Figure 5
 
(a) A fixation and (b) the corresponding directional clustering. (c) A smooth pursuit movement and (d) the corresponding directional clustering. Note the spread in all directions during a fixation (b) and a clear direction during a smooth pursuit movement (d).
Figure 6
 
An example of the clustering of the detected feature points in a frame. The tracks are clustered with respect to their speeds and directions (see Figure 7).
Figure 7
 
Overview of the clustering for the sample-to-sample velocities of the tracks into six clusters. The red circle marks the velocity of the eye.
Figure 8
 
Example of fixation and smooth pursuit detection performed by the proposed binocular algorithm. (a) Velocity signal and (b) x- and y-coordinate for the right eye. In both panels, red indicates fixation and blue indicates smooth pursuit.
Figure 8
 
Example of fixation and smooth pursuit detection performed by the proposed binocular algorithm. (a) Velocity signal and (b) x- and y-coordinate for the right eye. In both panels, red indicates fixation and blue indicates smooth pursuit.
Figure 9
 
Percentage of time indicated as video-gaze movements by the video-gaze model and percentage of time in smooth pursuit movements for the proposed algorithm and the algorithm of Larsson et al. (2015).
Figure 9
 
Percentage of time indicated as video-gaze movements by the video-gaze model and percentage of time in smooth pursuit movements for the proposed algorithm and the algorithm of Larsson et al. (2015).
Figure 10
 
The balanced performance measure—mean value between the percentage of detected smooth pursuit movements in moving-dot and percentage of fixations in image stimuli—for the proposed binocular (Bin), proposed monocular right (R), and proposed monocular left (L) algorithms and the algorithm of Larsson et al. (2015).
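The balanced performance measure of Figure 10 is a simple average of two detection rates. A one-line sketch (function name and argument names are illustrative):

```python
def balanced_performance(pursuit_pct_moving_dot, fixation_pct_image):
    """Mean of the smooth pursuit detection rate in moving-dot stimuli
    and the fixation detection rate in image stimuli (0-100 scale)."""
    return (pursuit_pct_moving_dot + fixation_pct_image) / 2.0
```

Averaging the two rates penalizes an algorithm that trades one event class for the other, e.g., one that labels everything as fixation.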
Figure 11
 
Results of a leave-one-out cross validation for the proposed binocular algorithm and the algorithm of Larsson et al. (2015).
Figure 12
 
(a) Binocular eye-tracking data of an intersaccadic interval recorded during image stimuli. The right eye is shown in gray and the left eye in black. The black circles mark the start of the intersaccadic interval. At the beginning of the interval, the signals from the two eyes move differently and are therefore judged to be unsynchronized. (b) Output signals from the two synchronization filters. The solid line is the output from the 50-ms-long filter, and the dash-dotted line is the output from the 80-ms-long filter. Both filters are described in Table 1.
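A synchronization filter like those in Figure 12 can be sketched as a per-sample binary indicator of whether the two eyes move in a similar direction, smoothed over a fixed-length window (e.g., the number of samples corresponding to 50 ms or 80 ms). The agreement criterion and window handling below are assumptions for illustration, not the authors' exact filter definitions from Table 1:

```python
import math

def direction_agreement(vel_left, vel_right, max_angle_deg=30.0):
    """1 where the two eyes' velocity directions differ by less than
    max_angle_deg, 0 otherwise (0 also for zero-velocity samples)."""
    out = []
    for (lx, ly), (rx, ry) in zip(vel_left, vel_right):
        norm = math.hypot(lx, ly) * math.hypot(rx, ry)
        if norm == 0:
            out.append(0)
            continue
        cos_a = max(-1.0, min(1.0, (lx * rx + ly * ry) / norm))
        out.append(1 if math.degrees(math.acos(cos_a)) < max_angle_deg else 0)
    return out

def binary_filter(indicator, window):
    """Fraction of agreeing samples in a centered sliding window."""
    n = len(indicator)
    half = window // 2
    result = []
    for i in range(n):
        seg = indicator[max(0, i - half):min(n, i + half + 1)]
        result.append(sum(seg) / len(seg))
    return result
```

Running two such filters with different window lengths gives one slowly and one quickly reacting estimate of binocular synchronization, matching the two curves in Figure 12b.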
Figure 13
 
Example of fixation and smooth pursuit detection performed by the proposed binocular algorithm. (a) Velocity signal, (b) x- and y-coordinates for the right eye, and (c) the final detection based on the signals from the two eyes, where gray indicates fixation and black indicates smooth pursuit movement or video-gaze movement. (d) The results from the video-gaze model, which detects a large amount of video-gaze movement due to slow-moving objects in the video stimulus.
Table 1
 
Settings for binary filters F1–F9, which emphasize fixations. Notes: The filter length for the Total distance filter is described as the percentage of the current intersaccadic interval.
Table 2
 
Settings for binary filters S1–S9, which emphasize smooth pursuit movements. Notes: The filter length for the Total distance filter is described as the percentage of the current intersaccadic interval.
Table 3
 
Proportion of moving objects in video clips with static camera (Videos 1–7) and moving camera (Videos 8–14).
Table 4
 
Settings for intrinsic parameters for the proposed binocular event-detection algorithm.
Table 5
 
Results for the eye-tracking signals recorded with image stimuli for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
Table 6
 
Results for the eye-tracking signals recorded with moving-dot stimuli for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
Table 7
 
Results for the eye-tracking signals recorded with video stimuli with static camera for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.
Table 8
 
Results for the eye-tracking signals recorded with video stimuli with moving camera for the test (development) database. Notes: Bin: Binocular. Mono R: Monocular right eye. Mono L: Monocular left eye.