Open Access
Methods  |   July 2024
(The limits of) eye-tracking with iPads
Author Affiliations
  • Aryaman Taore
    School of Optometry & Vision Science, The University of Auckland, Auckland, New Zealand
    aryamantaore@gmail.com
  • Michelle Tiang
    School of Optometry & Vision Science, The University of Auckland, Auckland, New Zealand
    michellemltiang@gmail.com
  • Steven C. Dakin
    School of Optometry & Vision Science, The University of Auckland, Auckland, New Zealand
    UCL Institute of Ophthalmology, University College London, London, United Kingdom
    s.dakin@auckland.ac.nz
Journal of Vision July 2024, Vol. 24, 1. doi: https://doi.org/10.1167/jov.24.7.1
Abstract

Applications for eye-tracking—particularly in the clinic—are limited by a reliance on dedicated hardware. Here we compare eye-tracking implemented on an Apple iPad Pro 11” (third generation)—using the device's infrared head-tracking and front-facing camera—with a Tobii 4c infrared eye-tracker. We estimated gaze location using both systems while 28 observers performed a variety of tasks. For estimating fixation, gaze position estimates from the iPad were less accurate and precise than the Tobii (mean absolute error of 3.2° ± 2.0° compared with 0.75° ± 0.43°), but fixation stability estimates were correlated across devices (r = 0.44, p < 0.05). For tasks eliciting saccades >1.5°, estimated saccade counts (r = 0.4–0.73, all p < 0.05) were moderately correlated across devices. For tasks eliciting saccades >8° we observed moderate correlations in estimated saccade speed and amplitude (r = 0.4–0.53, all p < 0.05). We did, however, note considerable variation in the vertical component of estimated smooth pursuit speed from the iPad and a catastrophic failure of tracking on the iPad in 5% to 20% of observers (depending on the test). Our findings sound a note of caution to researchers seeking to use iPads for eye-tracking and emphasize the need to properly examine their eye-tracking data to remove artifacts and outliers.

Introduction
Our eyes are in constant motion, making rapid movements (known as saccades) two to three times a second to settle on (fixate) locations of interest within a scene, allowing fixated regions to be processed by high-resolution foveal vision (Leigh, 2015; Wade & Tatler, 2005). If an object of interest within the scene is in motion, we can voluntarily move our eyes to track it (smooth pursuit), and when larger portions of the scene move our eyes will make involuntary (optokinetic) eye movements to minimize movement of the world over the back of the eye (retinal slip). Although we remain largely unaware of these various types of eye movements (Tatler, Kirtley, Macdonald, Mitchell, & Savage, 2014), measuring them can provide a range of information about conscious and unconscious cognitive and perceptual processing. Consequently, eye-tracking has a broad range of research applications. In terms of applied research, eye-tracking drives many studies of usability (Cowen, Ball, & Delin, 2002; Jacob & Karn, 2003; Wang et al., 2019), from analyzing user behavior for marketing purposes (Wedel & Pieters, 2008; Wedel & Pieters, 2017) to understanding decision-making during driving (Mourant & Rockwell, 1970; Vetturi, Tiboni, Maternini, & Bonera, 2020). Within science, eye-tracking continues to play a vital role in physiological research (Frost-Karlsson et al., 2019; Holzman, 1974; Snegireva, Derman, Patricios, & Welman, 2018), where differences in patterns of fixation and eye movements have been reported for various clinical conditions impacting brain function.
In this article, we aimed to assess the efficacy of eye-tracking run on consumer-grade devices, which are widely accessible and have benefited from recent advancements in image processing and mobile computing. Specifically, we compare the performance of an iPad (Apple Inc., Cupertino, CA) with that of a video-based infrared (IR) eye tracker. 
Today, video-based remote eye trackers are the everyday standard for measuring eye movements. These trackers operate at some distance from the user and are typically more compact than their predecessors, with no moving components. Some remote eye-trackers even allow the user to move their head freely, making them particularly valuable when working with children or clinical populations who may struggle to remain still for extended periods. 
Here we consider three of the most widely used methods for remote eye-tracking (i.e., non-invasive methods operating at a distance from the participant): first, laboratory-based IR eye-tracking; second, consumer-grade IR eye-tracking; and third, software-only eye-tracking.
Most dedicated remote eye-trackers use a series of digital images of the eyes to estimate the (x, y) coordinates of where a participant is looking (Cooke, 2005) on a screen at a fixed/known distance. IR imaging of the participant's eyes provides good estimates of gaze position (often in real time) (Cornelissen, Peters, & Palmer, 2002), owing to the distinct visibility of the pupil in IR images. Such visibility is a result of the IR light reflecting off the retina, making the pupil easy to identify and locate. Additionally, IR imaging minimizes reflections from the scene onto the cornea, which improves pupil detection. When the source of IR illumination aligns with the camera's optical axis, it produces a bright pupil effect; when the illumination is off axis, it produces a dark pupil effect. The bright pupil effect is particularly effective in enhancing contrast between the pupil and dark irises. Once the pupil is accurately localized, its position is translated to on-screen coordinates through a two-step process. The first step, termed the calibration phase, involves the participant looking at a series of predetermined points on the screen. This step establishes a ground-truth relationship between the estimated pupil position and the known screen location. In the second step, algorithms (often using multiparameter regression or machine learning methods) use these calibration data to map new estimates of pupil position to specific screen locations. Using this approach, most research-grade eye-trackers can achieve accuracy of less than 0.6° and precision of less than 0.25° (Gibaldi, Vanegas, Bex, & Maiello, 2017), at very high data acquisition rates (often several kHz).
However, accuracy and precision are susceptible to degradation by an array of factors during the eye-tracking process. As a result, although a device's specification may indicate that the error in estimating gaze direction is 0.5°, this estimate can double during real-world testing (Holmqvist et al., 2011). Accurately localizing the pupil's boundary can be disrupted by individual variations in retinal IR reflectance, pupil size, use of corrective eyewear, or extent of eye rotation (Nguyen, Wagner, Koons, & Flickner, 2002). Additionally, factors such as iris color or physical obstructions like eyelids or eyelashes can further complicate pupil detection (Nyström, Andersson, Holmqvist, & van de Weijer, 2013). Transient physiological changes, such as pupil dilation or constriction, also introduce variability (Bedell & Stevenson, 2013; Choe, Blake, & Lee, 2016; Drewes, Masson, & Montagnini, 2012), as does head movement (Cerrolaza, Villanueva, Villanueva, & Cabeza, 2012; Hessels, Andersson, Hooge, Nyström, & Kemner, 2015; Houben, Goumans, & van der Steen, 2006). Eye-trackers that do not account for head motion and only correlate pupil position with screen coordinates can be inaccurate. Finally, the choice of calibration method—whether pursuit of a moving target or a series of fixations—can introduce challenges. For example, the smooth pursuit approach can result in a lag between the actual stimulus position and eye direction (Robinson, 1965). In contrast, the sequential fixation method might undersample certain screen regions and/or be sensitive to lapses in participant concentration, potentially misaligning pupil position with the intended on-screen gaze position. 
Systems like the Eyelink 1000 (SR Research Ltd., Ottawa, Canada) are the day-to-day gold standard for laboratory-based remote eye-tracking, but employ expensive, specialized hardware (e.g., IR illuminator, IR camera, processing unit), limiting their use outside of the laboratory (Papoutsaki, 2015). Cheaper consumer eye-trackers employ similar techniques (e.g., IR illumination, multiparameter algorithm) to predict gaze, but typically image the eyes at a lower spatial and temporal resolution (i.e., a sampling rate of <250 Hz), increasing gaze estimation errors to more than 1.0° (Brand, Diamond, Thomas, & Gilbert-Diamond, 2021). The suitability of consumer eye-trackers depends on the intended application. Although they may provide a general indication of a participant's gaze direction (e.g., at a fixation marker or word) or support the analysis of saccade and fixation characteristics in tasks like free viewing or reading, researchers need to evaluate their effectiveness within their specific application, considering the device's inherent limitations. 
Software-only eye-tracking systems typically capture eye images using an unmodified digital video camera (Papoutsaki, Laskey, & Huang, 2017), such as the front-facing camera on a smartphone. Locating the pupil in visible-light images can be challenging due to variability in lighting, reflections from spectacles, and so on (Larrazabal, García Cena, & Martínez, 2019). As such, the accuracy of software-only eye-tracking is much lower (closer to 3°) (Krafka et al., 2016; Zhang, Sugano, Fritz, & Bulling, 2015), as is the signal-to-noise ratio (Gómez-Poveda & Gaudioso, 2016). The latter is perhaps even more detrimental, because noisy data can often obscure small eye movements (Holmqvist, Nyström, & Mulvey, 2012; Ko, Snodderly, & Poletti, 2016). Further, the low sampling rate of software-only eye-trackers (15–30 Hz) (Valliappan et al., 2020) makes them unsuitable for capturing rapid eye movements (lasting 20–40 ms). However, despite their shortcomings, software-only eye-trackers offer many advantages arising from their running on unmodified consumer hardware. Combined with the advent of widely available machine learning techniques, this has led to a proliferation of software-only eye-tracking techniques in recent years. Examples include Hawkeye (http://www.usehawkeye.com), iMotions (https://imotions.com/), EyeWare Beam (https://beam.eyeware.tech/), GazeRecorder (https://gazerecorder.com/), and Sticky (https://www.tobii.com/products/software/online-marketing-research/sticky).
In terms of applications of this technology, software-only eye-tracking has the potential to make large quantities of (albeit noisier) eye-tracking data available and so transform clinical testing based on eye-tracking. Based on the published accuracy of these techniques, it is likely that at least some tests (e.g., those that use eye-tracking to ensure compliant fixation or that require only a crude estimate of fixation location) could use software-only eye-tracking. Further, tests could also be adapted to be less reliant on accurate eye-tracking, for instance, by increasing stimulus size or spacing fixation markers in different screen quadrants.
Research-grade eye trackers deliver hundreds of relatively accurate estimates of gaze position per second. However, not all studies are likely to require this level of specification. In certain research contexts, a moderately priced and slightly less accurate eye tracker might suffice, provided that the researchers understand and account for the inherent limitations and potential errors of these devices. It is crucial to emphasize that eye-tracking requirements for a given project depend on both the measures being made and on the stimulus being used to make those measurements. Clearly, investigations that seek to characterize ballistic eye movements or absolute gaze position—with maximal levels of accuracy/precision—will need to use research-grade eye trackers. However, this degree of spatial accuracy is not required in, for example, optokinetic research, which measures eye velocity (the change in eye position over time) and which is consequently resilient to inaccurate estimates of absolute gaze position (Dakin & Turnbull, 2016; Taore, Lobo, Turnbull, & Dakin, 2022). Further—as well as the measure being made—the stimulus being used can support the use of less elaborate eye-tracking solutions. For example, for tasks where observers are engaged in reading pages of text, the position of the eye is constrained by the stimulus—because gaze position tends to fall along lines of text—and this factor can simplify the analysis of noisy estimates of gaze position from software-only eye-tracking. Similarly, using a velocity threshold (Andersson, Larsson, Holmqvist, Stridh, & Nyström, 2017), potentially coupled with a piecewise constant function to fit the remaining short intervals of data (Mulligan, 2018), can effectively parse noisy gaze data into fixations and saccades.
Alongside the dedicated-hardware and software-only approaches, the iPad tablet (Apple Inc.) uses a combination of software-only eye-tracking and dedicated IR imaging/computing hardware (Anisimov et al., 2021). This approach increases eye-tracking accuracy and sampling rate compared with software-only solutions (Wang et al., 2019). The iPad's front camera is equipped with an IR illuminator that projects 30,000 dots onto the user's face and a low-spatial-resolution IR camera that records the resulting image. The position of the dots relative to one another generates a depth map of the face and assists with localization of features, such as the eyes (Raynal, 2019). The depth map is used for face localization and identification (Raynal, 2019) and to create animated cartoon emojis (referred to as Animoji) that mimic a user's facial expressions in real time. These cartoons mirror not only expression, but also gaze direction, indicating that eye position is being estimated at the same time (likely using not low-resolution IR imaging, but higher-resolution visible-light imaging from the front-facing camera). Dedicated hardware in the iPad's Bionic chip (Raynal, 2019) allows the iPad to sample head and eye movement at 60 Hz, which is considerably higher than alternative software-only eye-tracking solutions that sample at between 15 and 30 Hz (Papoutsaki et al., 2017; Valliappan et al., 2020). The three-dimensional localization of the head also ensures eye movements are decoupled from head movements, removing the need for a chin rest. Critically, the iPad is a ubiquitous and portable device with high consistency in display characteristics (Bodduluri, Boon, & Dain, 2017; de Fez, Luque, García-Domene, Camps, & Piñero, 2016) and camera technology, and estimated eye position data are accessible via an SDK.
In this study, we set out to document the characteristics of eye movements recorded with an iPad compared with those captured with a widely recognized and more elaborate eye-tracking device, the Tobii. In particular, we were interested in the measurement of fixations and saccades, which can be used in screening for neurological conditions (e.g., Alzheimer's and Parkinson's disease, epilepsy) (Tao et al., 2020) or visual disorders such as glaucoma (Smith, Glen, Mönter, & Crabb, 2014) or macular degeneration (Rubin & Feely, 2009). Screening tests typically analyze gaze patterns independent of the stimulus (Wang et al., 2018; Zhang et al., 2022), focusing on broader characteristics of eye movement (such as velocity, saccadic latency, and amplitude), rather than requiring the assessment of precise gaze location on the screen. We show that, because fixations and saccades have different velocity characteristics (close to 0°/s for fixation and >20°/s for saccades) (Nyström & Holmqvist, 2010), even inaccurate eye-tracking data can be divided into these two components using a velocity threshold. In such an instance, gaze does not need to be accurately mapped to the screen to infer the number of saccades or the duration of fixations relevant to neurological and physiological screening.
We are aware of only a handful of published works that have used the iPad to collect eye-tracking data (Anisimov et al., 2021; Holland, Garza, Kurtova, Cruz, & Komogortsev, 2013; Holland & Komogortsev, 2012). The most recent study, by Anisimov et al., revealed that the numbers of saccades and fixations made during reading were similar when estimated with an iPad and with a commercial-grade eye-tracker. Other studies have evaluated other software-only eye-tracking approaches. For example, Google's SOTA mobile tracking (Valliappan et al., 2020) produces results comparable with those of a commercial-grade head-mounted eye-tracker during standard oculomotor tasks (fixation, smooth pursuit, etc.).
The goal of this study was to present an evaluation of the quality of eye-tracking available on the third-generation iPad Pro (11”). Using Apple's built-in estimates of distance, translation, and rotation of the eyes (Apple ARKit), we categorize eye events and record metrics such as the number of saccades, fixation duration, and smooth pursuit amplitude, among others. These metrics are compared with similar estimates derived from data from a commercial remote eye-tracker (Tobii 4c), which samples at 90 Hz and has an estimated error of less than 1.0°. The iPad, in contrast, samples at 60 Hz and has been reported to have an estimated error just above 3.0° (Wang et al., 2019). Here we treat the output of the Tobii 4c as our ground truth, as in previous studies that aimed to classify and/or track eye movements (Dakin et al., 2019; Yu et al., 2018). We compare measures from the iPad and the Tobii devices for fixation and saccade-intensive tasks that are typically used in physiological research. We recognize that the Tobii is by no means the most accurate or precise eye-tracker available today. Instead, the Tobii was intended to serve as a point of comparison for this specific study, not to act as a definitive standard.
Note that our evaluation uses an iPad eye-tracking program based on the universally adopted ARKit framework, endorsed by Apple, developers, and a vast user community. We acknowledge that ARKit may prioritize speed and energy efficiency, features more suited for mainstream consumer applications like Animoji, rather than precise eye-tracking required for vision science research. Although algorithms designed to optimize eye-tracking accuracy could potentially offer improved performance by processing the front camera's image stream, these specialized solutions might not be as immediately available to all researchers as ARKit is. Consequently, a thorough assessment of ARKit's innate strengths and limitations is crucial. 
Methods
We developed a native iOS application for the iPad that presented a series of tasks intended to elicit saccades and fixations, while recording estimates of the yaw and pitch of participants' eyes via Apple's built-in software framework (ARKit 2; Apple ARKit). The yaw and pitch estimates were mapped to the iPad screen coordinates (using a polynomial fit detailed under Processing eye-tracking data) using measurements made during an initial 25-second calibration phase. An IR eye-tracker (Tobii 4c) was also calibrated to the screen coordinates via Tobii's built-in five-point calibration (where four of the five calibration points were located near the corners of the calibration display and the remaining point was located at the center). The two sets of eye-tracking data were analyzed using the same algorithm—described below under Adaptive speed thresholding—to produce metrics that could be meaningfully compared because they differed only in terms of the device from which they were calculated.
Participants
We recruited 28 participants (14 female, 14 male, aged 18–57 years) with no self-reported neurological or ocular conditions, who wore prescribed optical correction during the experiment as required. The protocols and procedure complied with the Declaration of Helsinki, and informed consent was obtained from all participants before the experiment. Our protocol was approved by the University of Auckland Human Participants Ethics Committee.
Apparatus
For the eye-tracking tasks, we presented stimuli on an Apple iPad Pro 11” (third generation) with a 2,388 × 1,668-pixel IPS display operating at 120 Hz. The screen (set to its maximum brightness of 424 cd/m2) was viewed in portrait mode to ensure that the front-facing camera (located at the top middle of the iPad) had a clear view of both eyes. The iPad was viewed binocularly under standard room lighting (Illuminant D65) at a distance of approximately 50 cm without head or chin support. Stimuli were created in Xcode using Apple's inbuilt UIKit framework (UIKit). 
A Tobii 4c eye-tracker was placed directly below the iPad and connected to a Windows 10 laptop that recorded the data stream from the device. The Tobii 4c eye-tracker uses a series of IR images of the user, captured at 90 Hz, to a) localize the face and b) estimate the direction of gaze and provide an estimate of the binocular gaze point in screen pixels (Active Display Gaze Point in the Tobii Pro software framework; Tobii Technology, 2010). The experiment was performed without a chin or headrest, although participants were asked to try to maintain a constant head position.
Stimuli
Participants performed five tasks (fixation task, letter saccade, reading A, reading B, and smooth pursuit) designed to cover the most common eye-tracking tasks used for screening of ocular and neurological conditions. Our tasks accord with those adopted by clinically validated systems such as the Thomson Clinical Eye-tracker (Thomson, 2017) and RightEye modules (Hunfalvay et al., 2020), both of which use commercial-grade eye-trackers to measure gaze in clinical settings. A description of each task is provided below.
Fixation task
Fixation targets appeared sequentially at 1 of 16 locations (which collectively formed a 4 × 4 grid spanning 12° of visual angle horizontally and 15° vertically). During testing, participants viewed a black disc (diameter 1.22°) with a white center (diameter 0.24°) presented at a randomly selected location for a period of 3 seconds. After this time, the disc reappeared at another of the 15 possible remaining locations, and this change was repeated until all locations had been covered. The sequence was run twice. 
Letter saccade task
Twenty randomly selected capitalized letters (37-point Times New Roman, equivalent to 1.0 logarithm of the minimum angle of resolution [logMAR]) were evenly distributed across two columns on the screen (i.e., 10 letters per column). The columns themselves were 11° apart and measured 0.5° in width and 15.5° in height. The letters were evenly distributed vertically inside the columns. The participant was instructed to silently read each letter from left to right, starting at the top left letter.
Reading (tasks A and B)
The participants were required to silently read a passage of approximately 150 words (set in the Times New Roman font). This task was repeated twice: task A used a passage set at 18 points (equivalent to 0.7 logMAR), and task B used a different passage set at 23 points (equivalent to 0.8 logMAR). Both passages were sourced from the International Reading Speed Texts, a series of 10 paragraphs selected to be similar to one another in terms of length, difficulty, and linguistic complexity (Trauzettel-Klosinski, Dietz, & IReST Study Group, 2012).
Smooth pursuit task
A black disc with a white center (same dimensions as the target in the fixation task) moved along a circular path of 5° radius at an average speed of 10°/s. The dot moved for 30 seconds in an anticlockwise direction before switching to a clockwise direction; the dot changed direction a total of three times during the test. Participants were instructed to track the disc as accurately as possible. 
Processing eye-tracking data
Processing of raw eye-tracking data from both devices—collected simultaneously—involved the processes described below.
Calibration
During the 25-second calibration phase—which preceded every test—approximately 1,500 individual eye-tracking samples were collected from the iPad over different locations on screen (which covered every width and height coordinate separately at least once). Each sample, generated using ARKit 2, included a timestamp (in milliseconds), estimates of the yaw and pitch of the individual eyes, the closure of the individual eyelids (0 fully open, 1 fully closed), the location of the calibration stimulus on screen (in screen coordinates), and the distance of the user from the screen. Two polynomial curves (of order 1, 2, or 3) were used to map the eyes' mean yaw estimate to the x coordinate of the calibration stimulus, and pitch to the y coordinate, respectively. The order of curve that produced the lowest mean absolute error between predicted gaze and stimulus position was selected. Calibration of the Tobii was carried out by the Tobii Eye-tracking Software using an inbuilt five-point calibration before the experiment. To ensure the accuracy of the eye-tracking data, the quality of Tobii's eye-tracking was assessed before each test by running the Tobii test environment. This involved asking the user to look at nine circles (of diameter 2°) placed across the screen while the experimenter observed their gaze to judge its accuracy. If the observer's gaze was judged to fall outside any of the circles, the Tobii was recalibrated. Finally, the distance of the user (estimated by the iPad) and the gaze predictions (in screen coordinates) from both the Tobii and the iPad were used to convert the eye-tracking data into degrees for further analysis.
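To make the mapping step concrete, the sketch below shows one way to implement the polynomial order selection described above, assuming the calibration samples are available as NumPy arrays; the function and variable names are ours, not part of ARKit.

```python
import numpy as np

def fit_gaze_axis(angle, target, orders=(1, 2, 3)):
    """Fit polynomials of order 1, 2, and 3 mapping one eye angle (mean
    yaw or pitch) to one screen coordinate of the calibration stimulus;
    keep the order with the lowest mean absolute error."""
    best_coeffs, best_mae = None, np.inf
    for order in orders:
        coeffs = np.polyfit(angle, target, order)
        mae = np.mean(np.abs(np.polyval(coeffs, angle) - target))
        if mae < best_mae:
            best_coeffs, best_mae = coeffs, mae
    return best_coeffs, best_mae

# One map per screen axis, fitted on the ~1,500 calibration samples:
# x_map, _ = fit_gaze_axis(mean_yaw, stim_x)    # yaw   -> screen x
# y_map, _ = fit_gaze_axis(mean_pitch, stim_y)  # pitch -> screen y
```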
Blink removal
iPad data were categorized as blinks when the average closure of the two eyelids exceeded 0.3 (this measure is scaled from 0.0 to 1.0, from open to closed). Tobii's inbuilt algorithm automatically categorizes samples as blinks when appropriate. For both devices' data, samples categorized as blinks were removed and filled in via simple linear interpolation based on neighboring non-missing values. Note that when the iPad and the Tobii disagreed about whether a sample was a blink, we did not attempt to resolve the conflict. This allowed the iPad to offer an independent preprocessing approach that more closely reflected real-world scenarios.
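A minimal sketch of this step for the iPad data, assuming per-sample eyelid-closure estimates and a single gaze channel as NumPy arrays (names are ours); the same interpolation is applied to each gaze coordinate, and the smooth pursuit analysis later reuses this pattern to replace saccadic speed samples.

```python
import numpy as np

def remove_blinks(gaze, closure_left, closure_right, threshold=0.3):
    """Flag samples as blinks when mean eyelid closure exceeds the
    threshold, then fill them by linear interpolation across
    neighboring non-blink samples."""
    gaze = np.asarray(gaze, dtype=float).copy()
    blink = (np.asarray(closure_left) + np.asarray(closure_right)) / 2 > threshold
    valid = ~blink
    gaze[blink] = np.interp(np.flatnonzero(blink),
                            np.flatnonzero(valid), gaze[valid])
    return gaze
```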
Resampling and temporal alignment
The iPad dataset was first linearly interpolated to match the 90-Hz sampling rate of the Tobii dataset. To sync the iPad eye-tracking data with that of the Tobii, participants were asked to press the space bar on the laptop running the Tobii when the iPad test began. To account for reaction time, a shift or delay (maximum of 0.5 seconds) was applied to the Tobii data, with that value being set to minimize errors between the gaze predictions from the iPad and the Tobii.
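The sketch below illustrates the resampling and shift search as we read it: the iPad trace is interpolated onto the Tobii timestamps, and delays of up to 0.5 seconds are tested on the Tobii data to minimize the mean absolute difference between the traces (array names, the search grid, and the sign convention of the shift are assumptions).

```python
import numpy as np

def align_traces(ipad_t, ipad_g, tobii_t, tobii_g, max_shift=0.5):
    """Resample the iPad gaze trace onto the Tobii's 90-Hz timestamps,
    then find the Tobii delay (up to 0.5 s) minimizing the mean
    absolute error between the two traces."""
    ipad_resampled = np.interp(tobii_t, ipad_t, ipad_g)
    best_shift, best_err = 0.0, np.inf
    for shift in np.arange(0.0, max_shift, 1.0 / 90.0):
        shifted = np.interp(tobii_t, tobii_t + shift, tobii_g)
        err = np.mean(np.abs(shifted - ipad_resampled))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift, ipad_resampled
```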
Adaptive speed thresholding
To divide data into saccades and fixations, a speed threshold was then calculated on a test-by-test basis, separately for iPad and Tobii data. Our algorithm was inspired by the saccade detection algorithm proposed by Nyström and Holmqvist (2010). Part of their solution involved an adaptive speed threshold that made saccade detection less sensitive to variations in noise level. In their approach, the threshold was calculated iteratively as a function of the mean plus six standard deviations. Although Nyström et al.'s algorithm outperformed all nine alternative algorithms in a recent review (Andersson et al., 2017), it failed to classify saccades when pilot tested on our iPad data. In almost all cases, this result was because the high level of intrinsic noise in iPad data inflated the speed threshold, so that only very large amplitude eye movements were classified as saccades (Figure 1b). To manage this issue, we adapted Nyström et al.'s algorithm using the median speed plus three scaled median absolute deviations (MADs) as a threshold (Equation 1). Both the use of MADs, a more robust measure of dispersion (Leys, Ley, Klein, Bernard, & Licata, 2013), and the multiple of three ensured that the threshold was less biased by outliers and noise (Figure 1b). Unlike Nyström et al., we also chose to use both speed and acceleration as inputs, because both derivatives have been used in tandem or separately to classify saccades in previous studies (Behrens, MacKeben, & Schröder-Preikschat, 2010; Friedman, Rigas, Abdulin, & Komogortsev, 2018; König & Buffalo, 2014) (Figure 1a).
Figure 1.
 
Plots relating the speed and acceleration of eye-tracking data collected from the iPad for a participant performing Reading task A. (a) Plot of eye speed vs eye acceleration. Red discs show data that exceeded the acceleration and speed thresholds (shown as red lines in (b), (c), and (d)), with remaining points (blue discs) being classed as fixations. (b) Plot of speed vs time, showing the speed threshold calculated by our adaptive algorithm in red and that of Nyström et al.'s in gray. Notice how our algorithm sets the threshold lower to include the low-speed saccades expected when jumping between words within a sentence, whereas Nyström et al.'s approach only classifies the high-speed saccades expected when jumping to a new line. (c) Acceleration vs time plot, showing the acceleration threshold calculated by our adaptive algorithm in red. (d) A close-up of a speed vs time plot for a participant performing Reading task A. It shows eye-speed data from the iPad (blue) and Tobii (black; shifted up by 100 deg/s for clarity). Horizontal green bars indicate fixation intervals, determined by each dataset's unique speed/acceleration threshold (red) and additional processing detailed under 'Additional processes specific to tasks.' Note the similarities in speed profiles between the iPad and Tobii devices, except for a 31- to 32-second period, where the iPad's failure to capture abrupt speed changes results in the merging of multiple fixations. (e) Distribution of fixation durations across both devices for the same participant. Some of the longer fixations (>0.5 seconds) captured by the iPad are attributable to a failure to capture abrupt speed changes, as discussed in (d). Excluding these longer fixations, the distributions of fixation durations from both devices are similar (mean fixation duration = 167 ms; σ ≈ 90 ms for fixations lasting <0.5 seconds).
Once our algorithm set the first speed/acceleration threshold as a function of the median + 3 MADs, it then, like Nyström et al.'s algorithm, iteratively calculated the next threshold as a function of the median + 3 MADs of the data below the previous threshold. This was repeated until the threshold converged (Equations 1 and 2).
\begin{equation}\theta = \tilde{x} + 3 \times s \times \mathrm{MAD}(x) \quad (1)\end{equation}

\begin{equation}\mathrm{MAD} = \mathrm{median}\left( \left| x_i - \tilde{x} \right| \right) \quad (2)\end{equation}
where θ is the speed or acceleration threshold, \(\tilde x\) is the median of the input (below the previous threshold), and s is equal to 1.4826 to scale MAD, assuming the underlying distribution is normal. 
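A compact sketch of this iterative rule in Python (our restatement of Equations 1 and 2; the convergence tolerance and function name are our choices):

```python
import numpy as np

def adaptive_threshold(samples, n_mads=3, s=1.4826, tol=1e-3):
    """Iteratively set threshold = median + 3 * s * MAD, recomputed each
    time over only the samples below the previous threshold, until the
    threshold converges (Equations 1 and 2)."""
    x = np.asarray(samples, dtype=float)
    theta = np.inf
    while True:
        below = x[x < theta]
        med = np.median(below)
        mad = np.median(np.abs(below - med))
        new_theta = med + n_mads * s * mad
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta

# Applied separately to the speed and acceleration traces; samples that
# exceed both thresholds are treated as candidate saccade samples (Figure 1a).
```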
In Figure 1d, we assessed the algorithm's performance by comparing the fixations detected by both devices (fixation intervals are denoted by horizontal green bars). Notice that the iPad fails to capture sudden speed changes between 31 and 32 seconds, resulting in the merging of multiple fixations. This observation may help to explain why the iPad records a limited number of longer fixations (>0.5 seconds) when compared with the Tobii, as depicted in Figure 1e. Excluding these longer fixations, both distributions exhibit a similar mean duration of 167 ms, with standard deviations of approximately 90 ms.
Saccade amplitude thresholding
Note that, of the eye movements classified by the algorithm as saccades, only those with an amplitude greater than or equal to 1.0° were selected for analysis in the fixation and letter saccade tasks. For the reading tasks, only positive saccades (moving right) with an amplitude greater than or equal to 0.5° were accepted, in keeping with previous literature (Rayner, 1998). Leftward saccades with an amplitude less than 1.0° were classified as regressions, and those greater than 1° as line sweeps (moving to the next line). For the smooth pursuit task, no saccade amplitude threshold was applied.
Additional processes specific to tasks
Fixation task
For eye-tracking data collected during display of the fixation marker, only the middle 1 second (of the 3-second sequence) was used to calculate fixation related metrics such as fixation stability. This was done to remove any delayed pro-saccades or corrective saccades made by the participant, which would artificially inflate fixation instability. 
Letter saccade task
Any saccades with a y component exceeding 5.0° were discounted, because these measurements were considered blinks that had not been successfully identified before. The 5° threshold was selected upon visual inspection of the data across participants. 
Reading task
Any saccades with a y component exceeding 2.0°, regressions with a y component exceeding 0.5°, and line sweeps with a y component exceeding 3.0° were removed, because these were considered blinks that had not been identified previously. To establish these thresholds, we visually examined the eye-tracking data across all participants and identified blinks that were misclassified. Our goal was to capture as many of these misclassified blinks as possible while preserving other eye movements. Also, only fixations longer than 50 ms were included, based on previous literature (Niefind & Dimigen, 2016).
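Putting the amplitude rules and these y-component blink filters together, a sketch of a reading-task event classifier consistent with the thresholds above (the function itself is our illustration, not the authors' code):

```python
def classify_reading_movement(dx, dy):
    """Classify one detected saccadic movement in the reading tasks by
    its horizontal (dx) and vertical (dy) components, in degrees.
    Returns None for events discarded as unidentified blinks or as
    sub-threshold movements."""
    if dx >= 0.5:                 # rightward reading saccade
        return "saccade" if abs(dy) <= 2.0 else None
    if -1.0 < dx < 0:             # small leftward movement
        return "regression" if abs(dy) <= 0.5 else None
    if dx <= -1.0:                # large leftward movement to next line
        return "line_sweep" if abs(dy) <= 3.0 else None
    return None
```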
Smooth pursuit task
Speed samples belonging to saccades in the smooth pursuit task were removed and replaced via linear interpolation across neighboring non-missing values. This is common practice in smooth pursuit studies, as it ensures the mean speed is not artificially increased because of saccadic movement (Ebisawa, Minamitani, Mori, & Takase, 1988; Murray et al., 2020). 
Metrics
Table 1 breaks down the various eye-tracking metrics we collected for each task, along with current or proposed clinical applications of these metrics. 
Table 1.
 
Eye-tracking metrics calculated from various tasks in this study, and their potential clinical applications. Fixation stability was quantified using the bivariate contour ellipse area (BCEA; Bellmann et al., 2004; Tarita-Nistor, González, Mandelcorn, Lillakas, & Steinbach, 2009). Saccade amplitude was calculated by taking the Euclidean distance between the first and last points of the saccade. Saccade speed was derived from this amplitude. Specifically, we first calculated the instantaneous speed (the distance travelled between adjacent samples divided by the time elapsed between samples) for every sample. We then categorized speed estimates as "saccade" or "fixation" based on a threshold that was set for each participant. We calculated the average saccade speed by averaging instantaneous speed estimates within a set of samples characterized as a saccade (note that some sequences of speeds that exceeded the saccade threshold may have been excluded as outliers; see Methods).
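For reference, a sketch of the BCEA computation as commonly defined in the fixation-stability literature cited above (BCEA = 2kπσxσy√(1 − ρ²), with k = −ln(1 − P), so k ≈ 3.0 for a 95% contour); this is our illustration, not the authors' exact code.

```python
import numpy as np

def bcea(x, y, p=0.95):
    """Bivariate contour ellipse area (deg^2) enclosing proportion p of
    gaze samples x, y (in degrees), assuming bivariate normality."""
    k = -np.log(1 - p)                    # ~3.0 for the 95% contour
    sd_x, sd_y = np.std(x, ddof=1), np.std(y, ddof=1)
    rho = np.corrcoef(x, y)[0, 1]         # x-y correlation
    return 2 * k * np.pi * sd_x * sd_y * np.sqrt(1 - rho ** 2)
```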
Results
Fixation task
Eye-tracking measures from the iPad were considerably less accurate than those measured with the Tobii (compare Figures 2a and 2b). On average, the iPad predicted gaze with an absolute error of 3.61°, compared with the Tobii's 0.75° (Figure 2c). These measures align with previous findings that estimated an accuracy of 3.18° (Greinacher & Voigt-Antons, 2020) for the iPad ARKit framework and an accuracy of less than 0.6° (Gibaldi et al., 2017) for an equivalent Tobii eye-tracker. It is important to note that the error between measured gaze and the target position does not solely stem from tracking accuracy; participants' fixation stability also plays a role. Note that one observer was classed as an outlier based on a high level of inaccuracy in eye-tracking data from the Tobii device (red point in Figure 2c). Outliers were defined as values falling outside the range of the median ± 3 × 1.4826 × MAD (cf. Equation 1). In this instance, more than 80% of the participant's eye-tracking data collected were invalid as a result of their eyes being absent from the scene visible to the Tobii device. This participant's measures of fixation stability and number of saccades were also detected as outliers and excluded from the correlation analyses.
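This outlier rule can be written as a two-sided version of the MAD criterion from Equation 1; a sketch (the function name is ours):

```python
import numpy as np

def mad_outliers(values, n_mads=3, s=1.4826):
    """Flag values outside median +/- 3 * s * MAD (cf. Equation 1)."""
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    return np.abs(v - med) > n_mads * s * mad
```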
Figure 2.
 
Eye-tracking data recorded from the iPad and the Tobii during the fixation task. (a and b) A single participant's eye-tracking data. Large, colored discs indicate the location of fixation markers, with corresponding eye-tracking data (recorded while a given fixation-marker was on-screen) presented in the same color. Note that fixation markers contain two colors; darker and lighter colors denote data collected in the first and second runs, respectively. Note the inaccuracy and imprecision of iPad eye-tracking. (c) Box and whisker plot of individuals’ average mean absolute error (between fixation data and fixation markers) for the iPad and the Tobii (t test; t = 10.57; p < 0.001). Outliers are marked with a thick red outline both here and in parts (d) and (e). (d) Plot of individuals’ average fixation stability (BCEA) for Tobii versus iPad measures. Lines of best fit are derived using linear regression and values of R are based on Spearman's rank correlation. (e) Plot of the individuals’ total number of saccades for the Tobii against that of the iPad, using the same plotting conventions as (d).
We also measured an average retest error of 1.99° across participants on the iPad vs 0.43° on the Tobii device. Retest error was quantified as the mean absolute error between fixation eye-tracking data measured from the first run and from the second run for identical fixation marker locations. Note that retest error could stem from either device noise or behavioral variability (e.g., attentional fluctuation, varying head position, physiological changes such as varying pupil size), or both. However, the lower retest error for the Tobii device suggests that behavioral variability across the two test sessions is low. Thus, the greater error observed in iPad data must predominantly be attributable to device noise. Poor precision is not uncommon for data derived using software-only eye-tracking (Gómez-Poveda & Gaudioso, 2016). Figure 2a shows an example of the translation and shearing to which iPad eye-tracking data were subject.
Figures 2d and 2e show that—despite the poor accuracy and precision of eye-tracking data measured with the iPad compared with the Tobii—there is a moderate but significant positive correlation between these devices' estimates of both fixation stability and the number of saccades. Note that, in Figure 2d, three participants (including the individual mentioned above) were excluded from correlation analysis based on their poor estimated fixation stability. A review of the data indicated that all three participants made a large number of saccades while trying to fixate (these three individuals also exhibit large numbers of saccades, as highlighted in Figure 2e). We suspect that our algorithm estimated large numbers of saccades for these individuals because they a) blinked more frequently and/or b) were less attentive to the fixation marker. Although our algorithm sought to separate saccades from fixation data, it did not remove saccades smaller than 1° in amplitude. The presence of low-amplitude saccades in our fixation data may then have increased estimated fixation instability. For participants not classified as outliers (Figure 2d, colored discs with no red outline), the average fixation stability (BCEA95%) across participants was 2.36 deg2 and 1.14 deg2, estimated using the iPad and the Tobii, respectively. These values are comparable with the results of a previous study of adolescents presented with a 3-second duration stimulus similar to our own (Pueyo et al., 2020).
Finally, compared with the ground-truth estimates from the Tobii device, our adaptive speed threshold algorithm applied to iPad data predicted the number of saccades with an absolute mean error of 13 ± 18 saccades.
Letter saccade task
Figures 3a and 3b show eye-tracking data—from the iPad and Tobii devices, respectively—for the letter saccade task, in which participants moved their eyes from one side of the screen to the other, working their way down the screen as quickly as possible. We used the letter saccade task to understand how well the iPad characterized large-amplitude saccades. Previous physiological studies have used saccade characteristics such as peak and average speed and amplitude to signal mental workload (Di Stasi et al., 2010) and to classify mental states (Kardan, Berman, Yourganov, Schmidt, & Henderson, 2015).
Figure 3.
 
Plots representing the eye-tracking data recorded from the iPad and Tobii devices during performance of the Letter Saccade task. (a and b) Plot of an individual participant's eye-tracking data recorded from the devices. The colored discs represent averaged points of fixation, where the shift in color gradient reflects the order of fixation (red, first fixation; blue, last fixation). The lines represent saccadic movement between the fixations and follow the same color convention as the discs. The letters are arranged in the same locations as they were in the test for this participant but are not to scale. (c–e) Plots of the individuals' average saccadic amplitude, average saccadic speed, and average peak saccadic speed recorded from the Tobii device against those of the iPad, respectively. The error bars on subplots (d) and (e) represent the median absolute deviations (MAD), providing a measure of variability in the data.
Despite the iPad's poor estimation of gaze location, the number of saccades measured using the letter saccade task showed a moderate but significant correlation (R = 0.46, p = 0.015) across the Tobii and iPad devices. Further, the amplitude of saccades estimated by the two devices was significantly correlated (Figure 3c). However, the iPad data showed substantially more variation in estimated saccade length across participants (σ = 2.42°) compared with the Tobii (σ = 0.66°), despite a constant 11° distance between the two columns across participants. The iPad also underestimated saccade length (average = 8.56°), likely as a result of limited accuracy at the left and right extremities of the screen. Eye-trackers commonly lose view of the pupil when the eye rotates too far to the right or left (Valliappan et al., 2020). Furthermore, for the iPad data, we noticed that saccades made in the lower half of the screen had lower average amplitudes (7.97°) than those in the upper half (9.18°). We suggest this result may originate from eyelids partially covering the pupil when participants were looking toward the bottom half of the screen.
Average saccade speed across participants was correlated significantly (Figure 3d). However, like saccade amplitude, average saccade speed was also underestimated by the iPad (average = 47.03°/s) when compared with the Tobii (average = 72.89°/s) across participants. Peak saccade speed was also correlated across the two devices (Figure 3e). Examining this plot, it is evident that the iPad was unable to measure speeds higher than 400°/s, while the Tobii continued to measure peak speeds up to 600 to 800°/s. We suspect that the sampling rate of the iPad limited how accurately it could measure peak speeds. This finding aligns with an earlier study (Enright, 1998), which found a 10% decrease in measured peak speed for 10° saccades, and a near 20% decrease for 5° saccades when eye-tracking data were collected at 60 Hz. Subsequent research (Mulligan, 2008) has tackled the challenge of velocity measurement with low sampling rates by transforming the measurement of saccade velocity into a measurement of the angle between two IR line segments reflected by the cornea. We find a more pronounced reduction in measured peak speed (45% ± 14%) compared to Enright's findings. We propose that the iPad's tendency to underestimate saccade length, as previously discussed, contributes to this effect. The combination of underestimation and the lower sampling rate of the iPad leads to the observed decrease in measured peak speed. 
Finally, note that in Figure 3e two of the six outlier participants (all denoted with the red outline) exhibited extremely high peak saccade speeds as estimated using the iPad. Our analysis of their eye-tracking data produced poor predictions of their gaze along the horizontal meridian, exceeding the limits of the screen by more than 20°. This discrepancy was observable only when participants lowered their eyes—to view the lower half of the screen—which is consistent with the cause of this problem being occlusion of the iris/pupil by the eyelid. Note that, for one participant with an average peak saccade speed of approximately 1,400°/s, our analysis could only classify three saccades from the iPad eye-tracking data (vs. 15 saccades from the Tobii data). Upon visual inspection, it became clear that the participant had poorly calibrated iPad eye-tracking data. This result shows how prone the iPad is to catastrophic eye-tracking failure when not well calibrated. 
Reading tasks (A vs B)
Reading tasks A and B involved the presentation of paragraphs (similar in language and difficulty to one another) at different text sizes. Unlike the letter saccade task, which was intended to elicit large-amplitude saccades, both reading tasks elicited smaller amplitude saccades (average = 1.53° for reading A and 2.03° for reading B estimated from Tobii eye-tracking data; average = 2.16° for reading A and 2.50° for reading B estimated from iPad eye-tracking data). As such, the reading tasks measured the efficacy of iPad eye-tracking in detecting small-amplitude saccades and their characteristics. Furthermore, reading was of interest because it is a commonly employed eye-tracking task used to classify participants with neurodevelopmental disorders such as dyslexia (Rello & Ballesteros, 2015) or ocular diseases such as macular degeneration (Rubin & Feely, 2009). Figures 4a and 4b show examples of eye-tracking recorded during the reading task.
Figure 4.
 
Eye-tracking data recorded from the iPad and the Tobii during Reading task B. (a and b) An individual participant's eye position: open red discs are points of fixation, red lines are saccades, blue lines are regressions, and grey lines are line sweeps. (c–f) Plots of the individuals' total number of regressions, average saccadic amplitude, average saccade speed, and average saccadic peak speed recorded from the Tobii against those of the iPad, respectively. (g) Plot of the individuals' average saccade speed recorded from the Tobii against the individuals' average reading speed recorded from the iPad. (h) Plot of the individuals' average peak saccade speed recorded from the Tobii against the individuals' average reading speed recorded from the iPad. Plotting conventions are the same as in Figure 2d.
For both reading tasks, the numbers of fixations, saccades, and regressions were moderately but significantly correlated between the two devices (Table 2 details the comparison). On average, the Tobii detected approximately 100 saccades per passage. This is equivalent to an average saccade jump size of approximately nine characters, which is in line with previous findings (Blais et al., 2009). The iPad on average detected only approximately 50% of the total number of saccades detected by the Tobii for both reading tasks. This was considerably less than for the letter saccade task, where the iPad detected more than 85% of all saccades. This outcome is consistent with iPad eye-tracking being insensitive to smaller ballistic movements; indeed, the saccades recorded by the iPad had a larger average amplitude (average = 2.16°) than those of the Tobii (average = 1.53°), as exemplified in the analysis of reading A.
Table 2.
 
Spearman's rank correlation coefficients and significance levels for the number of fixations, number of saccades, and number of regressions measured across the two devices across all participants. Row 1 refers to measures for reading A and row 2 refers to measures for reading B. Note that all measures are moderately but significantly correlated between the two devices for both reading tasks.
The iPad also detected only approximately 15% to 20% of all regressions detected by the Tobii for both reading tasks (Figure 4c). Regressions on average had an amplitude of approximately 0.60° for both reading tasks (estimate derived from the Tobii recordings). Again, the internal noise and limited sampling rate of the iPad recordings are likely responsible for this insensitivity to smaller regressions.
Unlike the letter saccade task, saccade amplitude (Figure 4d) did not correlate between the two devices for either reading task; this discrepancy is likely attributable to the iPad's decreased sensitivity to smaller eye movements, leading to less reliable measurements. We consider this issue below in the Discussion.
Average saccade speed (Figure 4e) and peak saccade speed (Figure 4f) also did not correlate significantly between the devices for either reading A or reading B. Additionally, we compared the distribution of saccade speeds measured from each device for each participant (shown in Appendix A) performing reading task B. To assess the disparities between these distributions, we applied a two-sample Kolmogorov-Smirnov test. Only 8 of the 28 participants exhibited saccade speed distributions that were not statistically significantly different between the two devices. We suspect the iPad was unable to track high-speed, low-amplitude saccades as a result of its limited temporal sampling and lower spatial resolution. However, in the case of reading, other proxies may be used to estimate relative saccade amplitude and speed. Correlating reading speed (derived from the iPad) with average saccadic speed (Figure 4g) and average peak saccadic speed (Figure 4h) from the Tobii, we note a moderate but significant correlation between measures from the two devices. As a result, reading speed is a proxy that could be used to estimate both relative saccade amplitude and saccade speed across participants. Previous work suggests that slower readers typically make shorter saccades (Rayner, Slattery, & Bélanger, 2010), so their saccades will have a lower speed.
Smooth pursuit
Smooth pursuit eye movements are frequently used to screen for impaired oculomotor function, especially in concussed patients (Murray et al., 2020) and people with schizophrenia (Thaker et al., 2003). Figures 5a and 5b compare an individual participant's eye-tracking data from the two devices. As expected, the iPad recording only loosely tracked the participant's pursuit path and showed considerable variation in eye position across multiple sweeps of the same stimulus path; the Tobii recording offered more accurate and precise tracking. Across participants, the iPad on average predicted gaze coordinates with a mean absolute error of 2.75° when compared against the stimulus position, whereas the Tobii had an average tracking error of 0.76°. Note that these errors were computed after both devices' recordings had been time-shifted to account for any reaction-time lag. 
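This alignment step can be implemented as a simple lag search against the stimulus trace. Below is a minimal sketch, assuming uniformly sampled, one-dimensional gaze and stimulus traces in degrees; the function and variable names are ours, for illustration, not from the study's code.

```python
# Minimal sketch: shift the gaze trace by the latency (in samples) that
# maximizes its correlation with the stimulus trace, then report the
# mean absolute tracking error over the overlapping samples.
import numpy as np

def aligned_mae(gaze, stimulus, max_lag_samples):
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag_samples + 1):
        g, s = gaze[lag:], stimulus[:len(gaze) - lag]
        r = np.corrcoef(g, s)[0, 1]       # correlation at this candidate lag
        if r > best_r:
            best_lag, best_r = lag, r
    g, s = gaze[best_lag:], stimulus[:len(gaze) - best_lag]
    return np.mean(np.abs(g - s)), best_lag
```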
Figure 5.
 
Eye-tracking data from the iPad and the Tobii recorded during smooth pursuit. (a and b) An individual participant's eye-tracking data, where the red line is the stimulus path and the blue discs are estimated eye positions. (c) Box and whisker plot of the average absolute horizontal amplitude recorded from both devices. Each colored disc represents the score of a participant, and the horizontal red line is the average amplitude of the stimulus. (d) Box and whisker plot of the average absolute vertical amplitude recorded from the two devices; conventions are the same as in subplot c. Note the considerably wider range of amplitudes derived from the eye-tracking recorded from the iPad. (e and f) Box and whisker plots of the average smooth pursuit eye movement horizontal gain and vertical gain, respectively, recorded from the two devices; conventions are the same as in subplot c. (g and h) Plots of the individuals' total number of saccades and average saccade amplitude, respectively, recorded from the Tobii against that of the iPad. For plots g and h, plotting conventions are the same as in Figure 2d.
As shown in Figures 5c and 5d, the mean amplitude (averaged across all participants) was fairly close to the actual amplitude of the stimulus (red line) for both the horizontal and vertical measures from both devices. However, the iPad showed considerable variation across participants in both measures: whereas the Tobii exhibited a standard deviation across participants of 0.54° in horizontal amplitude and 0.60° in vertical amplitude, the iPad exhibited standard deviations of 2.60° and 3.52°, respectively. The iPad's poor eye-tracking capability likely explains this large variation in amplitude measures across participants. 
As for smooth pursuit eye movement gain, we observed that the iPad, on average, recorded a horizontal gain of 75% across participants, whereas the Tobii registered a gain of close to 90% (Figure 5e). However, the interquartile range for horizontal gain was comparable between the two devices, indicating that the iPad introduced minimal additional variation or noise. This was not the case for vertical gain, where the iPad's interquartile range was more than double that of the Tobii (Figure 5f). In fact, many participants exhibited vertical smooth pursuit gains exceeding 1 when recorded via the iPad, partly because our algorithm failed to accurately identify blinks and downward saccades. 
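For concreteness, one common definition of pursuit gain is the ratio of mean eye speed to mean target speed over the pursuit epoch. A minimal sketch follows, assuming saccade and blink samples have already been excised (which, as noted above, is exactly where our iPad pipeline sometimes failed, inflating gains above 1):

```python
# Minimal sketch: pursuit gain as mean eye speed over mean target speed,
# computed separately for the horizontal and vertical components.
# Assumes saccade/blink samples were already removed from the traces.
import numpy as np

def pursuit_gain(eye_pos, target_pos, dt):
    """1-D position traces in degrees; dt = sample interval in seconds."""
    eye_speed = np.abs(np.diff(eye_pos)) / dt
    target_speed = np.abs(np.diff(target_pos)) / dt
    return eye_speed.mean() / target_speed.mean()

# gain_h = pursuit_gain(eye_x, target_x, 1 / 60)  # horizontal component
# gain_v = pursuit_gain(eye_y, target_y, 1 / 60)  # vertical component
```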
Smooth pursuit is the only task in which the total number of saccades across participants did not correlate significantly between the two devices (Figure 5g). However, this task also elicited, on average, the lowest-amplitude saccades (0.72°, according to estimates derived from the Tobii device). As such, the iPad's internal noise and limited sampling rate would have masked several saccades in a recording. 
Measures from the iPad did, however, show a significant correlation with measures from the Tobii for average saccadic amplitude (Figure 5h). This finding is surprising, considering that estimates of saccade amplitude from the iPad were not significantly correlated with Tobii measures in the reading tasks (which elicited larger-amplitude saccades). 
Discussion
We examined the feasibility of performing eye-tracking using an iPad tablet, a ubiquitous consumer computing device with inbuilt hybrid (software/hardware) eye-tracking capabilities. We report that eye-tracking data from the iPad were considerably less accurate and precise than data from a commercial (Tobii 4c) eye tracker. However, despite a mean absolute error of approximately 2° to 3° for iPad estimates of gaze position (approximately fourfold worse than the Tobii device), the iPad's estimates of the number of saccades, fixations, and regressive saccades (applicable only to the reading tasks) correlated with the measures derived from the Tobii device for all tasks but smooth pursuit. Although correlation provides valuable insight into the relationship between measurements, it is equally important to consider the average difference between the estimates derived from each device. For instance, in both reading tasks, the iPad recorded only about 50% of the fixations and saccades and about 15% of the regressive saccades detected by the Tobii device; the iPad's counts of fixations and saccades were comparable to the Tobii's only in the fixation and letter saccade tasks. Consequently, although the measurements correlate, using the iPad to measure saccades and regressive saccades during reading could lead to a substantial number of misses and underestimations. 
We also note that saccade amplitude and speed (mean and peak) were significantly correlated between devices for tasks that elicited large-amplitude saccades (i.e., the fixation stability task and the letter saccade task). For the reading tasks (where saccade amplitude and speed did not correlate between the two devices), we suggest that reading speed derived from the iPad acts as a robust proxy for saccade characteristics (r ≈ 0.6, p < 0.01). 
Last, for the smooth pursuit task, we noted that the iPad data typically yielded lower horizontal gain than the Tobii device data. Additionally, there was notable variation in the vertical gain estimates derived from the iPad data across different participants. Refer to Table 3 for a summary of our results. 
Table 3.
 
Correlation, significance levels, and ratio of eye movement measures between the iPad and the Tobii across all tasks. N/A, not available. Significance levels (p values) are marked with an asterisk for values <0.05 (indicating statistical significance). The iPad/Tobii Estimate Ratio is calculated by dividing the average of the iPad estimates by the average of the Tobii estimates. A ratio of 1 indicates equal estimation, a ratio of <1 indicates underestimation by the iPad, and a ratio of >1 indicates overestimation by the iPad.
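The ratio column of Table 3 is simple to reproduce; a minimal sketch, where the arguments are per-participant estimates of a given measure from each device:

```python
# Minimal sketch: the iPad/Tobii estimate ratio used in Table 3.
# A ratio < 1 means the iPad underestimates the measure on average.
import numpy as np

def estimate_ratio(ipad_values, tobii_values):
    return np.mean(ipad_values) / np.mean(tobii_values)
```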
We note that our dataset contained outliers (approximately 5%–20% of observers, depending on the test) which, if not removed, would have heavily influenced the correlation metrics. For example, three participants' average fixation stability measures were removed from the correlation analysis in the fixation stability task; six participants' peak saccade speed measures were removed in the letter saccade task; and, in both reading tasks, data from one or more participants had to be omitted across various eye movement measures. Typically, these outliers were caused by gaze predictions falling outside the area of interest for a given task (e.g., outside the text regions for the reading task). Such poor predictions arose from the iPad system's poorer tolerance of eye occlusion, rapid head movements, and blinking, which the Tobii system could overcome using its infrared tracking. We propose monitoring the percentage of eye-tracking samples falling outside the area of interest during testing, along with allowing participants to retake the test where deemed necessary, as a practical means of decreasing the number of outliers. 
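Such monitoring could be as simple as tracking the fraction of gaze samples landing outside a rectangular area of interest. In the minimal sketch below, the AOI layout and the 20% retake threshold are illustrative choices, not values from our protocol:

```python
# Minimal sketch: flag a recording for a retake when too large a fraction
# of gaze estimates falls outside the task's area of interest (AOI).
import numpy as np

def fraction_outside_aoi(x, y, aoi):
    """aoi = (left, top, right, bottom) in screen coordinates."""
    left, top, right, bottom = aoi
    inside = (x >= left) & (x <= right) & (y >= top) & (y <= bottom)
    return 1.0 - np.mean(inside)

def needs_retake(x, y, aoi, max_outside=0.20):  # 20% is illustrative
    return fraction_outside_aoi(np.asarray(x), np.asarray(y), aoi) > max_outside
```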
The presence of outliers underscores the need for researchers to carefully consider the suitability of such devices for their research or clinical applications. Outliers have the potential to substantially degrade the quality of data collected and to restrict subsequent analysis. We therefore recommend conducting pilot tests, in the laboratory, on target populations, comparing the iPad with a dedicated eye-tracking device. This approach can help researchers (a) establish the likely causes of poor eye-tracking performance in their group of interest and (b) decide what threshold might be appropriate for rejecting outlier data collected in the field. Although the cost effectiveness and accessibility of iPads and similar consumer-grade devices are attractive, it is imperative that researchers and practitioners fully understand the associated risks and limitations. 
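One defensible rejection rule, in the spirit of Leys, Ley, Klein, Bernard, and Licata (2013) and of the MAD error bars in Figure 3, marks a participant's measure as an outlier when it lies more than a few scaled median absolute deviations from the group median. A minimal sketch (the cut-off k = 3 is a common conservative choice, not a value prescribed by our study):

```python
# Minimal sketch: MAD-based outlier rejection (Leys et al., 2013).
# The factor 1.4826 scales the MAD so that it estimates the standard
# deviation when the underlying data are normally distributed.
import numpy as np

def mad_outlier_mask(values, k=3.0):
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    return np.abs(values - med) > k * mad  # True where a value is an outlier
```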
Recall that we found the iPad's measurements of saccade-related metrics, such as amplitude and speed, correlated moderately with those obtained from the Tobii device only for large-amplitude saccades, specifically in tasks such as the fixation stability task and the letter saccade task. Understanding the iPad's shortcomings in detecting smaller eye movements is complicated by the fact that the exact mechanisms underlying its eye-tracking capabilities have not been publicly disclosed. It is, however, almost certain that eye-tracking on the iPad relies predominantly on processing of visible-light images, since it does not operate in the dark. The limited spatial resolution of the iPad's front-facing camera (12 MP) will limit spatial precision, both by blurring the demarcation of the iris–sclera boundary and by complicating the detection of eye movements so subtle that they manifest only as changes in intensity across a handful of pixels. The limited temporal resolution (60 Hz) of the front-facing camera means that rapid eye movements may simply be missed: saccades shorter than 33 ms are necessarily undersampled. This factor will be particularly noticeable in reading, which elicits many small saccades (Rayner, 1998); the literature comparing 60-Hz and 120-Hz eye trackers supports this notion by demonstrating reduced saccade detection at the lower rate (Leube & Rifai, 2017). Additionally, even when such movements are detected, longer measurement latency might delay their registration, effectively diminishing the perceived range of eye movements. Beyond limited spatial and temporal resolution, if the system suffers from a low signal-to-noise ratio, it may struggle to distinguish small eye movements from sensor noise. 
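The undersampling point is easy to illustrate with a small simulation: a brief saccade's velocity profile, sampled at 1000 Hz and at 60 Hz. The Gaussian profile and its parameters below are illustrative, not a fitted model of real saccades:

```python
# Minimal sketch: a synthetic ~20 ms saccade (Gaussian velocity profile,
# 300 deg/s peak) sampled at 1000 Hz versus 60 Hz. With ~16.7 ms between
# frames, a 60 Hz tracker catches at most one or two samples of the
# movement, so peak speed is grossly underestimated or the saccade missed.
import numpy as np

t_hi = np.arange(0.0, 0.5, 0.001)                      # 1000 Hz time base (s)
v_hi = 300.0 * np.exp(-((t_hi - 0.255) / 0.004) ** 2)  # velocity in deg/s

t_lo = np.arange(0.0, 0.5, 1.0 / 60.0)                 # 60 Hz time base
v_lo = np.interp(t_lo, t_hi, v_hi)                     # what 60 Hz would see

print(f"true peak: {v_hi.max():.0f} deg/s; 60 Hz peak: {v_lo.max():.0f} deg/s")
```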
The potential influence of head movements on eye-tracking accuracy adds another layer of complexity. Although IR eye trackers maintain close-to-constant relative positions between the pupil and corneal reflection during small head movements of approximately 2 to 3 cm in any direction (Guestrin & Eizenman, 2006), the iPad must accommodate head movements through additional processing (either using the IR camera or additional processing of the visible-light image), which might itself introduce inaccuracies. Consequently, this could elevate susceptibility to errors triggered by head movements, as changes in head position could alter the relative positions of the features used to track eye movements. This, in turn, could add random noise to the signal, further impeding the device's sensitivity to small eye movements. 
When estimating saccade characteristics in tasks that elicit small-amplitude saccades (e.g., reading), we recommend the use of simple proxy measures (such as reading time) from the iPad. Although time can be measured without an eye-tracker, the iPad can still offer a lot in the way of ensuring participant compliance. For example, in scenarios where the stimulus size (in visual degrees) must remain constant, the iPad can accurately measure the participant's distance from the device (Nissen et al., 2023). Despite its modest accuracy, the iPad eye-tracking system can also detect instances when the participant turns completely away from the screen. Additionally, the iPad excels at estimating head position, as demonstrated by its accurate head-pose estimation when using Animoji. Measuring head pose could be valuable for identifying cases of improper use, especially in unsupervised home testing. 
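As an example of the distance-dependent rescaling, holding stimulus size constant in visual degrees reduces to simple trigonometry once the viewing distance is known. The sketch below assumes the distance in centimeters is supplied by the device's face tracking (how it is obtained is outside the scope of this sketch, and the example numbers are illustrative):

```python
# Minimal sketch: on-screen size (in pixels) needed for a stimulus to
# subtend a fixed visual angle at the measured viewing distance.
import math

def size_in_pixels(size_deg, distance_cm, pixels_per_cm):
    size_cm = 2.0 * distance_cm * math.tan(math.radians(size_deg) / 2.0)
    return size_cm * pixels_per_cm

# e.g., a 2 deg target viewed from 40 cm on a ~52 px/cm display:
print(round(size_in_pixels(2.0, 40.0, 52.0)))  # ~73 px
```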
As to whether the iPad is a practical means of collecting large quantities of eye-tracking data outside the laboratory, we need to consider the performance of other current webcam-based eye trackers. Notably, Google's research team (Valliappan et al., 2020) has used software-only eye-tracking to achieve accuracy (0.6°–1° of visual angle) comparable with commercial-grade IR eye trackers. They further validated their approach by reproducing key findings from previous eye movement research on oculomotor tasks (including prosaccade, smooth pursuit, and visual search). Although that study was limited by the 30-Hz frame rate of the front-facing camera, consumer-grade webcams in 2023 offer temporal resolution as high as 90 Hz (Schwarz & Behnke, 2021), and smartphones now often feature slow-motion video recording operating at an effective frame rate of 240 Hz. We are in the process of comparing the performance of the Google system under real-world conditions with various in-house software-based eye-tracking solutions. 
We note that none of the eye-tracking solutions discussed thus far have knowledge of what is on screen. However, if one is concerned less with building a system that objectively maps eye position to gaze position and more with accurately predicting where on the screen the observer is looking, then it makes sense for a gaze prediction system to use all available sources of information, including the visual information onscreen. In this vein, saliency-aware eye-tracking (Geisler, Weber, Castner, & Kasneci, 2020) combines fixation maps and saliency maps (Dupont, Ooms, Antrop, & Van Eetvelde, 2016; Rider, Coutrot, Pellicano, Dakin, & Mareschal, 2018) to generate hybrid heatmaps that predict which parts of a scene an observer is extracting information from. In this scenario, even low-quality or noisy eye-tracking (as might be derived from an iPad) could still prove useful, particularly where the integration of supplementary information sources compensates for the lack of precision in the data. Given the manifest limitations of iPads as a tool for characterizing oculomotor behavior, however, we would highlight the need for proper validation against gold-standard eye-tracking. 
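To make the hybrid-heatmap idea concrete, the sketch below blends a smoothed fixation density map with a saliency map of the same size. The blending weight and smoothing bandwidth are free parameters of our illustration, not values from Geisler et al. (2020):

```python
# Minimal sketch: a hybrid heatmap blending a (possibly noisy) fixation
# density map with an image-derived saliency map of the same shape.
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_heatmap(fix_x, fix_y, saliency, sigma=15.0, alpha=0.5):
    h, w = saliency.shape
    fixmap = np.zeros((h, w))
    for x, y in zip(fix_x, fix_y):            # accumulate fixation counts
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            fixmap[yi, xi] += 1.0
    fixmap = gaussian_filter(fixmap, sigma)   # smooth counts into a density

    def norm(m):
        return m / m.max() if m.max() > 0 else m

    return alpha * norm(fixmap) + (1.0 - alpha) * norm(saliency)
```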
Acknowledgments
Financial support: Robert Leitl Trust and School of Optometry & Vision Science, The University of Auckland; Health Research Council of New Zealand (Grant 19/118 to S. Dakin). 
Commercial relationships: none. 
Corresponding author: Aryaman Taore. 
Email: aryaman.taore@auckland.ac.nz. 
Address: University of Auckland, Private Bag 92019, Auckland 1023, New Zealand. 
References
Andersson, R., Larsson, L., Holmqvist, K., Stridh, M., & Nyström, M. (2017). One algorithm to rule them all? An evaluation and discussion of ten eye movement event-detection algorithms. Behavior Research Methods, 49(2), 616–637, https://doi.org/10.3758/s13428-016-0738-9. [CrossRef] [PubMed]
Anisimov, V., Chernozatonsky, K., Pikunov, A., Raykhrud, M., Revazov, A., Shedenko, K., … Zuev, S. (2021). OkenReader: ML-based classification of the reading patterns using an Apple iPad. Procedia Computer Science, 192, 1944–1953, https://doi.org/10.1016/j.procs.2021.08.200. [CrossRef]
Apple ARKit. ARFaceAnchor – information about the pose, topology, and expression of a face detected in a face-tracking AR session. Available from: https://developer.apple.com/documentation/arkit/arfaceanchor.
Bedell, H. E., & Stevenson, S. B. (2013). Eye movement testing in clinical examination. Vision Research, 90, 32–37. [CrossRef] [PubMed]
Behrens, F., MacKeben, M., & Schröder-Preikschat, W. (2010). An improved algorithm for automatic detection of saccades in eye movement data and for calculating saccade parameters. Behavior Research Methods, 42(3), 701–708, https://doi.org/10.3758/BRM.42.3.701. [CrossRef] [PubMed]
Bellmann, C., Feely, M., Crossland, M. D., Kabanarou, S. A., & Rubin, G. S. (2004). Fixation stability using central and pericentral fixation targets in patients with age-related macular degeneration. Ophthalmology, 111(12), 2265–2270, https://doi.org/10.1016/j.ophtha.2004.06.019. [CrossRef] [PubMed]
Biscaldi, M., Gezeck, S., & Stuhr, V. (1998). Poor saccadic control correlates with dyslexia. Neuropsychologia, 36(11), 1189–1202, https://doi.org/10.1016/S0028-3932(97)00170-X. [CrossRef] [PubMed]
Blais, C., Fiset, D., Arguin, M., Jolicoeur, P., Bub, D., & Gosselin, F. (2009). Reading between eye saccades. PLoS One, 4(7), e6448. [CrossRef] [PubMed]
Bodduluri, L., Boon, M. Y., & Dain, S. J. (2017). Evaluation of tablet computers for visual function assessment. Behavior Research Methods, 49(2), 548–558. [CrossRef] [PubMed]
Brand, J., Diamond, S. G., Thomas, N., & Gilbert-Diamond, D. (2021). Evaluating the data quality of the Gazepoint GP3 low-cost eye tracker when used independently by study participants. Behavior Research Methods, 53(4), 1502–1514. [CrossRef] [PubMed]
Cerrolaza, J. J., Villanueva, A., Villanueva, M., & Cabeza, R. (2012). Error characterization and compensation in eye tracking systems. Proceedings of the symposium on eye tracking research and applications. Santa Barbara, California, March 28–30, 2012.
Choe, K. W., Blake, R., & Lee, S.-H. (2016). Pupil size dynamics during fixation impact the accuracy and precision of video-based gaze estimation. Vision Research, 118, 48–59. [CrossRef] [PubMed]
Chung, S. T. L., Kumar, G., Li, R. W., & Levi, D. M. (2015). Characteristics of fixational eye movements in amblyopia: Limitations on fixation stability and acuity? Vision Research, 114, 87–99, https://doi.org/10.1016/j.visres.2015.01.016. [PubMed]
Cooke, L. (2005). Eye tracking: How it works and how it relates to usability. Technical Communication, 52(4), 456–463.
Cornelissen, F. W., Peters, E. M., & Palmer, J. (2002). The Eyelink Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox. Behavior Research Methods, Instruments, & Computers, 34(4), 613–617.
Cowen, L., Ball, L. J., & Delin, J. (2002). An eye movement analysis of web page usability. In People and Computers XVI-Memorable Yet Invisible (pp. 317–335). Springer.
Dakin, S. C., Mohammadpour Doustkouhi, S., Kersten, H., Turnbull, P. R., Yoon, J., & Danesh-Meyer, H. (2019). Measuring visual field loss in glaucoma using involuntary eye movements. Investigative Ophthalmology & Visual Science, 60(9), 2468–2468. [PubMed]
Dakin, S. C., & Turnbull, P. R. K. (2016). Similar contrast sensitivity functions measured using psychophysics and optokinetic nystagmus. Scientific Reports, 6(1), 1, https://doi.org/10.1038/srep34514. [PubMed]
de Fez, D., Luque, M. J., García-Domene, M. C., Camps, V., & Piñero, D. (2016). Colorimetric characterization of mobile devices for vision applications. Optometry and Vision Science, 93(1), 85–93.
Di Stasi, L. L., Renner, R., Staehr, P., Helmert, J. R., Velichkovsky, B. M., Cañas, J. J., … Pannasch, S. (2010). Saccadic peak velocity sensitivity to variations in mental workload. Aviation Space and Environmental Medicine, 81(4), 413–417, https://doi.org/10.3357/ASEM.2579.2010. [PubMed]
DiCesare, C. A., Kiefer, A. W., Nalepka, P., & Myer, G. D. (2017). Quantification and analysis of saccadic and smooth pursuit eye movements and fixations to detect oculomotor deficits. Behavior Research Methods, 49(1), 258–266, https://doi.org/10.3758/s13428-015-0693-x. [PubMed]
Drewes, J., Masson, G. S., & Montagnini, A. (2012). Shifts in reported gaze position due to changes in pupil size: Ground truth and compensation. Proceedings of the symposium on eye tracking research and applications. Santa Barbara, California, March 28–30, 2012.
Dupont, L., Ooms, K., Antrop, M., & Van Eetvelde, V. (2016). Comparing saliency maps and eye-tracking focus maps: The potential use in visual impact assessment based on landscape photographs. Landscape and Urban Planning, 148, 17–26.
Ebisawa, Y., Minamitani, H., Mori, Y., & Takase, M. (1988). New methods for removing saccades in analysis of smooth pursuit eye movement. Biological Cybernetics, 60(2), 111–119, https://doi.org/10.1007/BF00202898. [PubMed]
Enright, J. (1998). Estimating peak velocity of rapid eye movements from video recordings. Behavior Research Methods, Instruments, & Computers, 30(2), 349–353.
Friedman, L., Rigas, I., Abdulin, E., & Komogortsev, O. V. (2018). A novel evaluation of two related and two independent algorithms for eye movement classification during reading. Behavior Research Methods, 50(4), 1374–1397, https://doi.org/10.3758/s13428-018-1050-7. [PubMed]
Frost-Karlsson, M., Galazka, M. A., Gillberg, C., Gillberg, C., Miniscalco, C., Billstedt, E., … Åsberg Johnels, J. (2019). Social scene perception in autism spectrum disorder: An eye-tracking and pupillometric study. Journal of Clinical and Experimental Neuropsychology, 41(10), 1024–1032, https://doi.org/10.1080/13803395.2019.1646214. [PubMed]
Fukushima, J., Fukushima, K., Morita, N., & Yamashita, I. (1990). Further analysis of the control of voluntary saccadic eye movements in schizophrenic patients. Biological Psychiatry, 28(11), 943–958, https://doi.org/10.1016/0006-3223(90)90060-F. [PubMed]
Geisler, D., Weber, D., Castner, N., & Kasneci, E. (2020). Exploiting the gbvs for saliency aware gaze heatmaps. ACM Symposium on Eye Tracking Research and Applications. ETRA '20 Short Papers: ACM Symposium on Eye Tracking Research and Applications, Article 24.
Gibaldi, A., Vanegas, M., Bex, P. J., & Maiello, G. (2017). Evaluation of the Tobii EyeX Eye tracking controller and Matlab toolkit for research. Behavior Research Methods, 49(3), 923–946. [PubMed]
Gómez-Poveda, J., & Gaudioso, E. (2016). Evaluation of temporal stability of eye tracking algorithms using webcams. Expert Systems With Applications, 64, 69–83, https://doi.org/10.1016/j.eswa.2016.07.029.
Greinacher, R., & Voigt-Antons, J.-N. (2020). Accuracy assessment of ARKit 2 based Gaze estimation. Human-computer interaction. Design and User Experience: Thematic Area, HCI 2020, Held as Part of the 22nd International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings, Part I.
Guestrin, E. D., & Eizenman, M. (2006). General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering, 53(6), 1124–1133.
Hessels, R. S., Andersson, R., Hooge, I. T., Nyström, M., & Kemner, C. (2015). Consequences of eye color, positioning, and head movement for eye-tracking data quality in infant research. Infancy, 20(6), 601–633.
Holland, C., Garza, A., Kurtova, E., Cruz, J., & Komogortsev, O. (2013). Usability evaluation of eye tracking on an unmodified common tablet. In CHI'13 Extended Abstracts on Human Factors in Computing Systems (pp. 295–300). Paris, France, April 27–May 2, 2013.
Holland, C., & Komogortsev, O. (2012). Eye tracking on unmodified common tablets: Challenges and solutions. Proceedings of the Symposium on Eye Tracking Research and Applications. Santa Barbara, California, March 28–30, 2012.
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J. (2011). Eye tracking: A comprehensive guide to methods and measures. Oxford University Press.
Holmqvist, K., Nyström, M., & Mulvey, F. (2012). Eye tracker data quality: What it is and how to measure it. Proceedings of the symposium on eye tracking research and applications. Santa Barbara, California, March 28–30, 2012.
Holzman, P. S. (1974). Eye-tracking dysfunctions in schizophrenic patients and their relatives. Archives of General Psychiatry, 31(2), 143, https://doi.org/10.1001/archpsyc.1974.01760140005001. [PubMed]
Houben, M. M., Goumans, J., & van der Steen, J. (2006). Recording three-dimensional eye movements: Scleral search coils versus video oculography. Investigative Ophthalmology & Visual Science, 47(1), 179–187. [PubMed]
Hunfalvay, M., Murray, N. P., Roberts, C.-M., Tyagi, A., Barclay, K. W., & Carrick, F. R. (2020). Oculomotor behavior as a biomarker for differentiating pediatric patients with mild traumatic brain injury and age matched controls. Frontiers in Behavioral Neuroscience, 14, 581819. [PubMed]
Jacob, R. J. K., & Karn, K. S. (2003) Eye tracking in human-computer interaction and usability research. Mind's Eye, 573–605, https://doi.org/10.1016/B978-044451020-4/50031-1.
Jainta, S., Kapoula, Z., & Miller, J. M. (2011). Dyslexic children are confronted with unstable binocular fixation while reading. PLoS One, 6(4), e18694, https://doi.org/10.1371/journal.pone.0018694. [PubMed]
Kardan, O., Berman, M. G., Yourganov, G., Schmidt, J., & Henderson, J. M. (2015). Classifying mental states from eye movements during scene viewing. Journal of Experimental Psychology: Human Perception and Performance, 41(6), 1502. [PubMed]
Ko, H.-k., Snodderly, D. M., & Poletti, M. (2016). Eye movements between saccades: Measuring ocular drift and tremor. Vision Research, 122, 93–104. [PubMed]
König, S. D., & Buffalo, E. A. (2014). A nonparametric method for detecting fixations and saccades using cluster analysis: Removing the need for arbitrary thresholds. Journal of Neuroscience Methods, 227, 121–131, https://doi.org/10.1016/j.jneumeth.2014.01.032. [PubMed]
Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., … Torralba, A. (2016). Eye tracking for everyone. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, Nevada, June 27–30, 2016.
Larrazabal, A. J., García Cena, C. E., & Martínez, C. E. (2019). Video-oculography eye tracking towards clinical applications: A review. Computers in Biology and Medicine, 108, 57–66, https://doi.org/10.1016/j.compbiomed.2019.03.025. [PubMed]
Leigh, J. R. (2015). The Neurology of Eye Movements. Oxford University Press.
Leube, A., & Rifai, K. (2017). Sampling rate influences saccade detection in mobile eye tracking of a reading task. Journal of Eye Movement Research, 10(3), 10.16910.
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764–766, https://doi.org/10.1016/j.jesp.2013.03.013.
Mourant, R. R., & Rockwell, T. H. (1970). Mapping eye-movement patterns to the visual scene in driving: An exploratory study. Human Factors, 12(1), 81–87. [PubMed]
Mulligan, J. B. (2008). Measurement of eye velocity using active illumination. Proceedings of the 2008 symposium on Eye tracking research & applications. Savannah, Georgia, March 28–28, 2008.
Mulligan, J. B. (2018). Statistical identification of fixations in noisy eye movement data. Electronic Imaging, 30, 1–7.
Murray, N. G., Szekely, B., Islas, A., Munkasy, B., Gore, R., Berryhill, M., … Reed-Jones, R. J. (2020). Smooth pursuit and saccades after sport-related concussion. Journal of Neurotrauma, 37(2), 340–346, https://doi.org/10.1089/neu.2019.6595. [PubMed]
Nguyen, K., Wagner, C., Koons, D., & Flickner, M. (2002). Differences in the infrared bright pupil response of human eyes. Proceedings of the 2002 Symposium on Eye Tracking Research & Applications. New Orleans, Louisiana, March 25–27, 2002.
Niefind, F., & Dimigen, O. (2016). Dissociating parafoveal preview benefit and parafovea-on-fovea effects during reading: A combined eye tracking and EEG study. Psychophysiology, 53(12), 1784–1798. [PubMed]
Nissen, L., Hübner, J., Klinker, J., Kapsecker, M., Leube, A., Schneckenburger, M., … Jonas, S. M. (2023). Towards preventing gaps in health care systems through smartphone use: Analysis of ARKit for accurate measurement of facial distances in different angles. Sensors, 23(9), 4486.
Nyström, M., Andersson, R., Holmqvist, K., & van de Weijer, J. (2013). The influence of calibration method and eye physiology on eyetracking data quality. Behavior Research Methods, 45(1), 272–288, https://doi.org/10.3758/s13428-012-0247-4. [PubMed]
Nyström, M., & Holmqvist, K. (2010). An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42(1), 188–204, https://doi.org/10.3758/BRM.42.1.188. [PubMed]
Papoutsaki, A. (2015). Scalable webcam eye tracking by learning from user interactions. Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. Seoul, Korea, April 18–23, 2015.
Papoutsaki, A., Laskey, J., & Huang, J. (2017). Searchgazer: Webcam eye tracking for remote studies of web search. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval. Oslo, Norway, March 7–11, 2017.
Perkhofer, L., Lehner, O., Davis, F. D., Riedl, R., vom Brocke, J., Léger, P.-M., … Randolph, A. B. (2019). Using gaze behavior to measure cognitive load. Information Systems and Neuroscience, 29, 73–83, https://doi.org/10.1007/978-3-030-01087-4_9.
Pirozzolo, F. J., & Rayner, K. (2013). Dyslexia: The role of eye movements in developmental reading disabilities. In Neuropsychology of Eye Movement (pp. 77–92). Psychology Press.
Pueyo, V., Castillo, O., Gonzalez, I., Ortin, M., Perez, T., Gutierrez, D., … Masia, B. (2020). Oculomotor deficits in children adopted from Eastern Europe. Acta Paediatrica, 109(7), 1439–1444.
Raynal, J. (2019). With great technology comes great responsibility: Why smartphone users' biometric data needs to be protected. Hofstra Law Review, 48, 179.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422, https://doi.org/10.1037/0033-2909.124.3.372. [PubMed]
Rayner, K., Slattery, T. J., & Bélanger, N. N. (2010). Eye movements, the perceptual span, and reading speed. Psychonomic Bulletin & Review, 17(6), 834–839. [PubMed]
Rello, L., & Ballesteros, M. (2015). Detecting readers with dyslexia using machine learning with eye tracking measures. Proceedings of the 12th International Web for All Conference. Florence, Italy, May 18–20, 2015.
Rider, A. T., Coutrot, A., Pellicano, E., Dakin, S. C., & Mareschal, I. (2018). Semantic content outweighs low-level saliency in determining children's and adults’ fixation of movies. Journal of Experimental Child Psychology, 166, 293–309. [PubMed]
Robinson, D. A. (1965). The mechanics of human smooth pursuit eye movement. Journal of Physiology, 180(3), 569.
Rubin, G. S., & Feely, M. (2009). The role of eye movements during reading in patients with age-related macular degeneration (AMD). Neuro-Ophthalmology, 33(3), 120–126, https://doi.org/10.1080/01658100902998732.
Schmidt, D., Abel, L. A., Dell'Osso, L. F., & Daroff, R. B. (1979). Saccadic velocity characteristics: Intrinsic variability and fatigue. Aviation Space and Environmental Medicine, 50(4), 393–395. [PubMed]
Schwarz, M., & Behnke, S. (2021). Low-latency immersive 6D televisualization with spherical rendering. 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids). Munich, Germany, July 19–21, 2021.
Shibasaki, H., Tsuji, S., & Kuroiwa, Y. (1979). Oculomotor abnormalities in Parkinson's disease. Archives of Neurology, 36(6), 360–364. [PubMed]
Smith, N. D., Glen, F. C., Mönter, V. M., & Crabb, D. P. (2014). Using eye tracking to assess reading performance in patients with glaucoma: A within-person study. Journal of Ophthalmology, 2014, 1–10, https://doi.org/10.1155/2014/120528.
Snegireva, N., Derman, W., Patricios, J., & Welman, K. E. (2018). Eye tracking technology in sports-related concussion: A systematic review and meta-analysis. Physiological Measurement, 39(12), 12TR01, https://doi.org/10.1088/1361-6579/aaef44. [PubMed]
Sunness, J. S., Applegate, C. A., Haselwood, D., & Rubin, G. S. (1996). Fixation patterns and reading rates in eyes with central scotomas from advanced atrophic age-related macular degeneration and stargardt disease. Ophthalmology, 103(9), 1458–1466, https://doi.org/10.1016/S0161-6420(96)30483-1. [PubMed]
Tao, L., Wang, Q., Liu, D., Wang, J., Zhu, Z., & Feng, L. (2020). Eye tracking metrics to screen and assess cognitive impairment in patients with neurological disorders. Neurological Sciences, 41(7), 1697–1704, https://doi.org/10.1007/s10072-020-04310-y.
Taore, A., Lobo, G., Turnbull, P. R., & Dakin, S. C. (2022). Diagnosis of colour vision deficits using eye movements. Scientific Reports, 12(1), 7734. [PubMed]
Tarita-Nistor, L., González, E. G., Mandelcorn, M. S., Lillakas, L., & Steinbach, M. J. (2009). Fixation stability, fixation location, and visual acuity after successful macular hole surgery. Investigative Ophthalmology & Visual Science, 50(1), 84, https://doi.org/10.1167/iovs.08-2342. [PubMed]
Tatler, B. W., Kirtley, C., Macdonald, R. G., Mitchell, K., & Savage, S. W. (2014). The active eye: Perspectives on eye movement research. Current Trends in Eye Tracking Research (pp. 3–16). Springer.
Tobii Technology. (2010). Tobii eye tracking: An introduction to eye tracking and Tobii eye trackers. Stockholm: Tobii Technology AB.
Thaker, G. K., Avila, M. T., Hong, E. L., Medoff, D. R., Ross, D. E., & Adami, H. M. (2003). A model of smooth pursuit eye movement deficit associated with the schizophrenia phenotype. Psychophysiology, 40(2), 277–284. [PubMed]
Thomson, D. (2017). Eye tracking and its clinical application in optometry. Optician, 2017(6), 6045–6041.
Trauzettel-Klosinski, S., Dietz, K., & IReST Study Group. (2012). Standardized assessment of reading performance: The new international reading speed texts IReST. Investigative Ophthalmology & Visual Science, 53(9), 5452–5461. [PubMed]
UIKit – construct and manage a graphical, event-driven user interface for your iOS, iPadOS, or tvOS app. Available from: https://developer.apple.com/documentation/uikit/views_and_controls.
Uzzaman, S., & Joordens, S. (2011). The eyes know what you are thinking: Eye movements as an objective measure of mind wandering. Consciousness and Cognition, 20(4), 1882–1886, https://doi.org/10.1016/j.concog.2011.09.010. [PubMed]
Valliappan, N., Dai, N., Steinberg, E., He, J., Rogers, K., Ramachandran, V., … Navalpakkam, V. (2020). Accelerating eye movement research via accurate and affordable smartphone eye tracking. Nature Communications, 11(1), 1, https://doi.org/10.1038/s41467-020-18360-5. [PubMed]
Vetturi, D., Tiboni, M., Maternini, G., & Bonera, M. (2020). Use of eye tracking device to evaluate the driver's behaviour and the infrastructures quality in relation to road safety. Transportation Research Procedia, 45, 587–595, https://doi.org/10.1016/j.trpro.2020.03.053.
Wade, N., & Tatler, B. W. (2005). The moving tablet of the eye: The origins of modern eye movement research. Oxford University Press.
Wang, J., Antonenko, P., Celepkolu, M., Jimenez, Y., Fieldman, E., & Fieldman, A. (2019). Exploring relationships between eye tracking and traditional usability testing data. International Journal of Human-Computer Interaction, 35(6), 483–494, https://doi.org/10.1080/10447318.2018.1464776.
Wang, Y., Zhai, G., Zhou, S., Chen, S., Min, X., Gao, Z., … Hu, M. (2018). Eye fatigue assessment using unobtrusive eye tracker. IEEE Access, 6, 55948–55962.
Wedel, M., & Pieters, R. (2008). Eye tracking for visual marketing. Now Publishers Inc.
Wedel, M., & Pieters, R. (2017). A review of eye-tracking research in marketing. Review of Marketing Research, 4, 123–147.
Whittaker, S. G., Budd, J., & Cummings, R. W. (1988). Eccentric fixation with macular scotoma. Investigative Ophthalmology & Visual Science, 29(2), 268–278. [PubMed]
Yu, Y., Wu, Q., Feng, Y., Guo, T., Yang, J., Takahashi, S., … Wu, J. (2018). A central-scotoma simulator based on low-cost eye tracker. 2018 IEEE International Conference on Mechatronics and Automation (ICMA) (pp. 1–6). Changchun, China, August 5–8, 2018, https://doi.org/10.1109/ICMA.2018.8484283.
Zhang, D., Liu, X., Xu, L., Li, Y., Xu, Y., Xia, M., … Chen, T. (2022). Effective differentiation between depressed patients and controls using discriminative eye movement features. Journal of Affective Disorders, 15, 237–243.
Zhang, X., Sugano, Y., Fritz, M., & Bulling, A. (2015). Appearance-based gaze estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, Massachusetts, June 7–12, 2015.
Appendix A
Figure A1.
 
Normalized distribution plots depicting saccade speeds measured from both devices for participants performing reading B. The orange bars represent the distribution derived from the iPad eye-tracking data, and the blue bars represent the distribution derived from the Tobii eye-tracking data. These plots facilitate a visual comparison of saccade speed distributions between the two devices.