Abstract
Eye tracking is the process of measuring gaze location; it is widely used in behavioral research, marketing studies, and assistive technologies. Most eye-tracking devices use a light source to illuminate the eye and high-resolution near-infrared cameras to detect the iris and light reflections. The ability to implement eye tracking with ordinary web and mobile cameras would make the technology far more accessible. Although some webcam-based solutions have appeared recently (Xu et al., 2015; Cheung & Peng, 2015), the technology still lacks the accuracy required to become widespread. Here we investigate how processing temporal information about gaze position can improve baseline performance. Convolutional neural networks (CNNs) show exceptional performance in image processing and can be used to predict gaze direction from webcam data. To test whether CNN-based solutions could gain from temporal information about eye movements, we integrate them with Long Short-Term Memory (LSTM) networks. As a base model, we use the iTracker CNN (Krafka et al., 2016), which we retrain on our own dataset. The dataset preserves temporal information and contains 19 hours of simultaneous webcam and eye-tracker recordings from 32 users performing everyday tasks such as web browsing, video watching, and reading. We create multiple LSTM networks differing in size and number of layers and train them on 700K gaze observations (100K are held out for testing). We then compare the performance of the LSTMs to identify the best combination of data preprocessing and architecture. The results show that performance improves significantly when temporal information about gaze position is taken into account during prediction.
Meeting abstract presented at VSS 2018
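For illustration, below is a minimal sketch of the CNN-to-LSTM pipeline described above, written in PyTorch (an assumption; the abstract names no framework). The class names, layer sizes, and sequence handling are hypothetical stand-ins: the actual iTracker CNN takes separate eye-crop, face-crop, and face-grid inputs, whereas this sketch uses a single toy feature extractor per frame.

```python
# Hypothetical sketch of the CNN + LSTM idea from the abstract, in PyTorch.
# The CNN stands in for the iTracker feature extractor; architecture and
# dimensions here are illustrative, not those used in the study.
import torch
import torch.nn as nn

class CNNFeatureExtractor(nn.Module):
    """Produces a per-frame feature vector from an eye/face image crop."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # global average pooling
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, x):                            # x: (B, 3, H, W)
        return self.fc(self.conv(x).flatten(1))      # -> (B, feature_dim)

class GazeLSTM(nn.Module):
    """Runs per-frame CNN features through an LSTM to predict (x, y) gaze."""
    def __init__(self, feature_dim=128, hidden_dim=64, num_layers=2):
        super().__init__()
        self.cnn = CNNFeatureExtractor(feature_dim)
        self.lstm = nn.LSTM(feature_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)         # 2-D gaze coordinates

    def forward(self, frames):                       # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        # Fold the time axis into the batch, extract features per frame,
        # then restore the (batch, time, feature) layout for the LSTM.
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)                    # (B, T, hidden_dim)
        return self.head(out[:, -1])                 # gaze for the last frame

# Usage: a batch of 4 sequences of 10 frames of 64x64 RGB crops.
model = GazeLSTM()
pred = model(torch.randn(4, 10, 3, 64, 64))          # -> shape (4, 2)
```

Predicting gaze only for the final frame of each window mirrors an online prediction setting; training with per-frame outputs over the whole sequence would be an equally plausible design choice.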