September 2018
Volume 18, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Including temporal information into prediction of gaze direction by webcam data
Author Affiliations
  • Katerina Malakhova
    Pavlov Institute of Physiology, Laboratory of physiology of vision
  • Evgenii Shelepin
    Pavlov Institute of Physiology, Laboratory of physiology of vision
Journal of Vision September 2018, Vol.18, 1204. doi:10.1167/18.10.1204
      © ARVO (1962-2015); The Authors (2016-present)


Eye tracking, the measurement of gaze location, is widely used in behavioral research, marketing studies, and assistive technologies. Most eye-tracking devices use a light source to illuminate the eye and high-resolution near-infrared cameras to detect the iris and light reflections. Implementing eye tracking with ordinary web and mobile cameras would remove the need for such specialized hardware. Although some webcam-based solutions (Xu et al., 2015; Cheung & Peng, 2015) have appeared recently, the technology still lacks the accuracy required to become widespread. Here we investigate whether processing temporal information about gaze position can improve on this baseline performance. Convolutional neural networks (CNNs) show exceptional performance in image processing and can be useful for predicting gaze direction from webcam data. To see whether CNN-based solutions benefit from temporal information about eye movements, we integrate them with Long Short-Term Memory networks (LSTMs). As the base CNN model, we use the iTracker CNN (Krafka et al., 2016). We retrain the CNN on our dataset, which preserves temporal information and contains 19 hours of simultaneous webcam and eye-tracker recordings of 32 users performing everyday tasks such as web browsing, watching videos, and reading. We create multiple LSTM networks that differ in size and number of layers and train them on 700K gaze observations (100K are used for testing). We then compare the performance of the LSTMs to identify the best combination of data preprocessing and architecture. The results show that performance can be significantly improved by taking temporal information about gaze position into account during prediction.
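
The abstract does not say how the per-frame data are arranged for the recurrent networks. One common arrangement, shown here only as a minimal sketch, groups consecutive frames of a recording into fixed-length windows, so that a sequence model sees a short history of CNN gaze estimates and is asked to predict the eye-tracker position for the latest frame. The function name `make_windows`, the window length, and the toy coordinates are all hypothetical and are not taken from the authors' pipeline.

```python
from typing import List, Sequence, Tuple

Gaze = Tuple[float, float]  # (x, y) gaze position; the units are an assumption


def make_windows(
    cnn_estimates: Sequence[Gaze],
    targets: Sequence[Gaze],
    window: int,
) -> List[Tuple[List[Gaze], Gaze]]:
    """Pair each run of `window` consecutive per-frame CNN estimates
    with the eye-tracker target of the last frame in that run."""
    if len(cnn_estimates) != len(targets):
        raise ValueError("estimates and targets must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    # `end` is the exclusive end index of the current window.
    for end in range(window, len(cnn_estimates) + 1):
        history = list(cnn_estimates[end - window:end])
        samples.append((history, targets[end - 1]))
    return samples


# Toy recording of five frames (values invented for illustration).
estimates = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.3, 0.1), (0.4, 0.2)]
truth = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.1), (0.3, 0.2), (0.4, 0.2)]

samples = make_windows(estimates, truth, window=3)
print(len(samples))  # number of (history, target) pairs
```

In a sketch like this, windows would be built separately for each recording so that no sequence spans two sessions or two users. Each `(history, target)` pair could then serve as one training example for a sequence model such as an LSTM.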

Meeting abstract presented at VSS 2018

