September 2015
Volume 15, Issue 12
Vision Sciences Society Annual Meeting Abstract  |   September 2015
What can we learn from eye tracking data on 20,000 images?
Author Affiliations
  • Jianxiong Xiao
    Princeton University
  • Pingmei Xu
    Princeton University
  • Yinda Zhang
    Princeton University
  • Krista Ehinger
    Harvard Medical School Brigham & Women's Hospital
  • Adam Finkelstein
    Princeton University
  • Sanjeev Kulkarni
    Princeton University
Journal of Vision September 2015, Vol.15, 790. doi:

Because traditional gaze tracking requires specialized hardware, gathering image saliency data is expensive, tedious, and slow. As a result, existing saliency prediction datasets are orders of magnitude smaller than typical datasets for other computer vision recognition tasks. Their small size limits the potential for training data-hungry saliency prediction algorithms and may lead to over-fitting in benchmark evaluation. To address this deficiency, we introduce a webcam-based gaze tracking system that supports large-scale, crowd-sourced eye tracking deployed on Amazon Mechanical Turk. Through a combination of careful algorithm design and gaming protocol design, our system obtains eye-tracking data for saliency prediction with the same accuracy as in-lab webcam-based eye tracking systems, at relatively low cost and with little effort on the part of the researchers. Using this tool, we construct the iSUN database with gaze data collected from 227 participants for 20,000 natural images, to our knowledge the largest eye tracking dataset available to date. This large data pool enables us to address the central bias of Internet photos by choosing a subset of the data that produces a uniform sampling, which serves as a benchmark. It also enables us to design a simple yet powerful data-driven algorithm for saliency prediction that transfers labels from the nearest-neighbor image. Empirical results show that this simple algorithm produces results that are visually close to ground truth, although the current version does not outperform state-of-the-art saliency models. This is because data collection on the iSUN database is still ongoing, so many of the nearest-neighbor images have fixations from only one or two observers, which is not enough to give a complete, accurate ground-truth saliency map for those images. We will open-source our tool and provide a web server where researchers can upload their own images to get gaze-tracking results from Amazon Mechanical Turk.
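The abstract does not give details of the label transfer algorithm, so the following is only a minimal sketch of the general idea of nearest-neighbor label transfer, not the authors' implementation. It assumes that each image is summarized by a fixed-length feature vector and that similarity is measured by Euclidean distance; the feature representation, the distance measure, and the names `euclidean`, `transfer_saliency`, `db_features`, and `db_maps` are all hypothetical choices made for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# A saliency map is represented here as a 2-D grid of floats (an assumption).
SaliencyMap = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_saliency(
    query: Sequence[float],
    db_features: Sequence[Sequence[float]],
    db_maps: Sequence[SaliencyMap],
) -> Tuple[int, SaliencyMap]:
    """Return the index and saliency map of the database image nearest to the query.

    The query image's predicted saliency is simply the ground-truth map of
    its nearest neighbor in feature space (hypothetical feature space).
    """
    if not db_features:
        raise ValueError("database is empty")
    if len(db_features) != len(db_maps):
        raise ValueError("each database image needs exactly one saliency map")
    nearest = min(
        range(len(db_features)),
        key=lambda i: euclidean(query, db_features[i]),
    )
    return nearest, db_maps[nearest]


# Toy database: three images with made-up 2-D features and 2x2 saliency maps.
db_features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
db_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]

index, predicted = transfer_saliency([0.9, 1.2], db_features, db_maps)
print(index, predicted)
```

In a sketch like this, the prediction can only be as good as the neighbor's own map, which is consistent with the limitation noted in the abstract: a neighbor with fixations from only one or two observers passes its incomplete map on to the query image.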

Meeting abstract presented at VSS 2015

