Abstract
Because traditional gaze tracking requires specialized hardware, gathering image saliency data is expensive, tedious, and slow. As a result, existing saliency prediction datasets are orders of magnitude smaller than typical datasets for other computer vision recognition tasks. The small size of these datasets limits the potential for training data-hungry saliency prediction algorithms and may lead to overfitting in benchmark evaluation. To address this deficiency, we introduce a webcam-based gaze tracking system that supports large-scale, crowdsourced eye tracking deployed on Amazon Mechanical Turk (AMTurk). Through a combination of careful algorithm design and gaming protocol design, our system obtains eye-tracking data for saliency prediction with the same accuracy as in-lab webcam-based eye tracking systems, at relatively low cost and with little effort on the part of the researchers. Using this tool, we construct the iSUN database, with gaze data collected from 227 participants for 20,000 natural images – to our knowledge, the largest eye-tracking dataset available to date. This large data pool enables us to address the center bias of Internet photos by selecting a subset of images that yields a uniformly sampled benchmark. It also enables us to design a simple yet powerful data-driven algorithm that transfers saliency labels from nearest-neighbor images. Empirical results show that this simple algorithm produces saliency maps that are visually close to the ground truth, although the current version does not outperform state-of-the-art saliency models. This is largely because data collection for the iSUN database is still ongoing: many nearest-neighbor images have fixations from only one or two observers, which is not enough to yield a complete and accurate ground-truth saliency map. We will open-source our tool and provide a web server where researchers can upload their own images to obtain gaze-tracking results from AMTurk.
Meeting abstract presented at VSS 2015
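To make the data-driven label transfer idea concrete, the sketch below (illustrative only, not the authors' exact method) retrieves the k most similar gaze-annotated images by global feature distance and averages their fixation maps to predict the query image's saliency. The descriptor type, the uniform averaging, and the helper name nearest_neighbor_saliency are assumptions made for illustration.

    # Illustrative sketch of data-driven saliency label transfer (assumed details,
    # not the authors' exact method): predict a query image's saliency map by
    # averaging the fixation maps of its nearest-neighbor images in a
    # gaze-annotated database.
    import numpy as np

    def nearest_neighbor_saliency(query_feat, db_feats, db_fixation_maps, k=5):
        """Transfer saliency from the k nearest database images.

        query_feat:       (d,) global descriptor of the query image.
        db_feats:         (n, d) descriptors of the gaze-annotated database images.
        db_fixation_maps: list of n 2-D fixation density maps at a common resolution.
        Returns a saliency map normalized to [0, 1].
        """
        # Euclidean distance between the query descriptor and every database descriptor.
        dists = np.linalg.norm(db_feats - query_feat, axis=1)

        # Indices of the k closest database images.
        nn_idx = np.argsort(dists)[:k]

        # Average the neighbors' fixation maps (uniform weights; similarity-based
        # weighting would be a natural variant).
        pred = np.mean([db_fixation_maps[i] for i in nn_idx], axis=0)

        # Normalize to [0, 1] for comparison against ground-truth saliency maps.
        return (pred - pred.min()) / (pred.max() - pred.min() + 1e-8)

    if __name__ == "__main__":
        # Toy example with random descriptors and fixation maps.
        rng = np.random.default_rng(0)
        db_feats = rng.normal(size=(100, 512))               # 100 database images, 512-D features
        db_maps = [rng.random((48, 64)) for _ in range(100)] # fixation maps at 48x64
        query = rng.normal(size=512)
        saliency = nearest_neighbor_saliency(query, db_feats, db_maps, k=5)
        print(saliency.shape, saliency.min(), saliency.max())

As the abstract notes, the quality of such a transfer depends directly on how many observers' fixations are available for each neighbor image, which is why ongoing data collection is expected to improve results.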