Abstract
Automatic saliency algorithms have become useful in identifying important areas in photographic images. We compare two popular algorithms with scanpaths of human eye movements on the same images. Eye movements are sampled at 1 kHz with a Bouis infrared high-resolution eye tracker calibrated on a 5×5 point grid. We estimate the average time spent looking at each pixel by convolving the gaze samples with a Gaussian filter that spreads the contribution of each measurement over an area matched to the accuracy of the measurement. The saliency algorithms are the ‘Itti’ algorithm (Itti, Koch & Niebur, 1998) and the Graph-Based Visual Saliency (GBVS) algorithm (Harel, Koch & Perona, 2006). Both find salient areas in several steps by normalizing an image and finding regions of maximum contrast between adjacent regions. By brightening image regions in proportion to how long and how often they are fixated, we can compare the performance of the automatic algorithms with human exploratory behavior. We find that the algorithms and the human data from three observers correlate well for many images, but for others there are significant discrepancies. Specifically, the automatic methods sometimes pick up incidental background texture that humans ignore. There are also individual differences among the human scanpaths. Both algorithms perform about equally well.
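The dwell-map estimate described above (accumulating 1 kHz gaze samples per pixel, then spreading each sample with a Gaussian matched to tracker accuracy) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the sigma value are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D normalized Gaussian kernel of half-width `radius` pixels."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def fixation_density(samples, shape, sigma_px=5.0, dt_ms=1.0):
    """Estimate ms spent looking at each pixel from gaze samples.

    samples:  (N, 2) array of (row, col) gaze positions, one per 1 ms sample.
    shape:    (rows, cols) of the image.
    sigma_px: Gaussian spread matched to tracker accuracy (assumed value).
    """
    dwell = np.zeros(shape)
    rows = np.clip(samples[:, 0].astype(int), 0, shape[0] - 1)
    cols = np.clip(samples[:, 1].astype(int), 0, shape[1] - 1)
    np.add.at(dwell, (rows, cols), dt_ms)  # accumulate dwell time per pixel
    k = gaussian_kernel(sigma_px, int(3 * sigma_px))
    # Separable 2-D Gaussian: convolve along rows, then along columns.
    dwell = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, dwell)
    dwell = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, dwell)
    return dwell
```

The resulting map can then be compared with an algorithm's saliency map, for example by Pearson correlation of the two images (`np.corrcoef` on the flattened arrays), or used directly to brighten the image in proportion to dwell time.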
Supported by LANL ISSDM and NSF #CCF-0746690.