Abstract
Researchers have proposed computational models of visual saliency, which have been assessed against observers' saccadic eye movements during free viewing (Parkhurst, Law, & Niebur, 2002; Itti & Koch, 2001; Peters et al., 2005; Schumann et al., 2008) and visual search (Itti & Koch, 2000; Torralba et al., 2006). Surprisingly, no studies have evaluated saliency models against explicit observer judgments of visual saliency. Here, we compare saliency model predictions (Walther & Koch, 2006) to the saliency judgments of one hundred observers.

Methods: We used 800 natural images depicting common indoor and outdoor scenes with varying numbers of objects. One hundred human observers were instructed to view each image and click on its most salient location. We evaluated the saliency model by quantifying the proportion of human-reported most salient locations falling within circular regions (with radii varying between 21.06 and 107.44 pixels) containing the model's top five salient predictions. As a baseline condition, we evaluated the agreement between the model and one hundred randomly generated saliency selections.

Results: Across all observers and images, the proportions of explicit saliency reports within the top five model predictions were 0.102, 0.068, 0.057, 0.047, and 0.044 (0.31 ± 0.21 across all top five saliency predictions). In contrast, the agreement of the model predictions with the randomly generated saliency selections was 0.08 ± 0.03. Thus, our findings show a modest agreement between model saliency predictions and explicit human judgments, but one significantly greater than expected by chance (t = 13.86, p < 0.001). The methods presented can serve as a protocol for evaluating other saliency models (Bruce & Tsotsos, 2006; Zhang et al., 2008) and for comparing models' ability to predict explicit saliency judgments vs. saccadic eye movements.
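The agreement metric described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' actual analysis code: it assumes click and prediction coordinates are given as (x, y) pixel pairs, and the function name `agreement_proportions` and the toy coordinates are hypothetical. It computes, for each model-predicted salient location (most salient first), the proportion of human click locations falling within a circular region of a given radius around that prediction.

```python
import numpy as np

def agreement_proportions(clicks, predictions, radius):
    """For each predicted salient location, return the proportion of
    human click locations lying within `radius` pixels of it.

    clicks:      sequence of (x, y) click coordinates
    predictions: sequence of (x, y) model-predicted locations,
                 ordered from most to least salient
    radius:      radius of the circular region, in pixels
    """
    clicks = np.asarray(clicks, dtype=float)          # shape (n_clicks, 2)
    predictions = np.asarray(predictions, dtype=float)  # shape (n_preds, 2)
    # Pairwise Euclidean distances: rows = predictions, cols = clicks
    d = np.linalg.norm(predictions[:, None, :] - clicks[None, :, :], axis=2)
    # Fraction of clicks within the circular region of each prediction
    return (d <= radius).mean(axis=1)

# Toy example (hypothetical data): 4 clicks, 2 predicted locations, 50 px radius
clicks = [(100, 100), (105, 98), (300, 300), (102, 110)]
preds = [(100, 100), (400, 400)]
print(agreement_proportions(clicks, preds, radius=50.0))  # → [0.75 0.  ]
```

The chance baseline in the abstract can be obtained by substituting uniformly random image coordinates for `clicks` and repeating the same computation.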