There are several similarity measures to quantitatively evaluate the performance of saliency models. These measures include the Receiver Operating Characteristics (ROC; Green & Swets, 1966), the Normalized Scanpath Saliency (NSS; Peters et al., 2005), correlation-based measures (Jost, Ouerhani, von Wartburg, Müri, & Hügli, 2005; Rajashekar, van der Linde, Bovik, & Cormack, 2008), the least square index (Henderson, Brockmole, Castelhano, & Mack, 2007; Mannan, Ruddock, & Wooding, 1997), and the “string-edit” distance (Brandt & Stark, 1997; Choi, Mosley, & Stark, 1995; Hacisalihzade, Allen, & Stark,
1992). Among them, ROC is the most widely used in the community. Its inherent limitation, however, is that it depends only on the ordering of the saliency values (ordinality) and does not capture differences in their metric amplitude. In practice, as long as the hit rates are high, the area under the ROC curve (AUC) is high regardless of the false alarm rate (Figure 5). Therefore, an ROC analysis, while very useful, is by itself insufficient to describe how far the predicted fixation patterns deviate from the actual fixation map. To conduct a more comprehensive evaluation, we also employ the NSS (Peters et al., 2005) and the Earth Mover's Distance (EMD; Rubner, Tomasi, & Guibas,
2000), which measure the actual differences between values rather than only their ordering. By definition, NSS (Peters et al., 2005) evaluates saliency values at fixated locations. It works by first linearly normalizing the saliency map to have zero mean and unit standard deviation. Next, it extracts the computed saliency at each fixation location along a subject's scanpath and averages these values to obtain the NSS. Because the normalized saliency map has zero mean by construction, the NSS is the average distance between the saliency at fixated locations and the mean of the entire image, expressed in standard deviations. A larger NSS implies a greater correspondence between fixation locations and the saliency predictions. A value of zero indicates no such correspondence. Unlike the NSS, which focuses on the saliency values along the scanpath, EMD (Rubner et al.,
2000) captures the global discrepancy between two distributions. Intuitively, given two distributions, EMD measures the least amount of work needed to move one distribution so that it maps onto the other. It is computed through linear programming and accommodates distribution alignments well. A larger EMD indicates a larger overall discrepancy between the two distributions.
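
To make the two measures concrete, the following is a minimal Python sketch and not the implementation used in this study. The function names `nss` and `emd_1d`, the list-of-rows map format, and the (row, column) fixation format are assumptions introduced for illustration. The NSS function follows the normalize-then-average description above. For EMD, the sketch does not solve the linear program needed for two-dimensional maps; it substitutes the closed form that holds for one-dimensional histograms of equal mass with ground distance |i - j|, where the work equals the summed absolute difference of the cumulative distributions.

```python
import math


def nss(saliency, fixations):
    """Normalized Scanpath Saliency of one saliency map.

    saliency  : 2-D list of numbers (rows of the map); assumed format
    fixations : list of (row, col) fixated pixels; assumed format
    """
    values = [v for row in saliency for v in row]
    mean = sum(values) / len(values)
    # Population standard deviation over every pixel of the map.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("a constant saliency map cannot be normalized")
    # Read the normalized saliency at each fixated pixel, then average.
    z_scores = [(saliency[r][c] - mean) / std for r, c in fixations]
    return sum(z_scores) / len(z_scores)


def emd_1d(p, q):
    """EMD between two 1-D histograms with ground distance |i - j|.

    Both histograms are scaled to unit mass first, so the result is the
    work needed to move one unit of mass, measured in bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    work = 0.0
    carried = 0.0  # surplus mass that has to cross into the next bin
    for a, b in zip(p, q):
        carried += a / total_p - b / total_q
        work += abs(carried)
    return work


# Toy 3 x 3 map with a single salient pixel in the center.
saliency_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(round(nss(saliency_map, [(1, 1)]), 3))  # 2.828, fixation on the peak
print(round(nss(saliency_map, [(0, 0)]), 3))  # -0.354, fixation off the peak
print(emd_1d([1, 0, 0], [0, 0, 1]))           # 2.0, all mass moved two bins
```

In this toy case a fixation on the salient pixel yields a large positive NSS and a fixation elsewhere a slightly negative one, while averaging over every pixel of the map gives zero, the no-correspondence level. The EMD example shows the metric property that ROC lacks: moving the mass two bins costs twice as much as moving it one bin.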