Different criteria exist for quantifying saliency. Itti and Baldi (2006) hypothesized that the information-theoretic concept of spatio-temporal surprise is central to saliency. Raj, Geisler, Frazor, and Bovik (2005) derived an entropy minimization algorithm to select fixations. Seo and Milanfar (2009) computed saliency using a “self-resemblance” measure, in which each pixel of the saliency map indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices. Bruce and Tsotsos (2009) presented a model based on “self-information” after Independent Component Analysis (ICA) decomposition (Hyvarinen & Oja, 2000) that is in line with the sparseness of the response of cortical cells to visual input (Field, 1994). Wang, Wang, Huang, and Gao (2010) defined the Site Entropy Rate as a saliency measure, also after ICA decomposition. In most saliency models, features are predefined. Commonly used features include contrast (Reinagel & Zador, 1999), edge content (Baddeley & Tatler, 2006), intensity bispectra (Krieger, Rentschler, Hauske, Schill, & Zetzsche, 2000), color (Jost, Ouerhani, von Wartburg, Müri, & Hügli, 2005), and symmetry (Privitera & Stark, 2000), as well as more semantic ones such as faces and text (Cerf et al., 2009). Various inference algorithms have also been designed for saliency estimation. For example, Avraham and Lindenbaum (2009) used a stochastic model to estimate the probability that an image part is of interest. Harel, Koch, and Perona (2007) generated an activation map within each feature channel based on graph computations. Carbone and Pirri (2010) proposed a Bernoulli mixture model to capture context dependency.
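
As a rough illustration of the self-information criterion mentioned above, the following sketch scores each pixel by -log2 p(v), so that rare values receive high saliency. It is not the model of Bruce and Tsotsos (2009), which estimates probabilities over ICA coefficients of local patches; here p is simply assumed to come from a global intensity histogram. The function name `self_information_map` and the `bins` parameter are hypothetical and chosen only for this example.

```python
import numpy as np

def self_information_map(image, bins=16):
    """Per-pixel self-information -log2 p(v), in bits.

    Assumption: p is estimated from a global histogram of pixel
    intensities, a simplification of patch-based feature statistics.
    """
    image = np.asarray(image, dtype=float)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # Constant image: every value has probability 1, so 0 bits everywhere.
        return np.zeros(image.shape)
    # Quantize intensities into equal-width bins over the observed range.
    idx = np.minimum(((image - lo) / (hi - lo) * bins).astype(int), bins - 1)
    counts = np.bincount(idx.ravel(), minlength=bins)
    p = counts / counts.sum()
    # Every index in idx occurs at least once, so p[idx] is never zero.
    return -np.log2(p[idx])

# Usage: a single bright pixel on a dark 8x8 background is the rarest value.
img = np.zeros((8, 8))
img[3, 4] = 1.0
saliency = self_information_map(img)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)
```

In this toy case the bright pixel has probability 1/64 and therefore carries 6 bits, while the background pixels carry close to zero, so the peak of the map falls on the odd pixel.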