As we alluded to in the Overview of the proposed approach section, saliency at a pixel $x_i$ is measured using the conditional density of the feature matrix at that position: $S_i = p(\mathbf{F} \mid y_i = 1)$. Hence, the task at hand is to estimate $p(\mathbf{F} \mid y_i = 1)$ over $i = 1, \ldots, M$. In general, the Parzen density estimator is a simple and generally accurate non-parametric density estimation method (Silverman, 1986). However, in higher dimensions and with an expected long-tailed distribution, the Parzen density estimator with an isotropic kernel is not the most appropriate tool (Bengio, Larochelle, & Vincent, 2005; Brox, Rosenhahn, & Cremers, 2007; Vincent & Bengio, 2003). As explained earlier, the LSK features generically tend to come from long-tailed distributions, and as such, there are generally no tight clusters in the feature space. When we estimate a probability density at a particular feature point, for instance $\mathbf{F}_i = [\mathbf{f}_i^1, \ldots, \mathbf{f}_i^L]$ (where $L$ is the number of vectorized LSKs ($\mathbf{f}$'s) employed in the feature matrix), an isotropic kernel centered on that feature point spreads its density mass equally along all feature space directions, giving too much emphasis to irrelevant regions of the space and too little along the manifold. Earlier studies (Bengio et al., 2005; Brox et al., 2007; Vincent & Bengio, 2003) also pointed out this problem. This motivates us to use
a locally data-adaptive kernel density estimator. We define the conditional probability density $p(\mathbf{F} \mid y_i = 1)$ at $x_i$ as the center value of a normalized adaptive kernel (weight function) $G(\cdot)$ computed in the center + surround region as follows:

$$
S_i = \hat{p}(\mathbf{F} \mid y_i = 1) = \frac{G_i(\mathbf{F}_i - \mathbf{F}_i)}{\sum_{j=1}^{N} G_i(\mathbf{F}_i - \mathbf{F}_j)}, \tag{13}
$$

where $N$ is the number of feature matrices $\mathbf{F}_j$ in the center + surround region around $x_i$.
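To make Equation 13 concrete, here is a minimal numpy sketch (our own illustration, not the authors' implementation): the kernel is passed in as a generic callable, since its specific form $G_i$ is given next in Equation 14, and the surround is represented simply as a sequence of feature matrices.

```python
import numpy as np

def saliency_at_pixel(F_center, F_surround, kernel):
    """Equation 13: normalized adaptive kernel density at the center.

    F_center   : (P, L) feature matrix F_i at pixel x_i
    F_surround : sequence of (P, L) feature matrices F_j from the
                 center + surround region (F_i itself included)
    kernel     : callable G_i(F_i, F_j) returning a scalar weight
    """
    numerator = kernel(F_center, F_center)  # G_i(F_i - F_i)
    denominator = sum(kernel(F_center, F_j) for F_j in F_surround)
    return numerator / denominator
```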
Inspired by earlier works such as Fu and Huang (2008), Fu, Yan, and Huang (2008), Ma, Lao, Takikawa, and Kawade (2007), and Seo and Milanfar (2009a) that have shown the effectiveness of correlation-based similarity, the kernel function $G_i$ in Equation 13 can be defined by using the concept of matrix cosine similarity (Seo & Milanfar, 2009a) as follows:

$$
G_i(\mathbf{F}_i - \mathbf{F}_j) = \exp\!\left(\frac{-1 + \rho(\mathbf{F}_i, \mathbf{F}_j)}{\sigma^2}\right), \qquad j = 1, \ldots, N, \tag{14}
$$
where $\mathbf{F}_i = [\mathbf{f}_i^1, \ldots, \mathbf{f}_i^L]$ and $\mathbf{F}_j = [\mathbf{f}_j^1, \ldots, \mathbf{f}_j^L]$, $\|\cdot\|_F$ is the Frobenius norm, and $\sigma$ is a parameter controlling the fall-off of weights (this parameter is set to 0.07 and fixed for all the experiments). Here, $\rho(\mathbf{F}_i, \mathbf{F}_j)$ is the "Matrix Cosine Similarity (MCS)" between the two feature matrices $\mathbf{F}_i, \mathbf{F}_j$ and is defined as the "Frobenius inner product" between the two normalized matrices:

$$
\rho(\mathbf{F}_i, \mathbf{F}_j) = \left\langle \frac{\mathbf{F}_i}{\|\mathbf{F}_i\|_F}, \frac{\mathbf{F}_j}{\|\mathbf{F}_j\|_F} \right\rangle_F = \operatorname{trace}\!\left(\frac{\mathbf{F}_i^T \mathbf{F}_j}{\|\mathbf{F}_i\|_F \, \|\mathbf{F}_j\|_F}\right) \in [-1, 1].
$$
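As a quick sanity check (our own illustration; the random test matrices and sizes are arbitrary), the snippet below computes $\rho$ via the trace form together with the kernel of Equation 14, and verifies that $\rho(\mathbf{F}_i, \mathbf{F}_i) = 1$, so that the numerator of Equation 13 is $\exp(0) = 1$:

```python
import numpy as np

def mcs(Fi, Fj):
    """Matrix Cosine Similarity: trace(Fi^T Fj) / (||Fi||_F ||Fj||_F)."""
    return np.trace(Fi.T @ Fj) / (np.linalg.norm(Fi) * np.linalg.norm(Fj))

def kernel_G(Fi, Fj, sigma=0.07):
    """Equation 14: exp((-1 + rho(Fi, Fj)) / sigma^2)."""
    return np.exp((-1.0 + mcs(Fi, Fj)) / sigma**2)

rng = np.random.default_rng(0)
Fi = rng.standard_normal((9, 5))   # e.g., 3x3 LSKs (P = 9), L = 5 columns
Fj = rng.standard_normal((9, 5))
assert -1.0 <= mcs(Fi, Fj) <= 1.0         # MCS always lies in [-1, 1]
assert np.isclose(kernel_G(Fi, Fi), 1.0)  # self-similarity gives weight 1
```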
) ∈ [−1, 1].) This matrix cosine similarity can be rewritten as a weighted sum of the vector cosine similarities (Fu & Huang,
2008; Fu et al.,
2008; Ma et al.,
2007)
ρ(
fi,
fj) between each pair of corresponding feature vectors (i.e., columns) in
Fi,
Fj as follows:
The weights are represented as the product of $\frac{\|\mathbf{f}_i^\ell\|}{\|\mathbf{F}_i\|_F}$ and $\frac{\|\mathbf{f}_j^\ell\|}{\|\mathbf{F}_j\|_F}$, which indicate the relative importance of each feature in the feature sets $\mathbf{F}_i, \mathbf{F}_j$. This measure not only generalizes the cosine similarity, but also overcomes the disadvantages of the conventional Euclidean distance, which is sensitive to outliers. (This measure can be efficiently computed by column-stacking the matrices $\mathbf{F}_i, \mathbf{F}_j$ and simply computing the cosine similarity between the two long column vectors.) By inserting Equation 14 into Equation 13, $S_i$ can be rewritten as follows:

$$
S_i = \hat{p}(\mathbf{F} \mid y_i = 1) = \frac{1}{\sum_{j=1}^{N} \exp\!\left(\frac{-1 + \rho(\mathbf{F}_i, \mathbf{F}_j)}{\sigma^2}\right)}, \tag{16}
$$

where the numerator of Equation 13 reduces to 1 because $\rho(\mathbf{F}_i, \mathbf{F}_i) = 1$ and hence $G_i(\mathbf{F}_i - \mathbf{F}_i) = \exp(0) = 1$.
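Putting the pieces together, here is a compact sketch of Equation 16 over a whole image (again our own illustration: the feature array layout and the simplification of taking every pixel's surround to be the entire image are assumptions, and the column-stacking trick noted above reduces all $N$ similarities to a single matrix product):

```python
import numpy as np

def self_resemblance_saliency(features, sigma=0.07):
    """Equation 16 for every pixel at once.

    features : (M, D) array, one column-stacked feature matrix F_i
               per pixel (D = P * L); M is the number of pixels.
               For simplicity, the center + surround region of each
               pixel is taken here to be the whole image (N = M).
    """
    # Unit-normalize the stacked vectors so dot products are MCS values
    # (the column-stacking shortcut for the matrix cosine similarity).
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    rho = normed @ normed.T               # (M, M) pairwise MCS
    G = np.exp((-1.0 + rho) / sigma**2)   # Equation 14 for all pairs
    return 1.0 / G.sum(axis=1)            # Equation 16: S_i for each pixel

# Toy usage: 100 pixels, 3x3 LSKs (P = 9) with L = 5 columns each.
rng = np.random.default_rng(0)
S = self_resemblance_saliency(rng.standard_normal((100, 9 * 5)))
```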