Following the approach by
Robilotto et al. (2002), we calculated various contrast metrics for our stimuli to test which metric could quantitatively account for the empirical scales.
We evaluated six contrast metrics compiled by
Moulden et al. (1990) and employed by
Robilotto et al. (2002) (see Discussion). In the following definitions,
\(l_i\) are the luminance values in the region of transparency and
\(\bar{l}\) their arithmetic mean;
\(n\) is the number of luminance values, which in our stimuli was 13.
The following metrics are space averages; that is, the calculation is carried out over all possible pairs of nonequal luminance values (\(\forall i \ne j\)).
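The defining equations for the metrics are not reproduced in this section, so the following is only a minimal sketch under stated assumptions. It assumes the commonly cited forms: RMS as the standard deviation of luminance, \(RMS_{norm}\) as RMS divided by the mean, \(SDLG\) as the standard deviation of log luminance, and \(SAM\)/\(SAW\) as pairwise Michelson and Whittle contrasts averaged over all pairs, with \(SAMLG\)/\(SAWLG\) averaging the logarithm of each pairwise contrast. All function names are hypothetical.

```python
import math
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def rms(lums):
    """Standard deviation of luminance (assumed population form, dividing by n)."""
    m = mean(lums)
    return math.sqrt(sum((l - m) ** 2 for l in lums) / len(lums))


def rms_norm(lums):
    """RMS contrast normalized by mean luminance (assumed definition)."""
    return rms(lums) / mean(lums)


def sdlg(lums):
    """Standard deviation of log luminance."""
    return rms([math.log(l) for l in lums])


def michelson(a, b):
    return abs(a - b) / (a + b)


def whittle(a, b):
    return abs(a - b) / min(a, b)


def space_average(lums, pair_contrast):
    """Average a pairwise contrast over all pairs of nonequal luminances.

    Both contrasts are symmetric in (a, b), so averaging over unordered
    pairs gives the same value as averaging over all ordered pairs i != j.
    Equal-valued pairs are skipped; they would give log(0) in the log variants.
    """
    values = [pair_contrast(a, b) for a, b in combinations(lums, 2) if a != b]
    if not values:
        raise ValueError("need at least two distinct luminance values")
    return mean(values)


def sam(lums):
    return space_average(lums, michelson)


def samlg(lums):
    return space_average(lums, lambda a, b: math.log(michelson(a, b)))


def saw(lums):
    return space_average(lums, whittle)


def sawlg(lums):
    return space_average(lums, lambda a, b: math.log(whittle(a, b)))
```

Under these assumed definitions, every metric except plain RMS is unchanged when all luminances are multiplied by a common factor.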
In addition to these six metrics, we also included
\(\alpha _c\), which is based on Michelson contrast and hence depends only on the minimum and maximum luminance in the regions of transparency and plain view. The other metrics capture the entire distribution of luminance values in transparency. Note, however, that all metrics assume a preceding segmentation of the input.
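To illustrate why \(\alpha _c\) depends only on luminance extremes, here is a hedged sketch. It assumes \(\alpha _c\) is the ratio of the Michelson contrast in the region of transparency to that in plain view; the section does not state the formula, so this form and the function names are assumptions.

```python
def michelson_range(lums):
    """Michelson contrast computed from the minimum and maximum luminance only."""
    lo, hi = min(lums), max(lums)
    return (hi - lo) / (hi + lo)


def alpha_c(lums_transparency, lums_plain_view):
    """Assumed form: contrast in transparency relative to contrast in plain view."""
    return michelson_range(lums_transparency) / michelson_range(lums_plain_view)
```

Because only `min` and `max` enter the calculation, changing intermediate luminance values leaves the result unchanged, unlike the distribution-based metrics.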
Figure 5 shows the contrast metrics evaluated for our stimuli. The plots are analogous in design to those for the perceptual scales (
Figure 4) to facilitate comparison. By design, RMS contrast was constant across the mean luminance of the transparent medium. This is inconsistent with the obtained scales, so RMS contrast is not an adequate model of perceived transparency. All other metrics produce a similar pattern of results, consistent with a trade-off between physical transmittance (
\(\alpha\)) and luminance
\(t\). One group of models, including
\(RMS_{norm}\),
\(\alpha _c\),
\(SDLG\),
\(SAM\), and
\(SAW\), predicts an interaction effect between
\(\alpha\) and
\(t\) (
Figure 5). The remaining two metrics,
\(SAMLG\) and
\(SAWLG\), which compute the logarithm of contrast values, predict constant differences in perceived transparency as a function of
\(t\) across different values of
\(\alpha\). The
\(SAMLG\) and
\(SAWLG\) predictions are closer to the empirical scales.
To quantitatively test the goodness of fit of each metric, we rescaled its prediction to each observer’s individual response range. Scale maxima differ across observers because they reflect each observer’s decision noise (the maximum value of an MLCM scale is inversely related to the amount of decision noise; see
Figure 4 for interobserver variability in scale maxima).
We applied a linear transformation to rescale the prediction of each contrast metric to each observer’s individual range. Minimizing the sum of squared errors (SSE), we used a single scaling factor and intercept for all four perceptual scales (one per value of
\(\alpha\)). We did this separately for each contrast metric, observer, and experiment. Linear transformations are valid because MLCM provides interval scales, whose numerical representation is unique only up to a linear transformation (
Krantz et al., 1971;
Gescheider, 1997).
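The rescaling step can be sketched as an ordinary least-squares fit of one slope and one intercept. This is a sketch under stated assumptions and not the authors' code: the function names are hypothetical, and it assumes the four scales for the different \(\alpha\) values are concatenated into one list before fitting.

```python
import math


def fit_linear_rescale(prediction, scale):
    """Slope and intercept minimizing sum((slope * p + intercept - s) ** 2)."""
    n = len(prediction)
    mean_p = sum(prediction) / n
    mean_s = sum(scale) / n
    s_pp = sum((p - mean_p) ** 2 for p in prediction)
    s_ps = sum((p - mean_p) * (s - mean_s) for p, s in zip(prediction, scale))
    slope = s_ps / s_pp
    intercept = mean_s - slope * mean_p
    return slope, intercept


def sum_squared_errors(prediction, scale, slope, intercept):
    """SSE between the rescaled prediction and the perceptual scale."""
    return sum((slope * p + intercept - s) ** 2 for p, s in zip(prediction, scale))


def pearson_r(x, y):
    """Pearson correlation; unaffected by a positive linear rescaling of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)
```

A single slope and intercept shared across all four scales preserves the relative spacing between the predictions for different \(\alpha\) values, which a separate fit per scale would not.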
\(SAMLG\) was one of the best-fitting metrics, and its predictions are shown as solid lines in
Figure 4.
Table 1 summarizes the goodness-of-fit evaluations for all contrast metrics.
\(SAMLG\) and
\(SAWLG\) produced the lowest sum of squared errors for all observers and in both experiments. The average Pearson’s correlation coefficient between scales and metric prediction across observers was highest for
\(SAMLG\) and
\(SAWLG\) (
\(r=0.99\)). Despite these relatively high average correlation coefficients, there was some variability across observers. The best fits with
\(SAMLG\) were observed for O3 (
\(r=0.99\), Experiment 1) and O6 (
\(r=0.99\), Experiment 2), while the worst fit with
\(SAMLG\) was observed for O7 in both experiments (
\(r=0.97\)). The pattern was similar for
\(SAWLG\).