Abstract
In the companion study (C. Ripamonti et al., 2004), we present data that measure the effect of surface slant on perceived lightness. Observers are neither perfectly lightness constant nor luminance matchers, and there is considerable individual variation in performance. This work develops a parametric model that accounts for how each observer’s lightness matches vary as a function of surface slant. The model is derived from consideration of an inverse optics calculation that could achieve constancy. The inverse optics calculation begins with parameters that describe the illumination geometry. If these parameters match those of the physical scene, the calculation achieves constancy. Deviations in the model’s parameters from those of the scene predict deviations from constancy. We used numerical search to fit the model to each observer’s data. The model accounts for the diverse range of results seen in the experimental data in a unified manner, and examination of its parameters allows interpretation of the data that goes beyond what is possible with the raw data alone.
In the companion study (Ripamonti et al.,
2004), we report measurements of how perceived surface lightness varies with surface slant. The data indicate that observers take geometry into account when they judge surface lightness, but that there are large individual differences. This work develops a quantitative model of our data. The model is derived from an analysis of the physics of image formation and of the computations that the visual system would have to perform to achieve lightness constancy. The model allows for failures of lightness constancy by supposing that observers do not perfectly estimate the lighting geometry. Individual variation is accounted for within the model by parameters that describe each observer’s representation of that geometry.
Figure 1 replots experimental data for three observers (HWK, EEP, and FGS) from Ripamonti et al. (
2004). Observers matched the lightness of a standard object to a palette of lightness samples, as a function of the slant of the standard object. The data consist of the normalized relative match reflectance at each slant. If the observer had been perfectly lightness constant, the data would fall along a horizontal line, indicated in the plot by the red dashed line. If the observer were making matches by equating the reflected luminance from the standard and palette sample, the data would fall along the blue dashed curves shown in the figure. The complete data set demonstrates reliable individual differences ranging from luminance matches (e.g., HWK) toward approximations of constancy (e.g., FGS). Most of the observers, though, showed intermediate performance (e.g., EEP).
Given that observers are neither perfectly lightness constant nor luminance matchers, our goal is to develop a parametric model that can account for how each observer’s matches vary as a function of slant. Establishing such a model offers several advantages. First, individual variability may be interpreted in terms of variation in model parameters, rather than in terms of the raw data. Second, once a parametric model is established, one can study how variations in the scene affect the model parameters (cf., Krantz,
1968; Brainard & Wandell,
1992). Ultimately, the goal is to develop a theory that allows prediction of lightness matches across a wide range of scene geometries.
A number of broad approaches have been used to guide the formulation of quantitative models of context effects. Helmholtz (
1896) suggested that perception should be conceived of as a constructed representation of physical reality, with the goal of the construction being to produce stable representations of object properties. The modern instantiation of this idea is often referred to as the computational approach to understanding vision (Marr,
1982; Landy & Movshon,
1991). Under this view, perception is difficult because multiple scene configurations can lead to the same retinal image. In the case of lightness constancy, the ambiguity arises because illuminant intensity and surface reflectance can trade off to leave the intensity of reflected light unchanged.
Because the retinal image is ambiguous, what we see depends not only on the scene but also on the rules the visual system employs to interpret the image. Various authors choose to formulate these rules in different ways, with some focusing on constraints imposed by known mechanisms (e.g., Stiles,
1967; Cornsweet,
1970) and others on constraints imposed by the statistical structure of the environment (e.g., Gregory,
1968; Marr,
1982; Landy & Movshon,
1991; Wandell,
1995; Geisler & Kersten,
2002; Purves & Lotto,
2003).
In previous work, we have elaborated
equivalent illuminant models of observer performance for tasks where surface mode or surface color was judged (Speigle & Brainard,
1996; Brainard, Brunt, & Speigle,
1997; see also Brainard, Wandell, & Chichilnisky,
1993; Maloney & Yang,
2001; Boyaci, Maloney, & Hersh,
2003). In such models, the observer is assumed to be correctly performing a constancy computation, with the one exception that their estimate of the illuminant deviates from the actual illuminant. The parameterization of the observer’s illuminant estimate determines the range of performance that may be explained, with the detailed calculation then following from an analysis of the physics of image formation. Here we present an equivalent illuminant model for how perceived lightness varies with surface slant. Our model is essentially identical to that formulated recently by Boyaci et al. (
2003).
Our model is derived from consideration of an inverse optics calculation that could achieve constancy. The inverse optics calculation begins with parameters that describe the illumination geometry. If these parameters match those of the physical scene, the calculation achieves constancy. Deviations in the model’s parameters from those of the scene predict deviations from constancy. In the next sections we describe the physical model of illumination and how this model can be incorporated into an inverse optics calculation to achieve constancy. We then show how the formal development leads to a parametric model of observer performance.
Consider a Lambertian flat matte standard object¹ that is illuminated by a point² directional light source. The standard object is oriented at a slant θ_{N} with respect to a reference axis (the x-axis in Figure 2). The light source is located at a distance d from the standard surface. The light source azimuth is indicated by θ_{D} and the light source declination (with respect to the z-axis) by ϕ_{D}.

The luminance L_{i} of the light reflected from the standard surface i depends on its surface reflectance r_{i}, its slant θ_{N}, and the intensity of the incident light E:

  L_{i} = (1/π) r_{i} E.    (1)
When the light arrives only directly from the source, we can write

  E = E_{D},    (2)

where

  E_{D} = (I_{D} / d²) sin(ϕ_{D}) cos(θ_{N} − θ_{D}).    (3)

Here I_{D} represents the luminous intensity of the light source. Equation 3 applies when |θ_{N} − θ_{D}| < 90°. For a purely directional source and θ_{N} outside of this range, E_{D} = 0.
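As a minimal sketch, the directional term can be computed as follows. The function name, argument layout, and the explicit sin(ϕ_{D}) declination factor are our own reading of the geometry described above, not code from the study:

```python
import math

def directional_irradiance(i_d, d, theta_n_deg, theta_d_deg, phi_d_deg):
    """Irradiance on the standard surface from a purely directional source.

    i_d         -- luminous intensity of the source (I_D)
    d           -- distance from source to surface
    theta_n_deg -- surface slant (azimuth of the surface normal), degrees
    theta_d_deg -- light source azimuth, degrees
    phi_d_deg   -- light source declination from the z-axis, degrees
    Returns 0 when the surface faces 90 degrees or more away from the
    source azimuth, as the text specifies for a purely directional source.
    """
    if abs(theta_n_deg - theta_d_deg) >= 90.0:
        return 0.0
    delta = math.radians(theta_n_deg - theta_d_deg)
    return (i_d / d ** 2) * math.sin(math.radians(phi_d_deg)) * math.cos(delta)
```

With the surface facing the source and the source in the horizontal plane (ϕ_{D} = 90°), this reduces to the inverse-square law I_{D}/d².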
In real scenes, light from a source arrives both directly and after reflection off other objects. For this reason, the incident light E can be described more accurately as a compound quantity made of the contribution of directional light E_{D} and some diffuse light E_{A}. The term E_{A} provides an approximate description of the light reflected off other objects in the scene. We rewrite Equation 2 as

  E = E_{D} + E_{A},    (4)

and Equation 1 becomes

  L_{i} = (1/π) r_{i} (E_{D} + E_{A}).    (5)

The luminance of the standard surface reaches its maximum value when θ_{N} = θ_{D} and its minimum when |θ_{N} − θ_{D}| ≥ 90°. In the latter case only the ambient light E_{A} illuminates the standard surface.

It is useful to simplify Equation 5 by factoring out a multiplicative scale factor α that is independent of θ_{N}:

  L_{i} = α r_{i} [cos(θ_{N} − θ_{D}) + F_{A}].    (6)

In this expression, F_{A} is given by

  F_{A} = E_{A} / [(I_{D} / d²) sin(ϕ_{D})],

with α = (I_{D} / π d²) sin(ϕ_{D}).
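The full slant dependence of Equation 6 can then be sketched in a few lines. This is a minimal illustration with hypothetical parameter values; the function name is ours:

```python
import math

def standard_luminance(theta_n_deg, r_i, alpha, theta_d_deg, f_a):
    """Luminance of the standard surface as a function of slant (Equation 6).

    r_i         -- surface reflectance of the standard
    alpha       -- multiplicative scale factor independent of slant
    theta_d_deg -- light source azimuth, degrees
    f_a         -- relative ambient illumination F_A
    The directional term is clamped at zero when the surface faces away
    from the source, leaving only the ambient contribution.
    """
    delta = math.radians(theta_n_deg - theta_d_deg)
    directional = max(math.cos(delta), 0.0)
    return alpha * r_i * (directional + f_a)

# Luminance peaks when the surface faces the source (theta_N = theta_D)
# and falls to the ambient floor 90 degrees or more away from it.
peak = standard_luminance(30.0, 0.5, 1.0, 30.0, 0.2)
floor = standard_luminance(-60.0, 0.5, 1.0, 30.0, 0.2)
```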
How well does the physical model describe the illumination in our apparatus? We measured the luminance of our standard objects under all experimental slants, and averaged these over standard object reflectance. Figure 3 (solid circles) shows the resulting luminances from each experiment of the companion work (Ripamonti et al., 2004) plotted versus the standard object slant. For each experiment, the measurements are normalized to a value of 1 at θ_{N} = 0°. We denote the normalized luminances by L̂(θ_{N}). The solid curves in Figure 3 denote the best fit of Equation 6 to the measurements, where θ_{D}, F_{A}, and α were treated as free parameters and chosen to minimize the mean squared error between model predictions and measured normalized luminances.
The fitting procedure returns two estimated parameters of interest: the azimuth θ_{D} of the light source and the amount F_{A} of ambient illumination. (The scalar α simply normalizes the predictions in accordance with the normalization of the measurements.) We can represent these parameters in a polar plot, as shown in Figure 4. The azimuthal position of the plotted points represents θ_{D}, while the radius v at which the points are plotted is a function of F_{A}:

  v = 1 / (1 + F_{A}).    (7)
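The conversion from fitted parameters to the plotted polar point is straightforward; the following sketch (function name ours) returns Cartesian coordinates for plotting:

```python
import math

def illuminant_polar_point(theta_d_deg, f_a):
    """Cartesian coordinates of the polar-plot point for fitted parameters.

    The radius v = 1 / (1 + F_A) is 1 for a purely directional source
    (F_A = 0) and approaches 0 as the ambient term dominates.
    """
    v = 1.0 / (1.0 + f_a)
    theta = math.radians(theta_d_deg)
    return (v * math.cos(theta), v * math.sin(theta))

x, y = illuminant_polar_point(0.0, 0.0)    # purely directional source
xa, ya = illuminant_polar_point(0.0, 1.0)  # equal directional and ambient
```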
If the light incident on the standard is entirely directional, then the radius of the plotted point will be 1. In the case where the incident light is entirely ambient, the radius will be 0.
The physical model provides a good fit to the dependence of the measured luminances on standard object slant. It should be noted, however, that the recovered azimuth of the directional light source differs from our direct measurement of this azimuth. The most likely source of this discrepancy is that the ambient light arising from reflections off the chamber walls has some directional dependence. This dependence is absorbed into the model’s estimate of θ_{D}.
Suppose an observer has full knowledge of the illumination and scene geometry and wishes to estimate the reflectance of the standard surface from its luminance. From Equation 6 we obtain the estimate

  r̃_{i} = L_{i} / (α [cos(θ_{N} − θ_{D}) + F_{A}]).    (8)

We use a tilde to denote perceptual analogs of physical quantities. To the extent that the physical model accurately predicts the luminance of the reflected light, Equation 8 predicts that the observer's estimates of reflectance will be correct, and thus Equation 8 predicts lightness constancy. To elaborate Equation 8 into a parametric model that allows failures of constancy, we replace the parameters that describe the illuminant with perceptual estimates of these parameters:

  r̃_{i} = L_{i} / (α̃ [cos(θ_{N} − θ̃_{D}) + F̃_{A}]),    (9)
where θ̃_{D} and F̃_{A} are perceptual analogs of θ_{D} and F_{A}. Note that the dependence of r̃_{i} on slant in Equation 9 is independent of r_{i}. Equation 9 predicts an observer's reflectance estimates as a function of surface slant, given the parameters θ̃_{D} and F̃_{A} of the observer's equivalent illuminant. These parameters describe the illuminant configuration that the observer uses in his or her inverse optics computation.
Our data analysis procedure aggregates observer matches over standard object reflectance to produce relative normalized matches r̂(θ_{N}). The relative normalized matches describe the overall dependence of observer matches on slant. To link Equation 9 with the data, we assume that the normalized relative matches obtained in our experiment (see “Appendix” of Ripamonti et al., 2004) are proportional to the computed r̃_{i}, leading to the model prediction

  r̂(θ_{N}) = β L̂(θ_{N}) / [cos(θ_{N} − θ̃_{D}) + F̃_{A}],    (10)

where β is a constant of proportionality that is determined as part of the model fitting procedure. In Equation 10 we have substituted L̂(θ_{N}) for L_{i} because the contribution of surface reflectance r_{i} can be absorbed into β.
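Equation 10 translates directly into code. In this sketch (names ours), we clamp the directional term at zero, paralleling the treatment of a purely directional source in the physical model:

```python
import math

def predicted_match(theta_n_deg, lum_hat, beta, theta_d_tilde_deg, f_a_tilde):
    """Predicted normalized relative match at a given slant (Equation 10).

    lum_hat           -- normalized luminance of the standard at this slant
    beta              -- proportionality constant fit to the data
    theta_d_tilde_deg -- equivalent illuminant azimuth, degrees
    f_a_tilde         -- equivalent relative ambient
    """
    delta = math.radians(theta_n_deg - theta_d_tilde_deg)
    directional = max(math.cos(delta), 0.0)
    return beta * lum_hat / (directional + f_a_tilde)

# When the surface faces the equivalent illuminant, the prediction is
# beta * lum_hat / (1 + f_a_tilde).
m = predicted_match(30.0, 1.0, 2.0, 30.0, 1.0)
```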
Equation 10 provides a parametric description of how our measurements of perceived lightness should depend on slant. By fitting the model to the measured data, we can evaluate how well the model is able to describe performance, and whether it can capture the individual differences we observe. In fitting the model, the two parameters of interest are θ̃_{D} and F̃_{A}, while the parameter β simply accounts for the normalization of the data.

In generating the model predictions, values for θ_{N} and L̂(θ_{N}) are taken as veridical physical values. It would be possible to develop a model where these were also treated as perceptual quantities and thus fit to the data. Without constraints on how perceived slant and registered luminance are related to their physical counterparts, however, allowing these as parameters would lead to excessive degrees of freedom in the model. In our slant matching experiment, observers' perception of slant was close to veridical, and thus using the physical values of θ_{N} seems justified. We do not have independent measurements of how the visual system registers luminance.
For each observer, we used numerical search to fit the model to the data. The search procedure found the equivalent illuminant parameters θ̃_{D} (light source azimuth) and F̃_{A} (relative ambient) as well as the overall scaling parameter β that provided the best fit to the data. The best fit was determined as follows. For each of the three sessions k = 1, 2, 3 we found the normalized relative matches for that session, r̂_{k}(θ_{N}). We then found the parameters that minimized the mean squared error between the model's prediction and these r̂_{k}(θ_{N}). The reason for computing the individual session matches and fitting to these, rather than fitting directly to the aggregate r̂(θ_{N}), is that the former procedure allows us to compare the model's fit to that obtained by fitting the session data at each slant to its own mean.
Model fit results are illustrated in the left-hand columns of Figures 5 to 10. The dot symbols are observers' normalized relative matches, and the orange curve in each panel shows the best fit of our model. We also show the predictions for luminance and constancy matches as, respectively, a blue or red dashed line. The right-hand columns of Figures 5 to 10 show the model's θ̃_{D} and F̃_{A} for each observer, using the same polar format introduced in Figure 4.
With only a few exceptions, the equivalent illuminant model captures the wide range of performance exhibited by individual observers in our experiment. To evaluate the quality of the fit, we can compare the mean squared error for the equivalent illuminant model to the variability in the data. To make this comparison, we also fit the r̂_{k}(θ_{N}) at each session and slant by their own means. For each observer, the resulting mean squared error e²_{min} is a lower bound on the mean squared error that could be obtained by any model. A figure of merit for the equivalent illuminant model is then the quantity

  η_{equiv} = e²_{equiv} / e²_{min},    (11)

where e²_{equiv} is the mean squared error of the equivalent illuminant model fit. This quantity should be near unity if the model fits well, and values greater than unity indicate fit error in units yoked to the precision of the data. Across all our observers and light source positions, the mean value of η_{equiv} was 1.23, indicating a good but not perfect fit.
For comparison, we also computed η values associated with four other models.

The various models evaluated have different numbers of parameters. For this reason, it is worth asking whether the equivalent illuminant model performs better simply because it overfits the data. Answering this question is difficult. Selection among non-nested and/or non-linear models remains a topic of active investigation (see the special issue on model selection: Journal of Mathematical Psychology, 2000, Vol. 44), and the literature does not yet provide a recipe. Here we adopt a cross-validation approach.
Our measurements consist of the r̂_{k}(θ_{N}) measured in three sessions. We selected the data from each possible pair of sessions and used the result to fit each model. Then, for each model and session pair, we evaluated how well the model fit the session data that had been excluded from the fitting procedure, using the same η metric described above. The intuition is that a model that overfits the data should generalize poorly and have high cross-validation η values, while a model that captures structure in the data should generalize well and have low cross-validation η values.
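The hold-out procedure can be sketched generically. This simplified illustration (names ours) leaves one session out per fold, fits on the remaining two, and evaluates on the held-out session:

```python
def cross_validate(sessions, fit_fn, eval_fn):
    """Leave-one-session-out cross-validation.

    sessions -- list of per-session data sets (three in the experiments)
    fit_fn   -- callable fitting a model to a list of training sessions
    eval_fn  -- callable returning the fit error on a held-out session
    Returns the mean held-out error across folds. Models that overfit
    show high held-out error; models capturing real structure show low.
    """
    errors = []
    for i, held_out in enumerate(sessions):
        training = [s for j, s in enumerate(sessions) if j != i]
        params = fit_fn(training)
        errors.append(eval_fn(params, held_out))
    return sum(errors) / len(errors)

# Toy check: a "model" that predicts each session by the mean of the others.
score = cross_validate(
    [1.0, 2.0, 3.0],
    fit_fn=lambda train: sum(train) / len(train),
    eval_fn=lambda mean, held: (mean - held) ** 2,
)
```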
The light bars in
Figure 11 show the cross-validation
η values we obtained. The equivalent illuminant model continues to perform best. Note that the cross-validation
η value obtained when the data for each session is predicted from the mean of the other two sessions (labeled “Precision”) is higher than that obtained for the equivalent illuminant model. This difference is statistically significant (sign test,
p < .005).
Although the equivalent illuminant model provides the best fit among those we examined, it does not account for all of the systematic structure in the data. ANOVAs conducted on the model residuals indicated that these depend on surface slant in a statistically significant manner for several of our conditions (Experiment 1, p = .14; Experiment 2, p = .14; Experiment 3 Left Neutral, p < .005; Experiment 3 Right Neutral, p < .005; Experiment 3 Left Paint, p < .1; Experiment 3 Right Paint, p < .005). The systematic nature of the residuals was more salient for all four of the comparison models (p < .001 for all models/conditions) than for the equivalent illuminant model.
The equivalent illuminant model allows interpretation of the large individual differences observed in our experiments. In the context of the model, these differences are revealed as variation in the equivalent illuminant parameters θ̃_{D} and F̃_{A}, rather than as a qualitative difference in the manner in which observers perform the matching task. In the polar plots we see that for each condition, the equivalent illuminant model parameters lie roughly between the origin and the corresponding physical illuminant parameters. Observers whose data resemble luminance matching have parameters that plot close to the origin, while those whose data resemble constancy matching have parameters that plot close to those of the physical illuminant. This pattern in the data reflects the fact that observers' performance lies between that of luminance matching and lightness constancy. The fact that many observers have illuminant parameters that differ from the corresponding physical values could be interpreted as an indication of the computational difficulty of estimating light source position and relative ambient from image data.
Various patterns in the raw data shown by many observers, particularly the sharp drop in match for θ_{N} = 60° when the light is on the left and the non-monotonic nature of the matches with increasing slant, require no special explanation in the context of the equivalent illuminant model. Both of these patterns are predicted by the model for reasonable values of the parameters. Indeed, striking to us was the richness of the model’s predictions for relatively small changes in parameter values.
A question of interest in Experiment 3 was whether observers are sensitive to the actual position of the light source. Comparison of θ̃_{D} across changes in the light source position indicates that they are. The average value of θ̃_{D} when the light source was on the left in Experiment 3 was −35°, compared to 16° when it was on the right. The shift in equivalent illuminant azimuth of 51° is comparable to the corresponding shift in the physical model parameter (55°).
In the companion study, we developed a constancy index based on comparing the fit error for luminance matching and constancy. Such indices provide a summary of what the data imply about lightness constancy. At the same time, any given constancy index is of necessity somewhat arbitrary. It is therefore of interest to derive a model-based constancy index and compare it with the error-based index.
Let the vector u = (v cos θ_{D}, v sin θ_{D}) be a function of the physical model's parameters θ_{D} and F_{A}, with the scalar v computed from F_{A} using Equation 7 above. Let the vector ũ be the analogous vector computed from the equivalent illuminant model parameters θ̃_{D} and F̃_{A}. Then we define the model-based constancy index as

  CI_{m} = 1 − |ũ − u| / |u|.    (12)

This index takes on a value of 1 when the equivalent illuminant model parameters match the physical model parameters and a value near 0 when the equivalent illuminant model parameter F̃_{A} is very large. This latter case corresponds to where the model predicts luminance matching.
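Under this reading of the index (a normalized distance in the polar representation of Figure 4, which is our reconstruction of the text), the computation is a few lines:

```python
import math

def constancy_index(theta_d_deg, f_a, theta_d_tilde_deg, f_a_tilde):
    """Model-based constancy index CI_m (a sketch of Equation 12).

    Compares the equivalent illuminant point to the physical illuminant
    point in the polar representation, normalized so that matching
    parameters give 1 and a point at the origin (a very large equivalent
    ambient, i.e., luminance matching) gives approximately 0.
    """
    def point(theta_deg, ambient):
        v = 1.0 / (1.0 + ambient)          # radius, Equation 7
        t = math.radians(theta_deg)
        return v * math.cos(t), v * math.sin(t)

    x, y = point(theta_d_deg, f_a)
    xt, yt = point(theta_d_tilde_deg, f_a_tilde)
    return 1.0 - math.hypot(xt - x, yt - y) / math.hypot(x, y)

perfect = constancy_index(30.0, 0.5, 30.0, 0.5)      # matching parameters
luminance = constancy_index(30.0, 0.5, 30.0, 1.0e9)  # equivalent point at origin
```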
We have computed this CI_{m} for each observer/condition, and the resulting values are indicated on the top left of each polar plot in Figures 5–10. The model-based constancy index ranges from 0.23 to 0.91, with a mean of 0.57 and a median of 0.57. These values are larger than those obtained with the error-based index (mean/median 0.40).
Figure 12 shows a scatter plot of the two indices, which are correlated at
r = 0.73. The discrepancy between the two indices provides a sense of the precision with which they should be interpreted. Given the computational difficulty of recovering lighting geometry from images, we regard the average degree of constancy shown by the observers (∼0.40 – ∼0.57) as a fairly impressive achievement. The large individual variability in performance remains clear in
Figure 12.
The equivalent illuminant model has two parameters, θ̃_{D} and F̃_{A}, that describe the lighting geometry. These parameters are not, however, set by measurements of the physical lighting geometry but are fit to each observer's data. Given the equivalent illuminant parameters, the model predicts the lightness matches through an inverse optics calculation.
It is tempting to associate the parameters θ̃_{D} and F̃_{A} with observers' consciously accessible estimates of the illumination geometry. Because our experiments do not explicitly measure this aspect of perception, we have no empirical basis for making the association. In interpreting the parameters as observer estimates of the illuminant, it is important to bear in mind that they are derived from surface lightness matching data, and thus, at present, should be treated as illuminant estimates only in the context of our model of surface lightness. It is possible that a future explicit comparison could tighten the link between the derived parameters and conscious perception of the illuminant. Prior attempts to make such links between implicit and explicit illumination perception, however, have not led to positive results (see, e.g., Rutherford & Brainard, 2002).
Independent of the connection between model parameters and explicitly judged illumination properties, equivalent illuminant models are valuable to the extent (a) that they provide a parsimonious account of rich data sets and (b) that their parameters can be predicted by computational algorithms that estimate illuminant properties (e.g., Brainard, Kraft, & Longère, 2003; Brainard et al., 2004). As computational algorithms for estimating illumination geometry become available, our hope is that these may be used in conjunction with the type of equivalent illuminant model presented here to predict perceived surface lightness directly from the image data.
This work was supported by National Institutes of Health Grant EY 10016. We thank B. Backus, H. Boyaci, L. Maloney, R. Murray, J. Nachmias, and S. Sternberg for helpful discussions.
Commercial relationships: none.
Corresponding author: David Brainard.
Address: Department of Psychology, University of Pennsylvania, Suite 302C, 3401 Walnut Street, Philadelphia, PA 19104.
1. A Lambertian surface is a uniformly diffusing surface with constant luminance regardless of the direction from which it is viewed.
2. A light source whose distance from the illuminated object is at least 5 times its main dimension is considered to be a good approximation of a point light source (Kaufman & Christensen, 1972).