Vision provides information about the properties and identity of objects. The ease with which we perceive object properties belies the difficulty of the underlying information-processing task. In the case of object color, retinal information about object reflectance is confounded with information about the illumination as well as about the object's shape and pose. There is no obvious rule that allows transformation of the retinal image to a color representation that depends primarily on object surface reflectance. Under many circumstances, however, object color appearance is remarkably stable across scenes in which the object is viewed. Here, we review a line of experiments and theory that aim to understand how the visual system stabilizes object color appearance. Our emphasis is on models derived from explicit analysis of the computational problem of estimating the physical properties of illuminants and surfaces from the retinal image, and experiments that test these models. We argue that this approach has considerable promise for allowing generalization from simplified laboratory experiments to richer scenes that more closely approximate natural viewing. We discuss the relation between the work we review and other theoretical approaches available in the literature.

Creating representations of object properties that are *invariant* to object-extrinsic variation is a fundamental task of visual information processing (DiCarlo & Cox, 2007; Rust & Stocker, 2010; Ullman, 1989).

We refer to these as *flat–matte–diffuse* conditions, and specify the computational problem of recovering descriptions of illuminant and surface properties from photoreceptor responses for these scenes. Although such recovery cannot be performed with complete accuracy, we show how the structure of the computational analysis leads to a class of candidate models for human performance. We call these *equivalent illumination models* (EIMs). We show that such models provide a compact description of how changing the spectrum of the illumination affects perceived color appearance for human observers.

Each surface in the scene is characterized by its surface reflectance function, *S*(*λ*), which specifies the fraction of incident illumination reflected at each wavelength in the visible spectrum. The spatially diffuse illuminant is characterized by its spectral power distribution, *E*(*λ*), which specifies the illuminant power at each wavelength. The spectrum of the light reflected from the surface, *C*(*λ*), is given as the wavelength-by-wavelength product

*C*(*λ*) = *E*(*λ*) *S*(*λ*).    (Equation 1)

*C*(*λ*) is proportional to the light that reaches the human retina and is called the *color signal* (Buchsbaum, 1980). For our purposes, we can set the constant of proportionality to 1. The color signal is encoded by the excitations of three classes of cone photoreceptors present in a trichromatic human retina. These are the L, M, and S cones, and we denote the excitation of cones in each class by the symbol *ρ*_{ k }, *k* = 1, 2, 3, where the subscript *k* indexes cone class. The excitation of a cone is computed from the color signal as

*ρ*_{ k } = ∫ *C*(*λ*) *R*_{ k }(*λ*) d*λ*,    (Equation 2)

where *R*_{ k }(*λ*) is the spectral sensitivity of the *k*^{th} cone class. For a typical trichromatic human observer, *k* ranges from one to three. The *cone excitation vector ρ* = (*ρ*_{1}, *ρ*_{2}, *ρ*_{3}) provides the information about surface reflectance and illumination available at one retinal location, corresponding to one surface patch in the flat–matte–diffuse environment. We will use superscripts to distinguish cone excitation vectors corresponding to different retinal locations (or to surfaces at different locations in the scene). For flat–matte–diffuse conditions, the cone excitation vectors {*ρ*^{1}, …, *ρ*^{ n }} across all retinal locations carry the information available to the visual system about illuminant and surface properties.
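With wavelengths sampled discretely, the product and integral above become elementwise multiplication and a weighted sum. A minimal sketch of this computation follows; the spectra and cone sensitivities are invented illustrative values, not measured functions:

```python
# Sketch of the color-signal and cone-excitation computations.
# Spectra are sampled at discrete wavelengths (nm); values are illustrative
# placeholders, not measured data.
wavelengths = [400, 450, 500, 550, 600, 650, 700]

E = [0.8, 1.0, 1.1, 1.2, 1.1, 1.0, 0.9]   # illuminant power E(lambda)
S = [0.2, 0.3, 0.5, 0.7, 0.6, 0.4, 0.3]   # surface reflectance S(lambda)

# Color signal: wavelength-by-wavelength product C(lambda) = E(lambda) S(lambda).
C = [e * s for e, s in zip(E, S)]

# Hypothetical cone spectral sensitivities R_k(lambda), k = 1 (L), 2 (M), 3 (S).
R = [
    [0.0, 0.1, 0.3, 0.8, 1.0, 0.7, 0.2],  # L
    [0.1, 0.3, 0.8, 1.0, 0.6, 0.2, 0.0],  # M
    [0.8, 1.0, 0.4, 0.1, 0.0, 0.0, 0.0],  # S
]

# Cone excitations: rho_k as a discrete approximation to the integral of
# C(lambda) R_k(lambda) over wavelength.
rho = [sum(c * r for c, r in zip(C, R_k)) for R_k in R]
print(rho)  # cone excitation vector (rho_1, rho_2, rho_3)
```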

To make the estimation problem concrete, we introduce coordinate systems for surfaces, *σ*, and for illuminants, *ɛ*. We refer to these as the *surface* and *illuminant coordinates*. Any choice of surface coordinates *σ* specifies a particular surface reflectance function, *S*_{ σ }(*λ*), within the constrained class of reflectance spectra, and any choice of illuminant coordinates *ɛ* specifies a particular illuminant spectral power distribution, *E*_{ ɛ }(*λ*), within the constrained class of illuminant spectral power distributions.

Substituting the parameterized spectra into the relations above expresses the cone excitations as a function of the scene parameters:

*ρ* = *f*(*ɛ*, *σ*).    (Equation 3)

We refer to Equation 3 as the *rendering equation* for a single surface and illuminant, since it converts a description of physical *scene parameters,* here *ɛ* and *σ,* into the information available to the visual system, the cone excitation vector *ρ*.^{1} The key point preserved by the abstract notation of Equation 3 is that the initial retinal information available to the visual system depends both on the illuminant and on the surface.

A single surface, with fixed coordinates *σ,* can produce different retinal excitations *ρ* as the illumination, parameterized by *ɛ,* varies. Indeed, for realistic constraints on natural spectra there are typically many combinations of *ɛ* and *σ* that produce the same retinal excitations *ρ*. Starting with only photoreceptor excitations *ρ*^{ j } at a set of locations (indexed by *j*) across the retina, estimating surface reflectance at each location in the scene under conditions where the scene illumination is unknown is an underdetermined problem. At the same time, Equation 3 satisfies an important constraint that we refer to as *surface–illuminant duality*.

Given any illuminant *ɛ* and surface *σ*, we can compute the information available to the visual system, *ρ*. However, for typical choices of surface and illuminant coordinate systems, we can do more. The following two properties emerge from analyses of the low-dimensional linear-model surface and illuminant coordinate systems used in many computational analyses of color constancy, and they are valid for flat–matte–diffuse conditions (D'Zmura & Iverson, 1993a, 1993b; Maloney, 1984). Such coordinate systems provide accurate approximations to natural surfaces and daylights (Cohen, 1964; DiCarlo & Wandell, 2000; Jaaskelainen et al., 1990; Judd et al., 1964; Maloney, 1986).

*Surface–Illuminant Duality Property 1:* Given the illumination coordinates *ɛ* and retinal information *ρ*^{ j } corresponding to any illuminated surface, we can solve for that surface's coordinates *σ*^{ j }.

*Surface–Illuminant Duality Property 2:* If we know the coordinates of a sufficient number of surfaces {*σ*^{ j }; *j* = 1, …, *n*} and the corresponding retinal information {*ρ*^{ j }; *j* = 1, …, *n*}, then we can solve for *ɛ*. The number of surfaces needed depends on the complexity of the lighting model.
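Under a low-dimensional linear model for reflectance, Property 1 amounts to solving a small linear system: with the illuminant fixed, rendering is linear in the surface coordinates. The sketch below assumes an invented three-dimensional reflectance basis and placeholder illuminant and cone spectra; it illustrates the structure of the computation, not the specific models used in the cited analyses.

```python
# Sketch of Surface-Illuminant Duality Property 1 under a three-dimensional
# linear model for surface reflectance. All spectra and sensitivities are
# invented placeholders; the structure of the computation is the point.

wavelengths = range(7)                     # 7 sample wavelengths (indices)
B = [                                      # surface reflectance basis functions
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2],
    [0.5, 0.2, 0.0, 0.1, 0.4, 0.8, 1.0],
]
R = [                                      # cone sensitivities R_k
    [0.0, 0.1, 0.3, 0.8, 1.0, 0.7, 0.2],
    [0.1, 0.3, 0.8, 1.0, 0.6, 0.2, 0.0],
    [0.8, 1.0, 0.4, 0.1, 0.0, 0.0, 0.0],
]
E = [0.8, 1.0, 1.1, 1.2, 1.1, 1.0, 0.9]   # known illuminant E_eps

def render(sigma):
    """Cone excitations for a surface with linear-model weights sigma."""
    S = [sum(s * b[w] for s, b in zip(sigma, B)) for w in wavelengths]
    return [sum(E[w] * S[w] * R_k[w] for w in wavelengths) for R_k in R]

# With E fixed, rendering is linear in sigma: rho = M sigma, where column i
# of M is the rendering of the i-th basis function.
M = list(zip(*[render([1.0 if j == i else 0.0 for j in range(3)])
               for i in range(3)]))

def solve3(M, rho):
    """Solve the 3x3 system M sigma = rho by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(M)
    solution = []
    for i in range(3):
        Mi = [list(row) for row in M]
        for r in range(3):
            Mi[r][i] = rho[r]
        solution.append(det(Mi) / d)
    return solution

sigma_true = [0.4, 0.3, -0.1]
rho = render(sigma_true)          # retinal information for this surface
sigma_hat = solve3(M, rho)        # Property 1: recover sigma from eps and rho
print(sigma_hat)                  # approximately equals sigma_true
```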

For the flat–matte–diffuse conditions considered here, a single surface suffices (*n* = 1). In our consideration of a wider range of possible scenes later in the paper, information about more than one surface is required to determine the illuminant coordinates from the retinal information and surface coordinates.

*A priori,* computational estimates of the coordinates of illuminants and surfaces need have little or nothing to do with human color vision. After all, our subjective experience of color is not in the form of spectral functions or coordinate vectors. Rather, we associate a percept of color appearance with object surfaces. This percept is often described in terms of its “hue,” “colorfulness,” and “lightness,” and these terms do not immediately connect to the constructs used in computations. A natural link between perception of color and computational estimation arises, however, when we consider stability of representation across scene changes.

A visual system whose object color percepts remain stable under such changes is said to be *color constant*.

When only the illumination changes, this form of stability is sometimes called *scene color constancy,* to distinguish it from the more general case where both the illumination and surrounding surfaces change. Gilchrist (2006) refers to it as *illumination-independent color constancy*, and we follow his terminology here. A visual system can be illumination-independent color constant without being color constant in general.

Under surface–illuminant duality, an error in the illuminant estimate implies corresponding errors in the surface estimates, and *vice versa*. This observation suggests an approach to modeling object surface perception, which we call the *equivalent illumination model* (*EIM*) approach (Brainard et al., 1997; Brainard & Wandell, 1991; Brainard, Wandell, & Chichilnisky, 1993; Speigle & Brainard, 1996). The idea is illustrated in Figure 3. The EIM approach supposes that visual processing proceeds in the same general two-stage fashion as many computational surface estimation algorithms. First, the visual system uses the cone excitation vectors from all surfaces in the scene to form an estimate of the illuminant coordinates, *ɛ̂*, which we refer to as the *equivalent illuminant*. Second, the parameters of perceived object color at each location *j* can be thought of as a function of an implicit estimate of object surface coordinates, *σ̂*^{ j }. These, in turn, are generated by processing the color signal in a manner that depends on the equivalent illuminant. To the extent that the equivalent illuminant deviates from the physical illuminant, the implicit surface estimates *σ̂*^{ j } will also deviate from their physical counterparts.

Consider a scene containing *n* surface patches whose true surface coordinates are *σ*^{1}, *σ*^{2}, …, *σ*^{ n }. The true illuminant coordinates are *ɛ*, but the visual system's estimate is *ɛ̂*. The resulting implicit surface estimates are *σ̂*^{1}, *σ̂*^{2}, …, *σ̂*^{ n }, and some or all of these estimates may be in error. However, the possible patterns of error that can occur are highly constrained. So although a two-stage algorithm can grossly misestimate surface coordinates, the resulting errors are patterned: knowledge of any one of them determines all the others as well as the misestimate of the illuminant. This constraint allows us to develop experimental tests of two-stage algorithms as models of human color vision.
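The patterning of errors can be illustrated with a deliberately simplified diagonal rendering model (not the model of the cited work) in which each cone excitation is the product of an illuminant coordinate and a surface coordinate. All numbers below are invented:

```python
# Illustration of patterned errors in a two-stage algorithm, using a
# deliberately simplified diagonal rendering model (rho_k = eps_k * sigma_k).
# The true illuminant is eps; the visual system's estimate is eps_hat.
eps     = [1.0, 0.8, 0.6]      # true illuminant coordinates
eps_hat = [1.2, 0.7, 0.6]      # misestimated equivalent illuminant

surfaces = [                   # true surface coordinates sigma^j
    [0.2, 0.5, 0.9],
    [0.7, 0.7, 0.1],
    [0.4, 0.1, 0.3],
]

def estimate(sigma):
    """Render under the true illuminant, then invert under the estimate."""
    rho = [e * s for e, s in zip(eps, sigma)]          # retinal information
    return [r / eh for r, eh in zip(rho, eps_hat)]     # implicit sigma_hat

estimates = [estimate(sigma) for sigma in surfaces]

# The errors are patterned: every surface estimate is the true sigma scaled
# by the same factors eps_k / eps_hat_k, so one error determines the rest.
ratios = [[sh / s for sh, s in zip(sigma_hat, sigma)]
          for sigma_hat, sigma in zip(estimates, surfaces)]
print(ratios)  # every row is the same: [eps_k / eps_hat_k for each k]
```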

The figure plots the *a** and *b** coordinates of the color signal reaching the eye from the reference and matching test surfaces.^{2} The open black circles plot the light reflected from the reference surfaces under the reference illuminant. When the illuminant is changed from reference to test, the light reflected from these surfaces changes. If the visual system made no adjustment in response to the change in the illuminant, then matches would have the same cone excitation vectors as the corresponding reference surfaces, and the data would lie near the open black circles. These can be considered the predictions for a visual system with no color constancy.

There is no *a priori* guarantee that this EIM recipe will prove successful for conditions other than flat–matte–diffuse. We may discover that human performance is not consistent with any choice of equivalent illuminant, correct or misestimated. Nonetheless, the fact that it has been possible to elaborate Steps I, II, and III above into a promising model for flat–matte–diffuse conditions motivates asking whether the same recipe can be extended to richer viewing conditions. We turn to this question below and show that an affirmative answer is possible with respect to Steps I and II. Whether similar success will be possible for Step III awaits future research.

The collimated source is characterized by its intensity *ɛ*_{p}. The diffuse source is a non-directional ambient illuminant with intensity *ɛ*_{d}. Because we are working only with achromatic lights, the illuminant coordinates are scalars. The theoretical framework developed here is easily extended to scenes in which illuminants are colored and illuminant coordinates are three-dimensional (Boyaci, Doerschner, & Maloney, 2004).

Let *θ* denote the angle between the surface normal and the direction to the collimated source. The light reflected from an achromatic matte surface is then proportional to

*ɛ*_{p} *σ*_{a} cos *θ* + *ɛ*_{d} *σ*_{a},  0° ≤ *θ* < 90°,

where *σ*_{a} denotes the fraction of light flux reflected from the surface. Since we are working with achromatic surfaces, *σ*_{a} (the surface *albedo*) and *ρ* are scalars. The restriction on *θ* simply guarantees that the light source is on the side of the surface being viewed.^{3}

The angle *θ* is determined by the orientation of the surface and the direction to the collimated source. We can specify the orientation of the surface at any point by the azimuth and elevation^{4} *D*_{S} = (*ψ*_{S}, *φ*_{S}) of a line perpendicular to the surface at that point, the surface normal. Similarly, we can specify the direction to the collimated light source by azimuth and elevation *D*_{E} = (*ψ*_{E}, *φ*_{E}). We can compute *θ* for any choice of surface orientation *D*_{S} and direction to the collimated source *D*_{E} by standard trigonometric identities (Gerhard & Maloney, 2010): *θ* = *θ*(*D*_{E}, *D*_{S}).

The photoreceptor excitation can then be written as

*ρ* = *k* *σ*_{a} (cos *θ* + *π*),    (Equation 5)

where *k* is a constant that depends on the intensities of the two illumination sources but not on *D*_{E} or *D*_{S}, and *π* is a measure of the intensity of the diffuse source relative to the collimated source and is again independent of *D*_{E} and *D*_{S}. We refer to *π* as *diffuseness*. We drop the explicit specification of the restriction on the range of *θ* in Equation 5 and following, but it is still in force.

Substituting the dependence *θ* = *θ*(*D*_{E}, *D*_{S}) into Equation 5 gives the retinal excitation (*ρ*) as a function of illuminant and surface coordinates:

*ρ* = *k* *σ*_{a} (cos *θ*(*D*_{E}, *D*_{S}) + *π*).    (Equation 6)

In Equation 6, the illuminant coordinates *ɛ* are *π*, *D*_{E} = (*ψ*_{E}, *φ*_{E}), while the surface coordinates *σ* are *σ*_{a}, *D*_{S} = (*ψ*_{S}, *φ*_{S}).
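A sketch of this rendering computation follows, using the standard azimuth-elevation parameterization of direction vectors; the particular parameter values are invented for illustration:

```python
import math

# Sketch of Equations 5 and 6: photoreceptor excitation for a matte surface
# under a collimated-plus-diffuse illuminant. Parameter values are invented.

def unit_vector(azimuth, elevation):
    """Unit vector for a direction given by azimuth/elevation (radians)."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))

def cos_theta(D_E, D_S):
    """Cosine of the angle between source direction and surface normal."""
    u = unit_vector(*D_E)
    v = unit_vector(*D_S)
    return sum(a * b for a, b in zip(u, v))

def excitation(k, sigma_a, pi, D_E, D_S):
    """Equation 6: rho = k * sigma_a * (cos theta(D_E, D_S) + pi)."""
    c = cos_theta(D_E, D_S)
    assert c > 0.0, "source must be on the viewed side of the surface"
    return k * sigma_a * (c + pi)

D_E = (math.radians(30), math.radians(60))   # source azimuth/elevation
D_S = (math.radians(0), math.radians(45))    # surface-normal azimuth/elevation
rho = excitation(k=1.0, sigma_a=0.5, pi=0.3, D_E=D_E, D_S=D_S)
print(rho)
```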

In these experiments, the illuminant parameters *π*, (*ψ*_{E}, *φ*_{E}) did not vary across the variations in test surface orientation. Ripamonti et al. examined human performance with the collimated illuminant at either of two azimuths and a single elevation. In both experiments, only the azimuth of the test surface was varied; the parameter *φ*_{S} was held constant. These restrictions allow us to further simplify the illuminant–surface coordinates by folding the effect of *φ*_{E} and *φ*_{S} into the normalizing constant *k* of Equation 5 and subsequently neglecting them. The theoretical goal of the experiments was to determine (a) whether observers' lightness matches were systematically affected by changes in surface azimuth and (b) whether an equivalent illumination model could describe performance, and if so, what were the coordinates *π̂*, *ψ̂*_{E} of the equivalent illuminant for each set of experimental conditions.

Any choice of the equivalent illuminant parameters *π̂*, *ψ̂*_{E} leads to a prediction of how the (normalized) equivalent surface albedo estimate, *σ̂*_{a}, should vary as a function of surface azimuth *ψ*_{S}. The green solid curve shown in Figure 7A shows this dependence for the actual illumination parameters *π*, *ψ*_{E} used in the experiments of Boyaci et al. This plot represents predictions for the case where the luminance reaching the observer from the test surface is held fixed across the changes of azimuth, again as was done in the experiments of Boyaci et al. In addition, because of the normalization procedure used in the data analysis, the units of albedo are arbitrary and here have been set so that their minimum is one. We refer to the plot as a *matching function*. If the observer's estimates of albedo were based on a correct physical model with correct estimates of the illuminant, then his or her normalized matches would fall along this particular matching function.

Matching functions for values of *ψ̂*_{E} that differ from the value of the physical illumination have their minima at *ψ*_{S} = *ψ̂*_{E}, that is, when the surface azimuth agrees with the azimuth of the direction estimated for the collimated light source. This change is readily interpretable: a higher albedo is required to predict a constant reflected luminance as the surface normal is rotated away from the direction of the collimated source.
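With elevation folded into the normalizing constant, a matching function of this kind can be sketched as the reciprocal of the shading term, normalized so its minimum is one. The equivalent-illuminant parameter values below are invented:

```python
import math

# Sketch of an EIM matching function: the (normalized) albedo match required
# to keep reflected luminance constant as test-surface azimuth psi_S varies,
# for hypothetical equivalent illuminant parameters pi_hat and psi_E_hat.

def matching_function(psi_S, psi_E_hat, pi_hat):
    """Relative albedo needed for constant luminance at azimuth psi_S."""
    return 1.0 / (math.cos(psi_S - psi_E_hat) + pi_hat)

psi_E_hat = math.radians(20)                  # equivalent-illuminant azimuth
pi_hat = 0.4                                  # equivalent diffuseness
azimuths = [math.radians(a) for a in range(-60, 61, 5)]

values = [matching_function(p, psi_E_hat, pi_hat) for p in azimuths]
normalized = [v / min(values) for v in values]   # units set so minimum is 1

# The minimum of the matching function falls where the surface azimuth
# agrees with the equivalent-illuminant azimuth, psi_S = psi_E_hat.
best = azimuths[normalized.index(min(normalized))]
print(math.degrees(best))  # approximately 20 degrees
```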

Settings were measured for multiple values of the azimuth *ψ*_{S} and elevation *φ*_{S} of a matte test surface and for two locations of the collimated source, *D*_{E}. Observers adjusted the test surface at each orientation until it appeared achromatic. Their data allowed recovery of equivalent illuminant parameters *π̂*, *ψ̂*_{E}, and *φ̂*_{E}. Two settings determined the full shape of each EIM prediction, and the fact that all the settings fell near a single matching function provided evidence that observers were discounting illumination for a scene much like the physical one but with respect to estimates of the illuminant coordinates that deviated from those of the scene illuminant.

A virtue of the EIM approach is that it can account for the ways in which observers' matches *fail* to be constant, for the constraints that appear in their data, and for individual differences in the degree of constancy failure (see Gilchrist et al., 1999, for a general discussion of the diagnosticity of accounting for failures of constancy).

Computational analyses of this sort provide *normative models* of performance (Geisler, 1989).^{5} If biological performance approximates normative performance in a particular task, the experimentalist can attempt to develop a *descriptive model* based on the normative model. There is no *a priori* guarantee that a normative model will lead to an accurate descriptive model. The premise that it might do so is also a research gambit.

Variation in the *a** direction corresponds to reddish-greenish perceptual variation, while variation in the *b** direction corresponds to bluish-yellowish variation. There is also an *L** coordinate that roughly corresponds to perceptual variation in lightness. In computing CIELAB coordinates, we omitted its model of retinal adaptation by fixing the reference white used in the transformation. Thus, in our use of the space here, the CIELAB coordinates are in a one-to-one invertible relation to cone excitation vectors.