The term colour constancy describes the extent to which the colour of an object appears unchanging despite changes in the spectral composition of the light reflected from that object to the eye (Helson, Judd, & Warren,
1952; Land & McCann,
1971; Land,
1983; Brainard,
1998; Foster,
2003). In the present paper we consider colour constancy under a change in illuminant from sunlight to skylight, although in general the light reflected to the eye from a particular object can change for a number of reasons (e.g. occlusion or filtering of one or multiple light sources, or other changes in the geometry of the scene). With environmental reflectance spectra, the “colour conversion” (Helson,
1938) between two illuminant conditions has a simple form when expressed in terms of cone-coordinates. Our experiments were aimed towards identifying the receptoral and post-receptoral neural processes that undo this colour conversion and “transform” (Helson,
1938) the perceived colours of objects under a test illuminant towards the colours of objects under a reference illuminant.
To quantify colour constancy, we assessed changes in colour appearance under different illuminants. Our stimulus displays consisted of a square test patch presented on a variegated background of randomly oriented elliptical patches. Examples of these displays are given in
Figure 1A & B. Each patch was assigned a reflectance spectrum and rendered under a particular illuminant. Reflectance spectra were chosen from measurements of natural and man-made objects, and we used the spectra of direct sunlight and of zenith skylight as illuminants. The observer’s task was to classify the appearance of sequentially presented test-patches as either red or green in one set of trials, and as either yellow or blue in a second set (Chichilnisky & Wandell,
1999). We thus obtained a locus of test-patches that appeared neither red nor green, and a second locus that appeared neither yellow nor blue. If we assume that colour boundaries measured under different conditions describe a set of stimuli that generate equivalent signals at the decision stage, then shifts in the locations of colour boundaries provide a measure of the neural transformations performed under different observing conditions.
In a series of experiments we performed critical tests of whether these neural transformations depend on information that is distributed over space, or on information that is spatially localized but distributed over time. In addition, we ask whether judgements of colour appearance under different conditions are well predicted by differences in early adaptation, or whether they reflect higher-level perceptual mechanisms.
In order to identify the neural transformations required for colour constancy we must first consider the nature of the colour conversion due to changes in the illuminant. For sets of everyday objects, and natural and man-made illuminants, when the L- (or M-, or S-) cone-coordinate for each object under one illuminant is plotted against the L- (or M-, or S-) cone-coordinate for that object under a different illuminant, the points fall close to a straight line through the origin (Dannemiller,
1993; Foster & Nascimento,
1994; Zaidi, Spehar, & DeBonet,
1997). For the object reflectances used in this study, such plots are shown in the left-hand panels of
Figure 2. Within each cone class, the effect of a change in the spectrum of the illuminant is to scale the cone-coordinate by approximately the same multiplicative constant for each object. Cone-excitation ranks across a set of objects are thus approximately invariant under an illuminant change. The MacLeod-Boynton (
1979) chromaticity axes (L/(L+M), S/(L+M)) provide a good representation of the post-receptoral colour signals that are transmitted to the cortex (Derrington, Krauskopf, & Lennie,
1984). Zaidi et al. (
1997) showed that when the effects of changes in illuminant spectrum are transformed to MacLeod-Boynton coordinates, the L/(L+M) chromaticities are shifted by an additive constant, whereas the S/(L+M) chromaticities are shifted by a multiplicative constant (see right-hand panels of
Figure 2). Nascimento & Foster (
1997) showed that multiplicative scaling of cone-signals provides a compelling cue to observers trying to distinguish between illuminant and reflectance changes in scenes, even when such scaling corresponds to highly unlikely natural events.
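The multiplicative form of the conversion, and its consequences in MacLeod-Boynton coordinates, can be checked numerically. The sketch below uses invented cone coordinates and assumed per-cone gains, not the measured sunlight/skylight spectra of Figure 2, purely to illustrate the two kinds of chromaticity shift:

```python
# Hypothetical (L, M, S) cone coordinates of four surfaces under a
# reference illuminant; the numbers are illustrative, not measured.
cones_ref = [(0.30, 0.20, 0.05), (0.50, 0.45, 0.10),
             (0.15, 0.12, 0.30), (0.40, 0.35, 0.20)]

# A diagonal illuminant change: each cone class is scaled by its own
# constant (gains are assumed for illustration).
gains = (1.10, 1.05, 0.70)
cones_test = [tuple(c * g for c, g in zip(s, gains)) for s in cones_ref]

def macleod_boynton(L, M, S):
    """Return the (L/(L+M), S/(L+M)) chromaticity coordinates."""
    return L / (L + M), S / (L + M)

mb_ref = [macleod_boynton(*s) for s in cones_ref]
mb_test = [macleod_boynton(*s) for s in cones_test]

# S/(L+M) shifts by a nearly common multiplicative factor ...
s_ratios = [t[1] / r[1] for t, r in zip(mb_test, mb_ref)]
# ... while L/(L+M) shifts by a nearly common additive offset.
l_offsets = [t[0] - r[0] for t, r in zip(mb_test, mb_ref)]
print(s_ratios)   # all close to ~0.65
print(l_offsets)  # all close to ~0.011
```

The near-constancy of the ratio and of the offset across surfaces holds because the L- and M-cone gains are similar for natural illuminant changes; it is an approximation, not an identity.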
Identifying the type of transformation required to undo a colour conversion is the first stage in specifying a model of colour constancy. Determining how the parameters of the transformation might be set by the image is the second (e.g. Stiles,
1961; Brainard,
2004). Any complete model of colour constancy must additionally include a third component that specifies where in our perceptual apparatus these transformations are implemented. The highly systematic nature of colour conversions under a change in illuminant implies that colour constancy could be supported by simple neural mechanisms that could in principle range from automatic to volitional and from peripheral to central. The present study is aimed at elucidating the second and third components of a model of colour constancy.
Von Kries (
1878,
1905) suggested that the invariance of colour metamers to adaptation level might be due to multiplicative gain control at the photoreceptor level, and that these gains are set independently within each class of photoreceptor in inverse proportion to the local stimulation. Ives (
1912) may have been the first to suggest an explicit mechanism for constancy under an illuminant change. He showed that the multiplicative factors that transform the
illuminant’s cone-coordinates to those of an equal-energy illuminant, also transform the cone-coordinates of
surfaces to approximately their cone-coordinates under the equal-energy illuminant. The left-hand panels of
Figure 2 help to illustrate why this simple transform will work. The illuminant (indicated by a black cross within a red circle) plots at the extreme end of the line of reflectances. Multiplying each cone-coordinate by the ratio of the illuminant cone-coordinates will transform most cone-coordinates to the unit diagonal, thus equating neural signals under the two illuminants. Mathematically, the Ives transform consists of multiplying all cone-coordinates by the same diagonal matrix and has been widely analyzed in the computer vision literature where it is misnamed the Von Kries transform. Von Kries’ original transform multiplies each local cone-coordinate by a scalar depending only on its
local magnitude, and thus shifts all colours towards a neutral colour (Vimal, Pokorny, & Smith,
1987; Webster,
1996) rather than achieving the required transformation to an equal-energy illuminant.
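As a concrete sketch (with invented cone coordinates, not the sunlight and skylight data used in this study), the Ives transform amounts to a single diagonal scaling. When the colour conversion itself is exactly diagonal, the correction recovers the surface's cone coordinates exactly; for real spectra it does so only approximately:

```python
def ives_transform(cones, illuminant):
    """Divide each cone class by the illuminant's coordinate in that
    class, i.e. multiply by the diagonal matrix that maps the
    illuminant's own cone coordinates to (1, 1, 1)."""
    return [tuple(c / i for c, i in zip(s, illuminant)) for s in cones]

# Invented (L, M, S) coordinates of two illuminants, and one surface's
# per-cone reflectance factors (purely illustrative numbers).
illum_a = (0.90, 0.85, 0.40)
illum_b = (0.70, 0.75, 0.80)
reflect = (0.5, 0.3, 0.2)

# Render the surface under each illuminant by a diagonal conversion.
surface_a = tuple(r * i for r, i in zip(reflect, illum_a))
surface_b = tuple(r * i for r, i in zip(reflect, illum_b))

# After the Ives correction, the surface's coordinates agree under both
# illuminants -- both come out approximately (0.5, 0.3, 0.2).
print(ives_transform([surface_a], illum_a))
print(ives_transform([surface_b], illum_b))
```

The agreement here is exact only because the simulated conversion was itself diagonal; the left-hand panels of Figure 2 show that real conversions are close to, but not exactly, this form.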
The Ives transformation relies on the visual system’s ability to estimate the cone-coordinates of the illuminant. Since the illuminant itself is often not in the field of view, its cone-coordinates have to be estimated from the visual scene. The most common suggestion for the estimate involves taking the mean cone-coordinates of the scene (Buchsbaum,
1980), under the assumption that the average surface reflectance in the scene is spectrally uniform (the “grey-world” hypothesis). This assumption is unlikely to hold for most scenes (Brown,
1994; Brown & MacLeod,
1997; Webster & Mollon,
1997; Webster, Malkoc, Bilson, & Webster,
2002), so Golz & MacLeod (
2002) have suggested that luminance-chromaticity correlations may provide estimates that are less influenced by the set of reflectances available. Tominaga, Ebisui, & Wandell (
2001) argue that it is better to use just the brightest objects to make the illuminant estimate, since darker surfaces in the scene contribute more noise than signal to the estimate. Specular highlights are the extreme example of bright objects, and several authors have suggested using these to derive the illuminant estimate (D’Zmura & Lennie,
1986; Lee,
1986; Lehmann & Palm,
2001; Yang & Maloney,
2001).
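These illuminant-estimation heuristics are simple to state computationally. The sketch below implements the grey-world mean and a brightest-subset variant in the spirit of Tominaga et al.; the scene data, the use of L+M as a brightness proxy, and the 25% cutoff are all assumptions made for illustration:

```python
def grey_world_estimate(scene_cones):
    """Grey-world: take the scene-mean (L, M, S) as the illuminant
    estimate, unbiased only if the mean reflectance is spectrally flat."""
    n = len(scene_cones)
    return tuple(sum(s[i] for s in scene_cones) / n for i in range(3))

def bright_subset_estimate(scene_cones, fraction=0.25):
    """Estimate from only the brightest surfaces (ranked here by L+M),
    on the view that dark surfaces add more noise than signal."""
    ranked = sorted(scene_cones, key=lambda s: s[0] + s[1], reverse=True)
    top = ranked[:max(1, round(len(ranked) * fraction))]
    return grey_world_estimate(top)

# Invented scene: (L, M, S) cone coordinates of eight patches.
scene = [(0.30, 0.20, 0.05), (0.50, 0.45, 0.10), (0.15, 0.12, 0.30),
         (0.40, 0.35, 0.20), (0.60, 0.55, 0.15), (0.10, 0.08, 0.06),
         (0.25, 0.22, 0.12), (0.55, 0.50, 0.25)]

print(grey_world_estimate(scene))    # mean over all eight patches
print(bright_subset_estimate(scene)) # mean over the two brightest
```

The two estimators diverge exactly when the scene's reflectance set is biased, which is the situation probed in our second experiment.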
A neural mechanism that integrated over a large spatial area could in principle extract the mean chromaticity. If the outputs of local subunits of such a mechanism were subjected to accelerating nonlinearities before integration, then this mechanism would estimate the illuminant by weighting scene chromaticities as an increasing function of their brightness. Psychophysical measurements, however, indicate that early adaptation mechanisms are extremely local in their spatial properties (MacLeod, Williams, & Makous,
1992; MacLeod & He,
1993; He & MacLeod,
1998). Local mechanisms could estimate the illuminant from an extended scene by using eye movements to convert spatial variations into temporal variations (D’Zmura & Lennie,
1986; Fairchild & Lennie,
1992).
Early adaptation is not the only neural transformation that could use estimated illuminant cone-coordinates. Later perceptual mechanisms could use these estimates to adjust for colour conversions (Adelson & Pentland,
1996), without losing information about the illuminant colour (Zaidi,
1998). Such mechanisms are particularly salient when the geometrical properties of the scene promote colour scission, i.e. separation of the colours of the scene into material colours and the colours of illuminants or transparencies (Hagedorn & D’Zmura,
2000). Khang & Zaidi (
2002) showed that observers were able to identify like versus unlike filters across illuminants based on the similarity between colour-shifts of backgrounds and the colour-shifts of tests.
A different class of transformation mechanism involves the concept of “level of reference” or “anchoring” (Rogers,
1941; Helson,
1947). Thomas & Jones (
1962) showed that matches to a reference colour were biased by the distribution of possible matching colours. In its extreme form, if perceived colours in a scene were determined entirely by rank-orders of cone-coordinates, good colour constancy would be the result because, as shown in
Figure 2, colour conversions do not disturb rank-orders of cone-coordinates. This mechanism would not need an estimate for the illuminant but would, like adaptation to the mean, lead to inconstancy if the set of available materials changed.
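The rank-order argument can be checked directly: scaling a cone class by any positive constant cannot reorder its coordinates, whereas changing the set of materials can. The values below are invented for illustration:

```python
# Invented L-cone coordinates of five surfaces under one illuminant.
l_sun = [0.12, 0.45, 0.30, 0.08, 0.51]
# A diagonal illuminant change scales the whole class by one constant
# (0.7 is an assumed gain).
l_sky = [v * 0.7 for v in l_sun]

def ranks(values):
    """Rank of each surface within its set (0 = smallest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r

print(ranks(l_sun) == ranks(l_sky))   # True: the conversion preserves ranks
```

A mechanism keyed only to ranks would therefore inherit constancy under illuminant changes, but since ranks are defined relative to the available set of materials, it would misassign colours whenever that set changed.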
In this study we have tried to distinguish between different types of neural transformation and the ways in which they are driven by properties of the scene. Our observers were not asked to make inferences about objects in the world. They were simply asked to judge the appearance of a test-patch displayed in the centre of a variegated image. These images were constructed by rendering a set of materials (reflectance spectra) under a particular illuminant. In the first experiment, we determined boundaries between colour categories as a function of the illuminant. Under prolonged adaptation to a single illuminant, observers demonstrated a high degree of phenomenological (appearance-based) colour constancy. The chromaticity that elicited the percept of neither red nor green (or neither yellow nor blue) was substantially different for the two illuminant-conditions, while the classification of materials was largely unaffected.
In the first experiment, the set of object reflectances was balanced so that the mean chromaticity was a reasonable estimate of the illuminant chromaticity. In the second experiment, we used sets of background reflectances whose means were significantly biased, yet this had only a small effect on the classification of test materials. Khang and Zaidi (
2004) showed that on biased backgrounds, the perceived colour of the illuminant is close to that of the mean chromaticity of the scene. The high levels of constancy we observe with biased backgrounds suggest that the colour constancy transformation is not based on the simple spatial integration that seems to set the perceived colour of the illuminant. However, our results with biased backgrounds do not rule out all spatially extended constancy mechanisms since it is always possible that there exists some scene-statistic that could appropriately set the parameters of a constancy transformation, even for biased scenes.
In our third experiment, we performed a critical manipulation. We used one illuminant for the test and a different illuminant for the background. Under these conditions, the spatial context provides information only about the background illuminant, and so any spatially extended illuminant-estimation mechanism would estimate the wrong illuminant for the test, and constancy would be low. In this experiment, information about the test illuminant is available only by collating local information over successive trials. Observers continued to demonstrate reasonable colour constancy for reflectances presented under the test-illuminant.
Our final experiment was designed to separate purely automatic adaptation mechanisms from higher-level perceptual mechanisms. Test-patches, rendered under one illuminant, were briefly presented within a background rendered under a conflicting illuminant. If the test illuminant influences observers’ judgements out of proportion to the relative exposure times to the two illuminants, we have evidence that contextual information about the test is tracked by higher-level mechanisms that can collate information about the test independently from information about the background.