Lightness constancy is the remarkable ability of the human visual system to maintain a stable percept of surface reflectance across a wide range of lighting conditions. Reflectance is the proportion of incident light that a surface reflects, and lightness is perceived reflectance. The interactions between lighting, material properties, and 3-D shape during image formation mean that recovering surface reflectance from image luminance is an underdetermined problem: Under different lighting conditions, surface patches with the same reflectance can yield different luminances in the retinal image, and surface patches with different reflectances can yield the same luminance. How the human visual system achieves lightness constancy remains poorly understood, and research on this problem is a fundamental topic in vision science.
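To see the ambiguity concretely, consider an ideal matte (Lambertian) surface patch, for which luminance is proportional to the product of reflectance and illuminance: L = rE/π, with luminance L in cd/m² and illuminance E in lux. A patch with reflectance r = 0.8 under E = 100 lx and a patch with r = 0.4 under E = 200 lx then produce exactly the same luminance (about 25.5 cd/m²), so luminance alone cannot determine reflectance. (The Lambertian example and the notation L, r, and E are purely illustrative; they are not assumptions of any particular model discussed below.)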
Different theories of lightness perception identify different image features as playing important roles in estimating lightness. Anchoring theory (Gilchrist et al., 1999) states that the image patch with the highest luminance is a crucial reference point, and that other image regions are assigned lightness values relative to this region. Center–surround models (Heinemann & Chase, 1995; Shapiro & Lu, 2011) emphasize the role of the immediate surround of the region whose lightness is being judged. The oriented difference-of-Gaussians (ODOG; Blakeslee & McCourt, 1999) model, and its extensions LODOG and FLODOG (Robinson, Hammon, & de Sa, 2007), rely on oriented receptive fields at multiple scales. Adelson (1993) emphasizes the importance of perceptual segmentation, highlighting X-junctions as a possible cue to lighting boundaries (Beck, Prazdny, & Ivry, 1984).
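To make the filter-bank idea concrete, the sketch below implements an ODOG-style model in Python. It is not Blakeslee and McCourt's (1999) published implementation: the kernel shapes, scales, and pooling weights are simplified, illustrative choices, and only the two defining ingredients are kept, namely oriented difference-of-Gaussians filters at several scales and response normalization within each orientation channel.

```python
# A minimal ODOG-style sketch (illustrative parameters, not the
# published Blakeslee & McCourt, 1999, settings).
import numpy as np
from scipy.signal import fftconvolve

def oriented_dog(size, sigma, theta, aspect=2.0):
    """Zero-mean center-surround kernel whose long axis lies along theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)  # so theta is the long axis
    center = np.exp(-0.5 * ((xr / (aspect * sigma))**2 + (yr / sigma)**2))
    surround = np.exp(-0.5 * ((xr / (aspect * sigma))**2 + (yr / (2 * sigma))**2))
    # Normalize each lobe to unit volume so the difference integrates to zero.
    return center / center.sum() - surround / surround.sum()

def odog_response(image, sigmas=(1, 2, 4, 8),
                  thetas=np.linspace(0, np.pi, 6, endpoint=False)):
    """Sum each orientation channel over scales, normalize it by its RMS
    (the key ODOG nonlinearity), then pool across orientations."""
    channels = []
    for theta in thetas:
        resp = sum(fftconvolve(image, oriented_dog(int(6 * s) | 1, s, theta),
                               mode="same") for s in sigmas)
        channels.append(resp / np.sqrt(np.mean(resp ** 2) + 1e-12))
    return np.mean(channels, axis=0)
```

Reading out such a model's output at two equiluminant test patches gives a predicted lightness difference that can be compared with human matches; this readout, too, is a simplification of how the published models are evaluated.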
Most experiments on lightness perception have examined human observers' lightness matches in scenes that were carefully designed so that different models predicted different lightness percepts. Here we take the novel approach of measuring classification images to evaluate models of lightness perception (Ahumada, 2002; Murray, 2011; Volterra, 1930). Classification images measure the influence that local stimulus regions have on an observer's responses in a perceptual task, and so they provide information about what image features guide perceptual judgments. They are a psychophysical version of methods called reverse correlation or spike-triggered averaging in the neurophysiological literature (Ringach & Shapley, 2004). While most often used to study spatial vision, classification images provide a flexible experimental tool for identifying important features in a variety of domains, such as the perception of illusory contours (Gold, Murray, Bennett, & Sekuler, 2000), facial expressions (Kontsevich & Tyler, 2004), and translucency (Nagai et al., 2013). Because different theories of lightness perception identify different image features as crucial to computing lightness percepts, classification images should provide a powerful way of testing these theories. In the domain of brightness perception, classification images have been used to examine simultaneous contrast effects (Shimozaki, Eckstein, & Abbey, 2005). Here we use classification images to examine more complex stimuli, where lightness percepts may depend on scene structures such as lighting boundaries.
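Concretely, the simplest classification-image estimator is a difference of mean noise fields, conditioned on the observer's response. The Python sketch below (variable names are ours) illustrates the idea for a two-response task with additive Gaussian pixel noise; Murray (2011) reviews the full estimators, which also condition on the signal presented, and their statistical properties.

```python
# A minimal sketch of classification-image estimation by reverse
# correlation, assuming a two-response task with additive Gaussian
# pixel noise on every trial.
import numpy as np

def classification_image(noise_fields, responses):
    """noise_fields: (n_trials, height, width) noise added on each trial.
    responses: length-n_trials array of binary (0/1) observer responses.
    Returns the mean noise on response-1 trials minus the mean noise on
    response-0 trials; pixels with large positive values pushed the
    observer toward response 1."""
    noise_fields = np.asarray(noise_fields, dtype=float)
    responses = np.asarray(responses).astype(bool)
    return (noise_fields[responses].mean(axis=0)
            - noise_fields[~responses].mean(axis=0))
```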
We use the argyle illusion (Adelson, 1993; Figure 1A) as a test case for evaluating models of lightness perception. In this illusion, one region (Figure 1A, diamond A) appears lighter than another (diamond B) even though the two have the same physical luminance. We chose the argyle illusion as our “fruit fly” because it is one of the strongest known lightness illusions, and one that has consistently resisted explanation by low-level models (e.g., Blakeslee & McCourt, 2012). It therefore poses a difficult and interesting problem for modeling visual perception, and understanding it may reveal general principles of lightness constancy.
Adelson (1993) explains the argyle illusion in terms of zones of uniform lighting, which Gilchrist et al. (1999) call lighting frameworks. In Figure 1A, diamond A appears to belong to a dimmer lighting framework than diamond B, yet the two have the same luminance; Adelson suggests that from this the visual system infers that A has a higher reflectance than B. He further suggests that the lighting frameworks in the argyle illusion are determined by nonreversing X-junctions at the boundaries between light and dark columns, which create a percept of dark, vertical shadows or semitransparent filters (Beck et al., 1984). Indeed, if the X-junctions are destroyed by splitting apart the columns (Figure 1B, the broken argyle), observers report a much smaller lightness difference between diamonds A and B.
In the present study, we investigate how human observers perceive and process the argyle illusion, and we compare human behavior to four computational models. We compare the strength of the argyle illusion for human and model observers, and we measure classification images to determine which image features most strongly influence each observer's lightness judgments. Our results show that human observers' lightness judgments depend strongly on the luminances in the immediate neighborhood of the test patches being judged, in a way that tracks the boundaries of local lighting frameworks. Interestingly, none of the models we tested can both replicate the argyle illusion and produce classification images that are even qualitatively similar to those of human observers. These findings show that making progress with computational models of lightness perception will require a better understanding of how lighting frameworks are established, and of how the luminances of elements within lighting frameworks contribute to lightness percepts.