Lightness perception is the ability to perceive black, white, and gray surface colors in a wide range of lighting conditions and contexts. This ability is fundamental for any biological or artificial visual system, but it poses a difficult computational problem, and how the human visual system computes lightness is not well understood. Here I show that several key phenomena in lightness perception can be explained by a probabilistic graphical model that makes a few simple assumptions about local patterns of lighting and reflectance, and infers globally optimal interpretations of stimulus images. Like human observers, the model exhibits partial lightness constancy, codetermination, contrast, glow, and articulation effects. It also arrives at human-like interpretations of strong lightness illusions that have challenged previous models. The model’s assumptions are reasonable and generic, including, for example, that lighting intensity spans a much wider range than surface reflectance and that shadow boundaries tend to be straighter than reflectance edges. Thus, a probabilistic model based on simple assumptions about lighting and reflectance gives a good computational account of lightness perception over a wide range of conditions. This work also shows how graphical models can be extended to develop more powerful models of constancy that incorporate features such as color and depth.

^{1}) is a difficult problem with a long history (Hering, 1905/1964; von Helmholtz, 1910/1924). Research over several decades has shown, however, that features such as perceived lighting boundaries, depth discontinuities, and cues to transparency play a central role in lightness perception; for reviews, see Adelson (2000), Gilchrist (2006), and Kingdom (2011).

^{1}), which are almost always image-based computational models, and can make falsifiable predictions for any achromatic two-dimensional stimulus (e.g., Blakeslee and McCourt, 1999; Dakin and Bex, 2003).

^{2}called a “Gibbs distribution”:

*C*_{i} of the MRF. In the natural image example where each pixel has eight neighbors, a little thought shows that the largest cliques are 2 × 2 squares of pixels. The functions ϵ_{i} are “potential functions” that put a cost on the state of each clique, such that states with high costs tend to occur less frequently^{3}. *Z* is a constant that gives the density function a total volume of 1. The probability density of a state *X* of the ensemble is determined by the sum of the potentials of all cliques under that state. Thus, we can specify the probability density of an MRF by specifying a set of potential functions ϵ_{i} on cliques, and because cliques tend to be small, this makes modeling the ensemble of elements much more tractable.
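As a concrete illustration of Equation 1, the total energy of a state is just the sum of the clique potentials, and the unnormalized density is its negative exponential. The function names and the toy pairwise clique structure below are illustrative, not part of the model code.

```python
import numpy as np

def gibbs_energy(state, cliques, potentials):
    """Total energy of a state X: the sum of the potentials eps_i of all
    cliques C_i under that state (Equation 1).  `cliques` lists index
    tuples and `potentials` the matching cost functions."""
    return sum(eps(state[list(c)]) for c, eps in zip(cliques, potentials))

def unnormalized_density(state, cliques, potentials):
    """p(X) is proportional to exp(-energy); the constant Z would
    normalize the density to a total volume of 1."""
    return np.exp(-gibbs_energy(state, cliques, potentials))
```

For example, a three-element chain with squared-difference potentials on adjacent pairs assigns higher energy, and so lower probability, to states whose neighbors disagree.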

_{i} for hidden elements depend on the state of the observed elements. In the shape-from-shading example, we observe a stimulus image, and a CRF model would then provide a corresponding set of potential functions ϵ_{i} that specify (as in Equation 1) the probability density of slant and tilt across the image. We then interpret Equation 1 as the conditional probability density of the hidden elements, given the observed elements.

^{1} and a luminance map that represents the observed stimulus (Figure 2). Given a luminance map, the model constructs potential functions on 2 × 2 patches of the illuminance and reflectance maps. The potential functions impose the following soft constraints. (a) Reflectance mostly spans the range 3% to 90%, with a rapid decline in probability outside these limits. (b) Low illuminances are more probable than high illuminances. (c) Illuminance edges are less common than reflectance edges. (d) Reflectance and illuminance edges usually occur at image luminance edges (Freeman, 1996). (e) X-junctions are evidence for illuminance edges (Metelli, 1970; Beck et al., 1984). (f) Costs are evaluated on uniform image regions instead of pixelwise^{4} (Katz, 1935; Gilchrist, 2006). (g) Illuminance edges tend to be straighter than reflectance edges (Logvinenko et al., 2005). Given a stimulus image, MIR uses belief propagation to find global illuminance and reflectance assignments that generate the image and match these local assumptions as closely as possible. I describe these assumptions quantitatively in the Appendix, and I provide a MATLAB implementation of the model at doi:10.17605/OSF.IO/4FWJV.
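The flavor of these soft constraints can be sketched in a few lines. In the sketch below, only the relation r = πl/m and the general shape of constraints (a) through (c) follow the text; the out-of-range penalty and its weight are placeholders of my own, and the actual implementation is the MATLAB code at the DOI above.

```python
import numpy as np

def patch_cost(illum, lum, w_edge=600.0):
    """Illustrative cost on a 2x2 patch combining constraints (a)-(c).
    illum: 2x2 illuminance assignment (lux); lum: 2x2 luminance (cd/m^2)."""
    illum = np.asarray(illum, dtype=float)
    lum = np.asarray(lum, dtype=float)
    refl = np.pi * lum / illum                   # implied reflectance r = pi*l/m
    cost = np.sum(np.log10(illum))               # (b) prefer low illuminance
    # (a) reflectance mostly in [0.03, 0.90]: crude placeholder penalty
    cost += 100.0 * np.sum((refl < 0.03) | (refl > 0.90))
    # (c) illuminance edges are costly: squared log-illuminance steps
    dlog = np.log10(illum)
    cost += w_edge * ((dlog[0, 0] - dlog[0, 1]) ** 2
                      + (dlog[1, 0] - dlog[1, 1]) ** 2
                      + (dlog[0, 0] - dlog[1, 0]) ** 2
                      + (dlog[0, 1] - dlog[1, 1]) ** 2)
    return float(cost)
```

On a uniform luminance patch, an interpretation that posits an illuminance edge accrues a large edge cost, so the uniform-illuminance interpretation wins.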

^{2}. The 16 × 16 reflectance patterns are provided with the model code at doi:10.17605/OSF.IO/4FWJV.

*increase* the illumination estimate, can cause the target patch to appear lighter, not darker. One possibility for a future revision of MIR is to incorporate a prior for smooth reflectance patterns, which would make each reflectance value tend to be similar to surrounding reflectance values.

^{4}not pixelwise, so when the image is divided into a larger number of uniform regions there is a greater cost reduction for positing a strong lighting boundary along the vertical edge that divides the figure in two.

*brightness* (i.e., perceived luminance^{1}) are typically computational models that use operations such as spatial filtering and contrast normalization derived from the physiology of early visual cortical areas (DeValois & DeValois, 1988), or are based on simple stimulus properties such as luminance ratios at edges (Heinemann, 1955; Rudd & Zemach, 2004). The relationship between lightness and brightness is not well understood, partly because of this difference between the goals and methods of the researchers who study them. Nevertheless, under some viewing conditions we expect lightness and brightness judgments to be similar, for example, with simple, reflective, uniformly illuminated two-dimensional geometric figures such as the ones used in Experiments 1 and 2, for which luminance is proportional to reflectance. Thus, with some caveats it can be useful to evaluate models of lightness and brightness on the same stimuli, as has often been done in the past (Adelson, 1993; Shapiro & Lu, 2011; Blakeslee & McCourt, 2012).

^{5}and for this reason I have kept the straight illumination boundary assumption. To take another example, the assumption that low illuminances are more probable than high illuminances plays an important role in MIR, as it drives the model to assign reflectances that fill the range of 0.03 to 0.90 (MIR’s version of the “highest luminance rule” in anchoring theory). It remains to be seen, however, whether this assumption correctly reflects natural scene statistics, or whether the human visual system’s tendency to see the highest luminance in a scene as white has a different explanation (e.g., Murray, 2013). As these examples illustrate, there is a great deal of room for exploring the effects of alternative assumptions about lighting and reflectance. Another promising approach would be to learn potential functions from illuminance and reflectance patterns in natural scenes (or realistic renderings of natural scenes), instead of manually specifying a number of discrete assumptions (Freeman et al., 2000).

*perceived reflectance*. Luminance is a measure of physical lighting magnitude in a viewed stimulus (units cd/m^{2}). Brightness is *perceived luminance*. Illuminance is a measure of incident lighting magnitude on a surface (units lux).

_{i}(*C*_{i})) is not proportional to the probability density of clique *C*_{i}. Finding the marginal probability density of a clique from the Gibbs distribution of the ensemble (Equation 1) is not straightforward.

_{i}are defined on 2 × 2 cliques, so a more complete statement of this property is that cost functions are evaluated on uniform image regions, up to a maximum 2 × 2 pixel region.

*Science,*262 (5142), 2042–2044.

*The new cognitive neurosciences*(p. 339–351). Cambridge, MA: The MIT Press.

*Perception as Bayesian inference*(p. 409–412). Cambridge, UK: Cambridge University Press.

*Journal of Vision,* 13(7), 1–18.

*Perception & Psychophysics,*54 (4), 446–457.

*IEEE Transactions on Pattern Analysis and Machine Intelligence,*37 (8), 1670–1687.

*Perception & Psychophysics,*35 (5), 407–422.

*International Journal of Computer Vision,*35 (1), 33–44.

*Journal of Vision,*15 (14:1), 1–17.

*Pattern recognition and machine learning*. New York: Springer.

*Behavior Research Methods,*48, 306–312.

*Vision Research,*39, 4361–4377.

*Vision Research,*44, 2483–2503.

*Vision Research,*60, 40–50.

*Perception,*23 (9), 991–1006.

*Perception & Psychophysics,*61 (5), 786–797.

*Current Biology,*25, R549–R568.

*Journal of the Optical Society of America A,*3 (10), 1651–1661.

*Proceedings of the Royal Society B,*270, 2341–2348.

*Spatial vision*. Oxford, UK: Oxford University Press.

*Journal of Vision,*7 (12:2), 1–15.

*Psihologija,*47 (3), 353–358.

*International Journal of Computer Vision,*20 (3), 243–261.

*International Journal of Computer Vision,*40 (1), 25–47.

*Proceedings of the IS&T/SID Eighth Color Imaging Conference*(p. 112–121). Scottsdale, AZ, November 2000.

*Journal of the Optical Society of America A,*1 (7), 775–782.

*Annual Review of Psychology,*59, 167–192.

*Science,*195 (4274), 185–187.

*Seeing black and white*. Oxford, UK: Oxford University Press.

*Psychological Review,*106 (4), 795–834.

*Journal of Experimental Psychology,*50 (2), 89–96.

*Outlines of a theory of the light sense*(Hurvich, L. M., Jameson, D., Trans.). Cambridge, MA: Harvard University Press.

*Perceptual and Motor Skills,*31 (Suppl. 2-V31), 947–969.

*Zeitschrift fuer Psychologie und Physiologie der Sinnesorgane,*23 (Suppl.), 1–184.

*The world of colour*. London: Kegan & Paul.

*Journal of Vision,*18 (13): 1–20.

*Vision Research,*51, 652–673.

*Principles of Gestalt psychology*. New York: Harcourt, Brace, & World, Inc.

*Probabilistic graphical models: principles and techniques*. Cambridge, MA: The MIT Press.

*Proceedings of the Eighteenth International Conference on Machine Learning*. Association for Computing Machinery.

*Journal of the Optical Society of America,*61 (1), 1–11.

*Perception,*44, 243–268.

*Perception & Psychophysics,*67 (1), 120–128.

*Perception & Psychophysics,*68 (1), 76–83.

*SPIE Visual Communications and Image Processing IV,*1199, 1154–1163.

*Vision: a computational investigation into the human representation and processing of visual information*. San Francisco: W. H. Freeman and Company.

*Proceedings of the IS&T/SID Seventh Color Imaging Conference*(pp. 1–8). Scottsdale, AZ, November 1999.

*The art and science of HDR imaging*. New York: John Wiley & Sons, Ltd.

*Ergonomics,*13 (1), 59–66.

*Journal of Vision,* 14(9):15, 1–18.

*Proceedings of SPIE 8651, Human Vision and Electronic Imaging XVIII*. San Francisco, CA, February 4-7, 2013.

*Current Opinion in Behavioral Sciences,*30, 48–54.

*Vision Research,*40 (10-12), 1227–1268.

*Journal of Vision,* 18(5):1, 1–13.

*Computer vision: models, learning, and inference*. Cambridge, UK: Cambridge University Press.

*Journal of Vision,* 16(11):2, 1–18.

*Mach bands: quantitative studies on neural networks in the retina*. San Francisco: Holden-Day, Inc.

*Vision Research,*44, 971–981.

*Psychological Science,*13 (2), 142–149.

*Psychological Science,*22, 1452–1459.

*Treatise on physiological optics*(Southall, J. P. C., Trans.). Rochester, NY: Optical Society of America.

*Journal of Experimental Psychology,*38, 310–324.

*Perception,*8 (4), 413–416.

*L* = (*l*_{ij}), and layer 2 represents the illuminance assignment *M* = (*m*_{ij}). For a Lambertian, frontoparallel surface, these two layers imply a reflectance assignment \(R = (r_{ij}) = ( \pi \, l_{ij} / m_{ij} )\). (The factor of π arises when we use units of cd/m^{2} for luminance and lux for illuminance.) Each node in layer 2 (except edge nodes) has an undirected connection to the nearest nine neighbors in layer 1 and the eight nearest neighbors in layer 2. Given an observed luminance image *L*, the model represents the posterior probability density of illuminance *M* and implied reflectance *R* as a Gibbs distribution, with total energy given by a sum of local potentials over 4-cliques (i.e., 2 × 2 squares) in layer 2:

*ij* indexes the 4-clique with its upper left corner in row *i* and column *j*. ϵ_{ij} is the potential function that returns the energy of the 4-clique *ij* (see Section 2 for details). *M*_{ij} is the 2 × 2 submatrix of *M* on the 4-clique *ij*. *Z* is the partition function, a constant that normalizes the probability density to unit volume.

*m*_{ij}, but some of its potential functions (see below) also include costs that depend on the implied reflectance \(r_{ij} = \pi \, l_{ij} / m_{ij}\). Thus, a more complete way of describing layer 2 is to say that it represents illuminance-reflectance assignments (*m*_{ij}, *r*_{ij}), parameterized by the illuminance *m*_{ij}. In the exposition in the main text (e.g., Figure 2), I gave the two layers equal status for simplicity, but in fact the illuminance layer *M* is represented explicitly, and the reflectance layer *R* is implied.
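The implied reflectance is a one-line computation; this sketch simply restates \(r_{ij} = \pi \, l_{ij} / m_{ij}\) for Lambertian, frontoparallel surfaces (Python here, versus the MATLAB of the actual implementation).

```python
import numpy as np

def implied_reflectance(luminance_cdm2, illuminance_lux):
    """Reflectance implied by luminance (cd/m^2) and illuminance (lux)
    for a Lambertian, frontoparallel surface: r = pi * l / m."""
    return (np.pi * np.asarray(luminance_cdm2, dtype=float)
            / np.asarray(illuminance_lux, dtype=float))
```

For instance, a surface producing 50 cd/m^2 under 100π ≈ 314 lux has implied reflectance 0.5.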

_{ij}in Equation 2. These are the seven assumptions described in the main text.

*Illuminance spans a wide range, and lower illuminances are more likely.* Each node in layer 2 has a potential \(\epsilon ^{1m}_{ij} = \log _{10} m_{ij}\) that assigns a logarithmic cost to the illuminance *m*_{ij} at that node. This allows illuminance to take any positive value, but assigns a lower cost, and hence a higher probability, to lower values.

*m* indicates that it places a cost on illuminance. In the accompanying MATLAB implementation, \(\epsilon ^{1m}_{ij}\) (like all other potential functions) is represented as a table, and *m*_{ij} takes 20 logarithmically spaced values between 30 lux and 350 lux.
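The illuminance grid and its log-cost table can be reproduced directly (a Python sketch of what the text describes for the MATLAB implementation):

```python
import numpy as np

# 20 logarithmically spaced illuminance values between 30 and 350 lux,
# with the potential eps^1m(m) = log10(m) tabulated at each value.
m_values = np.logspace(np.log10(30.0), np.log10(350.0), 20)
eps_1m = np.log10(m_values)
```

Because the cost increases monotonically with illuminance, lower illuminances receive lower cost and hence higher probability, as stated above.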

*Reflectance mostly spans the range of 3% to 90%.* Reflectance is modeled using the following probability density function, which is shown in Figure 15:

*f*_{r} is the minimum of an increasing and a decreasing logistic function, with parameters *a* and *b* chosen so that most of the area under the resulting function lies between *r* = 0.03 and *r* = 0.90. Each node in layer 2 has a potential \(\epsilon ^{1r}_{ij} = -\log _{10} f_r( \pi \, l_{ij} / m_{ij} )\), the negative log likelihood of the implied reflectance \(r_{ij} = \pi \, l_{ij} / m_{ij}\).

^{−5}elsewhere. The key properties of the density function are that it mostly limits reflectance to a physically realistic range, and allows values outside this range with some small probability.
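For readers who want a concrete stand-in for *f*_{r}, the sketch below builds a min-of-two-logistics function with the qualitative properties just described. The slope parameters `a` and `b` and the small floor outside the range are illustrative guesses of my own, not the values used in the model code, and the function is unnormalized (which only shifts the negative log likelihood by a constant).

```python
import numpy as np

def f_r(r, a=300.0, b=30.0, floor=1e-5):
    """Illustrative reflectance density: the minimum of an increasing
    and a decreasing logistic function, concentrating mass between
    r = 0.03 and r = 0.90, plus a small floor elsewhere.  Parameters
    are placeholders, not the model's values."""
    rising = 1.0 / (1.0 + np.exp(-a * (r - 0.03)))
    falling = 1.0 / (1.0 + np.exp(b * (r - 0.90)))
    return np.minimum(rising, falling) + floor
```

This captures the two key properties named in the text: reflectance is mostly limited to a physically realistic range, but values outside it keep some small probability.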

The horizontal 2-clique indexed by *ij* is the 1 × 2 block of layer 2 nodes with its left node at position *ij*. The vertical 2-clique indexed by *ij* is the 2 × 1 block with its upper node at position *ij*. I define horizontal and vertical 2-potentials (with superscript labels *h* and *v*) as follows.

*k*_{ij} is the number of horizontal or vertical 2-cliques that 1-clique *ij* belongs to, which is four except at edge nodes. Dividing by *k*_{ij} distributes the 1-clique potentials evenly, in such a way that summing all 2-clique potentials \(\epsilon ^{2hmr}_{ij}\) and \(\epsilon ^{2vmr}_{ij}\) gives the same result as summing all 1-clique potentials \(\epsilon ^{1m}_{ij}\) and \(\epsilon ^{1r}_{ij}\).
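This bookkeeping is easy to check numerically. The helper below (hypothetical, not from the model code) counts the 2-cliques containing each node of a 16 × 16 CRF, and the even-distribution property can then be verified by summing the divided 1-clique potentials over all 2-cliques.

```python
import numpy as np

def k_counts(n_rows=16, n_cols=16):
    """k_ij: the number of horizontal or vertical 2-cliques that node ij
    belongs to -- four in the interior, three on edges, two at corners."""
    k = np.zeros((n_rows, n_cols), dtype=int)
    for i in range(n_rows):
        for j in range(n_cols):
            horiz = (j > 0) + (j < n_cols - 1)   # 2-cliques to left/right
            vert = (i > 0) + (i < n_rows - 1)    # 2-cliques above/below
            k[i, j] = horiz + vert
    return k
```

Each node *ij* appears in exactly *k*_{ij} 2-cliques, so contributing \(\epsilon ^{1}_{ij} / k_{ij}\) per appearance conserves the total, as the text states.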

*ij*. For 1-clique potentials \(\epsilon ^{1*}_{ij}\), the range is *i* = 1:16, *j* = 1:16. For horizontal 2-clique potentials \(\epsilon ^{2h*}_{ij}\), the range is *i* = 1:16, *j* = 1:15, because \(\epsilon ^{2h*}_{i,16}\) would extend beyond the right side of the CRF. Similarly, for vertical 2-clique potentials \(\epsilon ^{2v*}_{ij}\), the range is *i* = 1:15, *j* = 1:16, and for 4-clique potentials \(\epsilon ^{4*}_{ij}\), the range is *i* = 1:15, *j* = 1:15.

*Illuminance edges are less common than reflectance edges.* Each horizontal and vertical 2-clique has a potential that assigns a sum-of-squares cost to log-illuminance edges:

*w* depends on the local image luminance pattern as described in (A4) and (A5). The weights were arrived at by manual adjustment, and the model is fairly flexible with regard to weight assignments. The model assigns no cost to reflectance edges.
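Under this description, a 2-clique edge potential can be sketched as a weighted squared step in log illuminance; the exact form of Equations 6 and 7 may differ in detail, so treat this as a paraphrase rather than the model's code.

```python
import numpy as np

def edge_potential(m_a, m_b, w):
    """Sum-of-squares cost on the log-illuminance step across a
    2-clique: w * (log10 m_a - log10 m_b)^2.  The weight w depends on
    the local luminance pattern (see assumptions A4 and A5)."""
    return w * (np.log10(m_a) - np.log10(m_b)) ** 2
```

A uniform illuminance pair incurs no cost, while a tenfold illuminance step with the no-luminance-edge weight *w*_{0} = 600 incurs a cost of 600.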

*X-junctions are evidence for illuminance edges.* For a 2-clique at an image luminance edge that is part of an X-junction, the potentials in Equations 6 and 7 have zero weight, *w* = *w*_{x} = 0. I liberally define an X-junction as a 2 × 2 square where there are luminance edges between all four pairs of adjacent nodes; no special relationship between the four luminances is required. In future revisions of the model, it will be worth exploring whether requiring physically realistic luminance relationships in X-junction cues to lighting boundaries improves performance (Metelli, 1970).
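The liberal X-junction test is simple to state in code; the luminance-difference threshold below is an illustrative assumption, since the text does not specify how a luminance edge is detected.

```python
import numpy as np

def is_x_junction(lum, tol=1e-6):
    """Liberal X-junction test on a 2x2 luminance patch: a luminance
    edge between all four pairs of adjacent nodes, with no constraint
    on the luminance ratios."""
    lum = np.asarray(lum, dtype=float)
    pairs = [(lum[0, 0], lum[0, 1]), (lum[1, 0], lum[1, 1]),   # rows
             (lum[0, 0], lum[1, 0]), (lum[0, 1], lum[1, 1])]   # columns
    return all(abs(a - b) > tol for a, b in pairs)
```

A patch with four distinct adjacent luminances qualifies; a patch where any adjacent pair matches does not.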

*Illuminance edges tend to occur at image luminance edges.* For a 2-clique at an image luminance edge that is not part of an X-junction, the potentials in Equations 6 and 7 have a moderate weight, *w* = *w*_{1} = 20. For a 2-clique with no luminance edge, the potentials have a large weight, *w* = *w*_{0} = 600. These and the other weights *w*_{*} in the model were arrived at by manual experimentation.

*Potentials are evaluated on uniform image regions, not pixelwise.* Articulation effects show that lightness constancy tends to be better in images that consist of many distinct luminance regions (Katz, 1935). One possible explanation for these effects is that the visual system partitions an image into uniform luminance regions, and considers each region to provide a sample of information about the lighting conditions in the scene: more luminance regions provide more information, and as a result lightness percepts tend to be more accurate. I cannot fully implement this hypothesis in the present model, as potentials are evaluated over regions no larger than 4-cliques. I take a step in this direction, though, by having 2-clique potentials that span a luminance edge provide twice the potential of 2-clique potentials that fall within uniform luminance regions. I could do the same with 4-clique potentials, but I find that modifying 2-clique potentials this way is sufficient to create articulation effects.

*Illuminance edges tend to be straighter than reflectance edges.* Each 4-clique has a potential \(\epsilon ^{4m}_{ij}\) that assigns a cost to any 2 × 2 illuminance pattern that is not a multiplicative combination of vertical and horizontal edges. Consider a local illuminance pattern:

\[ \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} \]

If the pattern is a horizontal edge, then *m*_{11}/*m*_{21} = *m*_{12}/*m*_{22}, or alternatively \(\tau \equiv (\log _{10} m_{11} - \log _{10} m_{21}) - (\log _{10} m_{12} - \log _{10} m_{22}) = 0\). If the pattern is a vertical edge, then *m*_{11}/*m*_{12} = *m*_{21}/*m*_{22}, and again τ = 0. Any pointwise product of horizontal and vertical edges also has τ = 0. Thus, a 2 × 2 illuminance pattern produced by a vertical or horizontal shadow, or by a shadow meeting a transmissive filter at right angles (Metelli, 1970), or by two transmissive filters meeting at right angles under uniform illumination, all have τ = 0.
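The twist τ is a one-line computation, and the edge cases discussed above can be verified directly (a sketch; only the definition of τ comes from the text):

```python
import numpy as np

def twist(m):
    """Discrete 'twist' of log illuminance on a 2x2 patch:
    tau = (log10 m11 - log10 m21) - (log10 m12 - log10 m22).
    Multiplicative combinations of horizontal and vertical edges give
    tau = 0; twisted (non-straight) patterns do not."""
    lm = np.log10(np.asarray(m, dtype=float))
    return (lm[0, 0] - lm[1, 0]) - (lm[0, 1] - lm[1, 1])
```

Horizontal edges, vertical edges, and their pointwise products all give τ = 0, while a diagonally "twisted" pattern does not.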

*twist* of the log illuminance, as it is a discrete analog to \(\frac{\partial ^2}{\partial x \partial y } \log _{10} m(x,y)\) and measures the local rate of slope change in log illuminance. There are many ways of quantifying the straightness of local illuminance edges, but I have found τ to be an easy-to-compute and effective measure. The potential \(\epsilon ^{4m}_{ij}\) puts a cost on τ:

_{ij} for each 4-clique (see Equation 2) is the sum of the twist potential and the 2-potentials contained within the 4-clique:

\(k^{2h}_{ij}\) is the number of 4-cliques that horizontal 2-clique *ij* belongs to, which is two except in the top and bottom rows of the CRF. \(k^{2v}_{ij}\) is the corresponding number for vertical 2-cliques. Dividing by \(k^{2h}_{ij}\) and \(k^{2v}_{ij}\) distributes the 2-clique potentials evenly, in such a way that summing all 4-clique potentials ϵ_{ij} gives the same result as summing all \(\epsilon ^{4m}_{ij}\), \(\epsilon ^{2h}_{ij}\), and \(\epsilon ^{2v}_{ij}\).

_{ij} for each 4-clique in layer 2. There is an undirected connection between each 1-node and the 4-nodes that represent the potential functions that the 1-node contributes to. In each message passing operation, a node in the cluster graph sends a function δ(*m*_{ij}) to another node that it is connected to. The max-sum message δ(*m*_{ij}) reports the lowest energy of any illuminance-reflectance assignment to the sending node’s clique that assigns illuminance *m*_{ij} (and so reflectance \(r_{ij} = \pi \, l_{ij}/m_{ij}\)) to the one CRF node *ij* that the sender and receiver have in common. The message takes into account incoming messages from the sending node’s other neighbors, and if the sending node is a 4-node then it also takes into account the node’s potential ϵ_{ij}.
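The core of such a max-sum message (equivalently, min-sum on energies) can be sketched as a minimization over all variables except the shared one. The table-based formulation below is illustrative, not the model's MATLAB code: `energy_table` is assumed to already hold the clique's potential plus the incoming messages from the sender's other neighbors.

```python
import numpy as np

def min_sum_message(energy_table, axis):
    """Min-sum message from a clique node.  energy_table holds an energy
    for every joint assignment to the clique's variables; minimizing
    over every axis except `axis` gives, for each value of the shared
    variable, the lowest energy consistent with that value."""
    other_axes = tuple(a for a in range(energy_table.ndim) if a != axis)
    return energy_table.min(axis=other_axes)
```

For a two-variable clique, the message to the first variable is the row-wise minimum of the energy table, and the message to the second is the column-wise minimum.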