**Abstract**
**The lightness of a test stimulus depends in a complex manner on the context in which it is viewed. To predict lightness, it is necessary to leverage measurements of a feasible number of contextual configurations into predictions for a wider range of configurations. Here we pursue this goal, using the idea that lightness results from the visual system's attempt to provide stable information about object surface reflectance. We develop a Bayesian algorithm that estimates both illumination and reflectance from image luminance, and link perceived lightness to the algorithm's estimates of surface reflectance. The algorithm resolves ambiguity in the image through the application of priors that specify what illumination and surface reflectances are likely to occur in viewed scenes. The prior distributions were chosen to allow spatial variation in both illumination and surface reflectance. To evaluate our model, we compared its predictions to a data set of judgments of perceived lightness of test patches embedded in achromatic checkerboards (Allred, Radonjić, Gilchrist, & Brainard, 2012). The checkerboard stimuli incorporated the large variation in luminance that is a pervasive feature of natural scenes. In addition, the luminance profile of the checks both near to and remote from the central test patches was systematically manipulated. The manipulations provided a simplified version of spatial variation in illumination. The model can account for effects of overall changes in image luminance and the dependence of such changes on spatial location, as well as some, but not all, of the more detailed features of the data.**

$r_{i,j}$, where $i$ and $j$ denote the row and column location of the square, respectively. In our experiments, we employed 5 × 5 checkerboards, so that $i$ and $j$ ranged between 1 and 5. The entire checkerboard of surfaces was described by the column vector $\mathbf{r}$ in raster order.

$e_{i,j}$. This was summarized for the entire scene by the column vector $\mathbf{e}$.

$l_{i,j}$ at each checkerboard location. This was described by a column vector $\mathbf{l}$.

$l_{i,j} = r_{i,j}\,e_{i,j}$. The algorithm's task was to estimate $r_{i,j}$

*likelihood*. This expresses the relationship between the representation of the visual world

$d$ is a constant. Using a noise-free likelihood means that algorithm performance is governed by how the prior, described in the following text, resolves the ambiguity about reflectance introduced by uncertainty about the illumination.
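The ambiguity that the prior must resolve can be seen directly in the noise-free image model. The sketch below is a minimal illustration, assuming the 5 × 5 scene is held as NumPy arrays; the function name `render_luminance` and the reflectance and illuminant values are hypothetical. Doubling the illuminant while halving every reflectance leaves every luminance unchanged, so the likelihood alone cannot choose between the two scenes.

```python
import numpy as np

def render_luminance(r: np.ndarray, e: np.ndarray) -> np.ndarray:
    """Noise-free image formation: l[i, j] = r[i, j] * e[i, j] (elementwise)."""
    return r * e

rng = np.random.default_rng(0)
r = rng.uniform(0.05, 0.9, size=(5, 5))  # hypothetical reflectances in (0, 1)
e = np.full((5, 5), 100.0)               # hypothetical uniform illuminant

l = render_luminance(r, e)

# A different scene: illuminant doubled, reflectances halved.
l_alt = render_luminance(r / 2.0, e * 2.0)

# Both scenes yield the same image, so only the prior can prefer one of them.
same_image = np.allclose(l, l_alt)

# Flatten in raster (row-major) order to obtain the column vectors r, e, and l.
r_vec, e_vec, l_vec = r.ravel(), e.ravel(), l.ravel()
print(same_image, l_vec.shape)  # True (25,)
```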

*prior*. This captures statistical regularities of the visual world as a probability distribution

$P(r_{i,j})$. We took the reflectance distribution at each image location to be a beta distribution. The beta is defined over the range 0 to 1. The relative probability of surfaces of different reflectance is adjusted by the parameters $\alpha_{\mathrm{surface}}$ and $\beta_{\mathrm{surface}}$.

$\mu_{\mathrm{illum}}$. The lognormal also has a covariance matrix $\mathbf{K}_{\mathrm{illum}}$, which allowed us to specify that illuminant intensities at neighboring locations are correlated. Such a specification captures the assumption that the illuminant varies slowly over space. How slowly the illuminant varies is determined by the exact structure of the covariance matrix. Specifically, $\mathbf{K}_{\mathrm{illum}}$ was constructed to represent a first-order Markov field, so that the correlational structure was controlled by a single parameter $\rho_{\mathrm{illum}}$. Let the variance of the illuminant intensity at each location be the same and be given by

[$i$, $j$] and [$k$, $l$] was given by

[$\alpha_{\mathrm{surface}}$, $\beta_{\mathrm{surface}}$, $\mu_{\mathrm{illum}}$, $\rho_{\mathrm{illum}}$] we used numerical search as implemented by the fmincon function of MATLAB (MathWorks, Natick, MA). Because we assumed a noise-free likelihood, it was sufficient to search only over the space of illuminant vectors: given the image, each candidate illuminant determines the reflectances through $r_{i,j} = l_{i,j}/e_{i,j}$.

$n$-dimensional linear models for the space of illuminants (where $n$ took values of [2, 4, 6, 9, 10, 12, 14]). We searched over illuminants within each of these linear models in order of increasing dimension, using the result of the preceding search as the initial guess for the next. The estimate of

[$\alpha_{\mathrm{surface}}$, $\beta_{\mathrm{surface}}$, $\mu_{\mathrm{illum}}$, $\rho_{\mathrm{illum}}$] and a set of luminance values
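The prior and the illuminant-only search described above can be sketched as follows. This is a minimal illustration under several stated assumptions, not the authors' code. The covariance form $\sigma^2_{\mathrm{illum}}\,\rho_{\mathrm{illum}}^{|i-k|+|j-l|}$ is an assumed instance of a first-order Markov field; `sigma2`, $\alpha_{\mathrm{surface}}$, $\beta_{\mathrm{surface}}$, and $\mu_{\mathrm{illum}}$ take hypothetical values, and only $\rho_{\mathrm{illum}} = 0.46$ comes from the text. SciPy's L-BFGS-B stands in for MATLAB's fmincon, and the coarse-to-fine linear-model search is not reproduced. The objective is the prior density of the scene $(\mathbf{r}, \mathbf{e})$ that each candidate illuminant implies.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta, multivariate_normal

N = 5  # 5 x 5 checkerboard, locations in raster order

def markov_covariance(rho, sigma2, n=N):
    """ASSUMED covariance: sigma2 * rho ** (|i - k| + |j - l|) between [i, j] and [k, l]."""
    rows, cols = np.divmod(np.arange(n * n), n)
    dist = np.abs(rows[:, None] - rows[None, :]) + np.abs(cols[:, None] - cols[None, :])
    return sigma2 * rho ** dist

def log_prior(r, e, a_surf, b_surf, mu_illum, K):
    """Independent beta prior on reflectances plus a multivariate lognormal prior on e."""
    lp_r = beta.logpdf(r, a_surf, b_surf).sum()
    log_e = np.log(e)
    # Lognormal density of e = Gaussian density of log(e) times the Jacobian prod(1 / e).
    lp_e = multivariate_normal.logpdf(log_e, mean=np.full(e.size, mu_illum), cov=K) - log_e.sum()
    return lp_r + lp_e

def map_estimate(l, a_surf, b_surf, mu_illum, rho, sigma2=1.0):
    """Search over log-illuminants only; with a noise-free likelihood, r = l / e."""
    K = markov_covariance(rho, sigma2)

    def neg_log_prior(log_e):
        e = np.exp(log_e)
        return -log_prior(l / e, e, a_surf, b_surf, mu_illum, K)

    x0 = np.log(l / 0.5)  # initial guess: every reflectance equal to 0.5
    bounds = [(np.log(v) + 1e-6, None) for v in l]  # keeps r = l / e just below 1
    res = minimize(neg_log_prior, x0, method="L-BFGS-B", bounds=bounds)
    e_hat = np.exp(res.x)
    return l / e_hat, e_hat

rng = np.random.default_rng(1)
mu = np.log(100.0)                       # hypothetical mean log illuminant
K = markov_covariance(rho=0.46, sigma2=1.0)
# Three draws from the illuminant prior: spatially smooth, with differing means.
illum_draws = np.exp(rng.multivariate_normal(np.full(N * N, mu), K, size=3)).reshape(3, N, N)

l = rng.uniform(1.0, 200.0, size=N * N)  # hypothetical luminances in cd/m^2
r_hat, e_hat = map_estimate(l, a_surf=2.0, b_surf=2.0, mu_illum=mu, rho=0.46)
```

Because $\rho^{|i-k|+|j-l|} = \rho^{|i-k|}\rho^{|j-l|}$, the assumed covariance is a Kronecker product of two one-dimensional AR(1) matrices and is positive definite for $|\rho| < 1$.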

cd/m² to 211 cd/m². The smallest value was the minimum luminance value of the high-dynamic-range display and should be considered approximate. The remainder of the test patches were chosen in equal log steps between 0.24 cd/m² and the maximum luminance of the display, 211 cd/m². The patches had CIE xy chromaticity (0.43, 0.40). The same 24 test patches were judged within nine separate checkerboard contexts (Figure 1).
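The test-patch spacing can be reproduced in a few lines. This sketch assumes that 23 patches are spaced in equal log steps with both endpoints (0.24 and 211 cd/m²) included and that the remaining patch sits at the display minimum; because that minimum is not stated here, `DISPLAY_MIN` is a hypothetical placeholder.

```python
import numpy as np

DISPLAY_MIN = 0.1  # hypothetical placeholder for the (approximate) display minimum, cd/m^2
log_spaced = np.geomspace(0.24, 211.0, num=23)  # assumed to include both endpoints
test_luminances = np.concatenate(([DISPLAY_MIN], log_spaced))  # 24 test patches

steps = np.diff(np.log10(log_spaced))  # successive steps in log10 units
print(len(test_luminances), np.allclose(steps, steps[0]))  # 24 True
```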

cd/m² (contrast ratio 1,878:1) that were equidistant in logarithmic units. These 24 luminance values were assigned to a 5 × 5 checkerboard surrounding the center test square. To assign luminance values to squares, we took random draws of spatial arrangement until neither the brightest nor the darkest luminance was in the inner ring immediately adjacent to the center square. This arrangement was used as the standard context in all experiments; a representation of this standard checkerboard context is shown in Figure 1. The remaining eight test checkerboard contexts were created in the following fashion. We divided the 24 checkerboard squares into inner (eight locations immediately adjacent to the center test square) and outer rings (16 locations surrounding the inner ring). We created low, standard, and high luminance distributions for inner and outer rings (for details, see Allred et al., 2012). Then we assigned each of the remaining combinations of these rings (all 3 × 3 inner–outer pairings other than standard inner–standard outer) to the eight test checkerboard contexts (i.e., low inner–low outer checkerboard; low inner–standard outer checkerboard; low inner–high outer checkerboard, etc.). The spatial arrangement of the low and high inner and outer rings in each test checkerboard context preserved the rank order of luminance values in the standard checkerboard context.
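The rejection-sampling procedure for the standard arrangement, and the count of contexts it leads to, can be sketched as below. The darkest surround luminance `L_MIN` is hypothetical (only the 1,878:1 ratio and the equal log spacing come from the text), and the helper name `standard_arrangement` is illustrative.

```python
import numpy as np
from itertools import product

N = 5
i, j = np.indices((N, N))
ring = np.maximum(np.abs(i - 2), np.abs(j - 2))  # 0 = center, 1 = inner ring, 2 = outer ring
INNER, SURROUND = ring == 1, ring > 0            # 8 inner squares; 24 surround squares

def standard_arrangement(luminances, rng, max_tries=10_000):
    """Redraw random placements of the 24 surround luminances until neither the
    brightest nor the darkest value falls in the inner ring."""
    for _ in range(max_tries):
        board = np.full((N, N), np.nan)  # center square reserved for the test patch
        board[SURROUND] = rng.permutation(luminances)
        inner_vals = board[INNER]
        if inner_vals.max() < luminances.max() and inner_vals.min() > luminances.min():
            return board
    raise RuntimeError("no valid arrangement found")

L_MIN = 0.1                                         # hypothetical darkest surround luminance
surround = np.geomspace(L_MIN, L_MIN * 1878.0, 24)  # equal log steps, 1,878:1 contrast ratio
board = standard_arrangement(surround, np.random.default_rng(4))

levels = ("low", "standard", "high")
contexts = list(product(levels, repeat=2))  # (inner, outer) pairings: 9 contexts in total
test_contexts = [c for c in contexts if c != ("standard", "standard")]
n_pairs = len(contexts) * 24                # test patch x checkerboard pairs
```

Each random draw succeeds with probability (16/24)(15/23) ≈ 0.43, so the loop normally terminates after a few tries.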

[$L_{a(\text{Context } x)}$, $L_{b(\text{Context } y)}$] had the same reflectance ($R_z$), then these two test luminance values would match in lightness across the context change. This linking hypothesis is based on the general idea that perceived lightness is a perceptual correlate of surface reflectance, but takes into account the fact that reflectance is not explicitly available in the retinal image. The role of the algorithm in the model is to provide a computation that converts proximal luminance to a form that is more plausibly related to perceived lightness.

Let $R_{\mathrm{estimated}} = f_x(L_i)$ represent the interpolated reflectance values, where $x$ represents one of the nine checkerboard contexts and $i$ indexes the 24 test patch values. In the standard context ($x = St$), we evaluated this function for all $L_t$ to obtain a set of reflectance values $[R]_{St}$ that served as the referents for establishing CTFs (much as the Munsell papers did for the psychophysical judgments). To compute a CTF, we inverted the interpolated function $f_x$ to find the value $L_x$ that yielded each $[R]_{St}$. Thus, each algorithm-based CTF consists of 24 [$L_{St}$, $L_x$] pairs that were taken as perceptually equivalent.

The parameters $\alpha_{\mathrm{surface}}$, $\beta_{\mathrm{surface}}$, $\mu_{\mathrm{illum}}$, and $\rho_{\mathrm{illum}}$ control the prior probability and hence drive the algorithm estimates. The parameter values we used for the algorithm were chosen to minimize the average error between algorithm-based CTFs and psychophysical CTFs. To find these values, we used a grid search on the algorithm parameters. We computed algorithm estimates for the 216 test–checkerboard pairs described above (24 test patches × 9 contexts) for thousands of sets of parameter values. Initial parameters were chosen through visual inspection of model predictions for a variety of simulated scenes. From these initial values, we varied each parameter in coarse steps to determine the best region of parameter space and then sampled this space more finely. Because our grid search was not exhaustive, it remains possible that a different set of parameter values could fit the data better.

[$L_{St}$, $L_x$] pairs, while the psychophysical CTFs were constructed using the 16 [$L_{St}$, $L_x$] pairs defined by the Munsell chips. To compare the two sets of CTFs directly, we interpolated the algorithm-based CTFs to obtain values for each of the 16 psychophysical $L_{St}$ values. We chose final algorithm parameters that minimized the average prediction error in a least-squares sense. We refer to these as the *derived priors* to emphasize that they were obtained by a fit to the psychophysical data, rather than directly from measurements of naturally occurring illuminants and surfaces.
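The CTF construction (invert $f_x$ at each standard-context reflectance, then interpolate to the 16 psychophysical $L_{St}$ values) can be sketched as follows. Assumptions: interpolation is linear in log reflectance versus log luminance, $f_x$ is increasing, targets outside the range of $f_x$ are marked NaN rather than extrapolated, and the error is a mean squared log10 difference. The toy reflectance estimates (luminance divided by a hypothetical uniform illuminant of 300 or 200 cd/m²) and the psychophysical pairs are fabricated for illustration only.

```python
import numpy as np

def ctf_from_estimates(L_test, R_std, R_ctx):
    """Algorithm-based CTF: for each standard-context reflectance [R]_St, find the
    luminance L_x in context x whose estimated reflectance is the same.
    R_ctx = f_x(L_test) must increase with L_test. L_x is NaN outside f_x's range."""
    logL = np.log10(L_test)
    logL_x = np.interp(np.log10(R_std), np.log10(R_ctx), logL, left=np.nan, right=np.nan)
    return L_test, 10.0 ** logL_x

def ctf_error(L_St, L_x, psych_L_St, psych_L_x):
    """Mean squared log10 error after interpolating the algorithm CTF to the
    psychophysical L_St values (points outside its range are ignored)."""
    ok = ~np.isnan(L_x)
    pred = np.interp(np.log10(psych_L_St), np.log10(L_St[ok]), np.log10(L_x[ok]),
                     left=np.nan, right=np.nan)
    return float(np.nanmean((pred - np.log10(psych_L_x)) ** 2))

L_test = np.geomspace(0.24, 211.0, 24)
R_std = L_test / 300.0  # hypothetical estimates, standard context
R_ctx = L_test / 200.0  # hypothetical estimates, context x
L_St, L_x = ctf_from_estimates(L_test, R_std, R_ctx)  # here L_x = (2/3) * L_St where defined

psych_L_St = np.geomspace(1.0, 100.0, 16)  # hypothetical stand-ins for the Munsell-defined pairs
psych_L_x = psych_L_St * 2.0 / 3.0
err = ctf_error(L_St, L_x, psych_L_St, psych_L_x)
```

In a grid search, a function like `ctf_error` would be averaged over contexts for each candidate parameter set, and the set with the smallest average would be kept.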

$\rho_{\mathrm{illum}}$ controls this aspect of the prior, with 1 indicating perfect correlation (uniform illumination) and 0 indicating independent illumination at each spatial location. The derived value of $\rho_{\mathrm{illum}}$ was 0.46, intermediate between these two extremes. In Figure 3, the illuminants shown vary across space, but slowly. The illuminant prior permits high and low illuminant luminance within one checkerboard context (top left illumination draw), but the highest illumination is not likely to be immediately adjacent to the darkest illumination. Relatively spatially uniform illuminations of different mean intensities are also probable (rightmost illumination draws).

$\rho_{\mathrm{illum}}$, holding other prior parameters constant, and computed algorithm-based CTFs for all checkerboard contexts. This offset index is shown in Figure 5 for the six checkerboard contexts in which the inner, outer, or both rings were increased or decreased. The solid horizontal lines represent the observed psychophysical offset in each condition. For each checkerboard context, more spatially correlated illuminant priors (higher $\rho_{\mathrm{illum}}$) yield average offset values of larger magnitude (all points tend away from 0 with increased $\rho_{\mathrm{illum}}$ in Figure 5). This effect is more pronounced when luminance is decreased relative to the standard checkerboard context (lines with offset indices less than 0). When $\rho_{\mathrm{illum}}$ is too low, the illuminant varies spatially too much and the context does not affect the test patch enough; that is, the algorithm offset from 0 is smaller than the average psychophysically measured offset.

*overall mean* (OM) model, which fit the entire set of CTFs with their grand mean. The OM model errors represent an upper bound for any reasonable model. They also provide a sense of the total variance in the data set. In the second model, a *context mean* (CM) model, each CTF was fit by its own mean. The CM model errors provide a sense of the variance in the data set that results from changing the test patch luminance, once the overall effect of checkerboard context has been modeled. Finally, in a *single-CTF* (SCTF) model, all eight CTFs were fit with the mean CTF. The SCTF model errors provide a sense of the variance in the data set that results from changing the checkerboard context, once the overall effect of test patch luminance has been modeled.
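The three benchmark models amount to fitting a grand mean, row means, or column means to the matrix of CTFs. The sketch below assumes the eight psychophysical CTFs are stored as an 8 × 16 array of matched log luminances and that error is measured as mean squared error; the function name and the toy data are hypothetical.

```python
import numpy as np

def baseline_errors(ctfs):
    """Mean squared error of the OM, CM, and SCTF benchmark models.

    ctfs: array (n_contexts, n_points) of matched log luminances, one row per CTF.
    """
    om = np.full_like(ctfs, ctfs.mean())                                  # one grand mean
    cm = np.broadcast_to(ctfs.mean(axis=1, keepdims=True), ctfs.shape)    # each CTF's own mean
    sctf = np.broadcast_to(ctfs.mean(axis=0, keepdims=True), ctfs.shape)  # the mean CTF
    return {name: float(np.mean((ctfs - pred) ** 2))
            for name, pred in (("OM", om), ("CM", cm), ("SCTF", sctf))}

# Hypothetical data: 8 CTFs at 16 points, built as a context offset plus a shared shape.
offsets = np.linspace(-0.3, 0.3, 8)[:, None]
shape = np.linspace(-0.5, 2.0, 16)[None, :]
errs = baseline_errors(offsets + shape)
```

For this purely additive toy data, the OM error is the total variance, the CM error is the variance due to test luminance alone, and the SCTF error is the variance due to context alone. This mirrors the interpretation given in the text.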

*nonparametric regression* (NP Reg) model, designed to provide an excellent description of the data (low overall fit error). The fits of this model were obtained using multivariate kernel smoothing regression (Nadaraya, 1964; Watson, 1964) with a Gaussian kernel, as implemented in the routine ksrmv made available by Yi Cao at the MATLAB Central File Exchange (http://www.mathworks.com/matlabcentral/fileexchange). This method, in essence, provides a smoothed look-up table of the data. We chose the width of the Gaussian kernel by hand to achieve a good overall fit. The overall fit error for the NP Reg model is not of interest per se because it can be made very small by optimizing the parameters of the kernel regression. Further, such models do not provide any scientific insight about the nature of the computations mediating lightness perception. However, the cross-validation error for the NP Reg model is of interest because it provides a benchmark for other models. A competitor model that has a higher overall fit error than the NP Reg model can still in principle have a lower cross-validation error, depending on the degree to which the NP Reg model overfits the noisy data and the degree to which the competitor model captures structure in the data that survives measurement variability. Indeed, models that capture most of the underlying structure in the data should have cross-validation errors comparable to or lower than that of this model.
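The estimator behind the NP Reg model is the Nadaraya–Watson kernel-weighted average. The sketch below is an independent NumPy implementation of that estimator with an isotropic Gaussian kernel, not a port of ksrmv. The predictors, the toy data, and the use of leave-one-out cross-validation are all assumptions for illustration. Shrinking `bandwidth` drives the fit error toward zero, which is why only the cross-validation error is informative.

```python
import numpy as np

def gaussian_weights(X_query, X_train, bandwidth):
    """Gaussian kernel weights between every query point and every training point."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / bandwidth ** 2)

def nw_predict(X_train, y_train, X_query, bandwidth):
    """Nadaraya-Watson estimate: kernel-weighted average of the training responses."""
    w = gaussian_weights(X_query, X_train, bandwidth)
    return (w @ y_train) / w.sum(axis=1)

def loo_cv_error(X, y, bandwidth):
    """Leave-one-out mean squared error. The bandwidth must be large enough that
    every point puts non-negligible weight on at least one other point."""
    w = gaussian_weights(X, X, bandwidth)
    np.fill_diagonal(w, 0.0)  # a point may not predict itself
    pred = (w @ y) / w.sum(axis=1)
    return float(np.mean((pred - y) ** 2))

# Hypothetical predictors: [log10 test luminance, log10 context mean luminance].
rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(-0.5, 2.3, 60), rng.uniform(0.5, 1.5, 60)])
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 60)  # hypothetical noisy matches

cv_narrow, cv_wide = loo_cv_error(X, y, 0.1), loo_cv_error(X, y, 0.3)
```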

*linear regression* (Lin Reg) model. This model predicts the CTFs as a linear function of the test and contextual log luminances. The model can be viewed as a variant of the well-known retinex lightness algorithm (Land & McCann, 1971), fit to our data. This follows because one of the standard variants of the retinex reduces to normalizing the test luminance by a spatially weighted geometric mean of all of the luminances in the image (Brainard & Wandell, 1986; Land, 1986); in log units, this normalization subtracts a weighted sum of the log luminances from the log test luminance, which is a linear operation. The Lin Reg model shares with our Bayesian model the fact that the CTFs are predicted directly from the image data, but with the Lin Reg model the predicted CTFs are constrained to be lines in the log-log plots of Figure 4. The Lin Reg model therefore provides a reasonable benchmark for the performance of the Bayes model. Although we believe that the pursuit of Bayesian models of color and lightness is well motivated theoretically (see Introduction and Discussion), our enthusiasm for further exploring such models would be reduced if they could not perform as well as extant, more heuristically motivated models.
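The link between retinex normalization and a linear model in log luminance can be checked numerically, and a Lin Reg-style fit is ordinary least squares. In this sketch, the surround luminances, spatial weights, context summaries (mean log inner and outer ring luminance), and generating coefficients are all hypothetical. The regression features are one plausible choice, not necessarily the ones used for the Lin Reg model.

```python
import numpy as np

def retinex_normalize(L_test, L_image, weights):
    """Divide the test luminance by a weighted geometric mean of the image luminances."""
    w = weights / weights.sum()
    return L_test / np.exp(w @ np.log(L_image))

rng = np.random.default_rng(2)
L_image = rng.uniform(0.5, 200.0, size=24)  # hypothetical surround luminances (cd/m^2)
weights = rng.uniform(0.1, 1.0, size=24)    # hypothetical spatial weights
L_test = 30.0

# In log units the normalization is linear: log L_test - sum_k w_k log L_k.
w = weights / weights.sum()
lhs = np.log(retinex_normalize(L_test, L_image, weights))
rhs = np.log(L_test) - w @ np.log(L_image)

# Lin Reg sketch: regress matched log luminance on an intercept, the log test
# luminance, and hypothetical context summaries (mean log inner / outer ring).
n = 40
log_L_st = rng.uniform(-0.5, 2.3, n)
inner, outer = rng.uniform(0.5, 1.5, n), rng.uniform(0.5, 1.5, n)
A = np.column_stack([np.ones(n), log_L_st, inner, outer])
true_coef = np.array([0.1, 0.9, 0.4, 0.2])  # hypothetical generating coefficients
log_L_x = A @ true_coef                     # noise-free toy targets
coef, *_ = np.linalg.lstsq(A, log_L_x, rcond=None)
```

Because the predicted matched log luminance is linear in the log test luminance, every CTF this model produces is a straight line on log-log axes.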

*The new cognitive neurosciences* (2nd ed., pp. 339–351). Cambridge, MA: MIT Press.

*Science*, 3, 2042–2044.

*Visual experience: Sensation, cognition, and constancy* (1st ed., chap. 11). Oxford: Oxford University Press.

*Journal of Vision*, 12 (2): 7, 1–16, http://www.journalofvision.org/content/12/2/7, doi:10.1167/12.2.7.

*Color constancy, intrinsic images, and shape estimation*. Firenze, Italy: European Conference on Computer Vision.

*Computer Vision, 2001. ICCV 2001*, 1, 670–677.

*Statistical decision theory and Bayesian analysis*. New York: Springer-Verlag.

*Vision Research*, 39 (26), 4361–4377.

*Vision Research*, 44 (21), 2483–2503.

*Journal of Vision*, 4 (9): 6, 735–746, http://www.journalofvision.org/content/4/9/6, doi:10.1167/4.9.6.

*Journal of Vision*, 3 (8): 2, 541–553, http://www.journalofvision.org/content/3/8/2, doi:10.1167/3.8.2.

*The cognitive neurosciences* (4th ed., pp. 395–408). Cambridge, MA: MIT Press.

*The visual neurosciences* (pp. 948–961). Cambridge, MA: MIT Press.

*Colour perception: Mind and the physical world* (pp. 307–334). Oxford: Oxford University Press.

*Journal of Vision*, 6 (11): 10, 1267–1281, http://www.journalofvision.org/content/6/11/10, doi:10.1167/6.11.10.

*Journal of Vision*, 11 (5): 1, 1–18, http://www.journalofvision.org/content/11/5/1, doi:10.1167/11.5.1.

*The visual neurosciences* (2nd ed.). Cambridge, MA: MIT Press.

*Journal of the Optical Society of America A*, 3, 1651–1661.

*Journal of Vision*, 8 (5): 15, 1–23, http://www.journalofvision.org/content/8/5/15, doi:10.1167/8.5.15.

*Perception and Psychophysics*, 57 (2), 125–135.

*Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition* (pp. 193–200). Providence, RI: Conference proceedings.

*Proceedings of the National Academy of Sciences, USA*, 86 (23), 9631–9635.

*Siggraph* (pp. 189–198). New York: ACM.

*Journal of Vision*, 4 (9): 11, 821–837, http://www.journalofvision.org/content/4/9/11, doi:10.1167/4.9.11.

*Visual Neuroscience*, 21, 1–6.

*Proceedings CVPR ’88* (pp. 544–549). Simon Fraser University, Burnaby, BC: Computer Society Conference.

*Advances in neural information processing systems 24*. NIPS: http://books.nips.cc/nips24.html.

*Vision Research*, 51, 771–781.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 6, 721–741.

*Seeing black and white*. Oxford: Oxford University Press.

*Perception and Psychophysics*, 28, 527–538.

*Psychological Review*, 106 (4), 795–834.

*Nature Neuroscience*, 14 (7), 926–932.

*Computer Vision, 2009 IEEE 12th International Conference*, Kyoto, Japan (pp. 2335–2342).

*Seventeenth Color Imaging Conference: Color Science and Engineering Systems, Technologies and Applications* (pp. 8–14). Albuquerque, NM: Society for Imaging Science and Technology.

*Helmholtz's physiological optics*. New York: Optical Society of America. (Original work published 1867).

*Journal of Experimental Psychology*, 47, 263–266.

*Computational models of visual processing* (pp. 209–228). Cambridge, MA: MIT Press.

*Annual Review of Psychology*, 55, 271–304.

*Perception as Bayesian inference*. Cambridge: Cambridge University Press.

*Vision Research*, 26, 7–21.

*Journal of the Optical Society of America*, 61 (1), 1–11.

*Bayesian statistics*. London: Oxford University Press.

*Journal of the Optical Society of America A*, 29, A247–A257.

*Journal of Vision*, 11 (13): 14, 1–16, http://www.journalofvision.org/content/11/13/14, doi:10.1167/11.13.14.

*Markov random field modeling in image analysis*. Tokyo: Springer-Verlag.

*Journal of Vision*, 10 (9): 19, 1–6, http://www.journalofvision.org/content/10/9/19, doi:10.1167/10.9.19.

*Proceedings of the National Academy of Sciences, USA*, 108 (30), 12551–12553.

*Applied Optics*, 48 (28), 5386–5395.

*Theory of Probability and its Applications*, 9 (1), 141–142.

*Perception*, 33, 1463–1473.

*Journal of the Optical Society of America A*, 15, 563–569.

*Vision and visual dysfunction: Vol. 6. The perception of colour* (pp. 43–61). London: Macmillan.

*Why we see what we do: An empirical theory of vision*. Sunderland, MA: Sinauer.

*Current Biology*, 21 (22), 1931–1936.

*Journal of Vision*, 4 (9): 7, 747–763, http://www.journalofvision.org/content/4/9/7, doi:10.1167/4.9.7.

*Inferring reflectance under real-world illumination* (Technical Report No. TR-10-10). Cambridge, MA: Harvard School of Engineering and Applied Sciences.

*Vision Research*, 44 (10), 971–981.

*Current Opinion in Neurobiology*, 20 (3), 382–388.

*Annual Review of Psychology*, 59, 143–166.

*Handbook of image and video processing* (pp. 431–441). Salt Lake City, UT: Academic Press.

*Philosophical Transactions of the Royal Society of London B*, 360, 1329–1346.

*Mechanisms of colour vision* (pp. 1–34). London: Academic Press.

*Nature Neuroscience*, 9 (4), 578–585.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 27 (9), 1459–1472.

*PLoS ONE*, 6 (6), e20409.

*Proceedings: Biological Sciences*, 265 (1394), 359–366.

*The Indian Journal of Statistics, Series A*, 26 (4), 359–372.

*Nature Neuroscience*, 5 (6), 598–604.

*High dynamic range imaging of natural scenes*. Scottsdale, AZ: Conference proceedings from the 10th Color Imaging Conference: Color Science, Systems and Applications.