**One central problem in perception research is to understand how internal experiences are linked to physical variables. Most commonly, this relationship is measured using the method of adjustment, but this has two shortcomings: The perceptual scales that relate physical and perceptual variables are not measured directly, and the method often requires perceptual comparisons between viewing conditions. To overcome these problems, we measured perceptual scales of surface lightness using maximum likelihood difference scaling, asking observers only to compare the lightness of surfaces presented in the same context. Observers were lightness constant, and the perceptual scales qualitatively and quantitatively predicted perceptual matches obtained in a conventional adjustment experiment. Additionally, we show that a contrast-based model of lightness perception predicted 98% of the variance in the scaling and 88% in the matching data. We suggest that the predictive power was higher for scales because they are closer to the true variables of interest.**

*x*_{T}]) and the match (Ψ[*x*_{M}]). What is being measured, however, are not the transducer functions relating the two but the corresponding luminances of the target and the match (*x*_{T} and *x*_{M}, Figure 2B).

*B* and the foreground *F* are combined according to some weighting factor *α* so as to result in a new image luminance at the position of transparency *T* = *α* × *B* + (1 − *α*) × *F*. An *α* value of 0 corresponds to an opaque foreground, *T* = *F*; an *α* of 1 corresponds to a fully transparent foreground, *T* = *B*. The transparent layer varied in transmittance and reflectance. The dark transparency had a value of 0.35 in *povray* reflectance units (19 cd/m^{2}) and the light transparency a value of 2 (110 cd/m^{2}). Values of *α* = 0.4 and 0.2 were used in the high and low transmittance conditions, respectively. The rendered images were converted to grayscale images. The background luminance was 141 cd/m^{2}. Detailed values of luminance for each transparent medium can be found in Supplementary Table S3.
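The mixing rule *T* = *α* × *B* + (1 − *α*) × *F* can be illustrated with a minimal sketch. The function name and the check luminance of 100 cd/m^{2} are hypothetical and chosen only for illustration; the foreground values and the two *α* levels are the ones quoted in the text.

```python
def blend_luminance(background, foreground, alpha):
    """Luminance in the region of transparency: T = alpha * B + (1 - alpha) * F."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * background + (1.0 - alpha) * foreground


# Hypothetical check luminance (assumption, not a value from the experiment).
check_luminance = 100.0
# Foreground luminances quoted in the text for the dark and light transparency.
foregrounds = {"dark": 19.0, "light": 110.0}
# Alpha values quoted in the text for the two transmittance conditions.
alphas = {"high transmittance": 0.4, "low transmittance": 0.2}

for medium, f in foregrounds.items():
    for condition, alpha in alphas.items():
        t = blend_luminance(check_luminance, f, alpha)
        print(f"{medium}, {condition}: T = {t:.1f} cd/m^2")
```

With *α* = 0 the function returns the foreground luminance and with *α* = 1 the background luminance, matching the two limiting cases described above.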

^{2}, which is identical to the mean luminance of the 13 checks in the main checkerboard in plain view. The surround checkerboard was presented in four different spatial arrangements, resulting from clockwise rotation of the original in steps of 90°. A configuration was assigned randomly to each trial.

*p* = 10 reflectance values, the total number of unique triads was *n* = *p*!/((*p* − 3)! × 3!) = 10!/(7! × 3!) = 120. Each triad contained three values that were selected so as to enclose nonoverlapping intervals. They were presented in ascending (*x*_{1} < *x*_{2} < *x*_{3}) or descending (*x*_{1} > *x*_{2} > *x*_{3}) order (Knoblauch & Maloney, 2008). The reference, *x*_{2} (check *I2* in Figure 4A), was located between the two comparisons, *x*_{1} and *x*_{3} (checks *B2* and *I9* in Figure 4A). In each trial, observers judged which comparison check, *x*_{1} or *x*_{3}, was more different in lightness from the reference. Observers used a left or right response button to indicate their choice. No time limit was imposed.
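The triad count can be checked by enumeration. The sketch below labels the ten reflectance levels 1 to 10 (an assumption made for illustration) and treats the middle element of each triad as the reference; the helper `as_trial` is hypothetical.

```python
from itertools import combinations
from math import comb

levels = range(1, 11)  # p = 10 reflectance levels, labelled 1..10 for illustration

# Every unordered selection of three distinct levels, returned in ascending order.
triads = list(combinations(levels, 3))
assert len(triads) == comb(10, 3)


def as_trial(triad, descending=False):
    """Return the triad in ascending or descending order; the middle item is the reference."""
    return tuple(reversed(triad)) if descending else tuple(triad)


print(len(triads))
print(as_trial((2, 5, 8)), as_trial((2, 5, 8), descending=True))
```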

^{2}, which was identical to the mean luminance of all checks seen in plain view. The remaining 73 checks were drawn randomly without replacement from a set consisting of six repeats of the 13 different reflectance values. This resulted in a slight variation of the mean luminance of those checks between trials (up to 6 cd/m^{2}). The checks were positioned so that no two neighboring checks had the same reflectance.
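The sampling step can be sketched as follows. The integer labels 1 to 13 are placeholders for the 13 reflectance values, and the function name is hypothetical. The constraint on neighboring checks is not implemented here, so this covers only the draw without replacement.

```python
import random
from collections import Counter


def draw_surround_checks(reflectances, n_checks=73, repeats=6, seed=None):
    """Draw n_checks values without replacement from `repeats` copies of each reflectance."""
    pool = [r for r in reflectances for _ in range(repeats)]  # 13 x 6 = 78 candidates
    rng = random.Random(seed)
    return rng.sample(pool, n_checks)


# Placeholder labels stand in for the actual reflectance values (assumption).
checks = draw_surround_checks(list(range(1, 14)), seed=1)
counts = Counter(checks)
print(len(checks), max(counts.values()))
```

Because only 73 of the 78 candidates are used on each trial, the composition of the drawn set changes slightly from trial to trial, which is consistent with the small variation in mean luminance noted above.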

*I2* in Figure 1) in the MLDS experiment. Observers adjusted the luminance of the external test field to match the perceived lightness of the target check. The luminance was adjusted by pressing one of four buttons, two of them for coarse adjustments (±10 cd/m^{2}) and the other two for fine adjustments (±1 cd/m^{2}). The maximum luminance of the monitor was 550 cd/m^{2}. Satisfactory matches were confirmed with a fifth button that initiated the next trial. No time limit was imposed on the adjustment procedure.
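A minimal sketch of the adjustment logic, assuming hypothetical button names and a lower bound of 0 cd/m^{2} (neither is specified in the text); the step sizes and the 550 cd/m^{2} ceiling are the values given above.

```python
# Step sizes in cd/m^2; the button names are hypothetical labels.
STEPS = {
    "coarse_up": 10.0,
    "coarse_down": -10.0,
    "fine_up": 1.0,
    "fine_down": -1.0,
}


def adjust(luminance, button, max_luminance=550.0, min_luminance=0.0):
    """Apply one button press and keep the result within the displayable range."""
    return min(max_luminance, max(min_luminance, luminance + STEPS[button]))


match = 200.0  # arbitrary starting value for illustration
for press in ("coarse_up", "coarse_up", "fine_down"):
    match = adjust(match, press)
print(match)
```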

*R* to analyze the data. The left panel in Figure 3 depicts a hypothetical perceptual scale that relates psychological experience, Ψ(*x*), to a physical variable, *x*. The central panel illustrates how the decision model translates into the statistical model that is used to estimate scale parameters, and the right panel depicts the estimated scale values. Observers perform the triad judgments for different levels (*x*_{i}) of the physical variable (e.g., *x*_{2}, *x*_{5}, *x*_{8} in Figure 3). They judge whether the difference Δ = (Ψ[*x*_{8}] − Ψ[*x*_{5}]) − (Ψ[*x*_{5}] − Ψ[*x*_{2}]) is smaller or larger than zero. The decision model for all possible triads is summarized in the design matrix *X*, which contains separate columns for each *x*-value (Figure 3B). Each row of the design matrix contains the weights for the decision model of that respective triad. The coefficients (*β*) are estimated in a (binomial) generalized linear model (GLM) to account for the observed responses (*Y*) using maximum likelihood, and they represent the scale values for all levels of the physical variable. The linear predictors *Xβ* are related to the observed responses by using a link function *g*(), which maps the range of the linear predictors to a range of the response probabilities *E*[*Y*]. The decision model in MLDS is stochastic, and it assumes a single Gaussian-distributed noise source *ε* that corrupts the decision variable. By default, the GLM estimation assumes a variance of the noise source of one

*X* in Figure 3 by two (giving 0.5, −1, and 0.5) yields a scale for which

*d*′ (as shown in more detail in Aguilar et al., 2017; Devinck & Knoblauch, 2012).
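To make the design matrix concrete, the sketch below builds the row for a single triad and evaluates the noise-free decision variable. It assumes weights of 1, −2, and 1 on the three columns, which follows from expanding Δ = (Ψ[*x*_{8}] − Ψ[*x*_{5}]) − (Ψ[*x*_{5}] − Ψ[*x*_{2}]). The function names and the example scale are hypothetical, and the GLM fit itself is not shown.

```python
def design_row(triad, n_levels):
    """Weights of one triad (0-based column indices i < j < k): +1, -2, +1."""
    i, j, k = triad
    row = [0.0] * n_levels
    row[i] += 1.0
    row[j] -= 2.0
    row[k] += 1.0
    return row


def decision_variable(row, scale_values):
    """Noise-free decision variable: the dot product of the row with the scale values."""
    return sum(w * psi for w, psi in zip(row, scale_values))


# Triad (x_2, x_5, x_8) from the text corresponds to 0-based columns 1, 4, 7.
row = design_row((1, 4, 7), n_levels=10)

# Hypothetical compressive scale on ten equally spaced levels (assumption).
psi = [(i / 9) ** 0.5 for i in range(10)]
print(row)
print(decision_variable(row, psi))
```

For a compressive scale such as this one the upper interval is perceptually smaller than the lower one, so the noise-free decision variable is negative.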

*x* is luminance, and *a*, *b* are linear coefficients calculated to map the range of luminance in plain view [*L*_{min}, *L*_{max}] to the range [0, 1].
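The coefficients of such a linear map follow directly from the two endpoint conditions *a* × *L*_{min} + *b* = 0 and *a* × *L*_{max} + *b* = 1. A minimal sketch, with a hypothetical function name and made-up luminance limits:

```python
def unit_range_coefficients(l_min, l_max):
    """Return (a, b) such that a * l_min + b = 0 and a * l_max + b = 1."""
    if l_max <= l_min:
        raise ValueError("l_max must exceed l_min")
    a = 1.0 / (l_max - l_min)
    b = -l_min / (l_max - l_min)
    return a, b


# Illustrative luminance limits in cd/m^2 (assumption, not values from the experiment).
a, b = unit_range_coefficients(20.0, 420.0)
print(a * 20.0 + b, a * 420.0 + b)
```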

*x* is luminance, and *a*_{i}, *b*_{i} are linear coefficients calculated to map the range of luminance for each viewing condition to the range [0, 1] (for simplicity, we used linear functions, but power functions could be used as well and would not change our ideal observer results).

*N*(0, *σ*^{2}), and Ψ^{*} is either Ψ^{lum} or Ψ^{light}. Simulated responses were generated choosing the pair (*x*_{2}, *x*_{3}) when Δ > 0 and (*x*_{1}, *x*_{2}) otherwise. Finally, the simulated data were subjected to the MLDS analysis to obtain the coefficients *β* that constitute the scale values. Figure 4B shows the model perceptual scales (left) and the estimated scales (right), and it is evident that for the chosen noise level (*σ* = 0.15) the method recovers the underlying scale.
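The response rule of the simulated observer can be sketched as follows. It assumes that the decision variable is the difference of the two scale intervals plus one Gaussian noise sample; the function name, the response coding (1 for the upper pair, 0 for the lower pair), and the example scale are hypothetical. The subsequent MLDS fit is not shown.

```python
import random


def simulate_response(triad, psi, sigma, rng):
    """Return 1 if the pair (x2, x3) is judged more different, else 0 for (x1, x2)."""
    x1, x2, x3 = triad
    noise = rng.gauss(0.0, sigma)  # single noise sample added at the decision stage
    delta = (psi(x3) - psi(x2)) - (psi(x2) - psi(x1)) + noise
    return 1 if delta > 0 else 0


rng = random.Random(0)
linear_scale = lambda x: x  # placeholder scale on stimulus values already in [0, 1]

# Repeat one triad many times at the noise level used in the text (sigma = 0.15).
responses = [simulate_response((0.1, 0.3, 0.9), linear_scale, 0.15, rng) for _ in range(1000)]
print(sum(responses) / len(responses))
```

Setting `sigma` to zero makes the observer deterministic, which is a convenient check that the sign convention matches the response rule stated above.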

*σ*, minimum = 0.01 and maximum = 1.2, see Supplementary Material). The two observer models were distinguishable for a broad range of noise levels up to approximately 0.4. This upper-bound value was much higher than the noise levels that have been observed in previous experiments (Devinck & Knoblauch, 2012; Knoblauch & Maloney, 2008). We therefore concluded that MLDS could be used to derive meaningful scales because they would allow us to distinguish between these two different observer models.

*x*-axis of the perceptual scales. In such a perceived lightness versus reflectance plot, the scales of a lightness-constant observer should coincide on a single function. Figure 5B shows that this was indeed the case.

*x*) = *ax*^{e} + *b* using a nonlinear least squares method (Ritz & Streibig, 2008). To evaluate the goodness of fit, we computed *R*^{2} values for linear fits to the data. The average *R*^{2} was already reasonably high (0.86). We then performed *F* tests on nested models (power function vs. its linear submodel with *e* = 1), which revealed that the power functions fitted the data significantly better than the linear ones, *F*_{min}(1, 97) = 15.6, *p* < 0.001. From this, we concluded that the power functions captured the data sufficiently well.
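The nested-model comparison rests on the standard *F* statistic for comparing a reduced model (here the linear submodel with *e* = 1) against a full model (the power function). The sketch below shows that general formula only; the function name and the residual sums of squares are made up for illustration and are not values from the study.

```python
def nested_f_statistic(rss_reduced, rss_full, df_reduced, df_full):
    """F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full).

    df_reduced and df_full are the residual degrees of freedom of the two models.
    """
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


# Made-up residual sums of squares; 98 and 97 residual degrees of freedom correspond
# to a two-parameter and a three-parameter model fitted to 100 data points.
f_value = nested_f_statistic(rss_reduced=12.0, rss_full=10.0, df_reduced=98, df_full=97)
print(round(f_value, 2))
```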

*a*, *b*, *e*) and for the separate models with 5 × 3 parameters. There was a benefit for the separate model fits relative to the global model, *F*(12, 497) = 18.57, *p* < 0.001. To explore the cause for this difference, we computed one-way repeated-measures ANOVAs for each of the three parameters of the power functions. We found a significant difference between scales for the exponent parameter, *e*, *F*(4, 36) = 16.6, *p* < 0.001, which determines the curvature of the function. Post hoc tests on the exponents revealed significant differences between each of the light transparency conditions and both the plain view and the dark transparency with high transmittance (Bonferroni corrected *p* < 0.05). The light transparency conditions thus differ from the plain view and the dark transparency (high transmittance) conditions mainly in the curvature of their functions (Figure 5B).

*x*_{T}) that corresponds to a particular target luminance *x*_{T} in one of the transparency conditions. In the next step, we needed to find the luminance value *x*_{M} that corresponds to the scale value at the match position Ψ(*x*_{M}), assuming that observers match the lightness of the match region to that of the target region according to Ψ(*x*_{M}) = Ψ(*x*_{T}). We did not measure a perceptual scale at the match position but instead adopted the plain view scale to represent the scale for the matches. In order to be able to read out *x*-values corresponding to any possible Ψ-value and vice versa, we fitted the scales with power functions, *ψ*(*x*) = *ax*^{e} + *b*, using a nonlinear least squares method. We derived the predicted matching data from the “unconstrained” scales individually for each observer, and we then aggregated them in the same way as the empirical data obtained from the matching experiment.
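The read-out step amounts to evaluating the target scale at *x*_{T} and inverting the plain view scale at the resulting Ψ-value, that is, *x*_{M} = ((Ψ(*x*_{T}) − *b*)/*a*)^{1/e} with the plain view parameters. A minimal sketch with hypothetical function names and made-up parameter values:

```python
def power_scale(x, a, e, b):
    """Fitted scale: psi(x) = a * x**e + b."""
    return a * x ** e + b


def inverse_power_scale(psi, a, e, b):
    """Luminance at which the scale reaches the value psi; requires (psi - b) / a >= 0."""
    return ((psi - b) / a) ** (1.0 / e)


def predict_match(x_target, target_params, plain_params):
    """Predicted match luminance under the assumption psi_plain(x_M) = psi_target(x_T)."""
    psi_target = power_scale(x_target, *target_params)
    return inverse_power_scale(psi_target, *plain_params)


# Made-up (a, e, b) triples for illustration; they are not fitted values from the study.
target_params = (0.02, 0.8, 0.0)
plain_params = (0.01, 0.9, 0.0)
print(predict_match(60.0, target_params, plain_params))
```

When both scales share the same parameters, the predicted match equals the target luminance, as expected for identical viewing conditions.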

*t* tests to compare slopes and intercepts between predicted and empirical functions. The average slope and intercept values are listed in Supplementary Table S4 together with the relevant test statistics. We found significant differences between the predicted and the empirical functions only for the dark transparent medium with a high transmittance.

*x*-axis into units of normalized Michelson contrast did indeed linearize the perceptual scales. To test how well the normalized contrast model accounts for the variability between the different context conditions, we computed a global *R*^{2} value. As described before, we treated all data as if they were coming from one underlying model. The normalized contrast measure accounts for 98% of the variance in the scaling data and for 88% of the variance in the matching data. This indicates that the normalized contrast measure is a better predictor of the scales than of the matching data, because it explains more of their variance. The residuals of these fits are provided in Supplementary Figure S6.
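A global *R*^{2} of this kind can be computed by pooling the data from all conditions and comparing the residual variation with the total variation around the pooled mean. The sketch below uses the conventional definition 1 − SS_res/SS_tot, which is an assumption about the exact formula used; the function name and the data are made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot, over pooled data."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up pooled scale values and model predictions, for illustration only.
observed = [0.0, 0.2, 0.45, 0.7, 1.0]
predicted = [0.05, 0.2, 0.4, 0.75, 0.95]
print(round(r_squared(observed, predicted), 3))
```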

*x*_{3}] − Ψ[*x*_{2}]) ∼ (Ψ[*x*_{2}] − Ψ[*x*_{1}]). This critical assumption in MLDS differs from that of other scaling methods, such as Fechnerian scaling, which uses integration of just-noticeable differences, or other discrimination-based scaling methods (Baird, 1978). These scaling methods assume a noise source at an early sensory representation level and not at a late decision level. Here, we compared perceptual lightness scales that were measured in different viewing conditions and hence could have been associated with different amounts of decision noise. This was not what we observed. Although individual observers differed in their overall noise level, all scales measured for one observer had comparable estimated noise levels. However, these assumptions must be considered carefully, and ultimately their validity must be addressed experimentally (Aguilar et al., 2017).

*σ̂* = 0.4 (Supplementary Material). In our observers the estimated *σ̂* values varied from 0.13 to 0.21 for observers O1 to O8, i.e., values below the upper limit of model discriminability. For observer O9, *σ̂* = 0.39 was at the boundary of discriminability, and for observer O10, *σ̂* = 0.71 was beyond the upper limit. Thus, the noise level of observer O10 did not allow a definite selection of either of the two models. The estimated noise level also must be considered carefully when comparing scales against ideal observer models.

*veridical* surface reflectances but is rather tightly correlated with them. One might be tempted to conclude that the predictive power of the contrast-based model “exceeds” that of physical surface reflectances because it accounts for the deviations from lightness constancy that we observed in the data.

*veridical* percept with respect to the physical world, but to an overall reliable estimate of the appearance of objects (e.g., Marlow, Kim, & Anderson, 2012). The estimated scales were linearized by the transformation to contrast units, which implies that the model accounts for the sensitivity differences between low and high reflectances (e.g., Lu & Sperling, 2012), a feature that cannot be quantitatively captured with matching. The higher agreement between the model and the perceptual scales (compared to matching) supports the idea that the perceptual scales are a more direct and informative measure of the internal variable of lightness and subject to fewer sources of variability.

*The new cognitive neurosciences* (2nd ed., pp. 339–351). Cambridge, MA: MIT Press.

*Journal of Vision*, 17 (1): 37, 1–18, doi:10.1167/17.1.37.

*Neuron*, 24 (4), 919–928.

*Current Biology*, 21 (24), R978–R983.

*Journal of the Optical Society of America A*, 4 (12), 2281–2285.

*Fundamentals of scaling and psychophysics*. New York: Wiley.

*Computer vision systems* (pp. 3–26). New York: Academic Press.

*Journal of the Optical Society of America A*, 14, 2091–2110.

*Vision Research*, 44 (27), 3223–3232.

*Journal of Vision*, 12 (3): 19, 1–14, doi:10.1167/12.3.19.

*Journal of the Optical Society of America A*, 10 (10), 2166–2180.

*Journal of the Optical Society of America A*, 30, 342–352.

*Elemente der psychophysik* [Translation: *Elements of psychophysics*]. Leipzig, Germany: Breitkopf und Hartel.

*Vision Research*, 94, 62–75.

*Psychological Science*, 22 (6), 812–820.

*Trends in Cognitive Sciences*, 7, 439–443.

*Vision Research*, 51 (7), 674–700.

*Annual Review of Psychology*, 39, 169–200.

*Psychological Review*, 106 (4), 795–834.

*Current Biology*, 17, 1714–1719.

*Psychophysics: A practical introduction*. London: Academic Press.

*Journal of Statistical Software*, 25, 1–26.

*Modeling psychophysical data in R*. New York: Springer.

*Handbook of perceptual organization* (pp. 41–54). Oxford, UK: Oxford University Press.

*Foundations of measurement. Volume I: Additive and polynomial representations*. Mineola, NY: Dover Publications.

*Perception & Psychophysics*, 68 (1), 76–83.

*Perception & Psychophysics*, 70 (5), 828–840.

*Journal of Vision*, 12 (10): 8, 1–21, doi:10.1167/12.10.8.

*Visual Neuroscience*, 30, 289–298.

*Journal of Vision*, 3 (8): 5, 573–585, doi:10.1167/3.8.5.

*Current Biology*, 22 (20), 1909–1913.

*Journal of the Optical Society of America*, 23 (11), 394–411.

*Journal of Vision*, 4 (9): 4, 711–720, doi:10.1167/4.9.4.

*Journal of the Optical Society of America*, 66 (8), 866–867.

*Journal of Experimental Psychology: Human Perception and Performance*, 42 (6), 847–865.

*Journal of Vision*, 15 (13): 3, 1–21, doi:10.1167/15.13.3.

*Journal of Vision*, 15 (6): 13, 1–19, doi:10.1167/15.6.13.

*Nonlinear regression with R*. New York: Springer.

*Vision Research*, 44, 1827–1842.

*Psychological Review*, 109 (3), 492–519.

(Doctoral dissertation, Eberhard-Karls-Universität Tubingen, presented March, 2014; published May, 2014). *Verlag Dr. Hut*.

*Lightness, brightness, and transparency* (pp. 35–110). New York: Psychology Press.

*Journal of Vision*, 16 (11): 17, 1–19, doi:10.1167/16.11.17.

*Journal of Vision*, 14 (7): 3, 1–15, doi:10.1167/14.7.3.