A central question in psychophysical research is how physical differences between stimuli translate into perceptual differences and vice versa. Characterizing such a psychophysical scale would reveal how a stimulus is converted into a perceptual event, particularly under changes in viewing conditions (e.g., illumination). Various methods exist to derive perceptual scales, but in practice, scale estimation is often bypassed by assessing appearance matches. Matches, however, only reflect the underlying perceptual scales but do not reveal them directly. Two recently developed methods, MLDS (Maximum Likelihood Difference Scaling) and MLCM (Maximum Likelihood Conjoint Measurement), promise to reliably estimate perceptual scales. Here we compared both methods in their ability to estimate perceptual scales across context changes in the domain of lightness perception. In simulations, we adopted a lightness constant, a contrast-based, and a luminance-based observer model to generate differential patterns of perceptual scales. MLCM correctly recovered all models. MLDS correctly recovered only the lightness constant observer model. We also empirically probed both methods with two types of stimuli: (a) variegated checkerboards that support lightness constancy and (b) center-surround stimuli that do not support lightness constancy. Consistent with the simulations, MLDS and MLCM provided similar scale estimates in the first case and divergent estimates in the second. In addition, scales from MLCM, but not from MLDS, accurately predicted asymmetric matches for both types of stimuli. Taking experimental and simulation results together, MLCM seems more apt to provide a valid estimate of the perceptual scales underlying judgments of lightness across viewing conditions.

*x*_{T} (i.e., luminance), evokes a response on the perceptual dimension of interest, Ψ(*x*_{T}) (i.e., lightness). To perform a matching, the observer chooses a physical match intensity, *x*_{M}, which evokes a perceptual response, Ψ(*x*_{M}), that is as close as possible to the perceptual response to the target. The functions that relate Ψ(*x*) and *x* are known as perceptual scales, transducer functions (e.g., Kingdom and Prins, 2010), or transfer functions in lightness perception (Adelson, 2000). It is evident from Figure 1 that one and the same pattern of matching data (Figure 1B) may be consistent with different combinations of internal response functions (Figure 1A). Thus, matching data alone are insufficient to infer perceptual scales.

*jnds*) to Stevens’s direct scaling techniques (for a review, see, e.g., Gescheider, 1997; Marks and Gescheider, 2002), but their validity has been a topic of debate. For example, integrating *jnds* is problematic, practically, because the error in each JND estimation propagates to the subsequent estimation, and theoretically, because the shapes of the estimated functions will differ as a function of the noise underlying the perceptual judgments (Kingdom and Prins, 2010; Kingdom, 2016). Stevens’s direct methods (e.g., magnitude estimation, ratio estimation) might be affected by the choice of the numerical categorization and hence are not guaranteed to provide a meaningful perceptual scale either (see, e.g., Treisman, 1964; Krueger, 1989).

^{1}Using simulations, we showed that MLDS is able to recover different ground truth perceptual scales regardless of whether we assumed the underlying noise to be additive or multiplicative, that is, constant or proportionally increasing across the scale (Aguilar et al., 2017).

^{2}The putatively underlying perceptual scales are described in more detail below (see section Simulation of perceptual scales).

*x*_{1} or *x*_{3}, is more different in lightness from *x*_{2} (Figure 3A and C). The observer only compares triads within the same context, as indicated by the two example stimuli in Figure 3A and C. In MLCM, the observer judges which of two checks, *x*_{1} or *x*_{2}, is lighter (Figure 3B and D). As indicated in the figure, in MLCM the paired comparison can be done within the same viewing condition (upper panel) or between different viewing conditions (lower panel). MLDS estimates independent scales for each viewing condition. By default, each scale is anchored to zero at the minimum stimulus value. The maximum is inversely proportional to the noise estimated for each condition. The noise is assumed to be additive. The estimated scales are interval scales.

*x*), Figure 4A, left). This is an inversion of the mapping from reflectance to luminance shown in Figure 2. Both methods, MLDS and MLCM, recover the ground-truth model.

*x*), is thus a one-to-one mapping with luminance values in different transparent media covering different parts of the luminance range. MLCM is able to recover the luminance-based observer model when plain view is specified as the reference scale. The anchoring policy of MLDS erroneously shifts the luminance range of the light transparent medium from its actual position to zero and hence does not recover the ground-truth model.

*povray* (Persistence of Vision Raytracer Pty. Ltd., Williamstown, Victoria, Australia, 2004). The position of the checkerboard, the light source, and the camera were kept constant across all images. Checks were assigned 1 of 13 surface reflectance values according to the experimental design (see below). In plain view, the luminances ranged from 15 to 415 cd/m^{2}. To keep the local contrast of each target check in the checkerboard comparable, we used the same eight reflectances for the surround checks but shuffled their positions. The mean luminance of the surround was equal to the mean luminance of all 13 reflectance values (Suppl. Table S3). The remaining checks in the checkerboard (73 in MLDS and 82 in MLCM) were randomly assigned 1 of the 13 reflectance values. The only constraint was that no two adjacent checks had the same reflectance. A different checkerboard was rendered for each trial in each of the procedures and for each observer.
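The adjacency constraint on check reflectances can be enforced greedily during assignment. The following is a minimal sketch, not the authors' rendering code (the function name, grid dimensions, and seeding are our assumptions); it only illustrates that, with 13 reflectance values, scanning the grid and excluding the values already placed above and to the left always leaves a valid choice:

```python
import random

def assign_reflectances(rows, cols, values, seed=0):
    """Randomly assign one of `values` to each check so that no two
    edge-adjacent checks share the same reflectance value."""
    rng = random.Random(seed)
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Exclude reflectances already placed above and to the left;
            # with 13 candidate values a valid choice always remains.
            taken = {grid[r - 1][c] if r > 0 else None,
                     grid[r][c - 1] if c > 0 else None}
            grid[r][c] = rng.choice([v for v in values if v not in taken])
    return grid
```

Because only two already-assigned neighbors constrain each cell, the greedy pass never dead-ends and no backtracking is needed.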

*l*′ is obtained by linearly combining the luminance of the check in plain view, *l*, and the luminance of the foreground transparency when rendered opaque, *l*_{τ}, weighted by the transparency’s transmittance α:

*l*′ = α · *l* + (1 − α) · *l*_{τ}

*povray* (arbitrary) reflectance units (*l*_{τ} = 19 cd/m^{2}) and a light transparency that had a reflectance value of 2 (*l*_{τ} = 110 cd/m^{2}). The transmittance for both transparencies was α = 0.4. Supplementary Table S3 provides the luminance values for each reflectance in each viewing condition.
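Assuming the linear combination described above takes the usual form *l*′ = α · *l* + (1 − α) · *l*_{τ}, the luminance ranges under the two media can be reproduced from the values given in the text (a sketch; the function name is ours):

```python
def reduced_luminance(l, alpha, l_tau):
    """Luminance of a check seen through a transparent medium: the plain-view
    luminance l is weighted by the transmittance alpha and combined linearly
    with the medium's own luminance l_tau."""
    return alpha * l + (1 - alpha) * l_tau

# Parameters from the text: alpha = 0.4; l_tau = 19 cd/m^2 (dark medium)
# and 110 cd/m^2 (light medium); plain-view luminances span 15-415 cd/m^2.
dark_range = [reduced_luminance(l, 0.4, 19) for l in (15, 415)]
light_range = [reduced_luminance(l, 0.4, 110) for l in (15, 415)]
```

The calculation shows how both media compress the plain-view range and shift it to different portions of the luminance axis.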

^{2}, which was identical to the mean luminance of the checkerboard seen in plain view. To keep the luminance and geometric structure of the surround with respect to the match comparable between trials, we used the same surround checkerboard but presented it in different orientations, rotated from the original in steps of 90 deg.

*x*_{1}, *x*_{2}, *x*_{3}) are drawn from the set of possible reflectance values and presented in descending or ascending order at the target positions (see Figure 3). Observers judged which of the extremes (*x*_{1} or *x*_{3}) was more different in perceived lightness from the central one (*x*_{2}; see Figure 3). To indicate their judgment, they pressed the left or right button on the response box, respectively.

*p* = 10). For 10 stimulus intensities, the set of possible triads is 120 (*n* = *p*!/((*p* − 3)! · 3!)) in each viewing condition. Each unique set of triads was repeated 10 times, resulting in a total of 3,600 trials for each observer (120 unique triads × 3 viewing conditions × 10 repeats). The trial sequence was randomized across conditions and repeats, and it was also randomized whether a triad was presented in ascending or descending order. We divided the total number of trials into 10 blocks of 360 trials, which took observers between 40 and 50 minutes to complete.
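The triad count works out as the binomial coefficient C(*p*, 3); a minimal check (function names are ours):

```python
from math import factorial

def n_triads(p):
    """Number of non-ordered triads from p stimulus intensities:
    n = p! / ((p - 3)! * 3!), i.e. the binomial coefficient C(p, 3)."""
    return factorial(p) // (factorial(p - 3) * factorial(3))

n_unique = n_triads(10)        # 120 unique triads per viewing condition
n_trials = n_unique * 3 * 10   # x 3 viewing conditions x 10 repeats = 3,600
```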

*x*_{1} and *x*_{3} in the MLDS experiment (see Figure 3). Observers were asked to judge which of the targets was lighter (Figure 3), and they indicated their choice by pressing the left or right key on the response box. No time limit was imposed.

*x*_{2} in MLDS. Observers adjusted the external test field so as to match the target in perceived lightness. There were two buttons for coarse and two buttons for fine adjustments. A fifth button was used to indicate a match; this triggered the presentation of the next trial. No time limit was imposed. Each judgment was repeated 10 times, resulting in a total of 300 matching trials (10 reflectance values × 3 viewing conditions × 10 repeats).

*MLDS* (Knoblauch and Maloney, 2008) and *MLCM* (Knoblauch and Maloney, 2014) in the R programming language (R Core Team, 2017). For both methods, scales are estimated by maximizing the likelihood of a generalized linear model (GLM), derived in detail in Knoblauch and Maloney (2012). Confidence intervals for the scale values and the goodness of fit of the scales were both obtained using bootstrap techniques (see Knoblauch and Maloney, 2012; Wood, 2006). In the supplementary material, we describe the details of the goodness-of-fit evaluation for our data set.

^{2}(this corresponds to the luminance of the light transparent medium when rendered as an opaque surface; see Methods). The MLDS scales for the remaining observers are also indicative of a lightness constant observer model, whereas the MLCM scales indicate a mixture between a lightness constant and a contrast or luminance-based observer.

*x*_{T}, we find its corresponding value on the perceptual axis, Ψ^{T}(*x*_{T}), using the perceptual scale measured in this condition (Ψ^{T}). We then find the numerically identical value Ψ^{P}(*x*_{M}) on the perceptual axis of the plain-view scale, Ψ^{P}, and use that scale to find the corresponding luminance value, *x*_{M}. In this way, *x*_{M} and *x*_{T} are the luminance values that produce equal values on the perceptual dimension, that is, Ψ^{T}(*x*_{T}) = Ψ^{P}(*x*_{M}).
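The matching prediction just described amounts to evaluating one scale and inverting another. The following is a minimal sketch under the assumption that both scales increase monotonically over the stimulus range; the interpolation helper and the example scales are hypothetical, not the measured ones:

```python
from bisect import bisect_left

def interp(v, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at v,
    with xs strictly ascending; clamps outside the sampled range."""
    if v <= xs[0]:
        return ys[0]
    if v >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, v)
    t = (v - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def predict_match(x_target, x_values, scale_target, scale_plain):
    """Predict the plain-view match x_M for a target x_T seen in another
    context: read Psi^T(x_T) off the target context's scale, then invert
    the plain-view scale Psi^P at the same perceptual value."""
    psi_t = interp(x_target, x_values, scale_target)  # x_T -> Psi^T(x_T)
    return interp(psi_t, scale_plain, x_values)       # Psi^P-inverse

# Hypothetical scales: identical scales reproduce the target exactly,
# while a compressed target scale maps to a darker plain-view match.
x = [i / 10 for i in range(11)]
plain = [v ** 2 for v in x]
compressed = [0.5 * v for v in plain]
```

Because the forward and inverse steps use the same sampled scales, a target compared against its own scale is always returned unchanged, which is a convenient sanity check.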

*r* in Figure 7). In MLCM, predicted and actual matches were correlated with coefficients of *r* = 0.997 and *r* = 0.996 for the variegated checkerboard and the center-surround stimulus, respectively. In MLDS, the correlation coefficients were *r* = 0.95 for the variegated checkerboards and *r* = 0.889 for the center-surround stimuli.

*differences* about the test stimuli in a forced-choice setting. MLDS avoids the problem of cross-context comparisons because the stimuli belonging to one triad are all shown in the same context. In MLCM, observers do not have to produce equality but judge which of the stimuli is higher on some perceptual dimension. Thus, even though stimuli in MLCM are sometimes compared across contexts, this comparison is not as problematic as producing perceptual equality in asymmetric matching (see below).

*additive* model, in which the perceptual judgments are explained by the sum of the effects of each individual stimulus dimension (Knoblauch and Maloney, 2012).

*additive* model is insufficient to recover the scaling functions, because it cannot capture full affine transformations among scales. Specifically, it only captures models with an (additive) offset but not with an offset and a multiplicative factor, as was the case for the scales in the transparency condition (Figure 4; see supplementary material and Suppl. Fig. S2, which illustrate this point). Consequently, we used the alternative and more general *saturated* model provided by MLCM. This model includes almost as many parameters (29) as combinations of stimuli (30; 10 test reflectances × 3 contexts). We collected sufficient data to fit the *saturated* model and used the nested likelihood ratio test to check whether the saturated model provided a better fit to the data than the additive model (Knoblauch and Maloney, 2012).

^{3}This was the case for all experimental data.

^{i}(*x*) is the perceptual scale in the *i*th context. The observer perceives the pair (*x*_{2}, *x*_{3}) as being more different from (*x*_{2}, *x*_{1}) when Δ_{MLDS} > 0.
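The MLDS decision rule can be simulated directly. In this sketch the observer functions are hypothetical (a compressive square-root scale and an expansive squaring scale), chosen only to show how the sign of Δ_{MLDS} depends on the shape of the scale:

```python
import random

def mlds_choice(psi, x1, x2, x3, sigma=0.0, rng=random.Random(7)):
    """MLDS decision rule: return True when the pair (x2, x3) is judged
    more different than (x2, x1), i.e. when
    [Psi(x3) - Psi(x2)] - [Psi(x2) - Psi(x1)] + noise > 0."""
    delta = (psi(x3) - psi(x2)) - (psi(x2) - psi(x1)) + rng.gauss(0.0, sigma)
    return delta > 0

# Hypothetical observers: for an equally spaced triad, a compressive scale
# makes the lower interval look larger, an expansive scale the upper one.
compressive = lambda v: v ** 0.5
expansive = lambda v: v ** 2
```

With sigma > 0 the same function yields stochastic responses, which is how simulated observers of this kind are typically driven through many triads.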

*i* could be different from *j*), and the observer perceives *x*_{2} as lighter than *x*_{1} when Δ_{MLCM} > 0. In both cases, the decision variable is corrupted by additive Gaussian noise with variance σ^{2} (ϵ ∼ *N*(0, σ^{2})). The scale maximum of either method equals the inverse of the estimated variability, that is, \(1/\hat{\sigma}_{\mathrm{MLDS}}\) and \(1/\hat{\sigma}_{\mathrm{MLCM}}\). If the noise is additive and Gaussian and if both methods are probing the same underlying dimension, Ψ(*x*), then the noise estimates from both methods should be related in the following way:

*The new cognitive neurosciences*(2nd ed., pp. 339–351). Cambridge, MA: MIT Press.

*Journal of Vision,*17(1), 37, doi:10.1167/17.1.37.

*Vision Research,*44, 1765–1786.

*Journal of Vision,*14(4), 9, doi:10.1167/14.4.9.

*Vision Research,*144, 9–19, doi:10.1016/j.visres.2018.01.003.

*Psychophysics: The fundamentals*(3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

*i-Perception,*8, doi:10.1177/2041669516687770.

*Psychological Science,*19, 196–204, doi:10.1111/j.1467-9280.2008.02067.x.

*Psychophysics: A practical introduction*. London, UK: Academic Press.

*Vision Research,*128, 1–5, doi:10.1016/j.visres.2016.09.004.

*Journal of Statistical Software,*25, 1–26.

*Modeling psychophysical data in R*. New York, NY: Springer.

*Handbook of perceptual organization*(pp. 41–54). Oxford, UK: Oxford University Press.

*Behavioral and Brain Sciences,*12, 251–320, doi:10.1017/S0140525X0004855X.

*Perception & Psychophysics,*68, 76–83.

*Journal of Mathematical Psychology,*1, 1–27, doi:10.1016/0022-2496(64)90015-X.

*Visual Neuroscience,*30, 289–298.

*Journal of Vision,*15(1), 15, doi:10.1167/15.1.15.

*Journal of Vision,*3(8), 573–585, doi:10.1167/3.8.5.

*Stevens' handbook of experimental psychology: Vol. 4. Methodology in experimental psychology*(pp. 91–138). New York, NY: John Wiley & Sons.

*Journal of the Optical Society of America A,*33, A184–A193, doi:10.1364/JOSAA.33.00A184.

*Proceedings of the National Academy of Sciences of the United States of America,*82, 5983–5986, doi:10.1073/pnas.82.17.5983.

*Quarterly Journal of Experimental Psychology,*16, 11–22, doi:10.1080/17470216408416341.

*Lightness, brightness, and transparency*(pp. 35–110). Hillsdale, NJ: Lawrence Erlbaum Associates.

*Generalized additive models: An introduction with R*. Boca Raton, FL: Chapman & Hall/CRC.

*Journal of Vision,*14, 1–15, doi:10.1167/14.7.3.