Establishing the relation between perception and discrimination is a fundamental objective in psychophysics, with the goal of characterizing the neural mechanisms mediating perception. Here, we show that a procedure for estimating a perceptual scale based on a signal detection model also predicts discrimination performance. We use a recently developed procedure, Maximum Likelihood Difference Scaling (MLDS), to measure the perceptual strength of a long-range color filling-in phenomenon, the Watercolor Effect (WCE), as a function of the luminance ratio between the two components of its generating contour. MLDS is based on an equal-variance Gaussian signal detection model and yields a perceptual scale with interval properties. For a 3-fold increase in the luminance ratio, the strength of the fill-in percept increased by 10–15 times the estimated internal noise level. Each observer's estimated scale predicted discrimination performance in a subsequent paired-comparison task. A common signal detection model accounts for both the appearance and the discrimination data. Because signal detection theory provides a common metric for relating discrimination performance and neural response, the results have implications for comparing perceptual and neural response functions.

*SD*) = 27.5 (9.8) years), and both genders were equally represented in the sample.

² with CIE 1931 chromaticity coordinates (0.320, 0.310). The purple exterior contour had a luminance of 1.56 cd/m² with chromaticity coordinates (0.316, 0.182). The orange interior contour was set initially at 12.1 cd/m² with chromaticity coordinates (0.470, 0.337). We also specified the stimuli in a cardinal direction space (Derrington, Krauskopf, & Lennie, 1984) with axes LM and S in an equiluminant plane and a third axis corresponding to luminance contrast, normalized to limits of ±1. In this space, the orange and purple contours were at azimuths of 45 and 320 deg, respectively. A series of 10 stimuli was generated by varying the luminance between the base level of the interior contour and 32.8 cd/m². These were chosen to be equally spaced in cardinal direction space, yielding 10 elevations between 0 and 0.72, with 0 defined as the base level. The purple elevation was fixed at −0.8.
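Equal spacing of 10 elevations from 0 to 0.72 means adjacent levels differ by 0.72/9 = 0.08. A minimal Python sketch of that arithmetic (the variable names are illustrative only):

```python
# Ten equally spaced elevations of the orange contour along the normalized
# luminance axis, from the base level (0) to 0.72; the purple contour is fixed.
N_LEVELS = 10
MAX_ELEVATION = 0.72
PURPLE_ELEVATION = -0.8

step = MAX_ELEVATION / (N_LEVELS - 1)  # spacing between adjacent levels
elevations = [round(i * step, 4) for i in range(N_LEVELS)]

print(elevations)
# [0.0, 0.08, 0.16, 0.24, 0.32, 0.4, 0.48, 0.56, 0.64, 0.72]
```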

$a$, $b$, $c$) of WCE patterns with three elevations chosen from a series of 10, with $a < b < c$, as in Figure 1d. Stimulus $b$ was always the upper stimulus in the middle, and stimuli $a$ and $c$ were randomly positioned on the left or right. A fixation cross appeared in the center of the screen. Each WCE pattern was offset 1.62 deg vertically from the fixation cross. The two bottom patterns were offset 2.28 deg laterally.
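With 10 levels there are $\binom{10}{3} = 120$ distinct triads with $a < b < c$. This section does not state how many triads were shown per session, so the Python sketch below assumes all 120 are presented once in random order, with $b$ on top and $a$, $c$ swapped left/right at random. The function names are hypothetical.

```python
import itertools
import random

N_LEVELS = 10

# All triads (a, b, c) with a < b < c drawn from the 10 stimulus levels.
TRIADS = list(itertools.combinations(range(1, N_LEVELS + 1), 3))


def trial_layout(triad, rng):
    """Screen layout for one trial.

    b is always the upper, central pattern; a and c are assigned to the
    lower-left and lower-right positions at random.
    """
    a, b, c = triad
    left, right = (a, c) if rng.random() < 0.5 else (c, a)
    return {"top": b, "lower_left": left, "lower_right": right}


def make_session(seed=None):
    """Shuffle the triads and randomize the left/right placement of a and c."""
    rng = random.Random(seed)
    order = TRIADS[:]
    rng.shuffle(order)
    return [trial_layout(t, rng) for t in order]


session = make_session(seed=1)
print(len(TRIADS), len(session))  # 120 120
```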

$d'$ from Signal Detection Theory, i.e., in units of the standard deviation of the internal noise (see next section). In a second experiment, each observer's average scale for the test stimuli was used to estimate stimulus pairs for which the elevation difference corresponded to perceptual scale differences (or $d'$ values) of 0.5, 1, and 2. Sessions were then run in which these stimulus pairs were presented in random order, with the left/right positions of the stimuli also randomized across trials (Figure 1e). Interleaved with the test stimuli was an equal number of braided control stimuli corresponding to the same elevation differences of the orange component of the contours. The observer's task was to rate, on a scale of 1–5, whether the stimulus with the stronger orange filling-in was on the left or the right, with 1 indicating that it was most likely on the left and 5 most likely on the right. A session consisted of 120 trials (20 presentations of each of the 3 stimulus differences for both test and control). The number of sessions completed by each observer was: FD, 16; SF, 8; NM, 5; and BA, 6. For each condition, the proportions of hits and false alarms were estimated from the distributions of the observers' cumulative ratings with respect to the position of the stimulus with the higher luminance elevation of the orange component. Receiver Operating Characteristics (ROCs) were fit to the ratings with a cumulative odds ordinal regression model (McCullagh & Nelder, 1989; see Appendix B), using functions from the R package ordinal (Christensen, 2010), and were used to estimate $d'$ under an equal-variance Gaussian assumption.
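The ROC construction can be illustrated with a short Python sketch. It cumulates 1–5 ratings into (false alarm, hit) pairs as described above and converts a single pair into $d' = z(P_H) - z(P_{FA})$ under the equal-variance Gaussian assumption. This is a per-criterion calculation, not the ordinal-regression fit used in the study; the function names and ratings are hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # probit: inverse of the standard normal CDF


def roc_points(ratings_right, ratings_left, n_categories=5):
    """Cumulative (false-alarm, hit) proportions from 1..n ratings.

    ratings_right: ratings on trials where the higher-elevation stimulus was on the right
    ratings_left:  ratings on trials where it was on the left
    At criterion k, a rating >= k counts as a "right" response
    (5 = most likely right), giving n_categories - 1 ROC points.
    """
    points = []
    for k in range(2, n_categories + 1):
        p_hit = sum(r >= k for r in ratings_right) / len(ratings_right)
        p_fa = sum(r >= k for r in ratings_left) / len(ratings_left)
        points.append((p_fa, p_hit))
    return points


def d_prime(p_hit, p_fa):
    """Equal-variance Gaussian sensitivity, d' = z(H) - z(FA).

    Proportions of exactly 0 or 1 make z() fail and need a correction first.
    """
    return z(p_hit) - z(p_fa)


# Hypothetical ratings for illustration only (not data from the study).
right = [5, 4, 4, 3, 5, 2, 4, 5, 3, 1]
left = [1, 2, 1, 3, 2, 4, 1, 2, 3, 5]
for p_fa, p_hit in roc_points(right, left):
    print(f"FA={p_fa:.1f}  H={p_hit:.1f}")
print(round(d_prime(0.8, 0.4), 3))  # 1.095
```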

$\phi_i$, $i = 1, \ldots, 10$, and suppose that each physical stimulus level evokes a perceptual filling-in response $\psi_i$, $i = 1, \ldots, 10$. For simplicity, we refer to the stimulus triple $(\phi_a, \phi_b, \phi_c)$ as $(a, b, c)$. Given the stimulus triple $(a, b, c)$ with $\phi_a < \phi_b < \phi_c$, we assume that the observer judges the difference between the pair $(a, b)$ larger than that between $(b, c)$. When $b$ is close to being equally different from $a$ and $c$. To incorporate this inherent variability of human responses, we suppose that the decision variable, $\Delta_{abc}$, is contaminated by internal noise so that the observer chooses the first interval exactly when

$\varepsilon_{abc}$ are independent and identically distributed normal variables with $\mu = 0$ and variance $4\sigma^2$. This is an equal-variance Gaussian signal detection model. The coefficient of 4 on the variance parameterizes the estimated scale values so that the variance of the response for each stimulus level is equal to $\sigma^2$. Although stimulus $b$ appears only once, its weight is twice as large because it participates in both comparisons in the decision variable (see Appendix A for further details). As a result, the estimated scale values $\hat{\psi}_i/\sigma$ are distributed as normal variables with unit variance and are, therefore, in the same units as the sensitivity measure $d'$ from Signal Detection Theory (Green & Swets, 1966). We test this equivalence in the second experiment.
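A small simulation makes the model concrete. The decision rule written out here is an assumption, reconstructed from the design-matrix weights (−1, 2, −1) given later in this section: the observer picks the first pair $(a, b)$ when $\Delta_{abc} = (\psi_b - \psi_a) - (\psi_c - \psi_b) + \varepsilon_{abc} > 0$, with $\varepsilon_{abc} \sim N(0, 4\sigma^2)$ as stated above. The response function `psi` is hypothetical.

```python
import random
from statistics import NormalDist


def p_first_pair(psi_a, psi_b, psi_c, sigma):
    """Model probability of judging (a, b) the larger interval.

    delta = (psi_b - psi_a) - (psi_c - psi_b) = -psi_a + 2*psi_b - psi_c,
    and the noise on the decision variable has SD 2*sigma (variance 4*sigma**2).
    """
    delta = -psi_a + 2 * psi_b - psi_c
    return NormalDist().cdf(delta / (2 * sigma))


def simulate_choice(psi_a, psi_b, psi_c, sigma, rng):
    """One simulated trial: True if the observer picks the pair (a, b)."""
    delta = -psi_a + 2 * psi_b - psi_c
    return delta + rng.gauss(0.0, 2 * sigma) > 0


# Hypothetical compressive response function over 10 levels (illustration only).
psi = [(i / 9) ** 0.5 for i in range(10)]
rng = random.Random(0)
a, b, c, sigma = 0, 3, 9, 0.1
n = 20000
freq = sum(simulate_choice(psi[a], psi[b], psi[c], sigma, rng) for _ in range(n)) / n
print(round(p_first_pair(psi[a], psi[b], psi[c], sigma), 3), round(freq, 3))
```

When $b$ lies exactly midway on the response scale, $\delta = 0$ and the model predicts guessing (probability 0.5), which is the variability the noise term is meant to capture.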

$F_{8,24} = 0.904$, $p = 0.529$). These results support the conclusion that observers responded on the basis of the perceived color of the filled-in region and did not simply judge the appearance of the orange contour.

$p$ physical values, only $p - 1$ scale values are estimated (Knoblauch & Maloney, 2008; Maloney & Yang, 2003). The estimated scales are unique only up to a linear transformation, i.e., addition of a constant or multiplication by a factor (Maloney & Yang, 2003; see Appendix A, and see also Falmagne, 1985; Krantz et al., 1971; Roberts, 1985 for a more complete discussion of the axioms and proofs underlying the interval representation and uniqueness of difference scaling). Here, the estimates have been parameterized so that the scale values are expressed in units of the estimated internal standard deviation of the response. Thus, the results indicate a 10- to 15-fold increase in the effect relative to the internal noise for a roughly 3-fold change in the luminance of the test contour.
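Because the scale is expressed in units of the internal noise, two stimuli whose scale values differ by 1 are predicted to be discriminated with $d' = 1$; this is how stimulus pairs for the second experiment can be chosen from an observer's scale. The text does not say how intermediate elevations were obtained, so the Python sketch below assumes linear interpolation of the estimated scale between the 10 tested elevations. The scale values and function names are hypothetical.

```python
import numpy as np

# Tested elevations (0 to 0.72 in 10 equal steps) and a hypothetical
# monotone scale in units of the internal noise SD (illustration only).
elevations = np.linspace(0.0, 0.72, 10)
scale = np.array([0.0, 2.0, 3.6, 5.0, 6.2, 7.3, 8.3, 9.2, 10.0, 10.7])


def elevation_for_scale(target, elev=elevations, psi=scale):
    """Invert the scale by linear interpolation (np.interp needs psi increasing)."""
    return float(np.interp(target, psi, elev))


def matched_pair(base_elevation, d_target, elev=elevations, psi=scale):
    """Return (low, high) elevations whose scale values differ by d_target."""
    psi_base = float(np.interp(base_elevation, elev, psi))
    return base_elevation, elevation_for_scale(psi_base + d_target, elev, psi)


for d in (0.5, 1.0, 2.0):
    low, high = matched_pair(0.0, d)
    print(d, round(high - low, 3))
```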

$d_i$, of the $i$th data point is defined as

$y_i$) is the expected or fitted value of the $i$th data point and $y_i$ is the $i$th binary response. The sum of the squared deviance residuals is minus twice the log likelihood, the criterion optimized in fitting the model.
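For binary responses, the standard deviance residual is $d_i = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{-2[y_i \log\hat{\mu}_i + (1 - y_i)\log(1 - \hat{\mu}_i)]}$, whose squares sum to $-2\log L$, as stated above. A minimal Python sketch, using $\hat{\mu}_i$ as our (assumed) notation for the fitted probability and hypothetical data:

```python
import math


def deviance_residuals(y, mu):
    """Deviance residuals for binary responses y (0/1) and fitted probabilities mu.

    d_i = sign(y_i - mu_i) * sqrt(-2 * [y_i*log(mu_i) + (1 - y_i)*log(1 - mu_i)]);
    mu must lie strictly between 0 and 1.
    """
    res = []
    for yi, mi in zip(y, mu):
        loglik_i = yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
        res.append(math.copysign(math.sqrt(-2.0 * loglik_i), yi - mi))
    return res


# Hypothetical responses and fitted probabilities (illustration only).
y = [1, 0, 1, 1, 0]
mu = [0.8, 0.3, 0.6, 0.9, 0.5]
d = deviance_residuals(y, mu)
log_lik = sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi) for yi, mi in zip(y, mu))
print(math.isclose(sum(di ** 2 for di in d), -2 * log_lik))  # True
```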

$p$-values shown in the upper right of each graph indicate the proportion of bootstrap samples with fewer runs. All of the $p$-values are greater than 0.05, providing no evidence to reject the fits on the basis of patterns in the residuals.
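The runs diagnostic counts runs of same-signed deviance residuals; too few runs suggest systematic misfit. The Python sketch below is a simplified version under stated assumptions: residuals are ordered by fitted value, and simulated responses are drawn from the fitted probabilities without refitting the model on each sample, as a full bootstrap would. All names and data are hypothetical.

```python
import random


def count_runs(values):
    """Number of runs of consecutive values with the same sign."""
    signs = [v > 0 for v in values]
    return 1 + sum(s != t for s, t in zip(signs, signs[1:])) if signs else 0


def residual_signs(y, mu):
    """Deviance residuals share the sign of y - mu, so y - mu suffices for runs."""
    return [yi - mi for yi, mi in zip(y, mu)]


def runs_p_value(y, mu, n_boot=2000, seed=0):
    """Proportion of simulated data sets with fewer runs than the observed one.

    Residuals are ordered by fitted value; simulated responses are drawn from mu
    (simplification: the model is not refitted on each simulated data set).
    """
    order = sorted(range(len(mu)), key=lambda i: mu[i])
    mu_sorted = [mu[i] for i in order]
    observed = count_runs(residual_signs([y[i] for i in order], mu_sorted))
    rng = random.Random(seed)
    fewer = 0
    for _ in range(n_boot):
        y_sim = [1 if rng.random() < m else 0 for m in mu_sorted]
        if count_runs(residual_signs(y_sim, mu_sorted)) < observed:
            fewer += 1
    return fewer / n_boot


# Hypothetical well-specified data: responses drawn from the fitted probabilities.
rng = random.Random(1)
mu = [rng.uniform(0.05, 0.95) for _ in range(300)]
y = [1 if rng.random() < m else 0 for m in mu]
print(count_runs([1, 2, -1, -3, 4]))  # 3
print(runs_p_value(y, mu, n_boot=500))
```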

Observer | 0 | 0.5 | 1 | 2 |
---|---|---|---|---|
FD | 0.0 | 0.041 | 0.083 | 0.124 |
SF | 0.073 | 0.109 | 0.146 | 0.219 |
NM | 0.233 | 0.265 | 0.298 | 0.364 |
BA | 0.111 | 0.138 | 0.165 | 0.219 |

$P_{\mathrm{H}}$ versus probability of a false alarm, $P_{\mathrm{FA}}$) for the four observers for stimulus differences predicted to yield values of $d'$ equal to 0.5, 1, and 2, according to the difference scales in Figure 3. The purple points indicate the results for the control stimuli and the orange points those for the test stimuli. The points were obtained by cumulating the ratings conditional on the higher-elevation stimulus being on the right (hit) or left (false alarm) across each of the 5 rating levels for each of the two conditions. The gray curves (solid for the test, dashed for the control) indicate the expected ROC contours for an equal-variance Gaussian model if the observer's sensitivity equals the nominal value, i.e., the value predicted by the perceptual scales obtained by MLDS. For the control stimuli, the points lie closer to the positive diagonal (dashed lines), corresponding to $d' = 0$, for most of the conditions. In two cases (observers SF and NM at nominal $d' = 0.5$), the fitted curves bow inward, indicating a negative value of $d'$. For this weak stimulus condition, these observers may have been judging the contour rather than the fill-in color, choosing the stimulus whose interior orange contour displayed the greater contrast with the background. Recall that the fill-in color is typically determined by the contour with the lower contrast with the background. For the test conditions, the points fall close to the predicted curves. The black curves (solid for the test, dashed for the control) indicate the best-fitting equal-variance Gaussian model (see Appendix B for a discussion of the fitting procedure and a test of the equal-variance assumption). The estimated values of $d'$ and their standard errors are provided in Table 2.
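The gray reference curves follow from the equal-variance Gaussian model: for sensitivity $d'$, the hit rate at false-alarm rate $P_{FA}$ is $P_H = \Phi(d' + \Phi^{-1}(P_{FA}))$. A negative $d'$ puts the curve below the positive diagonal, which is the inward bowing noted for SF and NM. A short Python sketch (the function name is hypothetical):

```python
from statistics import NormalDist

N = NormalDist()


def equal_variance_roc(d_prime, p_fa):
    """Hit rate on the equal-variance Gaussian ROC: P_H = Phi(d' + z(P_FA))."""
    return N.cdf(d_prime + N.inv_cdf(p_fa))


fa_grid = [0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9]
for d in (-0.5, 0.0, 0.5, 1.0, 2.0):
    hits = [round(equal_variance_roc(d, f), 3) for f in fa_grid]
    print(f"d'={d:+.1f}:", hits)
```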

Observer | Control $d'_{0.5}$ | Control $SE_{0.5}$ | Control $d'_{1}$ | Control $SE_{1}$ | Control $d'_{2}$ | Control $SE_{2}$ | Test $d'_{0.5}$ | Test $SE_{0.5}$ | Test $d'_{1}$ | Test $SE_{1}$ | Test $d'_{2}$ | Test $SE_{2}$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
FD | −0.01 | 0.16 | −0.02 | 0.13 | 0.15 | 0.13 | 0.44 | 0.15 | 0.95 | 0.12 | 2.25 | 0.16 |
SF | −0.53 | 0.17 | −0.04 | 0.18 | 0.12 | 0.18 | 0.57 | 0.17 | 1.63 | 0.20 | 2.16 | 0.23 |
NM | −0.67 | 0.23 | 0.07 | 0.22 | 0.09 | 0.21 | 0.21 | 0.20 | 1.20 | 0.24 | 2.04 | 0.27 |
BA | 0.22 | 0.21 | 0.51 | 0.22 | −0.23 | 0.22 | 0.34 | 0.20 | 0.89 | 0.21 | 1.38 | 0.22 |
Average | −0.25 | 0.21 | 0.13 | 0.13 | 0.03 | 0.09 | 0.40 | 0.07 | 1.17 | 0.17 | 1.96 | 0.20 |
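Table 2 does not state how the Average row was computed. Assuming it is the across-observer mean with standard error SD/√4 (our assumption), a short Python check on the rounded individual values reproduces most cells to within rounding; for example, test $d'_{0.5}$ comes out at 0.39 (SE 0.08) rather than the tabled 0.40 (0.07), a difference plausibly due to rounding of the individual entries.

```python
from statistics import mean, stdev

# Individual d' estimates from Table 2 (columns: nominal d' = 0.5, 1, 2).
control = {
    "FD": (-0.01, -0.02, 0.15),
    "SF": (-0.53, -0.04, 0.12),
    "NM": (-0.67, 0.07, 0.09),
    "BA": (0.22, 0.51, -0.23),
}
test = {
    "FD": (0.44, 0.95, 2.25),
    "SF": (0.57, 1.63, 2.16),
    "NM": (0.21, 1.20, 2.04),
    "BA": (0.34, 0.89, 1.38),
}


def mean_and_se(table):
    """Across-observer mean and standard error (SD / sqrt(n)) for each column."""
    columns = list(zip(*table.values()))
    n = len(table)
    return [(mean(col), stdev(col) / n ** 0.5) for col in columns]


for name, table in (("Control", control), ("Test", test)):
    print(name, [(round(m, 2), round(se, 2)) for m, se in mean_and_se(table)])
```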

$d'$ estimated from the individual sessions plotted against the nominal values predicted from the individual difference scales of Figure 3 for each of the four observers. The results for the test conditions are plotted as orange points and those for the control as purple points. The agreement between data and prediction is very good for observer FD. The other 3 observers' data display more variability but are also based on fewer trials. Two of the observers (SF and NM) tend to select the control stimulus in the control condition for the lowest nominal value of $d'$, as noted above. In addition, for observer BA, the lowest two control points are closer to the test prediction than to the control prediction. This could indicate that she based her judgments on the luminance of the orange component of the contour rather than on the fill-in color, although her responses show better separation of the two conditions when the stimulus difference is greater. Thus, we cannot exclude that this observer based her judgments on the contour for the two weaker stimuli. Globally, however, the test conditions tend to fall near a line of unit slope and the control conditions near a line of zero slope. The error bars approximate 95% confidence intervals for the individual points but have not been corrected for multiple comparisons, which would make them larger. Except in the cases indicated, these results again support the hypothesis that observers' judgments were based on the perceived color of the interior region of the stimulus and not on the physical contours.

$d'$ values were modeled using a repeated measures analysis of covariance with Condition (Control or Test) taken as a two-level factor and the nominal value of $d'$ as a covariate. A significant difference in the slopes across conditions was obtained ($F_{1,3} = 48.15$; $p = 1.8 \times 10^{-4}$), with estimated slopes (SE) for the control and test conditions equal to 0.146 (0.157) and 1.007 (0.157), respectively. The average results, with error bars on each point indicating plus and minus twice the standard error, are shown in the fifth panel and indicate good agreement between the experimental and predicted values.

*intervals* between stimuli rather than simply ordering the stimuli themselves.

$d'$ values. If the judgments were based only on the interior region, then observers would have performed at chance. It is not surprising that, in a discrimination experiment, observers might exploit alternative cues when they are present.

$a$, $b$) will, on some trials, be judged $a > b$ and on others $b > a$. This typically means that many closely spaced stimuli must be scaled, and the number of trials needed to estimate the scale would be much greater than in MLDS. An underlying scale can be derived from discrimination data by measuring discrimination steps over a large range of the stimulus dimension. This is rarely done because the task is heroic. Instead, an underlying response function is typically assumed and the data are fit by estimating its free parameters (see, for example, Foley & Legge, 1981; Legge & Foley, 1980; Smithson, Henning, MacLeod, & Stockman, 2009). The interpretation of such a scale as a perceptual scale depends on the assumption that equally discriminable differences are perceptually equal. This assumption is unnecessary for difference scaling, since the task involves comparing stimulus intervals directly.

$p$ stimuli, $\phi_i$, distributed along a physical continuum with $\phi_1 < \phi_2 < \cdots < \phi_p$. For the method of triads, the observer is presented with a triple $(\phi_a, \phi_b, \phi_c)$ on each trial and judges whether the interval $(\phi_a, \phi_b)$ is smaller than $(\phi_b, \phi_c)$. Equivalently, the observer chooses whether $\phi_b$ is more similar to $\phi_a$ or to $\phi_c$. We assume that on a given trial the choice depends on a decision variable, $\Delta$, contaminated by noise, and that the choice is the pair $(\phi_a, \phi_b)$ precisely when

$\psi_i$ are internal responses to the stimuli and $\varepsilon_i$ is distributed as a Gaussian random variable with $\mu = 0$ and variance $4\sigma^2$. The coefficient of 4 follows from the rule for calculating the variance of a linear combination of independent random variables (Chung, 1974), given the weights of each response in the decision variable and the assumptions of equal variance and independence of the responses to each stimulus. We would like to estimate the values $\psi_i$, $i = 1, \ldots, p$, and $\sigma^2$. Maloney and Yang (2003) proposed to estimate these values by directly maximizing the likelihood:

$\psi_i$ to estimate, $\mathbf{R}$ is the vector of observer responses (0 for choosing the left, 1 for the right) on each trial, $\Phi$ is the cumulative normal distribution function, and $\delta_i$ is defined in Equation A1. The choice of coefficients is unique only up to a linear transformation, so Maloney and Yang (2003) constrained the values of $\psi_1$ and $\psi_p$ to 0 and 1, respectively, thus leaving $p - 1$ parameters $(\psi_2, \ldots, \psi_{p-1}, \sigma)$ to estimate by maximizing Equation A2 over all $n$ trials.
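The anchored maximization can be sketched in Python as a sum of Bernoulli log-likelihood terms. The sketch assumes the decision-variable mean $\delta = -\psi_a + 2\psi_b - \psi_c$ (the weights given for the design matrix), noise SD $2\sigma$, and a response coded 1 when the pair $(\phi_a, \phi_b)$ is chosen; these choices and the function names are our assumptions rather than a transcription of Equation A2. The free parameters are $\psi_2, \ldots, \psi_{p-1}$ and $\sigma$, with $\psi_1 = 0$ and $\psi_p = 1$.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def unpack(theta, p):
    """Map the free parameters to (psi, sigma) with psi_1 = 0 and psi_p = 1 fixed."""
    psi = np.concatenate(([0.0], theta[: p - 2], [1.0]))
    sigma = np.exp(theta[-1])  # sigma is fitted on the log scale to keep it positive
    return psi, sigma


def neg_log_lik(theta, triads, resp, p):
    """Negative Bernoulli log-likelihood.

    triads: (n, 3) array of 0-based stimulus indices (a, b, c), a < b < c
    resp:   1 if the pair (a, b) was chosen as the larger interval, else 0 (assumed coding)
    """
    psi, sigma = unpack(theta, p)
    a, b, c = triads.T
    z = (-psi[a] + 2 * psi[b] - psi[c]) / (2 * sigma)  # noise SD on Delta is 2 * sigma
    return -np.sum(resp * norm.logcdf(z) + (1 - resp) * norm.logcdf(-z))


def fit_anchored(triads, resp, p):
    """Maximize the likelihood over psi_2..psi_{p-1} and log(sigma): p - 1 parameters."""
    theta0 = np.concatenate((np.linspace(0.0, 1.0, p)[1:-1], [np.log(0.1)]))
    result = minimize(neg_log_lik, theta0, args=(triads, resp, p), method="BFGS")
    return unpack(result.x, p)


# Simulated observer with a hypothetical compressive response function.
rng = np.random.default_rng(0)
p = 10
true_psi = np.sqrt(np.linspace(0.0, 1.0, p))
true_sigma = 0.05
triads = np.array(list(combinations(range(p), 3)) * 10)  # each triad 10 times
delta = -true_psi[triads[:, 0]] + 2 * true_psi[triads[:, 1]] - true_psi[triads[:, 2]]
resp = (delta + rng.normal(0.0, 2 * true_sigma, len(triads)) > 0).astype(float)

psi_hat, sigma_hat = fit_anchored(triads, resp, p)
print(np.round(psi_hat, 2), round(float(sigma_hat), 3))
```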

$Y$ is the probability of choosing the left or right stimulus and is related to a linear predictor, $\mathbf{X}\boldsymbol{\beta}$, by a link function, $g$, here $\Phi^{-1}$. $\mathbf{X}$ is an $n \times p$ design matrix with one column for each stimulus level and one row for each trial. Each row has 3 non-zero elements, taking on the values $(-1, 2, -1)$ in the columns corresponding to the stimuli on that trial, i.e., the weights in Equation A1. To render the model identifiable, however, the first column of $\mathbf{X}$ is dropped, which fixes the estimate of the first coefficient at 0. $\boldsymbol{\beta}$ is the $(p - 1)$-vector $(\beta_2, \ldots, \beta_p)$ of scale values to be estimated. Once the design matrix and the response vector are set up, the solution can be obtained with any of the many off-the-shelf software packages that provide a function for fitting a GLM.
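The GLM route can be sketched in Python: build the design matrix with −1, 2, −1 in the columns of $a$, $b$, $c$, drop the first column, and fit a probit model without intercept. The response coding (1 = the pair $(a, b)$ chosen) and the hand-written iteratively reweighted least squares fit are our assumptions; in R, `glm` with `family = binomial(link = "probit")` is the usual off-the-shelf route. The fitted coefficients are on the probit scale, where the noise on the decision variable has unit SD.

```python
from itertools import combinations

import numpy as np
from scipy.stats import norm


def design_matrix(triads, p):
    """One row per trial with -1, 2, -1 in the columns of a, b, c; first column dropped."""
    X = np.zeros((len(triads), p))
    rows = np.arange(len(triads))
    X[rows, triads[:, 0]] = -1.0
    X[rows, triads[:, 1]] = 2.0
    X[rows, triads[:, 2]] = -1.0
    return X[:, 1:]  # dropping the first column fixes its coefficient at 0


def probit_irls(X, y, n_iter=50, tol=1e-10):
    """Probit GLM without intercept, fitted by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.clip(norm.cdf(eta), 1e-10, 1 - 1e-10)
        dens = np.maximum(norm.pdf(eta), 1e-10)
        w = dens**2 / (mu * (1 - mu))  # working weights
        z = eta + (y - mu) / dens      # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Simulated data: hypothetical scale on the probit scale, responses coded
# 1 when the pair (a, b) is chosen (assumed coding).
rng = np.random.default_rng(1)
p = 10
true_beta = 3.0 * np.sqrt(np.linspace(0.0, 1.0, p))
triads = np.array(list(combinations(range(p), 3)) * 10)
eta = -true_beta[triads[:, 0]] + 2 * true_beta[triads[:, 1]] - true_beta[triads[:, 2]]
y = (eta + rng.standard_normal(len(triads)) > 0).astype(float)

X = design_matrix(triads, p)
beta_hat = np.concatenate(([0.0], probit_irls(X, y)))
print(np.round(beta_hat, 2))
```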

$\beta_i$.

$g$ is a link function and $Y$ the rating, whose probability of being less than the $k$th category boundary is conditional on the explanatory variable $x$. The categories here refer to the ratings. On the right side of the model, $\mathbf{X}\boldsymbol{\beta}$ is a linear predictor and $\theta_k$ is a category (rating) dependent intercept that is constrained such that $\theta_{1|2} < \theta_{2|3} < \cdots < \theta_{p-1|p}$; it provides an estimate of the rating boundaries on the scale of the linear predictor. The minus sign attached to the linear predictor is a convention ensuring that an increase in the linear predictor shifts probability toward higher ratings. The model matrix, $\mathbf{X}$, corresponds to that of a two-level factor indicating whether the higher-luminance orange contour stimulus was presented on the left or the right. When the link function is chosen to be the inverse of the Gaussian cumulative distribution function (probit), this is equivalent to fitting the data with the equal-variance Gaussian signal detection model. We fit this model to the rating data by maximum likelihood using the clm function in the R package ordinal (Christensen, 2010; R Development Core Team, 2011).

$\sigma_i$ is a scale parameter that depends on the value of $x$, i.e., whether the test condition was on the left (0) or right (1). The ratio $\sigma_1/\sigma_0$ is the inverse of the slope of the zROC contour, i.e., the contour obtained when the two axes of the ROC plot are transformed by the inverse of the Gaussian cumulative distribution function. This model and the equal-variance model, Equation B1, form a nested pair, so the equal-variance assumption can be tested with a likelihood ratio test. The two models differ by 1 degree of freedom.
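Read from the description above, the equal-variance model is $\Phi^{-1}[P(Y \le k \mid x)] = \theta_k - x\beta$, with $x = 1$ when the higher-elevation stimulus is on the right, and the unequal-variance alternative divides the right-hand side by $\sigma_x$. These forms are our reading of the text, not copies of the appendix equations. The Python sketch below (hypothetical boundaries and $\beta$) shows why $\beta$ is the equal-variance $d'$: $z(P_H) - z(P_{FA}) = \beta$ at every boundary, and with unequal scales the zROC slope is $\sigma_0/\sigma_1$.

```python
from statistics import NormalDist

N = NormalDist()


def category_probs(thetas, beta, x, sigma=1.0):
    """Rating probabilities under P(Y <= k | x) = Phi((theta_k - x * beta) / sigma)."""
    cum = [N.cdf((t - x * beta) / sigma) for t in thetas] + [1.0]
    return [hi - lo for lo, hi in zip([0.0] + cum[:-1], cum)]


def model_roc(thetas, beta, sigma0=1.0, sigma1=1.0):
    """(P_FA, P_H) at each boundary, counting a rating above theta_k as 'right'.

    x = 0: higher-elevation stimulus on the left (false alarms);
    x = 1: on the right (hits).
    """
    return [(1 - N.cdf(t / sigma0), 1 - N.cdf((t - beta) / sigma1)) for t in thetas]


thetas = [-1.0, -0.3, 0.4, 1.2]  # hypothetical boundaries between the 5 ratings
beta = 1.0                        # hypothetical effect of stimulus position

# Equal variance: z(P_H) - z(P_FA) equals beta at every boundary, so beta is d'.
for p_fa, p_h in model_roc(thetas, beta):
    print(round(N.inv_cdf(p_h) - N.inv_cdf(p_fa), 6))

# Unequal variance (sigma1 != sigma0): the zROC is a line with slope sigma0 / sigma1.
zroc = [(N.inv_cdf(f), N.inv_cdf(h)) for f, h in model_roc(thetas, beta, 1.0, 1.25)]
slope = (zroc[-1][1] - zroc[0][1]) / (zroc[-1][0] - zroc[0][0])
print(round(slope, 6))  # 0.8
```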

$\chi^2_1$ values (Column 4) and the unadjusted $p$-values (Column 5) for the likelihood ratio tests of the equal-variance model for each observer and condition (Columns 1–3). Three of the 24 tests display $p$-values between 0.01 and 0.05. Bonferroni's adjustment of the $p$-values for multiple testing is given by $\min(1, n\,p)$, where $n$ is the number of tests. Adjusting the $p$-values with this conservative correction eliminates all of the values less than 0.05 (Column 6). The False Discovery Rate correction is more powerful, in that it finds more significant results, at the cost of a less conservative criterion (Bretz, Hothorn, & Westfall, 2010). The adjusted $p$-values are obtained by initially computing $q_f = n\,p_i / i$, where the $p$-values have been ordered such that $p_i$ is the $i$th in the sequence. The corrected values are then given by taking the cumulative minimum through the reversed ordered sequence of $q_f$ values. However, even with this less conservative correction, none of the values attains significance (Column 7). Thus, we find no evidence to reject the equal-variance model in favor of the unequal-variance alternative.
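A Python sketch of these steps: a $\chi^2_1$ statistic is converted to a p-value through $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, and the 24 p-values are then adjusted with Bonferroni and with the Benjamini–Hochberg false discovery rate procedure, which is how we read the verbal description above. Because the tabled $\chi^2$ values are rounded, recomputed p-values can differ from the table in the second decimal (for example, FD Control at nominal $d' = 2$ has $\chi^2 = 0.00$ but $p = 0.96$). Function names are hypothetical.

```python
import math


def chi2_1df_p(stat):
    """Upper-tail p-value of a chi-square statistic with 1 df, P(Z**2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(pvals):
    """Bonferroni adjustment, min(1, n * p), for n tests."""
    n = len(pvals)
    return [min(1.0, n * p) for p in pvals]


def fdr_bh(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by increasing p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the cumulative minimum of n * p_(i) / i.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, n * pvals[idx] / rank)
        adjusted[idx] = running_min
    return adjusted


# Chi-square statistics from Table 3, in table order (FD, SF, NM, BA;
# Control then Test; nominal d' = 0.5, 1, 2).
chi2 = [0.53, 1.10, 0.00, 0.15, 2.74, 0.08,
        0.01, 0.33, 3.18, 4.19, 1.11, 0.06,
        1.88, 1.17, 0.12, 0.00, 0.79, 0.38,
        0.33, 0.01, 4.34, 1.17, 0.07, 6.63]
p = [chi2_1df_p(x) for x in chi2]
print("smallest p:", round(min(p), 3))
print("smallest Bonferroni:", round(min(bonferroni(p)), 2))
print("smallest FDR:", round(min(fdr_bh(p)), 2))
```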

Observer | Condition | Nominal $d'$ | $\chi^2_1$ | $p$ | Bonferroni | FDR |
---|---|---|---|---|---|---|
FD | Control | 0.50 | 0.53 | 0.47 | 1.00 | 0.90 |
FD | Control | 1.00 | 1.10 | 0.29 | 1.00 | 0.70 |
FD | Control | 2.00 | 0.00 | 0.96 | 1.00 | 1.00 |
FD | Test | 0.50 | 0.15 | 0.70 | 1.00 | 0.96 |
FD | Test | 1.00 | 2.74 | 0.10 | 1.00 | 0.47 |
FD | Test | 2.00 | 0.08 | 0.77 | 1.00 | 0.96 |
SF | Control | 0.50 | 0.01 | 0.92 | 1.00 | 1.00 |
SF | Control | 1.00 | 0.33 | 0.56 | 1.00 | 0.90 |
SF | Control | 2.00 | 3.18 | 0.07 | 1.00 | 0.45 |
SF | Test | 0.50 | 4.19 | 0.04 | 0.97 | 0.33 |
SF | Test | 1.00 | 1.11 | 0.29 | 1.00 | 0.70 |
SF | Test | 2.00 | 0.06 | 0.80 | 1.00 | 0.96 |
NM | Control | 0.50 | 1.88 | 0.17 | 1.00 | 0.68 |
NM | Control | 1.00 | 1.17 | 0.28 | 1.00 | 0.70 |
NM | Control | 2.00 | 0.12 | 0.73 | 1.00 | 0.96 |
NM | Test | 0.50 | 0.00 | 1.00 | 1.00 | 1.00 |
NM | Test | 1.00 | 0.79 | 0.37 | 1.00 | 0.81 |
NM | Test | 2.00 | 0.38 | 0.54 | 1.00 | 0.90 |
BA | Control | 0.50 | 0.33 | 0.56 | 1.00 | 0.90 |
BA | Control | 1.00 | 0.01 | 0.91 | 1.00 | 1.00 |
BA | Control | 2.00 | 4.34 | 0.04 | 0.90 | 0.33 |
BA | Test | 0.50 | 1.17 | 0.28 | 1.00 | 0.70 |
BA | Test | 1.00 | 0.07 | 0.79 | 1.00 | 0.96 |
BA | Test | 2.00 | 6.63 | 0.01 | 0.24 | 0.24 |