Article  |   March 2012
A common signal detection model accounts for both perception and discrimination of the watercolor effect
Journal of Vision March 2012, Vol.12, 19. doi:10.1167/12.3.19
      Frédéric Devinck, Kenneth Knoblauch; A common signal detection model accounts for both perception and discrimination of the watercolor effect. Journal of Vision 2012;12(3):19. doi: 10.1167/12.3.19.

      © 2016 Association for Research in Vision and Ophthalmology.

Abstract

Establishing the relation between perception and discrimination is a fundamental objective in psychophysics, with the goal of characterizing the neural mechanisms mediating perception. Here, we show that a procedure for estimating a perceptual scale based on a signal detection model also predicts discrimination performance. We use a recently developed procedure, Maximum Likelihood Difference Scaling (MLDS), to measure the perceptual strength of a long-range, color, filling-in phenomenon, the Watercolor Effect (WCE), as a function of the luminance ratio between the two components of its generating contour. MLDS is based on an equal-variance, Gaussian, signal detection model and yields a perceptual scale with interval properties. The strength of the fill-in percept increased 10–15 times the estimate of the internal noise level for a 3-fold increase in the luminance ratio. Each observer's estimated scale predicted discrimination performance in a subsequent paired-comparison task. A common signal detection model accounts for both the appearance and discrimination data. Since signal detection theory provides a common metric for relating discrimination performance and neural response, the results have implications for comparing perceptual and neural response functions.

Introduction
Long-range visual phenomena have received considerable attention because of the possibility that they can elucidate high-level integrative processes in the visual system (Spillmann & Werner, 1996). The watercolor effect (WCE) is a particular example, in which local contrast across a contour produces a uniform filling-in of color across a large region of the visual field (Pinna, 1987; Pinna, Brelstaff, & Spillmann, 2001). Figure 1a shows an example of the WCE in which a light orange interior contour added to a darker purple contour (see stimulus detail, Figure 1b) generates a uniform orange region. In general, the color of the contour with the lower luminance contrast relative to the background generates a more salient (saturated) fill-in color than the contour with higher contrast (Devinck, Delahunt, Hardy, Spillmann, & Werner, 2005; Pinna et al., 2001). In Figure 1a, the orange contour has a lower luminance contrast with the background and the filling-in of the orange color is more apparent (left and right regions) than that of the purple (central region). Although the phenomenon occurs for equiluminant contours, the effect appears stronger with an increase in the luminance difference between the interior and exterior contours (Devinck et al., 2005). 
Figure 1
 
(a) Example of the watercolor effect. Three adjacent oblong regions (left, center, right) are defined by a wiggly double contour. The left and right flanking regions are bounded by an exterior dark purple and an interior lighter orange contour. The regions bounded by the contour appear to be filled-in with a light orange color. In the central region in which the contour colors are reversed, no filling-in occurs. (b) A schematic detail of the contour that produces the WCE. (c) A detail of the contour for the control condition in which the two colors are braided and no apparent filling-in occurs. (d) Triad of stimuli for a trial from a difference scaling experiment. (e) Stimulus configuration for paired comparisons.
While the fill-in color is perceptually salient, the strength of the WCE has proven difficult to quantify. Previous attempts have used, for example, matching (Devinck et al., 2005; Devinck, Spillmann, & Werner, 2006; von der Heydt & Pierson, 2006), hue cancellation (Devinck, Delahunt, Hardy, Spillmann, & Werner, 2006; Devinck, Hardy, Delahunt, Spillmann, & Werner, 2006; Devinck & Spillmann, 2009), and paired comparisons (Cao, Yazdanbakhsh, & Mingolla, 2011). Matching and cancellation measure the phenomenon by specifying stimuli of equivalent appearance or strength to the fill-in color, but the results show large within and between individual variability (Devinck et al., 2005; von der Heydt & Pierson, 2006). Paired comparisons do allow for estimation of stimulus strength for small differences in terms of differences in the probability of discriminating the effect. For large differences in appearance, however, only ordinal information is available. 
Here, we measure the relation between the WCE and the luminance ratio between the two inducing contours, using difference scaling, a psychophysical method that permits estimation of interval scales for large stimulus differences, i.e., equal scale differences correspond to equal perceptual differences (Krantz, Luce, Suppes, & Tversky, 1971). Originally, this method involves presenting the observer with an ordered quadruple or two (non-overlapping) pairs of stimuli distributed along a physical continuum (Schneider, Parker, & Stein, 1974). The observer judges which pair displays the greatest difference. Maloney and Yang (2003) developed an equal-variance, Gaussian, signal detection model of the task that they fit by maximum likelihood to estimate scale values that best predict the observer's choices. Their approach is referred to as Maximum Likelihood Difference Scaling (MLDS) and is explained in detail in Knoblauch and Maloney (2008). 
MLDS has been successfully applied to estimate perceptual scales along a diverse set of physical continua, including color differences (Brown, Lindsey, & Guckes, 2011; Lindsey et al., 2010; Maloney & Yang, 2003), surface glossiness (Obein, Knoblauch, & Viénot, 2004; Emrith, Chantler, Green, Maloney, & Clarke, 2010), transparency of thick materials (Fleming, Jäkel, & Maloney, 2011), image quality (Charrier, Maloney, Cherifi, & Knoblauch, 2007), distortion of faces (Rhodes, Maloney, Turner, & Ewing, 2007), and correlation in scatterplots (Knoblauch & Maloney, 2008). It has also been used to aid in the localization of neural processes common to perceptual decisions (Yang, Szeverenyi, & Ts'o, 2008). 
We use a variant of this procedure that employs ordered triads of stimuli to quantify the effect of the luminance ratio on the strength of the WCE (Kingdom & Prins, 2009; Knoblauch & Maloney, in press). The observer still judges intervals, but now the high endpoint of one pair is the same as the low endpoint of the other. An example of a triad is shown in Figure 1d. The observer can simply judge whether the stimulus on top is more similar or closer in appearance to the one on the left or the right. Intuitively, a stimulus value that leads to equal probabilities of making this choice is considered to be equally distant from the two bottom stimuli on a perceptual scale. 
The signal detection model for MLDS applies, as well, to triads. While the task is based on judgments of large stimulus differences, the model also predicts discrimination for small ones, although this assertion has never been tested. We evaluated whether difference scales were equally valid for small differences by testing whether the estimated scale values predict discriminability for comparisons between two stimuli. The results support a common signal detection model for both large and small stimulus differences. 
Methods
Observers
Four observers participated in the experiments, each returning for several sessions to complete all conditions. Three of the observers remained naive over the course of the experiment and one was an author. All observers had normal color vision as assessed by a Farnsworth Panel D15 and normal or corrected-to-normal visual acuity. Ages ranged from 23 to 36 (mean age (SD) = 27.5 (9.8) years), and both genders were equally represented in the sample. 
All experiments were performed in accordance with the principles of the Declaration of Helsinki for the protection of human subjects. 
Apparatus
Stimuli were presented on a 33-cm CRT video monitor (FlexScan F77S) driven by a PC (2.26 GHz, with a 10-bit video card). Experiments were performed in a dark room. The experimental software was written in MATLAB 7.9 (http://www.mathworks.com/) using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). The monitor was calibrated using a Minolta colorimeter (CS 100 Chroma Meter) and procedures described in Brainard, Pelli, and Robson (2002). Observer position was stabilized by a chin rest so that the screen was viewed binocularly at a distance of 180 cm. 
Stimuli
The stimuli to generate a WCE were oblong regions defined by sinusoidally wavy contours. The contours undulated at a frequency of 1.5 c/deg with a peak-to-trough amplitude of 0.83 deg. The region was 3.56 deg high and 1.62 deg wide. The width of the contour was 1.6 min and was divided into inner orange and outer purple contours of equal widths for the test stimuli (Figure 1b). Control stimuli were generated by interleaving or braiding the orange and purple contours (Figure 1c). These appeared to produce no WCE. 
The stimuli were presented on a background of 47 cd/m² with CIE 1931 chromaticity coordinates (0.320, 0.310). The purple exterior contour had a luminance of 1.56 cd/m² with chromaticity coordinates (0.316, 0.182). The orange interior contour was set initially at 12.1 cd/m² with chromaticity coordinates (0.470, 0.337). We also specified the stimuli in a cardinal direction space (Derrington, Krauskopf, & Lennie, 1984) with axes LM, S in an equiluminant plane and a third axis corresponding to luminance contrast normalized to limits at ±1. In this space, the orange and purple contours were at azimuths of 45 and 320 deg, respectively. A series of 10 stimuli was generated by varying the luminance between the base level of the interior contour and 32.8 cd/m². These were chosen to be equally spaced in cardinal direction space yielding 10 elevations between 0 and 0.72 with 0 defined as the base level. The purple elevation was fixed at −0.8. 
Procedure
Observers adapted to the screen for 2 min before starting the experiment. A trial consisted of the presentation of a triad, (a, b, c) of WCE patterns with three elevations chosen from a series of 10, with a < b < c, as in Figure 1d. Stimulus b was always the upper stimulus in the middle, and stimuli a and c were randomly positioned on the left or right. A fixation cross appeared in the center of the screen. Each WCE pattern was offset 1.62 deg vertically from the fixation cross. The two bottom patterns were offset 2.28 deg laterally. 
A session consisted of the random presentation of the 10!/(3!7!) = 120 unique triads from the series of 10 elevations of the orange contour. At the beginning of each trial, the observer fixated the cross. With presentation of a triad, he was instructed to fixate each pattern until he could choose which of the two bottom patterns (left or right) was most similar to the upper pattern with respect to the color of its interior region. The observer's response initiated the next trial. A session required no more than 5 min so that the average duration of a trial did not exceed 2.5 s. 
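The trial count and the timing bound above can be checked with a short enumeration. The following Python sketch is illustrative only (the experiment itself was programmed in MATLAB with the Psychophysics Toolbox); the helper name `make_session` and the dictionary layout of a trial are assumptions introduced here, not part of the original software.

```python
from itertools import combinations
import random

N_LEVELS = 10  # luminance elevations of the orange contour

# All unordered triples of distinct levels, stored as (a, b, c) with a < b < c.
triads = list(combinations(range(1, N_LEVELS + 1), 3))
n_triads = len(triads)  # 10! / (3! 7!)


def make_session(rng):
    """Hypothetical helper: one session of randomly ordered triads.

    The middle level b is always the top pattern; a and c are assigned
    at random to the bottom-left and bottom-right positions.
    """
    trials = []
    for a, b, c in rng.sample(triads, n_triads):
        left, right = (a, c) if rng.random() < 0.5 else (c, a)
        trials.append({"top": b, "left": left, "right": right})
    return trials


session = make_session(random.Random(0))

# A session of at most 5 min bounds the mean trial duration from above.
max_mean_trial_s = 5 * 60 / n_triads
print(n_triads, max_mean_trial_s)
```

With 120 triads in at most 300 s, the mean trial duration cannot exceed 2.5 s, which is the figure quoted in the text.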
A free viewing procedure was employed to ensure that the observer's judgments were based on foveal views of the stimulus and not comparisons across different peripheral, retinal regions. A temporal presentation was ruled out to avoid the possibility of memory effects. Free viewing has been used to study the WCE in numerous previous studies (Devinck et al., 2005; Devinck, Delahunt et al., 2006; Devinck, Hardy et al., 2006; Devinck & Spillmann, 2009; Devinck, Spillmann et al., 2006; Pinna, 1987; Pinna et al., 2001). von der Heydt and Pierson (2006) used a duration of 1 s, not too much shorter than our estimated average duration, and Cao et al. (2011) used 150 ms to study an achromatic variant of the WCE. 
To control for responding simply on the basis of the contour instead of the filled-in appearance, sessions were also run with just control stimuli (see Figure 1c). Each observer performed 8 sessions with the test stimuli and 5 sessions with the control stimuli. Observer FD repeated the 8 sessions, providing a total of 16 sessions. 
A difference scale was estimated from each session using functions from the MLDS package (Knoblauch & Maloney, 2008) in the open source software R (R Development Core Team, 2011, http://www.r-project.org/), and the scales from the individual sessions were averaged to obtain means and confidence intervals for the perceptual scale. 
When properly parameterized, the estimated scale values of an MLDS experiment are in the same units as the measure d′ from Signal Detection Theory, i.e., in units of the standard deviation of the internal noise (see next section). In a second experiment, each observer's average scale for the test stimuli was used to estimate stimulus pairs for which the elevation difference corresponded to perceptual scale differences (or d′ values) of 0.5, 1, and 2. Then, sessions were run in which these stimulus pairs were presented in random order, with the left/right positions of the stimuli also randomized across trials (Figure 1e). Interleaved with the test stimuli were an equal number of braided control stimuli with the same elevation differences of the orange component of the contours. The observer's task was to rate on a scale of 1–5 whether the stimulus with a stronger orange filling-in was on the left or right, with 1 indicating that it was most likely on the left and 5 most likely on the right. A session consisted of 120 trials (20 presentations of the 3 stimulus differences for test and control). The number of sessions completed by each observer was FD: 16, SF: 8, NM: 5, and BA: 6. For each condition, the proportions of hits and false alarms were estimated from the distributions of the cumulative ratings of the observers with respect to the position of the stimulus with the higher luminance elevation of the orange component. Receiver Operating Characteristics (ROC) were fit to the ratings with a cumulative odds, ordinal regression model (McCullagh & Nelder, 1989; see 2) using functions from the package ordinal (Christensen, 2010) with the software R and used to estimate d′ under an equal-variance, Gaussian assumption. 
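To make the cumulation of ratings into hit and false-alarm proportions concrete, the Python sketch below works through one hypothetical condition. The rating counts are invented for illustration and are not data from the study. The sensitivity estimate here is a simple average of z(P_H) − z(P_FA) across criteria, which is a stand-in for, not a reproduction of, the ordinal regression fit performed with the R package ordinal. The names `roc_points` and `d_prime_estimate` are assumptions introduced here.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def roc_points(counts_signal, counts_noise):
    """Cumulate rating counts from the highest rating downward.

    counts_signal[k] is the number of trials given rating k + 1 when the
    higher-elevation stimulus was on the right; counts_noise[k] is the
    same when it was on the left. Returns (P_FA, P_H) pairs for the
    criteria "rating >= 5", ">= 4", ">= 3", and ">= 2".
    """
    n_signal, n_noise = sum(counts_signal), sum(counts_noise)
    points, hits, false_alarms = [], 0, 0
    for k in range(len(counts_signal) - 1, 0, -1):
        hits += counts_signal[k]
        false_alarms += counts_noise[k]
        points.append((false_alarms / n_noise, hits / n_signal))
    return points


# Hypothetical counts for ratings 1..5 (20 trials per stimulus position).
ratings_right = [2, 3, 5, 6, 4]  # higher-elevation stimulus on the right
ratings_left = [4, 6, 5, 3, 2]   # higher-elevation stimulus on the left

points = roc_points(ratings_right, ratings_left)

# Equal-variance Gaussian sensitivity at each criterion, then averaged.
per_criterion = [z(p_h) - z(p_fa) for p_fa, p_h in points]
d_prime_estimate = sum(per_criterion) / len(per_criterion)
print(points, d_prime_estimate)
```

The sketch requires every cumulative proportion to lie strictly between 0 and 1; a regression-based fit such as the one used in the study does not have that restriction and also yields standard errors.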
Model
We represent the 10 elevations of the stimulus by variables $\phi_i$, $i = 1, \ldots, 10$, and suppose that each physical stimulus level evokes a perceptual filling-in response $\psi_i$, $i = 1, \ldots, 10$. For simplification, we refer to the stimulus triple $(\phi_a, \phi_b, \phi_c)$ as (a, b, c). Given the stimulus triple (a, b, c) with $\phi_a < \phi_b < \phi_c$, we assume that the observer judges the difference between the pair (a, b) larger than that between (b, c) when
$$\psi_b - \psi_a > \psi_c - \psi_b \tag{1}$$
or
$$\delta_{abc} = (\psi_b - \psi_a) - (\psi_c - \psi_b) = 2\psi_b - \psi_a - \psi_c > 0. \tag{2}$$
It is unlikely, however, that the observer would make the same response on every trial when the stimulus b is close to being equally different from a and c. To incorporate this inherent variability of human responses, we suppose that the decision variable, $\Delta_{abc}$, is contaminated by internal noise so that the observer chooses the first interval exactly when
$$\Delta_{abc} = \delta_{abc} + \varepsilon_{abc} > 0, \tag{3}$$
where the $\varepsilon_{abc}$ are independent and identically distributed normal variables with $\mu = 0$ and variance $4\sigma^2$. This is an equal-variance, Gaussian signal detection model. The coefficient of 4 on the variance parameterizes the estimated scale values so that the variance of the response for each stimulus level is equal to $\sigma^2$. While the stimulus b appears only once, its weight is twice as large because it participates in both comparisons in the decision variable (see 1 for further details). As a result, the estimated scale values $\hat{\psi}_i/\sigma$ are distributed as normal variables with $\sigma^2 = 1$ and are, therefore, in the same units as the sensitivity measure d′ from Signal Detection Theory (Green & Swets, 1966). We test this equivalence in the second experiment. 
The estimated scale values that best predict the observer's responses are fitted by maximum likelihood (Maloney & Yang, 2003), and since the decision variable is linear in the estimated coefficients, the fitting procedure is simply implemented as a Generalized Linear Model (McCullagh & Nelder, 1989) with a binomial family (Knoblauch & Maloney, 2008; see 1). 
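The decision rule of Equations 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fitting code of the MLDS package: the scale values in `psi` are hypothetical (not estimates from the study), indices are zero-based, and the function names are introduced here.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta(psi, a, b, c):
    """Noise-free decision variable of Equation 2 for indices a < b < c."""
    return 2.0 * psi[b] - psi[a] - psi[c]


def p_first_pair(psi, a, b, c, sigma=1.0):
    """P(observer judges (a, b) more different than (b, c)).

    The noise on the decision variable has variance 4 * sigma**2,
    i.e., a standard deviation of 2 * sigma.
    """
    return std_normal_cdf(delta(psi, a, b, c) / (2.0 * sigma))


def simulate_choice(psi, a, b, c, sigma, rng):
    """One simulated trial: True if the noisy decision variable is positive."""
    return delta(psi, a, b, c) + rng.gauss(0.0, 2.0 * sigma) > 0.0


# Hypothetical scale values (NOT data from the study); psi[0] is fixed at 0.
psi = [0.0, 0.5, 1.2, 2.1, 3.3, 4.6, 6.0, 7.4, 8.8, 10.0]

rng = random.Random(1)
triad = (2, 4, 5)
n_trials = 20000
freq = sum(simulate_choice(psi, *triad, 1.0, rng) for _ in range(n_trials)) / n_trials
print(p_first_pair(psi, *triad), freq)
```

Because the decision variable is linear in the scale values with weights (−1, 2, −1) on (ψ_a, ψ_b, ψ_c), each trial contributes one such row to the design matrix of a binomial GLM with a probit link, which is how the text describes the fit.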
Results
Difference scales
Subsequent to a small number of practice trials, most observers described the difference scaling task as easy and performed the task rapidly. Despite the growing number of studies published that have used this technique, we have seen no data on the test–retest reliability of the technique. Figure 2 shows all of the scales estimated from the individual sessions for each condition and for each observer as a function of the luminance elevation of the orange contour. The graphs indicate that despite session-to-session variability in the heights of the curves, which correspond to across-session sensitivity differences, the scales have similar shapes, and the data tend to cluster around a common value for each observer. We have verified through simulation that the session-to-session variability observed corresponds approximately to that which would be expected on the basis of a binomial variable with the probabilities estimated from a single session. The exceptional outlying curves that three of the observers display were neither from the first nor the last sessions, so these are unlikely to be the result of learning or practice effects. Implicitly, the higher the curve, the steeper the underlying psychometric function involved in the interval discrimination, so these occasional outliers are similar to the occasional steep psychometric function that one might obtain when performing many discrimination experiments. 
Figure 2
 
Perceptual scales estimated from each session for each observer for the test (top) and control (bottom) conditions. Different sessions are indicated in different colors.
Figure 3 shows the average estimated perceptual scale values with 95% confidence intervals as a function of the luminance elevation of the orange, interior contour. The orange points of Figure 3 correspond to the scales obtained with the test stimuli and the purple with control stimuli in which the orange and purple contours were braided, thereby attenuating the phenomenon (see Figure 1c). The estimated scales for the control condition are nearly flat, i.e., independent of the luminance elevation. A repeated measures analysis of variance provided no evidence for the significance of fluctuations in the average levels of the control condition (F(8, 24) = 0.904, p = 0.529). These results support the conclusion that the observers responded on the basis of the perceived color of the filled-in region and did not simply judge the appearance of the orange contour. 
Figure 3
 
Average perceptual scales as a function of luminance elevation of the orange contour for four observers. The orange symbols were obtained with the test pattern and the purple with the braided control. The orange symbols are the means of 8 sessions (16 for FD) and the purple symbols are the means of 5. The error bars are 95% confidence intervals.
Over the middle range of elevations, the strength of the WCE increases nearly linearly with luminance elevation in a qualitatively similar manner for all 4 observers. Three of the observers show a maximum value near 10, while FD attains a maximum value 50% greater, perhaps reflecting his greater experience as a psychophysical observer or with these stimuli. The results confirm previous findings obtained with hue matching and cancellation that the strength of the WCE increases with the luminance ratio between the interior and exterior contours (Devinck et al., 2005). The scales obtained by MLDS, however, show considerably less variability than those obtained with these other techniques. 
The perceptual scales are fixed at 0 for the lowest stimulus level by construction. Thus, for p physical values, only p − 1 scale values are estimated (Knoblauch & Maloney, 2008; Maloney & Yang, 2003). The estimated scales are unique only up to a linear transformation, i.e., addition of a constant or multiplication by a factor (Maloney & Yang, 2003; see 1 and also see Falmagne, 1985; Krantz et al., 1971; Roberts, 1985 for more complete discussion of the axioms and proofs underlying the interval representation and uniqueness of difference scaling). Here, the estimates have been parameterized so that the scale values are expressed in units of the estimated internal standard deviation of the response. Thus, the results indicate a 10- to 15-fold increase of the effect with respect to the internal noise for a roughly 3-fold change in the luminance of the test contour. 
Model goodness of fit
In this section, we analyze the goodness of fit of the decision model on which the scale estimates are based. If the scale estimates are to be interpreted as representative of the observers' perceived differences in the stimuli, we should expect that the decision model provides a good description of the observers' choices over the experiment. 
Linear models are typically evaluated with respect to the size, homogeneity, and patterning of the residuals. In the case of binary data, such residuals are difficult to judge because the data take only the values of 0 and 1, but the fitted values are the expected proportion of correct responses. The residuals of a linear model actually arise from the signed square roots of the components of the log likelihood. Extending this idea to GLMs gives rise to the definition of deviance residuals that can be used for goodness-of-fit evaluations for these models (McCullagh & Nelder, 1989). In the case of binary data, the deviance residual, $d_i$, of the ith data point is defined as
$$d_i = \sqrt{2}\,\operatorname{sign}\bigl(y_i - E(y_i)\bigr)\,\bigl|\,y_i \log\bigl(E(y_i)\bigr) + (1 - y_i)\log\bigl(1 - E(y_i)\bigr)\bigr|^{0.5}, \tag{4}$$
where $E(y_i)$ is the expected or fitted value of the ith data point and $y_i$ is the ith binary response. The sum of the squared deviance residuals is minus twice the log likelihood, the criterion optimized in fitting the model. 
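As a numerical check on Equation 4, the Python sketch below computes deviance residuals for a handful of binary responses and confirms that their squared sum equals minus twice the log likelihood. The responses and fitted probabilities are hypothetical values chosen for illustration, and the function name is an assumption introduced here.

```python
import math


def deviance_residual(y, mu):
    """Deviance residual for a binary response y in {0, 1}, fitted value 0 < mu < 1."""
    log_lik = y * math.log(mu) + (1 - y) * math.log(1.0 - mu)
    sign = 1.0 if y > mu else -1.0
    return sign * math.sqrt(2.0 * abs(log_lik))


# Hypothetical responses and fitted probabilities, for illustration only.
y = [1, 0, 1, 1, 0]
mu = [0.9, 0.2, 0.6, 0.75, 0.4]

residuals = [deviance_residual(yi, mi) for yi, mi in zip(y, mu)]
log_likelihood = sum(
    yi * math.log(mi) + (1 - yi) * math.log(1.0 - mi) for yi, mi in zip(y, mu)
)
sum_sq = sum(d * d for d in residuals)
print(sum_sq, -2.0 * log_likelihood)
```

For binary data the sign of the residual depends only on the response: it is positive when y = 1 and negative when y = 0.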
Wood (2006) describes two diagnostic plots to visualize the adequacy of a binary GLM model based on analyses of deviance residuals. The first compares the empirical cumulative distribution function (ecdf) of the deviance residuals to a bootstrapped confidence envelope of the curve (Efron & Tibshirani, 1993). Bootstrap samples are obtained by repeatedly generating new binary responses on the basis of the fitted values and refitting the model to the new responses. The second examines the number of runs in the sign of the residuals that have been sorted on the basis of the order of the fitted values and compares this number with those expected on the basis of independence in the residuals, again using resampling based on the model's fitted values. Further description of these analyses and software to perform them can be found in Knoblauch and Maloney (2008). 
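The runs diagnostic can be sketched as follows. This Python example is a simplified illustration, not the procedure in the MLDS package: it resamples responses from the fitted probabilities but does not refit the model to each bootstrap sample, which the full procedure described above does. The fitted values and responses are simulated, and all function names are assumptions introduced here.

```python
import random


def count_runs(signs):
    """Number of maximal runs of identical consecutive values."""
    if not signs:
        return 0
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


def runs_in_sorted_residuals(y, fitted):
    """Runs in the residual signs after sorting trials by fitted value.

    For binary data with 0 < fitted < 1, sign(y - fitted) is +1 when
    y == 1 and -1 when y == 0.
    """
    order = sorted(range(len(y)), key=lambda i: fitted[i])
    return count_runs([1 if y[i] == 1 else -1 for i in order])


def bootstrap_runs(y, fitted, n_boot, rng):
    """Observed number of runs and proportion of resamples with fewer runs."""
    observed = runs_in_sorted_residuals(y, fitted)
    fewer = 0
    for _ in range(n_boot):
        # Simplification: new responses are drawn from the fitted values,
        # but the model is NOT refitted to them.
        y_sim = [1 if rng.random() < p else 0 for p in fitted]
        if runs_in_sorted_residuals(y_sim, fitted) < observed:
            fewer += 1
    return observed, fewer / n_boot


# Simulated example: responses generated from hypothetical fitted values.
rng = random.Random(2)
fitted = [rng.uniform(0.05, 0.95) for _ in range(120)]
y = [1 if rng.random() < p else 0 for p in fitted]
observed_runs, p_fewer = bootstrap_runs(y, fitted, 1000, rng)
print(observed_runs, p_fewer)
```

A very small proportion of bootstrap samples with fewer runs than observed would point to systematic patterning in the residuals.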
Figure 4 shows both of these diagnostic plots for each of the four observers of the study. The graphs for each observer are based on 1000 bootstrap simulations. On the left are the ecdfs for each observer indicated as black points. The thin blue lines demarcate the envelope for the 95% confidence interval. For three of the observers, the ecdfs fall within the envelopes. For observer BA, some of the points fall outside, but the deviations do not appear severe. For all of the observers, most of the residuals fall between ±2, again indicating a reasonable description of the data by the model. 
Figure 4
 
For each observer, a pair of diagnostic plots is shown based on the distribution of the deviance residuals of the fitted model. On the left of each pair, the graphs show the empirical cumulative distribution function of the residuals (black points) and a 95% confidence interval based on 1000 bootstrap simulations. On the right, histograms of the number of runs in the residuals from 1000 bootstrap simulations are shown with the number of runs from the observed data indicated as a vertical line. The proportion of bootstrap samples with fewer runs is indicated in the upper right of each graph.
The histograms on the right of each pair of graphs indicate the number of runs in the signs of the residuals of 1000 bootstrap simulations. The vertical lines show the number obtained from each observer's own data. Too few runs would suggest systematic patterns in the residuals that would be interpreted as violations of the model. The p-values shown in the upper right of each graph indicate the proportion of bootstrap samples with fewer runs. All of the p-values are greater than 0.05 providing no evidence to reject the fits based on patterns in the residuals. 
Rating scales
The estimated scales in Figure 3 describe how the appearance of the fill-in color changes with the luminance difference between the distant contours that generate the phenomenon. We tested whether these scales are equally valid to describe discrimination of small differences by performing a paired-comparison experiment for small stimulus differences predicted from each observer's own scale to yield a given performance level. 
Figure 5 illustrates schematically how levels were selected. The orange points indicate three of the estimated scale values on this detail from the whole curve. The smooth black curve is a cubic spline interpolation curve. We chose 4 levels on the ordinate whose separations correspond to the discrimination performance levels that we wished to test. The arrows from these ordinate levels to the interpolated curve and then to the abscissa give the desired stimulus elevations. In practice, we chose the reference level for each observer to be near to where the estimated curve approached linearity. The elevations used for each observer are indicated in Table 1.
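The level-selection step amounts to reading the scale forward at the reference elevation and then backward at the target scale values. The Python sketch below illustrates this with piecewise-linear interpolation, a deliberate simplification of the cubic spline used in the study. The scale values in `psi` are hypothetical; the 10 equally spaced elevations between 0 and 0.72 follow the stimulus description; the function names are assumptions introduced here.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x outside the interpolation range")
    i = min(max(bisect_right(xs, x), 1), len(xs) - 1)
    slope = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + slope * (x - xs[i - 1])


# Ten equally spaced elevations between 0 and 0.72.
elevations = [0.08 * i for i in range(10)]
# Hypothetical, strictly increasing scale values (NOT data from the study).
psi = [0.0, 0.3, 1.0, 2.2, 3.6, 5.0, 6.4, 7.8, 9.0, 10.0]


def elevations_for(reference, d_primes=(0.5, 1.0, 2.0)):
    """Elevations whose scale values exceed that of the reference by each d'."""
    psi_ref = interp(reference, elevations, psi)
    # Inverse lookup: swap the roles of abscissa and ordinate.
    return [interp(psi_ref + d, psi, elevations) for d in d_primes]


reference = elevations[2]
levels = elevations_for(reference)
print(reference, levels)
```

The inverse lookup is only well defined where the scale is strictly increasing, which is one reason to place the reference on the rising part of the curve.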
Figure 5
 
Schematic illustration of a detail from an estimated difference scale. The orange points indicate experimentally estimated values and the black curve is a smooth curve interpolated through the points. The dashed arrows indicate the choice of 4 stimulus levels ($S_0$, $S_{0.5}$, $S_1$, $S_2$) that correspond to response differences (d′) of 0.5, 1, and 2 with respect to the reference intensity, 0.
Table 1
 
Luminance elevations interpolated from difference scale curves predicted to yield criterion level performances of d′ = 0.5, 1, and 2 (Columns 3–5) with respect to the reference elevation in Column 2.
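| Observer | Reference (d′ = 0) | d′ = 0.5 | d′ = 1 | d′ = 2 |
|---|---|---|---|---|
| FD | 0.0 | 0.041 | 0.083 | 0.124 |
| SF | 0.073 | 0.109 | 0.146 | 0.219 |
| NM | 0.233 | 0.265 | 0.298 | 0.364 |
| BA | 0.111 | 0.138 | 0.165 | 0.219 |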
Figure 6 shows ROC plots (probability of a hit, $P_H$, versus probability of a false alarm, $P_{FA}$) for the four observers for stimulus differences predicted to yield values of d′ equal to 0.5, 1, and 2, according to the difference scales in Figure 3. The purple points indicate the results for the control stimuli and the orange for the test. The points were obtained by cumulating the ratings conditional on the higher elevation stimulus being on the right (Hit) or left (False Alarm) across each of the 5 levels for each of the two conditions. The gray curves (solid for the test, dashed for the control) indicate the expected ROC contours for an equal-variance Gaussian model if the observer's sensitivity is equal to the nominal value, i.e., predicted by the perceptual scales obtained by MLDS. In the case of the control stimuli, the points are closer to the positive diagonal, indicated by the dashed lines, corresponding to d′ = 0 for most of the conditions. In two cases (observers SF and NM at nominal d′ = 0.5), the fitted curves bow inward, indicating a negative value of d′. For this weak stimulus condition, these observers may have been judging the contour rather than the fill-in color and choosing the stimulus with the interior orange contour that displayed a greater contrast with the background. Recall that the fill-in color typically is determined by the contour with the lower contrast with the background. For the test conditions, the points fall close to the predicted curves. The black curves (solid for the test, dashed for the control) indicate the best fit equal-variance, Gaussian model (see 2 for discussion of the fitting procedure and a test of the equal-variance assumption). The estimated values of d′ and the estimated standard errors are provided in Table 2.
Figure 6
 
ROC curves for the four observers for stimulus differences expected to yield d′ values of 0.5, 1, and 2 (left to right) and using either the control (purple) or test pattern (orange). The points are the hit and false alarm probabilities calculated from the observer's ratings. The black curves are the ROCs for the estimated d′ value that best fit the data under the equal-variance, Gaussian model. The gray curves are the expected curves under the same model based on the observer's MLDS scale. The solid curves correspond to the fit and predictions for the test conditions and the dashed curves for the control.
Table 2
 
Estimated values of d′ and standard errors for the four observers and the average for each nominal value of d′ for the control and test conditions.
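Values in parentheses in the column headers are the nominal d′; each SE column refers to the estimate on its left.

| Observer | Control d′ (0.5) | SE | Control d′ (1) | SE | Control d′ (2) | SE | Test d′ (0.5) | SE | Test d′ (1) | SE | Test d′ (2) | SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FD | −0.01 | 0.16 | −0.02 | 0.13 | 0.15 | 0.13 | 0.44 | 0.15 | 0.95 | 0.12 | 2.25 | 0.16 |
| SF | −0.53 | 0.17 | −0.04 | 0.18 | 0.12 | 0.18 | 0.57 | 0.17 | 1.63 | 0.20 | 2.16 | 0.23 |
| NM | −0.67 | 0.23 | 0.07 | 0.22 | 0.09 | 0.21 | 0.21 | 0.20 | 1.20 | 0.24 | 2.04 | 0.27 |
| BA | 0.22 | 0.21 | 0.51 | 0.22 | −0.23 | 0.22 | 0.34 | 0.20 | 0.89 | 0.21 | 1.38 | 0.22 |
| Average | −0.25 | 0.21 | 0.13 | 0.13 | 0.03 | 0.09 | 0.40 | 0.07 | 1.17 | 0.17 | 1.96 | 0.20 |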
The left four panels of Figure 7 show the average values of d′ estimated from the individual sessions plotted against the nominal values predicted from the individual difference scales of Figure 3 for each of the four observers. The results for the test conditions are plotted as orange points and the control as purple. The agreement of the data and prediction is very good for observer FD. The other 3 observers' data display more variability but are, also, based on fewer trials. There is a tendency for two of the observers (SF and NM) to select the control stimulus in the control condition for the lowest nominal value of d′, as indicated above. In addition, for observer BA, the lowest two control points are closer to the test prediction than the control. This could indicate that she based her judgments on the luminance of the orange component of the contour rather than the fill-in color although her behavior shows better separation of the two conditions when the stimulus difference is greater. Thus, we cannot exclude that this observer based judgments on the contour for the weaker two stimuli. Globally, however, the test conditions tend to fall near a line of unit slope and the control near a line of zero slope. The error bars approximate 95% confidence intervals for the individual points but have not been corrected for multiple comparisons, which would make them larger. Except in the cases indicated, these results, again, support the hypothesis that observers' judgments were based on the perceived color of the interior region of the stimulus and not of the physical contours. 
Figure 7
 
Average d′ values plotted against the nominal values based on each individual's difference scale and the average of the four observers. The orange points correspond to the test condition and the purple points correspond to the control. The gray lines show the predictions for the two conditions. Error bars are plus and minus twice the standard errors based on session differences. Error bars for the average data are twice the standard errors based on observer means.
The d′ values were modeled using a repeated measures analysis of covariance with Condition (Control or Test) taken as a two-level factor and the nominal value of d′ as a covariate. A significant difference in the slopes across conditions was obtained ($F_{1,3}$ = 48.15; p = 1.8 × 10⁻⁴) with estimated slopes (SE) for the control and test conditions equal to 0.146 (0.157) and 1.007 (0.157), respectively. The average results, with error bars on each point indicating plus and minus twice the standard error, are shown in the fifth panel and indicate good agreement between the experimental and predicted values. 
Discussion
While difference scaling is most frequently performed with stimulus quadruples, the possibility of using triads has previously been discussed (Falmagne, 1985; Krantz et al., 1971; Roberts, 1985). We find that the task is simpler with triads, in that one only needs to indicate whether a reference stimulus is more similar to one of two stimuli rather than to judge two pairs simultaneously. In either case, the observer makes ordinal judgments of intervals between stimuli rather than simply ordering the stimuli themselves. 
The results are consistent both within and across observers. It may be that the binary judgments are also easier to perform than matching or canceling an evanescent illusion, thus leading to the greater reliability observed. In addition, matching and cancellation only yield equivalences to reference stimuli. Without a measure of the perceptual strength of the reference, there is still no measure of the strength of the effect. Since MLDS is based on ordering intervals, the resulting scale is automatically interpretable as an interval scale, i.e., equal scale differences correspond to equal perceptual differences. In addition, the signal detection model establishes a scale in which strength is related to a measure of internal variability. 
Since the strength of the perceived fill-in color that we are trying to measure depends on the luminance of a distant physical contour, there is the danger that the observers will respond solely on the basis of the contour. To test for this possibility, we used a control stimulus with the same chromatic components but distributed in a manner that interfered with the filling-in phenomenon. If the observers respond on the basis of the contour, we expect to obtain similar perceptual scales from both test and control sessions. In fact, control scales were statistically independent of the luminance difference in the inducing contours, supporting the hypothesis that the observers' judgments were based on the strength of the perceived filling-in. The only situation in which there seemed to be evidence that the judgments were based on the contour rather than the interior region of the stimuli was in the paired-comparison experiments for the control conditions at the lowest nominal d′ values. If the judgments were based only on the interior region, then observers would have performed at chance. It is not surprising that in a discrimination experiment, observers might exploit alternative cues, if present. 
Cao et al. (2011) used the degree of inconsistency in judging pairs of stimuli evoking different strengths of an achromatic version of the WCE to demonstrate that the WCE can be quantified. For this method to work, stimuli must be spaced closely enough so that the observer's judgments will be probabilistic, e.g., a given pair (a, b) will, on some trials, be judged a > b and on others b > a. This typically means that many closely spaced stimuli must be scaled, and the number of trials needed to estimate the scale would be much greater than in MLDS. An underlying scale can be derived from discrimination data by measuring discrimination steps over a large range of the stimulus dimension. This is rarely done as the task is heroic. Instead, an underlying response function is typically assumed and the data are fit by estimating its free parameters (see, for example, Foley & Legge, 1981; Legge & Foley, 1980; Smithson, Henning, MacLeod, & Stockman, 2009). The scale's interpretation as a perceptual scale depends on the assumption that equally discriminable differences are perceptually equal. Such an assumption is unnecessary for difference scaling since the task involves comparison of stimulus intervals directly. 
In addition, scales based on paired comparisons alone depend on distributional and variance assumptions (Kingdom & Prins, 2009). Maloney and Yang (2003) have shown, however, that scales estimated by MLDS are robust to both distributional and equal-variance assumptions. We could not reject an equal-variance, Gaussian model in favor of one based on an unequal-variance assumption (see Appendix B). The consistency between threshold and suprathreshold regimes that we observe, thus, may depend on the fact that the equal-variance assumption was not violated in the paired-comparison judgments. 
A number of previous studies have attempted to test whether different psychophysical tasks might depend on a common underlying mechanism. Some of these studies involve only testing the consistency of performance across tasks (for example, Calkins, Thornton, & Pugh, 1992; Heinemann, 1961; Pugh & Larimer, 1980; Thornton & Pugh, 1983), thereby implying a common response mechanism. Others explicitly use signal detection models to relate various tasks through a common response function (for example, see Hillis & Brainard, 2005, 2007a, 2007b for comparisons of discrimination and asymmetric brightness matching). The principal difference between these latter studies and ours is that they assumed an underlying parametric response function and tested whether it could account for both suprathreshold and threshold performance, whereas the novelty of the difference scaling approach is that the underlying response function is estimated directly from observers' judgments. 
Signal Detection Theory provides a common metric for comparing discrimination performance and neural activity (Britten, Shadlen, Newsome, & Movshon, 1992; Green & Swets, 1966; Tolhurst, Movshon, & Dean, 1983). The extrapolation of psychophysical results beyond the threshold domain to encompass perception has been problematic, however (Krantz, 1971; Luce & Edwards, 1958; Stevens, 1957). Brown et al. (2011) have recently demonstrated that color scales estimated by MLDS predict reaction times as a function of color difference. Here, we show that discrimination performance for small differences in the WCE is predicted by perceptual scales that are based on comparisons of suprathreshold intervals. Our results, thus, permit an account of threshold and suprathreshold performance within the same signal detection model. Further tests to validate this framework might examine whether difference scales predict performance in other tasks, such as asymmetric matching (Hillis & Brainard, 2005). If these results generalize to other perceptual tasks, then signal detection theory will have provided a natural bridge between threshold and suprathreshold performance. 
Response functions assumed to be mediating psychophysical performance often are inspired by neurophysiological data. Interesting questions include whether the response functions estimated by MLDS will have the same form as those obtained from physiological recordings and whether the psychophysically obtained functions can be used to identify signals in the brain mediating perception. Of course, depending on the level at which neural activity is probed and the methods used, we do not necessarily expect the limiting sources of noise to be the same. We believe, however, that analyzing such differences will provide useful information in identifying the neural substrates of perception. The approach then could provide a common basis for comparing perceptual and neural response functions. 
Appendix A
Maximum likelihood methods for difference scaling
In the difference scaling task, the observer is tested with $p$ stimuli, $\phi_i$, distributed along a physical continuum with $\phi_1 < \phi_2 < \dots < \phi_p$. For the method of triads, the observer is presented with triples, $(\phi_a, \phi_b, \phi_c)$, on each trial and judges whether the interval $(\phi_a, \phi_b)$ is smaller than $(\phi_b, \phi_c)$. Equivalently, the observer chooses whether $\phi_b$ is more similar to $\phi_a$ or $\phi_c$. We assume that on a given trial, the choice depends on a decision variable, $\Delta$, contaminated by noise and that the choice is the pair $(\phi_a, \phi_b)$ precisely when 
$$\Delta_{abc} = \delta_{abc} + \varepsilon_{abc} = 2\psi_b - \psi_a - \psi_c + \varepsilon_{abc} > 0, \tag{A1}$$
where the $\psi_i$ are internal responses to the stimuli and $\varepsilon_{abc}$ is distributed as a Gaussian random variable with mean $\mu = 0$ and variance $4\sigma^2$. The coefficient of 4 follows from the rule for calculating the variance of a linear combination of independent random variables (Chung, 1974), given the weights of each response in the decision variable and the assumptions of equal variance and independence of the responses to each stimulus. We would like to estimate the values $\psi_i$, $i = 1, \dots, p$, and $\sigma^2$. Maloney and Yang (2003) proposed to estimate the values by directly maximizing the likelihood: 
$$L(\Psi, \sigma; R) = \prod_{i=1}^{n} \Phi\!\left(\frac{\delta_i}{2\sigma}\right)^{R_i} \left(1 - \Phi\!\left(\frac{\delta_i}{2\sigma}\right)\right)^{1 - R_i}, \tag{A2}$$
where $\Psi$ is the vector of response coefficients, $\psi_i$, to estimate, $R$ is the vector of observer responses (0 for choosing the left, 1 for the right) on each trial, $\Phi$ is the cumulative normal distribution function, and $\delta_i$ is the value of $\delta_{abc}$ from Equation A1 on trial $i$. The choice of coefficients is unique only up to a linear transformation, so Maloney and Yang (2003) constrained the values of $\psi_1$ and $\psi_p$ to 0 and 1, respectively, thus leaving $p - 1$ parameters $(\psi_2, \dots, \psi_{p-1}, \sigma)$ to estimate by maximizing Equation A2 over all $n$ trials. 
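As a minimal sketch of the quantity being maximized, the negative logarithm of the likelihood in Equation A2 can be evaluated as below. The code is illustrative Python (standard library only), not the authors' implementation; the function names, the 0-based triad indices, and the example scale values are assumptions made for the example. In practice the function would be handed to a numerical optimizer with the first and last scale values fixed at 0 and 1.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # cumulative standard normal, Phi


def decision_mean(psi, triad):
    """delta_abc = 2*psi_b - psi_a - psi_c for 0-based indices (a, b, c)."""
    a, b, c = triad
    return 2.0 * psi[b] - psi[a] - psi[c]


def neg_log_likelihood(psi, sigma, triads, responses):
    """Negative log of Equation A2 for binary responses R_i in {0, 1}."""
    nll = 0.0
    for triad, r in zip(triads, responses):
        p = _PHI(decision_mean(psi, triad) / (2.0 * sigma))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return nll


# Hypothetical scale values and trials (assumed for illustration only).
hyp_psi = [0.0, 0.2, 0.5, 1.0]
hyp_triads = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]
hyp_responses = [0, 0, 1]
print(neg_log_likelihood(hyp_psi, 0.2, hyp_triads, hyp_responses))
```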
Knoblauch and Maloney (2008) exploited the linearity of the decision rule, Equation A1, to conceptualize the problem as a Generalized Linear Model (GLM) with a binomial family (McCullagh & Nelder, 1989): 
$$g(E[Y]) = X\beta, \tag{A3}$$
where the expected value of the response $Y$ is the probability of choosing the left or right stimulus and is related to a linear predictor, $X\beta$, by a link function, $g$, here $\Phi^{-1}$. $X$ is an $n \times p$ design matrix with one column for each stimulus level and one row for each trial. Each row has 3 non-zero elements taking on the values (−1, 2, −1) in the columns corresponding to the stimuli on that trial and the weights in Equation A1. To render the model identifiable, however, the first column of $X$ is dropped, which fixes the estimate of the first coefficient to 0. $\beta$ is the $(p - 1)$-vector $(\beta_2, \dots, \beta_p)$ of scale values to be estimated. Once the design matrix is set up with the response vector, the solution can be obtained with many off-the-shelf software packages that contain a function for performing a GLM. 
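The layout of the design matrix can be made concrete with a short sketch. The Python below is illustrative only; the helper names and the example triads are hypothetical, and triads are given as 1-based stimulus indices with a < b < c. The resulting matrix, together with the vector of binary responses, is what would be passed to a binomial GLM routine with a probit link.

```python
def triad_design_matrix(triads, p):
    """Return the n x (p - 1) design matrix for a list of triads.

    Each triad (a, b, c) uses 1-based stimulus indices with a < b < c.
    The full row carries weights (-1, 2, -1) in columns a, b, c; the
    first column is then dropped to fix the first scale value at 0.
    """
    rows = []
    for a, b, c in triads:
        row = [0] * p
        row[a - 1] = -1
        row[b - 1] = 2
        row[c - 1] = -1
        rows.append(row[1:])  # drop column 1 for identifiability
    return rows


def linear_predictor(design, beta):
    """Compute X @ beta, where beta = (beta_2, ..., beta_p)."""
    return [sum(x * b for x, b in zip(row, beta)) for row in design]


# Hypothetical triads for p = 5 stimulus levels (assumed for illustration).
hyp_triads = [(1, 2, 3), (2, 3, 5), (1, 3, 4)]
X = triad_design_matrix(hyp_triads, p=5)
for row in X:
    print(row)
print(linear_predictor(X, [1.0, 2.0, 3.0, 4.0]))
```

Dropping the first column, rather than constraining both end points, is what leaves the highest scale value free in this parameterization.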
In this parameterization, the scale is fixed at 0 for the lowest stimulus level but unconstrained at the highest level. The coefficients obtained by GLM are related to those obtained by the direct maximization approach by 
$$\beta_i = \frac{\psi_i}{2\sigma}. \tag{A4}$$
To parameterize the scale to a unit variance for each response level, we double the $\beta_i$. 
Appendix B
Test of equal-variance assumption in discrimination data
The rating scale data were modeled with a cumulative, ordinal regression model (McCullagh & Nelder, 1989). Such models can be represented as follows: 
$$g(P(Y \le k \mid x)) = \theta_k - X\beta, \tag{B1}$$
where $g$ is a link function and $Y$ is the rating, whose probability of falling at or below the $k$th category boundary is conditional on the explanatory variable $x$. The categories here refer to the ratings. On the right side of the model, $X\beta$ is a linear predictor and $\theta_k$ is a category (rating) dependent intercept that is constrained such that $\theta_{1|2} < \theta_{2|3} < \dots < \theta_{p-1|p}$; it provides an estimate of the rating boundaries on the scale of the linear predictor. The minus sign attached to the linear predictor is a convention that results in probability increasing with increased rating and increase in the linear predictor. The model matrix, $X$, corresponds to that of a two-level factor indicating whether the higher luminance orange contour stimulus was presented on the left or right. When the link function is chosen to be the inverse of a Gaussian cumulative distribution function (or probit), this is equivalent to fitting the data with the equal-variance Gaussian signal detection model. We performed the fits of this model to the rating data by maximum likelihood using the clm function in the package ordinal in the software R (Christensen, 2010; R Development Core Team, 2011). 
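To illustrate how the cumulative probit model generates rating probabilities, the sketch below evaluates Equation B1 for a single two-level predictor. It is not the clm routine used by the authors; the function name, the threshold values, and the coefficient are hypothetical, with the coefficient playing the role of d′ under the equal-variance Gaussian interpretation.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # cumulative standard normal


def rating_probabilities(thresholds, beta, x):
    """Probabilities of each rating under a cumulative probit model.

    P(Y <= k | x) = Phi(theta_k - x * beta), with increasing thresholds.
    x is 0 or 1 (the two levels of the factor); beta plays the role of d'.
    """
    cumulative = [_PHI(theta - x * beta) for theta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


# Hypothetical rating boundaries for a 5-level scale (assumed values).
hyp_thresholds = [-1.0, -0.3, 0.3, 1.0]
for x in (0, 1):
    probs = rating_probabilities(hyp_thresholds, beta=1.0, x=x)
    print(x, [round(pr, 3) for pr in probs])
```

Increasing the linear predictor shifts probability mass toward the higher ratings, which is the effect of the minus sign convention described above.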
The unequal-variance, Gaussian, signal detection model is represented as follows: 
$$g(P(Y \le k \mid x_i)) = \frac{\theta_k - X\beta}{\sigma_i}, \tag{B2}$$
where $\sigma_i$ is a scale parameter that depends on the value of $x$, i.e., whether the test condition was on the left (0) or right (1). The ratio $\sigma_1/\sigma_0$ is the inverse of the slope of the zROC contour, i.e., the contour obtained when the two axes of the ROC plot are transformed by the inverse of the Gaussian cumulative distribution function. This model and the equal-variance model, Equation B1, form a nested pair, and so the equal-variance assumption can be tested with a likelihood ratio test. The two models differ by 1 degree of freedom. 
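For a nested pair differing by 1 degree of freedom, the likelihood ratio statistic is referred to a χ² distribution with 1 df, whose upper tail has a closed form in terms of the complementary error function. The sketch below is illustrative Python with hypothetical function names; the four χ² values are taken from Table B1, and the computed p-values should agree with the tabulated ones to two decimals.

```python
import math


def likelihood_ratio_statistic(loglik_equal, loglik_unequal):
    """Twice the log-likelihood gain of the unequal-variance model."""
    return 2.0 * (loglik_unequal - loglik_equal)


def lrt_p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square values reported in Table B1 (FD control 0.5, FD test 1,
# SF test 0.5, BA test 2).
for chi2 in (0.53, 2.74, 4.19, 6.63):
    print(f"chi2 = {chi2:.2f}  p = {lrt_p_value_1df(chi2):.2f}")
```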
Table B1 shows the $\chi^2_1$ values (Column 4) and the unadjusted p-values (Column 5) for the likelihood ratio tests of the equal-variance model for each observer and condition (Columns 1–3). Three of the 24 tests display p-values between 0.01 and 0.05. Bonferroni's adjustment of the p-values for multiple testing is given by 
$$p_B = \min(1, n\,p), \tag{B3}$$
where n is the number of tests. Adjusting the p-values with this conservative correction eliminates all of the values less than 0.05 (Column 6). The False Discovery Rate provides a more powerful test, in that it finds more significant results, at the cost of a less conservative criterion (Bretz, Hothorn, & Westfall, 2010). The adjusted p-values are obtained by initially computing 
$$q_f = \min(1, n\,p_i / i), \tag{B4}$$
where the p-values have been ordered, such that $p_i$ is the $i$th in the sequence. The corrected values are given, then, by taking the minimum cumulatively through the reversed ordered sequence of $q_f$ values. However, even with this less conservative correction none of the values attains significance (Column 7). Thus, we find no evidence to reject the equal-variance model in favor of the unequal-variance alternative. 
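The two adjustments can be sketched in a few lines. The Python below is illustrative, with hypothetical function names and hypothetical p-values; it follows Equations B3 and B4 as stated, with the cumulative minimum taken from the largest p-value downward.

```python
def bonferroni(p_values):
    """Equation B3: multiply each p-value by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, n * p) for p in p_values]


def false_discovery_rate(p_values):
    """Equation B4 followed by a cumulative minimum from the largest p down."""
    n = len(p_values)
    order = sorted(range(n), key=lambda idx: p_values[idx])  # ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from largest p to smallest
        idx = order[rank - 1]
        q = min(1.0, n * p_values[idx] / rank)
        running_min = min(running_min, q)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values (assumed for illustration only).
hyp_p = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(hyp_p))
print(false_discovery_rate(hyp_p))
```

For the smallest p-value the two corrections coincide, since its rank is 1; this matches the last row of Table B1, where both adjusted values equal 0.24.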
Table B1
 
Nested hypothesis tests and corrected p-values for equal-variance versus unequal-variance, Gaussian models: (1) Observer ID, (2) condition, (3) nominal value of d′, (4) χ² value obtained for nested likelihood ratio test between the two hypotheses, (5) p-value for likelihood ratio test, (6) p-values corrected for multiple testing by Bonferroni's method, (7) p-values corrected by false discovery rate.
| Observer | Condition | Nominal d′ | χ² (1 df) | p | Bonferroni | FDR |
|---|---|---|---|---|---|---|
| FD | Control | 0.50 | 0.53 | 0.47 | 1.00 | 0.90 |
| FD | Control | 1.00 | 1.10 | 0.29 | 1.00 | 0.70 |
| FD | Control | 2.00 | 0.00 | 0.96 | 1.00 | 1.00 |
| FD | Test | 0.50 | 0.15 | 0.70 | 1.00 | 0.96 |
| FD | Test | 1.00 | 2.74 | 0.10 | 1.00 | 0.47 |
| FD | Test | 2.00 | 0.08 | 0.77 | 1.00 | 0.96 |
| SF | Control | 0.50 | 0.01 | 0.92 | 1.00 | 1.00 |
| SF | Control | 1.00 | 0.33 | 0.56 | 1.00 | 0.90 |
| SF | Control | 2.00 | 3.18 | 0.07 | 1.00 | 0.45 |
| SF | Test | 0.50 | 4.19 | 0.04 | 0.97 | 0.33 |
| SF | Test | 1.00 | 1.11 | 0.29 | 1.00 | 0.70 |
| SF | Test | 2.00 | 0.06 | 0.80 | 1.00 | 0.96 |
| NM | Control | 0.50 | 1.88 | 0.17 | 1.00 | 0.68 |
| NM | Control | 1.00 | 1.17 | 0.28 | 1.00 | 0.70 |
| NM | Control | 2.00 | 0.12 | 0.73 | 1.00 | 0.96 |
| NM | Test | 0.50 | 0.00 | 1.00 | 1.00 | 1.00 |
| NM | Test | 1.00 | 0.79 | 0.37 | 1.00 | 0.81 |
| NM | Test | 2.00 | 0.38 | 0.54 | 1.00 | 0.90 |
| BA | Control | 0.50 | 0.33 | 0.56 | 1.00 | 0.90 |
| BA | Control | 1.00 | 0.01 | 0.91 | 1.00 | 1.00 |
| BA | Control | 2.00 | 4.34 | 0.04 | 0.90 | 0.33 |
| BA | Test | 0.50 | 1.17 | 0.28 | 1.00 | 0.70 |
| BA | Test | 1.00 | 0.07 | 0.79 | 1.00 | 0.96 |
| BA | Test | 2.00 | 6.63 | 0.01 | 0.24 | 0.24 |
Acknowledgments
This work was supported in part by Grant ANR 11 JSH2 0021 from the Agence Nationale de la Recherche to Frédéric Devinck. We gratefully thank Michel Dojat, Peggy Gerardin, Laurence T. Maloney, and John S. Werner for comments. 
Commercial relationships: none. 
Corresponding author: Kenneth Knoblauch. 
Email: ken.knoblauch@inserm.fr. 
Address: INSERM, U846, Stem-cell and Brain Research Institute, Department of Integrative Neurosciences, 69500 Bron, France. 
References
Brainard D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Brainard D. H. Pelli D. G. Robson T. (2002). Display characterization. In Hornak J. (Ed.), Encyclopedia of imaging science and technology (pp. 172–188). New York: Wiley.
Bretz F. Hothorn T. Westfall P. (2010). Multiple comparisons using R. Boca Raton, FL: CRC Press.
Britten K. H. Shadlen M. N. Newsome W. T. Movshon J. A. (1992). The analysis of visual motion: A comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12, 4745–4765.
Brown A. M. Lindsey D. T. Guckes K. M. (2011). Color names, color categories, and color-cued visual search: Sometimes, color perception is not categorical. Journal of Vision, 11(12):2, 1–21, http://www.journalofvision.org/content/11/12/2, doi:10.1167/11.12.2.
Calkins D. J. Thornton J. E. Pugh E. N. (1992). Monochromatism determined at a long-wavelength/middle-wavelength cone-antagonistic locus. Vision Research, 32, 2349–2367.
Cao B. Yazdanbakhsh A. Mingolla E. (2011). The effect of contrast intensity and polarity in the achromatic watercolor effect. Journal of Vision, 11(3):18, 1–8, http://www.journalofvision.org/content/11/3/18, doi:10.1167/11.3.18.
Charrier C. Maloney L. T. Cherifi H. Knoblauch K. (2007). Maximum likelihood difference scaling of image quality in compression-degraded images. Journal of the Optical Society of America A, 24, 3418–3426.
Christensen R. H. B. (2012). ordinal: Regression Models for Ordinal Data. R package version 2012.01-19, http://www.cran.r-project.org/package=ordinal/.
Chung K. L. (1974). Elementary probability theory with stochastic processes. New York: Springer-Verlag.
Derrington A. M. Krauskopf J. Lennie P. (1984). Chromatic mechanisms in lateral geniculate nucleus of macaque. The Journal of Physiology, 357, 241–265.
Devinck F. Delahunt P. B. Hardy J. L. Spillmann L. Werner J. S. (2005). The watercolor effect: Quantitative evidence for luminance-dependent mechanisms of long-range color assimilation. Vision Research, 45, 1413–1424.
Devinck F. Delahunt P. B. Hardy J. L. Spillmann L. Werner J. S. (2006). Spatial dependence of color assimilation by the watercolor effect. Perception, 35, 461–468.
Devinck F. Hardy J. L. Delahunt P. B. Spillmann L. Werner J. S. (2006). Illusory spreading of watercolor. Journal of Vision, 6(5):7, 625–633, http://www.journalofvision.org/content/6/5/7, doi:10.1167/6.5.7.
Devinck F. Spillmann L. (2009). The watercolor effect: Spacing constraints. Vision Research, 49, 2911–2917.
Devinck F. Spillmann L. Werner J. S. (2006). Spatial profile of contours inducing long-range color assimilation. Visual Neuroscience, 23, 573–577.
Efron B. Tibshirani R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Emrith K. Chantler M. J. Green P. R. Maloney L. T. Clarke A. D. F. (2010). Measuring perceived differences in surface texture due to changes in higher order statistics. Journal of the Optical Society of America A, 27, 1232–1244.
Falmagne J.-C. (1985). Elements of psychophysical theory. Oxford, UK: Oxford University Press.
Fleming R. W. Jäkel F. Maloney L. T. (2011). Visual perception of thick transparent materials. Psychological Science, 22, 812–820.
Foley J. M. Legge G. E. (1981). Contrast detection and near-threshold discrimination in human vision. Vision Research, 21, 1041–1053.
Green D. M. Swets J. A. (1966). Signal detection theory and psychophysics. Huntington, NY: Robert E Krieger Publishing Company.
Heinemann E. G. (1961). The relation of apparent brightness to the threshold for differences in luminance. Journal of Experimental Psychology, 61, 389–399.
Hillis J. M. Brainard D. H. (2005). Do common mechanisms of adaptation mediate color discrimination and appearance? Uniform backgrounds. Journal of the Optical Society of America A, 22, 2090–2106.
Hillis J. M. Brainard D. H. (2007a). Distinct mechanisms mediate visual detection and identification. Current Biology, 17, 1714–1719.
Hillis J. M. Brainard D. H. (2007b). Do common mechanisms of adaptation mediate color discrimination and appearance? Contrast adaptation. Journal of the Optical Society of America A, 24, 2122–2133.
Kingdom F. A. A. Prins N. (2009). Psychophysics: A practical introduction. New York: Academic Press.
Knoblauch K. Maloney L. T. (2008). MLDS: Maximum likelihood difference scaling in R. Journal of Statistical Software, 25, 1–26.
Knoblauch K. Maloney L. T. (in press). Modeling psychophysical data in R. New York: Springer.
Krantz D. H. (1971). Integration of just-noticeable differences. Journal of Mathematical Psychology, 8, 591–599.
Krantz D. H. Luce R. D. Suppes P. Tversky A. (1971). Foundations of measurement: Additive and polynomial representation (vol. 1). New York: Academic Press.
Legge G. E. Foley J. M. (1980). Contrast masking in human vision. Journal of the Optical Society of America, 70, 1458–1471.
Lindsey D. T. Brown A. M. Reijnen E. Rich A. N. Kuzmova Y. I. Wolfe J. M. (2010). Color channels, not color appearance or color categories, guide visual search for desaturated color targets. Psychological Science, 21, 1208–1214.
Luce R. D. Edwards W. (1958). The derivation of subjective scales from just noticeable differences. Psychological Review, 65, 222–237.
Maloney L. T. Yang J. N. (2003). Maximum likelihood difference scaling. Journal of Vision, 3(8):5, 573–585, http://www.journalofvision.org/content/3/8/5, doi:10.1167/3.8.5.
McCullagh P. Nelder J. A. (1989). Generalized linear models. London: Chapman and Hall.
Obein G. Knoblauch K. Viénot F. (2004). Difference scaling of gloss: Nonlinearity, binocularity, and constancy. Journal of Vision, 4(9):4, 711–720, http://www.journalofvision.org/content/4/9/4, doi:10.1167/4.9.4.
Pelli D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Pinna B. (1987). Un effetto di colorazione. In Majer V. Santinello M. (Eds.), Il laboratorio e la città. XXI Congresso degli Psicologi Italiani (p. 158). Milan: Società Italiana di Psicologia.
Pinna B. Brelstaff G. Spillmann L. (2001). Surface color from boundaries: A new ‘watercolor’ illusion. Vision Research, 41, 2669–2676.
Pugh E. N., Jr. Larimer J. (1980). Test of the identity of the site of blue/yellow hue cancellation and the site of chromatic antagonism in the pi 1 pathway. Vision Research, 20, 779–788.
R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org/.
Rhodes G. Maloney L. T. Turner J. Ewing L. (2007). Adaptive face coding and discrimination around the average face. Vision Research, 47, 974–989.
Roberts F. S. (1985). Measurement theory. Cambridge, UK: Cambridge University Press.
Schneider B. Parker S. Stein D. (1974). The measurement of loudness using direct comparisons of sensory intervals. Journal of Mathematical Psychology, 11, 259–273.
Smithson H. E. Henning G. B. MacLeod D. I. Stockman A. (2009). The effect of notched noise on flicker detection and discrimination. Journal of Vision, 9(5):21, 1–18, http://www.journalofvision.org/content/9/5/21, doi:10.1167/9.5.21.
Spillmann L. Werner J. S. (1996). Long-range interactions in visual perception. Trends in Neurosciences, 19, 428–434.
Stevens S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.
Thornton J. E. Pugh E. N. (1983). Red/Green color opponency at detection threshold. Science, 219, 191–193.
Tolhurst D. J. Movshon J. A. Dean A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23, 775–785.
von der Heydt R. Pierson R. (2006). Dissociation of color and figure–ground effects in the watercolor illusion. Spatial Vision, 19, 323–340.
Wood S. N. (2006). Generalized additive models: An introduction with R. Boca Raton, FL: Chapman & Hall/CRC.
Yang J. N. Szeverenyi N. M. Ts'o D. (2008). Neural resources associated with perceptual judgment across sensory modalities. Cerebral Cortex, 18, 38–45.
Figure 1
 
(a) Example of the watercolor effect. Three adjacent oblong regions (left, center, right) are defined by a wiggly double contour. The left and right flanking regions are bounded by an exterior dark purple and an interior lighter orange contour. The regions bounded by the contour appear to be filled-in with a light orange color. In the central region in which the contour colors are reversed, no filling-in occurs. (b) A schematic detail of the contour that produces the WCE. (c) A detail of the contour for the control condition in which the two colors are braided and no apparent filling-in occurs. (d) Triad of stimuli for a trial from a difference scaling experiment. (e) Stimulus configuration for paired comparisons.
Figure 2
 
Perceptual scales estimated from each session for each observer for the test (top) and control (bottom) conditions. Different sessions are indicated in different colors.
Figure 3
 
Average perceptual scales as a function of luminance elevation of the orange contour for four observers. The orange symbols were obtained with the test pattern and the purple with the braided control. The orange symbols are the means of 8 sessions (16 for FD) and the purple symbols are the means of 5. The error bars are 95% confidence intervals.
Figure 4
 
For each observer, a pair of diagnostic plots is shown based on the distribution of the deviance residuals of the fitted model. On the left of each pair, the graphs show the empirical cumulative distribution function of the residuals (black points) and a 95% confidence interval based on 1000 bootstrap simulations. On the right, histograms of the number of runs in the residuals from 1000 bootstrap simulations are shown with the number of runs from the observed data indicated as a vertical line. The proportion of bootstrap samples with fewer runs is indicated in the upper right of each graph.
Figure 5
 
Schematic illustration of a detail from an estimated difference scale. The orange points indicate experimentally estimated values and the black curve is a smooth curve interpolated through the points. The dashed arrows indicate the choice of 4 stimulus levels (S_0, S_0.5, S_1, S_2) that correspond to response differences (d′) of 0.5, 1, and 2 with respect to the reference intensity, 0.
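The selection of stimulus levels described in this caption can be sketched in Python as follows. This is a minimal illustration rather than the authors' procedure: it substitutes piecewise-linear interpolation for the smooth curve, assumes the scale is strictly increasing and expressed in d′ units, and the function name `stimulus_for_response` and the example scale values are hypothetical.

```python
from bisect import bisect_right


def stimulus_for_response(levels, scale, target):
    """Return the stimulus level at which the scale reaches `target`.

    `levels` and `scale` are parallel lists; `scale` is assumed to be
    strictly increasing. Linear interpolation is an assumption made here
    for simplicity (the caption describes a smooth interpolating curve).
    """
    if not scale[0] <= target <= scale[-1]:
        raise ValueError("target lies outside the measured scale range")
    i = bisect_right(scale, target)
    if i == len(scale):  # target equals the largest scale value
        return levels[-1]
    x0, x1 = levels[i - 1], levels[i]
    y0, y1 = scale[i - 1], scale[i]
    return x0 + (target - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical difference scale (not data from the article).
levels = [0.0, 0.1, 0.2, 0.3]   # luminance elevations
scale = [0.0, 1.0, 3.0, 6.0]    # scale values in d' units
reference_response = scale[0]   # response at the reference level

# Levels predicted to differ from the reference by d' = 0.5, 1 and 2.
chosen = [
    stimulus_for_response(levels, scale, reference_response + d)
    for d in (0.5, 1.0, 2.0)
]
```

Each target is the reference response plus the desired d′, so the same function applies to any reference level within the measured range.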
Figure 6
 
ROC curves for the four observers for stimulus differences expected to yield d′ values of 0.5, 1, and 2 (left to right) and using either the control (purple) or test pattern (orange). The points are the hit and false alarm probabilities calculated from the observer's ratings. The black curves are the ROCs for the estimated d′ value that best fit the data under the equal-variance, Gaussian model. The gray curves are the expected curves under the same model based on the observer's MLDS scale. The solid curves correspond to the fit and predictions for the test conditions and the dashed curves for the control.
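Under the equal-variance, Gaussian model named in this caption, an ROC curve is fully determined by d′, because z(hit) = d′ + z(false alarm). The Python sketch below traces such a curve using only the standard library. The function names `roc_hit_rate` and `roc_curve` are hypothetical, and the sketch does not reproduce the fitting of d′ to rating data.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def roc_hit_rate(d_prime, false_alarm_rate):
    """Hit rate predicted by the equal-variance Gaussian model.

    Uses z(H) = d' + z(FA), where z is the inverse of the standard
    normal cumulative distribution function.
    """
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    return _STD_NORMAL.cdf(d_prime + z_fa)


def roc_curve(d_prime, n_points=9):
    """Return (false alarm, hit) pairs at evenly spaced false-alarm rates."""
    rates = [(i + 1) / (n_points + 1) for i in range(n_points)]
    return [(fa, roc_hit_rate(d_prime, fa)) for fa in rates]


# Predicted curves for the three nominal d' values used in the figure.
curves = {d: roc_curve(d) for d in (0.5, 1.0, 2.0)}
```

With d′ = 0 the curve falls on the diagonal (hit rate equals false-alarm rate), and larger d′ values bow the curve toward the upper left corner.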
Figure 7
 
Average d′ values plotted against the nominal values based on each individual's difference scale and the average of the four observers. The orange points correspond to the test condition and the purple points correspond to the control. The gray lines show the predictions for the two conditions. Error bars are plus and minus twice the standard errors based on session differences. Error bars for the average data are twice the standard errors based on observer means.
Table 1
 
Luminance elevations interpolated from difference scale curves predicted to yield criterion level performances of d′ = 0.5, 1, and 2 (Columns 3–5) with respect to the reference elevation in Column 2.
Observer  Reference  d′ = 0.5  d′ = 1  d′ = 2
FD        0.0        0.041     0.083   0.124
SF        0.073      0.109     0.146   0.219
NM        0.233      0.265     0.298   0.364
BA        0.111      0.138     0.165   0.219
Table 2
 
Estimated values of d′ and standard errors for the four observers and the average for each nominal value of d′ for the control and test conditions.
          Control                                               Test
Observer  d′(0.5)  SE(0.5)  d′(1)    SE(1)    d′(2)    SE(2)    d′(0.5)  SE(0.5)  d′(1)    SE(1)    d′(2)    SE(2)
FD        −0.01    0.16     −0.02    0.13     0.15     0.13     0.44     0.15     0.95     0.12     2.25     0.16
SF        −0.53    0.17     −0.04    0.18     0.12     0.18     0.57     0.17     1.63     0.20     2.16     0.23
NM        −0.67    0.23     0.07     0.22     0.09     0.21     0.21     0.20     1.20     0.24     2.04     0.27
BA        0.22     0.21     0.51     0.22     −0.23    0.22     0.34     0.20     0.89     0.21     1.38     0.22
Average   −0.25    0.21     0.13     0.13     0.03     0.09     0.40     0.07     1.17     0.17     1.96     0.20
Table B1
 
Nested hypothesis tests and corrected p-values for equal-variance versus unequal-variance, Gaussian models: (1) Observer ID, (2) condition, (3) nominal value of d′, (4) χ² value obtained for nested likelihood ratio test between the two hypotheses, (5) p-value for likelihood ratio test, (6) p-values corrected for multiple testing by Bonferroni's method, (7) p-values corrected by false discovery rate.
Observer  Condition  Nominal d′  χ²(1)  p     Bonferroni  FDR
FD        Control    0.50        0.53   0.47  1.00        0.90
FD        Control    1.00        1.10   0.29  1.00        0.70
FD        Control    2.00        0.00   0.96  1.00        1.00
FD        Test       0.50        0.15   0.70  1.00        0.96
FD        Test       1.00        2.74   0.10  1.00        0.47
FD        Test       2.00        0.08   0.77  1.00        0.96
SF        Control    0.50        0.01   0.92  1.00        1.00
SF        Control    1.00        0.33   0.56  1.00        0.90
SF        Control    2.00        3.18   0.07  1.00        0.45
SF        Test       0.50        4.19   0.04  0.97        0.33
SF        Test       1.00        1.11   0.29  1.00        0.70
SF        Test       2.00        0.06   0.80  1.00        0.96
NM        Control    0.50        1.88   0.17  1.00        0.68
NM        Control    1.00        1.17   0.28  1.00        0.70
NM        Control    2.00        0.12   0.73  1.00        0.96
NM        Test       0.50        0.00   1.00  1.00        1.00
NM        Test       1.00        0.79   0.37  1.00        0.81
NM        Test       2.00        0.38   0.54  1.00        0.90
BA        Control    0.50        0.33   0.56  1.00        0.90
BA        Control    1.00        0.01   0.91  1.00        1.00
BA        Control    2.00        4.34   0.04  0.90        0.33
BA        Test       0.50        1.17   0.28  1.00        0.70
BA        Test       1.00        0.07   0.79  1.00        0.96
BA        Test       2.00        6.63   0.01  0.24        0.24
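The two multiple-testing corrections reported in Table B1 can be illustrated with a short Python sketch. The table does not state which false discovery rate procedure was used; the Benjamini-Hochberg step-up adjustment is assumed here, and the function names and example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure).

    Each p-value is scaled by m / rank, where rank is its position in
    ascending order, and a running minimum taken from the largest
    p-value downward keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # ascending rank of this p-value (1 = smallest)
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values (not taken from the table).
example_p = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(example_p)
fdr = benjamini_hochberg(example_p)
```

With the 24 tests in Table B1, the Bonferroni adjustment multiplies each p-value by 24 and caps the result at 1, which is consistent with the rounded entry p = 0.01 becoming 0.24 in the final row.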