Contrast discrimination functions for simple gratings famously look like a dipper. Discrimination thresholds are lower than detection thresholds for moderate pedestal contrasts, and the rate of growth of thresholds as the pedestal contrast gets larger typically lies between the values implied by two popular treatments of noise. Here, we suggest a new normative treatment of the dipper, showing how it emerges from Bayesian inference based on the responses of a population of orientation-tuned units. Our central assumption concerns the noise corrupting the outputs of these units as a function of the contrast: We suggest that it has the shape of a hinge. We show the match to the psychophysical data and discuss the neurobiological and statistical rationales for this form of noise. Finally, we relate our model to other major accounts of contrast discrimination.

We model the response *r*_{ i } of neuron *i*, given an input grating at angle *θ* and contrast *c*, as being Gaussian distributed (e.g., Goris et al., 2009; Itti et al., 2000; Kontsevich, Chen, & Tyler, 2002), with mean *cf*_{ i }(*θ*) and variance *τ*^{2}. Here, *θ*_{ i } is the preferred orientation of the unit, and *τ*^{2} is the signal-dependent variance of the response, which we describe in detail below. The tuning width *ϕ* = 15° was chosen based on the mean empirical values in cat visual cortex for both simple and complex cells (Li, Peterson, & Freeman, 2003), and is also close to values used before in fitting human contrast discrimination performance (Itti et al., 2000). In all the simulations we report, we used a population of 36 equally spaced filters across the possible 180 degrees of orientation (for symmetric stimuli), with one unit centered directly on the stimulus orientation. We also make the approximation of allowing the activities *r*_{ i } to be negative; this is a simplification away from a non-zero baseline.

That the mean of *r*_{ i } scales linearly with the contrast is equivalent to using a linear (in fact, an identity) transducer. Figure 2A cartoons the mean of responses *r*_{ i } for various contrasts. Our model abstracts away some of the detail of the neural architecture, and so we cannot specify the precise mapping to individual cell types and regions. However, we note that cells in the visual cortex are linear in the middle of their range (Albrecht & Hamilton, 1982) and the responses of neurons very early in the visual pathway are often rather linear functions of contrast, especially for lower contrasts (Derrington & Lennie, 1984; Kaplan & Shapley, 1986), though these cells are not tuned to orientation.
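The encoding stage just described can be sketched in a few lines. This is a minimal illustration, not the paper's code: the Gaussian tuning profile and the hinge parameter `v0` are our assumptions.

```python
import numpy as np

def tuning(theta, prefs, phi=15.0):
    """Mean response at contrast 1: Gaussian orientation tuning with
    width phi = 15 deg, wrapped onto the 180-deg orientation circle."""
    d = (theta - prefs + 90.0) % 180.0 - 90.0   # signed distance in [-90, 90)
    return np.exp(-d ** 2 / (2.0 * phi ** 2))

def population_response(theta, c, prefs, rng, v0=0.01):
    """Noisy responses r_i ~ N(c * f_i(theta), tau^2), with an assumed hinge:
    variance is flat at v0 for low mean responses and equals the mean above
    the hinge. Responses may come out negative, as the text allows."""
    mean = c * tuning(theta, prefs)
    var = np.maximum(v0, mean)
    return rng.normal(mean, np.sqrt(var))

prefs = np.arange(36) * 5.0         # 36 equally spaced units over 180 deg
rng = np.random.default_rng(0)
r = population_response(0.0, 0.5, prefs, rng)   # one population response
```

One unit (preferred orientation 0°) sits directly on the stimulus orientation, matching the simulation setup described above.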

Inference rests on the likelihoods *p*_{A}[**r**_{A} ∣ *c*] and *p*_{B}[**r**_{B} ∣ *c*], which are determined by the identity transducer and variance functions we assumed above, and on the prior information about which contrasts are likely to appear. Different methods of collecting thresholds imply different prior distributions over the target and pedestal contrasts; we consider three canonical cases.

First, if the observer knows both the target contrast, *c*_{T}, and the pedestal contrast, *c*_{P} (Both Known), the optimal decision compares the two possible assignments of contrasts to intervals, choosing A as the target if *p*_{A}[**r**_{A} ∣ *c*_{T}] *p*_{B}[**r**_{B} ∣ *c*_{P}] > *p*_{A}[**r**_{A} ∣ *c*_{P}] *p*_{B}[**r**_{B} ∣ *c*_{T}].^{1} Second, a threshold obtained by a staircase on the contrast increment, such as used in Foley's (1994) data that we fit in Figure 1, implies that the pedestal contrast would be well known, but the target contrast could be a range of values (Pedestal Known). We captured this case by using a prior distribution in which the pedestal was known and the target contrast was equally likely to be any larger contrast for this task. In this case, interval A is chosen if ∫ *p*_{A}[**r**_{A} ∣ *c*_{T}] *dc*_{T} *p*_{B}[**r**_{B} ∣ *c*_{P}] > ∫ *p*_{B}[**r**_{B} ∣ *c*_{T}] *dc*_{T} *p*_{A}[**r**_{A} ∣ *c*_{P}]. Finally, if target and pedestal are chosen randomly on each trial, this might imply a uniform prior distribution over both pedestal and target contrasts, with the assumption that the target contrast is higher (Neither Known). In this case, interval A is chosen if the posterior probability that interval A has the higher contrast exceeds 0.5, computed from the integrals ∫ *dc*_{A} *p*_{A}[**r**_{A} ∣ *c*_{A}] ∫ *dc*_{B} *p*_{B}[**r**_{B} ∣ *c*_{B}] over the region where *c*_{A} > *c*_{B}, where *c*_{A} and *c*_{B} are the possible contrasts of interval A and interval B, respectively. In various limits, the full Bayesian treatment with this prior distribution will give the same answer as maximum likelihood inference (Gelman, Carlin, Stern, & Rubin, 2004; Jaynes, 2003).^{2}

The reason that the prior has only a slight influence on the threshold is analogous to being told only the final score in a match between two teams: It is often not much help in guessing which team was the winner. This prediction corresponds to empirical results in contrast detection and discrimination, in which prior knowledge of contrast has surprisingly little effect. In forced-choice detection, Davis, Kramer, and Graham (1983) found no difference between blocks with a single contrast (Both Known) and blocks with multiple target contrasts (Pedestal Known). Huang and Dobkins (2005) used a staircase procedure to determine contrast discrimination thresholds and found little difference between blocks in which a single pedestal contrast was used (Pedestal Known) and blocks in which pedestal contrasts were mixed (Neither Known).
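The Both Known decision rule (comparing the two assignments of the known contrasts to the two intervals) can be checked by simulation. This sketch is our own illustration: it uses Gaussian population responses with a fixed noise level `tau`, a simplification of the signal-dependent noise in the text.

```python
import numpy as np

def log_lik(r, c, f, tau):
    """Log-likelihood of population response r under contrast c
    (independent Gaussian units, fixed noise tau -- a simplification)."""
    return -0.5 * np.sum((r - c * f) ** 2) / tau ** 2

def both_known_trial(c_T, c_P, f, tau, rng):
    """One two-interval trial with the target in interval A.
    Returns True if the Both Known rule correctly picks A."""
    rA = rng.normal(c_T * f, tau)
    rB = rng.normal(c_P * f, tau)
    lhs = log_lik(rA, c_T, f, tau) + log_lik(rB, c_P, f, tau)
    rhs = log_lik(rA, c_P, f, tau) + log_lik(rB, c_T, f, tau)
    return lhs > rhs

# Tuning at contrast 1 for 36 units, 5 deg apart, width 15 deg
f = np.exp(-np.arange(-90.0, 90.0, 5.0) ** 2 / (2 * 15.0 ** 2))
rng = np.random.default_rng(1)
pc = np.mean([both_known_trial(0.3, 0.2, f, 0.05, rng) for _ in range(2000)])
```

With these illustrative values the proportion correct is well above chance; shrinking the increment `c_T - c_P` drives `pc` back toward 0.5.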

The Cramér–Rao (CR) bound starts from the likelihood *p*[**r** ∣ *c*] that relates contrast to population response **r** (we omit parameters other than contrast, since these are known) and considers estimators *ĉ*(**r**) of the contrast given a sample of **r**. The bound limits the minimum variance *σ*^{2}(*c*) of any such unbiased estimator (i.e., having mean *c*) to a quantity that depends only on the distribution of **r** and not the estimator, and is given in 1. The variance of the estimator is important because it helps to determine discrimination performance. For instance, for an unbiased estimator with Gaussian statistics with nearly equal variances, the probability of getting the contrast discrimination correct when *c*_{p} is the pedestal contrast and *c*_{t} is the target contrast would be approximately *P*_{C} = Φ[(*c*_{t} − *c*_{p}) / √(*σ*^{2}(*c*_{t}) + *σ*^{2}(*c*_{p}))], where Φ is the standard cumulative Normal distribution and Φ^{−1} is its inverse. This would then determine the threshold contrast increment shown in Figure 1.
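The link from estimator variance to proportion correct is a one-liner. This sketch assumes the standard two-interval form *P*_{C} = Φ[(*c*_{t} − *c*_{p})/√(*σ*^{2}(*c*_{t}) + *σ*^{2}(*c*_{p}))], consistent with the text; the example variance function is purely illustrative.

```python
import math

def prob_correct(c_p, c_t, var):
    """P_C ~ Phi((c_t - c_p) / sqrt(var(c_t) + var(c_p))) for an unbiased,
    roughly Gaussian contrast estimator (standard 2AFC approximation)."""
    z = (c_t - c_p) / math.sqrt(var(c_t) + var(c_p))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example: Poisson-like estimator variance var(c) = c / F (F illustrative)
F = 5.0
pc = prob_correct(0.2, 0.3, lambda c: c / F)
```

As expected, *P*_{C} rises with the contrast increment and falls as the variances grow.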

A bias *b*(*c*) in the estimator affects the CR lower bound, since bias acts in the same way as non-linear transduction. Seriès, Stocker, and Simoncelli (2009) show that the resulting change in the variability of the estimator is exactly matched by a change in its systematic output, thus leaving thresholds unaffected. This only holds when the slope of the bias is relatively constant; at low contrast, this slope changes significantly.

We assume that the bound *σ*^{2}(*c*) either accurately reflects the variance of the Bayesian estimator or can be used to calculate the thresholds. In those cases, with a flat prior on *c* (ignoring limits on the contrast), the posterior distribution is approximately Gaussian, with mean and median being the maximum likelihood value of *c*, and with variance given by the inverse of the Fisher information:

*σ*^{2}(*c*) ≈ 1 / Σ_{ i } [ *f*_{ i }^{2}(*θ*)/*τ*^{2}(*r̄*_{ i }) + 2 (∂*τ*(*r̄*_{ i })/∂*c*)^{2}/*τ*^{2}(*r̄*_{ i }) ],

where *τ*(*r̄*_{ i }) is the hinge noise at a particular mean response and the second term captures the extra information about *c* that is present in signal-dependent noise (Abbott & Dayan, 1999).
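For Gaussian noise whose standard deviation depends on the signal, the Fisher information takes a standard two-term form, *J*(*c*) = Σ_{ i }[*f*_{ i }²/*τ*_{ i }² + 2(∂*τ*_{ i }/∂*c*)²/*τ*_{ i }²] (Abbott & Dayan, 1999), and the CR bound is 1/*J*(*c*). A numerical sketch, in which the hinge parameterization and tuning values are our assumptions and the derivative of *τ* is taken by finite differences:

```python
import numpy as np

def hinge_sd(mean, v0=0.01):
    """Assumed hinge noise: variance flat at v0 for small mean responses,
    equal to the mean above the hinge."""
    return np.sqrt(np.maximum(v0, mean))

def fisher_info(c, f, eps=1e-6):
    """J(c) = sum_i [f_i^2/tau_i^2 + 2*(dtau_i/dc)^2/tau_i^2] for Gaussian
    noise with signal-dependent sd (Abbott & Dayan, 1999)."""
    tau = hinge_sd(c * f)
    dtau = (hinge_sd((c + eps) * f) - hinge_sd((c - eps) * f)) / (2 * eps)
    return np.sum(f ** 2 / tau ** 2 + 2 * dtau ** 2 / tau ** 2)

f = np.exp(-np.arange(-90.0, 90.0, 5.0) ** 2 / (2 * 15.0 ** 2))
sd = lambda c: 1.0 / np.sqrt(fisher_info(c, f))   # CR bound on estimator sd

# Log-log slope of the bound over a range of contrasts: intermediate
# between the flat-noise and growing-noise regimes.
slope = (np.log(sd(0.36)) - np.log(sd(0.1))) / (np.log(0.36) - np.log(0.1))
```

The bound on the estimator's standard deviation grows with contrast (information per unit shrinks), and its log–log slope sits between the two limiting regimes.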

Increasing *τ*(*r̄*_{ i }) with *r̄*_{ i } specified by hinge noise has two opposite effects: The rising standard deviation will increase the bound and push the threshold upward, but an increasing slope will decrease the bound and push the threshold *downward*. For the case of hinge noise, the change in regime, from nugatory dependence on *c* to substantial dependence, makes for the dip.

For Poisson-like noise, the mean response is *r̄*_{ i } = *cf*_{ i }(*θ*). In 1, we derive the Cramér–Rao lower bound as having the expected square-root dependence, *σ*(*c*) = √(*c* / Σ_{ i } *f*_{ i }).^{3} We evaluated the resulting threshold predictions for pedestal contrasts between *c* = 0.1 and *c* = 0.36. This result matches the qualitative finding of a slope between 0.5 and 1, though quantitatively misses the slope of the data at high contrasts.^{4}

The intermediate slope of the function is due to hinge noise being partly constant and partly growing. A high-contrast stimulus produces high activations in units that are well tuned to the stimulus but low activations in units that are poorly tuned to the stimulus, as shown in Figure 2. For very low activations, the noise is essentially flat, which would produce a contrast discrimination function with a slope of 0. For higher activations, the variance of the response is equal to the mean, which produces a slope of 1 as shown above. The compromise between these two slopes arises from the interaction between the population of variously tuned units, hinge noise, and optimal decisions.

Standard accounts combine a non-linear transducer with noise of fixed variance,^{5} while our proposed model uses a linear transducer and noise that never decreases.

We write *f*_{ i }(*θ*) as *f*_{ i } in this appendix for convenience. For unit *i* of the total of *n* units, let the mean response equal *r̄*_{ i } = *cf*_{ i }, where *f*_{ i } is the orientation-dependent response of unit *i*, based on the discrepancy between its preferred orientation and the stimulus orientation, at a contrast of 1. Let **r** be the vector of outputs drawn as independent Poisson samples with these mean values.

For a Poisson process with rate *cf*_{ i } on a particular interval, the interarrival times ℓ_{ i } follow an exponential distribution with mean 1/*cf*_{ i }. Thus, writing *r*_{ i } as the time to the first event of unit *i*, we have *p*[*r*_{ i }] = *cf*_{ i } exp(−*cf*_{ i } *r*_{ i }).

With *r̄*_{ i } = *cf*_{ i } and *τ*^{2}(*c*) = *cf*_{ i }, the Fisher information is

*J*(*c*) = Σ_{ i } *f*_{ i }^{2}/(*cf*_{ i }) = Σ_{ i } *f*_{ i }/*c*.
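For independent Poisson counts with means *cf*_{ i }, the Fisher information Σ_{ i } *f*_{ i }/*c* gives a CR bound of *c*/Σ_{ i } *f*_{ i } on the estimator variance, and the maximum-likelihood estimator ĉ = Σ_{ i }*r*_{ i }/Σ_{ i }*f*_{ i } attains it. A quick numerical check, with illustrative tuning values:

```python
import numpy as np

f = np.exp(-np.arange(-90.0, 90.0, 5.0) ** 2 / (2 * 15.0 ** 2))  # tuning at c = 1
c = 0.25
J = np.sum(f) / c              # Fisher information: sum_i f_i / c
cr_var = 1.0 / J               # Cramer-Rao bound: c / sum_i f_i

# The ML estimator c_hat = sum_i r_i / sum_i f_i is unbiased and attains
# the bound for this model, so its empirical variance should match cr_var.
rng = np.random.default_rng(2)
r = rng.poisson(c * f, size=(20000, f.size))
c_hat = r.sum(axis=1) / f.sum()
emp_var = c_hat.var()
```

The square-root dependence of the bound on *c* follows directly: the standard deviation of ĉ is √(*c*/Σ_{ i }*f*_{ i }).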

*P*_{C} then simplifies from Equation 3 to

*P*_{C} = Φ[Δ*c* √(Σ_{ i } *f*_{ i }) / √(*c*_{p} + *c*_{t})],

where Δ*c* is the threshold difference in contrast between the pedestal *c*_{p} and the target *c*_{t}. We assume that *P*_{C} is equal to the threshold probability correct. Solving for this threshold increment using the quadratic formula gives

Δ*c* = [*A* + √(*A*^{2} + 8*A* *c*_{p} Σ_{ i } *f*_{ i })] / (2 Σ_{ i } *f*_{ i }),

where requiring that Δ*c* is positive eliminates the negative root and where *A* = [Φ^{−1}(*P*_{C})]^{2} ≥ 0. To find the slope of the threshold in log–log coordinates, we take the derivative of log Δ*c* with respect to log *c*_{p}:

d log Δ*c* / d log *c*_{p} = 4*A* *c*_{p} Σ_{ i } *f*_{ i } / [*S*(*A* + *S*)], where *S* = √(*A*^{2} + 8*A* *c*_{p} Σ_{ i } *f*_{ i }).

For large *c*_{P}, this slope will converge to 1/2.

For a smaller difference between *c*_{t} and *c*_{p} to produce an equivalent *P*_{C}, the sum of the variances in the denominator must decrease.
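The quadratic solution and the limiting slope can be checked numerically. In this sketch, consistent with the appendix derivation, `F` stands for Σ_{ i } *f*_{ i } and `A` for [Φ^{−1}(*P*_{C})]²; the particular values of both are illustrative.

```python
import math

def threshold_increment(c_p, F, A):
    """Positive root of F*dc^2 - A*dc - 2*A*c_p = 0, i.e. the increment
    solving P_C = Phi(dc * sqrt(F) / sqrt(2*c_p + dc))."""
    return (A + math.sqrt(A * A + 8.0 * A * F * c_p)) / (2.0 * F)

def loglog_slope(c_p, F, A, h=1e-4):
    """Numerical d log(dc) / d log(c_p) by central differences."""
    lo, hi = c_p * (1.0 - h), c_p * (1.0 + h)
    return (math.log(threshold_increment(hi, F, A))
            - math.log(threshold_increment(lo, F, A))) / (math.log(hi) - math.log(lo))

F, A = 7.5, 0.455      # e.g. P_C = 0.75 gives A = [Phi^{-1}(0.75)]^2 ~ 0.455
dc = threshold_increment(0.2, F, A)
```

For very small pedestals the increment is nearly constant (slope near 0), and for large pedestals the slope approaches the 1/2 limit derived above.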