Perception is an “inverse problem,” in which the state of the world must be inferred from the sensory neural activity that results. However, this inference is both ill-posed (Helmholtz, 1856; Marr, 1982) and corrupted by noise (Green & Swets, 1989), requiring the brain to compute perceptual beliefs under conditions of uncertainty. Here we show that human observers performing a simple visual choice task under an externally imposed loss function approach the optimal strategy, as defined by Bayesian probability and decision theory (Berger, 1985; Cox, 1961). In concert with earlier work, this suggests that observers possess a model of their internal uncertainty and can utilize this model in the neural computations that underlie their behavior (Knill & Pouget, 2004). In our experiment, optimal behavior requires that observers integrate the loss function with an estimate of their internal uncertainty rather than simply requiring that they use a modal estimate of the uncertain stimulus. Crucially, they approach optimal behavior even when denied the opportunity to learn adaptive decision strategies based on immediate feedback. Our data thus support the idea that flexible representations of uncertainty are pre-existing, widespread, and can be propagated to decision-making areas of the brain.

*degree* of uncertainty is combined with the loss function, rather than simply an optimal estimate of the stimulus. This design provides an alternative to the cue-combination approach, allowing us to assess whether observers use information about uncertainty over a single visual quantity.

*σ*) of 29.9 mm, truncated at a full width of 100 mm (14.4° of visual angle) in the horizontal direction, and a rectangular envelope with a width of 10.3 mm (1.48° of visual angle) in the vertical (see Figure 1a). The pixel intensity in the two patches ranged from 0 to 255 (black to white) against a grey background of intensity 128. The separation of the two patches was 5.67 mm (0.813° of visual angle), and the stimulus appeared with the center of the upper patch located 66.7 mm (9.56° of visual angle) either to the left or to the right of the fixation cross in a pseudorandomized order. On each trial, the entire lower patch (both envelope and grating) was displaced relative to the upper patch by one of 20 pseudorandomized values, ranging from −15 to +15 pixels (positive numbers indicating offsets to the right). Each pixel corresponded to 0.333 mm (0.0478° of visual angle).

$R_r$) or “left” ($R_l$) were constant and equal, but the cost for incorrectly answering “right” ($C_r$) could be different from that for incorrectly answering “left” ($C_l$). A similar asymmetric penalty approach has been used to probe uncertainty in recent studies of motor planning (Trommershäuser et al., 2003, 2005). When the penalty for answering “left” incorrectly is greater, a reasonable strategy would be to answer “right” more often when uncertain of the answer. This would result in a psychometric curve shifted in the direction of the higher penalty, yielding a higher overall score (see Figure 2a).

$R_r = R_l$ | $C_l$ | $C_r$ | $\alpha$ | Curve shift |
---|---|---|---|---|
+20 | −10 | −50 | 0.3 | → |
+20 | −20 | −40 | 0.4 | → |
+20 | −30 | −30 | 0.5 | 0 |
+20 | −40 | −20 | 0.6 | ← |
+20 | −50 | −10 | 0.7 | ← |

$x$, evokes a stochastic neural response, on the basis of which the observer constructs an internal belief about the value of the stimulus offset (Step 1 in Figure 1b). This belief is then used to guide a decision about the appropriate response (Step 2).

*sensory* noise and any stimulus- or estimate-centered *decision* noise that might, for instance, arise as sensory information is integrated with the loss function. Our definition of Bayesian optimality in decisions thus refers to all stimulus-centered variation. In concert with earlier analyses, we do however assume that the majority of this variation is “sensory noise,” and so we treat and refer to it as such.

$\xi$, is normally distributed with constant variance $\sigma^2$ around the true stimulus offset: $p(\xi \mid x) = \mathcal{N}(\xi; x, \sigma^2)$ (Green & Swets, 1989; Thurstone, 1927). We test this assumption below, showing that the psychometric curves were all well fit by cumulative normal functions, with a constant slope parameter for each observer in each session. However, our observers each displayed a systematic bias in their responses; this will be addressed later.

$x$ is not limited to a single estimated value $\xi$. Instead, $\xi$ parameterizes a belief distribution over all possible values of $x$ that are consistent with the sensory evidence. The optimal form for this belief distribution is given by Bayes rule:

$x$ is uniform, which implies that this optimal belief will also be Gaussian, with the same variance as the sensory noise distribution, and mean given by $\xi$: $p(x \mid \xi) = \mathcal{N}(x; \xi, \sigma^2)$ (we might also have assumed a broad zero-centered Gaussian prior, although then the variance of the posterior belief would have been slightly smaller than that of the sensory noise, for which there was no evidence in the data). In fact, if observers are able to learn the true distribution of $x$, their prior (and therefore posterior) belief should take the form of a series of delta functions located at each discrete offset value used. In addition, for extreme values of $x$, the stochastic response $\xi$ may fall outside the range of possible values, distorting the posterior. However, variability in decisions at the extremes was minimal, so that any divergence from normality at those points would have little impact on estimates of sensory variance. Within the central range, where decisions did vary, the values of the stimulus offset used were very closely spaced, and we saw no evidence that observers were aware of the discretization.

$p(x \mid \xi)$ that lie to the right and to the left of 0, respectively,

$\alpha$, that includes all the cost and reward terms. The values of $\alpha$ for each set of costs and rewards used in our experiment are given in Table 1,

$A_r(x, \alpha)$ for the probability that the Bayesian observer answers “right,” given a stimulus offset $x$ and cost structure $\alpha$, and

*x*′ is a dummy variable of integration over the observer's belief.
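As a rough illustration of how the cost and reward terms collapse into a single asymmetry parameter, the sketch below computes $\alpha$ from the four payoffs and applies the resulting decision rule. The formula used, $\alpha = (R_r - C_l) / [(R_r - C_l) + (R_l - C_r)]$, is an assumption: it is inferred from the values in Table 1 (it reproduces all five rows) together with a standard expected-gain argument, and the function names are hypothetical.

```python
def cost_asymmetry(R_r, R_l, C_r, C_l):
    """Collapse rewards and costs into alpha (assumed form, inferred from Table 1)."""
    gain_right = R_r - C_l  # what is at stake when the offset is truly rightward
    gain_left = R_l - C_r   # what is at stake when the offset is truly leftward
    return gain_right / (gain_right + gain_left)


def optimal_answer(p_right, alpha):
    """Expected-gain rule: answer 'right' when the belief mass right of 0 exceeds 1 - alpha."""
    return "right" if p_right > 1.0 - alpha else "left"


# Payoff rows of Table 1, written as (R_r = R_l, C_l, C_r).
TABLE_1 = [(20, -10, -50), (20, -20, -40), (20, -30, -30), (20, -40, -20), (20, -50, -10)]
ALPHAS = [cost_asymmetry(R, R, C_r, C_l) for (R, C_l, C_r) in TABLE_1]
```

Under this rule, a belief that puts 60% of its mass to the right of 0 leads to a “right” answer in the symmetric condition ($\alpha = 0.5$) but to a “left” answer when $\alpha = 0.3$, where a wrong “right” answer is the more expensive error.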

$p(\xi \mid x)$ and $p(x \mid \xi)$ but zero mean: $f_\sigma(\zeta) = \exp(-\zeta^2 / 2\sigma^2) / \sqrt{2\pi\sigma^2}$, so that $p(\xi \mid x) = f_\sigma(\xi - x)$ and $p(x \mid \xi) = f_\sigma(x - \xi)$, and the corresponding cumulative function is $\Phi_\sigma(\zeta) = \int_{-\infty}^{\zeta} f_\sigma(\zeta')\, d\zeta'$. Then,

$\Phi_\sigma$ (and exploit its symmetry), we obtain the following easily computed expression for the optimal probability with which the observer should answer “right” for a given value of $\alpha$ and $x$:
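The expression referred to here can be sketched from the Gaussian definitions of $f_\sigma$ and $\Phi_\sigma$. This is a reconstruction under the stated assumptions, not a quotation of the original equation; it assumes that the observer answers “right” whenever the belief mass to the right of 0 exceeds $1 - \alpha$.

```latex
\begin{align*}
P(x' > 0 \mid \xi) &= \int_0^\infty f_\sigma(x' - \xi)\, dx' = \Phi_\sigma(\xi), \\
\text{answer ``right''} &\iff \Phi_\sigma(\xi) > 1 - \alpha
  \iff \xi > \Phi_\sigma^{-1}(1 - \alpha) = -\Phi_\sigma^{-1}(\alpha), \\
A_r(x, \alpha) &= \int_{-\Phi_\sigma^{-1}(\alpha)}^{\infty} f_\sigma(\xi - x)\, d\xi
  = 1 - \Phi_\sigma\!\left(-\Phi_\sigma^{-1}(\alpha) - x\right)
  = \Phi_\sigma\!\left(x + \Phi_\sigma^{-1}(\alpha)\right).
\end{align*}
```

Setting $A_r = 1/2$ places the center of the curve at $x = -\Phi_\sigma^{-1}(\alpha)$, which is 0 for $\alpha = 0.5$ and positive (rightward) for $\alpha < 0.5$, in line with the shift directions listed in Table 1.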

$\sigma$, of the zero-mean cumulative Gaussian distribution $\Phi_\sigma$. This plays two roles in our Bayesian analysis: it is both the width of the sensory noise distribution and, under the assumed uniform prior, the width of the consequent belief distribution. Under the symmetric cost condition ($\alpha = 0.5$), the observer's decision reflects only whether the mean of his or her belief lies to the left or right of 0 (according to sensory noise) and is independent of the width of the belief distribution. Thus, following the standard psychometric approach, we estimate the variance of the noise by fitting a psychometric function based on a cumulative Gaussian to the behavioral data, with the slope of the function providing an estimate of $\sigma$.

$\alpha$ changes, the psychometric curves (1) retain the same cumulative-normal shape, with the same width parameter, and (2) translate by an amount $\Phi_\sigma^{-1}(\alpha)$. Figure 2b shows an example of the psychometric function fit to the data for one observer in one session. Shifts with changing $\alpha$ are clearly visible.

$\alpha$ by Bayesian model selection. We then tested the agreement of the observed curve translations with those predicted by the optimal Bayesian analysis. We fit psychometric functions to the data and measured the center $\mu$ of each, i.e., the value of $x$ at which the fitted psychometric curve gave equal probabilities of each answer, given by the mean of the underlying Gaussian (see Figure 2c). We then used the maximal slope of the psychometric functions as a measure of $\sigma$ and inverted Equation 13 to recover the predicted *optimal* values of the center $\mu^*_j$ for each cost asymmetry value $\alpha_j$ (see Figure 2d).
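The inversion described above can be sketched numerically. The snippet assumes the ideal-observer curve has the form $A_r(x, \alpha) = \Phi_\sigma(x + \Phi_\sigma^{-1}(\alpha))$, which is a reconstruction from the surrounding text rather than a quoted equation; the function names and the illustrative value of $\sigma$ are likewise assumptions.

```python
from statistics import NormalDist


def optimal_center(alpha, sigma):
    """Center of the ideal observer's curve: the x at which P('right') = 0.5."""
    # Phi_sigma(x + Phi_sigma^{-1}(alpha)) equals 0.5 at x = -Phi_sigma^{-1}(alpha).
    return -NormalDist(mu=0.0, sigma=sigma).inv_cdf(alpha)


def prob_right(x, alpha, sigma):
    """Ideal-observer probability of answering 'right' (reconstructed form, an assumption)."""
    noise = NormalDist(mu=0.0, sigma=sigma)
    return noise.cdf(x + noise.inv_cdf(alpha))


SIGMA = 5.6  # pixels; an illustrative width of the order reported in Table A2
CENTERS = {a: optimal_center(a, SIGMA) for a in (0.3, 0.4, 0.5, 0.6, 0.7)}
```

The predicted centers are antisymmetric about $\alpha = 0.5$ and scale linearly with $\sigma$, so an observer with a shallower psychometric function should shift further for the same cost asymmetry.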

*lapse* term (see, e.g., Wichmann & Hill, 2001) and binomially distributed response counts. We used 20 different true offsets $x_i$ and 5 different cost asymmetries $\alpha_j$, with $N_{ij}$ trials in each condition. The number of trials $n_{ij}$ in which observers answer “right” for stimulus offset $x_i$ and cost distribution $\alpha_j$ is assumed to be drawn from a binomial distribution

$p_{ij}$ should be given by $A_r(x_i, \alpha_j)$ in Equation 13, which has a cumulative normal form. To fit the data, we therefore also assumed an underlying cumulative Gaussian shape, parameterized in terms of the standard error function, such that the parameters $\mu_j$ and $\rho_j$ gave the center and maximal slope, respectively, of the curve under the $j$th value of $\alpha$

$p_{ij}$ in such cases to $\varepsilon_j$ (the “lapse rate” parameter referred to above), leaving the probability that the response was instead based on the cumulative Gaussian function as $1 - \varepsilon_j$;

$\alpha$: the center $\mu_j$ and slope $\rho_j$ of the cumulative Gaussian, and the random error or lapse rate $\varepsilon_j$. An estimate of the slope parameter $\rho_j$ provides an estimate of the width of the underlying Gaussian, according to

$\alpha$ conditions or fit separately (see below).
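The relation between the slope and width parameters can be illustrated with a short sketch. It assumes the cumulative Gaussian is written as $\tfrac{1}{2}[1 + \mathrm{erf}(\sqrt{\pi}\,\rho\,(x - \mu))]$, which is one parameterization whose maximal slope is $\rho$; the exact form used in the fits is not recoverable from the text here, so both function names and this form are assumptions. The lapse term is deliberately left out.

```python
import math


def psychometric(x, mu, rho):
    """Cumulative Gaussian with center mu and maximal slope rho, via the error function."""
    return 0.5 * (1.0 + math.erf(math.sqrt(math.pi) * rho * (x - mu)))


def sigma_from_slope(rho):
    """Width of the underlying Gaussian; its peak density 1 / (sigma * sqrt(2 * pi)) is the maximal slope."""
    return 1.0 / (rho * math.sqrt(2.0 * math.pi))
```

With this conversion, the ($\rho$, $\sigma$) pairs reported in Table A2 agree to within rounding; for example, $\rho = 0.13$ per pixel corresponds to $\sigma \approx 3.1$ pixels.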

$\alpha$, we examined the residual error between the measured response data and the best fit psychometric curve. Figure 3 shows the deviance residuals (McCullagh & Nelder, 1989; Wichmann & Hill, 2001) for all four participants, for each of the two sessions. The deviance residual is used to measure discrepancies in terms of the underlying likelihood model; in effect, it rescales the error by the locally predicted variance. Based on the total deviance, the cumulative normal model could not be rejected by a degrees-of-freedom-adjusted $\chi^2$ test, nor by a Monte-Carlo-based exact-binomial test (Wichmann & Hill, 2001) ($p > 0.3$ and $p > 0.8$, respectively, after correcting for multiple tests; in neither case could the distribution of $p$-values over the multiple tests be distinguished from uniform; Kolmogorov–Smirnov test, $p > 0.05$).
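For readers unfamiliar with deviance residuals, the following is a minimal sketch of the standard binomial definition (as in McCullagh & Nelder, 1989), not the authors' analysis code. The function names are hypothetical, and the fitted probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def deviance_residual(n_right, n_trials, p_fit):
    """Signed square root of one condition's contribution to the binomial deviance."""
    n_left = n_trials - n_right
    term = _xlogy(n_right, n_right / (n_trials * p_fit)) + _xlogy(
        n_left, n_left / (n_trials * (1.0 - p_fit))
    )
    # Positive when more 'right' answers were observed than the fit predicts.
    sign = 1.0 if n_right >= n_trials * p_fit else -1.0
    return sign * math.sqrt(max(2.0 * term, 0.0))


def total_deviance(counts, trials, fitted):
    """Sum of squared deviance residuals over all conditions."""
    return sum(deviance_residual(n, N, p) ** 2 for n, N, p in zip(counts, trials, fitted))
```

A residual of 0 means the observed count equals the fitted expectation, and the sign records whether “right” answers were over- or under-represented relative to the fit.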

*shape* of the psychometric function was inappropriate for any value of $\alpha$ for any observer. This was confirmed using a runs test for randomness of the sign of the residuals, by which the hypothesis that the scatter of residuals was random could not be rejected ($p > 0.7$ after multiple-test correction, $p$-values uniform by K–S test, $p > 0.05$).

$\alpha$; that is, the parameters $\rho_j$ in Equation 17 are, in fact, all the same. Visual inspection of the data supported this assumption (for an example, see Figure 2b). To assess this quantitatively, we employed Bayesian model selection to ask which of the various models with either shared or varying slope and lapse parameters was more probable, given the data. Details of this procedure are given in Appendix A, along with results in Table A1, and the fitted parameter values for the best model in Table A2. As predicted, for all observers, the best model had a single slope parameter for all $\alpha$ conditions within each session. However, lapse rates did vary with $\alpha$ for two observers, while being constant for the other two. The lapse rate captures decision noise, motor errors, and moments of inattention, and it seems reasonable that, while the internal uncertainty is the same for each $\alpha$ value, such random lapses might depend on the costs.

Summed Laplace approximation for individual session models

Observer | Single ρ, single ε | Single ρ, separate ε | Separate ρ, single ε | Separate ρ, separate ε |
---|---|---|---|---|
1 | 1995 | 1997 | 1970 | 1975 |
2 | 1602 | 1638 | 1613 | 1619 |
3 | 2351 | 2338 | 2346 | 2335 |
4 | 2061 | 2033 | 2055 | 2041 |

Laplace approximation for pooled session model

Observer | Single ρ, single ε | Single ρ, separate ε | Separate ρ, single ε | Separate ρ, separate ε |
---|---|---|---|---|
1 | 1851 | 1841 | 1827 | 1830 |
2 | 1433 | 1487 | 1446 | 1441 |
3 | 2200 | 2185 | 2197 | 2181 |
4 | 1944 | 1948 | 1941 | 1929 |

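Observer | α | Session 1 μ (pixels) | Session 1 ρ (1/pixels) | Session 1 σ (pixels) | Session 1 ε (probability) | Session 2 μ (pixels) | Session 2 ρ (1/pixels) | Session 2 σ (pixels) | Session 2 ε (probability) |
---|---|---|---|---|---|---|---|---|---|
1 | 0.7 | −4.0 | 0.072 | 5.6 | 0.052 | −3.8 | 0.086 | 4.6 | 0.031 |
1 | 0.6 | −3.5 | 0.072 | 5.6 | 0.025 | −3.3 | 0.086 | 4.6 | 0.00002 |
1 | 0.5 | −1.9 | 0.072 | 5.6 | 0.13 | −2.1 | 0.086 | 4.6 | 0.017 |
1 | 0.4 | 2.5 | 0.072 | 5.6 | 0.16 | 0.55 | 0.086 | 4.6 | 0.031 |
1 | 0.3 | 3.1 | 0.072 | 5.6 | 0.045 | 0.63 | 0.086 | 4.6 | 0.072 |
2 | 0.7 | −7.2 | 0.049 | 8.2 | 0.019 | −7.3 | 0.055 | 7.2 | 0.013 |
2 | 0.6 | −7.5 | 0.049 | 8.2 | 0.0019 | −5.1 | 0.055 | 7.2 | 0.00012 |
2 | 0.5 | −2.1 | 0.049 | 8.2 | 0.11 | −1.8 | 0.055 | 7.2 | 0.10 |
2 | 0.4 | 2.0 | 0.049 | 8.2 | 0.021 | 3.1 | 0.055 | 7.2 | 0.10 |
2 | 0.3 | 4.2 | 0.049 | 8.2 | 0.0098 | 5.8 | 0.055 | 7.2 | 0.016 |
3 | 0.7 | −2.3 | 0.087 | 4.6 | 0.017 | −0.92 | 0.13 | 3.1 | 0.011 |
3 | 0.6 | −0.79 | 0.087 | 4.6 | 0.017 | −0.60 | 0.13 | 3.1 | 0.011 |
3 | 0.5 | 0.82 | 0.087 | 4.6 | 0.017 | 0.28 | 0.13 | 3.1 | 0.011 |
3 | 0.4 | 1.6 | 0.087 | 4.6 | 0.017 | 0.44 | 0.13 | 3.1 | 0.011 |
3 | 0.3 | 4.5 | 0.087 | 4.6 | 0.017 | 0.76 | 0.13 | 3.1 | 0.011 |
4 | 0.7 | −3.5 | 0.075 | 5.3 | 0.0069 | −3.6 | 0.079 | 5.1 | 0.042 |
4 | 0.6 | −2.3 | 0.075 | 5.3 | 0.0069 | −2.1 | 0.079 | 5.1 | 0.042 |
4 | 0.5 | 0.45 | 0.075 | 5.3 | 0.0069 | 1.3 | 0.079 | 5.1 | 0.042 |
4 | 0.4 | 4.3 | 0.075 | 5.3 | 0.0069 | 4.2 | 0.079 | 5.1 | 0.042 |
4 | 0.3 | 4.8 | 0.075 | 5.3 | 0.0069 | 5.0 | 0.079 | 5.1 | 0.042 |

Values of ρ and σ (and, for observers 3 and 4, ε) are single parameters shared across all α conditions within a session, so they repeat down the rows.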

*α*. We next asked whether the observed shifts in the curve centers were aligned with the predictions of the ideal Bayesian observer model.

$\alpha = 0.5$ condition is always 0. However, for all observers, the curve centers for the $\alpha = 0.5$ condition were non-zero. Two observers showed a rightward bias in both sessions, and two showed a leftward bias in both sessions (see Table A2), and we found no evidence that the bias was absent in the asymmetric penalty conditions. A similar directional bias has been reported widely in psychophysical studies (Green & Swets, 1989). In the analysis below, we treat the directional bias as a constant constraint on observers' computations and attempt to separate this form of non-optimality from the novel question of whether observers were able to integrate the loss function correctly with an estimate of internal uncertainty. Thus, we computed shifts relative to the biased center for $\alpha = 0.5$, yielding “predicted relative shifts” for the other four $\alpha$ conditions

$p < 0.01$ under a 2-tailed paired-samples $t$-test) for the greater cost asymmetries.

$x$ and $\alpha$ (see Equations 15 and 17). We used this model to compute the mean and the standard deviation of the score observers would have obtained had they shifted their psychometric curves by the optimal amount from the biased mean of the curve for the $\alpha = 0.5$ condition (for details, see Appendix B). We were then able to ask whether the true score was statistically distinguishable from this predicted value.
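The score prediction can be sketched as follows, assuming independent binomial response counts at each offset and no zero-offset trials (a zero offset would make both answers incorrect in this sketch). The function name and argument layout are hypothetical; the probabilities passed in would come from the fitted or optimally shifted psychometric function.

```python
import math


def score_mean_sd(offsets, n_trials, p_right, R_r, R_l, C_r, C_l):
    """Mean and standard deviation of the total score for one cost condition.

    offsets[i] is the true offset, n_trials[i] the number of trials at that offset,
    and p_right[i] the probability of answering 'right' there.
    """
    mean, var = 0.0, 0.0
    for x, N, p in zip(offsets, n_trials, p_right):
        pay_right = R_r if x > 0 else C_r  # 'right' is correct only for rightward offsets
        pay_left = R_l if x < 0 else C_l   # 'left' is correct only for leftward offsets
        mean += N * (p * pay_right + (1.0 - p) * pay_left)
        # Score at this offset is pay_left * N + (pay_right - pay_left) * n, with n ~ Binomial(N, p).
        var += (pay_right - pay_left) ** 2 * N * p * (1.0 - p)
    return mean, math.sqrt(var)
```

An observed score can then be compared with the predicted mean in units of the predicted standard deviation, which is the sense in which a score may fall inside or outside the predicted range.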

*sensory* noise and any stimulus-centered *decision* noise not modeled by the stimulus-independent lapse-rate parameter. However, we assumed that the majority of this stimulus-centered variation was in fact due to sensory noise and treated it as such. If this assumption is incorrect, the measured slope may incorporate stimulus-centered “decision” noise associated with integrating the loss function with the true uncertainty, and thus an increase in slope might reflect an improvement in task performance rather than a change in internal uncertainty. However, inspecting Figures 4a and 4b shows that for only one observer did the slope of the linear fit to performance (the dashed line) get closer to the identity line in the second session, supporting the assertion that the internal uncertainty was changing rather than the ability to perform the task. Indeed, the observer whose fit improved was the same observer who obtained a score outside the predicted range in the first session, and it seems possible that she *did* change her strategy.

$\alpha$ that demanded relatively small shifts and used practice trials and a blocked design with regular reminders of the current cost values.

*sensory* noise and any stimulus-centered *decision* noise not modeled by the lapse-rate parameter. However, we assumed that the majority of this stimulus-centered variation was due to sensory noise. This assumption was supported by examining the results across sessions: the ability of observers to choose optimally in the face of asymmetric costs did not change as their ability on the task, measured by the slope of the psychometric function, did (see Figures 4a and 4b). Using external manipulations to produce randomly intermixed uncertainty levels on each trial would allow us to separate more explicitly any stimulus-centered decision noise from uncertainty due to sensory processing. However, Landy et al. (2007) found more suboptimality when levels of uncertainty were randomly intermixed rather than blocked as in our experiment, and it is not clear *why* this should be the case. It could reveal limits on the ability to perform online Bayesian processing, or alternatively arise from corrupting cognitive or psychological factors. In the present study, we were not interested in trying to delineate these factors, and so we used a blocked design. In addition, we wanted to retain the property of all stimulus uncertainty arising internally. Future investigations of the effect of task and experimental design on optimality will be important for separating optimal sensory processing from cognitive reasoning effects.

$\alpha$, in each session. In principle, each such function might have entirely different parameters. Our expectation, however, based on the Bayesian observer analysis, is that the slope parameter should remain constant as $\alpha$ varies within one session, although it may change between sessions. We used Bayesian model comparison to determine which group of shared parameters was best supported by our data, by evaluating an approximation to the marginal likelihood or “evidence” for a number of different models. This approach to choosing an appropriate model originates with Jeffreys (1939) and incorporates an Occam's razor-type penalty for models with more parameters (Gull, 1988; Kass & Raftery, 1995; MacKay, 2004). In the absence of prior bias towards any particular model, the marginal likelihoods are proportional to the probabilities of each model being the one from which the data arose.

$\rho_j$ and $\varepsilon_j$ parameters for each $\alpha$, models with the values of $\rho_j$ and $\varepsilon_j$ restricted to have the same value for all $\alpha$, and models with one parameter restricted while the other was allowed to vary. The centers $\mu_j$ always varied with $\alpha$, as all data sets showed very clear shifts.

*maximum a posteriori* (MAP) parameter values, given the data, under a non-informative prior. As the exact evidence could not be calculated, these MAP parameters for each model were then used to compute a Laplace approximation to the marginal likelihood in each case. The Laplace approximation results from taking the first three terms of a Taylor expansion about the MAP parameters (MacKay, 2004). In the equations below, $D$ refers to the data, $m$ to the model, $\theta$ to the vector of all parameters, $\theta^*$ to the MAP parameters, and $d$ to the number of parameters in the model. The matrix $A$ is the Hessian of the log posterior, i.e., the matrix of second partial derivatives of $\log P(\theta \mid D, m)$ with respect to $\theta$, evaluated at $\theta^*$
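For orientation, the textbook form of the Laplace approximation in this notation is sketched below. It is the standard result rather than a transcription of the original equations, and the sign convention is an assumption: with $A$ defined as the Hessian of the log posterior, $A$ is negative definite at the maximum, so the determinant is taken of $-A$ (many texts instead define $A$ as the negative Hessian).

```latex
\begin{align*}
\log P(\theta, D \mid m) &\approx \log P(\theta^*, D \mid m)
  + \tfrac{1}{2} (\theta - \theta^*)^{\top} A\, (\theta - \theta^*), \\
P(D \mid m) = \int P(\theta, D \mid m)\, d\theta
  &\approx P(D \mid \theta^*, m)\, P(\theta^* \mid m)\,
  (2\pi)^{d/2}\, \lvert -A \rvert^{-1/2}.
\end{align*}
```

The first-order term of the expansion vanishes at $\theta^*$, and the Hessian of the log posterior equals that of the log joint because the evidence does not depend on $\theta$. The factor $(2\pi)^{d/2} \lvert -A \rvert^{-1/2}$ measures the volume of the posterior peak, which is what penalizes models with more parameters.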

$\rho$ parameter for all $\alpha$, whether or not this slope was the same or different between sessions. A single lapse parameter was best for two observers and separate lapse parameters for the other two. As mentioned above, the lapse rate is incorporated in the model to account for decision noise, motor errors, and moments of inattention, and it seems reasonable that, while the internal uncertainty is the same for each $\alpha$ value, such random lapses might vary.

$\alpha$ value;

$\alpha$ value is given by the sum, over the different possible stimulus offsets $x_i$, of the number of trials on which the observer answers “right” and “left” correctly and incorrectly, multiplied by the appropriate reward or cost parameter. Using the same definitions of $N_{ij}$ and $n_{ij}$ as in Equation 15, this is

$n_{ij}$ is binomially distributed with mean $N_{ij} p_{ij}$, where $p_{ij}$ is given by the psychometric function (Equation 17). To obtain the expected score under the optimal strategy (constrained by the observed bias), we evaluated Equation 16 for each offset, using the measured value of $\rho$, but using the optimal relative value of $\mu_j$ obtained by adding the optimal relative shift to the observed bias in the symmetric condition (i.e., $\mu_{0.5} + \Delta\mu_j^*$). Calling these optimal relative values $p_{ij}^*$, the expected score under the constrained optimal strategy is,

_{j} in Equation 23, the binomial mean as before, and the binomial second moments:

$\delta_{ii'}$ is the Kronecker delta. We obtain