Research Article  |   March 2008
Implicit knowledge of visual uncertainty guides decisions with asymmetric outcomes
Journal of Vision, March 2008, Vol. 8(3):2. https://doi.org/10.1167/8.3.2
      Louise Whiteley, Maneesh Sahani; Implicit knowledge of visual uncertainty guides decisions with asymmetric outcomes. Journal of Vision 2008;8(3):2. https://doi.org/10.1167/8.3.2.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Perception is an “inverse problem,” in which the state of the world must be inferred from the sensory neural activity that results. However, this inference is both ill-posed (Helmholtz, 1856; Marr, 1982) and corrupted by noise (Green & Swets, 1989), requiring the brain to compute perceptual beliefs under conditions of uncertainty. Here we show that human observers performing a simple visual choice task under an externally imposed loss function approach the optimal strategy, as defined by Bayesian probability and decision theory (Berger, 1985; Cox, 1961). In concert with earlier work, this suggests that observers possess a model of their internal uncertainty and can utilize this model in the neural computations that underlie their behavior (Knill & Pouget, 2004). In our experiment, optimal behavior requires that observers integrate the loss function with an estimate of their internal uncertainty, rather than simply applying it to a modal estimate of the uncertain stimulus. Crucially, they approach optimal behavior even when denied the opportunity to learn adaptive decision strategies based on immediate feedback. Our data thus support the idea that flexible representations of uncertainty are pre-existing, widespread, and can be propagated to decision-making areas of the brain.

Introduction
The idea that perception should be viewed as unconscious inference dates back to Al Hazen's 11th century treatise on optics and is still fundamental to vision science today. The challenge is to understand how such inference takes place in situations that are both ill-posed (Helmholtz, 1856; Marr, 1982) and noisy (Green & Swets, 1989). Considerable recent evidence suggests that in such situations human inference can closely approach the performance of a Bayes-optimal observer (the probabilistic inferential analogue to the ideal observer of signal detection theory). These demonstrations have largely focused on sensory cue-combination tasks, both cross-modal (Ernst & Banks, 2002; Deneve, Latham, & Pouget, 2001) and unimodal (see e.g., Jacobs, 1999; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003; Landy & Kojima, 2001), and on the effects of motor and sensory uncertainties on motor planning (Körding & Wolpert, 2004; Saunders & Knill, 2004, 2005; Trommershäuser, Gepshtein, Maloney, Landy, & Banks 2003; Trommershäuser, Gepshtein, Maloney, Landy, & Banks, 2005; Tassinari, Hudson, & Landy, 2006). There is much less evidence for Bayesian optimality in perceptual estimation tasks for a single, visual quantity (but see Landy, Goutcher, Trommershäuser, & Mamassian, 2007; Schwartz, Sejnowski, & Dayan, 2006; Stocker & Simoncelli, 2006; Weiss, Simoncelli, & Adelson, 2002). 
The present study was designed to probe what might be the simplest context in which Bayesian optimality can be demonstrated and to rule out other explanations for apparently Bayes-optimal behavior. The task was a simple unimodal visual Vernier or “offset” discrimination (Westheimer, 1979), in which observers had to answer whether one stimulus was offset to the left or to the right of another. The stimulus set was fixed, and thus any perceptual uncertainty was due to sensory noise and subsequent processing. This enabled us to ask whether even for the simplest visual task the brain has an estimate of internal uncertainty available to guide behavior (see also Schwartz et al., 2006). There are two key issues with this kind of behavioral Bayesian optimality experiment: (a) what optimality implies about which aspect of the sensory noise distribution is represented, and (b) whether we can conclude that observers do not mimic optimal behavior via another strategy which does not require the representation of uncertainty. We attempt to address both these issues in the present study. 
To probe the observers' model of internal uncertainty, we imposed an asymmetric loss function on their Vernier judgments. In the face of such an asymmetry, the Bayes-optimal observer combines information about their internal uncertainty, at least in the form of a likelihood ratio, with knowledge about the loss function in order to arrive at an optimal decision (Berger, 1985; Cox, 1961). In any such behavioral task information from a potentially uncertain distribution is used in the computation of a decision, but here we show that information about the degree of uncertainty is combined with the loss function, rather than simply an optimal estimate of the stimulus. This design provides an alternative to the cue-combination approach, allowing us to assess whether observers use information about uncertainty over a single visual quantity. 
Similar external loss functions have long been used in the psychophysics literature to modify response criteria and thus explore the receiver operating characteristic (ROC) curve (Green & Swets, 1989). In these earlier experiments, however, observers were provided with trial-by-trial feedback about their performance, potentially allowing them to adopt a feedback-driven threshold-adaptation strategy that mimics the Bayes-optimal strategy without requiring an explicit model of internal uncertainty. Crucially, in our experiment, observers received only periodic cumulative feedback, thus ruling out the use of such simple adaptive threshold schemes and strengthening the conclusions that can be drawn from behavioral results. 
Methods
Observers
Four participants (2 male, 2 female) took part in the experiment. They had a mean age of 25.5 years, were all right-handed, and had normal or corrected-to-normal vision. Three were entirely naive with respect to the aims of, and theory behind, the experiment; and the fourth (observer 4) was an author (LW). 
Stimulus and equipment
The stimulus consisted of a pair of vertically arranged Gabor-like patches, in which a sinusoidal grating with a spatial frequency of 0.03 cycles/mm (0.21 cycles/° of visual angle) was multiplied by a Gaussian envelope with a characteristic width (2 σ) of 29.9 mm, truncated at a full width of 100 mm (14.4° of visual angle) in the horizontal direction, and a rectangular envelope with a width of 10.3 mm (1.48° of visual angle) in the vertical (see Figure 1a). The pixel intensity in the two patches ranged from 0 to 255 (black to white) against a grey background of intensity 128. The separation of the two patches was 5.67 mm (0.813° of visual angle), and the stimulus appeared with the center of the upper patch located 66.7 mm (9.56° of visual angle) either to the left or to the right of the fixation cross in a pseudorandomized order. On each trial, the entire lower patch (both envelope and grating) was displaced relative to the upper patch by one of 20 pseudorandomized values, ranging from −15 to +15 pixels (positive numbers indicating offsets to the right). Each pixel corresponded to 0.333 mm (0.0478° of visual angle). 
Figure 1
 
Experimental design. (a) On each trial, a stimulus consisting of two vertically arranged Gabor patches was briefly flashed, and the observer pressed a key to say whether the lower patch was offset to the left or to the right of the upper patch. Participants were asked to maximize their score, with varying numbers of points being awarded for a correct answer (“reward”) and deducted for an incorrect answer (“cost”). Participants received only periodic feedback about their performance, in the form of a cumulative score every 15 trials. (b) Schematic of the quantities and transformations involved in the construction of the Bayes-optimal observer. The stimulus produces a stochastic neural response. The observer transforms this neural response into a belief distribution (see (1)) and then uses this belief to decide which answer to give in the face of varying costs and penalties (see (2)). The Bayes-optimal observer specifies the optimal forms of transformations (1) and (2), thereby providing a behavioral benchmark of optimality.
Before each block of main trials, observers were given a short block of practice trials in which the stimulus duration was 500 ms. In main trials, the stimulus duration was 160 ms, which is shorter than the latency for initiating a saccade (Carpenter, 1988; Hodgson, 2002), and was chosen to avoid fixation of the stimuli. There was a randomized delay period of 750–1250 ms between the time of response and the time of presentation of the next stimulus. 
The experimental program was written in MATLAB (The Mathworks Inc., Natick, MA), using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). 
Procedure
Observers sat at a table in front of a computer screen and placed their head on a chin rest such that the perpendicular distance from their eyes to the screen was 400 mm. During the experiment, observers fixated a central cross on the screen and were asked to make simple Vernier judgments (Westheimer, 1979), reporting whether the lower of the two Gabor patches was offset to the right or to the left of the upper one. The task is depicted schematically in Figure 1a. Responses were given using a computer keyboard. 
We imposed an asymmetric loss function to probe the observers' representation of uncertainty. On each trial observers were awarded points for a correct answer (“rewards”) or had points taken away for an incorrect answer (“costs”). Observers were instructed and given an incentive to maximize the cumulative number of points scored during the experiment. The loss function varied between blocks of trials—the rewards for correctly answering “right” (R_r) or “left” (R_l) were constant and equal, but the cost for incorrectly answering “right” (C_r) could be different from that for incorrectly answering “left” (C_l). A similar asymmetric penalty approach has been used to probe uncertainty in recent studies of motor planning (Trommershäuser et al., 2003, 2005). When the penalty for answering “left” incorrectly is greater, a reasonable strategy would be to answer “right” more often when uncertain of the answer. This would result in a psychometric curve shifted in the direction of the higher penalty, yielding a higher overall score (see Figure 2a). 
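This payoff structure can be illustrated numerically. The sketch below is not part of the original analysis: it assumes a noise level σ = 5 pixels for illustration, takes the point values from the first row of Table 1, and computes the expected score of an observer who answers “right” whenever a noisy internal estimate exceeds a criterion. The best criterion indeed shifts away from zero, toward the side carrying the heavier penalty.

```python
import math

def expected_score(criterion, sigma, R, C_l, C_r, offsets):
    """Expected points per trial for an observer who answers "right"
    whenever a noisy estimate xi ~ N(x, sigma^2) exceeds `criterion`.
    C_r is the cost of a wrong "right" answer; C_l of a wrong "left"."""
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    total = 0.0
    for x in offsets:
        p_right = 1 - Phi((criterion - x) / sigma)
        if x > 0:  # "right" is the correct answer
            total += p_right * R + (1 - p_right) * C_l
        else:      # "left" is the correct answer
            total += (1 - p_right) * R + p_right * C_r
    return total / len(offsets)

offsets = [x for x in range(-15, 16) if x != 0]
# Heavier penalty for wrong "right" answers (C_r = -50 vs C_l = -10):
scores = {c: expected_score(c, 5.0, 20, -10, -50, offsets)
          for c in range(-10, 11)}
best = max(scores, key=scores.get)  # lies to the right of zero, toward the heavier penalty
```

With symmetric costs the same search returns a criterion of zero, matching the α = 0.5 row of Table 1.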
Figure 2
 
Evaluating behavioral optimality. (a) Illustration of the qualitative prediction for optimization of the loss function—observers should give the answer with the lower penalty when uncertain, resulting in a shift of the psychometric curve in the direction of the higher cost. (b) Example data from one observer in the five different conditions. Crosses show data points, and the smooth lines show psychometric functions fit to the data, with the slope constrained to be the same for each condition. (c) Illustration of the procedure for measuring observed shifts once psychometric functions have been fit to the data from the five conditions. (d) Illustration of the procedure of taking the inverse value of the psychometric function at the values of α used in the experiment. The optimal shift between two psychometric curves is then given by the difference between the two corresponding inverse values.
We used five different sets of costs and rewards, listed in Table 1. The final column in this table shows, qualitatively, the relative shift we would expect for each set of costs and rewards according to the strategy just described. In the Results section, we describe an optimal Bayes observer analysis that confirms and quantifies the optimality of this “curve shifting” strategy. 
Table 1
 
Values of α corresponding to costs and rewards.
R_r = R_l   C_l   C_r   α     Curve shift
  +20       −10   −50   0.3   rightward
  +20       −20   −40   0.4   rightward
  +20       −30   −30   0.5   0
  +20       −40   −20   0.6   leftward
  +20       −50   −10   0.7   leftward
The five different sets of costs and rewards were presented in a counterbalanced, pseudorandomized block design. This was repeated in two experimental sessions on separate days, which provided a further test of our hypothesis—the level of observers' internal uncertainty might be expected to differ in the two sessions due to perceptual learning, consolidation, or extrinsic factors. If observers were to behave optimally in two sessions with different noise distributions, this would support the claim that they carry a flexible representation of current internal uncertainty. 
Each block consisted of a short practice session (60 trials) and a main session (260 trials). In the main session, feedback on performance was provided only every 15 trials, when observers were shown the total score they obtained in the last 15 trials, as well as their cumulative total in the block so far. The sparseness of this feedback made it unlikely that observers could learn an optimal internal threshold by an adaptive threshold adjustment strategy. Control analyses (see Results) support this view. In the practice session, observers received trial-by-trial feedback to familiarize them with the cost values for that block and encourage consistent performance in the main trials. However, the practice stimuli were presented for 500 ms rather than 160 ms, which made the task much easier. As the effective internal noise should therefore be different for the practice stimuli, feedback in the practice session could not be used to adaptively learn a response threshold relevant to the main-block trials. In addition, the easier stimulus meant that there were very few trials on which observers failed to give the correct answer, implying that there should have been very little uncertainty in their belief. This further limits the likelihood that they could use the practice session to test alternative strategies for dealing with the loss function. 
An instruction screen appeared at the beginning of each block, and after each feedback screen, reminding observers of the task and the costs for that block. After the experiment was finished, observers were debriefed using a simple questionnaire. Participants were paid according to ethical guidelines, with a score-related bonus in gift vouchers to motivate concentrated performance and encourage observers to try to maximize their total score. 
Results
Bayes-optimal observer analysis
The final column of Table 1 shows the relative shift of the psychometric function we might expect for different settings of the costs and rewards, under an intuitive strategy for maximizing score in which observers shift their psychometric function in the direction with the higher penalty. A quantitative Bayesian observer analysis was used to confirm and to quantify the optimality of this strategy. Figure 1b depicts the quantities involved in this analysis. The visual stimulus, with a Vernier offset x, evokes a stochastic neural response, on the basis of which the observer constructs an internal belief about the value of the stimulus offset (Step 1 in Figure 1b). This belief is then used to guide a decision about the appropriate response (Step 2). 
An individual observer's responses to repeated presentations of the same visual stimulus may vary. We assume that this variability arises from at least two separate sources of noise. The first source perturbs the observer's sensory estimate of the Vernier offset by a random additive increment. This creates a noise distribution centered on the stimulus, the uncertainty due to which is reflected in the observer's belief. The second source affects the observer's decision directly, in a way that is independent of the value of the stimulus offset. We may think of this as “decision noise” or as the result of motor errors or of lapses in attention. We do not expect this source of variation to affect the observer's internal belief about the value of the offset, and so it is neglected in the theoretical development below. When modelling experimental responses, however, we introduce a separate “lapse rate” parameter, so that these stimulus-independent errors do not corrupt our estimate of the stimulus-centered noise. Note that we do not distinguish between stimulus-centered sensory noise and any stimulus- or estimate-centered decision noise that might, for instance, arise as sensory information is integrated with the loss function. Our definition of Bayesian optimality in decisions thus refers to all stimulus-centered variation. In concert with earlier analyses, we do however assume that the majority of this variation is “sensory noise,” and so we treat and refer to it as such. 
In keeping with the standard psychophysical treatments of sensory noise, our model assumes that the internal estimate of the Vernier offset, ξ, is normally distributed with constant variance σ² around the true stimulus offset: p(ξ | x) = N(ξ; x, σ²) (Green & Swets, 1989; Thurstone, 1927). We test this assumption below, showing that the psychometric curves were all well fit by cumulative normal functions, with a constant slope parameter for each observer in each session. However, our observers each displayed a systematic bias in their responses; this will be addressed later. 
In the Bayesian view, the observer's belief about the Vernier offset x is not limited to a single estimated value ξ. Instead, ξ parameterizes a belief distribution over all possible values of x that are consistent with the sensory evidence. The optimal form for this belief distribution is given by Bayes rule:  
p(x | ξ) = p(ξ | x) · p(x) / p(ξ).
(1)
 
We assume that the prior belief about x is uniform, which implies that this optimal belief will also be Gaussian, with the same variance as the sensory noise distribution, and mean given by ξ: p(x | ξ) = N(x; ξ, σ²) (we might also have assumed a broad zero-centered Gaussian prior, although then the variance of the posterior belief would have been slightly smaller than that of the sensory noise, for which there was no evidence in the data). In fact, if observers are able to learn the true distribution of x, their prior (and therefore posterior) belief should take the form of a series of delta functions located at each discrete offset value used. In addition, for extreme values of x, the stochastic response ξ may fall outside the range of possible values, distorting the posterior. However, variability in decisions at the extremes was minimal, so that any divergence from normality at those points would have little impact on estimates of sensory variance. And within the central range, where decisions did vary, the values of the stimulus offset used were very closely spaced, and we saw no evidence that observers were aware of the discretization. 
The observer must base his or her response on the belief distribution (Step 2 in Figure 1b), and Bayesian decision theory gives the optimal form of this response (see Berger, 1985; Maloney, 2002; Yuille & Bulthoff, 1996). The observer should answer “right” if and only if, on the basis of his or her belief, the expected (mean) reward (Γ) for answering “right” is greater than the expected reward for answering “left”: i.e., if Γ[“right”] > Γ[“left”]. In this simple case, the expected reward is obtained by adding together the product of the probability of the answer being correct times the corresponding reward and the probability of the answer being wrong times the corresponding cost. These two probabilities express the degree of the participant's belief that the lower patch fell to the right or left of the upper patch and are given by the areas under the belief distribution p(x | ξ) that lie to the right and to the left of 0, respectively, 
P(answer “right”) = P( Γ[“right”] > Γ[“left”] ),
(2)
 
Γ [ " r i g h t " ] = 0 p ( x | ξ ) · R r d x + 0 p ( x | ξ ) · C r d x ,
(3)
 
Γ [ " l e f t " ] = 0 p ( x | ξ ) · C l d x + 0 p ( x | ξ ) · R l d x .
(4)
 
With some rearrangement, and combination of the integrals, we arrive at an expression in which the observer's belief that the Vernier displacement was to the left is compared to a single quantity, α, that includes all the cost and reward terms. The values of α for each set of costs and rewards used in our experiment are given in Table 1,  
A_r(x, α) = P( ∫_{−∞}^0 p(x′ | ξ) dx′ < α );   α ≡ (C_l − R_r) / (C_l − R_r + C_r − R_l),
(5)
where we have introduced the notation A_r(x, α) for the probability that the Bayesian observer answers “right,” given a stimulus offset x and cost structure α, and x′ is a dummy variable of integration over the observer's belief. 
It is useful at this point to introduce a normal density function that has the same width as p(ξ | x) and p(x | ξ) but zero mean: f_σ(ζ) = exp(−ζ²/2σ²)/√(2πσ²). Thus, p(ξ | x) = f_σ(ξ − x) and p(x | ξ) = f_σ(x − ξ), and the corresponding cumulative function is Φ_σ(ζ) = ∫_{−∞}^ζ f_σ(ζ′) dζ′. Then,  
A_r(x, α) = P( ∫_{−∞}^0 f_σ(x′ − ξ) dx′ < α )
(6)
 
= P( ∫_{−∞}^{−ξ} f_σ(ζ) dζ < α )   [where ζ = x′ − ξ]
(7)
 
= P( Φ_σ(−ξ) < α )
(8)
 
= P( ξ > −Φ_σ^{−1}(α) ).
(9)
 
The probability that ξ is greater than −Φ_σ^{−1}(α) can then be found by integrating the assumed sensory noise distribution:  
A_r(x, α) = ∫_{−Φ_σ^{−1}(α)}^{∞} p(ξ | x) dξ.
(10)
 
If we again insert Φ_σ (and exploit its symmetry), we obtain the following easily computed expression for the optimal probability with which the observer should answer “right” for a given value of α and x:  
A_r(x, α) = ∫_{−Φ_σ^{−1}(α)}^{∞} f_σ(ξ − x) dξ
(11)
 
= ∫_{−∞}^{x + Φ_σ^{−1}(α)} f_σ(ζ) dζ   [where ζ = x − ξ]
(12)
 
= Φ_σ( x + Φ_σ^{−1}(α) ).
(13)
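Equation 13 can be checked directly by simulation. The sketch below is illustrative (the values of x, α, and σ are assumed): it evaluates A_r(x, α) in closed form and compares it with the choice frequency of a simulated observer who applies the threshold rule of Equation 9 to noisy estimates.

```python
import math
import random
from statistics import NormalDist

def A_r(x, alpha, sigma):
    """Equation 13: P(answer "right") given stimulus offset x, cost
    structure alpha, and sensory-noise SD sigma."""
    nd = NormalDist(0.0, sigma)
    return nd.cdf(x + nd.inv_cdf(alpha))

# Simulate Equation 9: answer "right" iff xi > -Phi_sigma^{-1}(alpha),
# with xi ~ N(x, sigma^2).
random.seed(1)
x, alpha, sigma = 2.0, 0.3, 5.0
threshold = -NormalDist(0.0, sigma).inv_cdf(alpha)
trials = 200_000
sim = sum(random.gauss(x, sigma) > threshold
          for _ in range(trials)) / trials
# `sim` agrees with the closed form to within Monte Carlo error.
```

Note that for α < 0.5 the observer demands extra evidence before answering “right,” so A_r falls below 0.5 even for a small rightward offset.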
 
The only unknown quantity in Equation 13 is the standard deviation, σ, of the zero-mean cumulative Gaussian distribution Φ_σ. This plays two roles in our Bayesian analysis; it is both the width of the sensory noise distribution and, under the assumed uniform prior, the width of the consequent belief distribution. Under the symmetric cost condition (α = 0.5), the observer's decision reflects only whether the mean of his or her belief lies to the left or right of 0 (according to sensory noise) and is independent of the width of the belief distribution. Thus, following the standard psychometric approach, we estimate the variance of the noise by fitting a psychometric function based on a cumulative Gaussian to the behavioral data, with the slope of the function providing an estimate of σ. 
The analysis of the Bayesian observer expressed in Equation 13 makes two predictions: as α changes, the psychometric curves (1) retain the same cumulative-normal shape, with the same width parameter, and (2) translate by an amount −Φ_σ^{−1}(α). Figure 2b shows an example of the psychometric function fit to the data for one observer in one session. Shifts with changing α are clearly visible. 
The fitting procedure, and the methods used to test these predictions, are detailed below. Briefly, we first verified that the shape of the psychometric function did not change with α by Bayesian model selection. We then tested the agreement of the observed curve translations with those predicted by the optimal Bayesian analysis. We fit psychometric functions to the data and measured the center μ of each, i.e., the value of x at which the fitted psychometric curve gave equal probabilities of each answer, given by the mean of the underlying Gaussian (see Figure 2c). We then used the maximal slope of the psychometric functions as a measure of σ and inverted Equation 13 to recover the predicted optimal values of the center μ*_j for each cost asymmetry value α_j (see Figure 2d).  
0.5 = Φ_σ( μ*_j + Φ_σ^{−1}(α_j) ),   μ*_j = Φ_σ^{−1}(0.5) − Φ_σ^{−1}(α_j),   μ*_j = −Φ_σ^{−1}(α_j).
(14)
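For a concrete sense of scale, the predicted centers μ*_j = −Φ_σ^{−1}(α_j) can be computed for the five values of α used in the experiment; the value of σ below is assumed purely for illustration.

```python
from statistics import NormalDist

def predicted_center(alpha, sigma):
    """Equation 14: optimal center of the psychometric curve, in the
    same units as the stimulus offset."""
    return -NormalDist(0.0, sigma).inv_cdf(alpha)

sigma = 5.0  # assumed sensory-noise SD (pixels), for illustration only
centers = {a: predicted_center(a, sigma) for a in (0.3, 0.4, 0.5, 0.6, 0.7)}
# The predicted shifts are antisymmetric about alpha = 0.5 and zero at 0.5.
```

Because the shift scales with σ, an observer with larger internal uncertainty should show proportionally larger optimal shifts, which is what makes the shift magnitude diagnostic of the observer's model of that uncertainty.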
 
Fitting the psychometric function
The pattern of observers' responses was modelled by a cumulative normal psychometric function incorporating a random lapse term (see, e.g., Wichmann & Hill, 2001) and binomially distributed response counts. We used 20 different true offsets x_i and 5 different cost asymmetries α_j, with N_ij trials in each condition. The number of trials n_ij in which observers answer “right” for stimulus offset x_i and cost distribution α_j is assumed to be drawn from a binomial distribution 
P(n_ij) = (N_ij choose n_ij) · p_ij^{n_ij} · (1 − p_ij)^{N_ij − n_ij}.
(15)
 
In the absence of lapses, the optimal probability p_ij should be given by A_r(x_i, α_j) in Equation 13, which has a cumulative normal form. To fit the data, we therefore also assumed an underlying cumulative Gaussian shape, parameterized in terms of the standard error function, such that the parameters μ_j and ρ_j gave the center and maximal slope, respectively, of the curve under the jth value of α:  
p_ij^{no lapse} = [1 + erf( √π · ρ_j · (x_i − μ_j) )] / 2 ;   erf(z) = (2/√π) ∫_0^z e^{−t²} dt.
(16)
 
However, it is likely that observers occasionally make errors due to stimulus-independent (but possibly cost-structure-dependent) sources such as “decision noise,” motor response errors, or moments of inattention (Green & Swets, 1989; Wichmann & Hill, 2001). In this case, they might give either answer with equal probability, effectively setting p_ij in such cases to 1/2 rather than the value given above. We took the probability of such an event occurring in any trial to be ε_j (the “lapse rate” parameter referred to above), leaving the probability that the response was instead based on the cumulative Gaussian function as 1 − ε_j: 
p_ij = (1 − ε_j) · [1 + erf( √π · ρ_j · (x_i − μ_j) )] / 2 + ε_j · 1/2.
(17)
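The maximum-likelihood fit implied by Equations 15 and 17 can be sketched as follows. This is not the authors' fitting code (they used MATLAB with Bayesian model comparison); it is a minimal, assumption-laden illustration that recovers known parameters from idealized response counts using a coarse grid search, with all parameter values (μ = 2, σ = 5, ε = 0.02, N = 1000 per point) invented for the demonstration.

```python
import math

def p_right(x, mu, rho, eps):
    """Equation 17: lapse-contaminated cumulative Gaussian with center mu,
    maximal slope rho, and lapse rate eps."""
    core = 0.5 * (1 + math.erf(math.sqrt(math.pi) * rho * (x - mu)))
    return (1 - eps) * core + eps * 0.5

def neg_log_lik(params, data):
    """Binomial negative log likelihood (Equation 15, constants dropped)."""
    mu, rho, eps = params
    nll = 0.0
    for x, n, N in data:
        p = min(max(p_right(x, mu, rho, eps), 1e-12), 1 - 1e-12)
        nll -= n * math.log(p) + (N - n) * math.log(1 - p)
    return nll

# Idealized "data": expected response counts from a known observer
true_mu, sigma, true_eps = 2.0, 5.0, 0.02
true_rho = 1 / (math.sqrt(2 * math.pi) * sigma)   # Equation 18
data = [(x, round(1000 * p_right(x, true_mu, true_rho, true_eps)), 1000)
        for x in range(-15, 16, 3)]

# Coarse grid search over (mu, rho, eps); a real fit would use a
# continuous optimizer instead.
grid = [(mu, rho, eps)
        for mu in [0.5 * i for i in range(-8, 9)]
        for rho in [0.04 + 0.005 * i for i in range(17)]
        for eps in (0.0, 0.02, 0.05)]
best = min(grid, key=lambda q: neg_log_lik(q, data))
# `best` lands on (or next to) the generating parameters.
```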
 
There are thus three parameters, all of which potentially depend on α: the center μ_j and slope ρ_j of the cumulative Gaussian, and the random error or lapse rate ε_j. An estimate of the slope parameter ρ_j provides an estimate of the width of the underlying Gaussian, according to  
σ = 1 / ( √(2π) · ρ ).
(18)
When fitting the model to the data, we used Bayesian model comparison to determine whether the slope and the lapse rate parameters should be shared between different α conditions or fit separately (see below). 
Shape of the psychometric curves
The Bayesian analysis predicts that, as the loss function varies, the psychometric curve will shift but will retain both the cumulative Gaussian shape as well as the same maximal slope. We tested both of these predictions. 
To ask whether the cumulative Gaussian model with allowance for lapses (Equation 17) was appropriate for the data at all values of α, we examined the residual error between the measured response data and the best-fit psychometric curve. Figure 3 shows the deviance residuals (McCullagh & Nelder, 1989; Wichmann & Hill, 2001) for all four participants, for each of the two sessions. The deviance residual is used to measure discrepancies in terms of the underlying likelihood model; in effect, it rescales the error by the locally predicted variance. Based on the total deviance, the cumulative normal model could not be rejected by a degrees-of-freedom-adjusted χ² test, nor by a Monte-Carlo-based exact-binomial test (Wichmann & Hill, 2001) (p > 0.3 and p > 0.8, respectively, after correcting for multiple tests; in neither case could the distribution of p-values over the multiple tests be distinguished from uniform; Kolmogorov–Smirnov test, p > 0.05). 
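The signed deviance residual for binomially distributed counts (McCullagh & Nelder, 1989) is compact enough to write down directly; a minimal sketch:

```python
import math

def deviance_residual(n, N, p):
    """Signed binomial deviance residual for n "right" answers out of N
    trials against a model prediction p."""
    def t(k, mu):  # k * log(k / mu), with the 0 * log(0) limit handled
        return 0.0 if k == 0 else k * math.log(k / mu)
    dev = 2.0 * (t(n, N * p) + t(N - n, N * (1 - p)))
    return math.copysign(math.sqrt(max(dev, 0.0)), n / N - p)

# Zero when the model prediction matches the data exactly; positive when
# the observed proportion exceeds the prediction, negative otherwise.
```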
Figure 3
 
Deviance residuals between model and data. Deviance residuals between the model fitted to the behavioral data and the data points for each observer in each of the two sessions.
There is also no systematic trend evident in Figure 3 to suggest that the shape of the psychometric function was inappropriate for any value of α for any observer. This was confirmed using a runs test for randomness of the sign of the residuals, by which the hypothesis that the scatter of residuals was random could not be rejected ( p > 0.7 after multiple-test correction, p-values uniform by K–S test, p > 0.05). 
The second prediction was that the slope of the psychometric curve is also independent of the value of α; that is, the parameters ρ_j in Equation 17 are, in fact, all the same. Visual inspection of the data supported this assumption (for an example, see Figure 2b). To assess this quantitatively, we employed Bayesian model selection to ask which of the various models with either shared or varying slope and lapse parameters was more probable, given the data. Details of this procedure are given in the Appendix, along with results in Table A1, and the fitted parameter values for the best model in Table A2. As predicted, for all observers, the best model had a single slope parameter for all α conditions within each session. However, lapse rates did vary with α for two observers, while being constant for the other two. The lapse rate captures decision noise, motor errors, and moments of inattention, and it seems reasonable that, while the internal uncertainty is the same for each α value, such random lapses might depend on the costs. 
Table A1
 
Results of Bayesian model selection: Laplace approximation to marginal log likelihood for each of four models for each observer. Bold text shows the model with the highest log likelihood for each participant. It should be noted that each unit difference in log likelihood corresponds to an e-fold ratio of model probabilities.
Summed Laplace approximation for individual session models
Observer Single ρ Separate ρ
Single ε Separate ε Single ε Separate ε
1 1995 1997 1970 1975
2 1602 1638 1613 1619
3 2351 2338 2346 2335
4 2061 2033 2055 2041
Laplace approximation for pooled session model
Single ρ Separate ρ
Observer Single ε Separate ε Single ε Separate ε
1 1851 1841 1827 1830
2 1433 1487 1446 1441
3 2200 2185 2197 2181
4 1944 1948 1941 1929
Table A2
 
Results of model fitting to experimental data: center (μ), slope (ρ), and lapse (ε) parameters for each observer in each α condition and each session (values given to 2 significant figures). For each observer, there is a separate μ for each α condition, representing the center of the psychometric function in pixels. However, there is only a single ρ (in 1/pixels) for all α conditions, representing the fact that the observer's internal uncertainty is the same regardless of the value of α; the Gaussian standard deviation σ (in pixels) corresponding to each value of ρ is also given. Two observers have a single ε (a probability) for all α, and two have separate ε for each α. These constraints on parameter values were determined via Bayesian model comparison (see Table A1). Parameters shared across α conditions are listed once per session.

Observer 1 (Session 1: ρ = 0.072, σ = 5.6; Session 2: ρ = 0.086, σ = 4.6)
α     μ, Session 1   ε, Session 1   μ, Session 2   ε, Session 2
0.7   −4.0           0.052          −3.8           0.031
0.6   −3.5           0.025          −3.3           0.00002
0.5   −1.9           0.13           −2.1           0.017
0.4   2.5            0.16           0.55           0.031
0.3   3.1            0.045          0.63           0.072

Observer 2 (Session 1: ρ = 0.049, σ = 8.2; Session 2: ρ = 0.055, σ = 7.2)
α     μ, Session 1   ε, Session 1   μ, Session 2   ε, Session 2
0.7   −7.2           0.019          −7.3           0.013
0.6   −7.5           0.0019         −5.1           0.00012
0.5   −2.1           0.11           −1.8           0.10
0.4   2.0            0.021          3.1            0.10
0.3   4.2            0.0098         5.8            0.016

Observer 3 (Session 1: ρ = 0.087, σ = 4.6, ε = 0.017; Session 2: ρ = 0.13, σ = 3.1, ε = 0.011)
α     μ, Session 1   μ, Session 2
0.7   −2.3           −0.92
0.6   −0.79          −0.60
0.5   0.82           0.28
0.4   1.6            0.44
0.3   4.5            0.76

Observer 4 (Session 1: ρ = 0.075, σ = 5.3, ε = 0.0069; Session 2: ρ = 0.079, σ = 5.1, ε = 0.042)
α     μ, Session 1   μ, Session 2
0.7   −3.5           −3.6
0.6   −2.3           −2.1
0.5   0.45           1.3
0.4   4.3            4.2
0.3   4.8            5.0
Optimal and observed shifts
Consideration of the various models thus showed that the behavior of each observer in each session was best modeled by a family of curves of the same shape and slope, but with centers depending on α. We next asked whether the observed shifts in the curve centers were aligned with the predictions of the ideal Bayesian observer model. 
In at least one regard, observers were not optimal. The Bayesian prediction for the curve center in the α = 0.5 condition is always 0. However, for all observers, the curve centers for the α = 0.5 condition were non-zero. Two observers showed a rightward bias in both sessions, and two showed a leftward bias in both sessions (see Table A2), and we found no evidence that the bias was absent in the asymmetric penalty conditions. A similar directional bias has been reported widely in psychophysical studies (Green & Swets, 1989). In the analysis below, we treat the directional bias as a constant constraint on observers' computations and attempt to separate this form of non-optimality from the novel question of whether observers were able to integrate correctly the loss function with an estimate of internal uncertainty. Thus, we compute shifts as relative to the biased center for α = 0.5, yielding “predicted relative shifts” for the other four α conditions 
$\Delta\mu_j^{*} = \mu_j^{*} - \mu_{0.5}^{*} = \Phi_{\sigma}^{-1}(\alpha_j) - \Phi_{\sigma}^{-1}(0.5) = \Phi_{\sigma}^{-1}(\alpha_j).$
(19)
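Equation 19 can be evaluated directly with the inverse CDF of a zero-mean Gaussian. A minimal sketch, using one σ value from Table A2 for illustration (the sign convention relating α to leftward versus rightward shifts depends on the definition of α in the main text):

```python
from statistics import NormalDist

def predicted_relative_shift(alpha, sigma):
    """Predicted shift of the psychometric-curve center relative to the
    (biased) center at alpha = 0.5, per Equation 19: Phi_sigma^{-1}(alpha).
    """
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(alpha)

# Example with sigma = 5.6 pixels (observer 1, session 1, Table A2):
shifts = {a: predicted_relative_shift(a, 5.6) for a in (0.3, 0.4, 0.5, 0.6, 0.7)}
```

Note that the predicted shift is zero at α = 0.5 and antisymmetric about that point, so the four asymmetric conditions come in mirrored pairs.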
 
The comparison between observed and predicted relative shifts for the two sessions is shown in Figures 4a and 4b. Note that both observed and predicted shifts derive from the same set of data, as the predicted shifts are based on an estimate of internal uncertainty derived from the slope of the psychometric curve. Thus, estimation errors due to limited sampling may be correlated, and independent error bars for the two quantities cannot be drawn. Instead, we employed a bootstrap procedure to estimate the covariance of the errors in the two derived quantities, shown by the ellipses in Figures 4a and 4b. A linear fit to the observed shifts was computed by minimizing weighted squared error in the plane with respect to these covariances and is also shown in Figures 4a and 4b. 
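A bootstrap covariance estimate of this kind can be sketched generically as below. The `statistic` callback, which in the actual analysis would refit the psychometric curve to the resampled trials and return the pair (observed shift, predicted shift), is a hypothetical placeholder here:

```python
import numpy as np

def bootstrap_covariance(trials, statistic, n_boot=1000, seed=None):
    """Estimate the sampling covariance of a vector-valued statistic by
    resampling trials with replacement.

    `statistic` maps a resampled trial array to a 1-D vector of derived
    quantities; the covariance of those vectors across bootstrap
    resamples approximates the joint sampling covariance.
    """
    rng = np.random.default_rng(seed)
    trials = np.asarray(trials)
    n = len(trials)
    samples = np.array([statistic(trials[rng.integers(0, n, size=n)])
                        for _ in range(n_boot)])
    return np.cov(samples, rowvar=False)
```

Because both derived quantities are computed from the same resample, any correlation induced by shared estimation error is captured in the off-diagonal term, which is what the error ellipses visualize.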
Figure 4
 
Comparison of predicted and observed behavior. (a and b) Predicted and observed relative shifts in the centers of the psychometric curves, for the first (a) and the second (b) sessions. If performance is quantitatively optimal (up to a constant bias), the data points (grey circles) should lie on the dotted identity line. The ellipses show the 2 σ covariance expected due to sampling errors, and the dashed line is a linear fit to the data points, computed by minimizing the weighted squared error in the plane with respect to these covariances. All observers showed the predicted pattern of shifts but were not quantitatively exact. (c and d) The mean and variance of the score that would be obtained if each observer behaved optimally given the directional bias was calculated for the first (c) and the second (d) sessions (see Appendix A2). Crosses plot predicted against observed score, with observers numbered as in panels a and b. The identity line again represents optimal performance (given the directional bias), and the vertical bars show one standard deviation from the mean. All points are within this range except for one observer in the first session. Filled circles show the mean score expected if observers failed to shift the center of their psychometric curves from the biased center of the curve for α = 0.5, and all such points lie outside the predicted range.
There is a strong qualitative match between measured and predicted relative shifts. Observers shift their psychometric curves in the right direction, and by an amount that is proportional to the size of the penalty asymmetry. This is in contrast to the simple strategy verbally reported by all observers, which was to give the answer with the lower penalty whenever they were unsure. Interference from this cognitive strategy might help to explain the non-linearity of the observed shift plot in Figures 4a and 4b—the two leftward and two rightward shifts are more similar than in the quantitatively optimal scenario. However, if observers' behavior was dictated by this simple strategy, we would expect the shifts to be of the same magnitude regardless of the size of the penalty asymmetry. In fact, the shifts are significantly larger ( p < 0.01 under a 2-tailed paired-samples t-test) for the greater cost asymmetries. 
It has been observed previously that observers are reluctant to behave optimally when this entails an extreme bias in their responses (Green & Swets, 1989). To try to avoid such psychological effects, we chose asymmetries that demanded a relatively small shift in the psychometric curve. In general, observers tended to over-compensate for the penalty asymmetry (see Figures 4a and 4b), but in most sessions the over-compensation was smaller for the larger penalty asymmetries, as such a reluctance would predict. This could also have contributed to the non-linearity in the observed shifts plot. 
Optimality of achieved score
Although all observers showed the predicted pattern of shifts, the quantitative match was not exact. This is perhaps unsurprising given the requirement to integrate implicit knowledge of internal uncertainty with high-level cognitive instructions. This raises the issue of which behavioral measure should be used to statistically test for optimality. In the present study, observers are asked to maximize their point score, not to work out what the optimal shift of the curve should be. It is possible that the function relating curve shift to total score is relatively flat in the region of the maximum score obtainable, such that there is little benefit from an exact quantitative match to the predicted curve shifts. 
Our model of the psychometric curve was a composite function based on an underlying cumulative Gaussian, which gives the probability of answering “right” for a particular value of the stimulus x and α (see Equations 15 and 17). We used this model to compute the mean and the standard deviation of the score observers would have obtained had they shifted their psychometric curves by the optimal amount from the biased mean of the curve for the α = 0.5 condition (for details, see Appendix A2). We were then able to ask whether the true score was statistically distinguishable from this predicted value. 
Scores for all observers fell within one standard deviation of the predicted score in the second session ( Figure 4d), as did all but one in the first session ( Figure 4c). The failure of this one observer to obtain a score in this range in the first session could be due to cognitive interference or motivational issues. To test the sensitivity of total score as a measure of optimality, we computed the mean score that would have been obtained had participants failed to shift their psychometric curves from the biased central point. All such points lay outside the predicted range, as shown in Figures 4c and 4d. This analysis suggests that, while not quantitatively optimal, the observed shifts were sufficient to obtain a score well within the predicted range. 
Changes in performance
As described above, each observer participated in two experimental sessions on different days. We predicted that the level of observers' internal uncertainty might have differed between the two sessions, due to perceptual learning, consolidation, or extrinsic factors. If true, this would provide a further test of the hypothesis that observers' behavior is driven by internal beliefs that accurately reflect their sensory noise. If that sensory noise were to change, their beliefs, and thus their behavior under the asymmetric loss function, should change concomitantly. 
We first established that the level of observers' sensory noise did, in fact, appear to change, as would be reflected by a change in the slope of the psychometric curve. We used Bayesian model selection (see 1) to quantitatively compare models with the same slope in the two sessions to models in which the slope could differ. Table A1 shows that the model with different slopes in the two sessions was overwhelmingly preferred in all cases, although, within each session, the model with the same slope for different loss functions was still the most probable. Thus, despite the apparent change in sensory noise, the basic prediction that the shape of the psychometric curve is unaffected by the loss function is confirmed. 
In general, the slope of the psychometric curve was steeper in the second session, and observers' behavior altered in accordance with the predictions of the Bayesian analysis. This can be seen as a trend towards smaller shifts in the second session (compare Figure 4a with Figure 4b) and towards higher scores (compare Figure 4c with Figure 4d). In particular, the three observers whose scores were in the predicted range in both sessions maintained this performance in the face of a clear change in apparent sensory noise. Furthermore, had the three observers who showed substantial changes in accuracy between the two sessions retained the same relative shifts in the second session as in the first, their expected scores would have fallen outside the optimal ranges shown. This suggests that observers were indeed adopting an efficient strategy, taking into account both the level of uncertainty and the external loss function. 
As discussed in the Methods section, we did not attempt to distinguish between stimulus-centered sensory noise and any stimulus-centered decision noise not modeled by the stimulus-independent lapse-rate parameter. However, we assumed that the majority of this stimulus-centered variation was in fact due to sensory noise and treated it as such. If this assumption is incorrect, the measured slope may incorporate stimulus-centered “decision” noise associated with integrating the loss function with the true uncertainty, and thus an increase in slope might reflect an improvement in task performance rather than a change in internal uncertainty. However, inspecting Figures 4a and 4b shows that for only one observer did the slope of the linear fit to performance (the dashed line) get closer to the identity line in the second session, supporting the assertion that the internal uncertainty was changing rather than the ability to perform the task. Indeed, the observer whose fit improved was the same observer who obtained a score outside the predicted range in the first session, and it seems possible that she did change her strategy. 
Controls for feedback
In order to use an ideal observer analysis to conclude that observers represent and compute with the relevant uncertainty, it is crucial to rule out alternative strategies for obtaining near optimal performance that do not require such knowledge. In the present task, it is possible that trial-by-trial feedback, had we provided it, would have allowed observers to incrementally adjust an internal threshold (perhaps in proportion to the size of the penalty) until their payoff was optimized. This could have led to psychometric curves that looked very much like those we predict from the analysis above. Indeed, classic psychophysical studies have used a similar paradigm with trial-by-trial feedback to demonstrate this kind of “optimal” criterion selection (Green & Swets, 1989). 
Previous studies of uncertainty that have used trial-by-trial feedback have dealt with a similar potential confound by looking for evidence of incremental threshold adjustment in the data (Trommershäuser et al., 2003, 2005). The alternative strategy, that of withholding feedback, was adopted by Körding and Wolpert (2004) in a sensorimotor task, although without any asymmetry in costs. Here, we chose to provide only occasional (every 15 trials) cumulative feedback during the testing blocks (see Methods). This provided motivation but did not allow observers to behave optimally via trial-by-trial threshold adjustment. 
However, even such scarce feedback does provide some limited information about sensory noise, so we performed control analyses to confirm that the magnitude of the cumulative feedback had no measurable effect on behavior. First, we fit a psychometric curve to all data which followed “good” feedback (i.e., a cumulative score for the preceding 15 trials that fell above the 75th percentile) and to all data that followed “bad” feedback (i.e., a cumulative score for the preceding 15 trials that fell below the 25th percentile). There were no feedback-related trends in the data (data not shown). To test for effects that might have been lost by the averaging in this analysis, we then examined whether “good” feedback reinforced the direction of any changes in threshold and whether “bad” feedback reversed the direction of any such changes. If observers were modifying their behavior in this way, we would expect a positive correlation in threshold changes following “good” feedback and a negative correlation following “bad” feedback. However, we observed only a slight negative correlation in both cases (data not shown). 
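The first control analysis splits the data by the preceding cumulative-feedback score. A sketch of that split, under the assumption that the session is organized into consecutive 15-trial feedback blocks (the subsequent psychometric refit is study-specific and omitted):

```python
import numpy as np

def split_following_feedback(block_scores):
    """Return indices of blocks that follow 'good' (> 75th percentile)
    and 'bad' (< 25th percentile) cumulative feedback scores.

    `block_scores[k]` is the cumulative score over the k-th block of
    15 trials; psychometric curves would then be fit separately to the
    trials in the two returned sets of blocks.
    """
    scores = np.asarray(block_scores, dtype=float)
    lo, hi = np.percentile(scores, [25, 75])
    follows_good = np.flatnonzero(scores > hi) + 1   # the block *after* the feedback
    follows_bad = np.flatnonzero(scores < lo) + 1
    # drop indices that run past the final block
    follows_good = follows_good[follows_good < len(scores)]
    follows_bad = follows_bad[follows_bad < len(scores)]
    return follows_good, follows_bad
```

Comparing fits on the two subsets tests whether behavior shifted after extreme feedback; the absence of any difference is what rules out incremental threshold adjustment.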
Discussion
Uncertainty inescapably affects almost all domains of brain function, arising from variability in external processes, from the under-constrained nature of many problems of perceptual inference and motor planning, from variability in motor execution, and from noise in sensory processing. A long-standing and fundamental question in neuroscience is whether, and if so how, the brain takes account of this uncertainty in the course of perception, decision making, action, and learning. 
Our results address one aspect of this question. We show that observers possess an internal model of the visual processing uncertainty that affects their Vernier judgments, and that they use this model to guide their decisions. Crucially, observers' decisions are sensitive to uncertainty even when they do not receive significant feedback about their accuracy or score. This indicates that the uncertainty-sensitive decision strategy is not learnt during the experiment itself but is instead based on a pre-existing, implicit model of current internal uncertainty that is presumably available at all times. In addition, observers' decisions, and thus their models of internal uncertainty, track the changes in uncertainty that are associated with varying levels of sensory noise in different experimental sessions. Taken together, these observations suggest that the processing of uncertainty is a fundamental aspect of sensory computation. Furthermore, in our experiment, observers must combine knowledge of this uncertainty, rather than simply a modal estimate of the stimulus, with an externally imposed loss function to perform well. Our results therefore also indicate that information about early sensory uncertainty, at least in the form of a two-alternative likelihood ratio between the models for left and right displacement, is preserved and made available to decision areas. This implies that the relevant information is propagated across multiple cortical layers. 
Our new results join a growing body of work showing how uncertainty in its various forms affects behavior (see Introduction; and for a review, Knill & Pouget, 2004). In the task reported here, we aimed to show a “minimal context” in which Bayesian optimality can be demonstrated, and in which alternative strategies that do not require the representation of uncertainty can be ruled out. We therefore used a purely visual stimulus and asked observers to make a simple Vernier offset judgment (Westheimer, 1979). We chose a Vernier task, as the requisite sensory processing is most likely to occur early in the visual pathway, perhaps principally in the relatively well-understood primary visual cortex. Combined with the existing evidence for optimality in cross-modal, motor, and visual cue combination experiments, our results support the claim that uncertainty is processed throughout the brain, even for simple, low-level visual quantities. Furthermore, such a task is a strong candidate for future integration with physiological data and with theoretical work concerning how neuronal populations represent and compute with uncertainty (Knill & Pouget, 2004; Pouget, Dayan, & Zemel, 2003). 
Another key property of our task is that the set of visual stimuli was fixed, so that all stimulus-related uncertainty arose exclusively from visual processing, corresponding to “internal noise” in psychophysical experiments (Green & Swets, 1989). This is in contrast to many previous studies of sensory uncertainty, where variability was driven by external manipulations, such as the random placement of dots or the addition of corrupting noise (but see Stocker & Simoncelli, 2006). Using a fixed stimulus set thus strengthens the conclusion that the mechanisms exposed are fundamental to sensory processing rather than being limited to strategies for dealing with uncertainty in the external world. 
There are many technical issues with Bayesian optimality experiments, which can obscure optimal behavior and limit conclusions about the underlying computation. For example, with monetary loss functions observers often demonstrate over-compensation, a reluctance to make extreme shifts, and failure to keep track of the current loss function (Green & Swets, 1989; Landy et al., 2007). We therefore used values of α that demanded relatively small shifts and used practice trials and a blocked design with regular reminders of the current cost values. 
Recent work using a similar loss function approach in an unspeeded visual orientation estimation task (Landy et al., 2007) found evidence for optimality but also for a variety of suboptimal strategies. In psychological “betting” paradigms, failures of probabilistic reasoning characterize human behavior, and it may be that unspeeded adjustment tasks are more vulnerable to such influences. Our task is very simple and requires a forced-choice categorization response that observers performed very quickly. It seems likely that we therefore avoid some of these cognitive effects, making it easier to model observers' use of information about their own uncertainty. However, as discussed above, we do see evidence for over-compensation and for relatively smaller over-compensation for larger cost asymmetries. 
In our analysis, we did not attempt to distinguish between stimulus-centered sensory noise and any stimulus-centered decision noise not modeled by the lapse-rate parameter. However, we assumed that the majority of this stimulus-centered variation was due to sensory noise. This assumption was supported by examining the results across sessions—the ability of observers to choose optimally in the face of asymmetric costs did not change as their ability on the task, measured by the slope of the psychometric function, did (see Figures 4a and 4b). Using external manipulations to produce randomly intermixed uncertainty levels on each trial would allow us to separate more explicitly any stimulus-centered decision noise from uncertainty due to sensory processing. However, Landy et al. (2007) found more suboptimality when levels of uncertainty were randomly intermixed rather than blocked as in our experiment, and it is not clear why this should be the case. It could reveal limits on the ability to perform online Bayesian processing, or alternatively arise from corrupting cognitive or psychological factors. In the present study, we were not interested in trying to delineate these factors, and so we used a blocked design. In addition, we wanted to retain the property of all stimulus uncertainty arising internally. Future investigations of the effect of task and experimental design on optimality are important to pull apart optimal sensory processing from cognitive reasoning effects. 
In the present study, we aimed to demonstrate minimal conditions for Bayes-optimal behavior under uncertainty. We showed that observers approach the quantitatively optimal strategy given a directional bias, and score within the predicted range, in a simple unimodal visual task that requires them to integrate a model of their internal uncertainty with an external loss function. The assumptions of the model, and the predictions that arise from them, were tested, and we took care to rule out alternative strategies for achieving the observed behavior. Our results therefore support the assertion that the processing of uncertainty is a fundamental aspect of sensory computation and can be used to inform subsequent decision-making processes. 
Appendix A1: Bayesian Model Comparison
A different psychometric function was obtained for each observer, for each value of α, in each session. In principle, each such function might have entirely different parameters. Our expectation, however, based on the Bayesian observer analysis, is that the slope parameter should remain constant as α varies within one session, although it may change between sessions. We used Bayesian model comparison to determine which group of shared parameters was best supported by our data, by evaluating an approximation to the marginal likelihood or “evidence” for a number of different models. This approach to choosing an appropriate model originates with Jeffreys (1939) and incorporates an Occam's razor-type penalty for models with more parameters (Gull, 1988; Kass & Raftery, 1995; Mackay, 2004). In the absence of prior bias towards any particular model, the marginal likelihoods are proportional to the probabilities of each model being the one from which the data arose. 
For each observer, we fit models that shared parameters between sessions ( Table A1, lower half) and models that had independent parameters for each session ( Table A1, upper half). In the latter case, the total log evidence was the sum of the log evidences obtained for each session. For each case, we fit models with individual ρ j and ε j parameters for each α, models with the values of ρ j and ε j restricted to have the same value for all α, and models with one parameter restricted while the other was allowed to vary. The centers μ j always varied with α, as all data sets showed very clear shifts. 
A gradient ascent procedure was used to find the most probable or maximum a posteriori (MAP) parameter values, given the data, under a non-informative prior. As the exact evidence could not be calculated, these MAP parameters for each model were then used to compute a Laplace approximation to the marginal likelihood in each case. The Laplace approximation results from taking the first three terms of a Taylor expansion about the MAP parameters (Mackay, 2004). In the equations below, D refers to the data, m to the model, θ to the vector of all parameters, θ* to the MAP parameters, and d to the number of parameters in the model. The matrix A is the Hessian of the log posterior, i.e., the matrix of second partial derivatives of log P(θ|D, m) with respect to θ, evaluated at θ* 
$\log P(D \mid m) = \log \int d\theta \, P(D, \theta \mid m),$
(A1)
 
$\log P(D \mid m) \approx \log P(D \mid \theta^{*}, m) + \log P(\theta^{*} \mid m) + \frac{d}{2} \log 2\pi - \frac{1}{2} \log \lvert A \rvert.$
(A2)
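Equation A2 can be evaluated numerically once the MAP parameters are in hand. A sketch using a central finite-difference Hessian; the gradient-ascent fit itself is omitted, and `log_joint` is a stand-in for log P(D|θ, m) + log P(θ|m):

```python
import numpy as np

def laplace_log_evidence(log_joint, theta_map, eps=1e-5):
    """Laplace approximation to log P(D|m) (Equation A2).

    log_joint(theta) must return log P(D|theta, m) + log P(theta|m);
    theta_map is the MAP parameter vector. A, the Hessian of the
    negative log posterior at theta_map, is estimated by central
    finite differences.
    """
    theta_map = np.asarray(theta_map, dtype=float)
    d = len(theta_map)
    A = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            f = np.zeros((2, 2))
            for a, di in enumerate((-eps, eps)):
                for b, dj in enumerate((-eps, eps)):
                    t = theta_map.copy()
                    t[i] += di
                    t[j] += dj
                    f[a, b] = log_joint(t)
            # mixed second partial derivative of the negative log posterior
            A[i, j] = -(f[1, 1] - f[1, 0] - f[0, 1] + f[0, 0]) / (4.0 * eps ** 2)
    _, logdet = np.linalg.slogdet(A)
    return log_joint(theta_map) + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet
```

For a Gaussian posterior the approximation is exact, which provides a convenient sanity check on an implementation like this one.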
 
The values of the Laplace approximation for each of the four models are shown in Table A1. The highest evidence, corresponding to the “best model,” is highlighted for each observer and each case. In accordance with our assumption, the best model had a single ρ parameter for all α, whether the slope was shared between sessions or allowed to differ. A single lapse parameter was best for two observers and separate lapse parameters for the other two. As mentioned above, the lapse rate is incorporated in the model to account for decision noise, motor errors, and moments of inattention, and it seems reasonable that, while the internal uncertainty is the same for each α value, such random lapses might vary. 
In addition, the model with different parameters for the two sessions was always preferred, suggesting that observers' sensory noise changed between the two experimental sessions. The values of the parameters fit to the best model for each observer, in each session, are given in Table A2. 
Appendix A2: Computation of Optimal Score Range
To assess whether observers' performance was significantly different from optimal, we computed the mean and the standard deviation of the scores that they would have obtained under the optimal strategy, given their apparent internal uncertainties and their observed biases. The total score (“reward”) for one session is obtained by summing the scores for each α value:
$R_{\mathrm{total}} = \sum_j R_j.$
(A3)
 
The score for each α value is given by the sum, over the different possible stimulus offsets x_i, of the number of trials on which the observer answers “right” and “left” correctly and incorrectly, multiplied by the appropriate reward or cost parameter. Using the same definitions of N_{ij} and n_{ij} as in Equation 15, this is
$R_j = \sum_{i \mid x_i < 0} \left( R_l (N_{ij} - n_{ij}) + C_r n_{ij} \right) + \sum_{i \mid x_i > 0} \left( R_r n_{ij} + C_l (N_{ij} - n_{ij}) \right).$
(A4)
 
Under our model, n_{ij} is binomially distributed with mean N_{ij} p_{ij}, where p_{ij} is given by the psychometric function (Equation 17). To obtain the expected score under the optimal strategy (constrained by the observed bias), we evaluated Equation 16 for each offset, using the measured value of ρ, but using the optimal value of μ_j obtained by adding the optimal relative shift to the observed bias in the symmetric condition (i.e., μ_{0.5} + Δμ_j^{*}). Calling the resulting response probabilities p_{ij}^{*}, the expected score under the constrained optimal strategy is
$\langle R_{\mathrm{total}} \rangle = \sum_j \langle R_j \rangle,$
(A5)
with
$\langle R_j \rangle = \sum_{i \mid x_i < 0} N_{ij} \left( R_l (1 - p_{ij}^{*}) + C_r p_{ij}^{*} \right) + \sum_{i \mid x_i > 0} N_{ij} \left( R_r p_{ij}^{*} + C_l (1 - p_{ij}^{*}) \right).$
(A6)
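Equation A6 translates directly into code. A sketch with illustrative reward/cost values; the actual R and C parameters come from the Methods and are not reproduced here:

```python
import numpy as np

def expected_score(offsets, n_trials, p_right, R_l, C_r, R_r, C_l):
    """Expected score <R_j> for one alpha condition (Equation A6).

    offsets  : stimulus offsets x_i (negative = left, positive = right)
    n_trials : N_ij, number of trials presented at each offset
    p_right  : probability of answering "right" at each offset (p_ij*)
    """
    x = np.asarray(offsets, dtype=float)
    N = np.asarray(n_trials, dtype=float)
    p = np.asarray(p_right, dtype=float)
    left, right = x < 0, x > 0
    return (np.sum(N[left] * (R_l * (1.0 - p[left]) + C_r * p[left]))
            + np.sum(N[right] * (R_r * p[right] + C_l * (1.0 - p[right]))))
```

A perfectly discriminating observer collects only the rewards, while a guessing observer with symmetric rewards and costs averages to zero, which matches the intuition behind the scoring rule.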
 
To compare the measured scores to this value, we must also calculate the variance in the score that is expected as decisions vary due to sensory noise. As the scores in the different conditions are independent, this variance is given by
$\mathrm{Var}(R_{\mathrm{total}}) = \sum_j \left( \langle R_j^2 \rangle - \langle R_j \rangle^2 \right).$
(A7)
 
The second moment above can also be computed in closed form, using the expression for R_j in Equation A4, the binomial mean as before, and the binomial second moments:
$\langle n_{ij} n_{i'j} \rangle = N_{ij} N_{i'j} \, p_{ij}^{*} p_{i'j}^{*} + \delta_{ii'} N_{ij} \, p_{ij}^{*} (1 - p_{ij}^{*}),$
(A8)
where δ_{ii'} is the Kronecker delta. We obtain
$\langle R_j^2 \rangle = \langle R_j \rangle^2 + (C_r - R_l)^2 \sum_{i \mid x_i < 0} N_{ij} \, p_{ij}^{*} (1 - p_{ij}^{*}) + (R_r - C_l)^2 \sum_{i \mid x_i > 0} N_{ij} \, p_{ij}^{*} (1 - p_{ij}^{*}),$
(A9)
and so the expected variance in score is
$\mathrm{Var}(R_{\mathrm{total}}) = \sum_j \left( (C_r - R_l)^2 \sum_{i \mid x_i < 0} N_{ij} \, p_{ij}^{*} (1 - p_{ij}^{*}) + (R_r - C_l)^2 \sum_{i \mid x_i > 0} N_{ij} \, p_{ij}^{*} (1 - p_{ij}^{*}) \right).$
(A10)
 
The corresponding standard deviation is shown in Figure 4. 
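The variance in Equation A10 reduces to sums of binomial variances, one per stimulus offset. A sketch for a single α condition (the reward/cost arguments are illustrative placeholders):

```python
import numpy as np

def score_variance(offsets, n_trials, p_right, R_l, C_r, R_r, C_l):
    """Variance of the score for one alpha condition (one term of the
    sum over j in Equation A10), assuming binomially distributed
    responses n_ij ~ Binomial(N_ij, p_ij*).
    """
    x = np.asarray(offsets, dtype=float)
    N = np.asarray(n_trials, dtype=float)
    p = np.asarray(p_right, dtype=float)
    binom_var = N * p * (1.0 - p)          # Var(n_ij) at each offset
    left, right = x < 0, x > 0
    return ((C_r - R_l) ** 2 * np.sum(binom_var[left])
            + (R_r - C_l) ** 2 * np.sum(binom_var[right]))
```

Summing such terms over the α conditions and taking the square root gives the standard deviation described above.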
Acknowledgments
We thank P. Dayan, J. Solomon, and M. Landy for comments. This work was supported by grants from the Wellcome Trust (to L.W.) and the Gatsby Foundation (to M.S.). 
Commercial relationships: none. 
Corresponding author: Louise Whiteley or Maneesh Sahani. 
Email: louisew@gatsby.ucl.ac.uk or maneesh@gatsby.ucl.ac.uk. 
Address: Gatsby Computational Neuroscience Unit, University College London, Alexandra House, 17 Queen Square, WC1N 3AR, UK. 
References
Berger, J. O. (1985). Statistical decision theory and Bayesian analysis. New York: Springer-Verlag.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.
Carpenter, R. H. S. (1988). The movements of the eyes. London: Pion Ltd.
Cox, R. T. (1961). The algebra of probable inference. Baltimore: Johns Hopkins University Press.
Deneve, S., Latham, P. E., & Pouget, A. (2001). Efficient computation and cue integration with noisy population codes. Nature Neuroscience, 4, 826–831.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Green, D. M., & Swets, J. A. (1989). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Publishing.
Gull, S. F. (1988). Bayesian inductive inference and maximum entropy. In G. Erickson & C. R. Smith (Eds.), Maximum-entropy and Bayesian methods in science and engineering: Volume 1. Foundations. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Helmholtz, H. L. F. (1856). Treatise on physiological optics. Bristol: Thoemmes Continuum.
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):1, 967–992, http://journalofvision.org/4/12/1/, doi:10.1167/4.12.1.
Hodgson, T. L. (2002). The location marker effect: Saccadic latency increases with target eccentricity. Experimental Brain Research, 145, 539–542.
Jacobs, R. A. (1999). Optimal integration of texture and motion cues to depth. Vision Research, 39, 3621–3629.
Jeffreys, H. (1939). Theory of probability. Oxford: Oxford University Press.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27, 712–719.
Knill, D. C., & Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research, 43, 2539–2558.
Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427, 244–247.
Landy, M. S., & Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 18, 2307–2320.
Landy, M. S., Goutcher, R., Trommershäuser, J., & Mamassian, P. (2007). Visual estimation under risk. Journal of Vision, 7(6):4, 1–15, http://journalofvision.org/7/6/4/, doi:10.1167/7.6.4.
MacKay, D. J. C. (2004). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
Maloney, L. T. (2002). Statistical decision theory and biological vision. In D. Heyer & R. Mausfeld (Eds.), Perception and the physical world: Psychological and philosophical issues in perception. New York: Wiley.
Marr, D. (1982). Vision. San Francisco: W. H. Freeman and Company.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. London: Chapman & Hall.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Pouget, A., Dayan, P., & Zemel, R. S. (2003). Inference and computation with population codes. Annual Review of Neuroscience, 26, 381–410.
Saunders, J. A., & Knill, D. C. (2004). Visual feedback control of hand movements. Journal of Neuroscience, 24, 3223–3234.
Saunders, J. A., & Knill, D. C. (2005). Humans use continuous visual feedback from the hand to control both the direction and distance of pointing movements. Experimental Brain Research, 162, 458–473.
Schwartz, O., Sejnowski, T. J., & Dayan, P. (2006). A Bayesian framework for tilt perception and confidence. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems (Vol. 18, pp. 1201–1208). Cambridge, MA: MIT Press.
Stocker, A. A., & Simoncelli, E. P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9, 578–585.
Tassinari, H., Hudson, T. E., & Landy, M. S. (2006). Combining priors and noisy visual cues in a rapid pointing task. Journal of Neuroscience, 26, 10154–10163.
Thurstone, L. L. (1927). A law of comparative judgement. Psychological Review, 34, 273–286.
Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2003). Statistical decision theory and trade-offs in the control of motor response. Spatial Vision, 16, 255–275.
Trommershäuser, J., Gepshtein, S., Maloney, L. T., Landy, M. S., & Banks, M. S. (2005). Optimal compensation for changes in task-relevant movement variability. Journal of Neuroscience, 25, 7169–7178.
Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5, 598–604.
Westheimer, G. (1979). Spatial sense of the eye: Proctor lecture. Investigative Ophthalmology & Visual Science, 18, 893–912.
Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63, 1293–1313.
Yuille, A., & Bülthoff, H. H. (1999). Bayesian decision theory and psychophysics. In D. Knill & W. Richards (Eds.), Perception as Bayesian inference. Cambridge, UK: Cambridge University Press.
Figure 1
 
Experimental design. (a) On each trial, a stimulus consisting of two vertically arranged Gabor patches was briefly flashed, and the observer pressed a key to say whether the lower patch was offset to the left or to the right of the upper patch. Participants were asked to maximize their score, with varying numbers of points being awarded for a correct answer (“reward”) and deducted for an incorrect answer (“cost”). Participants received only periodic feedback about their performance, in the form of a cumulative score every 15 trials. (b) Schematic of the quantities and transformations involved in the construction of the Bayes-optimal observer. The stimulus produces a stochastic neural response. The observer transforms this neural response into a belief distribution (see (1)) and then uses this belief to decide which answer to give in the face of varying costs and penalties (see (2)). The Bayes-optimal observer specifies the optimal forms of transformations (1) and (2), thereby providing a behavioral benchmark of optimality.
Figure 2
 
Evaluating behavioral optimality. (a) Illustration of the qualitative prediction for optimization of the loss function—observers should give the answer with the lower penalty when uncertain, resulting in a shift of the psychometric curve in the direction of the higher cost. (b) Example data from one observer in the five different conditions. Crosses show data points, and the smooth lines show psychometric functions fit to the data, with the slope constrained to be the same for each condition. (c) Illustration of the procedure for measuring observed shifts once psychometric functions have been fit to the data from the five conditions. (d) Illustration of the procedure of taking the inverse value of the psychometric function at the values of α used in the experiment. The optimal shift between two psychometric curves is then given by the difference between the two corresponding inverse values.
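The procedure in panel (d) amounts to evaluating the inverse of the fitted psychometric function at each value of α and differencing the results. As a sketch with a logistic form for the curve (the paper's actual fitting function may differ; `mu` and `rho` here denote the center and slope parameters):

```python
import math

def logistic_psi(s, mu, rho):
    """Probability of a rightward answer at stimulus offset s (pixels)."""
    return 1.0 / (1.0 + math.exp(-rho * (s - mu)))

def inverse_psi(alpha, mu, rho):
    """Offset at which the fitted curve crosses probability alpha."""
    return mu + math.log(alpha / (1.0 - alpha)) / rho

def optimal_shift(alpha_1, alpha_2, mu, rho):
    """Predicted shift between the psychometric curves of two conditions."""
    return inverse_psi(alpha_2, mu, rho) - inverse_psi(alpha_1, mu, rho)
```

Because the slope is constrained to be the same across conditions, a single `rho` suffices and the predicted shift depends only on the two α values.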
Figure 3
 
Deviance residuals between model and data. Deviance residuals between the model fitted to the behavioral data and the data points for each observer in each of the two sessions.
Figure 4
 
Comparison of predicted and observed behavior. (a and b) Predicted and observed relative shifts in the centers of the psychometric curves, for the first (a) and the second (b) sessions. If performance is quantitatively optimal (up to a constant bias), the data points (grey circles) should lie on the dotted identity line. The ellipses show the 2 σ covariance expected due to sampling errors, and the dashed line is a linear fit to the data points, computed by minimizing the weighted squared error in the plane with respect to these covariances. All observers showed the predicted pattern of shifts but were not quantitatively exact. (c and d) The mean and variance of the score that would be obtained if each observer behaved optimally given the directional bias was calculated for the first (c) and the second (d) sessions (see 2). Crosses plot predicted against observed score, with observers numbered as in panels a and b. The identity line again represents optimal performance (given the directional bias), and the vertical bars show one standard deviation from the mean. All points are within this range except for one observer in the first session. Filled circles show the mean score expected if observers failed to shift the center of their psychometric curves from the biased center of the curve for α = 0.5, and all such points lie outside the predicted range.
Table 1
 
Values of α corresponding to costs and rewards.
| R_r = R_l | C_l | C_r | α | Curve shift |
|-----------|-----|-----|-----|-------------|
| +20 | −10 | −50 | 0.3 | |
| +20 | −20 | −40 | 0.4 | |
| +20 | −30 | −30 | 0.5 | 0 |
| +20 | −40 | −20 | 0.6 | |
| +20 | −50 | −10 | 0.7 | |
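The α column is consistent with the standard decision-theoretic threshold at which the expected scores of the two answers are equal; reading α as the threshold on the probability of a leftward offset reproduces every row of the table. A sketch of this reading (the function name is ours, and the interpretation of α is inferred from the table rather than stated in this appendix):

```python
def alpha_from_payoffs(R_r, R_l, C_r, C_l):
    """Probability threshold at which answering 'left' and 'right' yield
    equal expected score: q * (R_l - C_r) = (1 - q) * (R_r - C_l)."""
    return (R_r - C_l) / ((R_r - C_l) + (R_l - C_r))
```

For example, with rewards of +20 and costs of −10 (left) and −50 (right), the threshold works out to (20 + 10) / (30 + 70) = 0.3, matching the first row.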
Table A1
 
Results of Bayesian model selection: Laplace approximation to marginal log likelihood for each of four models for each observer. Bold text shows the model with the highest log likelihood for each participant. It should be noted that each unit difference in log likelihood corresponds to an e-fold ratio of model probabilities.
Summed Laplace approximation for individual session models:

| Observer | Single ρ, single ε | Single ρ, separate ε | Separate ρ, single ε | Separate ρ, separate ε |
|----------|--------------------|----------------------|----------------------|------------------------|
| 1 | 1995 | **1997** | 1970 | 1975 |
| 2 | 1602 | **1638** | 1613 | 1619 |
| 3 | **2351** | 2338 | 2346 | 2335 |
| 4 | **2061** | 2033 | 2055 | 2041 |

Laplace approximation for pooled session model:

| Observer | Single ρ, single ε | Single ρ, separate ε | Separate ρ, single ε | Separate ρ, separate ε |
|----------|--------------------|----------------------|----------------------|------------------------|
| 1 | **1851** | 1841 | 1827 | 1830 |
| 2 | 1433 | **1487** | 1446 | 1441 |
| 3 | **2200** | 2185 | 2197 | 2181 |
| 4 | 1944 | **1948** | 1941 | 1929 |
Table A2
 
Results of model fitting to experimental data: center ( μ), slope ( ρ), and lapse ( ε) parameters for each observer in each α condition and each session (values given to 2 significant figures). For each observer, there is a separate μ for each α condition, representing the center of the psychometric function in pixels. However, there is only a single ρ for all α conditions, representing the fact that the observer's internal uncertainty is the same regardless of the value of α. The Gaussian standard deviation in pixels corresponding to these values of ρ is given in the next column. Two observers have a single ε for all α, and two have separate ε for each α. These constraints on parameter values were determined via Bayesian model comparison (see Table A1).
A ρ, σ, or ε value shown only on the first row of an observer's block applies to all of that observer's α conditions.

Session 1:

| Observer | α | μ (pixels) | ρ (1/pixels) | σ (pixels) | ε (probability) |
|----------|-----|-----------|--------------|-----------|-----------------|
| 1 | 0.7 | −4.0 | 0.072 | 5.6 | 0.052 |
| | 0.6 | −3.5 | | | 0.025 |
| | 0.5 | −1.9 | | | 0.13 |
| | 0.4 | 2.5 | | | 0.16 |
| | 0.3 | 3.1 | | | 0.045 |
| 2 | 0.7 | −7.2 | 0.049 | 8.2 | 0.019 |
| | 0.6 | −7.5 | | | 0.0019 |
| | 0.5 | −2.1 | | | 0.11 |
| | 0.4 | 2.0 | | | 0.021 |
| | 0.3 | 4.2 | | | 0.0098 |
| 3 | 0.7 | −2.3 | 0.087 | 4.6 | 0.017 |
| | 0.6 | −0.79 | | | |
| | 0.5 | 0.82 | | | |
| | 0.4 | 1.6 | | | |
| | 0.3 | 4.5 | | | |
| 4 | 0.7 | −3.5 | 0.075 | 5.3 | 0.0069 |
| | 0.6 | −2.3 | | | |
| | 0.5 | 0.45 | | | |
| | 0.4 | 4.3 | | | |
| | 0.3 | 4.8 | | | |

Session 2:

| Observer | α | μ (pixels) | ρ (1/pixels) | σ (pixels) | ε (probability) |
|----------|-----|-----------|--------------|-----------|-----------------|
| 1 | 0.7 | −3.8 | 0.086 | 4.6 | 0.031 |
| | 0.6 | −3.3 | | | 0.00002 |
| | 0.5 | −2.1 | | | 0.017 |
| | 0.4 | 0.55 | | | 0.031 |
| | 0.3 | 0.63 | | | 0.072 |
| 2 | 0.7 | −7.3 | 0.055 | 7.2 | 0.013 |
| | 0.6 | −5.1 | | | 0.00012 |
| | 0.5 | −1.8 | | | 0.10 |
| | 0.4 | 3.1 | | | 0.10 |
| | 0.3 | 5.8 | | | 0.016 |
| 3 | 0.7 | −0.92 | 0.13 | 3.1 | 0.011 |
| | 0.6 | −0.60 | | | |
| | 0.5 | 0.28 | | | |
| | 0.4 | 0.44 | | | |
| | 0.3 | 0.76 | | | |
| 4 | 0.7 | −3.6 | 0.079 | 5.1 | 0.042 |
| | 0.6 | −2.1 | | | |
| | 0.5 | 1.3 | | | |
| | 0.4 | 4.2 | | | |
| | 0.3 | 5.0 | | | |