**Our brain often needs to estimate unknown variables from imperfect information. Our knowledge about the statistical distributions of quantities in our environment (called priors) and currently available information from sensory inputs (called likelihood) are the basis of all Bayesian models of perception and action. While we know that priors are learned, most studies of prior-likelihood integration simply assume that subjects know about the likelihood. However, as the quality of sensory inputs changes over time, we also need to learn about new likelihoods. Here, we show that human subjects readily learn the distribution of visual cues (likelihood function) in a way that can be predicted by models of statistically optimal learning. Using a likelihood that depended on color context, we found that a learned likelihood generalized to new priors. Thus, we conclude that subjects learn about likelihood.**

$x$ and the blurred visual input as $y$. Using Bayes' theorem, the posterior probability distribution of $x$ after observing $y$ can be written as $P(x \mid y) \propto P(y \mid x)\,P(x)$. Here, $P(x)$ characterizes our prior knowledge about the statistics of the task-relevant quantity $x$, and represents information about $x$ before observing $y$. The other factor, $P(y \mid x)$ (called the likelihood function), represents how likely each $x$ is to cause $y$; essentially, it represents the information about $x$ that is obtained from observation. By combining the prior and the likelihood, we can optimally decide how much we should trust observations versus prior knowledge. Note that once the target variable and the observed variable are defined, we can clearly define the likelihood and the prior for the task. Unlike in much of the applied statistics literature, these functions are not set arbitrarily.
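The trade-off between prior and observation can be illustrated with a minimal sketch. It assumes that both the prior and the likelihood are Gaussian (an assumption made here for illustration), in which case the posterior mean is a weighted average of the observation and the prior mean. The function name and all numerical values are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_sd, y, likelihood_sd):
    """Posterior over x for a Gaussian prior and a Gaussian likelihood.

    Returns the posterior mean, the posterior SD, and the weight
    placed on the observation y.
    """
    prior_var = prior_sd ** 2
    like_var = likelihood_sd ** 2
    # Weight on the observation: close to 1 when the likelihood is narrow
    # (reliable input), close to 0 when it is wide (unreliable input).
    weight = prior_var / (prior_var + like_var)
    post_mean = weight * y + (1.0 - weight) * prior_mean
    post_sd = (prior_var * like_var / (prior_var + like_var)) ** 0.5
    return post_mean, post_sd, weight


# Hypothetical values: same prior and observation, two likelihood widths.
narrow = gaussian_posterior(prior_mean=0.0, prior_sd=0.2, y=0.5, likelihood_sd=0.1)
wide = gaussian_posterior(prior_mean=0.0, prior_sd=0.2, y=0.5, likelihood_sd=0.4)
print("narrow likelihood:", narrow)
print("wide likelihood:  ", wide)
```

With the narrow likelihood the estimate stays close to the observation, whereas with the wide likelihood it is pulled toward the prior mean. The weight on the observation corresponds to the slope of the estimate as a function of the observed value.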

$P(y \mid x)$, which formulates how an observed quantity can deviate from the true value. In the example above, waves and streams blur the visual image and thus impose uncertainty on the visual information, resulting in a wider likelihood function. Other factors, like the refraction of light in the water, could cause a shift in the visual image. In this paper, for clarity and simplicity, we focus on the aspect of likelihood as representing uncertainty in sensory input. Highly uncertain sensory input indicates that we cannot place a high degree of trust in observation, thus likelihood also represents sensory reliability.

*N* = 9) received information about how the height was determined, while those in the without-instruction group (*N* = 7) did not.

$y$ is the splash location, $f_i$ is the mapped (estimated) coin location when the splash location is in the $i$-th piece, $a_i$ is the left edge of the $i$-th piece, and $b_i$ and $c_i$ are parameters.

$m_i$ is the number of data points in the $i$-th piece and $f_{i,j}$ and $x_{i,j}$ are the estimated coin location and the actual coin location corresponding to the $j$-th splash in the $i$-th piece. We added the second normalization term for a smooth mapping function to avoid initial instability. After each trial, the model learns the piecewise linear function up to the trial. The only free parameter of this mapping model is the coefficient $\lambda$ of the normalization factor.

$n$ trials by linearly regressing the coin locations given as feedback on the splash locations. In each trial, the observer simply predicts the coin location based on the observed splash location using the regressed line. There is only one free parameter, $n$.
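A minimal sketch of this kind of recent-trials regression is given below. It assumes that the regressed line has both a slope and an intercept and that it is fitted by ordinary least squares; the function name and the toy data are hypothetical and are not taken from the experiment.

```python
import numpy as np


def recent_slope_prediction(splashes, coins, y_now, n):
    """Predict the coin location from a line fitted to the last n trials.

    splashes, coins: past splash locations and the coin locations given
    as feedback, in trial order. Requires n >= 2 and at least two past trials.
    """
    y_recent = np.asarray(splashes[-n:], dtype=float)
    x_recent = np.asarray(coins[-n:], dtype=float)
    # Least-squares line: coin location as a function of splash location.
    slope, intercept = np.polyfit(y_recent, x_recent, 1)
    return slope * y_now + intercept, slope


# Noise-free toy data in which the coin is always at half the splash location.
splashes = [0.0, 1.0, 2.0, 3.0]
coins = [0.0, 0.5, 1.0, 1.5]
prediction, slope = recent_slope_prediction(splashes, coins, y_now=4.0, n=3)
print(prediction, slope)
```

Because only the last $n$ trials enter the fit, the estimated slope follows a change in likelihood width with a lag of roughly $n$ trials.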

$h^t$ is one of the two possible heights (1 or 2), $\sigma_1$ and $\sigma_2$ are the likelihood widths for each possible height, and $t$. The height at the beginning of an experiment is defined as $h^1 = 1$. In trial $t$, the optimal observer's task after observing $y^t$ is to compute the mean value of the posterior probability distribution $P(x^t \mid x^{1:t-1}, y^{1:t})$, where $y^{1:t}$ denotes all $y$s from time 1 up to time $t$. Assuming the probabilistic structure of the generative model of the task, after some calculation, this posterior can be written as

$x^t$, $x^t$, the observer needs to predict $h^t$, $\sigma_1$, and $\sigma_2$ (the last term). After observing $x^t$ and $y^t$, the prediction can be updated as $x^{t+1}$ in the next trial. The initial priors of $\sigma_1$ and $\sigma_2$ are assumed to be inversely proportional to their values (Jeffreys priors). The likelihood functions are assumed to be Gaussian.

$P(h^{t+1} \mid h^t)$ is assumed to be constant over time: $P(h^{t+1} \mid h^t) = 1 - \alpha$ if $h^{t+1} = h^t$, and $\alpha$ otherwise. Here, $\alpha$ is the switching probability and is the only free parameter in this learning model. In the implementation of the model, the likelihood function is estimated at every time step and updated, taking into account both the possibility of there being no switch and the possibility of a switch, each weighted by its respective probability.
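The role of the switching probability can be sketched with a simplified update of the belief about the current height. This sketch assumes that the two likelihood widths are already known, whereas the model described here also learns them, and that the deviation of the splash from the coin is Gaussian. All names and values are hypothetical.

```python
import math


def gaussian_pdf(deviation, sd):
    """Density of a zero-mean Gaussian with standard deviation sd."""
    return math.exp(-0.5 * (deviation / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def update_height_belief(belief, x, y, sigmas, alpha):
    """One trial of belief updating over the two heights.

    belief: [P(h=1), P(h=2)] for the current trial, before feedback.
    x, y:   coin location (feedback) and splash location on this trial.
    sigmas: assumed known likelihood widths (sigma_1, sigma_2).
    alpha:  switching probability.
    Returns the predicted belief for the next trial.
    """
    # Weight each height by how well its width explains the deviation y - x.
    posterior = [b * gaussian_pdf(y - x, s) for b, s in zip(belief, sigmas)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    # Mix the no-switch case (probability 1 - alpha) with the switch case (alpha).
    return [
        (1.0 - alpha) * posterior[0] + alpha * posterior[1],
        alpha * posterior[0] + (1.0 - alpha) * posterior[1],
    ]


# A large deviation is far better explained by the wide likelihood (height 2).
next_belief = update_height_belief(
    belief=[0.5, 0.5], x=0.0, y=0.3, sigmas=(0.05, 0.3), alpha=0.05
)
print(next_belief)
```

A larger $\alpha$ keeps the predicted belief closer to uniform, so the observer adapts faster after a switch but is less certain between switches.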

$x$ were discretized in the range $[-1.5, 1.5]$ with steps of 0.01 (301 points), and distributions over $\sigma$s were discretized in the range $[e^{-5}, e]$ with steps of 0.01 in power (601 points). We approximated the integrals by summing up the discretized probability distributions with appropriate weights and normalizing them afterwards.
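The grids described above can be reproduced with a short sketch. The text does not specify the integration weights, so the weighting used here (each $\sigma$ grid point covers a width proportional to $\sigma$ times the step in the exponent) is an assumption, as are the variable names and the example density.

```python
import numpy as np

# Grid over x: [-1.5, 1.5] in steps of 0.01 (301 points).
x_grid = np.linspace(-1.5, 1.5, 301)

# Grid over sigma: exponents from -5 to 1 in steps of 0.01 (601 points).
log_sigma = np.linspace(-5.0, 1.0, 601)
sigma_grid = np.exp(log_sigma)

# Assumed integration weights on the log-spaced grid: d(sigma) = sigma * d(log sigma).
sigma_weights = sigma_grid * (log_sigma[1] - log_sigma[0])

# A prior density proportional to 1/sigma then has equal mass at every grid point.
prior_mass = (1.0 / sigma_grid) * sigma_weights
prior_mass /= prior_mass.sum()


def grid_mean(density):
    """Mean of a density tabulated on the evenly spaced x grid."""
    p = density / density.sum()  # normalize after summation
    return float((x_grid * p).sum())


# Example: the mean of a Gaussian-shaped density centered at 0.3.
example_mean = grid_mean(np.exp(-0.5 * ((x_grid - 0.3) / 0.1) ** 2))
print(len(x_grid), len(sigma_grid), example_mean)
```

Under this weighting, the prior inversely proportional to $\sigma$ is flat over the exponent grid, which is one reason a log-spaced grid is convenient for scale parameters.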

$y$ is the cue position, $y$-intercept of the regressed line from the pooled data (see Figure 2A) for each subject. If the subject had a prior mean other than 0, we expected this to be reflected in the $y$-intercept value. We found a small bias in the narrow likelihood condition for the with-instructions group (*t* test across subjects, *p* = 0.01). However, the value of this bias was much smaller (0.0025 ± 0.0023 *SD*) than the width of the vertical blue bar (0.01) used to indicate response making, thus the bias was only marginal. We found no significant $y$-intercept values for the other conditions (wide likelihood condition for the with-instructions group and both conditions for the without-instructions group). Therefore, we can reasonably assume that the subjects obtained a nearly correct prior mean.

*t* test for data from 30–70 trials after each switch, *p* < $10^{-5}$ for all conditions). For the narrow likelihood condition, both groups placed insufficient weight on vision. While we found some weak biases, subject data rapidly converged to near-optimal slopes. We will discuss these deviations later in the Discussion.

*t* test for data from 30–70 trials after switch, *p* < $10^{-5}$), indicating that less learning occurred in the condition without instructions about likelihood change. In the with-instructions group, all of the subjects showed a clear difference between the two conditions. However, in the without-instructions group, some subjects showed a large difference as in the with-instructions group, while other subjects showed a minimal difference (data not shown). Averaging the data from the two types of subjects in the without-instructions group resulted in a weaker overall learning level for that group. Instructions thus appear to be important for this type of learning task.

$n$ trials. This model can adapt to changes in likelihood. This strategy is less computationally demanding and easier to implement than the fully Bayesian model below. (c) Bayesian learning: We used a fully Bayesian optimal model that takes likelihood switching into account. This model contains a generative model of the task (Figure 5A), which it uses to make estimations about the coin location. At trial $t$, the splash location $y^t$ is observed and the optimal observer estimates the unknown coin location $x^t$, and to do so, it has to estimate the likelihood width

*p* < $10^{-8}$, and post hoc Tukey-Kramer test with 95% confidence interval). The estimated free parameter of the optimal model, the switching probability (0.051 ± 0.020 *SE* in the with-instruction group), was a little higher than the actual average value of the experimental setting, 1/85 ≈ 0.012, but was not significantly different (*t* test, *p* = 0.09). The best-fitted parameter $n$ of the recent slope model was 22.1 ± 2.6 *SE*. This best fit of the optimal model suggests that the subjects learned the likelihood in a very efficient way. The result for the without-instructions group was similar, although the fitted parameter was much more broadly distributed across subjects (0.26 ± 0.12 *SE*). We also checked the performance of the optimal model when the mode of the posterior distribution (MAP estimate), instead of the mean, was taken as the estimate of the coin location. The performance of the two variants was similar, but the model that takes the mean was significantly better (Supplementary Figure S1).

*t* test, *p* = 0.0002, comparison between the slope calculated from the last 100 trials in the first phase and the slope from the first 100 trials in the last phase). Note that even though the two lines during the last phase (Figure 6B) look close, each subject contributes to either the red or the blue line, not both. The slopes were significantly different when analyzed for each subject (paired two-sample *t* test, *p* = 0.0004).

*t* test, *p* = 0.03). Thus, even this extended model of slope learning cannot explain the data. These results clearly show that the subjects learned the likelihood, not the slope, in a context-dependent way, and combined the learned likelihood with the new prior.
