**Abstract**

Our perceptions are fundamentally altered by our expectations, i.e., priors about the world. In previous statistical learning experiments (Chalk, Seitz, & Seriès, 2010), we investigated how such priors are formed by presenting subjects with white, low-contrast moving dots on a blank screen, using a bimodal distribution of motion directions such that two directions were presented more frequently than the others. We found that human observers quickly and automatically developed expectations for the most frequently presented directions of motion. Here, we examine the specificity of these expectations: can one learn simultaneously to expect different motion directions for dots of different colors? We interleaved moving-dot displays of two different colors, either red or green, with different motion direction distributions. When one distribution was bimodal while the other was uniform, we found that subjects learned a single bimodal prior for the two stimuli. On the contrary, when both distributions were similarly structured, we found evidence for the formation of two distinct priors, which significantly influenced the subjects' behavior when no stimulus was presented. Our results can be modeled within a Bayesian framework and are discussed in terms of a suboptimality of the statistical learning process under some conditions.

^{2}, moving coherently at a speed of 9°/sec within a circular annulus that had a minimum and a maximum diameter of 2.2° and 7°, respectively. They were generated using the Matlab programming language with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) and displayed on a Mitsubishi DiamondPro 750SB monitor with a resolution of 1024 × 768 at 100 Hz. Participants viewed the display in a darkened room at a viewing distance of 70 cm. The display luminance was calibrated and linearized with a Cambridge Research Systems colorimeter separately for each color. The background luminance was set to 5 cd/m^{2}.

(1 − *α*) · *V*(*μ*, *κ*) + *α*/(2*π*), where *α* is the proportion of trials where the participant makes random estimates, and *V*(*μ*, *κ*) is a von Mises (circular normal) distribution with mean *μ* and width 1/*κ*, given by: *V*(*μ*, *κ*) = exp[*κ* · cos(*θ* − *μ*)]/(2*π* · *I*_{0}(*κ*)). The parameters were chosen by maximizing the likelihood of generating the data from the distribution. Participants' estimation mean and standard deviation were taken as the circular mean and standard deviation of the fitted von Mises distribution. This approach yields more consistent and significantly smaller variances across participants, motion directions, and contrasts than merely averaging over trials, without compromising the qualitative aspect of the results.
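As a concrete sketch of this fitting procedure (not the authors' code; the optimizer, parameterization, and simulated data below are assumptions), the mixture of a von Mises component and a uniform lapse component can be fit by maximum likelihood in Python:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0

def neg_log_likelihood(params, theta):
    """Negative log-likelihood of the response model
    p(theta) = (1 - alpha) * V(mu, kappa) + alpha / (2*pi)."""
    mu, log_kappa, alpha = params
    kappa = np.exp(log_kappa)                 # keeps kappa positive
    alpha = np.clip(alpha, 1e-6, 1 - 1e-6)    # keeps the lapse rate in (0, 1)
    vm = np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))
    return -np.sum(np.log((1 - alpha) * vm + alpha / (2 * np.pi)))

def fit_estimates(theta, mu0=0.0):
    """Maximum-likelihood fit; returns (mu, kappa, alpha)."""
    res = minimize(neg_log_likelihood, x0=[mu0, np.log(2.0), 0.1],
                   args=(theta,), method="Nelder-Mead")
    mu, log_kappa, alpha = res.x
    return np.angle(np.exp(1j * mu)), np.exp(log_kappa), float(np.clip(alpha, 0, 1))

# Simulated session: 90% von Mises responses around +32 deg, 10% random lapses
rng = np.random.default_rng(0)
theta = np.where(rng.random(2000) < 0.9,
                 rng.vonmises(np.deg2rad(32), 4.0, 2000),
                 rng.uniform(-np.pi, np.pi, 2000))
mu, kappa, alpha = fit_estimates(theta, mu0=0.5)
print(np.rad2deg(mu))   # close to 32
```

The circular mean of the fitted von Mises is then simply *μ*, and its concentration *κ* gives the standard deviation, with the lapse term *α* absorbing the random responses instead of inflating the variance.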

*p* = 0.37 and *p* = 0.52 for the uniform and bimodal distributions, respectively, one-way analysis of variance [ANOVA]), so the data were combined across all participants. Additionally, there was no significant interaction between experimental session and motion direction on either the bias or the standard deviation (*p* = 0.65 and *p* = 0.46, respectively, four-way within-subjects ANOVA), so the data were also combined across sessions.

*p* < 0.001, two-way within-subjects ANOVA between motion direction and subjects), and the estimation bias of participants at ±16° and ±48° was significantly larger and smaller, respectively, than the bias at ±32° (*p* = 0.043 and *p* = 0.001, signed rank test). This verified that participants made estimates that were closer to the most frequently presented directions than the actual directions of the stimulus.

*p* < 0.001 for both conditions, two-way within-subjects ANOVA), and direct comparison of the estimation biases between the two conditions showed no significant difference between them (*p* = 0.1, three-way within-subjects ANOVA between motion direction, color condition, and subjects). Additionally, for the uniform condition, the estimation bias of participants at ±16° was significantly larger than the bias at ±32° (*p* = 0.012, signed rank test), and the bias at ±48° was significantly smaller than the bias at ±32° (*p* = 0.047, signed rank test). This suggests that participants tended to perceive motion direction as being more similar to ±32° than it really was, independently of the color of the dots. There appears to be a large positive bias in the bimodal condition at 0°. However, the number of trials is very small for this condition, and bootstrap analysis indicates that the difference between the biases at 0° for the two conditions is in fact not significant (Supplementary Figure 3a).

*p* = 0.017, two-way within-subjects ANOVA between standard deviation and subjects). The standard deviations showed no significant difference between the two conditions (*p* = 0.08, three-way within-subjects ANOVA between standard deviation, color condition, and subjects).

*trimodal*. The trimodal distribution had 40 trials per session for directions −64°, 0°, and +64°, four trials per session for directions −32° and +32°, and 24 trials per session for each of the other directions. The bimodal distribution had 44 trials per session for directions −32° and +32°, eight trials per session for directions −64°, 0°, and +64°, and 24 trials per session for each of the other directions. The distributions of the two colors were counterbalanced between participants in order to avoid any biases caused by color selection or color sensitivity.

*p* = 0.73 and *p* = 0.6 for the trimodal and bimodal distributions, respectively, one-way ANOVA), so the data were combined across all participants. Additionally, there was no significant interaction between experimental session and motion direction on either the bias or the standard deviation (*p* = 0.4 and *p* = 0.55, respectively, four-way within-subjects ANOVA), so the data were also combined across sessions.

*p* = 0.29, two-way within-subjects ANOVA between motion direction and subjects). This was not unexpected; because the combined distribution of the stimuli was uniform, potential biases in the estimation of each color condition might have cancelled each other out when averaged. There was no significant effect of motion direction on the estimation bias for either condition (*p* = 0.12 and *p* = 0.15 for the trimodal and bimodal, respectively, two-way within-subjects ANOVA), but there was a significant difference between the estimation biases for the two color conditions (*p* = 0.046, three-way within-subjects ANOVA between motion direction, color condition, and subjects, Figure 5b). However, these biases were weaker than in Experiment 1. The largest difference between the conditions was at ±48°, where, on average, estimates were positively biased for the trimodal condition and slightly negatively biased (or unbiased) for the bimodal condition. Additionally, at ±64°, estimates were largely unbiased for the trimodal condition but negatively biased for the bimodal condition. In contrast, at 0° (respectively ±16°), participants' estimates were negatively biased (resp. unbiased) for both conditions. Figure 5b (inset) shows the estimation biases predicted by an ideal observer who has learned the true statistics of the stimulus. These results suggest that the participants' motion-direction estimates were approximately biased towards the most frequent directions of each color condition for outwards angles (i.e., 32° and 64°, respectively, with an outwards shift for the bimodal condition) but dominated by an attraction towards the central direction for small angles, independently of the color condition.

*p* < 0.001, two-way within-subjects ANOVA between standard deviation and subjects). The highest values were at ±16° while the lowest were at ±64°. On average, stimuli closer to the central direction produced larger standard deviations than those further away. There was no significant difference between the standard deviations for the two color conditions (*p* = 0.23, three-way within-subjects ANOVA between standard deviation, color condition, and subjects).

*p*_{rel} would be equal to one if estimation was equally likely between the most frequently presented directions and other 16° bins. It is possible to investigate how quickly these biases developed by calculating the probability ratio for individual participants every 100 trials for both sessions (including all responses up to that point, Supplementary Figure 9). For the bimodal condition, the median value of *p*_{rel} was significantly larger than one at the most frequently presented directions of that distribution (±32°) after only 200 trials of the first session. On the other hand, for the trimodal condition it took approximately 400 and 900 trials for the probability ratio to become significantly larger than one for the most frequently presented directions of that distribution (0° and ±64°, respectively), suggesting that it may have taken longer to learn the trimodal distribution. Also, the probability ratios for the most frequently presented directions of the opposite distribution (±32° for the trimodal and 0° and ±64° for the bimodal) were never significantly larger than one.
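As an illustration of this measure (a sketch only; the paper's exact binning and normalization of *p*_{rel} may differ), one can compare the average estimate count per 16° bin centered on the frequent directions against the average count per remaining bin, so that a value of one means estimates are spread evenly:

```python
import numpy as np

def prob_ratio(estimates_deg, frequent_dirs, bin_width=16):
    """Relative probability of estimates landing in the bins centered on the
    most frequent directions versus the remaining 16-deg bins (ratio of mean
    counts per bin; ~1 when estimates are spread evenly). Circular wrap at
    +/-180 deg is ignored in this sketch."""
    edges = np.arange(-184, 185, bin_width)   # bins centered on multiples of 16
    counts, _ = np.histogram(np.asarray(estimates_deg), bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2
    freq = np.any([np.abs(centers - d) < bin_width / 2
                   for d in frequent_dirs], axis=0)
    mean_freq = counts[freq].mean()    # mean count per frequent bin
    mean_rest = counts[~freq].mean()   # mean count per remaining bin
    return mean_freq / mean_rest

rng = np.random.default_rng(1)
uniform_est = rng.uniform(-180, 180, 5000)
biased_est = np.concatenate([uniform_est,
                             rng.normal(32, 5, 2000),
                             rng.normal(-32, 5, 2000)])
r_uni = prob_ratio(uniform_est, [-32, 32])
r_bias = prob_ratio(biased_est, [-32, 32])
print(r_uni)    # ~1
print(r_bias)   # well above 1
```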

*Bayesian information criterion* (BIC), which is defined as BIC = −2 · ln(*L*) + *k* · ln(*n*), where *L* is the likelihood of generating the experimental data from the model, *k* is the number of parameters in the model, and *n* is the number of data points available. The first term quantifies the error between the data and the model predictions, while the second term penalizes increasing model complexity; the model with the lower BIC value should be preferred when comparing two models (Schwarz, 1978). The Bayesian model was found to exhibit significantly smaller BIC values than all other models and produced fits for the estimation bias and the standard deviation that were at least on par with the first class of models despite having fewer free parameters. This suggests that a Bayesian strategy was the best description of the participants' behavior.
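The BIC computation itself is a one-liner; a small sketch (with illustrative numbers, not the paper's fitted likelihoods) shows how the ln(*n*) penalty can overturn a small likelihood advantage:

```python
import numpy as np

def bic(log_likelihood, k, n):
    """Bayesian information criterion: BIC = -2*ln(L) + k*ln(n); lower is better."""
    return -2.0 * log_likelihood + k * np.log(n)

# A 4-parameter model that fits slightly better than a 2-parameter model
# can still lose once the complexity penalty is applied.
n = 500
bic_simple = bic(log_likelihood=-700.0, k=2, n=n)    # -> 1412.43
bic_complex = bic(log_likelihood=-695.0, k=4, n=n)   # -> 1414.86
print(bic_simple < bic_complex)                      # True: prefer the simpler model
```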

*response strategy* models, and we again found that the Bayesian model was able to fit the data accurately and exhibited significantly better BIC values than the other models (Supplementary Figure 5). Next, we evaluated several extended versions of the simple Bayesian model that took into account the statistical information of the two colored conditions and compared them to the simple model. We will briefly describe the Bayesian models before reporting their performances. A more detailed description of the simple Bayesian model can be found in Chalk et al. (2010).

(*θ*_{obs}) of the stimulus motion direction (*θ*) with a probability *p*_{l}(*θ*_{obs}|*θ*) = *V*(*θ*, *κ*_{l}), where *V*(*θ*, *κ*_{l}) is a circular normal distribution with width 1/*κ*_{l}. The posterior probability that the stimulus is moving in a particular direction *θ*, using Bayes' rule, is given by multiplying the likelihood function *p*_{l}(*θ*_{obs}|*θ*) with the prior probability *p*_{prior}(*θ*): It was hypothesized that participants could not access the true prior, *p*_{prior}(*θ*), so they learned an approximation of this distribution, *p*_{exp}(*θ*). This approximation was defined as the sum of two circular normal distributions, each with width determined by 1/*κ*_{exp} and centered on motion directions −*θ*_{exp} and *θ*_{exp}, respectively: Participants were assumed to make perceptual estimates of motion direction *θ*_{perc} by choosing the mean of the posterior distribution: where *Z* is a normalization constant. Finally, it was hypothesized that there is a certain amount of noise associated with moving the mouse to indicate the direction the stimulus is moving and that the participants make completely random estimates in a fraction of trials *α*. The estimation response *θ*_{est} given the perceptual estimate *θ*_{perc} is then: where the magnitude of the motor noise is determined by 1/*κ*_{m}. We assumed that the perceptual uncertainty at the highest contrast was close to zero (1/*κ*_{l} ∼ 0). So, by substituting *θ*_{perc} = *θ* and using Equation 4 we fit participants' estimation distributions at high contrast in order to approximate the width of the motor noise (1/*κ*_{m}) for each participant for all models.

*θ*_{exp} and 1/*κ*_{exp}, respectively), the width of the participants' sensory likelihood (1/*κ*_{l}), and the fraction of trials where they made completely random estimations (*α*).
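Numerically, the model's perceptual estimate can be sketched on a grid (an illustration under assumed parameter values, not the authors' fitted ones): the posterior is the product of a von Mises likelihood and the bimodal learned prior, and the estimate is the posterior's circular mean.

```python
import numpy as np
from scipy.special import i0

def vonmises_pdf(theta, mu, kappa):
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

def posterior_mean_estimate(theta_obs, theta_exp, kappa_exp, kappa_l):
    """Perceptual estimate as the circular mean of the posterior
    p(theta | theta_obs) proportional to V(theta_obs; theta, kappa_l) * p_exp(theta),
    with p_exp a sum of two von Mises centered on +/-theta_exp."""
    grid = np.linspace(-np.pi, np.pi, 3600, endpoint=False)
    likelihood = vonmises_pdf(grid, theta_obs, kappa_l)
    prior = 0.5 * (vonmises_pdf(grid, -theta_exp, kappa_exp)
                   + vonmises_pdf(grid, theta_exp, kappa_exp))
    posterior = likelihood * prior
    posterior /= posterior.sum()
    return np.angle(np.sum(posterior * np.exp(1j * grid)))  # circular mean

# An observation at 48 deg is attracted towards the nearer prior peak at 32 deg
est = posterior_mean_estimate(np.deg2rad(48), np.deg2rad(32),
                              kappa_exp=4.0, kappa_l=8.0)
print(np.rad2deg(est))   # ~41: between the observation and the peak
```

This reproduces the qualitative signature reported in the text: estimates at directions flanking a prior peak are pulled towards it.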

*p*_{prior}(*θ*) in the following way: where *c*_{pr} is a free parameter fitted for each participant, *p*_{uniform} is a uniform distribution identical to the distribution of the uniform stimuli (Figure 1b), and *p*_{bimodal}(*θ*) is equal to Equation 2. The model had a total of five free parameters (*θ*_{exp}, *κ*_{exp}, *κ*_{l}, *α*, and *c*_{pr}).
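The Uni+Bi prior can be sketched as a weighted mixture of the two component distributions. In the snippet below the placement of *c*_{pr} on the bimodal component is an assumption (the exact mixture equation is not reproduced here); it is chosen so that the fitted values of *c*_{pr} ≈ 0.87 reported later correspond to the bimodal part dominating.

```python
import numpy as np
from scipy.special import i0

def vonmises_pdf(theta, mu, kappa):
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

def uni_bi_prior(theta, c_pr, theta_exp, kappa_exp):
    """Mixture prior for the Uni+Bi model: c_pr weights the bimodal
    expectation (two von Mises at +/-theta_exp), 1 - c_pr the uniform part.
    (Weighting convention is an assumption, not the paper's equation.)"""
    theta = np.asarray(theta, dtype=float)
    bimodal = 0.5 * (vonmises_pdf(theta, -theta_exp, kappa_exp)
                     + vonmises_pdf(theta, theta_exp, kappa_exp))
    return c_pr * bimodal + (1 - c_pr) / (2 * np.pi)

grid = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
p = uni_bi_prior(grid, c_pr=0.87, theta_exp=np.deg2rad(32), kappa_exp=4.0)
total = p.sum() * (grid[1] - grid[0])
print(total)   # ~1: a proper probability density
```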

*p*_{gaussian}(*θ*) = *V*(−*θ*_{u}, *κ*_{u}). This model was inspired by data inspection showing that participants tend to exhibit an attraction towards the central direction. This model required two additional free parameters (*θ*_{u}, *κ*_{u}).

*p* = 0.006 for Uni+Bi, *p* = 0.001 for Gaus+Bi, and *p* < 0.001 for Split_UniBi, Split_GausBi, and Split_2Bimodal, signed rank test). The single-prior models performed better than the Split models, and the bimodal prior dominated over the uniform and Gaussian priors, with averaged *c*_{pr} values of 0.87 ± 0.14 and 0.8 ± 0.22, respectively. The best-performing Uni+Bi model was still significantly worse than the simple 1Bimodal model.

The BIC values exhibited by the theoretically optimal Split_UniBi model were not significantly better than those of the other Split models, despite it having two fewer free parameters than the other two models (Figure 7a). This strongly suggests that participants did not learn a uniform prior for the uniform condition. Moreover, the Split_2Bimodal model exhibited significantly better values than the other two models (*p* = 0.02 and *p* = 0.002 compared to Split_UniBi and Split_GausBi, respectively), which indicates that participants learned bimodal priors for both the uniform and bimodal conditions. However, its values were still significantly worse than those of the simple 1Bimodal model.

*k* − 2 · ln(*L*) + 2 · *k* · (*k* + 1)/(*n* − *k* − 1), where *L* is the likelihood of generating the experimental data from the model, *k* is the number of parameters in the model, and *n* is the number of data points available. The AIC penalizes the number of parameters less strongly than the BIC. Even so, the extended models perform worse than the 1Bimodal model, nonsignificantly for the single-prior models and significantly for the Split models (Figure 7b).

*p* = 0.45, three-way within-subjects ANOVA between motion direction, model, and subjects), further suggesting that participants did not form two independent priors for the two color conditions. Additionally, while the Split_UniBi predictions were not significantly different from the 1Bimodal predictions (*p* = 0.14, three-way within-subjects ANOVA), they exhibited a larger mean absolute error (3.25°, compared to 1.87° for the 1Bimodal model and 1.85° for Split_2Bimodal).

*κ*_{l} and *α*). The model's predictions of the estimation biases differed significantly from participants' estimation biases, with a mean absolute error of 6.02° (Figure 9a), suggesting that participants formed nonuniform priors.

*θ*_{exp}, *κ*_{exp}, *θ*2_{exp}, *κ*2_{exp}, *κ*_{l}, and *α*) and the latter eight (adding *θ*3_{exp} and *κ*3_{exp}). The predictions of these models were significantly more accurate than the predictions of the Uniform model, with mean absolute errors of 5.89° and 5.76°, respectively (Figure 9a). The prior distributions predicted by the models differed extensively between participants. The standard deviation predicted by the models was larger than the experimental results (Figure 9c). However, the qualitative trend for the standard deviation to decrease away from the central direction displayed by all models (but the Uniform) was consistent with the data.

*θ*_{tri}, *κ*_{tri}, *θ*_{bim}, *κ*_{bim}, *κ*_{l}, and *α*) and corresponds to a model of the optimal observer. The models Split_2Circ and Split_3Circ are similar to 2Circ and 3Circ defined above but now with two distinct priors, one for each condition, requiring 10 and 14 free parameters, respectively. As can be expected from the models' increased complexity, the estimation biases predicted by the Split models were closer to the experimental results, with mean absolute errors of 5.72°, 5.66°, and 5.67°, respectively. The models predict different biases for the two color conditions (Figures 9b_{1} and 9b_{2}). The Split_TriBi model provides very accurate predictions for both conditions at ±48° and ±64° but fails at ±16°. This suggests that the participants' estimation performances were more weakly biased towards the central direction for the trimodal condition than expected and that this attractive bias possibly transferred to the bimodal condition. The predicted standard deviations of these models did not differ much from those predicted by the single-prior models (Figure 9d).

*θ*_{exp} values and 0°, 32°, and 64°. We found that the *θ*_{exp} values were distributed in a similar way across all models. We calculated the ratios of *θ*_{exp} values based on their proximity to each direction for each of the four models; the percentages of *θ*_{exp} values (averaged over all models) which fell closest to one of the most frequent directions (rather than to the other two directions) were 46.4% ± 5.2% for ±64°, 37% ± 4.5% for ±32°, and 16.6% ± 4.7% for 0°. The average minimum absolute difference from the frequently presented directions was 8.45° ± 2.6° (averaged over all models). This suggests that, on average, participants do learn a distribution with peaks located around the most frequent directions. However, it seems that the representation of the central direction is suppressed compared to the other directions (±32° and ±64°).

_{exp}*, 7(10), 1057–1058. [CrossRef] [PubMed]*

*Nature Neuroscience**, 10(8):2, 1–18, http://www.journalofvision.org/content/10/8/2, doi:10.1167/10.8.2. [PubMed] [Article] [CrossRef] [PubMed]*

*Journal of Vision**, 10(7), 287–291. [CrossRef] [PubMed]*

*Trends in Cognitive Sciences**, 20(1), 91–117. [CrossRef] [PubMed]*

*Neural Computation**(pp. 53–94). Cambridge, MA: MIT Press.*

*High-level motion processing: Computational, neurobiological, and psychophysical perspectives**, 415(6870), 429–433. [CrossRef] [PubMed]*

*Nature**, 99(24), 15822. [CrossRef]*

*Proceedings of the National Academy of Sciences*

*Psychological Science**,*12(6), 499. [CrossRef] [PubMed]

*Trends in Cognitive Sciences**,*14(3), 119–130. [CrossRef] [PubMed]

*, 17(9), 767. [CrossRef] [PubMed]*

*Psychological Science**, 4(12):1, 967–992, http://www.journalofvision.org/content/4/12/1, doi:10.1167/4.12.1. [PubMed] [Article] [CrossRef] [PubMed]*

*Journal of Vision**, 427(6971), 244–247. [CrossRef] [PubMed]*

*Nature*

*Nature Neuroscience**,*9(11), 1432–1438. [CrossRef] [PubMed]

*Spatial Vision**,*10(4), 437–442. [CrossRef] [PubMed]

*Perception**,*27

*,*393–402. [CrossRef] [PubMed]

*, 11(1), 53–60. [PubMed]*

*Nature Reviews Neuroscience**, 8(15):3, 1–10, http://www.journalofvision.org/content/8/15/3, doi:10.1167/8.15.3. [PubMed] [Article] [CrossRef] [PubMed]*

*Journal of Vision*

*The Annals of Statistics**,*6(2)

*,*461–464. [CrossRef]

*Neuron**,*24(4), 911–917. [CrossRef] [PubMed]

*Perception**,*36(10), 1445–1454. [CrossRef] [PubMed]

*, 59(2), 336–347. [CrossRef] [PubMed]*

*Neuron**, 32(2), 351–358. [CrossRef] [PubMed]*

*Neuron**, 24(5), 295–300. [CrossRef] [PubMed]*

*Trends in Neurosciences*

*Nature Neuroscience**,*3

*,*270–276. [CrossRef] [PubMed]

*Nature**,*382(6591), 539. [CrossRef] [PubMed]

*Trends in Cognitive Sciences**,*12(8), 291–297. [CrossRef] [PubMed]

*, 34(2), 399. [CrossRef] [PubMed]*

*Journal of Experimental Psychology: Learning, Memory, and Cognition**, 35(1), 195. [CrossRef] [PubMed]*

*Journal of Experimental Psychology: Human Perception and Performance**, 13(4), 479–491. [CrossRef] [PubMed]*

*Journal of Cognitive Neuroscience**, 24(4), 901–909. [CrossRef] [PubMed]*

*Neuron**, 5(6), 495–501. [CrossRef] [PubMed]*

*Nature Reviews Neuroscience*