Human combination of crossmodal information has been shown to be highly consistent with an optimal Bayesian model performing causal inference. These findings have shed light on the computational principles governing crossmodal integration/segregation. Intuitively, in a Bayesian framework priors represent *a priori* information about the environment, i.e., information available prior to encountering the given stimuli, and are thus not dependent on the current stimuli. While many consider this interpretation a defining characteristic of Bayesian computation, Bayes' rule *per se* does not require that priors remain constant despite significant changes in the stimulus; therefore, demonstrating that a task is performed Bayes-optimally does not imply that the priors are invariant to varying likelihoods. This issue has not been addressed before. Here we empirically investigated the independence of the priors from the likelihoods by strongly manipulating the presumed likelihoods (using two drastically different sets of stimuli) and examining whether the estimated priors change or remain the same. The results suggest that the estimated prior probabilities are indeed independent of the immediate input and, hence, of the likelihood.

$\nu$) and prior distributions: Posterior($\nu$) ∝ Likelihood($\nu$) × Priors. Demonstrating that a task is performed in a fashion consistent with Bayesian inference in one stimulus regime (e.g., $\nu_1$) does not necessarily predict that the priors used under that stimulus regime would be the same as those under a different stimulus regime (e.g., $\nu_2$):

*per se*. Previously, testing these questions was not possible because no theoretical framework for fitting the priors existed; one is now available (Körding et al., 2007). Here, we empirically tested whether priors are indeed stable in the face of substantial changes in the sensory conditions (see Equation 1). If we find that subjects generate their posteriors under two different conditions (e.g., a difference in visual contrast $\nu$) according to Equation 1, then the likelihoods are expected to be different (i.e., Likelihood($\nu_1$) ≠ Likelihood($\nu_2$)). On the other hand, the priors may or may not be different between the two conditions. The priors would only be the same (i.e., $\text{Prior}_1 = \text{Prior}_2$) if indeed they are independent of likelihoods.

*a priori*. Instead it assumes that the sensory signals $x_V$ and $x_A$ are caused by either a single source $s$ (Figure 2 left) or by two separate sources, $s_A$ and $s_V$ (Figure 2 right).

$x_V$ and $x_A$ represent the visual and auditory signals, respectively, and are assumed to be conditionally independent, based on the observation that the auditory and visual signals are processed in separate pathways and are likely corrupted by independent noise.

$x_V$ and $x_A$ the Bayesian observer therefore has to estimate whether the two signals originate from a common cause ($C = 1$) or from two separate causes ($C = 2$). How likely each scenario is depends on how similar the auditory and visual sensations ($x_V$ and $x_A$) are. According to Bayes' rule, the probability of there being a single cause is:
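In the notation of this section, and assuming the standard form given by Körding et al. (2007), this posterior can be written as:

$$
p(C = 1 \mid x_V, x_A) =
\frac{p(x_V, x_A \mid C = 1)\, p_C}
     {p(x_V, x_A \mid C = 1)\, p_C + p(x_V, x_A \mid C = 2)\,(1 - p_C)}
$$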

$p_C$ denotes the prior probability of a single cause in the environment and $p(x_V, x_A \mid C = 1)$ and $p(x_V, x_A \mid C = 2)$ can be found by marginalizing over $s_A$ and $s_V$ (see Körding et al., 2007). Given this knowledge, the optimal solution for the location that minimizes the mean expected squared error is:
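Assuming the model-averaging scheme of Körding et al. (2007), and writing location estimates with a hat (a notational assumption), the estimate is a mixture of the two conditional estimates, weighted by the posterior probability of each causal structure:

$$
\hat{s}_V = p(C = 1 \mid x_V, x_A)\,\hat{s}_{C=1} + \bigl(1 - p(C = 1 \mid x_V, x_A)\bigr)\,\hat{s}_{V,C=2}
$$

$$
\hat{s}_A = p(C = 1 \mid x_V, x_A)\,\hat{s}_{C=1} + \bigl(1 - p(C = 1 \mid x_V, x_A)\bigr)\,\hat{s}_{A,C=2}
$$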

$\hat{s}_{V\,\text{or}\,A}$ is the visual or audio response, $\hat{s}_{C=1}$ is the optimal estimate if we were certain that there is a single cause, and $\hat{s}_{V,C=2}$, $\hat{s}_{A,C=2}$ are visual and auditory unimodal estimates, respectively, if we were certain that the two stimuli are independent (two causes). We assume that the unimodal likelihoods, $p(x_V \mid s_V)$ and $p(x_A \mid s_A)$, as well as the prior probability distribution over locations (assuming $p(s) = p(s_V) = p(s_A)$), are normally distributed with means and variances ($\mu_V$, $\sigma_V^2$), ($\mu_A$, $\sigma_A^2$), and ($\mu_P$, $\sigma_P^2$), respectively. Thus:

$C$ is binomially distributed with $P(C = 1) = p_C$. We assume that the means of the likelihoods are at the veridical locations and that the mean of the prior distribution over locations is at the fixation point, 0 deg. In order to relate the theoretical posterior to the subjects' responses, we assume that subjects try to limit their mean deviation and therefore report the mean of their posterior. The four free parameters ($\sigma_A$, $\sigma_V$, $\sigma_P$, $p_C$) were fitted to the participants' responses using 10,000 trials of Monte Carlo simulation and MATLAB's *fminsearch* function (Mathworks, 2006), maximizing the likelihood of the parameters of the model.
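To make the model concrete, the following is a minimal Python sketch of the causal-inference observer under the Gaussian assumptions stated above. The closed-form expressions follow Körding et al. (2007). The function names (`posterior_common_cause`, `model_averaged_estimates`, `simulate_mean_response`) are hypothetical, and the original analysis used MATLAB rather than Python, so this should be read as an illustration, not as the authors' code.

```python
import math
import random


def posterior_common_cause(x_v, x_a, sig_v, sig_a, sig_p, p_c, mu_p=0.0):
    """Return p(C=1 | x_v, x_a) for Gaussian likelihoods and spatial prior."""
    var_v, var_a, var_p = sig_v ** 2, sig_a ** 2, sig_p ** 2
    # Likelihood of both signals given one shared source (s marginalized out).
    denom1 = var_v * var_a + var_v * var_p + var_a * var_p
    quad1 = ((x_v - x_a) ** 2 * var_p + (x_v - mu_p) ** 2 * var_a
             + (x_a - mu_p) ** 2 * var_v) / denom1
    like_c1 = math.exp(-0.5 * quad1) / (2 * math.pi * math.sqrt(denom1))
    # Likelihood given two independent sources (s_V and s_A marginalized out).
    quad2 = ((x_v - mu_p) ** 2 / (var_v + var_p)
             + (x_a - mu_p) ** 2 / (var_a + var_p))
    like_c2 = math.exp(-0.5 * quad2) / (
        2 * math.pi * math.sqrt((var_v + var_p) * (var_a + var_p)))
    return like_c1 * p_c / (like_c1 * p_c + like_c2 * (1 - p_c))


def model_averaged_estimates(x_v, x_a, sig_v, sig_a, sig_p, p_c, mu_p=0.0):
    """Return (s_hat_v, s_hat_a), the posterior-mean location estimates."""
    w_v, w_a, w_p = 1 / sig_v ** 2, 1 / sig_a ** 2, 1 / sig_p ** 2
    # Precision-weighted estimates for each causal structure.
    s_c1 = (w_v * x_v + w_a * x_a + w_p * mu_p) / (w_v + w_a + w_p)
    s_v_c2 = (w_v * x_v + w_p * mu_p) / (w_v + w_p)
    s_a_c2 = (w_a * x_a + w_p * mu_p) / (w_a + w_p)
    post = posterior_common_cause(x_v, x_a, sig_v, sig_a, sig_p, p_c, mu_p)
    return (post * s_c1 + (1 - post) * s_v_c2,
            post * s_c1 + (1 - post) * s_a_c2)


def simulate_mean_response(s_v, s_a, sig_v, sig_a, sig_p, p_c,
                           n_trials=10000, seed=0):
    """Monte Carlo mean of the model's visual and auditory reports."""
    rng = random.Random(seed)
    total_v = total_a = 0.0
    for _ in range(n_trials):
        # Noisy internal signals centred on the veridical stimulus positions.
        x_v = rng.gauss(s_v, sig_v)
        x_a = rng.gauss(s_a, sig_a)
        est_v, est_a = model_averaged_estimates(
            x_v, x_a, sig_v, sig_a, sig_p, p_c)
        total_v += est_v
        total_a += est_a
    return total_v / n_trials, total_a / n_trials


# Example: visual source at -10 deg, auditory source at +10 deg, using the
# group-level high-contrast parameter values reported in the results table.
print(simulate_mean_response(-10.0, 10.0, 2.12, 8.76, 11.55, 0.24))
```

A fitting step analogous to *fminsearch* would wrap a simulation of this kind in a derivative-free optimizer over the four free parameters; that step is omitted here, and the sketch returns only mean responses, not the full response distributions.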

$R^2$ over 300 data points (12 (_{A}, _{V}) combinations at 25 bimodal conditions). The average human observers' performance (pooled across subjects) is remarkably consistent with the Bayesian observer in the high-contrast session, yielding $R^2 = 0.97$. The fit is also good, though lower, for the low-contrast session ($R^2 = 0.75$), owing to the larger variability in the visual data. The consistency of the human and Bayesian observer indicates that human sensory cue combination/segregation is Bayes-optimal. We have previously compared the performance of several different models on this task and found that the Causal Inference model performs the best among these (Körding et al., 2007).
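As an illustration of the goodness-of-fit measure, here is a minimal Python sketch that assumes the ordinary coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, computed between observed and model-predicted values. The exact variant of $R^2$ used in the analysis is not specified in this section, and the function name `r_squared` and the toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the model leaves unexplained.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the data around its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up response proportions, for illustration only (not study data).
observed = [0.10, 0.20, 0.40, 0.30]
predicted = [0.12, 0.18, 0.38, 0.32]
print(round(r_squared(observed, predicted), 3))
```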

$p(x_A \mid s_A)$ and $p(x_V \mid s_V)$ are functions of the input, whereas the prior probabilities ($p_C$ and $p(s)$) are generally assumed to be independent of the stimuli. Here we assume that any change in priors due to exposure to the uniform distribution of stimuli in the first session is very small because of the short duration of the session (40 minutes), and that this change, if any, decays after one week of exposure to the normal environment, leaving the priors in effect unchanged. Given that the auditory stimulus was the same in the two sessions, we expected the auditory likelihood $p(x_A \mid s_A)$ to be the same across the two sessions. On the other hand, since the contrast of the visual stimulus was very different between the two sessions, we expected a noisier representation of the visual stimulus and thus a broader likelihood distribution $p(x_V \mid s_V)$ in the low-contrast session. A change in the stimulus supposedly has no bearing on the prior knowledge about the environment, and thus the parameters characterizing the priors ($p_C$ and $\sigma_P$) were expected to be the same between the two sessions. Indeed, the change in visual stimulus contrast led to a considerable change in visual performance: observers' performance in the visual-alone conditions declined on average by 40% in the low-contrast session. In contrast, the average auditory performance declined by only 2% in the low-contrast session.

$p_C$ changes from 0.24 to 0.25 and the standard deviation of the spatial prior, $\sigma_P$, changes from 11.55° to 13.12°. If the priors are the same in both sessions, priors estimated from one data set should work as well as priors optimized on the other data set. We tested this. Applying priors optimized on the high-contrast data to the low-contrast data resulted in only a slight decrease in goodness of fit (from $R^2 = 0.75$ to $R^2 = 0.74$). Similarly, applying priors optimized on the low-contrast data to the high-contrast data resulted in only a slight decline (from $R^2 = 0.97$ to $R^2 = 0.95$). Therefore, using priors optimized on a different data set caused hardly any decrease in goodness of fit.

*t*-test (see Figure 4). The only parameter that showed a statistically significant difference between the two sessions is that associated with the visual likelihood (visual standard deviation, $\sigma_V$) ($p < 0.0005$). No other parameter had significantly different values across the two sessions ($p > 0.05$). Equivalently, the probability of replication is below 0.69 for each parameter except for $\sigma_V$ ($P_{rep} > 0.995$, $z = 2.65$).
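For readers who want to see the shape of such a comparison, the sketch below computes a paired *t* statistic for one parameter fitted per subject in each session. It assumes a paired design (the same observers in both sessions); the function name `paired_t_statistic` and the per-subject values are made up for illustration and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(session_1, session_2):
    """Return (t, degrees of freedom) for paired per-subject estimates."""
    if len(session_1) != len(session_2):
        raise ValueError("both sessions need one value per subject")
    # Within-subject differences remove between-subject variability.
    diffs = [b - a for a, b in zip(session_1, session_2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical per-subject sigma_V values in degrees; NOT study data.
high_contrast = [2.0, 2.5, 1.8, 2.2]
low_contrast = [14.0, 16.5, 12.8, 15.2]
t_value, dof = paired_t_statistic(high_contrast, low_contrast)
print(f"t({dof}) = {t_value:.2f}")
```

Converting the statistic to a *p*-value requires the *t* distribution with `dof` degrees of freedom, which is left to a statistics package.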

| Session | Visual likelihood $\sigma_V$ | Auditory likelihood $\sigma_A$ | Location prior $\sigma_P$ | Common-cause prior $p_C$ |
|---|---|---|---|---|
| High contrast (group) | 2.12° | 8.76° | 11.55° | 0.24 |
| Low contrast (group) | 11.71° | 7.95° | 13.12° | 0.25 |
| High contrast (individual) | 2.1 ± 0.2° | 9.2 ± 1.1° | 12.3 ± 1.1° | 0.28 ± 0.05 |
| Low contrast (individual) | 15.0 ± 2.1° | 9.4 ± 1.6° | 15.8 ± 2.3° | 0.24 ± 0.05 |

$\mu_P$, and a bias to the likelihoods, as free parameters) does not change the results of the statistical tests above. Furthermore, neither of these parameters undergoes a statistically significant change between the two sessions, and the distribution of neither parameter differs ($p > 0.05$) from the assumed values (i.e., zero for the prior, and the veridical location for the likelihoods) in either session.

*t*-tests. The power to detect a low-to-medium effect (a 0.5 standard deviation shift in the distribution) is moderate (58%, 57% and 55% for $\sigma_A$, $\sigma_P$, and $p_C$, respectively). The power to detect a relatively large effect (1 standard deviation) is excellent (99% for all three). Therefore, we can be highly confident that the change in the stimuli did not cause a large change in any of these three parameters, and fairly confident that it did not cause a moderate change. Any difference that escaped these tests would thus have to be quite small. In light of the fact that the difference in visual likelihoods is quite substantial (more than 10 standard deviations), such a putatively small change in priors would be negligible.
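A rough feel for how such power figures arise can be had from a normal approximation to a two-sided paired test. The sketch below is illustrative only: the sample size `n = 19`, the significance level `alpha = 0.05`, and the function name `approx_power` are assumptions not stated in this section, so its output should not be expected to match the reported percentages exactly.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Normal-approximation power of a two-sided paired test.

    effect_size is the mean shift in units of the SD of the paired
    differences; n is the number of subjects.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)  # expected value of the test statistic
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Assumed sample size; chosen for illustration, not taken from the text.
n = 19
for d in (0.5, 1.0):
    print(f"effect = {d} SD: power ~ {approx_power(d, n):.2f}")
```

An exact calculation would use the noncentral *t* distribution instead of the normal approximation, which matters most for small samples.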

*a priori* knowledge, and that the priors and likelihoods are represented independently in the nervous system and are combined according to Bayes' rule in this perceptual task.

*subjectively* optimal even when the prior does not reflect the true statistics of the environment and is thus not *objectively* optimal. Here, we assumed that the observers have a prior bias for the center (straight-ahead) location, and found that this prior indeed fits the data well and is stable across sensory conditions. If it is true that most events fall in the straight-ahead location due to orienting behavior (observers quickly orient towards events with eye and head movements), then this prior could be considered objectively optimal. On the other hand, if this is not the case, and most auditory-visual events do not fall in the center of the auditory/visual field, then this inference would only be subjectively optimal. Such a prior might be due to evolutionary or biological constraints (i.e., ‘hard-wired’); however, if the prior is modifiable by experience, then it is expected to reflect the true statistics of the environment, as has been shown for the ‘light-from-above’ prior (Adams, Graf, & Ernst, 2004; Mamassian & Landy, 2001).

$p(s)$ (the counterpart of the prior on velocities in Stocker and Simoncelli's study), as well as a component, $p_C$, that encapsulates the expected probability of two auditory and visual sources being the same, and hence specifies the degree of interaction between two modalities, similar to Bresciani et al. (2006) and Shams et al. (2005). Although the prior was constant across conditions in this study, it is expected that it would vary for different tasks and different modalities. While the *a priori* expectation of a common cause is expected to be mostly due to the learned or hard-wired statistics of the auditory-visual events in the environment, it may also be affected by the instructions provided to the observer by the experimenter or the context of the experiment (Ernst, 2007). It seems highly likely that some of the differences in the crossmodal interactions reported by different studies are due to differences in the prior expectation of the common cause, $p_C$ (see Hospedales & Vijayakumar, 2009 for a recent analysis).