When different perceptual signals of the same physical property are integrated, for example, an object's size, which can be both seen and felt, they form a more reliable sensory estimate (e.g., M. O. Ernst & M. S. Banks, 2002). This, however, implies that the sensory system already knows which signals belong together and how they relate. In other words, the system has to know the mapping between the signals. In a Bayesian model of cue integration, this prior knowledge can be made explicit. Here, we ask whether such a mapping between two arbitrary sensory signals from vision and touch can be learned from their statistical co-occurrence such that they become integrated. In the Bayesian framework, this means changing the belief about the distribution of the stimuli. To this end, we trained subjects with stimuli that are usually unrelated in the world: the luminance of an object (visual signal) and its stiffness (haptic signal). In the training phase, we presented subjects with artificially correlated combinations of these two signals, thereby introducing a new mapping between them. For example, the stiffer the object, the brighter it was. We measured the influence of learning by comparing discrimination performance before and after training. The prediction is that integration makes discrimination worse for stimuli that are incongruent with the newly learned mapping, because integration would cause this incongruency to disappear perceptually. The more certain subjects are about the new mapping, the stronger its influence on discrimination performance should be. Thus, learning in this context is about acquiring beliefs. Comparing trials with congruent and incongruent stimuli, we found a significant change in discrimination performance between pre- and posttest. After training, discrimination thresholds for the incongruent stimuli are increased relative to thresholds for congruent stimuli, suggesting that subjects effectively learned to integrate the two formerly unrelated signals.

*r* = *σ*^{−2}. An optimal method for combining sensory information would maximize the reliability of the final (unbiased) estimate. Recently, several studies indicated that the human brain integrates sensory information in such an optimal way (Alais & Burr, 2004; Ernst & Banks, 2002; Helbig & Ernst, 2007; Hillis et al., 2004; Knill & Saunders, 2003; Landy & Kojima, 2001).
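As an illustration, the reliability-weighted combination rule underlying these studies can be sketched in a few lines of Python (the function name and the numerical values here are illustrative, not taken from any of the cited experiments):

```python
import math

def integrate_cues(s_v, sigma_v, s_h, sigma_h):
    """Reliability-weighted (maximum-likelihood) cue combination.

    Reliability is the inverse variance, r = sigma**-2.  The combined
    estimate weights each cue by its relative reliability, and the
    combined variance 1/(r_v + r_h) is smaller than either single-cue
    variance, which is why integration pays off.
    """
    r_v, r_h = sigma_v ** -2, sigma_h ** -2
    s_combined = (r_v * s_v + r_h * s_h) / (r_v + r_h)
    sigma_combined = math.sqrt(1.0 / (r_v + r_h))
    return s_combined, sigma_combined

# Two equally reliable cues: the combined estimate is their mean and the
# noise standard deviation drops by a factor of sqrt(2).
s_est, sigma_est = integrate_cues(s_v=10.0, sigma_v=1.0, s_h=12.0, sigma_h=1.0)
```

With unequal reliabilities, the combined estimate shifts toward the more reliable cue, which is the signature behavior reported in the studies cited above.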

*s* = (*s*_{V}, *s*_{H}). Assuming that the sensory measurement contains noise *σ*_{i} added independently to each property *i* (*s*_{i} + *σ*_{i}), for example, due to noise in the neural transmission of the signal, the joint likelihood distribution *p*(*s*) for vision and touch is a 2D Gaussian with mean *s* and standard deviations *σ*_{i}. These *σ*_{i} are given as elements of the diagonal 2 × 2 variance–covariance matrix Σ:

Σ = diag(*σ*_{V}^{2}, *σ*_{H}^{2})

*p*(*s*), which follows this relationship. Such a sharply defined prior is indicated in the right column of Figure 1. In this example, every pair of sensory signals that does not conform to the known perfect relationship between the two properties would be overruled by the strong prior belief. Hence, an estimate of the underlying stimulus properties should only allow values that are in accordance with the prior distribution.

*C*: *p*(*s*) = *N*_{s}(*p*, Π), with mean *p* = (0, 0) and covariance matrix Π = *R* diag(*σ*_{1}^{2}, *σ*_{2}^{2}) *R*^{T}, where *σ*_{1}^{2} and *σ*_{2}^{2} are the variances of the prior along its principal axes and *R* is an orthogonal matrix that, in this example, rotates the coordinate system by 45°. The posterior is then also a 2D Gaussian, *p*(*s* ∣ *ŝ*) = *N*_{s}(*s*_{MAP}, Θ), with mean *s*_{MAP} and covariance matrix Θ, where *ŝ* denotes the noisy sensory measurement. *s*_{MAP} is taken to be the estimator for the presented stimulus *s*. This estimate is called the maximum a posteriori (MAP) estimator. The MAP estimator can be thought of as the optimal way to integrate noisy sensory signals, which are represented by the likelihood function, with extrasensory prior beliefs about the joint stimulus distribution, such as knowledge about the relationship between the physical stimuli. Note that because such a prior belief is learned from the sensory signals, it can only represent the statistics of the transduced sensory signals and not directly the statistics of the physical stimuli.
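The Gaussian MAP computation described above can be sketched numerically. This is a minimal NumPy illustration under the stated Gaussian assumptions; the variable names and the particular variances are assumptions chosen for the example, not values from the experiment:

```python
import numpy as np

def map_estimate(s_hat, Sigma, p, Pi):
    """MAP estimate for a Gaussian likelihood N(s_hat, Sigma) combined
    with a Gaussian coupling prior N(p, Pi).

    Posterior precision:  inv(Theta) = inv(Sigma) + inv(Pi)
    Posterior mean:       s_MAP = Theta @ (inv(Sigma) @ s_hat + inv(Pi) @ p)
    """
    Sigma_inv = np.linalg.inv(Sigma)
    Pi_inv = np.linalg.inv(Pi)
    Theta = np.linalg.inv(Sigma_inv + Pi_inv)
    s_map = Theta @ (Sigma_inv @ np.asarray(s_hat, float)
                     + Pi_inv @ np.asarray(p, float))
    return s_map, Theta

# Coupling prior elongated along the 45-degree (congruent) diagonal:
# large variance sigma_1 along the diagonal, small variance sigma_2 across it.
angle = np.deg2rad(45.0)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
Pi = R @ np.diag([10.0 ** 2, 0.1 ** 2]) @ R.T
Sigma = np.diag([1.0, 1.0])  # independent unit-variance sensory noise

# A measurement lying purely along the incongruent axis is pulled almost
# all the way back to the prior mean by the strong coupling prior.
s_map, Theta = map_estimate([1.0, -1.0], Sigma, [0.0, 0.0], Pi)
```

Replacing `Pi` with a very broad prior (both variances large) makes `s_map` return essentially the measurement itself, i.e., the unbiased maximum likelihood estimate of the flat-prior case.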

(*σ*_{1}^{2} → ∞ and *σ*_{2}^{2} → ∞), so that the prior distribution is completely flat. This is indicated here by the uniformly dark square. The corresponding weights for the prior go to zero in this case. Therefore, the prior should have no influence on the estimates of the physical properties. In other words, the MAP estimate becomes an unbiased maximum likelihood estimate.

(*s*_{V}, *s*_{H}) cue space. If subjects have no prior knowledge about the correlation between the cues, which would imply that the prior is flat, there is no particular direction in this cue space that is different from any other direction. Therefore, the discrimination performance also has to be the same in all directions of this space. This situation is illustrated in the left panel of Figure 2.

*df* and force feedback along the three translatory directions (PHANToM 1.5, SensAble Technologies, Inc.). Subjects have a convincing impression that they are haptically exploring the same scene they are seeing. For details about the setup, see Ernst and Banks (2002).

- The stiffness of the square is modeled using a linear spring model with spring constant *k* (GHoST, SensAble Technologies, Inc.). The maximum stiffness that can be reliably generated with this device is *k* = 0.65 N/mm. That is, we used a stiffness ranging from 0 to 0.65 N/mm. The range is normalized from 0 to 1; hence, the maximum *k* = 0.65 corresponds to 1.
- For the luminance, we only used the green electron beam (Sony Trinitron F500R). The exponent of the gamma correction was predetermined with a photometer (Minolta). We were able to present 1,024 different shades of green. We normalized the range from 0 to 1; hence, the maximum luminance of 58 cd/m^{2} corresponds to 1.

^{2}). Each trial consisted of the fixed standard and a comparison stimulus differing in luminance and/or stiffness from the standard. The odd stimulus, that is, the stimulus presented only once during a trial, could be either the standard or the comparison stimulus, chosen randomly with equal probability. To prevent participants from learning the standard stimulus, we included trials (10% of the total number of trials) in which the standard stimulus was not shown; these trials were discarded from the analysis.

*θ* to be 1 *SD* of this Gaussian. Besides the standard deviation, we also had a nuisance parameter *λ* to account for non-task-related observer lapses (Wichmann & Hill, 2001).

*SD* parameter for the threshold in the congruent direction, *θ*_{c}, and one for the threshold in the incongruent direction, *θ*_{i}. However, we used a common lapse rate parameter for the two directions because the data for both directions came from the same session.
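A cumulative-Gaussian psychometric function with a lapse parameter can be written compactly. This is a sketch following the common convention that the threshold is the 1-*SD* (84%) point; the exact parameterization used for the fits in the paper may differ:

```python
import math

def psychometric(x, theta, lam):
    """Cumulative-Gaussian psychometric function with lapse rate.

    x     : signed stimulus difference between comparison and standard
    theta : discrimination threshold, defined as 1 SD of the Gaussian
    lam   : lapse rate for non-task-related errors (Wichmann & Hill, 2001)
    """
    # Standard normal CDF evaluated at x / theta, via the error function.
    F = 0.5 * (1.0 + math.erf(x / (theta * math.sqrt(2.0))))
    # Lapses compress the curve into the range [lam, 1 - lam].
    return lam + (1.0 - 2.0 * lam) * F

# With no lapses, performance at x = theta is the familiar 84% point.
p_at_threshold = psychometric(1.0, 1.0, 0.0)
```

Fitting then amounts to finding the *θ* (and shared *λ*) that best predict the observed response proportions, for example by maximum likelihood.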

*σ*_{2}^{2} along the incongruent axis is reduced compared to stimuli without correlation (before training). At the extreme end, the subject could learn that there is no variance along the incongruent axis, which would mean that the subject believes that the two signals are completely correlated. After training, subjects had a brief break of a couple of minutes before continuing with the posttest on the same day.
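One simple way to see why a reduced prior variance along the incongruent axis predicts higher incongruent thresholds is to compute the "gain" of the MAP estimate along a principal axis, i.e., the fraction of a stimulus difference that survives the pull of a zero-mean Gaussian prior. This is a simplified back-of-the-envelope sketch under the Gaussian assumptions above, not the analysis used in the paper:

```python
def map_gain(sigma_noise, sigma_prior):
    """Fraction of a stimulus difference along one principal axis that
    survives MAP estimation, given sensory noise of variance
    sigma_noise**2 and a zero-mean prior of variance sigma_prior**2:

        gain = sigma_prior**2 / (sigma_prior**2 + sigma_noise**2)

    A gain of 1 means the prior is flat and estimates are unbiased; a
    gain below 1 compresses perceived differences along that axis, which
    (for fixed internal noise) predicts a proportionally higher threshold.
    """
    return sigma_prior ** 2 / (sigma_prior ** 2 + sigma_noise ** 2)

# Before training: effectively flat prior in both directions -> gain near 1.
gain_pre = map_gain(sigma_noise=1.0, sigma_prior=1e6)

# After training: reduced sigma_2 along the incongruent axis -> gain well
# below 1 there, predicting higher incongruent thresholds, while the
# congruent direction is unaffected.
gain_incongruent = map_gain(sigma_noise=1.0, sigma_prior=0.5)  # 0.2
```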

*F*(1, 8) = 0.34, *p* = .56, and *F*(1, 8) = 3.13, *p* = .12, respectively. This indicates that subjects did not generally get better at discriminating during the course of the experiment. Given this baseline performance, we can now turn to the main data of the experiment.

*F*(1, 8) = 0.705, *p* = .426, nor for congruent versus incongruent, *F*(1, 8) = 4.128, *p* = .077. However, it is important to note that we found a significant interaction between the two factors, pre/post × congruent/incongruent, *F*(1, 8) = 14.58, *p* < .005, which indicates that the thresholds for the congruent and incongruent directions, which were the same in the pretest, differed in the posttest. This shows that these subjects learned to use the newly introduced redundancy between the luminance of an object and its stiffness. That is, subjects learned to integrate arbitrary signals.

*n* = 11) were sensitive to the training and showed the predicted learning effect. That is, after training, thresholds in the incongruent direction are increased relative to thresholds in the congruent direction. This suggests that subjects indeed learned to integrate the two arbitrarily chosen signals, luminance and stiffness. The asymmetry between congruent and incongruent thresholds cannot be explained by an improvement in performance due to practice, because practice would have affected the congruent and incongruent directions equally. Furthermore, we controlled for such unspecific learning by measuring unimodal discrimination performance before the pretest and after the posttest and found no significant difference.

(*σ*_{V}^{2} or *σ*_{H}^{2}) would have produced an asymmetry in the discrimination performance between the congruent and the incongruent direction. The independence assumption for the noise distributions of the two sensory measurements seems safe because the measurements are derived from two separate sensory modalities. Furthermore, there is no reason to believe that introducing a correlation between the signals during training would affect this independence. Thus, the asymmetry in the learning effect between the congruent and incongruent directions is best explained by a change in the variance of the Coupling Prior and not by a change in the likelihood distribution.

*Essay Towards a New Theory of Vision*:

“Sitting in my Study I hear a Coach drive along the street; I look through the Casement and see it; I walk out and enter into it; thus, common Speech would incline one to think, I heard, saw, and touch'd the same thing, to wit, the Coach. It is nevertheless certain, the Ideas intromitted by each Sense are widely different, and distinct from each other; but having been observed constantly to go together, they are spoken of as one and the same thing.”