Experimental investigations of cue combination typically assume that individual cues provide noisy but unbiased sensory information about world properties. However, in numerous instances, including real-world settings, observers systematically misestimate properties of the world from sensory information. Two such instances are the estimation of shape from stereo and motion cues. Bias in single-cue estimates, therefore, poses a problem for cue combination if the visual system is to maintain accuracy with respect to the world, particularly because the magnitude of bias in individual cues is typically unknown. Here, we show that observers fail to take account of the magnitude of bias in each cue during combination and instead combine cues in proportion to their reliability so as to increase the precision of the combined-cue estimate. This suggests that observers were unaware of the bias in their sensory estimates. Our analysis of cue combination shows that there is a definable range of circumstances in which combining information from biased cues, rather than vetoing one or other cue, can still be beneficial, by reducing error in the final estimate.

$I_S$ and $I_M$, where the likelihood functions for stereo and motion, $p(I_S \mid \hat{S})$ and $p(I_M \mid \hat{S})$, represent the generative transfer functions producing this image data. The prior, $p(\hat{S})$, describes the probability of encountering a given shape in the world, independent of sensory data. Given this information, the most likely shape in the world, $\hat{S}$, to have produced this sensory information is given by the maximum of the posterior probability distribution, $p(\hat{S} \mid I_S, I_M)$.
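To make the posterior concrete, the sketch below evaluates it on a grid of candidate shapes and returns its maximum. It assumes, for illustration only, Gaussian likelihoods for each cue, stereo and motion noise that are independent (so the two likelihoods multiply under Bayes' rule), and a flat prior; the function names, grid range, and cue values are hypothetical rather than taken from the study.

```python
import math


def log_gaussian_likelihood(s, cue_estimate, variance):
    """Log of an (unnormalised) Gaussian likelihood for one cue, evaluated at shape s."""
    return -((s - cue_estimate) ** 2) / (2.0 * variance)


def map_shape(stereo_est, stereo_var, motion_est, motion_var,
              lo=0.0, hi=3.0, steps=3000):
    """Grid search for the maximum of p(S | I_S, I_M), assuming a flat prior.

    With independent cue noise the log posterior is the sum of the two
    log-likelihoods plus the log prior; a flat prior adds a constant, so it is dropped.
    """
    best_s, best_logpost = lo, -math.inf
    for i in range(steps + 1):
        s = lo + (hi - lo) * i / steps
        logpost = (log_gaussian_likelihood(s, stereo_est, stereo_var)
                   + log_gaussian_likelihood(s, motion_est, motion_var))
        if logpost > best_logpost:
            best_s, best_logpost = s, logpost
    return best_s


# Hypothetical readings: the stereo cue is four times more reliable than motion.
print(map_shape(1.0, 1.0, 2.0, 4.0))  # close to 1.2
```

Under these assumptions the maximum falls between the two single-cue readings, nearer the more reliable one; an informative prior would shift it further.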

The combined-cue estimate, $\hat{S}_C$, can be represented by a simple weighted average of the estimates provided by the individual cues, $\hat{S}_S$ and $\hat{S}_M$ (Landy, Maloney, Johnston, & Young, 1995; Oruc, Maloney, & Landy, 2003):

$$\hat{S}_C = w_S \hat{S}_S + w_M \hat{S}_M,$$

where the weights for stereo, $w_S$, and motion, $w_M$, are determined by the relative reliabilities of the two estimators such that

$$w_S = \frac{r_S}{r_S + r_M}, \qquad w_M = \frac{r_M}{r_S + r_M}.$$

The reliabilities of stereo, $r_S$, and motion, $r_M$, are given by the reciprocal of their variances,

$$r_S = \frac{1}{v_S}, \qquad r_M = \frac{1}{v_M}.$$

The reliability of the combined-cue estimate, the reciprocal of its variance $v_C$, is given by

$$r_C = \frac{1}{v_C} = r_S + r_M.$$

A number of studies have shown that when combining sensory information, Bayes' rule, and more specifically a weighted average, provides a good account of sensory fusion both within and between modalities (Ernst & Banks, 2002; Helbig & Ernst, 2007; Hillis, Ernst, Banks, & Landy, 2002; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003; MacNeilage, Banks, Berger, & Bulthoff, 2007).
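The weighting scheme reduces to a few lines of arithmetic. This is a minimal sketch assuming the single-cue estimates and their variances are already known; `combine_cues` is a hypothetical helper name, not code from the study.

```python
def combine_cues(s_stereo, v_stereo, s_motion, v_motion):
    """Reliability-weighted average of stereo and motion estimates.

    Returns the combined estimate, its variance, and the (stereo, motion) weights.
    """
    r_stereo = 1.0 / v_stereo          # reliability = 1 / variance
    r_motion = 1.0 / v_motion
    r_combined = r_stereo + r_motion   # reliabilities add
    w_stereo = r_stereo / r_combined   # weights sum to one
    w_motion = r_motion / r_combined
    s_combined = w_stereo * s_stereo + w_motion * s_motion
    return s_combined, 1.0 / r_combined, (w_stereo, w_motion)


# Hypothetical values: stereo variance 1, motion variance 4.
s_c, v_c, (w_s, w_m) = combine_cues(1.0, 1.0, 2.0, 4.0)
print(s_c, v_c, w_s, w_m)  # approximately 1.2, 0.8, 0.8, 0.2
```

The combined variance (0.8 here) is smaller than either single-cue variance, which is the precision benefit of weighted averaging. Nothing in the weights depends on bias.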

*required* for successful movement (Milner & Goodale, 1995, 2006), but rarely do such models consider how this information might be acquired. One strategy that humans seem to have adopted to control motor acts, such as prehension, is to use relative information continuously over the course of the movement (Saunders & Knill, 2003, 2004, 2005). This strategy is interesting because it actively compensates for those instances where external perceptual accuracy is not possible (Brooks, 1991a, 1991b). It is, thus, not at all obvious that veridical estimation of the metric shapes, sizes, and locations of objects is required for adaptive motor control (Brenner & Smeets, 2001; Smeets & Brenner, 2008). Furthermore, once the action is completed, endpoint errors could also be used for sensory calibration.

*MSE*):

$$\mathrm{MSE} = E\left[(S_C - S_T)^2\right].$$

The *MSE* is defined as the expected value of the squared difference between the true value of shape in the world, $S_T$, and the estimated value, $S_C$. While we do not attach any special significance to the use of the *MSE*, we have adopted it because it is a very widely used measure of the error of an estimate (Brainard & Freeman, 1997; DeGroot, 1986; Mamassian et al., 2002) and is closely related to variance (DeGroot, 1986). Variance is typically adopted in cue combination studies under the assumption that the estimator is unbiased. Indeed, when the estimator is unbiased, the *MSE* is equal to the variance, since the average estimated shape is equal to the true world value. In other words, combining unbiased cues to minimize variance will also minimize *MSE*. When one or more estimators is biased, this is no longer the case, and *MSE* may be expressed as the sum of terms relating to the variance and the bias of the combined-cue estimator (Berger, 1985):

$$\mathrm{MSE}_C = v_C + b_C^2,$$

where $v_C$ is the variance and $b_C$ is the bias of the combined-cue estimate. In principle, it would be possible to weight cues differently so as to minimize *MSE*. However, this would require knowledge of both the variance and the bias in the relevant cues. As we have stated, we think it is unlikely that the visual system has access to a measure of bias in each cue; we thus consider the situation in which only the variance of the cues is known. Used in this way, *MSE* is a useful tool for understanding the important relationship between variable and constant errors in cue combination, but it is not a likely model of how the brain combines sensory information.
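The variance-plus-squared-bias decomposition can be checked numerically. The sketch below simulates a hypothetical estimator with a known bias and standard deviation and confirms that its empirical *MSE* equals the sample variance plus the squared sample bias; the parameter values and the `mse_decomposition` name are illustrative assumptions.

```python
import random
import statistics


def mse_decomposition(true_value, bias, sd, n=20_000, seed=1):
    """Simulate a biased, noisy estimator and split its MSE into variance and bias.

    Returns (mse, variance, bias_hat) computed from the simulated estimates.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_value + bias, sd) for _ in range(n)]
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    mean_est = statistics.fmean(estimates)
    variance = statistics.pvariance(estimates, mu=mean_est)  # variable error
    bias_hat = mean_est - true_value                         # constant error
    return mse, variance, bias_hat


mse, var, b = mse_decomposition(true_value=1.0, bias=0.3, sd=0.5)
print(round(mse, 4), round(var + b ** 2, 4))  # agree up to floating-point rounding
```

Setting `bias=0.0` makes the *MSE* collapse to the variance, which is the unbiased case assumed in most cue combination studies.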

$S_T$, and the stereo cue ($S_S$) is unbiased with a variance of $v_S$, but the motion cue ($S_M$) is biased by $b_M = (S_M - S_T)$ and has a variance of $v_M$, the bias of the combined-cue estimator is given by

$$b_C = w_M b_M.$$

Substituting this into the expression for $\mathrm{MSE}_C$ gives the *MSE* of the combined-cue estimator:

$$\mathrm{MSE}_C = v_C + w_M^2 b_M^2.$$

$\mathrm{MSE}_C$ for a fixed variance of 1 for the unbiased stereo cue but for varying levels of bias and variance in the motion cue. As would be expected, the lowest $\mathrm{MSE}_C$ is found when the motion cue is also unbiased. As the bias of the motion cue increases, so too does $\mathrm{MSE}_C$. However, this combined-cue error is less than that of the stereo cue alone for the bias levels satisfying Equation 8. These are shown by the portions of the curves that lie below the horizontal line, which marks the mean squared error of the stereo cue alone. This line therefore represents the threshold below which it is beneficial, in terms of expected error, to combine cues despite the fact that one of them is biased (see also Burge et al., 2010).
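For the single-biased-cue case, the condition can be made explicit by substituting the reliability-based weights and combined variance into $\mathrm{MSE}_C$. The following is a short worked derivation under the assumptions stated above (an unbiased stereo cue, so its error is simply $v_S$, and weights set by variance alone):

$$
\begin{aligned}
w_M &= \frac{r_M}{r_S + r_M} = \frac{v_S}{v_S + v_M}, \qquad v_C = \frac{1}{r_S + r_M} = \frac{v_S v_M}{v_S + v_M},\\[4pt]
\mathrm{MSE}_C &= \frac{v_S v_M}{v_S + v_M} + \frac{v_S^2\, b_M^2}{(v_S + v_M)^2},\\[4pt]
\mathrm{MSE}_C < v_S
&\iff \frac{v_M}{v_S + v_M} + \frac{v_S\, b_M^2}{(v_S + v_M)^2} < 1\\
&\iff \frac{v_S\, b_M^2}{(v_S + v_M)^2} < \frac{v_S}{v_S + v_M}\\
&\iff b_M^2 < v_S + v_M.
\end{aligned}
$$

On this working, combining helps whenever the squared motion bias is smaller than the summed variances of the two cues. With $v_S = 1$ and $v_M = 1$, for example, the motion cue could be biased by up to $\sqrt{2} \approx 1.41$ before the combined estimate became worse than stereo alone.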

$b_M = (S_M - S_T)$ and $b_S = (S_S - S_T)$. In this case, the bias in the combined-cue estimator is given by

$$b_C = w_S b_S + w_M b_M.$$

$\mathrm{MSE}_C$ for the combined-cue estimator for a range of variances and biases of the motion cue, with separate graphs for different levels of bias in the stereo cue. Here, the bias in the stereo cue is always positive, and the variance of the stereo estimator is set to 1. It can be seen that as the stereo cue becomes more biased, greater levels of bias in the motion cue can be present before the combined-cue *MSE* is greater than that of the stereo cue alone. This is especially true for negative biases in the motion cue, which serve to counteract the stereo cue's positive bias. The interplay between the variances and biases of the cues shows that there are circumstances when, even though both cues are biased, it can still pay the observer to combine cues rather than to veto one or the other.
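The two-biased-cue case can be explored with a small function that applies the reliability weights and then computes $\mathrm{MSE}_C = v_C + b_C^2$. This is a minimal sketch; the function names and the bias and variance values below are hypothetical, chosen only to show the asymmetry between positive and negative motion biases.

```python
def combined_mse(v_s, b_s, v_m, b_m):
    """MSE_C = v_C + b_C^2 for reliability-weighted combination of two biased cues."""
    r_s, r_m = 1.0 / v_s, 1.0 / v_m
    w_s, w_m = r_s / (r_s + r_m), r_m / (r_s + r_m)
    v_c = 1.0 / (r_s + r_m)
    b_c = w_s * b_s + w_m * b_m   # weights depend on variance only, not on bias
    return v_c + b_c ** 2


def combining_helps(v_s, b_s, v_m, b_m):
    """True if the combined estimate has lower MSE than stereo alone (v_S + b_S^2)."""
    return combined_mse(v_s, b_s, v_m, b_m) < v_s + b_s ** 2


# Stereo variance 1 with a positive bias of 0.5; motion variance 1 (hypothetical values).
for b_m in (-1.5, -0.5, 0.5, 1.5):
    print(b_m, combining_helps(1.0, 0.5, 1.0, b_m))  # only b_m = 1.5 gives False
```

A negative motion bias of magnitude 1.5 still pays off because it partly cancels the stereo cue's positive bias, whereas a positive bias of the same size does not.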

², and cylinders were positioned centrally on the screen. Figure 2 shows a diagrammatic side view of the viewing arrangement, and Figure 3 shows a stereogram of the stimulus.

*stretched* in depth extent. Conversely, a PSE less than one indicates that, to be perceived as circular, cylinders needed to be *squashed* in depth extent. The JND was defined as the standard deviation of the cumulative Gaussian fitted to the observer's data. This is equivalent to the difference between the cylinder depth-to-half-height ratios corresponding to the 50% and 84% points of the psychometric function. Figure 4 shows the PSEs and Figure 5 shows the JNDs for each observer across our range of viewing distances.
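The relationship between a fitted cumulative Gaussian, the PSE, and the JND can be sketched with Python's standard-library `statistics.NormalDist`. The sketch assumes the fit has already been done (the fitting procedure is not shown), treats a depth-to-half-height ratio of 1 as circular, and uses hypothetical fitted values; the "84%" point is the rounded value of the one-standard-deviation quantile (about 84.13%).

```python
from statistics import NormalDist


def psychometric(ratio, pse, jnd):
    """Cumulative Gaussian: response probability at a given depth-to-half-height ratio."""
    return NormalDist(mu=pse, sigma=jnd).cdf(ratio)


def summarise_fit(pse, jnd, circular=1.0):
    """Recover the 50% and ~84% points of a fitted cumulative Gaussian and interpret the PSE."""
    dist = NormalDist(mu=pse, sigma=jnd)
    p_one_sd = NormalDist().cdf(1.0)   # about 0.8413: one SD above the mean
    x50 = dist.inv_cdf(0.5)
    x84 = dist.inv_cdf(p_one_sd)
    if pse > circular:
        needed = "stretched"           # cylinder had to be deeper to look circular
    elif pse < circular:
        needed = "squashed"            # cylinder had to be shallower to look circular
    else:
        needed = "neither"
    return {"x50": x50, "x84": x84, "jnd": x84 - x50, "needed": needed}


print(summarise_fit(pse=1.2, jnd=0.1))  # hypothetical fitted values
```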

$R^2$ values for linear fits were 0.63 for the PSEs and 0.62 for the JNDs. Surprisingly, few cue combination studies provide explicit statistical assessments of the fit of the model, leaving the reader to judge this visually. One study that has done so is that of Burge et al. (2010). For the combination of visual and haptic cues to slant, they found an overall $R^2$ value of 0.60 both for observed versus predicted JNDs and for observed versus predicted PSEs.
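For readers who want to compute this kind of goodness-of-fit statistic, a minimal sketch follows. It assumes an ordinary least-squares line of observed on predicted values (the exact regression used is not specified here) and reports $R^2 = 1 - SS_{res}/SS_{tot}$; the function name and the example numbers are hypothetical.

```python
def linear_fit_r2(predicted, observed):
    """Fit observed = intercept + slope * predicted by least squares; return (slope, intercept, R^2)."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(predicted, observed))
    ss_tot = sum((y - my) ** 2 for y in observed)
    return slope, intercept, 1.0 - ss_res / ss_tot


print(linear_fit_r2([1, 2, 3, 4], [1, 3, 2, 4]))  # approximately (0.8, 0.5, 0.64)
```

For a least-squares line with an intercept, this $R^2$ equals the squared Pearson correlation between predicted and observed values; measuring agreement against the identity line instead would generally give a different number.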

*MSE* for the single-cue and combined-cue conditions varies across observers, in many instances there is a clear advantage, in terms of reduced *MSE*, of combining cues rather than vetoing one or other cue. The mean $R^2$ value of a linear fit to the predicted and observed combined-cue *MSE*s for each observer was 0.78, indicating good correspondence between observed and predicted values.
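One way to obtain per-condition *MSE* values from psychophysical fits is to treat the squared JND as the variance and the PSE's departure from a circular depth-to-half-height ratio of 1 as the bias. The sketch below does this; that mapping, the `veridical=1.0` default, and the (PSE, JND) values are assumptions for illustration, not the study's data or necessarily its exact procedure.

```python
def mse_from_fit(pse, jnd, veridical=1.0):
    """MSE = variance + bias^2, taking variance = JND^2 and bias = PSE - veridical (assumed mapping)."""
    bias = pse - veridical
    return jnd ** 2 + bias ** 2


def lowest_mse_condition(conditions):
    """Return the name of the condition whose (pse, jnd) pair gives the smallest MSE."""
    return min(conditions, key=lambda name: mse_from_fit(*conditions[name]))


# Hypothetical (PSE, JND) pairs for one observer; these are not data from the study.
observer = {"stereo": (1.05, 0.3), "motion": (1.4, 0.3), "combined": (1.15, 0.2)}
for name, (pse, jnd) in observer.items():
    print(name, round(mse_from_fit(pse, jnd), 4))
print("lowest MSE:", lowest_mse_condition(observer))
```

In this made-up example, combining increases the bias relative to the stereo cue (0.15 versus 0.05) but lowers the variance enough that the overall *MSE* still falls, the kind of trade-off discussed in this section.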

*MSE* data support the analysis presented in the Introduction section. If the visual system were able to calibrate cues so as to eliminate bias and maintain external accuracy, combining cues so as to minimize variance using weighted averaging would also minimize *MSE*. However, when bias is present, this is not necessarily the case (Figure 1). Observers exhibited clear perceptual bias but combined cues so as to minimize the variance of the combined-cue estimate. Despite this, there are clear instances where the increase in bias that observers accrued from combining biased cues was more than compensated for by a reduction in variance, leading to a lower overall mean squared error. This suggests that weighted averaging can be a robust strategy to adopt in the face of unknown perceptual bias.

*MSE* is unlikely to be a viable strategy for the visual system, given that the bias in individual cues is unknown, *MSE* allows us to gain some understanding of when it would be beneficial to combine biased cues rather than veto one or the other (Landy et al., 1995). This is because it incorporates the constant error as well as variable error in an observer's estimates (Berger, 1985). Across the range of biases found in the present study, there were clear instances where the mean squared error in the combined-cue estimate was less than that of the individual cues. This suggests that optimizing cue combination for variance might be a reasonably robust strategy for the visual system to adopt.

*because* behaviors are skilled and adept, they must be controlled by accurate metric representations (Milner & Goodale, 1995, 2006).

*carrying* the motion signal. A more likely consequence of cue conflict in our study would have been for the combined-cue stimuli to look nonrigid. This was also not reported by any of our observers. If anything, the observers commented that the combined-cue stimuli looked the most “real.” Our observers, therefore, seem to have treated the cues as belonging to the same object. As such, these results are consistent with those of Girshick and Banks (2009), who also observed perceptual unity with large cue-conflict stimuli. Finally, it is interesting to note that in many situations the visual system also seems quite unperturbed by highly discrepant sensory inputs arising from the same object (Smeets & Brenner, 2008).