Most research on depth cue integration has focused on stimulus regimes in which stimuli contain the small cue conflicts that one might expect to normally arise from sensory noise. In these regimes, linear models for cue integration provide a good approximation to system performance. This article focuses on situations in which large cue conflicts can naturally occur in stimuli. We describe a Bayesian model for nonlinear cue integration that makes rational inferences about scenes across the entire range of possible cue conflicts. The model derives from the simple intuition that multiple properties of scenes or causal factors give rise to the image information associated with most cues. To make perceptual inferences about one property of a scene, an ideal observer must necessarily take into account the possible contribution of these other factors to the information provided by a cue. In the context of classical depth cues, large cue conflicts most commonly arise when one or another cue is generated by an object or scene that violates the strongest form of constraint that makes the cue informative. For example, when binocularly viewing a slanted trapezoid, the slant interpretation of the figure derived by assuming that the figure is rectangular may conflict greatly with the slant suggested by stereoscopic disparities. An optimal Bayesian estimator incorporates the possibility that different constraints might apply to objects in the world and robustly integrates cues with large conflicts by effectively switching between different internal models of the prior constraints underlying one or both cues. We performed two experiments to test the predictions of the model when applied to estimating surface slant from binocular disparities and the compression cue (the aspect ratio of figures in an image). 
The apparent weight that subjects gave to the compression cue decreased smoothly as a function of the conflict between the cues but did not shrink to zero; that is, subjects did not fully veto the compression cue at large cue conflicts. A Bayesian model that assumes a mixed prior distribution of figure shapes in the world, with a large proportion being very regular and a smaller proportion having random shapes, provides a good quantitative fit for subjects' performance. The best fitting model parameters are consistent with the sensory noise to be expected in measurements of figure shape, further supporting the Bayesian model as an account of robust cue integration.


*hidden* parameters describing object or scene properties that an observer is not necessarily estimating. For example, the shapes of figures in an image only provide cues to the figure's 3D orientation because of statistical constraints on the shapes of figures to be found in our environment. Because figures come in different categories (symmetric, isotropic, random, etc.), the true prior probability density function over the space of shape parameters is really a mixture of qualitatively different priors. The important consequence of this structure for cue integration is that the likelihood function associated with a cue that depends on a mixture of priors is itself an additive mixture of likelihood functions. Each component likelihood function is derived using a different prior model, weighted by the probability that the prior model applies to the object being viewed (e.g., the probability that a figure is symmetric), and added to the others to form the full likelihood function for the cue. This is expressed in the equation

*p*(*I* ∣ *S*) = Σ_{ i} *π*_{ i} *p*(*I* ∣ *S*, *M*_{ i}),

where *I* is the image information associated with the cue, *S* is the scene property being estimated, the *M*_{ i} are the different prior models used to compute the components of the mixed likelihood function, and the *π*_{ i} are the probabilities associated with each model (e.g., the probability that a surface texture is isotropic).
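In code, the additive mixture is straightforward. The sketch below (hypothetical function and variable names, not from the article) evaluates a mixed likelihood from per-model component likelihoods tabulated on a grid of scene values:

```python
import numpy as np

def mixed_likelihood(component_likelihoods, model_priors):
    """Additive mixture of per-model likelihoods p(I | S, M_i), each row
    tabulated over a grid of scene values S, weighted by the model
    probabilities pi_i. A sketch, not the article's code."""
    component_likelihoods = np.asarray(component_likelihoods, dtype=float)
    model_priors = np.asarray(model_priors, dtype=float)
    assert np.isclose(model_priors.sum(), 1.0)
    # Sum_i pi_i * p(I | S, M_i), evaluated at every grid point of S.
    return model_priors @ component_likelihoods
```

Each row of `component_likelihoods` is one prior model's likelihood function; the result is the full likelihood function for the cue.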

The behavior of the combined estimator depends on the relative probabilities of the two prior models (*π*_{circle} and *π*_{ellipse}) and on whether the stereoscopic likelihood function is centered near the peak of the figure shape likelihood or is centered over one of its extended tails. When stereoscopic information suggests a slant similar to that suggested by the circle interpretation of a figure, the combined likelihood function is centered at a point that is well characterized by a weighted sum of the two. As the deviation between the two increases, the peak of the joint likelihood function shifts toward the peak of the stereoscopic likelihood function until, at high "conflicts", it almost perfectly aligns with the stereoscopic peak. At this point, a Bayesian estimator will appear to have nearly turned off the figure shape cue. This is because, at large conflicts, the stereoscopic information is not consistent with the circle model, and the random ellipse model in the mixed likelihood function for figure shape dominates the combined likelihood.
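This switching behavior can be reproduced numerically. The sketch below uses illustrative parameter values (not the fitted ones from the article) and shows that the peak of the combined likelihood lies between the two cues at small conflicts but collapses onto the stereo peak at large conflicts:

```python
import numpy as np

def gauss(x, mu, sigma):
    """Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

slants = np.linspace(0.0, 90.0, 9001)  # slant grid in degrees

def combined_peak(stereo_slant, figure_slant, pi_circle=0.9,
                  sig_circle=2.0, sig_ellipse=20.0, sig_stereo=4.0):
    """Peak of the product of a mixed figure-shape likelihood (narrow circle
    component + broad random-ellipse component) and a stereo likelihood.
    Parameter values are illustrative, not fits."""
    figure_like = (pi_circle * gauss(slants, figure_slant, sig_circle)
                   + (1 - pi_circle) * gauss(slants, figure_slant, sig_ellipse))
    stereo_like = gauss(slants, stereo_slant, sig_stereo)
    return slants[np.argmax(figure_like * stereo_like)]
```

With a small conflict (e.g., stereo at 35°, figure at 38°) the peak falls between the cues; with a large conflict (figure at 70°) the broad ellipse component dominates and the peak lands near the stereo slant.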

where *s*_{probe} represents subjects' probe slant settings and *s*_{stimulus} represents the stimulus slant. We then computed corrected (unbiased) slant settings,

*s*_{compression} represents the slant suggested by the compression cue or, equivalently, the slant consistent with a circle interpretation of the projected ellipse. Rearranging terms, we arrive at an expression for the weight that subjects effectively gave to the compression cue in each stimulus condition
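Assuming the corrected settings are modeled as a weighted average of the two cue-specified slants (the form implied by the rearrangement described above), the effective weight follows directly; a minimal sketch with a hypothetical function name:

```python
def compression_weight(s_corrected, s_stereo, s_compression):
    """Effective weight on the compression cue, assuming the corrected
    setting is a weighted average of the two cue-specified slants:
        s_corrected = w * s_compression + (1 - w) * s_stereo
    Solving for w gives the expression below. A sketch of the
    rearrangement described in the text, not the article's code."""
    return (s_corrected - s_stereo) / (s_compression - s_stereo)
```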

*F*(9, 7) = 6.73; *p* < .0001.

*F*(9, 7) = 2.89; *p* < .006.

*SE*) versus 0.15 (±0.025 *SE*), despite the fact that slant-from-disparity is more reliable at 55° (in Experiment 2) than at 35° (in Experiment 1; Hillis et al., 2004; Knill & Saunders, 2003). The performance of the Bayesian model that assumes a simple prior on aspect ratios provides further insight into this issue. Figure 3 shows two such observers: one that assumes all ellipses are circles and one that assumes a log-Gaussian distribution of aspect ratios that is peaked at 1 (circles) and has a standard deviation of 0.25. Both observers show a monotonic increase in the apparent weight given to the compression cue as the difference between the slant suggested by the compression cue and the slant suggested by stereopsis increases from large negative values to large positive values, that is, as the slant suggested by the compression cue increases from near 0° to near 60°. This behavior reflects the change in the uncertainty induced by measurement noise as a function of the aspect ratio of the retinal ellipse, itself caused by the cosine law of projective foreshortening.
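A first-order error-propagation sketch makes the foreshortening argument concrete. Assuming a circle, so that the projected aspect ratio is A = cos(S), noise of fixed size on A maps to slant uncertainty that scales as 1/sin(S) (the function name and noise value here are ours, for illustration):

```python
import numpy as np

def slant_sd_from_aspect_noise(slant_deg, sigma_A=0.03):
    """First-order propagation of aspect-ratio noise through the cosine
    foreshortening law A = cos(S): sigma_S ~ sigma_A / |dA/dS| = sigma_A / sin(S).
    Illustrates why the compression cue is far less reliable near 0 deg slant
    than at high slants. Illustrative noise level, not a fitted value."""
    s = np.deg2rad(slant_deg)
    return np.rad2deg(sigma_A / np.sin(s))
```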

*α*, as a mixture of a delta function at *α* = 1 (all ellipses in the category are circles) and a log-Gaussian distribution over aspect ratios. The log-Gaussian distribution ensures that the probability density function for 1/*α* is equal to the density function for *α* (necessary to make the prior invariant to rotations). The prior density function was given by

*p*(*α*) = *π*_{circle} δ(*α* − 1) + (1 − *π*_{circle}) *p*_{ellipse}(*α*),

where the free parameters are the prior probability of the circle model, *π*_{circle} (*π*_{ellipse} = 1 − *π*_{circle}), and the standard deviation of the log-Gaussian distribution of aspect ratios in the ellipse category, *σ*_{ α}.
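The continuous (ellipse) component of this prior can be written down directly; the delta component is handled analytically. A sketch of the log-Gaussian density, with a numerical check of the 1/*α* invariance noted above (the function name is ours):

```python
import numpy as np

def log_gaussian_pdf(alpha, sigma_alpha):
    """Log-Gaussian density over aspect ratios, peaked at alpha = 1.
    Because it is symmetric in log(alpha), the induced density for 1/alpha
    is identical to the density for alpha (rotation invariance)."""
    return (np.exp(-0.5 * (np.log(alpha) / sigma_alpha) ** 2)
            / (alpha * sigma_alpha * np.sqrt(2 * np.pi)))
```

The invariance can be verified by the change of variables β = 1/α, whose density is p(1/β)/β², which equals p(β) for this distribution.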

*A*, as the aspect ratio of the projected ellipse corrupted by random Gaussian noise,

*A* = *α* cos(*S*) + *n*_{ A},

where *n*_{ A} is a random noise variable that is normally distributed with mean 0 and standard deviation *σ*_{ A}, and *S* is the slant of the surface. For the field of view used in the experiment, the cosine foreshortening law is within 1% of the true perspective foreshortening. Finally, we modeled the disparity cue as an estimate of slant corrupted by Gaussian noise,

*S*_{stereo} = *S* + *n*_{stereo},

where *n*_{stereo} is a random noise variable that is normally distributed with standard deviation *σ*_{stereo}. We left the mean of the noise process as a free parameter to model biases in perceived slant-from-stereo.
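A simulation sketch of this generative model, with illustrative (not fitted) noise parameters and a hypothetical function name:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_measurements(alpha, slant_deg, sigma_A=0.03,
                          sigma_stereo_deg=3.5, stereo_bias_deg=0.0, n=10000):
    """Draw noisy cue measurements under the generative model sketched in
    the text: aspect ratio A = alpha * cos(S) + Gaussian noise (cosine
    foreshortening), and a slant-from-disparity estimate S + bias + noise.
    Parameter values are illustrative, not the fitted ones."""
    s = np.deg2rad(slant_deg)
    A = alpha * np.cos(s) + rng.normal(0.0, sigma_A, n)
    S_stereo = slant_deg + stereo_bias_deg + rng.normal(0.0, sigma_stereo_deg, n)
    return A, S_stereo
```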

*p*_{ellipse}(*α*) is taken from Equation 10. The first term in the mixture is an integral of the product of a Gaussian density function with a delta function on *α*. This results in a Gaussian function with *α* replaced by the value that sets the argument of the delta function equal to 0 (in this case, *α* = 1). This is equivalent to the likelihood one would obtain by simply setting *α* = 1, rather than integrating over a delta function prior on *α*. The second term in the mixture is an integral over the possible aspect ratios in the ellipse model. This has the effect of shrinking the magnitude of that component of the likelihood function. The likelihood function shown in Figure 2 was computed using this model.
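Numerically, the delta term reduces to evaluating the Gaussian at *α* = 1, while the ellipse term requires marginalizing over *α*; a grid-based sketch with illustrative parameters (hypothetical function, not the article's code):

```python
import numpy as np

def compression_likelihood(A, slant_deg, pi_circle=0.9,
                           sigma_A=0.03, sigma_alpha=0.1):
    """Mixed likelihood p(A | S) for a measured aspect ratio A.
    Circle term: the delta prior collapses the integral to a Gaussian
    evaluated at alpha = 1. Ellipse term: marginalize alpha over a
    log-Gaussian on a grid. Illustrative parameters, not fits."""
    def norm_pdf(x, mu, sd):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    c = np.cos(np.deg2rad(slant_deg))
    circle_term = norm_pdf(A, c, sigma_A)          # alpha fixed at 1
    alphas = np.linspace(0.5, 2.0, 2001)
    p_alpha = norm_pdf(np.log(alphas), 0.0, sigma_alpha) / alphas  # log-Gaussian
    ellipse_term = np.sum(norm_pdf(A, alphas * c, sigma_A) * p_alpha) \
        * (alphas[1] - alphas[0])                  # marginalize over alpha
    return pi_circle * circle_term + (1 - pi_circle) * ellipse_term
```

Marginalizing spreads the ellipse component's mass over many slants, which is what shrinks its magnitude relative to the circle term near the circle-consistent slant.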

*σ*_{stereo}, and mean equal to the true slant plus the bias term. The posterior distribution of slant, given the measured aspect ratio of an ellipse in the image and the measured stereoscopic disparities, is given by the product of this Gaussian with the likelihood function for compression and the prior on slant, which, assuming a generic viewpoint on a surface, is given by sin(*S*). In the simulations described below, we used a minimum mean square error estimator: on a given trial, the estimator selected as its best estimate of slant the expected value of slant computed from the normalized joint likelihood function. The results were essentially the same when we used a maximum a posteriori (MAP) estimator that selected the mode of the posterior distribution on each trial.
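A minimal sketch of the MMSE read-out from a posterior tabulated on a slant grid (hypothetical function name):

```python
import numpy as np

def mmse_slant_estimate(slants_deg, posterior):
    """Minimum mean squared error estimate: the expected value of slant
    under the normalized posterior, as described in the text. Assumes a
    uniformly spaced slant grid."""
    posterior = np.asarray(posterior, dtype=float)
    posterior = posterior / posterior.sum()        # normalize on the grid
    return float(np.sum(slants_deg * posterior))   # E[S | data]
```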

*σ*_{stereo} were 3.5° and 2.59° for the 35° and 55° slant conditions, respectively. The difference reflects the fact that disparity cues to slant improve slightly as a function of increasing slant (Hillis et al., 2004).

| Slant | Proportion of ellipses (*π*_{ellipse}) | Standard deviation of aspect ratios in the ellipse model (*σ*_{ α}) | Standard deviation of aspect ratio measurements (*σ*_{ A}) | Bias in visual estimates of slant-from-stereo |
| --- | --- | --- | --- | --- |
| 35° | 0.124 (±0.037) | 0.127 (±0.0085) | 0.024 (±0.0086) | −0.489 (±0.4511) |
| 55° | 0.039 (±0.025) | 0.104 (±0.011) | 0.036 (±0.0048) | −3.49 (±1.07) |

*implementations* of an optimal Bayesian observer who uses mixed priors to interpret a cue. The hard computational problem in such systems is determining how to reweight cues or which cue to veto. The Bayesian model, in this context, can be seen as characterizing the optimal way to perform either of these functions. The cost function that an estimator is designed to minimize will determine whether the estimator behaves as a linear integrator with graded reweighting of cues or as a system that vetoes one or another cue as a function of cue uncertainty and the size of cue conflict.

*heights* of the joint likelihood functions computed from combining Cue 2 with Cue 1 under each prior model, and the reliabilities of the estimates derived from each estimator (the spread of the associated likelihood functions).

*α* is the aspect ratio of the figure in the image and *σ*_{circle}^{2} is the variance of the likelihood function for slant-from-figure shape given that the figure in the world is a circle, *σ*_{stereo}^{2} is the variance of the likelihood function for slant-from-stereopsis, and *σ*_{prior}^{2} is the variance of the prior density function for slant. The weights in the second term are similarly given by replacing *σ*_{circle}^{2} with *σ*_{ellipse}^{2}, the variance of the likelihood function for slant-from-figure shape given that the figure in the world is taken from a random ensemble of ellipses.
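The weighted-sum form named here corresponds to standard inverse-variance (reliability) weighting; a sketch of that form, not the article's equation:

```python
def reliability_weights(sigma_circle, sigma_stereo, sigma_prior):
    """Inverse-variance (reliability) weights for the circle-model term,
    the stereo term, and the prior, normalized to sum to 1. A standard
    form consistent with the variances named in the text; swap in
    sigma_ellipse for sigma_circle to get the second term's weights."""
    reliabilities = [1.0 / sigma_circle ** 2,
                     1.0 / sigma_stereo ** 2,
                     1.0 / sigma_prior ** 2]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]
```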

- Size of conflict: The probability of a particular model given the image data decreases as the size of the conflict between the estimate derived using that model and the interpretations derived from other cues increases. This effect depends on the size of the conflict relative to the variances of the associated likelihood functions. A less constrained prior model is *less* affected by conflict size than a more constrained prior model because the latter leads to a lower variance likelihood function for interpreting a cue. This is the reason why a Bayesian observer switches to a less constrained model at large cue conflicts.
- Occam's razor: The number of parameters that are free to vary in a model, and thus that need to be marginalized over to calculate that model's likelihood function, determines in part the magnitude of the likelihood of the model: the greater the number of these parameters, the smaller the likelihood of the model. In this article's example, the random ellipse model has one free parameter (the figure's aspect ratio), but in general, it has two: aspect ratio and orientation. The circle model has no free parameters. This biases the Bayesian estimator toward the more constrained model, until the previous factor overcomes this effect. For other cues, for example, texture cues provided by images of lattice textures, the random texture model can have a large number of free parameters. When viewing a perspective image of a square lattice, the likelihood of the random texture model is extremely low because of this Occam's razor effect, whereas the likelihood of the square lattice interpretation is high, assuming the prior probability of viewing a square lattice is not exceedingly low.
- The prior probability of a model: This is a simple multiplicative factor that appears in the posterior probability for a model given the available image data. Higher probability models are more likely given the image data.
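These factors combine multiplicatively in the posterior over models: the conflict-size and Occam's razor effects are absorbed into each model's marginal likelihood, which is then multiplied by the model's prior. A toy sketch (hypothetical numbers) showing how a large marginal-likelihood disparity can override a strong prior on the more constrained model:

```python
import numpy as np

def model_posteriors(marginal_likelihoods, model_priors):
    """Posterior probability of each prior model given the data:
    proportional to (marginal likelihood of the model) x (model prior).
    The marginal likelihoods carry the conflict-size and Occam factors."""
    p = np.asarray(marginal_likelihoods, dtype=float) \
        * np.asarray(model_priors, dtype=float)
    return p / p.sum()
```

For example, with a 0.9 prior on the circle model but a marginal likelihood 10,000 times smaller than the ellipse model's (as at a large cue conflict), the posterior strongly favors the ellipse model.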

*α* = 1, as does the circle model. At large cue conflicts, the tails of the likelihood function for the random ellipse model are proportionally so much larger than the tails of the likelihood function for the circle model that the random ellipse model overwhelms the circle model when stereoscopic cues conflict greatly with the circle interpretation of slant, even when the proportion of circles in the environment is high. Finally, changing the standard deviation of the random ellipse model has a somewhat complicated effect on performance: it lowers the compression cue weights at large cue conflicts but slightly raises them at small conflicts.

*A* and

*p*(*A* ∣ *S*) and assuming that surfaces are viewed from a uniform distribution on the view sphere, Equation A1 becomes

*σ*_{ A} is the standard deviation of the noise on measurements of aspect ratio in the image and *σ*_{stereo} is the standard deviation of the noise on slant-from-disparity measurements. The bias term allows us to model relative biases in subjects' estimates of slant-from-stereo and slant-from-figure shape. The first likelihood function in Equation A2 is a mixture of a likelihood function computed with the assumption that a figure is a circle and a likelihood function computed with the assumption that a figure is a randomly shaped ellipse with an aspect ratio drawn from the distribution *p*_{ellipse}(*α*). As described in the text, we model this as a log-Gaussian distribution,

*A* that is a random sample from a Gaussian distribution with mean *α*_{stimulus} cos(*S*_{stimulus}) and standard deviation *σ*_{ A}, and an estimate of slant-from-disparity that is a random sample from a Gaussian distribution with mean *S*_{stimulus} + bias and standard deviation *σ*_{stereo}. The bias term allows us to incorporate into the model potential relative biases in subjects' estimates of slant-from-figure shape and slant-from-disparity. The Bayesian observer computes as its estimate of the slant the expected value of the posterior distribution,

*σ*_{ A}, *σ*_{stereo}, *σ*_{ α} (which parameterizes the spread of *p*_{ellipse}(*α*); see Equation 7), and the relative slant bias. To fit the model to subjects' data, for candidate settings of the model's parameters, we applied the same analysis used to analyze subjects' data to the outputs of the model observer, averaged over many noise samples of aspect ratio and slant-from-disparity for each stimulus condition. For each stimulus condition, represented as a combination of figure aspect ratio in the world *α*_{ i} and slant *S*_{ i}, we computed the expected slant estimate for the model over many trials as the integral of *p*(*S* ∣ *A*,

*A* and

*k* is a normalizing constant that guarantees that the exponential distributions for sensory noise inside the integral integrate to 1 (because the range of integration is bounded on at least one side, the noise distributions are not, strictly speaking, Gaussian, although the bounds are many standard deviations away from the means).

*χ*^{2} statistic from the difference between the models' expected compression cue weights and the average of subjects' weights in each of the test conditions,

*N* is the number of test conditions and *σ*_{ i} is the standard error on the mean of subjects' compression cue weights in condition *i*. We fit the model parameters by minimizing Equation A6 using a simplex algorithm.
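A toy version of this fitting procedure, using hypothetical data and a stand-in parametric form for the predicted weights (not the article's Bayesian observer), minimized with SciPy's Nelder-Mead simplex:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical per-condition data: mean subject weights and standard errors.
conflicts = np.array([0.0, 10.0, 20.0, 30.0])       # cue conflict per condition
subject_weights = np.array([0.42, 0.35, 0.22, 0.10])
weight_se = np.array([0.05, 0.04, 0.04, 0.03])

def model_weights(params, conflicts):
    """Stand-in for the model observer's predicted compression-cue weights:
    a toy parametric decay with conflict, NOT the paper's Bayesian model."""
    w0, scale = params
    return w0 * np.exp(-(conflicts / scale) ** 2)

def chi2(params):
    """Chi-square of predicted vs. observed weights, standard-error scaled,
    in the spirit of Equation A6."""
    resid = (model_weights(params, conflicts) - subject_weights) / weight_se
    return np.sum(resid ** 2)

# Simplex (Nelder-Mead) minimization of the chi-square statistic.
fit = minimize(chi2, x0=[0.4, 20.0], method="Nelder-Mead")
```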