Depth perception involves combining multiple, possibly conflicting, sensory measurements to estimate the 3D structure of the viewed scene. Previous work has shown that the perceptual system combines measurements using a statistically optimal weighted average. However, the system should only combine measurements when they come from the same source. We asked whether the brain avoids combining measurements when they differ from one another: that is, whether the system is robust to outliers. To do this, we investigated how two slant cues—binocular disparity and texture gradients—influence perceived slant as a function of the size of the conflict between the cues. When the conflict was small, we observed weighted averaging. When the conflict was large, we observed robust behavior: perceived slant was dictated solely by one cue, the other being rejected. Interestingly, the rejected cue was either disparity or texture, and was not necessarily the more variable cue. We modeled the data in a probabilistic framework, and showed that weighted averaging and robustness are predicted if the underlying likelihoods have heavier tails than Gaussians. We also asked whether observers had conscious access to the single-cue estimates when they exhibited robustness and found they did not, i.e. they completely fused despite the robust percepts.

$^{-5}$ sec, which corresponds to a speed of $2.9976 \times 10^{8}$ m/sec. However, if the two low values were first rejected, the mean was $2.4828 \times 10^{-5}$ sec, which is closer to the currently accepted value of $2.4833 \times 10^{-5}$ sec for Newcomb's experiment (Gelman, Carlin, Stern, & Rubin, 2003). In the analysis of data sets like Newcomb's, the mean is often used to estimate the location of the center of the data, but outliers can have a large and potentially detrimental effect by dragging the mean away from the bulk of the data. It can be useful, therefore, to adjust the data by eliminating or down-weighting outliers before computing the statistic of interest. Robust statistics provide methods to do just that; the methods allow the estimation of statistics such as the central tendency and variation without being unduly affected by outliers (Huber, 1981). In this paper, we examine the visual system's treatment of sensory information that is either relatively consistent or quite inconsistent. We examine in particular whether the system exhibits behavior similar to statistical robustness.

*a priori* information (e.g., that light generally comes from above).

$S$ denote an environmental variable (e.g., shape, size, or slant) and let $\{X_i\}$ for $i = 1, \ldots, N$ denote the perceptual system's measurements from $N$ cues. All sensory measurements are subject to variation due to measurement error and variation in the mapping between the environment and sensory apparatus. It is conventionally assumed that the $X_i$ are Gaussian distributed with variance $\sigma_i^2$ and are conditionally independent (i.e., their noises are independent). Using Bayes' rule and the assumption that the $X_i$ are conditionally independent, the brain can minimize the uncertainty about the environmental variable:

$X_i$ is the sensor image data corresponding to the $i$th cue. The first $N$ terms on the right side of the equation are the likelihood functions representing the probabilities of observing the sensor data from each of the $N$ measurements if $S$ is the actual value of the environmental variable. The last term is the prior distribution, the probability of observing the value $S$ in the scene; it is independent of the sensory data. The left side is the posterior distribution representing the combined estimate from the measurements. Unless there are immediate consequences to certain actions (payoffs and penalties), it is most advantageous for the observer to choose the value of $S$ that maximizes the posterior: this is the maximum *a posteriori* (MAP) rule.
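The relationship described in this paragraph (a product of $N$ likelihood terms and a prior on the right, the posterior on the left) can be sketched as follows. This is a reconstruction from the verbal description, not a verbatim copy of the paper's Equation 1; the proportionality sign is an assumption that absorbs the normalizing constant, and $\hat{S}_{\mathrm{MAP}}$ is a label introduced here for the MAP estimate.

```latex
p(S \mid X_1, \ldots, X_N) \;\propto\; p(X_1 \mid S)\, p(X_2 \mid S) \cdots p(X_N \mid S)\; p(S),
\qquad
\hat{S}_{\mathrm{MAP}} = \arg\max_{S}\; p(S \mid X_1, \ldots, X_N).
```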

*Gaussian-likelihood model* to refer to the case in which the likelihoods are Gaussian and conditionally independent. Assuming a sensible decision rule, such as MAP, this model always predicts *weighted averaging* (a phrase that derives from expressing the likelihoods in Equation 1 as Gaussian and their product as a weighted sum of Gaussian-distributed random variables). In weighted averaging, the combined estimate is always in between the single-cue estimates, which is represented schematically in Figure 1b.

$N$ measurements (1 from each of $N$ cues), one outlier may be obvious among $N - 1$ concurring measurements; standard techniques in robust statistics prescribe how much to down-weight that measurement (Gelman et al., 2003; Huber, 1981). Here we investigate the situation in which $N = 2$, a case in which standard statistical techniques do not pinpoint the outlier.

$\mu_i$ and variances $\sigma_i^2$. The product of two Gaussians is a Gaussian with mean $(\mu_1\sigma_2^2 + \mu_2\sigma_1^2)/(\sigma_1^2 + \sigma_2^2)$ and variance $\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)$. As the means of two Gaussians become increasingly disparate, the product's mean always remains at a fixed proportional distance in between the two, corresponding to the weighted-average prediction. Thus, in this framework, Gaussian likelihood functions can never produce robust behavior, even at very large cue conflicts. However, small changes in the shape of the likelihood functions can yield significant changes in the position and shape of the posterior for large conflicts and thereby produce robust behavior (Box & Tiao, 1992; Knill, 2003). If the likelihood functions are leptokurtotic, meaning they asymptote to zero more gradually than Gaussians ("heavy" tails, Figure 1a), their product (i.e., the posterior) is no longer Gaussian. The shape of the posterior, which may be skewed or multi-modal, is then determined by the size of the conflict relative to the variance and tail heaviness of the likelihoods (intersection of joint-likelihood cross with cues-consistent line in Figure 1c). When the likelihoods' tails are heavier than Gaussians' and the conflict is small, the product is nearly Gaussian, so behavior is nearly identical to weighted averaging of the cue values. When the conflict is large, robustness can occur when one arm of the joint-likelihood cross intersects the cues-consistent line (Figure 1c). We will refer to this as the *heavy-tailed likelihood model*.
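The claim that the Gaussian product's mean stays at a fixed proportional distance between the two cue means can be checked numerically. The sketch below is illustrative only: the function names and the example variances (1 and 3) are assumptions chosen for the demonstration, not values from the experiments.

```python
def gaussian_product(mu1, var1, mu2, var2):
    """Mean and variance of the renormalized product of two Gaussians."""
    mean = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    var = var1 * var2 / (var1 + var2)
    return mean, var


def relative_position(mu1, var1, mu2, var2):
    """Position of the product's mean between mu1 (0.0) and mu2 (1.0)."""
    mean, _ = gaussian_product(mu1, var1, mu2, var2)
    return (mean - mu1) / (mu2 - mu1)


# Hypothetical cue variances; only the conflict size (mu2 - mu1) changes.
for conflict in (5.0, 30.0, 120.0):
    print(conflict, relative_position(0.0, 1.0, conflict, 3.0))
```

Under these assumptions the relative position equals $\sigma_1^2/(\sigma_1^2 + \sigma_2^2)$ for every conflict size, which is why a purely Gaussian model cannot switch to a single cue at large conflicts.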

$r_D : r_T$, where $r_i = 1/\sigma_i^2$ is the reliability of cue $i$, $\sigma_i^2$ is its variance, and $D$ and $T$ are disparity and texture, respectively. To distinguish weighted averaging from robustness, the reliabilities cannot differ too greatly. For example, if $r_D \ll r_T$, the Gaussian and heavy-tailed models would predict the same behavior: perceived slant close to the texture-specified slant, $S_T$. The relative reliabilities cannot be too similar either. For example, if $r_D \simeq r_T$ and the observer behaved robustly by always choosing disparity, we would not be able to determine the reason: it could be that robustness occurs because texture is always ignored with large conflicts, or it could be that the observer always chooses the more reliable stimulus, and the disparity stimulus was just slightly more reliable. Therefore, we chose reliability ratios of 3:1 (disparity more reliable) and 1:3 (texture more reliable) because those values made it possible to discriminate the possibilities under consideration. For each observer, we found a range of conflict sizes and viewing distances that, according to the single-cue measurements, would yield approximately those relative reliability ratios.
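As a worked illustration of what a 3:1 reliability ratio implies under the Gaussian-likelihood model, the weighted-average prediction can be written out as below. The weights $w_D$ and $w_T$ are notation introduced here, and the slant values ($S_D = 30°$, $S_T = 0°$) are hypothetical numbers chosen only to make the arithmetic concrete.

```latex
w_D = \frac{r_D}{r_D + r_T} = \frac{3}{3 + 1} = 0.75, \qquad
w_T = \frac{r_T}{r_D + r_T} = 0.25, \qquad
\hat{S} = w_D S_D + w_T S_T = 0.75 \times 30^\circ + 0.25 \times 0^\circ = 22.5^\circ .
```

A robust observer, by contrast, would report a slant near $30°$ or near $0°$ at large conflicts, so with this ratio the two predictions are clearly separable.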

^{2}. The visible stimulus height was 20° and width was 25° ± 5°. The virtual viewing distance, specified by vergence and pattern of vertical disparities, ranged from 15 to 157 cm (see below). We verified that the random-dot stimulus did not contain useful monocular slant information by presenting it to all observers with one eye occluded. In all cases, the monocular random-dot stimulus was not a reliable cue to slant: observers were either not able to make slant discriminations at all, or performed much worse than with binocular information. Thus observers were quite unlikely to have used monocular slant information in the disparity-only stimulus.

$S_D = S_T$) or in conflict ("cue-conflict" $S_D \neq S_T$; Figure 2d). In the no-conflict cases, homogeneous textured surfaces were projected directly to the two eyes. In cue-conflict cases, we first calculated a perspective projection of the texture with slant $S_T$ at the Cyclopean eye and then found the intersections of rays through this Cyclopean projection with a surface patch at the disparity-specified slant $S_D$. The markings on this latter surface were then projected to the left and right eyes to form the two monocular images.

$S$, and the other contained the comparison stimulus at $S \pm \delta S$. The interval order was random. We used adaptive staircase procedures to vary $\delta S$. The procedures had two reversal rules (2-down/1-up and 1-down/2-up) to distribute points along the psychometric function. Two to four staircases were employed for each psychometric function, corresponding to an average of 124 trials per function. In each session, one base slant and one cue were presented with two interleaved staircases. Sessions were conducted in random order. The virtual viewing distance was constant within a given experimental session.
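The two reversal rules can be sketched as a small state machine. This is a minimal illustration, not the authors' implementation: the class name, the fixed step size, the starting level, and the convention that a `True` response counts toward a downward step are all assumptions. In the standard transformed up-down analysis, a 2-down/1-up rule converges near the 71% point of the psychometric function and a 1-down/2-up rule near the 29% point, which is how the pair spreads trials along the function.

```python
class Staircase:
    """n-down/m-up staircase on a signed slant increment (hypothetical sketch)."""

    def __init__(self, start, step, n_down, n_up):
        self.delta_s = start      # current value of the increment
        self.step = step          # fixed step size (an assumption)
        self.n_down = n_down      # consecutive True responses needed to step down
        self.n_up = n_up          # consecutive False responses needed to step up
        self._true_run = 0
        self._false_run = 0

    def update(self, response):
        """Register one trial's response and return the next increment."""
        if response:
            self._true_run += 1
            self._false_run = 0
            if self._true_run == self.n_down:
                self.delta_s -= self.step
                self._true_run = 0
        else:
            self._false_run += 1
            self._true_run = 0
            if self._false_run == self.n_up:
                self.delta_s += self.step
                self._false_run = 0
        return self.delta_s


two_down_one_up = Staircase(start=10.0, step=2.0, n_down=2, n_up=1)
one_down_two_up = Staircase(start=-10.0, step=2.0, n_down=1, n_up=2)
```

Interleaving the two objects within a session, as the text describes, would simply mean choosing at random which staircase supplies the increment on each trial.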

$\sigma_D$ and $\sigma_T$ (Green & Swets, 1974). These values are the just-noticeable differences (JNDs), the slant differences that are correctly discriminated ∼84% of the time. Figure 3a shows single-cue JNDs as a function of slant. In general, texture JNDs decreased (improved) as the absolute value of slant increased; JNDs with the regular texture were generally lower than with the irregular texture. Disparity JNDs generally decreased as the absolute value of slant increased, but sometimes increased at large slants. As one would expect, disparity JNDs increased with viewing distance.
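The link between the ∼84% criterion and the standard deviation of a cumulative-Gaussian psychometric function can be verified directly: a cumulative Gaussian evaluated one standard deviation above its mean is about 0.84. The sketch below assumes a cumulative Gaussian centered on zero increment; the function name and the example value of 4° are illustrative assumptions.

```python
import math


def cumulative_gaussian(delta, sigma):
    """Cumulative Gaussian centered at zero, evaluated at increment `delta`."""
    return 0.5 * (1.0 + math.erf(delta / (sigma * math.sqrt(2.0))))


sigma = 4.0  # hypothetical JND in degrees
print(round(cumulative_gaussian(sigma, sigma), 4))
```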

$S\rvert = 0$, 6, 11, 22, 45, 60, 90, and 120°, where $\Delta S = S_D - S_T$. We found the points $S$ such that $S_D = S + \Delta S$ and $S_T = S - \Delta S$ along each of the two contours for each $\Delta S$ for each observer; this yielded 18 conflict stimuli. We could not always find stimuli that produced the desired reliability ratios for the full range of conflict sizes, so we did not test all 18 conditions in all observers.

$\lambda$ ranges between 0 and 1 and is the probability of the primary Gaussian distribution with primary variance $\sigma_1^2$ and mean $\mu$, and $1 - \lambda$ is the probability of the secondary Gaussian with secondary variance $\sigma_2^2$ and the same mean $\mu$. We assumed that the secondary Gaussian distribution produced the heavy tails: i.e., $\sigma_2 > \sigma_1$. We also assumed that JNDs from the single-cue measurements (Figure 2a) provided an estimate of $\sigma_1$, which is reasonable for $\sigma_2 \gg \sigma_1$ and $\lambda \geq 0.5$. Thus, we set $\sigma_1$ based on the single-cue discrimination thresholds, leaving $\sigma_2$ as a free parameter. We observed that $\lambda$ and $\sigma_2$ had similar effects on the fits to the data (decreasing $\lambda$ was very similar to increasing $\sigma_2$), so we fixed $\lambda$ at 0.5, leaving $\sigma_2$ as the only free parameter for each cue (i.e., $p(X \mid S) = 0.5\,N(\mu, \sigma_1) + 0.5\,N(\mu, \sigma_2)$). There were, therefore, two free parameters per observer: $\sigma_{D2}$ for disparity and $\sigma_{T2}$ for texture.
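To illustrate how a two-Gaussian mixture likelihood can yield weighted averaging at small conflicts and robustness at large ones, the sketch below multiplies a disparity and a texture likelihood and finds the peak by grid search. It is a minimal illustration, not the authors' fitting code: the grid search, the flat prior over the searched range, the function names, and all parameter values (primary SDs of 2 and 2√3 for a 3:1 reliability ratio, secondary SDs of 20 and 60) are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def heavy_tailed_likelihood(x, s, sigma1, sigma2, lam=0.5):
    """Mixture of a primary and a broader secondary Gaussian, both centered at s."""
    return lam * normal_pdf(x, s, sigma1) + (1.0 - lam) * normal_pdf(x, s, sigma2)


def map_estimate(x_d, x_t, params_d, params_t, step=0.05):
    """Slant that maximizes the product of the two likelihoods (flat prior)."""
    lo = min(x_d, x_t) - 10.0
    hi = max(x_d, x_t) + 10.0
    n = int(round((hi - lo) / step))
    best_s, best_p = lo, -1.0
    for i in range(n + 1):
        s = lo + i * step
        p = (heavy_tailed_likelihood(x_d, s, *params_d)
             * heavy_tailed_likelihood(x_t, s, *params_t))
        if p > best_p:
            best_s, best_p = s, p
    return best_s


# Hypothetical (sigma1, sigma2) pairs; texture is given the heavier tails.
disparity = (2.0, 20.0)
texture = (2.0 * math.sqrt(3.0), 60.0)

small = map_estimate(2.0, -2.0, disparity, texture)     # conflict of 4 deg
large = map_estimate(30.0, -30.0, disparity, texture)   # conflict of 60 deg
print(small, large)
```

With these assumed values, the small-conflict estimate lands near the Gaussian weighted average (0.75 × 2 + 0.25 × (−2) = 1), whereas the large-conflict estimate sits close to the disparity value of 30 rather than the weighted average of 15: the cue with the heavier tails is effectively ignored.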

$\sigma_{T2}$ was estimated separately for the irregular and regular textures. The joint likelihood in the heavy-tailed model is the product:

$\sigma_{T2} > \sigma_{D2}$). When the stimulus had a regular texture, the texture likelihoods had less heavy tails than the disparity likelihoods (i.e., $\sigma_{D2} > \sigma_{T2}$). This occurred because most observers chose disparity when robust with irregular texture and chose texture with regular texture; consequently, the tail heaviness in the modeling changed to capture this change in behavior. In sum, the estimated secondary variances are consistent with the ignored cue having heavier tails.

Observer | $\sigma_{D2}$ | SD | $\sigma_{T2}$ (irregular) | SD | $\sigma_{T2}$ (regular) | SD |
---|---|---|---|---|---|---|
S1 | 21.01 | 1.12 | 54.74 | 16.31 | 4.92 | 0.31 |
S2 | 33.85 | 16.83 | 59.52 | 25.52 | 7.45 | 2.46 |
S3 | 70.10 | 13.88 | 22.29 | 6.15 | 6.09 | 0.66 |
S4 | 74.20 | 20.10 | 64.56 | 16.55 | 6.75 | 7.59 |
S5 | 36.59 | 17.58 | 70.85 | 31.50 | 8.58 | 3.73 |
S6 | 10.58 | 4.56 | 16.28 | 4.61 | 10.68 | 2.93 |

$r_D : r_T$, where $r_i$ is $1/\mathrm{JND}_i^2$ and $\mathrm{JND}_i$ is the slant difference that is correctly discriminated ∼84% of the time. The difference between the two models is the shape of the psychometric function: cumulative Gaussian for the Gaussian model and a more complex function for the heavy-tailed model. In the cases in which observers were robust and chose the cue that was less reliable according to the Gaussian model, this cue was always more reliable under the heavy-tailed model. Said another way, under the heavy-tailed model, the observer always chooses the more reliable cue (i.e., the cue with the smaller JND). Thus, in the framework of the heavy-tailed model, observers' behavior was statistically optimal.

$\rho = 0.42$, $p < 10^{-5}$).

*complete fusion*. Hillis and colleagues used the same paradigm to examine the combination of cues from different senses. They found that observers could usually make oddity discriminations among two-cue, visual-haptic stimuli based on the separate visual and haptic stimulus values. This means that conscious access was maintained to the visual and haptic signals. We will refer to this behavior as *no fusion*.

*fusion vector* is the yellow line between the two-cue stimulus (black circle) and the two matching single-cue stimuli (yellow square). The red and blue lines represent the predicted fusion vectors for the Gaussian and heavy-tailed models, respectively. The vector direction for the Gaussian model is independent of conflict size, while the direction for the heavy-tailed model rotates toward horizontal or vertical with increasing conflict size. The Gaussian and heavy-tailed predictions are thus similar when the conflict is small, but become quite dissimilar with increasing conflict size as robustness occurs and the vector predicted by the heavy-tailed model rotates toward horizontal or vertical.

*fusion index* is the length of the fusion vector divided by the distance from the two-cue conflict stimulus to the cues-consistent line along the vector. An index of 1 indicates complete fusion and an index of 0 indicates no fusion. The indices in Figure 9 are spread more widely at small conflicts because the normalization involved in computing the fusion index has an increasingly small denominator with small conflict sizes, so the index becomes more sensitive to measurement noise. The figure shows quite clearly that fusion indices were centered around 1, indicative of complete fusion, for all conflict sizes. The fact that the data are quite consistent with complete fusion, particularly at large conflicts, is inconsistent with the finding of van Ee et al. (2002) that conflict stimuli yield bi-stable percepts.
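The fusion-index definition above can be sketched geometrically. The code assumes, as a reading of the text rather than something it states outright, that stimuli are points in a plane whose axes are disparity-specified and texture-specified slant, that the cues-consistent line is $S_D = S_T$, and that the matched single-cue stimuli form the end point of the fusion vector. The function name and the example coordinates are hypothetical.

```python
import math


def fusion_index(conflict, match):
    """Length of the fusion vector divided by the distance, along that vector,
    from the conflict stimulus to the cues-consistent line S_D = S_T."""
    s_d, s_t = conflict
    v_d = match[0] - s_d
    v_t = match[1] - s_t
    length = math.hypot(v_d, v_t)
    if length == 0.0:
        return 0.0  # matches equal the conflict stimulus: no fusion
    if s_d == s_t or v_d == v_t:
        raise ValueError("index undefined: zero conflict or vector parallel to the line")
    # Solve s_d + t * v_d == s_t + t * v_t for the scale t that reaches the line.
    t = (s_t - s_d) / (v_d - v_t)
    return length / (t * length)


print(fusion_index((30.0, -30.0), (10.0, 10.0)))   # match lies on the line
print(fusion_index((30.0, -30.0), (20.0, -10.0)))  # match lies halfway to the line
```

The division by a quantity proportional to the conflict size is also what makes the index noisy at small conflicts, as noted in the paragraph above.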

$p_{\mathrm{common}}$ that equals the prior probability of one cause $p(C = 1)$, where $C$ equals the number of causes, and three others. We simulated three variants of the model: no fusion ($p_{\mathrm{common}} = 0$), complete fusion ($p_{\mathrm{common}} = 1$), and partial fusion ($p_{\mathrm{common}}$ as a free parameter). The likelihood parameters ($\sigma_{D1}^2$, $\sigma_{D2}^2$, $\sigma_{T1}^2$, and $\sigma_{T2}^2$) were set by measurement and analysis in Experiment 1. The prior over slant was assumed to be uniform; we later verified that this assumption had no impact on our main conclusions. We then measured goodness of fit in the same fashion as we did for Experiment 1. Eight models were tested: psychometric functions (df = 16–48 depending on the observer); coin flipping (df = 0); Gaussian, no fusion (df = 0, $p_{\mathrm{common}} = 0$); Gaussian, partial fusion (df = 1); Gaussian, complete fusion (df = 0, $p_{\mathrm{common}} = 1$); heavy-tailed, no fusion (df = 0, $p_{\mathrm{common}} = 0$); heavy-tailed, partial fusion (df = 1); and heavy-tailed, complete fusion (df = 0, $p_{\mathrm{common}} = 1$). The goodness of fit was normalized such that the psychometric and coin-flipping models represented the upper and lower bounds, respectively. Figure 10 shows the results. The goodness of fit for the heavy-tailed likelihood was always better than for the Gaussian likelihood, consistent with Experiment 1. Among the heavy-tailed models, the goodness of fit for the complete-fusion model was also consistently greater than for the no-fusion model and was nearly as great as for the partial-fusion model, which had a free parameter. The fits for the partial-fusion model were quite similar to the fits for the complete-fusion model, suggesting that adding $p_{\mathrm{common}}$ as a free parameter rather than fixing it at 1 is not necessary to account for these data. Indeed, the mean best-fitting $p_{\mathrm{common}}$ for the heavy-tailed, partial-fusion model was $0.88$ ($\pm 0.08$), which is quite close to 1. Computing the Bayesian Information Criterion (BIC; Burnham & Anderson, 2002), which penalizes the partial-fusion models for their one free parameter, revealed decisive evidence for all observers in favor of the heavy-tailed, complete-fusion model. The results therefore suggest that observers assumed one cause, i.e., $p(C = 1) \approx 1$.
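The BIC comparison between a complete-fusion model (no free parameter) and a partial-fusion model (one free parameter) can be sketched with the standard definition, BIC = k ln(n) − 2 ln(L). The log-likelihoods and trial count below are invented for illustration and are not values from the experiments; the function name is likewise an assumption.

```python
import math


def bic(log_likelihood, n_free_params, n_observations):
    """Bayesian Information Criterion; lower values indicate the preferred model."""
    return n_free_params * math.log(n_observations) - 2.0 * log_likelihood


# Hypothetical fits: the extra parameter buys only a small gain in likelihood.
bic_complete = bic(log_likelihood=-500.0, n_free_params=0, n_observations=1000)
bic_partial = bic(log_likelihood=-499.0, n_free_params=1, n_observations=1000)
print(bic_complete, bic_partial)
```

In this made-up case the partial-fusion model's likelihood gain does not offset its ln(n) penalty, so the complete-fusion model has the lower BIC, which is the pattern the text reports for the real data.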

*a priori* that there is only one cause.

$\rho = 0.29$ ($p < 0.005$) for disparity-only matches and $\rho = 0.19$ ($p < 0.05$) for texture-only matches). As we discussed in the analysis of JNDs in Experiment 1, the increase in JND is much less than one would expect if observers were cue switching.

$\gamma$ is the horizontal version of the eyes; the eye-position signals are measured via extra-retinal signals. Equation 4 is highly non-linear, so even if the noises associated with disparity and eye-position measurements were Gaussian distributed, the resulting distribution of slant estimates would not be. We simulated Equation 4 with Gaussian-distributed disparity and eye-position measurements and indeed observed heavy tails in the direction of greater slants. Thus, it seems reasonable to assume non-Gaussian distributions for the processes of estimating slant from disparity (see also Porrill, Frisby, Adams, & Buckley, 1999).

$\rho$: to correct $\sigma_{D1}^2$, one divides it by $1 - \rho\sigma_{D1}/\sigma_{T1}$. The desired relative reliability ratios $r_D : r_T$ in our experiments were 3:1 and 1:3. When corrected for correlation, these ratios become [3 − $\rho$ … $\rho$ increases in this condition, disparity becomes even more reliable relative to texture. Likewise, in the 1:3 condition, texture should have been more reliable than disparity. As $\rho$ increases in this condition, texture becomes even more reliable relative to disparity. Thus, if the two cues were correlated, the actual reliabilities would not be as close to each other as we had estimated. This causes the predictions of the Gaussian likelihood model to be closer to the prediction of robustness in which the more reliable cue is chosen. But such behavior is not consistent with much of our data because we often observed robustness in which the observer chose the less reliable cue. We conclude that an undetected correlation between the sensory noises would not substantively change the interpretation of our main findings.

$C = 1$) or two ($C = 2$), along with the prior probability of one cause, $p(C = 1)$. The causal-inference model describes the percept as a weighted sum of the complete-fusion and no-fusion resultants. The prior $p(C = 1)$ is a constant for any two sensory estimators. Thus, as conflict size increases, the causal-inference model makes a gradual transition from complete fusion through partial fusion to no fusion. The causal-selection model is similar except that it first selects $C^*$, the number of causes that is the most probable, and then subsequent inference is based on $C^*$ alone. The causal models thus instantiate the simple idea that if there is likely to be a common cause, cues are combined and complete fusion occurs, and if there is not likely to be a common cause, they are segregated and no fusion occurs. The models' behavior is generally consistent with behavioral data in multi-sensory studies (e.g., Wallace et al., 2004). But the models are in an important sense not consistent with the data from our experiments. They link the departure from cue combination in part to the attribution of two causes determined by the conflict between the cues. We found instead that two causes were never perceived for our stimuli; observers fused completely no matter what the conflict size was. The causal models also link cue combination, manifest as weighted averaging, to the inference of one cause, and link cue segregation, which is similar to robustness, to the inference of two causes. Our data do not support a linkage between the transition from weighted averaging to robustness and the number of inferred causes. Perhaps the models are most readily applicable to cue combination between senses where the amount of cue fusion depends strongly on conflict size (Hillis et al., 2002; Wallace et al., 2004). Of course, the causal models could be modified to be consistent with our data by assuming $p(C = 1) = 1$ and by assuming that the underlying likelihoods are heavy tailed (Natarajan et al., 2009).

$C = 1$) for the measurements $x_D$ and $x_T$, then their joint likelihood is:

$x$ with mean and standard deviation …, and $\sigma_P$ is the standard deviation of a slant prior which is assumed to be centered at zero. Within each of the four integrals is the product of three Gaussians, which is equal to a Gaussian times a scale factor. The integral is thus one (the integral of a Gaussian) times the scale factor:

$C = 2$) for $x_D$ and $x_T$, their joint likelihood is:

$p(C = 1)$ is the free parameter $p_{\mathrm{common}}$. This in turn can be used to estimate the probability of one cause, given the measurements $x_D$ and $x_T$:
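By Bayes' rule, the probability of one cause given the two measurements takes the form sketched below. This is a reconstruction from the quantities named in the text (the two joint likelihoods and the prior $p_{\mathrm{common}}$), assuming that one and two causes are the only alternatives so that $p(C = 2) = 1 - p_{\mathrm{common}}$; it is not a verbatim copy of the paper's equation.

```latex
p(C = 1 \mid x_D, x_T) =
\frac{p(x_D, x_T \mid C = 1)\; p_{\mathrm{common}}}
     {p(x_D, x_T \mid C = 1)\; p_{\mathrm{common}} + p(x_D, x_T \mid C = 2)\,\bigl(1 - p_{\mathrm{common}}\bigr)} .
```

Setting $p_{\mathrm{common}} = 1$ or $p_{\mathrm{common}} = 0$ makes this posterior 1 or 0 regardless of the measurements, which corresponds to the complete-fusion and no-fusion variants.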

$s$, not the internal slant measurement $x$, so we are ultimately interested in the posterior. Next we will compute the variants of this posterior in terms of $s_D$ and $s_T$, and one or two sources. If there are two sources ($C = 2$) we have:

$C = 1$), the posteriors for $s_D$ and $s_T$ are the same:

$p_{\mathrm{common}}$, was found by maximizing the likelihood of all the Experiment 2 data. This was done separately for each observer.

*Journal of the Optical Society of America A, Optics and Image Science, 5,* 1749–1758.

*Vision Research, 35,* 703–722.