Bayesian inference theories have been extensively used to model how the brain derives three-dimensional (3D) information from ambiguous visual input. In particular, the maximum likelihood estimation (MLE) model combines estimates from multiple depth cues according to their relative reliability to produce the most probable 3D interpretation. Here, we tested an alternative theory of cue integration, termed the intrinsic constraint (IC) theory, which postulates that the visual system derives the most stable, not most probable, interpretation of the visual input amid variations in viewing conditions. The vector sum model provides a normative approach for achieving this goal, where individual cue estimates are components of a multidimensional vector whose norm determines the combined estimate. Individual cue estimates are not accurate but are related to distal 3D properties through a deterministic mapping. In three experiments, we show that the IC theory can more adeptly account for 3D cue integration than MLE models. In Experiment 1, we show systematic biases in the perception of depth from texture and depth from binocular disparity. Critically, we demonstrate that the vector sum model predicts an increase in perceived depth when these cues are combined. In Experiment 2, we illustrate the IC theory's radical reinterpretation of the just noticeable difference (JND) and test the related vector sum model prediction of the classic finding of smaller JNDs for combined-cue versus single-cue stimuli. In Experiment 3, we confirm the vector sum model prediction that the biases found in cue integration experiments cannot be attributed to flatness cues, as the MLE model predicts.

and \(\sigma_t\) are the standard deviations of the noise of these estimates, then the combined estimate \({\hat z_c}\) is a weighted average with weights proportional to the reliabilities, \(r_{i}=\frac{1}{\sigma^{2}_{i}}\), of the estimates:

\[\hat z_c = \frac{r_t \hat z_t + r_d \hat z_d}{r_t + r_d}\]
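As a numerical illustration of this reliability-weighted average, here is a minimal sketch; the estimate and noise values are made up, not taken from the experiments:

```python
# Sketch of the linear MLE combination rule: a reliability-weighted average
# of two single-cue depth estimates. All numbers are illustrative.
def mle_combine(z_t, sigma_t, z_d, sigma_d):
    """Combine texture and disparity estimates with weights r_i = 1/sigma_i^2."""
    r_t = 1.0 / sigma_t**2          # reliability of the texture estimate
    r_d = 1.0 / sigma_d**2          # reliability of the disparity estimate
    z_c = (r_t * z_t + r_d * z_d) / (r_t + r_d)
    sigma_c = (1.0 / (r_t + r_d)) ** 0.5   # combined noise is reduced
    return z_c, sigma_c

z_c, sigma_c = mle_combine(z_t=20.0, sigma_t=2.0, z_d=30.0, sigma_d=1.0)
# The combined estimate lies between the single-cue estimates and is pulled
# toward the more reliable disparity estimate; sigma_c is below both sigmas.
```

Note that the combined standard deviation is always smaller than either single-cue standard deviation, which is the basis of the MLE prediction tested in Experiment 2.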

The slope, \(k_i\), of this linear function, which we term the perceptual function, depends on the quality of the visual information within cue \(i\). For example, from the images in Figures 1a to 1c, a texture module extracts the systematic change in shape and spatial frequency of texture elements, resulting in an estimate \({\hat z_t} = {k_t}z\). Critically, in direct contrast to the MLE model, there is no assumption that the perceptual function is veridical, nor is there any explicit representation of the associated sensory noise. Removing the veridicality assumption makes the IC theory far more parsimonious than the MLE model.

The slopes, \(k\), of the perceptual functions vary independently from one another due to influences from variables unrelated to 3D shape. Although for simplicity we refer to the module outputs as depth estimates, formally they are modeled as orthogonal signals in a multidimensional space, where the vector magnitude of these signals indicates the magnitude of perceived depth. To illustrate the independence of these functions and their relationship in the multidimensional space, consider the surface depicted in Figure 1a. Along with the image signals emerging from the projection of texture elements, there is a shading gradient that also induces an independent image signal related to depth. However, the surface may be viewed under different lighting conditions. For example, overcast weather would wash out the shading gradient while leaving the texture pattern unmodified, resulting in the image shown in Figure 1b. On the other hand, the same surface can be covered with an irregular texture rather than the highly regular polka dots while the lighting condition remains identical to Figure 1a. The shading pattern still produces a smooth luminance gradient, but there would be a less clear gradient of texture element deformation, as depicted in Figure 1c. In these examples, independent nuisance variables are associated with the surface properties and the sources of illumination. In a similar fashion, different nuisance variables will affect other image signals, such as the speed of the observer in motion parallax (Fantoni, Caudek, & Domini, 2012) or the fixation distance between the observer and the object in binocular disparities (Johnston, 1991). In general, nuisance variables describe factors that influence the quality of depth cues and therefore cause changes in perceived depth while the distal 3D surface remains constant. In other words, despite being independent of the surface shape, nuisance variables influence the slopes of the perceptual functions.

Because the slope, \(k\), indicates the response to distal depth from individual cues,¹ we refer to these parameters as *cue strengths*. The independence of cue strengths allows us to represent the totality of the cue estimates derived from a given stimulus as a multidimensional vector, with each orthogonal axis representing the information from one cue. This is illustrated in Figures 1d to 1f, which show specific examples of 2D vectors representing the texture and shading information contained in the stimuli of Figures 1a to 1c. Figure 1d represents the depth information carried by the stimulus in Figure 1a, for which texture and shading are assumed (for illustration purposes) to have the same strength (\(k_t = k_s\)). In Figure 1e, the strength of texture is much greater than the strength of shading, as the shading has been removed (\(k_t > k_s\)). In Figure 1f, the strength of shading is greater than the strength of texture, as the texture is highly irregular (\(k_t < k_s\)). Because both \({\hat z_t}\) and \({\hat z_s}\) are proportional to the 3D property \(z\), the length of the combined vector \(({\hat z_t}, {\hat z_s})\) is also proportional to \(z\). However, because the combined vector length depends on the individual cue strengths, it will fluctuate with the nuisance variables affecting texture and shading (surface properties and environmental illumination). Critically, the central claim of our theory is that the goal of the visual system is to maximize sensitivity to the underlying 3D information while minimizing sensitivity to nuisance variables.
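The vector sum combination described above can be sketched in a few lines; the cue strengths and distal depth below are hypothetical values chosen only to make the arithmetic transparent:

```python
import math

# Sketch of the vector sum (IC) combination: single-cue signals k_i * z are
# orthogonal components, and perceived depth is the norm of their vector.
# Cue strengths and depth are illustrative, not measured values.
def vector_sum_depth(z, cue_strengths):
    """Perceived depth as the vector magnitude of the cue signals k_i * z."""
    return math.sqrt(sum((k * z) ** 2 for k in cue_strengths))

k_t, k_s = 0.8, 0.6           # hypothetical texture and shading strengths
z = 10.0                      # distal depth

single_t = vector_sum_depth(z, [k_t])       # texture alone
combined = vector_sum_depth(z, [k_t, k_s])  # texture plus shading
# Adding a cue increases perceived depth: the effective combined strength is
# k_c = sqrt(k_t**2 + k_s**2), so here combined = sqrt(64 + 36) = 10.0 > 8.0.
```

Because the norm grows with every added component, this model predicts greater perceived depth for combined-cue than for single-cue stimuli, which is the key departure from the MLE weighted average.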

Second, the cue strengths, \(k\), are not free parameters; they can be empirically measured from the slopes of perceptual functions relating physical depth to perceived depth. Although the visual system must learn mappings from cue signals to perceived depth, it does not explicitly represent cue strengths; "cue strength" refers to the empirically derived slope of the perceptual function. Third, because the model is additive, it may be misunderstood as producing systematic overestimations as more cues are added to a stimulus. In fact, it is quite the contrary. We speculate that removing cues brings the model outside of its optimal operating conditions, which results in underestimation of depth from reduced- or single-cue stimuli. We refer to Equation 3, which predicts an increase in perceived depth with the addition of cues, as the vector sum model. Finally, although we have not yet proposed a model of how these perceptual functions are learned for single cues, this model is more parsimonious than the MLE model (which also has no specified learning mechanism) in two respects. First, it does not assume that the visual system learns to extract unbiased estimates from individual cues. Second, it does not assume that the visual system must learn the reliability of each individual cue across viewing conditions.

(\(k_t = 0\)). In the texture condition, a compelling texture gradient specified the depth profile of the surface while binocular disparities were set to zero (\(z_d = 0\)). In the combined-cue condition, both texture and disparity information were present in the stimulus. The choice of having the texture stimulus viewed binocularly was made for the practical reason of keeping the vergence signal constant across all viewing conditions, as monocular viewing may create the undesirable effect of perturbing egocentric distance information. This choice would be considered problematic from the perspective of the MLE approach because a null disparity field is a powerful cue to flatness. However, according to the vector sum model, the absence of binocular disparities (\(k_d = 0\)) or a null disparity field (\(z_d = 0\)) should have the same effect on the perceived depth magnitude of a stimulus. This prediction, which is incompatible with that of the MLE model, was confirmed in a previous study showing that switching from monocular to binocular viewing of a textured 3D cylinder did not produce any change in its perceived depth magnitude (Vishwanath & Hibbard, 2013). In Experiment 3 we further tested this hypothesis with the stimuli used in this experiment and confirmed both this finding and the predictions of the vector sum model.

\(x\)-axis above the surface. Although we added a shading component, we refer to this cue condition as the texture condition for simplicity and to avoid confusion with the experimentally defined combined-cue condition. Because the two cues are consistent with each other, both the MLE model and the vector sum model consider the texture-shading stimulus a single-cue stimulus. Furthermore, we chose a polka dot texture because it has been shown to elicit the highest levels of precision in discrimination tasks when compared with other texture patterns (Rosas, Wichmann, & Wagemans, 2004). This means it has the largest potential to influence the depth percept, being the most reliable (MLE model) or having the largest cue strength (vector sum model) relative to other textures. Additionally, polka dots have been used extensively by other researchers in the study of 3D shape from texture (Chen & Saunders, 2020; Knill, 1998a; Knill, 1998b; Todd & Thaler, 2010). To keep a steady fixation at the center of the display, as in the disparity-only and combined-cue conditions, texture-only stimuli were also viewed binocularly.

*F*(1, 10) = 272.67, *p* = 1.4e-8, generalized η² = 0.89, and cue type, *F*(2, 20) = 48.40, *p* = 2.2e-8, generalized η² = 0.43. For both fixation distances, the perceived depth of combined-cue stimuli (purple diamonds) was consistently greater than the perceived depth of single-cue stimuli (red squares, blue circles). A Bonferroni-corrected post hoc analysis confirmed that perceived depth in the combined-cue condition was larger than in both the disparity condition, *t*(10) = 4.42, *p* = 0.0039, and the texture condition, *t*(10) = 10.59, *p* = 2.8e-6. Additionally, texture stimuli were in general perceived as shallower than disparity stimuli, demonstrating cue-specific biases, *t*(10) = –5.02, *p* = 0.0016.

*F*(2, 20) = 3.76, *p* = 0.041, generalized η² = 0.037, between simulated depth and fixation distance, *F*(1, 10) = 7.19, *p* = 0.023, generalized η² = 0.07, and between all three factors, *F*(2, 20) = 5.11, *p* = 0.016, generalized η² = 0.031, reflect the dependence of cue strength on how the fixation distance influences the quality of the cue. This was expected particularly for the disparity cue, where a lack of depth constancy across distances is a well-documented phenomenon (Johnston, 1991). The interaction between simulated depth and cue type, *F*(2, 20) = 45.42, *p* = 3.7e-8, generalized η² = 0.20, further supports the existence of cue-specific biases due to differing cue strengths between cue types.

\(k_c\), from the slopes of the single-cue estimates (\({k_c} = \sqrt {{k_t}^2 + {k_d}^2} \)). Figure 4 shows the predicted slopes plotted against the measured slopes for each participant. Without fitting any free parameters, the linear fit of measured versus predicted slopes with the intercept set to 0 closely matches the unity line (slope = 0.96, *SE* = 0.034). Although the correlation coefficient of this linear fit is not very high (*r* = 0.79), the predictive power of the vector sum model is superior to that of the MLE model, which would require the additional assumption of flatness cues to explain the data. Without assuming a role of unmodeled flatness cues, the results clearly contradict the MLE model prediction that the perceived depth of the combined-cue stimulus falls between the perceived depths of the single-cue stimuli. Because the reliability of the flatness cues in these particular displays is unknown, a fit of the MLE model would require free parameters modeling the variance of the noise of the flatness cues; in this case, the MLE model would be less parsimonious than the vector sum model. Furthermore, although the MLE model predictions may be amended by introducing flatness cues, we provide evidence in Experiment 3 rejecting the flatness-cue explanation. Additionally, single-cue and combined-cue depths were consistently overestimated in five of six stimulus conditions, contradicting the veridicality assumption of the MLE model.
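The zero-parameter prediction and the through-origin regression can be sketched as follows; the per-participant slope values here are hypothetical, used only to show the computation:

```python
import math

# Hypothetical single-cue slopes (texture, disparity) and measured
# combined-cue slopes for three participants; values are illustrative.
slopes = [
    # (k_t, k_d, measured k_c)
    (0.6, 0.8, 1.05),
    (0.5, 0.9, 0.98),
    (0.7, 0.7, 1.02),
]

# Vector sum prediction for each participant: k_c = sqrt(k_t^2 + k_d^2)
predicted = [math.sqrt(kt**2 + kd**2) for kt, kd, _ in slopes]
measured = [kc for _, _, kc in slopes]

# Least-squares slope of measured vs. predicted with the intercept fixed
# at 0: beta = sum(x*y) / sum(x*x). A beta near 1 means the parameter-free
# prediction matches the data.
beta = sum(p * m for p, m in zip(predicted, measured)) / sum(p * p for p in predicted)
```

With real data, `beta` plays the role of the 0.96 slope reported above: no free parameter is fit to produce the prediction itself, only this single scale factor is estimated to compare it with the measurements.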

*SD* of the probe adjustments. Given the different predictions of the two models, we tested whether there was a difference in the *SD* between the cues. Figure 5 shows the *SD* of the probe adjustment task as a function of simulated depth in all experimental conditions. In this figure, the prediction of the MLE model for the *SD* of the combined-cue adjustments is shown in gray.

*F*(1, 10) = 54.75, *p* = 2.3e-5, generalized η² = 0.57. This follows the classic effect of Weber's law, where response variance is proportional to the magnitude of the stimulus, in this case the surface depth. There was also an interaction between cue type and simulated depth, *F*(2, 20) = 7.31, *p* = 0.0041, generalized η² = 0.090. However, there was no main effect of cue type, *F*(2, 20) = 0.45, *p* = 0.65, generalized η² = 0.0053. This can be observed in Figure 5, where the combined-cue standard deviation (purple) is not smaller than the single-cue standard deviations, contrary to the prediction of the MLE model (gray). Instead, these results support the vector sum model prediction that the noise observed in perceptual judgments is stimulus independent. Because the vector sum model predicts a null effect of cue type, we conducted a Bayes factor analysis using the BayesFactor package (Morey & Rouder, 2021) in R (R Foundation for Statistical Computing, Vienna, Austria). A Bayes factor of 0.055 indicated strong evidence for a model including fixed effects of simulated depth and fixation distance compared with a model including the same fixed effects plus cue type. Both models included a random effect for participants.

As shown in the hypothetical experiment of Figure 6b, the JND is larger at the far viewing distance because the cue strength is weaker there (consistent with the fact that binocular disparities and their gradients decrease with viewing distance). The JND is thus inversely proportional to the cue strength (\(J_{i}\,=\,\frac{\sigma_{N}}{k_{i}}\)). Recall that the vector sum model posits that adding cues to a stimulus increases the combined-cue strength according to the magnitude of the vector of cue signals. Because the JND is inversely proportional to cue strength, the vector sum model predicts that the JND shrinks with additional cues, as the MLE model does. Specifically, the texture-only, disparity-only, and combined-cue JNDs are given by \({J_t} = \frac{{{\sigma _N}}}{{{k_t}}}\), \({J_d} = \frac{{{\sigma _N}}}{{{k_d}}}\), and \({J_c} = \frac{{{\sigma _N}}}{{{k_c}}} = \frac{{{\sigma _N}}}{{\sqrt {{k_t}^2 + {k_d}^2} }}\), respectively. Appendix 3 shows how, from these equations, we can predict the combined-cue JND directly from the single-cue JNDs: \({J_c} = \frac{1}{{\sqrt {\frac{1}{{J_t^2}} + \frac{1}{{J_d^2}}} }}\). Notice that this equation is formally identical to Equation 2 of the MLE model, where JNDs are assumed to measure the estimation noise (i.e., \(J_i = \sigma_i\)). However, the vector sum model predicts that this relationship holds at the same *perceived depth*, where task noise is expected to be equivalent because the decision process operates on perceived depth, whereas the MLE model predicts that it holds at the same *simulated depth*, where estimation noise is expected to be equivalent. Thus, the predictions of the two models for a given dataset may slightly differ, as we will show.
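The JND relations above can be checked numerically; the single-cue JND values below are illustrative, not measured:

```python
import math

# Sketch of the combined-cue JND prediction shared (in form) by the vector
# sum and MLE models: J_c = 1 / sqrt(1/J_t^2 + 1/J_d^2).
def combined_jnd(j_t, j_d):
    """Predict the combined-cue JND from two single-cue JNDs."""
    return 1.0 / math.sqrt(1.0 / j_t**2 + 1.0 / j_d**2)

# With illustrative single-cue JNDs of 3.0 and 4.0 (arbitrary units),
# the combined JND is 12/5 = 2.4, smaller than either single-cue JND.
j_c = combined_jnd(j_t=3.0, j_d=4.0)
```

Note that the same arithmetic follows from \(J_i = \sigma_N / k_i\): dividing \(\sigma_N\) by the combined strength \(\sqrt{k_t^2 + k_d^2}\) yields exactly this harmonic-type combination, which is why smaller combined-cue JNDs alone cannot distinguish the two models.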

*F*(2, 14) = 25.42, *p* = 2.2e-5, generalized η² = 0.41. A critical prediction of both the MLE and vector sum models is that the combined-cue condition elicits a smaller JND than the single-cue conditions. Bonferroni-corrected *t*-tests confirmed that the JND for the combined-cue stimuli (purple) was smaller than the JND for the disparity (red), *t*(7) = –4.60, *p* = 0.005, and texture (blue), *t*(7) = –7.93, *p* = 1.9e-4, conditions. Additionally, we found a significant main effect of perceived depth, *F*(1, 7) = 55.54, *p* = 1.4e-4, generalized η² = 0.38, with JNDs increasing for larger perceived depths. We suspect that this may be due to a form of Weber's law, where the noise from encoding and decoding perceived depth to and from memory depends on the magnitude of perceived depth. We explore the implications of Weber's law further in the next section.

*F*(1, 7) = 8.17, *p* = 0.024, generalized η² = 0.052, between cue type and perceived depth, *F*(2, 14) = 11.31, *p* = 0.0012, generalized η² = 0.20, and across all three factors of cue type, perceived depth, and viewing distance, *F*(2, 14) = 5.54, *p* = 0.017, generalized η² = 0.074. These interactions, similar to Experiment 1, suggest a dependence of the cue strength on the cues and their viewing conditions. However, the key result is that the combined-cue JND is smaller than the single-cue JNDs in all conditions. Although this is often taken as evidence for the MLE model, here we show that it is also predicted by the vector sum model.

*perceived depth* as the combined-cue stimulus. Equating the perceived depths of the standards was expected to approximately match the task noise across the three cue conditions. In contrast, the MLE model predictions are based on the single-cue JNDs for single-cue stimuli with the same *simulated depths* as the combined-cue stimulus. Although we did not measure the single-cue JNDs at fixed simulated depths, Figure 7b demonstrates how, for each participant, we inferred the appropriate values for the MLE model (squares) from linear fits of the measured JNDs (circles). Regardless, Figure 8 shows that the predictions of the two models are very similar, as should be expected, with no significant difference in accuracy, *t*(7) = –0.39, *p* = 0.71.

(Figure 6b). Furthermore, we expect the JND to be susceptible to a form of Weber's law on perceived depth, where increases in perceived depth cause an increase in the standard deviation of the task noise. If we therefore assume that \(\sigma_N\) increases with the perceived depth, \({\hat z_s}\), of the standard stimulus through a Weber fraction, \(W\), then \({\sigma _N} = W{\hat z_s} + c\), where \(c\) is a constant reflecting baseline noise. Because \(J_{ij}\; = \frac{{{\sigma _N}}}{{{k_{ij}}}}\), where \(k_{ij}\) is the cue strength of cue \(i\) (disparity-only, texture-only, or combined-cue) for viewing condition \(j\) (40 cm or 80 cm fixation distance), we obtain \(J_{ij} = \frac{{W{{\hat z}_s} + c}}{{{k_{ij}}}}\). Because the perceived depth of the standard is \({\hat z_s} = {k_{ij}}{z_s}\), where \(z_s\) is the distal depth of the standard stimulus, the JND can be modeled relative to the distal depth by Equation 6:

\[J_{ij} = W z_s + \frac{c}{k_{ij}}\]

\(k_{ij}\), of the comparison. We set, for each participant, the cue strength \(k_{ij}\) to the individual slopes of linear fits mapping the simulated depths of Experiment 1 to the perceived depths. To infer the Weber fraction and the noise coefficient, we fit Equation 6 to the estimated JNDs of each participant. We found that both the Weber fraction (*M* = 0.13 mm, *SE* = 0.031 mm) and the noise coefficient (*M* = 1.66 mm, *SE* = 0.36 mm) were significantly greater than 0, *t*(7) = 4.11, *p* = 0.0045 and *t*(7) = 4.57, *p* = 0.0026, respectively. Critical here is that the JND measured in Experiment 2 depends on the cue strength measured in Experiment 1 (Figure 9a). Using Equation 6 for each participant, we can discount from the observed JND the contributions of the Weber law and the noise constant to produce a noise-corrected JND (\(\frac{{J_{ij} - W{z_s}}}{c}\)). Figure 9b plots the relationship between the cue strength and the corrected JND averaged across participants. Horizontal error bars indicate the variability of the cue strength across participants and vertical error bars the variability of the corrected JND. When the Weber fraction and the noise constant are factored out, the JND is shown to depend on the cue strength (\(1/k_{ij}\)) and to be independent of the cue type, as predicted by the vector sum model. For example, the JND of the disparity stimulus at the close viewing distance (Figure 9b, red circles) is smaller than the JND at the far viewing distance (Figure 9b, red triangles) because the strength of disparity at the smaller viewing distance is larger.
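The Weber correction described above can be sketched directly from Equation 6; the Weber fraction, baseline noise, cue strength, and standard depth below are illustrative values, not the fitted ones:

```python
# Sketch of Equation 6, J_ij = W*z_s + c/k_ij, and the noise-corrected JND,
# (J_ij - W*z_s)/c, which equals 1/k_ij. All parameter values are illustrative.
W = 0.13       # hypothetical Weber fraction
c = 1.66       # hypothetical baseline noise (mm)
z_s = 20.0     # hypothetical distal depth of the standard (mm)

def predicted_jnd(k_ij):
    """JND predicted by Equation 6 for a given cue strength k_ij."""
    return W * z_s + c / k_ij

def corrected_jnd(j_ij):
    """Discount the Weber-law and baseline terms; leaves 1/k_ij."""
    return (j_ij - W * z_s) / c

k = 0.8                      # hypothetical cue strength
j = predicted_jnd(k)         # raw JND: Weber term plus cue-strength term
# corrected_jnd(j) recovers 1/k exactly, showing that once the Weber
# fraction and noise constant are factored out, only the cue-strength
# dependence of the JND remains.
```

This is why, after correction, JNDs from different cue types and viewing distances should collapse onto the single curve \(1/k_{ij}\) in Figure 9b.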

*p* = 0.019). We suspect that the poor JND is due to limitations set by the small wavelength at the near fixation distance: deeper surfaces cause the surface between the peak and trough to asymptotically approach a planar slant, which has been shown to provide poor texture gradients at small visual angles (Knill, 1998b; Todd & Thaler, 2010).

(\(k_d = 0\) or \(z_d = 0\), respectively). Under the MLE model, perceived depth should be greatly reduced under binocular viewing compared with monocular viewing, as disparities are posited to be highly reliable at near viewing distances, such that the disparity weight may exceed the texture weight. In Experiment 3B, we presented stimuli with the opposite relationship: binocular disparities provided nonzero depth information, but they were paired either with a textural flatness cue from a well-defined pattern specifying a frontoparallel surface or with an uninformative random-dot pattern often used to eliminate pictorial information from disparity stimuli (Figure 10). Here, the predictions are analogous: the vector sum model predicts no difference in perceived depth, while the MLE model predicts a measurable difference.

°. In the flat-texture condition, we created a binocular stimulus that projected circular, 0.55° polka dots on the image screen by back-projecting the frontoparallel texture onto the corrugated surface (see Figures 10a and 10b for example RDS and flat-texture stimuli with 15 mm of depth). Unlike Experiment 1, we changed the RDS display to black dots on a red surface so that the only differences between conditions were the size and distribution of the texture elements. According to the MLE model, the reliability of flatness cues is markedly different for these two stimuli. In the RDS condition (Figure 10a), the reliability of the texture information was approximately negligible, as argued by Hillis et al. (2004) when they tested the integration of texture and disparity cues. In contrast, the large circular disks randomly positioned on the image (Figure 10b) specify a flat surface in the frontoparallel plane. If flatness cues have any bearing on depth perception, as the MLE model predicts, then the flat-texture condition should induce a sizeable flattening of the perceived amplitude of the sinusoidal corrugation. The vector sum model, however, predicts no difference between the two conditions.

*F*(1, 6) = 49.94, *p* = 4.0e-4, generalized η² = 0.86. There were no other significant main effects or interactions. It may be noted that the average perceived depth in this experiment was larger than in Experiment 1. This is most likely due to the context of this experiment, in which trials showing only monocular cues were not interleaved with trials containing multiple cues; in Experiment 1, observers may have adjusted their criteria so that they reported relatively smaller depth magnitudes for monocular cues. Additionally, individual differences were large (compare the representative subject in Figure 7a with the average in Figure 3), so the increase may also reflect sampling differences. What is critical is that binocular viewing did not reduce perceived depth.

*F*(1, 6) = 216.78, *p* = 6.2e-6, generalized η² = 0.92, and a significant interaction between simulated depth and fixation distance, *F*(1, 6) = 8.28, *p* = 0.028, generalized η² = 0.15. To evaluate the support for the vector sum model prediction of no difference between the flat-texture and RDS conditions, we again conducted a Bayes factor analysis. A Bayes factor of 0.42 indicated anecdotal evidence for a model including fixed effects of simulated depth and viewing distance and a random effect for participants, compared with a model including the same effects plus a fixed effect of viewing condition. Together, the results of these experiments support the vector sum model prediction that there is no difference between setting the depth specified by a cue to zero and eliminating the cue altogether.

- *Veridicality*—Independent visual modules compute the veridical metric structure of 3D objects from retinal projections.
- *Probabilistic inference*—The output of each module is a likelihood distribution over all possible 3D structures that may have generated a given retinal image. The width of these distributions is a measure of the perceptual estimation noise of each individual cue. In other words, each module has explicit access to information about the reliability of a given visual input.
- *Statistically optimal combination*—3D cue estimates are optimally combined by computing the joint probability distribution from the independent probability distributions of each individual cue. The perceptual estimate corresponds to the 3D structure that maximizes this joint probability distribution. Moreover, because the joint probability distribution has a smaller variance than that of each individual cue, the combined estimate is also more reliable. In the case of the linear MLE model, a simple heuristic achieves the statistically optimal combination: single-cue estimates are combined through a weighted average whose weights are inversely proportional to the variance of the noise of the single-cue estimates.

*linearly* related to distal properties but are in general inaccurate. The slope of these linear functions, which we term *cue strength*, depends on the quality of the visual input. For example, a regular pattern of texture elements on a distal surface, such as polka dots, will produce a larger texture signal than sparse texture elements; a texture module will therefore exhibit a steeper input–output perceptual function in the first case than in the second. Similarly, a disparity module will respond with a steeper perceptual function to the depth of objects at closer distances than at farther distances. Indeed, the results of Experiment 1 show that depth judgments are not veridical and depend on the viewing conditions. The perceptual slope in the disparity condition is shallower at a viewing distance of 80 cm than at 40 cm. At the smaller distance, depth from disparity is overestimated and is larger in magnitude than depth from texture; at the larger distance, these estimates are almost the same.

*deterministic* and does not carry any information about the reliability of the input. Consider again a texture gradient projected by sparse surface texture elements. On the MLE account, this is an unreliable image signal that produces a noisy output: each time similar (i.e., equally unreliable) stimuli are viewed, the texture module will provide a different depth estimate, although, according to the veridicality assumption, the average estimate arising from multiple measurements will be unbiased. In contrast, the IC theory will derive similar depth estimates each time, albeit much smaller than the distal depth magnitude. What the MLE approach considers unreliable cues are, for the IC theory, weak cues, because a change in distal depth elicits only a small change in the module output.

*simulated* depth difference required to yield this perceived depth difference depends on the cue strength. Therefore, the JND, defined as the simulated depth difference necessary for reliable discrimination, is inversely proportional to the cue strength.

*Nature Neuroscience,* 7(10), 1057–1058, https://doi.org/10.1038/nn1312.

*Journal of Vision,* 4(10):7, 921–929, https://doi.org/10.1167/4.10.7.

*Journal of Neurophysiology,* 114(4), 2242–2248, https://doi.org/10.1152/jn.00350.2015.

*Experimental Brain Research,* 234(1), 255–265, https://doi.org/10.1007/s00221-015-4456-9.

*Journal of Vision,* 17(9):21, 1–26, https://doi.org/10.1167/17.9.21.

*Journal of Experimental Psychology: Human Perception and Performance,* 45(5), 659–680, https://doi.org/10.1037/xhp0000636.

*Vision Research,* 190, 107961, https://doi.org/10.1016/j.visres.2021.107961.

*Journal of Experimental Psychology: Human Perception and Performance,* 24(2), 609–621, https://doi.org/10.1037/0096-1523.24.2.609.

*Journal of Vision,* 19(10), 7, https://doi.org/10.1167/19.4.7.

*Journal of Vision,* 20(7):14, 1–23, https://doi.org/10.1167/jov.20.7.14.

*Data fusion for sensory information processing systems*. Dordrecht, Netherlands: Kluwer Academic.

*Vision Research,* 50(16), 1519–1531, https://doi.org/10.1016/j.visres.2010.05.006.

*Journal of Experimental Psychology: Human Perception and Performance,* 24(4), 1273–1295, https://doi.org/10.1037/0096-1523.24.4.1273.

*Journal of Experimental Psychology: Human Perception and Performance,* 25(2), 426–444, https://doi.org/10.1037/0096-1523.25.2.426.

*Trends in Cognitive Sciences,* 7(10), 444–449, https://doi.org/10.1016/j.tics.2003.08.007.

*Journal of Vision,* 9(2):25, 1–15, https://doi.org/10.1167/9.2.25.

*Acta Psychologica,* 133(1), 81–89, https://doi.org/10.1016/j.actpsy.2009.10.003.

*Sensory cue integration* (pp. 120–143). Oxford, UK: Oxford University Press, https://doi.org/10.1093/acprof:oso/9780195387247.003.0007.

*Shape perception in human and computer vision* (pp. 285–298). London: Springer, https://doi.org/10.1007/978-1-4471-5195-1_20.

*Perception & Psychophysics,* 60(7), 1164–1174, https://doi.org/10.3758/BF03206166.

*Vision Research,* 46(11), 1707–1723, https://doi.org/10.1016/j.visres.2005.11.018.

*Acta Psychologica,* 138(3), 359–366, https://doi.org/10.1016/j.actpsy.2011.07.007.

*Journal of Vision,* 15(2):24, 1–11, https://doi.org/10.1167/15.2.24.

*Nature,* 415(6870), 429–433, https://doi.org/10.1038/415429a.

*Trends in Cognitive Sciences,* 8(4), 162–169, https://doi.org/10.1016/j.tics.2004.02.002.

*Journal of Vision,* 10(5):12, 1–20, https://doi.org/10.1167/10.5.12.

*PLoS One,* 7(3), e33911, https://doi.org/10.1371/journal.pone.0033911.

*American Scientist,* 68(4), 370–380.

*Current Biology,* 22(5), 426–431, https://doi.org/10.1016/j.cub.2012.01.033.

*Science,* 298(5598), 1627–1630, https://doi.org/10.1126/science.1075396.

*Journal of Vision,* 4(12), 967–992, https://doi.org/10.1167/4.12.1.

*Vision Research,* 39(21), 3621–3629, https://doi.org/10.1016/S0042-6989(99)00088-7.

*Trends in Cognitive Sciences,* 6(8), 345–350, https://doi.org/10.1016/S1364-6613(02)01948-4.

*Vision Research,* 31(7–8), 1351–1360, https://doi.org/10.1016/0042-6989(91)90056-B.

*Vision Research,* 33(5–6), 813–826, https://doi.org/10.1016/0042-6989(93)90200-G.

*Vision Research,* 38(11), 1683–1711, https://doi.org/10.1016/S0042-6989(97)00325-8.

*Vision Research,* 38(11), 1655–1682, https://doi.org/10.1016/S0042-6989(97)00324-6.

*Vision Research,* 43(7), 831–854, https://doi.org/10.1016/S0042-6989(03)00003-8.

*Journal of Vision,* 7(8):13, 1–20, https://doi.org/10.1167/7.8.13.

*Vision Research,* 43(24), 2539–2558, https://doi.org/10.1016/S0042-6989(03)00458-9.

*Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences,* 356(1740), 1071–1086, https://doi.org/10.1098/rsta.1998.0211.

*Perception,* 30(4), 431–448, https://doi.org/10.1068/p3030.

*Annual Review of Vision Science,* 4(1), 451–474, https://doi.org/10.1146/annurev-vision-091517-034250.

*i-Perception,* 6(5), 204166951560771, https://doi.org/10.1177/2041669515607710.

*Psychological Research,* 83(1), 147–158, https://doi.org/10.1007/s00426-018-1101-9.

*Sensory cue integration* (pp. 5–29). Oxford, UK: Oxford University Press, https://doi.org/10.1093/acprof:oso/9780195387247.003.0001.

*Vision Research,* 35(3), 389–412, https://doi.org/10.1016/0042-6989(94)00176-M.

*Psychological Review,* 107(1), 6–38, https://doi.org/10.1037/0033-295X.107.1.6.

*Vision Research,* 44(18), 2135–2145, https://doi.org/10.1016/j.visres.2004.03.024.

*Current Directions in Psychological Science,* 5(3), 72–77, https://doi.org/10.1111/1467-8721.ep10772783.

*Journal of Experimental Psychology: Human Perception and Performance,* 28(5), 1202–1212.

*Journal of Vision,* 12(1):1, 1–18, https://doi.org/10.1167/12.1.1.

*Vision Research,* 38(18), 2817–2832, https://doi.org/10.1016/S0042-6989(97)00438-0.

*Psychological Science,* 15(8), 565–570, https://doi.org/10.1111/j.0956-7976.2004.00720.x.

*Journal of Experimental Psychology: Human Perception and Performance,* 22(1), 173–186, https://doi.org/10.1037/0096-1523.22.1.173.

*Perception & Psychophysics,*57(5), 629–636, https://doi.org/10.3758/BF03213268. [PubMed]

*Perception & Psychophysics,*60(3), 377–388, https://doi.org/10.3758/BF03206861. [PubMed]

*Journal of Experimental Psychology: Human Perception and Performance,*22(4), 930–944, https://doi.org/10.1037/0096-1523.22.4.930. [PubMed]

*Journal of Experimental Psychology: Human Perception and Performance,*44(6), 925–940, https://doi.org/10.1037/xhp0000491. [PubMed]

*Vision Research,*44(13), 1511–1535, https://doi.org/10.1016/j.visres.2004.01.013. [PubMed]

*Journal of Vision,*15(2):14, 1–24, https://doi.org/10.1167/15.2.14.

*Vision Research,*41(24), 3163–3183, https://doi.org/10.1016/S0042-6989(01)00187-0. [PubMed]

*International Journal of Computer Vision,*40(1), 71–89, https://doi.org/10.1023/A:1026557704054.

*Journal of Experimental Psychology: Human Perception and Performance,*21(3), 663–678, https://doi.org/10.1037/0096-1523.21.3.663. [PubMed]

*Trends in Cognitive Sciences,*8(3), 115–121, https://doi.org/10.1016/j.tics.2004.01.006. [PubMed]

*Perception & Psychophysics,*48(5), 419–430, https://doi.org/10.3758/BF03211585. [PubMed]

*Perception,*27(3), 273–282, https://doi.org/10.1068/p270273. [PubMed]

*Journal of Vision,*10(2):20, 1–18, https://doi.org/10.1167/10.2.20.

*i-Perception,*5(6), 497–514, https://doi.org/10.1068/i0645. [PubMed]

*Perception & Psychophysics,*65(1), 31–47, https://doi.org/10.3758/BF03194781. [PubMed]

*Journal of Vision,*10(5):17, 1–13, https://doi.org/10.1167/10.5.17.

*Vision Research,*45(12), 1501–1517, https://doi.org/10.1016/j.visres.2005.01.003. [PubMed]

*Journal of Vision,*7(12):9, 1–16, https://doi.org/10.1167/7.12.9.

*Perception,*24(1), 75–86, https://doi.org/10.1068/p240075. [PubMed]

*Perception beyond inference: The information content of visual processes*(pp. 201–240). Cambridge, MA: MIT Press.

*Handbook of experimental phenomenology*(pp. 181–204). New York: John Wiley & Sons, https://doi.org/10.1002/9781118329016.ch7.

*Psychological Review,*121(2), 151–178, https://doi.org/10.1037/a0035233. [PubMed]

*Psychological Review,*127(1), 146–152, https://doi.org/10.1037/rev0000168. [PubMed]

*Psychological Science,*24(9), 1673–1685, https://doi.org/10.1177/0956797613477867. [PubMed]

*The Journal of Neuroscience,*33(43), 17081–17088, https://doi.org/10.1523/JNEUROSCI.2936-13.2013.

*Journal of Vision,*5(10):7, 834–862, https://doi.org/10.1167/5.10.7. [PubMed]

*Proceedings of the National Academy of Sciences, USA,*105(33), 12087–12092, https://doi.org/10.1073/pnas.0804378105.

*Perception & Psychophysics,*63(8), 1293–1313, https://doi.org/10.3758/BF03194544. [PubMed]

*Vision Research,*33(18), 2685–2696, https://doi.org/10.1016/0042-6989(93)90228-O. [PubMed]

\(s_1 = \lambda_1 z\) and \(s_2 = \lambda_2 z\), where the \(\lambda_i\) are unknown multipliers depending on nuisance variables and \(z\) is the magnitude of the 3D property. These signals are the visual system's encoding of the 3D information from independent cues (e.g., texture, disparity). We seek an estimate \({\hat z_C} = f( {{s_1},{s_2}} )\) that is (1) proportional to \(z\) and (2) most sensitive to 3D information and least sensitive to random fluctuations \(\varepsilon_i\) of \(\lambda_i\). If \(\lambda_{i0}\) is the unperturbed value of \(\lambda_i\), then \(\lambda_i = \lambda_{i0} + \varepsilon_i\) and \(s_{i0} = \lambda_{i0} z\). We assume that the small random perturbations, \(\varepsilon_i\), resulting from changes in viewing conditions can be modeled as Gaussian distributions with zero mean and standard deviations \(\sigma_i\). Taking the derivative of \(f\) with respect to \(z\), \(\frac{{df( {{s_1},\,{s_2}} )\;}}{{dz}} = \frac{{df\;}}{{d{s_1}}}( {{\lambda _{10}} + \;{\varepsilon _1}} ) + \frac{{df\;}}{{d{s_2}}}( {{\lambda _{20}} + \;{\varepsilon _2}} )\), where the \(\frac{{df\;}}{{d{s_i}}}\) are calculated at \(s_{i0}\), we obtain a signal term \(S = f_1 \lambda_{10} + f_2 \lambda_{20}\) (where \({f_i} = \frac{{df\;}}{{d{s_i}}}\)) and a noise term \(E = f_1 \varepsilon_1 + f_2 \varepsilon_2\) having standard deviation \({\sigma _{\rm E}} = \sqrt {{f_1}^2\sigma _1^2 + {f_2}^2\sigma _2^2} \). If we minimize the noise-to-signal ratio \(NSR = \frac{{{\sigma _{\rm E}}\;}}{S}\) with respect to \(f_i\) (by solving the equation \(\frac{{d\,SNR\;}}{{d{f_i}}} = 0\) for \(f_i\), where \(SNR = 1/NSR\) is the signal-to-noise ratio), we find that the first derivatives of the optimal function satisfy \(\frac{{df\;}}{{d{s_i}}} \propto \frac{{{\lambda _{i0}}\;}}{{\sigma _i^2}}\). It can be shown that the derivatives \(\frac{{d{{\hat z}_C}\;}}{{d{s_i}}}\) of the equation \({\hat z_C} = \beta \sqrt {{{( {\frac{{{s_1}\;}}{{{\sigma _1}}}} )}^2} + {{( {\frac{{{s_2}\;}}{{{\sigma _2}}}} )}^2}} \) (calculated at \(s_{i0}\)) meet this requirement. By substituting \({k_i} = \beta \frac{{{\lambda _i}}}{{{\sigma _i}}}\) we obtain the vector sum equation \({\hat z_C} = \sqrt {{{( {{k_1}z} )}^2} + {{( {{k_2}z} )}^2}} \) (easily generalizable to \(n\) signals).
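The derivation can be checked numerically. The following sketch (with illustrative values assumed for \(\lambda_{i0}\), \(\sigma_i\), \(\beta\), and \(z\); none come from the experiments) confirms two consequences of the vector sum equation: the combined estimate \(\hat z_C\) exceeds each single-cue component \(k_i z\), and the optimal combination has a lower noise-to-signal ratio than either cue alone, since \(1/NSR_C^2 = 1/NSR_1^2 + 1/NSR_2^2\).

```python
import math

# Illustrative (assumed) parameters: unperturbed multipliers and noise SDs.
lam10, lam20 = 0.8, 0.5      # lambda_10, lambda_20
sig1, sig2 = 0.2, 0.3        # sigma_1, sigma_2
beta, z = 1.0, 10.0          # scale factor and true 3D magnitude

# Gains k_i = beta * lambda_i0 / sigma_i and the vector sum estimate.
k1, k2 = beta * lam10 / sig1, beta * lam20 / sig2
z_hat_c = math.hypot(k1 * z, k2 * z)   # sqrt((k1 z)^2 + (k2 z)^2)

# Single-cue noise-to-signal ratios are sigma_i / lambda_i0; the optimal
# combination satisfies 1/NSR_c^2 = 1/NSR_1^2 + 1/NSR_2^2.
nsr1, nsr2 = sig1 / lam10, sig2 / lam20
nsr_c = 1.0 / math.hypot(1.0 / nsr1, 1.0 / nsr2)

print(z_hat_c > k1 * z and z_hat_c > k2 * z)  # prints True: combined estimate is larger
print(nsr_c < min(nsr1, nsr2))                # prints True: combination reduces NSR
```

The inflation of the combined estimate relative to either single-cue estimate is the signature prediction tested in Experiment 1.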

\(S_{NC} = S_B + \delta\), is matched to that of a conflict stimulus, \(S_C\), where \(S_B\) is an arbitrarily defined base slant and \(\delta\) is the change in slant needed for a perceptual match \(E( {{{\hat S}_C}} ) = E( {{{\hat S}_{NC}}} )\). For the conflict stimulus, the disparity-specified slant, \(S_d\), differs from the texture-specified slant, \(S_t\), by \(\Delta\): \(S_t = S_B\) and \(S_d = S_B + \Delta\). Optimal cue combination predicts that \(E( {{{\hat S}_C}} ) = {w_d}( {{S_B} + \Delta } ) + ( {1 - {w_d}} ){S_B} = {S_B} + {w_d}\Delta \), where \({w_d} = \frac{{\frac{1}{{\sigma _d^2}}}}{{\frac{1}{{\sigma _d^2}} + \frac{1}{{\sigma _t^2}}}}\). A match \(E( {{{\hat S}_C}} ) = E( {{{\hat S}_{NC}}} )\) is obtained when \({w_d} = \frac{\delta }{\Delta }\), as \(E( {{{\hat S}_{NC}}} ) = {S_B} + \delta \). By using JNDs as proxies for the standard deviations, the weight can be predicted: \({w_d} = \frac{{\frac{1}{{J_d^2}}}}{{\frac{1}{{J_d^2}} + \frac{1}{{J_t^2}}}}\). The IC theory makes identical predictions. For a small conflict \(\Delta\), we can approximate the vector sum equation through a Taylor expansion at the base slant \(S_B\): \({\hat S_C} = \sqrt {{k_t}^2{S_B}^2 + {k_d}^2{{( {{S_B} + \Delta } )}^2}} \approx {S_B}\sqrt {{k_t}^2 + {k_d}^2} + \frac{{{k_d}^2}}{{\sqrt {{k_t}^2 + {k_d}^2} }}\Delta \). Because \({\hat S_{NC}} = ( {{S_B} + \delta } )\sqrt {{k_t}^2 + {k_d}^2} = {S_B}\sqrt {{k_t}^2 + {k_d}^2} + \delta \sqrt {{k_t}^2 + {k_d}^2} \), a match, \({\hat S_{NC}} = {\hat S_C}\), is obtained when \(\frac{{{k_d}^2}}{{\sqrt {{k_t}^2 + {k_d}^2} }}\Delta = \delta \sqrt {{k_t}^2 + {k_d}^2} \), from which \(\frac{{{k_d}^2}}{{{k_t}^2 + {k_d}^2}} = \frac{\delta }{\Delta }\). Note that, because for the IC theory \(J_{i} = \frac{{{\sigma _N}}}{{{k_i}}}\) (see Introduction to Experiment 2), \(\frac{{{k_d}^2}}{{{k_t}^2 + {k_d}^2}} = {w_d}\), which matches the predictions of Hillis et al. (2004).
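The small-conflict approximation can be verified numerically. The sketch below (cue gains \(k_t\), \(k_d\), the base slant, and the conflict size are illustrative values, not fitted quantities) computes the exact vector-sum estimate for a conflict stimulus, solves for the no-conflict increment \(\delta\) that produces a match, and checks that the matching ratio \(\delta/\Delta\) approaches \(k_d^2/(k_t^2 + k_d^2)\), i.e., the MLE weight \(w_d\).

```python
import math

# Illustrative (assumed) cue gains, base slant, and a small cue conflict.
k_t, k_d = 0.9, 0.6
S_B, Delta = 30.0, 0.1

# Exact vector-sum estimate for the conflict stimulus (S_t = S_B, S_d = S_B + Delta).
S_hat_C = math.sqrt(k_t**2 * S_B**2 + k_d**2 * (S_B + Delta)**2)

# A no-conflict stimulus at S_B + delta matches when
# (S_B + delta) * sqrt(k_t^2 + k_d^2) = S_hat_C, so solve for delta:
k_c = math.hypot(k_t, k_d)
delta = S_hat_C / k_c - S_B

# The IC matching ratio delta/Delta should approximate the MLE weight w_d.
w_d = k_d**2 / (k_t**2 + k_d**2)
print(abs(delta / Delta - w_d) < 2e-3)  # prints True for this small Delta
```

As \(\Delta\) shrinks the agreement tightens, which is why the perturbation-analysis paradigm cannot by itself distinguish the two models.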

\(\sigma_N\) and gain \(k_i\): \({J_1} = \frac{{{\sigma _N}}}{{{k_1}}}\) and \({J_2} = \frac{{{\sigma _N}}}{{{k_2}}}\). Because from the vector sum equation the gain of the combined stimulus is \({k_c} = \sqrt {{k_1}^2 + {k_2}^2} \), the JND of the combined stimulus is \({J_c} = \frac{{{\sigma _N}}}{{{k_c}}} = \frac{{{\sigma _N}}}{{\sqrt {{k_1}^2 + {k_2}^2} }}\). By substituting \({k_i} = \frac{{{\sigma _N}}}{{{J_i}}}\) in this equation we obtain \({J_c} = \frac{1}{{\sqrt {\frac{1}{{{\rm{J}}_1^2}} + \frac{1}{{{\rm{J}}_2^2}}} }}\), which is identical to the MLE prediction.
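This algebraic identity is easy to confirm. A minimal sketch (with arbitrary illustrative values for the internal noise \(\sigma_N\) and the gains \(k_1\), \(k_2\)) shows that the IC combined-cue JND, \(\sigma_N / \sqrt{k_1^2 + k_2^2}\), coincides with the MLE prediction \(1/\sqrt{1/J_1^2 + 1/J_2^2}\) and is smaller than either single-cue JND.

```python
import math

# Illustrative (assumed) internal noise and single-cue gains.
sigma_N = 1.0
k1, k2 = 2.0, 3.0

# Single-cue JNDs under the IC theory: J_i = sigma_N / k_i.
J1, J2 = sigma_N / k1, sigma_N / k2

# Combined-cue JND from the vector sum gain k_c = sqrt(k1^2 + k2^2)...
J_c_ic = sigma_N / math.hypot(k1, k2)

# ...equals the classic MLE prediction 1 / sqrt(1/J1^2 + 1/J2^2).
J_c_mle = 1.0 / math.sqrt(1.0 / J1**2 + 1.0 / J2**2)

print(abs(J_c_ic - J_c_mle) < 1e-12)  # prints True: the two predictions coincide
print(J_c_ic < min(J1, J2))           # prints True: combined JND is the smallest
```

The identical quantitative prediction explains why smaller combined-cue JNDs, traditionally read as evidence for MLE, are equally consistent with the IC theory.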