We elucidate two properties of the intrinsic constraint (IC) model of depth cue combination (F. Domini, C. Caudek, & H. Tassinari, 2006). First, we show that IC combines depth cues in a weighted sum that maximizes the signal-to-noise ratio of the combined estimate. Second, we show that IC predicts that any two depth-matched pairs of stimuli are separated by equal numbers of just noticeable differences (JNDs) in depth. That is, IC posits a strong link between perceived depth and depth discrimination, much like some Fechnerian theories of sensory scaling. We test this prediction, and we find that it does not hold. We also find that depth discrimination performance approximately follows Weber's law, whereas IC assumes that depth discrimination thresholds are independent of baseline stimulus depth.

$B(z)$ for depth from binocular disparity and $M(z)$ for depth from structure-from-motion, where $z$ is the true depth of the point of interest. MWF posits that the visual system combines these cues in a weighted sum: $C(z) = w_B B(z) + w_M M(z)$. The weights $w_B$ and $w_M$ are non-negative and sum to one, but otherwise they are arbitrary constants. However, if the weights are set to $w_B = \sigma_B^{-2}/(\sigma_B^{-2} + \sigma_M^{-2})$ and $w_M = \sigma_M^{-2}/(\sigma_B^{-2} + \sigma_M^{-2})$, where $\sigma_B = \mathit{SD}[B(z)]$ and $\sigma_M = \mathit{SD}[M(z)]$, then the sum is optimal in the sense that it maximizes the signal-to-noise ratio (SNR) of the combined estimate $C(z)$, defined as $\mathrm{SNR}[C(z)] = E[C(z)]/\mathit{SD}[C(z)]$. Here, $E$ denotes expected value and $\mathit{SD}$ denotes standard deviation (for an extensive review, see Landy et al., 1995).
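The optimality of the inverse-variance weights can be checked numerically. The sketch below is illustrative only: it assumes both cues are unbiased (so $E[B(z)] = E[M(z)] = z$) with independent noise, and the function names and numerical values are hypothetical, not taken from the MWF literature.

```python
import math

def mwf_weights(sigma_b, sigma_m):
    """Inverse-variance weights (w_B, w_M); non-negative and summing to one."""
    rb, rm = sigma_b ** -2, sigma_m ** -2
    return rb / (rb + rm), rm / (rb + rm)

def snr(w_b, z, sigma_b, sigma_m):
    """SNR of C = w_b*B + (1 - w_b)*M.

    Assumption: E[B] = E[M] = z and the two noise sources are independent,
    so E[C] = z and Var[C] = w_b^2 sigma_b^2 + w_m^2 sigma_m^2.
    """
    w_m = 1.0 - w_b
    sd = math.sqrt(w_b ** 2 * sigma_b ** 2 + w_m ** 2 * sigma_m ** 2)
    return z / sd

# Hypothetical cue noise levels and true depth.
sigma_b, sigma_m, z = 1.0, 2.0, 10.0

w_b, w_m = mwf_weights(sigma_b, sigma_m)

# Brute-force search over w_B in [0, 1] for comparison.
grid = [i / 1000 for i in range(1001)]
best_w_b = max(grid, key=lambda w: snr(w, z, sigma_b, sigma_m))

print(w_b, w_m, best_w_b)
```

Under these assumptions the grid search lands on the same weight as the closed-form expression, because maximizing the SNR of an unbiased sum is equivalent to minimizing its variance.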

*promotion* stage, in which direct retinal measurements of depth cues like disparity and motion are scaled to give metric depth estimates (possibly biased or unbiased, but always in a physically meaningful, metric unit of depth); extensions to accommodate correlated noise across cues (Oruç, Maloney, & Landy, 2003); and a *robustness* mechanism in which a depth cue that is discrepant with other depth cues can be weighted less heavily at the combination stage or even discarded.

*proportional* to true depth. To support this assumption, Domini et al. (2006) show that in a small-angle approximation, both absolute binocular disparity and retinal velocity of a point on a rotating object are proportional to true depth (see Figure 1). IC represents these depth cues as random variables: disparity $D(z) = \mu z + \varepsilon_D$ and retinal velocity $V(z) = \omega z + \varepsilon_V$, where $\mu$ is the observer's vergence angle, $\omega$ is the angle through which the object rotates in a small time interval $\Delta t$, $z$ is the point of interest's true depth, and $\varepsilon_D$ and $\varepsilon_V$ are independent zero-mean, Gaussian noise sources with fixed standard deviations $\sigma_D$ and $\sigma_V$, respectively. The depth variable $z$ refers to *scaled depth*, defined as the depth of the point of interest relative to the fixation point, divided by the distance from the observer to the fixation point (see Figure 1).

$z_1, \ldots, z_n$, then the disparity measurements are independent samples from the random variables $D(z_1), \ldots, D(z_n)$, and we will denote these samples by $d_1, \ldots, d_n$. Similarly, the retinal velocity measurements are samples from $V(z_1), \ldots, V(z_n)$, which we will denote by $v_1, \ldots, v_n$. The goal of IC is to use these measurements to arrive at a single depth estimate for each object location. This occurs in several steps (see Figure 2).

- Each depth cue measurement is divided by the standard deviation of the random variable it is drawn from to produce a normalized depth cue measurement: $\bar{d}_i = d_i/\sigma_D$, $\bar{v}_i = v_i/\sigma_V$.
- The normalized depth cue measurements are grouped into ordered pairs $(\bar{d}_i, \bar{v}_i)$, where each pair consists of the normalized disparity and velocity measurements at object location $i$. This gives a two-dimensional cloud of points.
- The first principal component $\vec{e}_1$ of this cloud of points is computed.
- The dot product is taken between each ordered pair and the first principal component: $\rho_i = (\bar{d}_i, \bar{v}_i) \cdot \vec{e}_1$. IC postulates that depth discrimination is based on the decision variable $\rho_i$, and that perceived depth is some monotonically increasing function of $\rho_i$.

*principal component projection* (PCP) algorithm. A later stage of IC, which we will not need to consider, determines the monotonic relationship between $\rho_i$ and perceived depth.
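The four steps listed above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the implementation used by Domini et al. (2006): the function name `pcp`, the simulated parameter values, and the choice to compute the principal component from the mean-centered sample covariance are all hypothetical choices made for this example.

```python
import math
import random

def pcp(d, v, sigma_d, sigma_v):
    """Principal component projection for two cues.

    d, v: raw disparity and velocity measurements at the same n locations.
    Returns (e1, rho): the unit first principal component and the
    projections rho_i of the normalized measurement pairs onto it.
    """
    # Step 1: normalize each measurement by its cue's noise SD.
    d_bar = [x / sigma_d for x in d]
    v_bar = [y / sigma_v for y in v]
    n = len(d_bar)

    # Steps 2-3: treat (d_bar_i, v_bar_i) as a 2-D cloud and find its first
    # principal component from the sample covariance (assumption: centered).
    md, mv = sum(d_bar) / n, sum(v_bar) / n
    s_dd = sum((x - md) ** 2 for x in d_bar) / n
    s_vv = sum((y - mv) ** 2 for y in v_bar) / n
    s_dv = sum((x - md) * (y - mv) for x, y in zip(d_bar, v_bar)) / n
    # Orientation of the major axis of a 2x2 symmetric covariance matrix.
    theta = 0.5 * math.atan2(2.0 * s_dv, s_dd - s_vv)
    e1 = (math.cos(theta), math.sin(theta))

    # Step 4: dot product of each normalized pair with e1.
    rho = [x * e1[0] + y * e1[1] for x, y in zip(d_bar, v_bar)]
    return e1, rho

# Simulated stimulus with hypothetical parameter values.
rng = random.Random(0)
mu, omega = 3.0, 2.0
sigma_d, sigma_v = 0.5, 0.25
z = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [mu * zi + rng.gauss(0.0, sigma_d) for zi in z]
v = [omega * zi + rng.gauss(0.0, sigma_v) for zi in z]

e1, rho = pcp(d, v, sigma_d, sigma_v)

# Direction expected without noise: (mu/sigma_d, omega/sigma_v), unit length.
norm = math.hypot(mu / sigma_d, omega / sigma_v)
predicted = (mu / sigma_d / norm, omega / sigma_v / norm)
print(e1, predicted)
```

With many object locations and a depth range that is large relative to the noise, the estimated direction `e1` should lie close to `predicted`.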

$D(z) = \mu z + \varepsilon_D$ and $V(z) = \omega z + \varepsilon_V$ are proportional^{1} and so are the expected values of $D(z)/\sigma_D$ and $V(z)/\sigma_V$ from which the normalized depth cue measurements $\bar{d}_i$ and $\bar{v}_i$ are drawn:
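The relations this sentence refers to follow directly from the definitions of $D(z)$ and $V(z)$. The display below is a reconstruction that assumes only zero-mean noise, not a quotation of the original equations.

$$
E\!\left[\frac{D(z)}{\sigma_D}\right] = \frac{\mu}{\sigma_D}\,z,
\qquad
E\!\left[\frac{V(z)}{\sigma_V}\right] = \frac{\omega}{\sigma_V}\,z .
$$

Both expected values are linear in $z$, so their ratio, $(\omega/\sigma_V)(\sigma_D/\mu)$, does not depend on depth.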

$(\omega/\sigma_V)(\sigma_D/\mu)$, and its first principal component would be the vector $(1, (\omega/\sigma_V)(\sigma_D/\mu))$ normalized to unit length:

$i$ is a monotonic function of $\rho_i$:

$\rho_i$ is simply the optimal weighted sum of the disparity and the velocity cues, scaled to unit variance.
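As a hedged worked sketch of this claim, the following is reconstructed from the definitions of $D(z)$, $V(z)$, and the PCP steps. It assumes the noise-free principal component direction and independent noise in the two cues, and it may differ in form from the original displayed equations.

$$
\vec{e}_1 = \frac{\left(\mu/\sigma_D,\ \omega/\sigma_V\right)}{\sqrt{(\mu/\sigma_D)^2 + (\omega/\sigma_V)^2}},
\qquad
\rho_i = (\bar{d}_i, \bar{v}_i)\cdot\vec{e}_1
= \frac{(\mu/\sigma_D^2)\,d_i + (\omega/\sigma_V^2)\,v_i}{\sqrt{(\mu/\sigma_D)^2 + (\omega/\sigma_V)^2}} .
$$

$$
E[\rho_i] = z_i\,\sqrt{(\mu/\sigma_D)^2 + (\omega/\sigma_V)^2},
\qquad
\mathrm{Var}[\rho_i] = \frac{(\mu/\sigma_D^2)^2\,\sigma_D^2 + (\omega/\sigma_V^2)^2\,\sigma_V^2}{(\mu/\sigma_D)^2 + (\omega/\sigma_V)^2} = 1 .
$$

The weights on $d_i$ and $v_i$ are proportional to $\mu/\sigma_D^2$ and $\omega/\sigma_V^2$, that is, to each cue's expected value per unit depth divided by its variance. This is the form of SNR-maximizing weights for two independent Gaussian cues, rescaled so that the sum has unit variance.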

$d_i$ and $v_i$ are normalized by $\sigma_D$ and $\sigma_V$. From this simulation, it is unclear whether PCP is optimal in any broader sense, so we believe our derivation helps to clarify exactly what PCP accomplishes. Furthermore, we hope that highlighting IC's similarity to MWF will make it easier to relate IC to the existing cue combination literature.

^{2}MWF assumes that all cues are valid depth estimates and so combines disparity and motion cues with weights determined by the cues' variances (unless a robustness mechanism rejects the disparity cue as being discrepant with other depth cues). Thus, both models compute optimal weighted sums, but they may weight cues differently because different notions of optimality follow from different assumptions about individual depth cues. Similarly, as mentioned above, Domini et al. (2006) also describe tasks where MWF and IC make different predictions.

$\rho_i$, which as we have shown is the optimal weighted sum of individual depth cues, scaled to unit variance. It follows immediately that IC has similarities to Fechnerian theories of sensory scaling, in that it predicts that perceived depth can be meaningfully measured in terms of just noticeable differences (JNDs).

$d_A$ and a motion-defined stimulus $v_A$, both with a perceived depth of 10 cm, and also a disparity-defined stimulus $d_B$ and a motion-defined stimulus $v_B$ with a perceived depth of 11 cm.

$d_A$ and $v_A$ have the same perceived depth, so according to IC they have the same value of $\rho_i$, which we can call $\rho_A$. Similarly, $d_B$ and $v_B$ both have $\rho_i = \rho_B$. Thus, the difference in the value of $\rho_i$ between $d_A$ and $d_B$ is $\rho_B - \rho_A$, and the difference in $\rho_i$ between $v_A$ and $v_B$ is also $\rho_B - \rho_A$. The variance of $\rho_i$ is always one, so if depth JNDs are determined by the signal and the noise properties of $\rho_i$ (as assumed by Domini et al., 2006), then the number of JNDs that separate $d_A$ and $d_B$ is the same as the number that separate $v_A$ and $v_B$. For instance, if we define one JND as a separation of $k$ standard deviations in the decision variable $\rho_i$, then $d_A$ and $d_B$ are separated by $(\rho_B - \rho_A)/k$ JNDs, and so are $v_A$ and $v_B$. Thus, IC predicts that any two depth-matched pairs of stimuli are separated by the same number of depth JNDs. (Note that even without our demonstration that PCP calculates an optimal weighted sum, Domini et al.'s Equation 7, which shows that $\rho_i$ is proportional to true depth and has unit variance, implies this same conclusion.)

$\rho_i$, but perceived depth is an unknown monotonic function of $\rho_i$, so JNDs need not correspond to equal increments in perceived depth. Thus, IC is more akin to revisions of Fechner's theory that retain the JND as a unit of measurement but that allow the subjective perceptual increments corresponding to JNDs to vary as a function of baseline perceptual magnitude, e.g., an auditory JND may increase loudness more for loud sounds than for faint sounds (for a discussion of these and related issues, see Krueger, 1989).

*matched* stimuli. We measured the six JNDs in separate 270-trial blocks, i.e., only a single matched stimulus was shown in a given block. On each 2IFC trial, observers viewed a matched stimulus and a test stimulus, with the simulated depth of the test stimulus chosen by the method of constant stimuli. Disparity-defined matched stimuli were shown with disparity-defined test stimuli, and motion-defined matched stimuli were shown with motion-defined test stimuli. The stimuli were shown for 1000 ms, in random order, separated by a blank 750-ms interstimulus interval, and the observer pressed a key to indicate which interval contained the stimulus with the greater perceived depth. No feedback was given.

$\rho$, which is proportional to true depth and has a variance that is independent of $z$. Thus, IC assumes that JNDs are the same at all depths, but this is clearly not the case.^{3}

$\mathrm{JND}_D(z) = k_D z$, then the number of JNDs separating disparity-defined stimuli at depths $z_1$ and $z_2$ is

$z_1$ and $z_2$ is

$k_M$ is the constant of proportionality in Weber's law for motion-defined stimuli, $\mathrm{JND}_M(z) = k_M z$.
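One standard way to count JNDs when thresholds follow Weber's law is Fechnerian integration of the reciprocal of the JND over the depth interval. The sketch below assumes $0 < z_1 < z_2$ and is offered as an illustration of that approach. It may differ in form from the expressions originally displayed here.

$$
N_D(z_1, z_2) = \int_{z_1}^{z_2} \frac{dz}{\mathrm{JND}_D(z)} = \int_{z_1}^{z_2} \frac{dz}{k_D\,z} = \frac{1}{k_D}\,\ln\frac{z_2}{z_1},
\qquad
N_M(z_1, z_2) = \frac{1}{k_M}\,\ln\frac{z_2}{z_1} .
$$

Under this formula the JND count depends only on the ratio $z_2/z_1$ and on the Weber fraction, so two cues with different Weber fractions yield different counts over the same depth interval.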

$k_D$ and $k_M$ individually for each observer by making a maximum-likelihood linear fit to JND size versus simulated depth (as in Figure 7, but using each observer's JNDs instead of the group means). Even this revised calculation, which takes into account Weber's law and thus gives a more accurate JND count, indicates that depth-matched motion and disparity stimuli were not separated by the same number of JNDs. In every comparison, the JND count was less for motion-defined stimuli than for the corresponding disparity-defined stimuli. Not all of the JND counts shown in Figure 7 are independent, as the three motion JND counts are calculated from all three possible pairings of the three motion stimuli and similarly for the disparity JND counts. Nevertheless, even if we just consider the 1.25-cm vs. the 2.5-cm pairs and the 2.5-cm vs. the 5.0-cm pairs, this means that in eight cases the motion JND count was less than the disparity JND count, which is a statistically significant difference under a sign test ($p < 0.01$). Furthermore, several of the individual JND count comparisons were statistically significant as well ($p < 0.05$). Thus, a key psychophysical prediction of IC is incorrect, even when we use a JND-counting formula that takes account of Weber's law.
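The sign-test result quoted above can be checked with a short calculation. The sketch below assumes that all eight of the eight independent comparisons went in the same direction and that the test is two-sided. The function name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(n_less, n_total):
    """Exact two-sided sign test p-value under H0: P(less) = 0.5.

    n_less: number of comparisons in which the first quantity was smaller.
    n_total: number of comparisons (ties are assumed to be excluded).
    """
    k = max(n_less, n_total - n_less)
    # Probability of a result at least this extreme in one direction.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)

p = sign_test_p(8, 8)
print(p)  # 0.0078125
```

With eight of eight comparisons in the same direction, the two-sided p-value is $2 \times 0.5^8 \approx 0.0078$, consistent with the reported $p < 0.01$.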

$S \sim N(\mu_S, \sigma_S^2)$ and $T \sim N(\mu_T, \sigma_T^2)$, what weights maximize the SNR of the weighted sum $C = uS + vT$?

$u\mu_S + v\mu_T$, and the variance is $u^2\sigma_S^2 + v^2\sigma_T^2$, so the SNR is $(u\mu_S + v\mu_T)/(u^2\sigma_S^2 + v^2\sigma_T^2)^{1/2}$. Any two pairs of weights $(u, v)$ and $(ku, kv)$ that differ only by a scale factor give the same SNR, so we will add the constraint that the variance of the weighted sum is one, $u^2\sigma_S^2 + v^2\sigma_T^2 = 1$. (The MWF model assumes $u + v = 1$, but the unit-variance constraint is more useful in our discussion of IC.)

$f(u, v) = (u\mu_S + v\mu_T)/(u^2\sigma_S^2 + v^2\sigma_T^2)^{1/2}$, subject to the unit-variance constraint $g(u, v) = u^2\sigma_S^2 + v^2\sigma_T^2 - 1 = 0$. The Lagrangian is

$u$ and $v$, we find
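The two displays that belong at this point can be sketched as follows. This is a reconstruction rather than the original equations: the sign convention for the multiplier $\lambda$ is an assumption, and the objective is simplified using the fact that on the constraint surface $g = 0$ the denominator of $f$ equals one, so maximizing $f$ there is the same as maximizing $u\mu_S + v\mu_T$.

$$
\Lambda(u, v, \lambda) = u\mu_S + v\mu_T - \lambda\left(u^2\sigma_S^2 + v^2\sigma_T^2 - 1\right)
$$

$$
\frac{\partial \Lambda}{\partial u} = \mu_S - 2\lambda u \sigma_S^2 = 0,
\qquad
\frac{\partial \Lambda}{\partial v} = \mu_T - 2\lambda v \sigma_T^2 = 0
\quad\Longrightarrow\quad
u = \frac{\mu_S}{2\lambda\sigma_S^2},\quad v = \frac{\mu_T}{2\lambda\sigma_T^2} .
$$

Substituting into $g(u, v) = 0$ gives $2\lambda = \sqrt{\mu_S^2/\sigma_S^2 + \mu_T^2/\sigma_T^2}$ (taking the positive root, which corresponds to the maximum), so

$$
u = \frac{\mu_S/\sigma_S^2}{\sqrt{\mu_S^2/\sigma_S^2 + \mu_T^2/\sigma_T^2}},
\qquad
v = \frac{\mu_T/\sigma_T^2}{\sqrt{\mu_S^2/\sigma_S^2 + \mu_T^2/\sigma_T^2}},
\qquad
\mathrm{SNR}_{\max} = \sqrt{\frac{\mu_S^2}{\sigma_S^2} + \frac{\mu_T^2}{\sigma_T^2}} .
$$

Each weight is proportional to the cue's mean divided by its variance, and the resulting sum has unit variance by construction.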

^{1}This proportionality could be broken by creating cue conflict stimuli where disparity and motion specify different affine structures, and the analysis that follows does not apply in such unusual cases. Most cue conflict stimuli used to date, however, have specified the same affine depth structure in all cues, and have just assigned different depth scale factors to different cues.

^{2}In this case, the principal component in Figure 2 will be vertical, and so the dot product $\rho_i$ = (

^{3}In order to accommodate Weber's law, IC would have to be revised to change the signal and noise properties of the decision variable. Domini and Caudek (2007) have started investigations along these lines. However, incorporating Weber's law simply by making each cue's standard deviation proportional to the value of the cue will not work because then all normalized disparity measurements

$\rho$ is independent of true depth.