The Intrinsic Constraint (IC) model of depth-cue integration (F. Domini, C. Caudek, & H. Tassinari, 2006) posits a strong link between perceived depth and depth discrimination, much like some Fechnerian theories of sensory scaling. K. J. MacKenzie, R. F. Murray, and L. M. Wilcox (2008) tested the IC model by examining whether two depth-matched pairs of stimuli are separated by equal numbers of Just Noticeable Differences (JNDs) in depth. They concluded that “IC is inconsistent with the psychophysics of depth perception.” Here, by using a different methodological approach, we provide empirical findings that are consistent with the predictions of the IC model. We also discuss the relative merits of the IC and Modified Weak Fusion (MWF) models (M. S. Landy, L. T. Maloney, E. B. Johnston, & M. Young, 1995) of depth-cue combination.

The superscript “${}^{(j)}$” will index the $j$th amount of *perceived depth*, with $j = 1, \ldots$. The subscript “${}_{v}$” or “${}_{d}$” will denote, respectively, the velocity or disparity signals provided by single-cue stimulus displays. By writing $z_v^{(1)}$ and $z_d^{(1)}$, we will indicate the *simulated* depth magnitudes of a pair of velocity-only and disparity-only stimuli, which give rise to the same amount of perceived depth, here indexed by $j = 1$.

| Notation | Meaning |
|---|---|
| $\hat{\ }$ | Denotes a perceptual estimate |
| $\Delta d$ | Disparity difference |
| $\Delta \hat{z}$ | Increment in perceived depth |
| $\varepsilon_1$ | Disturbance on $\rho$, distributed as $N(0, 1)$ |
| $\varepsilon_{d_i}$, $\varepsilon_{v_i}$ | Measurement errors of the disparity and velocity signals, respectively |
| $\varepsilon_{\hat{z}_i}$ | Disturbance of perceived depth $\hat{z}_i$ |
| $\mu$ | Vergence angle |
| $\rho_i$ | Score of the $i$th surface point on the first principal component computed from the scaled image signals |
| $\rho_P$ | Intensity of $\rho$ at point $P$ |
| $\sigma_v$, $\sigma_d$ | SD of measurement noise for velocity and disparity signals, respectively |
| $\sigma_{\hat{z}}$ | SD of the perceived depth noise |
| $\sigma_D$ | SD of judgment noise introduced in the depth interpretation stage |
| $\omega$ | Angle of rotation in 3D space |
| $P$ | A generic surface point |
| $d_i$, $v_i$ | Relative disparities and velocities, respectively |
| $d_P$, $v_P$ | Relative disparities and velocities at point $P$ |
| $f_\rho(\rho)$ | Positive monotonic function of $\rho$ |
| $f'_\rho(\rho_P)$ | First-order derivative of $f_\rho(\rho)$ at $\rho_P$ |
| $z_x^{(j)}$ | Amount of simulated depth corresponding to the $j$th amount of perceived depth; the index $x$ identifies the depth cue provided by the stimulus display |
| $z_i$ | Depth map, where $i$ is the index of the feature points |

In the *depth-matching* part of the experiment, observers compared half-cylinders defined by stereo information (*test*) with three half-cylinders defined by motion information (*standard*). The *standard* half-cylinders were 12.5, 25, and 50 mm deep. In a Two-Interval Forced-Choice (2-IFC) task, observers indicated which of the two successively presented half-cylinders appeared to be deeper. By varying the simulated depth of the *test* half-cylinders according to a staircase procedure, the authors estimated the Point of Subjective Equality (PSE) at which the *test* (stereo-only) and *standard* (motion-only) half-cylinders were perceived as having the same depth. In this part of the experiment, MacKenzie et al. (2008) found the three stereo depths $z_d^{(1)}$, $z_d^{(2)}$, and $z_d^{(3)}$ that perceptually matched the three motion depths $z_v^{(1)}$, $z_v^{(2)}$, and $z_v^{(3)}$ of the *standard* stimuli.

In the *depth-discrimination* part of the experiment, observers were asked to perform a depth-discrimination task at each of the three simulated depth magnitudes. This task was performed separately for the motion and stereo stimuli. Through this procedure, the authors estimated three JNDs for the stereo stimuli ($\mathrm{JND}_d^{(1)}$, $\mathrm{JND}_d^{(2)}$, and $\mathrm{JND}_d^{(3)}$) and three JNDs for the motion stimuli ($\mathrm{JND}_v^{(1)}$, $\mathrm{JND}_v^{(2)}$, and $\mathrm{JND}_v^{(3)}$). The authors then asked the critical question: How many JNDs separate the $z_v^{(1)}$ and $z_v^{(3)}$ motion-only stimuli, on the one hand, and the perceptually matched $z_d^{(1)}$ and $z_d^{(3)}$ stereo-only stimuli, on the other? If Fechner's theory holds, then the motion-only stimuli should be separated by the same number of JNDs as the perceptually matched stereo-only stimuli. MacKenzie et al. (2008) found that the “depth-matched pairs of stimuli were not (…) separated by equal numbers of JNDs, contradicting IC's prediction.”

$z^{(1)}$ is separated by five JNDs from the depth value denoted by $z^{(6)}$. We can envision two scenarios. In the first scenario, the JNDs for stereo and motion stimuli are constant within the depth range examined in the experiment. To estimate the number of JNDs separating the smallest depth $z^{(1)}$ from the largest depth $z^{(6)}$, therefore, it is sufficient to divide the depth difference $z^{(6)} - z^{(1)}$ (i.e., the height of the scale) by the size of one JND (i.e., the height of a step); see Figure 1 (left panel). In the second scenario, the JNDs are not constant (Figure 1, right panel). To estimate the JND count in this case, it would not be appropriate to divide the total depth difference by either the first or the last JND comprising the psychological scale.

$k$, then the number of JNDs separating two disparity-defined or motion-defined stimuli at depths $z_1$ and $z_2$ (with $z_2 > z_1$) will be equal to $n = (1/k)(\log|z_2| - \log|z_1|)$. The JND counts estimated in this manner are reported in their Figure 10. On the basis of all these results, MacKenzie et al. (2008) concluded that depth-matched motion and disparity displays are not separated by the same number of JNDs.
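The two ways of counting JNDs discussed here can be compared numerically. The sketch below is only an illustration: the Weber fraction `k = 0.1` and the depth range are hypothetical values, the function names are ours, and the numerical integral is the generic Fechnerian count (the integral of $dz/\mathrm{JND}(z)$), which reduces to the logarithmic formula when $\mathrm{JND}(z) = kz$.

```python
import math

def jnd_count_constant(z1, z2, jnd):
    """JND count between z1 and z2 when every JND has the same size."""
    return (z2 - z1) / jnd

def jnd_count_weber(z1, z2, k):
    """JND count when JND(z) = k * z: n = (1/k)(log|z2| - log|z1|)."""
    return (math.log(abs(z2)) - math.log(abs(z1))) / k

def jnd_count_numeric(z1, z2, jnd_fn, steps=10_000):
    """Fechnerian count for any JND function: midpoint-rule integral of dz / JND(z)."""
    dz = (z2 - z1) / steps
    return sum(dz / jnd_fn(z1 + (i + 0.5) * dz) for i in range(steps))

k = 0.1                  # hypothetical Weber fraction
z_lo, z_hi = 12.5, 50.0  # depth range in mm, for illustration only

n_weber = jnd_count_weber(z_lo, z_hi, k)
n_numeric = jnd_count_numeric(z_lo, z_hi, lambda z: k * z)
# Dividing the total difference by a single JND gives very different answers
# depending on which JND is chosen when the JNDs are not constant.
n_first = jnd_count_constant(z_lo, z_hi, k * z_lo)
n_last = jnd_count_constant(z_lo, z_hi, k * z_hi)

print(n_weber, n_numeric, n_first, n_last)
```

When the JNDs grow with depth, dividing by the first JND overestimates the count and dividing by the last one underestimates it; the integrated count lies in between.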

$\rho$, which we used in our previous work, is equal to the signal-to-noise ratio in the case of single-cue displays. If depth-discrimination performance followed Weber's law, $\rho$ would remain constant with simulated depth and our model would not fit the psychophysical data. All our previous findings concerning the IC model, however, contradict this hypothesis (Domini et al., 2006; Tassinari, Domini, & Caudek, 2008). Other evidence contrary to Weber's law comes from Farell, Li, and McKee (2004a), who found that disparity thresholds for random-dot stereograms increased with pedestal disparity according to an exponential law with a nonzero intercept. Such a result cannot be accounted for by Weber's law. On the basis of this evidence, we decided to test the hypothesis of MacKenzie et al. (2008) by using a different methodology.

If $z_i$ is a depth map, with $i$ being the index of the feature points, the 3D *affine structure* is a family of depth maps $k z_i$, where $k$ can take on any value. Affine transformations preserve affine properties, such as depth-order relationships, parallelism, and so on (e.g., Koenderink & van Doorn, 1991; Todd, Oomes, Koenderink, & Kappers, 2001).

*directly specify* the local affine structure within the visual scene. For small visual angles, in fact, it can be shown that

$$d_i = \mu z_i + \varepsilon_{d_i} \qquad (1)$$

$$v_i = \omega z_i + \varepsilon_{v_i} \qquad (2)$$

where $\mu$ is the vergence angle and $\omega$ is the angle of rotation in 3D space; the disturbance terms $\varepsilon_{d_i}$ and $\varepsilon_{v_i}$, due to measurement errors, are modeled as Gaussian noise with zero mean and standard deviations $\sigma_{d_i}$ and $\sigma_{v_i}$, respectively.

$E(d_i) = \mu z_i$ and $E(v_i) = \omega z_i$. In other words, velocity and disparity signals specify the local affine structure.

$\sigma_{d_i}$ and $\sigma_{v_i}$. Any linear scaling of the image signals, in fact, affects the magnitudes of $\sigma_{d_i}$ and $\sigma_{v_i}$ but leaves the precision of estimation unchanged. To clarify this point, consider the disparity signals as estimators of the depth-order relations. Intuitively, the precision of estimation provides a quantitative answer to the following question: “In what way do the relations among the disparity signals reflect the depth-order relations?” A mismatch between the relations among the disparity signals and the depth-order relations is indicated in Figure 2. In each of the four cases $S$, with $S \in \{1, 2, 3, 4\}$, consider the depth-order relation $P_1 < P_2$, where $P_2 = P_1 + \Delta P$. The point $P_2$ is always in front of $P_1$. Because of noise, however, the disparity signal $d_{P_2}$ is sometimes larger than $d_{P_1}$ and sometimes smaller. A violation of the order relations in the recovered affine structure occurs if $d_{P_1} > d_{P_2}$.
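The probability of such an order violation can be written down explicitly under a simple assumption that we add here for illustration: the noise at the two points is independent, Gaussian, and has the same standard deviation $\sigma_d$. The difference $d_{P_2} - d_{P_1}$ is then normal with mean $E(d_{P_2}) - E(d_{P_1})$ and variance $2\sigma_d^2$. The function name and the numbers below are hypothetical.

```python
import math

def violation_probability(mean_diff, sigma):
    """P(d_P1 > d_P2) when d_P2 - d_P1 ~ Normal(mean_diff, 2 * sigma**2)."""
    z = -mean_diff / (math.sqrt(2.0) * sigma)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

baseline = violation_probability(1.0, 1.0)
# A linear scaling of the signals rescales the mean difference and the noise SD
# together, so the probability of a violation does not change.
rescaled = violation_probability(10.0, 10.0)
# A smaller expected difference or a larger noise SD both make violations more likely.
smaller_difference = violation_probability(0.5, 1.0)
larger_noise = violation_probability(1.0, 2.0)

print(baseline, rescaled, smaller_difference, larger_noise)
```

Only the ratio between the expected signal difference and the noise standard deviation matters, which is why halving the difference and doubling the noise give the same result.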

$E(\Delta d) = E(d_{P_2}) - E(d_{P_1})$ and (b) an increase of the standard deviation $\sigma_d$ of the disparity noise. Since we are dealing with a local analysis, without loss of generality, we assume that the standard deviations of the measurement noise are the same for both $P_1$ and $P_2$. The precision of the estimation of the local affine structure, therefore, can be quantified as

$w_{d_i}$ and $w_{v_i}$ are given by

$w_d \propto$

$w_v \propto$

$r_i$ will thus be equal to $r_i = \rho_i + \varepsilon_r$ with $\varepsilon_r \sim$

$E(r_i) = \rho_i$.

($d_i$) and velocity ($v_i$) signals scaled by the standard deviation of their measurement noise. The scores ($r_i$) on the first principal component correspond to the optimal combination of Equation 4, with the weights indicated by Equations 5 and 6. This combination is optimal in the sense that it maximizes the SNR of the combined estimate (see also Tassinari et al., 2008).
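The principal-component combination described here can be illustrated with a small simulation. This is a sketch under our own assumptions, not the authors' implementation: the gains, noise levels, and depth distribution are hypothetical, the signals are centered before the principal component is computed, and the leading eigenvector of the 2×2 covariance matrix is obtained in closed form.

```python
import math
import random

def first_pc_scores(xs, ys):
    """Scores on the first principal component of two (centered) variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    xc = [x - mx for x in xs]
    yc = [y - my for y in ys]
    sxx = sum(x * x for x in xc) / n
    syy = sum(y * y for y in yc) / n
    sxy = sum(x * y for x, y in zip(xc, yc)) / n
    # Orientation of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [ux * x + uy * y for x, y in zip(xc, yc)]

def corr(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
mu, omega = 1.0, 3.0         # hypothetical vergence and rotation gains
sigma_d, sigma_v = 1.0, 2.0  # hypothetical measurement-noise SDs

z = [rng.gauss(0.0, 1.0) for _ in range(5000)]         # hypothetical depth map
d = [mu * zi + rng.gauss(0.0, sigma_d) for zi in z]    # disparity signals
v = [omega * zi + rng.gauss(0.0, sigma_v) for zi in z]  # velocity signals

# Scale each signal by the SD of its measurement noise, then take the first PC.
d_scaled = [di / sigma_d for di in d]
v_scaled = [vi / sigma_v for vi in v]
r = first_pc_scores(d_scaled, v_scaled)

print(corr(d_scaled, z), corr(v_scaled, z), corr(r, z))
```

In this simulation the scores on the first principal component track the underlying depth map more closely than either scaled signal alone, which is the sense in which the combination improves the SNR.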

$r_i$ score, which provides the best estimate of local affine structure. In a second stage, a metric depth interpretation is assigned to $r_i$: $\hat{z}_i = f_\rho(r_i)$, where $f_\rho$ is a monotonically increasing function. Bear in mind that, in general, this metric interpretation is not veridical.

$\sigma_{d_i}$ and $\sigma_{v_i}$ for the disparity and the velocity signals, respectively. A second source of noise comes from the errors that originate from the second stage of processing (Euclidean depth interpretation). The most parsimonious assumption is that these errors $\varepsilon_D$ can be modeled as additive Gaussian noise as well, $\varepsilon_D \sim N(0, \sigma_D)$. We will now demonstrate that, because of these two sources of noise, the just noticeable disparity and velocity increments are biased estimators of the standard deviations $\sigma_{d_i}$ and $\sigma_{v_i}$ of the disparity and velocity noise, respectively.

$\sigma_{d_i} = \sigma_d$ and $\sigma_{v_i} = \sigma_v$). These assumptions are essentially equivalent to restricting the scope of the IC model to a local analysis of a smooth surface. Consider the measurement of the disparity value $d_P$ produced by a point $P$ on the surface of an object. Since this measurement is subject to noise, we can write $d_P = E(d_P) + \varepsilon_{d_P}$. By dividing the previous equation by $\sigma_d$, we obtain

$$r_P = \rho_P + \varepsilon_1, \qquad (8)$$

where $\rho_P = E(r_P)$ and $\varepsilon_1 \sim N(0, 1)$. The second stage of processing applies the function $f_\rho(r_P)$, which imposes a metric interpretation on $r_P$. We thus assume that the perceived depth $\hat{z}_P$ is corrupted by a further source of Gaussian noise, $\varepsilon_D \sim N(0, \sigma_D)$:

$$\hat{z}_P = f_\rho(r_P) + \varepsilon_D.$$

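Expanding $f_\rho(r)$ to first order around $\rho_P$, we obtain

$$\hat{z}_P \approx f_\rho(\rho_P) + f'_\rho(\rho_P)\,(r_P - \rho_P) + \varepsilon_D. \qquad (11)$$

Since $\varepsilon_1 = r_P - \rho_P$ (see Equation 8) represents the perturbation around $\rho_P$, Equation 11 becomes

$$\hat{z}_P \approx f_\rho(\rho_P) + f'_\rho(\rho_P)\,\varepsilon_1 + \varepsilon_D.$$

The first term is the expected value of $\hat{z}_P$, whereas the other two terms are independent random variables having zero mean and standard deviations $f'_\rho(\rho_P)$ (since $\varepsilon_1$ has unit variance) and $\sigma_D$, respectively. Therefore

$$\sigma_{\hat{z}}^2 = f'_\rho(\rho_P)^2 + \sigma_D^2.$$

The variance of the perceived depth thus depends on the variance $\sigma_D^2$ of the depth-interpretation noise and on the slope of the function $f_\rho(\rho)$ at $\rho_P$.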

$\rho_P$ by some amount $\Delta\rho$:

$f_\rho(\rho_P)$ around $\rho_P$, the previous equation can be approximated by

$\rho$ corresponding to one JND of perceived depth:

$\rho =$

*biased estimator* of the standard deviation of the disparity noise. In fact, the disparity increment $E(\Delta d)$ corresponding to one JND estimates the standard deviation of the measurement noise $\sigma_d$ only up to a multiplicative bias factor equal to

$f_\rho(\rho)$ could be any non-linear function, it is important to note that the JND can vary with the intensity of the disparity signals $d_i$, *even if the measurement noise of the disparity signals* ($\varepsilon_{d_i}$) *and the noise due to the depth interpretation* ($\varepsilon_D$) *are kept constant*. If $f_\rho(\rho)$ were a decelerating function, for example, the JND would be an increasing function of $d_i$.
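The steps of this argument can be traced with a short derivation. It is a sketch, not necessarily the authors' exact formulas, and it rests on two assumptions: that one JND of perceived depth corresponds to one standard deviation $\sigma_{\hat{z}}$ of the perceived-depth noise, and that this noise combines two independent zero-mean terms with standard deviations $f'_\rho(\rho_P)$ and $\sigma_D$. The subscript "JND" is our own label.

```latex
\begin{align*}
\Delta\hat{z} &= f_\rho(\rho_P + \Delta\rho) - f_\rho(\rho_P)
               \approx f'_\rho(\rho_P)\,\Delta\rho, \\
\sigma_{\hat{z}} &= \sqrt{f'_\rho(\rho_P)^2 + \sigma_D^2}, \\
\Delta\rho_{\mathrm{JND}} &= \frac{\sigma_{\hat{z}}}{f'_\rho(\rho_P)}
               = \sqrt{1 + \frac{\sigma_D^2}{f'_\rho(\rho_P)^2}}, \\
E(\Delta d)_{\mathrm{JND}} &= \sigma_d\,\Delta\rho_{\mathrm{JND}}
               = \sigma_d\,\sqrt{1 + \frac{\sigma_D^2}{f'_\rho(\rho_P)^2}}.
\end{align*}
```

Under these assumptions the multiplicative factor is never smaller than one, equals one only when $\sigma_D = 0$, and grows as the slope $f'_\rho(\rho_P)$ decreases, which is what happens at larger signal intensities when $f_\rho$ is decelerating.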

$\sigma_d$ must be taken with a grain of salt.
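A numerical illustration may help to see why. The sketch below assumes that the disparity increment corresponding to one JND equals $\sigma_d\sqrt{1 + \sigma_D^2 / f'_\rho(\rho)^2}$, which follows if one JND is taken to be one standard deviation of the perceived-depth noise; the decelerating function $f_\rho(\rho) = \log(1 + \rho)$ and all parameter values are hypothetical choices made only for this example.

```python
import math

def jnd_in_disparity(rho, sigma_d, sigma_D, f_prime):
    """Disparity increment for one JND: sigma_d * sqrt(1 + (sigma_D / f'(rho))**2)."""
    return sigma_d * math.sqrt(1.0 + (sigma_D / f_prime(rho)) ** 2)

def log_slope(rho):
    """Derivative of the hypothetical decelerating function f(rho) = log(1 + rho)."""
    return 1.0 / (1.0 + rho)

sigma_d = 1.0  # measurement-noise SD, held constant
sigma_D = 0.2  # depth-interpretation-noise SD, held constant
rhos = [1.0, 2.0, 4.0, 8.0, 16.0]

jnds = [jnd_in_disparity(rho, sigma_d, sigma_D, log_slope) for rho in rhos]
# Without depth-interpretation noise the JND would recover sigma_d exactly.
unbiased = jnd_in_disparity(4.0, sigma_d, 0.0, log_slope)

for rho, jnd in zip(rhos, jnds):
    print(f"rho = {rho:5.1f}  JND = {jnd:.3f}  (sigma_d = {sigma_d})")
```

Both noise sources are constant in this example, yet the JND grows with signal intensity and always overestimates $\sigma_d$.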

If the velocity-only stimulus $z_v^{(j)}$ is perceptually matched in depth to the disparity-only stimulus $z_d^{(j)}$, then, according to the IC model, the SNRs of the two stimuli (see Equation 3) must be equal.

$v^{(j)}$ and $d^{(j)}$ are provided by the stimulus displays. In the section “The JND is a biased estimator of disparity noise,” however, we have shown that the JND is a biased estimator of the standard deviation of the measurement noise. Nevertheless, this is not a problem for the present purposes, if we assume the same $f_\rho(\rho)$ function in the bias factor

$d^{(j)}$ and $v^{(j)}$, respectively.

$v^{(j)} = \omega z_v^{(j)}$ and $d^{(j)} = \mu z_d^{(j)}$. Second, notice that

$d$ and $v$; by

$d^{(1)}$ be the front-to-back disparity of the two cylinders and let $v^{(1)}$ be the front-to-back relative velocity. According to the IC model, it must be true that $\rho_d^{(1)} = \rho_v^{(1)}$, or equivalently that

$\sigma_d$ may remain constant, or it may vary with signal intensity. In the following, we will consider the consequences of each of these two possibilities:

- If the noise of the disparity measurement is constant, then $\sigma_d^{(1)} = \sigma_d^{(2)}$, $\rho_d^{(2)} = \rho_d^{(1)} + 1$, and $\rho_v^{(2)} = \rho_v^{(1)} + 1$. The increase of one JND produces a unit increase of $\rho$ for both stereo-only and motion-only displays. After increasing the motion and stereo signals by one JND, therefore, the two stimuli should still be perceived as having the same depth elongation. If the stimulus pair $\{z_d^{(m)}, z_v^{(m)}\}$ is perceptually matched in depth, and so is the stimulus pair $\{z_d^{(n)}, z_v^{(n)}\}$, then the stimuli $z_d^{(m)}$ and $z_d^{(n)}$ should be separated by the same number of JNDs in depth as the stimuli $z_v^{(m)}$ and $z_v^{(n)}$. We want to stress that this prediction, which MacKenzie et al. (2008) attribute to the IC model, holds only if the measurement noise for the disparity and velocity signals does not vary with signal intensity. In their own data, MacKenzie et al. (2008) found that the JNDs for the velocity and disparity stimuli do indeed vary as a function of signal intensity. Any conclusion based on the assumption of a constant measurement noise, therefore, is questionable.
- Now, consider the case in which the measurement noise varies with signal intensity. Would the IC model be falsified in this case? Not at all, *unless the standard deviation of the measurement noise varied at the same rate as the signal intensity*. In that limiting case, the ratio between the signal intensity and the standard deviation of the measurement noise would remain constant, and according to the IC model, all stimuli would be perceived as having the same depth extent. The above consideration has an important implication: according to the IC model, Weber's law does not apply within the range used in the psychophysical experiments on 3D depth perception (see the Stairway to depth perception section).
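The two cases can be made concrete with a toy calculation. It is a sketch under simplifying assumptions of our own: one JND is taken to equal the standard deviation of the measurement noise (the bias factor is ignored), and the noise levels, the starting SNR, and the Weber fraction are hypothetical.

```python
def snr(signal, sigma_fn):
    """Signal-to-noise ratio rho = signal / sigma(signal)."""
    return signal / sigma_fn(signal)

def ladder(start, sigma_fn, steps):
    """Signal values obtained by adding one JND (taken equal to sigma) at a time."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] + sigma_fn(values[-1]))
    return values

def constant_disparity_noise(signal):
    return 0.5  # hypothetical constant SD for the disparity signal

def constant_velocity_noise(signal):
    return 2.0  # hypothetical constant SD for the velocity signal

def weber_noise(signal):
    return 0.1 * signal  # hypothetical noise SD proportional to the signal

# Case 1: constant noise. Both ladders start at rho = 5 (a perceptually matched pair).
d_ladder = ladder(5 * 0.5, constant_disparity_noise, 5)
v_ladder = ladder(5 * 2.0, constant_velocity_noise, 5)
rho_d = [snr(s, constant_disparity_noise) for s in d_ladder]
rho_v = [snr(s, constant_velocity_noise) for s in v_ladder]

# Case 2: noise growing at the same rate as the signal. rho never changes.
rho_weber = [snr(s, weber_noise) for s in ladder(2.5, weber_noise, 5)]

print(rho_d, rho_v, rho_weber)
```

With constant noise, each added JND raises $\rho$ by one unit for both cues, so the matched pair stays matched at every step. With noise proportional to the signal, $\rho$ stays fixed no matter how many JNDs are added, so the model would predict the same perceived depth for every stimulus.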

Two *flanking* lines were positioned at fixation distance. A third line, which projected midway between the two, was located in depth in front of the flankers. Participants were asked to judge the depth separation (which we will call *stimulus depth*) between the flankers and the central line. The 3D information was provided either by binocular disparities (*stereo stimulus*) or by image velocities (*motion stimulus*).

The simulated depth $z_v^{(1)}$ of the motion stimulus was set at 12.5 mm. For each participant, through a staircase procedure, we found the simulated depth of a stereo-only stimulus, $z_d^{(1)}$, which was perceptually matched in depth to the motion-only stimulus.

$z_v^{(1)}$ and $z_d^{(1)}$, we built two psychophysical scales, one for the motion-only stimuli and one for the stereo-only stimuli. The discrimination thresholds measured at $z_v^{(1)}$ and $z_d^{(1)}$, denoted with

$z_v^{(2)} = z_v^{(1)} +$

$z_d^{(2)} = z_d^{(1)} +$

$z_v^{(j)}$ and $z_d^{(j)}$ and $j = 1, \ldots, 6$. With the exception of the first step, the two sequences of

$z^{(j)}$ and $z_d^{(j)}$ values comprising the disparity-only scale.

*comparison stimulus*), whereas the depth separation of the other stimulus was varied according to a staircase procedure (*test stimulus*). We used staircases to control the value of simulated depth and four reversal rules (3 down/1 up, 1 down/3 up, 2 down/1 up, and 1 down/2 up) to sample points along the entire psychometric function. Four staircases were used for each psychometric function, which corresponds to approximately 200 trials per function (each staircase was terminated after 6 reversals). For each observer, the JND and PSE were estimated from the fitted psychometric function: the mean of the fitted cumulative normal gave the PSE, and its standard deviation gave the JND. Psychometric functions were fitted using psignifit version 2.5.6 (see http://bootstrap-software.org/psignifit/), a software package that implements the maximum-likelihood method described by Wichmann and Hill (2001).
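The logic of this procedure can be sketched in a few lines of code. The example below is an illustration only and does not reproduce psignifit or its API: the observer is simulated with a hypothetical PSE and JND, the interpretation of the "n down/m up" rules, the step size, and the starting level are our own assumptions, and the maximum-likelihood fit is a coarse grid search.

```python
import math
import random

def p_test_deeper(test, pse, sigma):
    """Cumulative-normal probability of judging the test deeper than the comparison."""
    return 0.5 * (1.0 + math.erf((test - pse) / (sigma * math.sqrt(2.0))))

def run_staircase(n_down, n_up, start, step, pse, sigma, rng,
                  max_reversals=6, max_trials=500):
    """Simulate one n_down/n_up staircase, stopping after max_reversals reversals."""
    level, last_dir, reversals = start, 0, 0
    deeper_run = shallower_run = 0
    trials = []
    while reversals < max_reversals and len(trials) < max_trials:
        said_deeper = rng.random() < p_test_deeper(level, pse, sigma)
        trials.append((level, said_deeper))
        if said_deeper:
            deeper_run, shallower_run = deeper_run + 1, 0
        else:
            deeper_run, shallower_run = 0, shallower_run + 1
        direction = 0
        if deeper_run >= n_down:        # enough "deeper" responses: lower the test
            direction, deeper_run = -1, 0
        elif shallower_run >= n_up:     # enough "shallower" responses: raise the test
            direction, shallower_run = 1, 0
        if direction:
            if last_dir and direction != last_dir:
                reversals += 1
            last_dir = direction
            level += direction * step
    return trials

def fit_cumulative_normal(trials, pse_grid, sigma_grid):
    """Grid-search maximum-likelihood fit; returns (PSE, JND) = (mean, SD)."""
    best = (-math.inf, None, None)
    for pse in pse_grid:
        for sigma in sigma_grid:
            ll = 0.0
            for level, said_deeper in trials:
                p = min(max(p_test_deeper(level, pse, sigma), 1e-9), 1.0 - 1e-9)
                ll += math.log(p if said_deeper else 1.0 - p)
            if ll > best[0]:
                best = (ll, pse, sigma)
    return best[1], best[2]

rng = random.Random(1)
true_pse, true_sigma = 12.5, 2.0  # hypothetical observer, in mm
rules = [(3, 1), (1, 3), (2, 1), (1, 2)]

trials = []
for n_down, n_up in rules:
    trials += run_staircase(n_down, n_up, start=15.0, step=1.0,
                            pse=true_pse, sigma=true_sigma, rng=rng)

pse_grid = [8.0 + 0.1 * i for i in range(91)]    # 8.0 to 17.0 mm
sigma_grid = [0.5 + 0.1 * i for i in range(56)]  # 0.5 to 6.0 mm
pse_hat, jnd_hat = fit_cumulative_normal(trials, pse_grid, sigma_grid)
print(len(trials), pse_hat, jnd_hat)
```

Using rules that converge on different points of the psychometric function spreads the trials above and below the PSE, which is what allows both the mean and the standard deviation of the cumulative normal to be estimated.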

In the first part of the experiment (*depth matching*), the *comparison* was a motion stimulus simulating a depth of 12.5 mm and the *test* was a stereo stimulus that was varied according to a staircase procedure. The purpose of this part of the experiment was to determine the PSE of the stereo stimulus perceptually matched to the motion stimulus simulating a depth of 12.5 mm. The simulated depth of 12.5 mm for the motion stimulus and the simulated depth at the PSE for the stereo stimulus were then used as the starting points for building the motion-based and stereo-based psychophysical scales.

In the second part of the experiment (*depth discrimination*), we built the motion-based and stereo-based psychophysical scales. The motion-based scale was generated by adding successive JNDs to the starting point (i.e., the simulated depth of 12.5 mm). The JNDs were estimated by a depth-discrimination task. In a 2-IFC task, observers were asked to determine which of the two successively presented stimuli appeared to be deeper. To estimate the first JND, we used a motion stimulus simulating a depth of 12.5 mm as the *comparison* and a motion stimulus that was varied according to a staircase procedure as the *test*. From the resulting psychometric function, we estimated the first JND. In the next step, the motion *comparison stimulus* simulated a depth of 12.5 mm plus the JND estimated in the previous step; the motion *test stimulus* was again varied according to a staircase procedure. In this way, a second JND was estimated, and this procedure was repeated five times.
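The construction of a scale by successive JND increments can be summarized as a simple loop. In this sketch, `measure_jnd` stands in for a whole discrimination block (staircases plus psychometric fit) at a given comparison depth; the function `hypothetical_jnd` is a placeholder that we made up for illustration and does not represent data from the study.

```python
def build_scale(start_depth, measure_jnd, n_jnds=5):
    """Return the depths z(1)..z(n_jnds + 1) obtained by adding successive JNDs."""
    depths = [start_depth]
    jnds = []
    for _ in range(n_jnds):
        # One discrimination block, with the current depth as the comparison.
        jnd = measure_jnd(depths[-1])
        jnds.append(jnd)
        # The next comparison depth is the current one plus the JND just estimated.
        depths.append(depths[-1] + jnd)
    return depths, jnds

def hypothetical_jnd(depth_mm):
    """Placeholder observer: the JND grows with depth (NOT data from the study)."""
    return 1.0 + 0.1 * depth_mm

depths, jnds = build_scale(12.5, hypothetical_jnd)
print(depths)
print(jnds)
```

Five JND estimates yield six depth values, so the first and last steps of the scale are separated by five JNDs by construction, whatever the size of the individual JNDs.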

*depth matching*). In this case, both the *comparison* and *test* stimuli were defined by disparity information. By using the same procedure as for the motion stimuli, we determined the sequence of the five JNDs, which comprise the stereo-based psychophysical scale.

In the third part of the experiment (*depth matching*), observers were asked to compare motion and stereo stimuli. A 2-IFC task was performed in five blocks of trials. In each block, the *comparison* was a motion stimulus defined by one of the five simulated depth magnitudes that comprise the motion-based psychophysical scale. The *test stimulus* was a static stereo display, which was varied according to a staircase procedure. The third part of the experiment allowed us to determine the stereo depths that perceptually matched each step of the motion-based psychophysical scale.

*the same amount of perceived depth* at each step of the psychophysical scales. The answer to this question is provided by Figure 6, where we have plotted the results of the third part of the experiment. Remember that, in part three of the experiment, observers compared static stereo stimuli with motion stimuli. In a 2-IFC task, the motion stimulus was kept fixed, whereas the stereo stimulus was varied according to a staircase procedure.

_{2} − _{3} = 1.0743 mm (95% C.I.: −0.4242, 1.3409; $t_{23} = 1.0743$, $p > 0.05$). We can thus conclude that JND increments applied independently to stereo or motion stimuli correspond to equivalent increments in perceived depth. These results are therefore compatible with the Fechnerian theory relating perceived magnitudes to JND sums.

$z_v^{(j)}$ of the motion stimuli of part two of the experiment, which were weighted by the ratio

$j = 1, \ldots, 6$. If the IC model is correct, then we expect a linear relationship with zero intercept and a slope of one. The IC model can be contrasted with a model assuming an unbiased derivation of 3D Euclidean shape from retinal cues. According to such a model,

$z_d^{(j)}$ and for the predictors of the two models. The centered data were then analyzed by linear regression.

*not equivalent* in three respects.

*does not*, in general, maximize the SNR of the combined estimate; this happens only if the estimates derived from the single depth cues are unbiased (see 1).

$\rho$. Then, a maximum likelihood depth interpretation is provided for $\rho$.

$E(\hat{z}_{d_i}) = E(\hat{z}_{v_i}) = z_i$. Since the weights of the MWF combination rule sum to 1, the expected value of the combined estimate ($\hat{z}_{c_i}$) will also be unbiased: $E(\hat{z}_{c_i}) = E(\hat{z}_{d_i}) = E(\hat{z}_{v_i}) = z_i$. The combination rule of the MWF model minimizes the variance of $\hat{z}_{c_i}$ and, therefore, also maximizes the SNR of the final estimate, since the expected value $E(\hat{z}_{c_i})$ of the combined estimate does not depend on the choice of the weights.
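The variance-minimizing property can be checked with a short calculation. The sketch assumes two independent, unbiased single-cue estimates and uses the standard inverse-variance weights; the variances are hypothetical values chosen for the example.

```python
def mwf_weights(var_d, var_v):
    """Inverse-variance weights, normalized so that they sum to 1."""
    inv_d, inv_v = 1.0 / var_d, 1.0 / var_v
    total = inv_d + inv_v
    return inv_d / total, inv_v / total

def combined_variance(w_d, var_d, var_v):
    """Variance of w_d * z_d + (1 - w_d) * z_v for independent estimates."""
    return w_d ** 2 * var_d + (1.0 - w_d) ** 2 * var_v

var_d, var_v = 4.0, 1.0  # hypothetical single-cue variances
w_d, w_v = mwf_weights(var_d, var_v)
best = combined_variance(w_d, var_d, var_v)

# Compare against every other weight pair on a grid that also sums to 1.
grid = [combined_variance(i / 100, var_d, var_v) for i in range(101)]
print(w_d, w_v, best, min(grid))
```

Because the weights sum to 1, every weight pair on the grid gives the same expected value; only the variance changes, and it is smallest at the inverse-variance weights.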

$z_i$ from stereo information are possible only if the vergence angle $\mu$ can be accurately estimated (see Equation 1). Correspondingly, a veridical interpretation of motion information requires an unbiased estimate of the 3D angular rotation $\omega$ (see Equation 2). The psychophysical literature, however, has revealed the existence of large and systematic biases in the perceptual interpretation of 3D shape from binocular disparities and retinal velocities. In a series of papers, we showed that the visual system is unable to recover unbiased estimates of 3D rotation from the velocity field (Caudek & Domini, 1998; Caudek & Rubin, 2001; Domini, Caudek, & Richman, 1998; Domini, Caudek, Turner, & Favretto, 1998). Regarding disparity information, similar results have also been found for the vergence angle (Johnston, 1991; Johnston, Cumming, & Landy, 1994).

$\mu$ and $\omega$ may be obtained by *promotion*, that is, by relying on the mutual constraints deriving from the simultaneous presence of disparity and velocity information (see Richards & Lieberman, 1985). If a process akin to promotion were to take place, observers' accuracy would improve when more depth cues are added to the stimulus displays. In our own research, however, we have shown that the exact opposite can happen (Tassinari et al., 2008). In sum, neither the estimation of the viewing parameters nor the process of promotion guarantees unbiased perceptual estimates of 3D shape.

$\xi_{d_i} = z_i + \mathrm{bias}_d$ and $\xi_{v_i} = z_i + \mathrm{bias}_v$. MacKenzie et al. showed that, in order to maximize the SNR, the weights of the combination rule must be proportional to the means and inversely proportional to the variances of the individual estimates:
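The rule just stated can be verified numerically. The sketch below assumes two independent estimates and defines the SNR of a weighted sum as its mean divided by its standard deviation; the means and variances are hypothetical numbers, and the function name is ours.

```python
import math

def snr(weights, means, variances):
    """SNR of a weighted sum of independent estimates: mean / SD."""
    mean = sum(w * m for w, m in zip(weights, means))
    sd = math.sqrt(sum(w * w * v for w, v in zip(weights, variances)))
    return mean / sd

means = [10.0, 6.0]     # hypothetical expected values (depth plus bias)
variances = [4.0, 1.0]  # hypothetical variances

# Weights proportional to the means and inversely proportional to the variances.
snr_weights = [m / v for m, v in zip(means, variances)]
# Inverse-variance weights, which ignore the means.
inverse_variance_weights = [1.0 / v for v in variances]

best = snr(snr_weights, means, variances)
inverse_variance = snr(inverse_variance_weights, means, variances)

# The SNR does not depend on the overall scale of the weights, so scanning the
# direction of the weight vector covers every positive weight pair.
angles = [math.pi / 2 * i / 1000 for i in range(1, 1000)]
scan = max(snr([math.cos(t), math.sin(t)], means, variances) for t in angles)
print(best, inverse_variance, scan)
```

When the single-cue estimates have different expected values, the inverse-variance weights no longer give the highest SNR; the mean-over-variance weights do, reaching $\sqrt{\sum m_i^2 / \sigma_i^2}$ in this example.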

*Studio dei meccanismi di integrazione dell'informazione visiva e multisensoriale*) awarded to Corrado Caudek.