Bayesian cue combination models have been used to examine how human observers combine information from several cues to form estimates of linear quantities like depth. Here we develop an analogous theory for circular quantities like planar direction. The circular theory is broadly similar to the linear theory but differs in significant ways. First, in the circular theory the combined estimate is a nonlinear function of the individual cue estimates. Second, in the circular theory the mean of the combined estimate is affected not only by the means of individual cues and the weights assigned to individual cues but also by the variability of individual cues. Third, in the circular theory the combined estimate can be less certain than the individual estimates, if the individual estimates disagree with one another. Fourth, the circular theory does not have some of the closed-form expressions available in the linear theory, so data analysis requires numerical methods. We describe a vector sum model that gives a heuristic approximation to the circular theory's behavior. We also show how the theory can be extended to deal with spherical quantities like direction in three-dimensional space.

*M* for motion and *D* for disparity. On every trial, the observer has a sample *m*_i from *M* and *d*_i from *D*, and these samples are metric depth estimates. The observer knows the standard deviations of *M* and *D*, but not the means. Bayes' theorem gives the posterior probability distribution on possible values of depth *z*, conditioned on the samples *m*_i and *d*_i:

*P*(*z* ∣ *m*_i, *d*_i) = *P*(*m*_i, *d*_i ∣ *z*) *P*(*z*) / *P*(*m*_i, *d*_i)   (1)

*P*(*z*) is the prior probability of a depth value *z*, and *P*(*m*_i, *d*_i) is the marginal probability of samples *m*_i and *d*_i, i.e., *P*(*m*_i, *d*_i) = ∫ *P*(*m*_i, *d*_i ∣ *z*) *P*(*z*) *dz*. If *M* and *D* are conditionally independent, Equation 1 becomes

*g*(*x*; *μ*, *σ*) and the standard deviations *σ*_M = std[*M*] and *σ*_D = std[*D*] to write Equation 2 as

*s*_i = (*σ*_M^−2 *m*_i + *σ*_D^−2 *d*_i) / (*σ*_M^−2 + *σ*_D^−2)   (6)

The maximum-likelihood depth estimate is *s*_i in Equation 6, as this is the value of *z* that maximizes the likelihood term *g*(*z*; *s*_i, *σ*_S) in Equation 5. Thus the maximum-likelihood decision variable is a random variable *S* that is a weighted average of cues *M* and *D*, with weights determined by the variances of the cues. Modified weak fusion asserts that the observer uses this maximum-likelihood estimate as a decision variable when making depth judgements. In a weaker form, the theory says that the observer's decision variable is a weighted average of *M* and *D*, but not necessarily the optimal weighted average in Equation 6. When deriving the normative cue combination rule in Equation 6, we assumed that the cues *M* and *D* are unbiased, but the theory leaves open the possibility that in human vision the cues are in fact biased. Furthermore, if the prior *P*(*z*) is not uniform over the range of interest, the theory allows that the observer may use the maximum a posteriori estimate, which is the value of *z* that maximizes *g*(*z*; *s*_i, *σ*_S) *P*(*z*) in Equation 5 and can differ from the maximum-likelihood estimate. Finally, when depth estimates from different cues disagree strongly, one or more of the estimates may be corrupt, so the theory includes a robustness mechanism that reduces the weight assigned to cues whose values differ greatly from those of more reliable cues.
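The inverse-variance weighting just described can be sketched in a few lines of code (our own illustration; the function and variable names are ours, not the paper's notation):

```python
def combine_linear(m_i, d_i, sigma_m, sigma_d):
    """Inverse-variance weighted combination of two depth cue samples.

    A sketch of the weighted-average rule described above; names are
    ours, not the paper's notation.
    """
    w_m = sigma_m ** -2                            # reliability of the motion cue
    w_d = sigma_d ** -2                            # reliability of the disparity cue
    s_i = (w_m * m_i + w_d * d_i) / (w_m + w_d)    # combined depth estimate
    sigma_s = (w_m + w_d) ** -0.5                  # its standard deviation
    return s_i, sigma_s
```

For example, with *m*_i = 10, *d*_i = 12, *σ*_M = 1, and *σ*_D = 2, the combined estimate is 10.4, closer to the more reliable motion cue, and the combined reliability *σ*_S^−2 = 1.25 is the sum of the cue reliabilities.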

*H* for shading and *C* for cast shadows. The von Mises distribution (Figure 1) is a circular analogue of the normal distribution, with a location parameter *μ* indicating the mean and the angle of maximum probability, and a concentration parameter *κ* indicating the narrowness of the peak at *μ*. The concentration parameter *κ* is the circular analogue of the “reliability” *σ*^−2 of a normal distribution, and when a von Mises distribution is narrow (*κ* > 2) it is well approximated by a normal distribution with variance *σ*² = 1/*κ*. The von Mises probability density function is

exp(*κ* cos(*x* − *μ*)) / (2π *I*_0(*κ*))

where *I*_0 is the modified Bessel function of the first kind and order zero. Following the linear theory, we assume that the observer knows the concentration parameters *κ*_H and *κ*_C of cues *H* and *C*, but not the means *μ*_H and *μ*_C.
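The density and its narrow-distribution normal approximation are easy to check numerically (a sketch using NumPy; `von_mises_pdf` is our helper name):

```python
import numpy as np

def von_mises_pdf(x, mu, kappa):
    # np.i0 is the modified Bessel function of the first kind, order zero
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

x = np.linspace(-np.pi, np.pi, 10001)
kappa = 8.0                                # narrow: kappa > 2
vm = von_mises_pdf(x, 0.0, kappa)
# Normal density with variance 1/kappa, the approximation described above
normal = np.sqrt(kappa / (2 * np.pi)) * np.exp(-0.5 * kappa * x ** 2)
```

At *κ* = 8 the two densities differ by less than 0.02 everywhere, and the difference shrinks as *κ* grows.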

*h*_i from *H* and *c*_i from *C*. Bayes' theorem gives the posterior distribution on lighting directions *θ*:

*l*_i = atan2(*κ*_H sin *h*_i + *κ*_C sin *c*_i, *κ*_H cos *h*_i + *κ*_C cos *c*_i)   (14)

*κ*_L = √(*κ*_H² + *κ*_C² + 2 *κ*_H *κ*_C cos(*h*_i − *c*_i))   (15)

where atan2(*y*, *x*) is the four-quadrant inverse tangent. The location parameter *l*_i is the maximum-likelihood estimate of lighting direction, and the concentration *κ*_L represents the certainty of this estimate. Thus the maximum-likelihood decision variable is a random variable *L* that is a nonlinear function of cues *H* and *C*, as in Equation 14. Following the linear model, we can extend the combination rule by allowing the concentration parameters *κ*_H and *κ*_C in Equation 14 (i.e., the analogues of 1/*σ*_M² and 1/*σ*_D² in Equation 6) to be replaced by arbitrary positive weights, thus allowing a broader and generally nonoptimal range of cue combination strategies, and by allowing for an effect of the prior *P*(*θ*). In most of what follows, we will allow the decision variable *L* to be calculated with concentrations replaced by arbitrary positive weights *w*_H and *w*_C:

*l*_i = atan2(*w*_H sin *h*_i + *w*_C sin *c*_i, *w*_H cos *h*_i + *w*_C cos *c*_i)   (16)

*L* does not belong to the same family of probability distributions (i.e., von Mises) as the individual cues *H* and *C*. In fact, it does not belong to any standard family we know of, and we have no closed-form expression for its distribution. Numerical methods are necessary to find the distribution of the combined decision variable *L* from the means, concentrations, and weight parameters of individual cues.
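One such numerical approach is straightforward Monte Carlo sampling (our sketch; function names are ours, and the combination rule follows the weighted atan2 form described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_combined(mu_h, kappa_h, mu_c, kappa_c, w_h, w_c, n=100_000):
    """Monte Carlo sample of the combined decision variable: draw cue
    samples from von Mises distributions and combine them with the
    weighted atan2 rule (weights in place of concentrations)."""
    h = rng.vonmises(mu_h, kappa_h, n)
    c = rng.vonmises(mu_c, kappa_c, n)
    return np.arctan2(w_h * np.sin(h) + w_c * np.sin(c),
                      w_h * np.cos(h) + w_c * np.cos(c))

def circular_mean(a):
    """Direction of the mean resultant vector of a sample of angles."""
    return np.arctan2(np.sin(a).mean(), np.cos(a).mean())
```

With equal concentrations and equal weights, symmetry puts the circular mean of the combined variable halfway between the cue means; histograms of such samples give the full distribution of *L*.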

*L* can be approximated as a von Mises random variable *L** with parameters given by Equations 17 and 18. With optimal weights (*w*_H = *κ*_H, *w*_C = *κ*_C), this approximation simplifies to

*μ*_{L*} = atan2(*κ*_H sin *μ*_H + *κ*_C sin *μ*_C, *κ*_H cos *μ*_H + *κ*_C cos *μ*_C)   (19)

*κ*_{L*} = √(*κ*_H² + *κ*_C² + 2 *κ*_H *κ*_C cos(*μ*_H − *μ*_C))   (20)

*L* (calculated using Monte Carlo methods) and the approximation *L**, for several means, concentrations, and weights for the individual cues *H* and *C*. Figure 3 shows the circular means and circular standard deviations (Fisher, 1993) of *L* and *L** for a wider range of parameters. The approximation is reasonably good, at least as an aid to understanding the model's qualitative behavior. It is least accurate when approximately equally weighted cues indicate opposite directions; in this case, *L** has a uniform distribution, whereas *L* is bimodal with peaks halfway between the two opposed directions.

*L* has a value partway between the cues *H* and *C*. Unlike in the linear model, the combined estimate is not simply a weighted average of the cues. The left-hand column of Figure 3 shows that the mean of the combined estimate is approximately a weighted average of the cue means when the cue means are similar, but when the discrepancy between cues is large, the combined estimate is closer to the more heavily weighted cue. For example, when the shading cue is weighted more strongly than the cast shadow cue (*w*_H > *w*_C), the cast shadow cue pulls the combined estimate away from the shading cue over a limited range, but when the discrepancy is large the combined estimate smoothly reverts to the shading cue. The robustness mechanism in the linear model causes similar behavior. Robust behavior without a dedicated mechanism for rejecting discrepant cues is not unique to circular models: non-Gaussian models of cue combination on the line, particularly those based on probability distributions with heavy tails, can also behave robustly without dedicated robustness mechanisms (Girshick & Banks, 2009; Knill, 2007).

∣*μ*_C − *μ*_H∣. Equation 20 shows that in the von Mises approximation with optimal weights, when the cues agree (*μ*_H = *μ*_C) the combined concentration *κ*_L equals the sum of the cue concentrations, *κ*_H + *κ*_C (just as reliabilities sum in the linear model, i.e., *σ*_S^−2 = *σ*_M^−2 + *σ*_D^−2; see Equation A10), and when cues indicate opposite directions *κ*_L equals the difference between the cue concentrations, ∣*κ*_H − *κ*_C∣.
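These endpoints (concentrations summing for agreeing cues, subtracting for opposed cues) fall out of the polar vector sum. A minimal numerical check, assuming the vector-sum form of the optimal-weight approximation (our reconstruction of the behavior described above, not a transcription of the paper's Equation 20):

```python
import numpy as np

def approx_concentration(kappa_h, mu_h, kappa_c, mu_c):
    """Concentration of the optimally weighted von Mises approximation:
    the length of the sum of the polar vectors (kappa_h, mu_h) and
    (kappa_c, mu_c)."""
    return np.hypot(kappa_h * np.cos(mu_h) + kappa_c * np.cos(mu_c),
                    kappa_h * np.sin(mu_h) + kappa_c * np.sin(mu_c))
```

Agreeing cues with concentrations 3 and 2 give a combined concentration of 5; opposed cues give 1, the absolute difference.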

*w*_M = 2*w*_D, then motion perturbations will have twice the effect on perceived depth as disparity perturbations, regardless of the standard deviations of the motion and disparity cues (Young, Landy, & Maloney, 1993). Equation 17 shows that in the von Mises approximation to the circular model, the decision variable mean is independent of the cue concentrations, as in the linear model. In the exact circular model, though, the cue concentrations do affect the mean of the decision variable. Figure 4 shows the mean of the decision variable *L* for several cue means, concentrations, and weights, along with the von Mises approximation. First, we see that relative to the von Mises approximation, the decision variable mean is biased toward the mean of the cue with the higher concentration (Figure 4a). Second, again relative to the von Mises approximation, the decision variable mean is biased toward the mean of the cue with the greater weight (Figure 4b). (These two biases may be in opposite directions.) Third, these two biases are strongest when the difference between the cue means is large (Figures 4c and 4d) and when the concentrations are low (Figure 4, all panels).

(*κ* > 2). Cue concentrations can be estimated from the slopes of psychometric functions, e.g., the slope of a psychometric function for discriminating between lighting directions defined by shading gives an upper limit on the noisiness of the shading cue, since discrimination performance is limited by noise in the shading cue plus any other internal decision noise. Independent estimates like this can show whether cues in a given experiment are concentrated enough to make the biases illustrated in Figure 4 negligible.

*μ* is the vector's direction and *κ* is its length. That is, the sum of two polar-coordinate vectors (*κ*_H, *μ*_H) and (*κ*_C, *μ*_C) is the polar-coordinate vector (*κ*_{L*}, *μ*_{L*}). This means we can visualize the von Mises approximation to optimally weighted circular cue combination as a vector sum, with a vector for each cue pointing in the direction *μ* of the cue, and the length of the vector equal to the cue's concentration parameter *κ*. The vector representing the von Mises approximation to the combined distribution is the sum of the vectors representing the individual cues. This heuristic is only as good as the von Mises approximation, so it fails when we combine equally strong cues indicating opposite directions. In this case, the cue vectors sum to the zero vector, i.e., *κ*_{L*} = 0, representing the uniform distribution, whereas the combined decision variable *L* is bimodal, as mentioned earlier. The vector sum heuristic also fails when observers use nonoptimal cue weights, as the von Mises approximation for nonoptimal weights does not describe vector addition in polar coordinates; specifically, Equation 18 does not give the correct length of the summed vector.

*angle* of a combined direction estimate is a nonlinear function of the *angles* of the individual cues. However, optimal circular cue combination can be represented as a simple weighted sum, if we represent individual cues as *vectors* instead of angles. As in Equation 14, on each trial we have a shading cue angle *h*_i and its associated concentration *κ*_H, and a cast shadow cue angle *c*_i and its concentration *κ*_C. We can find the maximum-likelihood combined angle estimate using Equation 14. Alternatively and equivalently, we can represent the two cues as polar-coordinate vectors (*κ*_H, *h*_i) and (*κ*_C, *c*_i), and then the combined estimate is represented by the polar vector (*κ*_L, *l*_i) that is the vector sum of the cue vectors. Of course, we must still use Equations 14 and 15, which are equations for vector summation in polar coordinates, to find the summed vector (*κ*_L, *l*_i). Nevertheless, this alternate view highlights a similarity to optimal linear cue combination, where the maximum-likelihood estimate is also a weighted sum of individual cues.
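The equivalence of the atan2 combination rule and polar vector summation can be verified directly (a sketch; both functions are our transcriptions of the rule described above):

```python
import numpy as np

def combine_angles(kappa_h, h_i, kappa_c, c_i):
    """Combined angle via the weighted atan2 rule described in the text."""
    return np.arctan2(kappa_h * np.sin(h_i) + kappa_c * np.sin(c_i),
                      kappa_h * np.cos(h_i) + kappa_c * np.cos(c_i))

def combine_vectors(kappa_h, h_i, kappa_c, c_i):
    """Same combination as a Cartesian sum of the polar vectors
    (kappa_h, h_i) and (kappa_c, c_i), converted back to polar form."""
    vx = kappa_h * np.cos(h_i) + kappa_c * np.cos(c_i)
    vy = kappa_h * np.sin(h_i) + kappa_c * np.sin(c_i)
    return np.arctan2(vy, vx), np.hypot(vx, vy)
```

Both routes give the same combined angle; the vector route also returns the combined concentration as the length of the summed vector.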

*n*-dimensional sphere (Fisher, Lewis, & Embleton, 1987), which in three dimensions has probability density function

*κ* exp(*κ* **μ** • **x**) / (4π sinh *κ*)

where **x** is a unit vector, **μ** is a unit-vector location parameter indicating the peak direction, *κ* is a scalar concentration parameter, and • is the vector dot product. Suppose the shading cue **H** and the cast shadow cue **C** are now unit vectors ranging over all three-dimensional directions (and so we write them in bold). Repeating the derivation in Equations 9 to 15 with a von Mises–Fisher distribution shows that the maximum-likelihood combined lighting direction estimate is

(*κ*_H **h**_i + *κ*_C **c**_i) / ∥*κ*_H **h**_i + *κ*_C **c**_i∥
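In the spherical case the weighted vector sum becomes ordinary three-dimensional vector addition. A minimal sketch, assuming the combined estimate is the renormalized concentration-weighted sum of the cue unit vectors (our function name):

```python
import numpy as np

def combine_sphere(kappa_h, h_vec, kappa_c, c_vec):
    """Combined direction for two von Mises-Fisher cues: the
    concentration-weighted sum of the cue unit vectors, renormalized
    to unit length."""
    v = kappa_h * np.asarray(h_vec) + kappa_c * np.asarray(c_vec)
    return v / np.linalg.norm(v)
```

Two equally concentrated cues at right angles combine to the direction bisecting them.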

*g*(*x*; *μ*_M, *σ*_M) and *g*(*x*; *μ*_D, *σ*_D). Equation A1 shows that the pointwise product of these densities is proportional to *g*(*x*; *μ*_T, *σ*_T), where

*μ*_T = (*σ*_M^−2 *μ*_M + *σ*_D^−2 *μ*_D) / (*σ*_M^−2 + *σ*_D^−2)

*σ*_T^−2 = *σ*_M^−2 + *σ*_D^−2

*S* from Equation 6:

*S* (Equations A9 and A10), up to a normalization constant.

*S* is not von Mises. The von Mises approximation to the optimally weighted decision variable (Equations 19 and 20) is the pointwise product of the von Mises densities of the individual cues, as can be seen by comparing the approximation to Equations 12 to 15. As we have just shown, this approximation is exact in the linear model, so it should work well in the circular model under conditions where the circular model approximates the linear model: large cue concentrations and small differences between cue means. Figure 3 supports this approximation and shows that even at low concentrations and large differences between the means, the approximation still captures the model's behavior qualitatively, e.g., the width of the combined decision variable grows with the difference between the cue means.

*M* and *D*, the observer optimally combined two random variables *X* and *Y* with means *μ*_X = *μ*_M and *μ*_Y = *μ*_D, and variances

*M* and *D* using weights *w*_M and *w*_D, the decision variable's density is the pointwise product of *g*(*x*; *μ*_X, *σ*_X) and *g*(*x*; *μ*_Y, *σ*_Y), up to a normalization factor.
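The pointwise-product property for normal densities is easy to confirm numerically (our illustration; the grid and parameter values are arbitrary):

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 20001)
mu_x, s_x, mu_y, s_y = 1.0, 1.0, 3.0, 2.0
prod = gauss(x, mu_x, s_x) * gauss(x, mu_y, s_y)

# Predicted parameters of the product: reliabilities add, and the mean
# is the reliability-weighted average of the two means
s_t = (s_x ** -2 + s_y ** -2) ** -0.5
mu_t = (mu_x / s_x ** 2 + mu_y / s_y ** 2) * s_t ** 2

dx = x[1] - x[0]
normalized = prod / (prod.sum() * dx)      # renormalize the product
```

After renormalization, the product matches the predicted normal density to numerical precision.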

*H* and *C* suboptimally as in Equation 16, we approximate the decision variable by taking the pointwise product of von Mises distributions *U* and *V* that have means *μ*_U = *μ*_H and *μ*_V = *μ*_C, and concentrations chosen by analogy with Equations A14 and A15 and using the approximation *κ* = *σ*^−2, which is valid at large concentrations (Fisher, 1993):
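The analogous product property for von Mises densities can be checked the same way: the pointwise product of two von Mises densities, renormalized, is itself von Mises, with parameters given by the polar vector sum of the cue parameters (our numerical illustration; parameter values are arbitrary):

```python
import numpy as np

def vm_pdf(x, mu, kappa):
    # np.i0 is the modified Bessel function of the first kind, order zero
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

x = np.linspace(-np.pi, np.pi, 20001)
mu_h, k_h, mu_c, k_c = 0.3, 4.0, 1.1, 2.0
prod = vm_pdf(x, mu_h, k_h) * vm_pdf(x, mu_c, k_c)

# Predicted parameters of the product: the polar vector sum of
# (k_h, mu_h) and (k_c, mu_c)
vx = k_h * np.cos(mu_h) + k_c * np.cos(mu_c)
vy = k_h * np.sin(mu_h) + k_c * np.sin(mu_c)
mu_l, k_l = np.arctan2(vy, vx), np.hypot(vx, vy)

dx = x[1] - x[0]
normalized = prod / (prod.sum() * dx)      # renormalize the product
```

Because the cue means differ, the combined concentration falls short of the sum of the cue concentrations, as described earlier.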