Bayesian cue combination models have been used to examine how human observers combine information from several cues to form estimates of linear quantities like depth. Here we develop an analogous theory for circular quantities like planar direction. The circular theory is broadly similar to the linear theory but differs in significant ways. First, in the circular theory the combined estimate is a nonlinear function of the individual cue estimates. Second, in the circular theory the mean of the combined estimate is affected not only by the means of individual cues and the weights assigned to individual cues but also by the variability of individual cues. Third, in the circular theory the combined estimate can be less certain than the individual estimates, if the individual estimates disagree with one another. Fourth, the circular theory does not have some of the closed-form expressions available in the linear theory, so data analysis requires numerical methods. We describe a vector sum model that gives a heuristic approximation to the circular theory's behavior. We also show how the theory can be extended to deal with spherical quantities like direction in three-dimensional space.

*M* for motion and *D* for disparity. On every trial, the observer has a sample *m*_i from *M* and *d*_i from *D*, and these samples are metric depth estimates. The observer knows the standard deviations of *M* and *D*, but not the means. Bayes' theorem gives the posterior probability distribution on possible values of depth *z*, conditioned on the samples *m*_i and *d*_i:

*P*(*z* ∣ *m*_i, *d*_i) = *P*(*m*_i, *d*_i ∣ *z*) *P*(*z*) / *P*(*m*_i, *d*_i)   (1)

*P*(*z*) is the prior probability of a depth value *z*, and *P*(*m*_i, *d*_i) is the marginal probability of samples *m*_i and *d*_i, i.e., *P*(*m*_i, *d*_i) = ∫ *P*(*m*_i, *d*_i ∣ *z*) *P*(*z*) *dz*. If *M* and *D* are conditionally independent, Equation 1 becomes

*g*(*x*; *μ*, *σ*) and the standard deviations *σ*_M = std[*M*] and *σ*_D = std[*D*] to write Equation 2 as

*s*_i = (*σ*_M^−2 *m*_i + *σ*_D^−2 *d*_i) / (*σ*_M^−2 + *σ*_D^−2)   (6)

The maximum-likelihood depth estimate is *s*_i in Equation 6, as this is the value of *z* that maximizes the likelihood term *g*(*z*; *s*_i, *σ*_S) in Equation 5. Thus the maximum-likelihood decision variable is a random variable *S* that is a weighted average of cues *M* and *D*, with weights determined by the variances of the cues. Modified weak fusion asserts that the observer uses this maximum-likelihood estimate as a decision variable when making depth judgements. In a weaker form, the theory says that the observer's decision variable is a weighted average of *M* and *D*, but not necessarily the optimal weighted average in Equation 6. When deriving the normative cue combination rule in Equation 6, we assumed that the cues *M* and *D* are unbiased, but the theory leaves open the possibility that in human vision the cues are in fact biased. Furthermore, if the prior *P*(*z*) is not uniform over the range of interest, the theory allows that the observer may use the maximum a posteriori estimate, which is the value of *z* that maximizes *g*(*z*; *s*_i, *σ*_S) *P*(*z*) in Equation 5 and can differ from the maximum-likelihood estimate. Finally, when depth estimates from different cues disagree strongly, one or more of the estimates may be corrupt, so the theory includes a robustness mechanism that reduces the weight assigned to cues whose values differ greatly from those of more reliable cues.
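The inverse-variance weighting just described can be sketched in a few lines of code (our own illustration; the function and variable names are ours, not the paper's notation):

```python
def combine_linear(m_i, d_i, sigma_m, sigma_d):
    """Inverse-variance weighted combination of two depth cue samples.

    A sketch of the weighted-average rule described above; names are
    ours, not the paper's notation.
    """
    w_m = sigma_m ** -2                            # reliability of the motion cue
    w_d = sigma_d ** -2                            # reliability of the disparity cue
    s_i = (w_m * m_i + w_d * d_i) / (w_m + w_d)    # combined depth estimate
    sigma_s = (w_m + w_d) ** -0.5                  # its standard deviation
    return s_i, sigma_s
```

For example, with *m*_i = 10, *d*_i = 12, *σ*_M = 1, and *σ*_D = 2, the combined estimate is 10.4, closer to the more reliable motion cue, and the combined reliability *σ*_S^−2 = 1.25 is the sum of the cue reliabilities.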

*H* for shading and *C* for cast shadows. The von Mises distribution (Figure 1) is a circular analogue of the normal distribution, with a location parameter *μ* indicating the mean and the angle of maximum probability, and a concentration parameter *κ* indicating the narrowness of the peak at *μ*. The concentration parameter *κ* is the circular analogue of the “reliability” *σ*^−2 of a normal distribution, and when a von Mises distribution is narrow (*κ* > 2) it is well approximated by a normal distribution with variance *σ*² = 1/*κ*. The von Mises probability density function is

exp(*κ* cos(*x* − *μ*)) / (2π *I*_0(*κ*))

where *I*_0 is the modified Bessel function of the first kind and order zero. Following the linear theory, we assume that the observer knows the concentration parameters *κ*_H and *κ*_C of cues *H* and *C*, but not the means *μ*_H and *μ*_C.
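The density and its narrow-distribution normal approximation are easy to check numerically (a sketch using NumPy; `von_mises_pdf` is our helper name):

```python
import numpy as np

def von_mises_pdf(x, mu, kappa):
    # np.i0 is the modified Bessel function of the first kind, order zero
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

x = np.linspace(-np.pi, np.pi, 10001)
kappa = 8.0                                # narrow: kappa > 2
vm = von_mises_pdf(x, 0.0, kappa)
# Normal density with variance 1/kappa, the approximation described above
normal = np.sqrt(kappa / (2 * np.pi)) * np.exp(-0.5 * kappa * x ** 2)
```

At *κ* = 8 the two densities differ by less than 0.02 everywhere, and the difference shrinks as *κ* grows.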

*h*_i from *H* and *c*_i from *C*. Bayes' theorem gives the posterior distribution on lighting directions *θ*:

*l*_i = atan2(*κ*_H sin *h*_i + *κ*_C sin *c*_i, *κ*_H cos *h*_i + *κ*_C cos *c*_i)   (14)

*κ*_L = √(*κ*_H² + *κ*_C² + 2 *κ*_H *κ*_C cos(*h*_i − *c*_i))   (15)

where atan2(*y*, *x*) is the four-quadrant inverse tangent. The location parameter *l*_i is the maximum-likelihood estimate of lighting direction, and the concentration *κ*_L represents the certainty of this estimate. Thus the maximum-likelihood decision variable is a random variable *L* that is a nonlinear function of cues *H* and *C*, as in Equation 14. Following the linear model, we can extend the combination rule by allowing the concentration parameters *κ*_H and *κ*_C in Equation 14 (i.e., the analogues of 1/*σ*_M² and 1/*σ*_D² in Equation 6) to be replaced by arbitrary positive weights, thus allowing a broader and generally nonoptimal range of cue combination strategies, and by allowing for an effect of the prior *P*(*θ*). In most of what follows, we will allow the decision variable *L* to be calculated with concentrations replaced by arbitrary positive weights *w*_H and *w*_C:

*l*_i = atan2(*w*_H sin *h*_i + *w*_C sin *c*_i, *w*_H cos *h*_i + *w*_C cos *c*_i)   (16)

*L* does not belong to the same family of probability distributions (i.e., von Mises) as the individual cues *H* and *C*. In fact, it does not belong to any standard family we know of, and we have no closed-form expression for its distribution. Numerical methods are necessary to find the distribution of the combined decision variable *L* from the means, concentrations, and weight parameters of individual cues.
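One such numerical approach is straightforward Monte Carlo sampling (our sketch; function names are ours, and the combination rule follows the weighted atan2 form described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_combined(mu_h, kappa_h, mu_c, kappa_c, w_h, w_c, n=100_000):
    """Monte Carlo sample of the combined decision variable: draw cue
    samples from von Mises distributions and combine them with the
    weighted atan2 rule (weights in place of concentrations)."""
    h = rng.vonmises(mu_h, kappa_h, n)
    c = rng.vonmises(mu_c, kappa_c, n)
    return np.arctan2(w_h * np.sin(h) + w_c * np.sin(c),
                      w_h * np.cos(h) + w_c * np.cos(c))

def circular_mean(a):
    """Direction of the mean resultant vector of a sample of angles."""
    return np.arctan2(np.sin(a).mean(), np.cos(a).mean())
```

With equal concentrations and equal weights, symmetry puts the circular mean of the combined variable halfway between the cue means; histograms of such samples give the full distribution of *L*.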

*L* can be approximated as a von Mises random variable *L** with parameters given by Equations 17 and 18. With optimal weights (*w*_H = *κ*_H, *w*_C = *κ*_C), this approximation simplifies to

*μ*_{L*} = atan2(*κ*_H sin *μ*_H + *κ*_C sin *μ*_C, *κ*_H cos *μ*_H + *κ*_C cos *μ*_C)   (19)

*κ*_{L*} = √(*κ*_H² + *κ*_C² + 2 *κ*_H *κ*_C cos(*μ*_H − *μ*_C))   (20)

*L* (calculated using Monte Carlo methods) and the approximation *L**, for several means, concentrations, and weights for the individual cues *H* and *C*. Figure 3 shows the circular means and circular standard deviations (Fisher, 1993) of *L* and *L** for a wider range of parameters. The approximation is reasonably good, at least as an aid to understanding the model's qualitative behavior. It is least accurate when approximately equally weighted cues indicate opposite directions; in this case, *L** has a uniform distribution, whereas *L* is bimodal with peaks halfway between the two opposed directions.

*L* has a value partway between the cues *H* and *C*. Unlike in the linear model, the combined estimate is not simply a weighted average of the cues. The left-hand column of Figure 3 shows that the mean of the combined estimate is approximately a weighted average of the cue means when the cue means are similar, but when the discrepancy between cues is large, the combined estimate is closer to the more heavily weighted cue. For example, when the shading cue is weighted more strongly than the cast shadow cue (*w*_H > *w*_C), the cast shadow cue pulls the combined estimate away from the shading cue over a limited range, but when the discrepancy is large the combined estimate smoothly reverts to the shading cue. The robustness mechanism in the linear model causes similar behavior. Robust behavior without a dedicated mechanism for rejecting discrepant cues is not unique to circular models: non-Gaussian models of cue combination on the line, particularly those based on probability distributions with heavy tails, can also behave robustly without dedicated robustness mechanisms (Girshick & Banks, 2009; Knill, 2007).

∣*μ*_C − *μ*_H∣. Equation 20 shows that in the von Mises approximation with optimal weights, when the cues agree (*μ*_H = *μ*_C) the combined concentration *κ*_L equals the sum of the cue concentrations, *κ*_H + *κ*_C (just as reliabilities sum in the linear model, i.e., *σ*_S^−2 = *σ*_M^−2 + *σ*_D^−2; see Equation A10), and when cues indicate opposite directions *κ*_L equals the difference between the cue concentrations, ∣*κ*_H − *κ*_C∣.
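These endpoints (concentrations summing for agreeing cues, subtracting for opposed cues) fall out of the polar vector sum. A minimal numerical check, assuming the vector-sum form of the optimal-weight approximation (our reconstruction of the behavior described above, not a transcription of the paper's Equation 20):

```python
import numpy as np

def approx_concentration(kappa_h, mu_h, kappa_c, mu_c):
    """Concentration of the optimally weighted von Mises approximation:
    the length of the sum of the polar vectors (kappa_h, mu_h) and
    (kappa_c, mu_c)."""
    return np.hypot(kappa_h * np.cos(mu_h) + kappa_c * np.cos(mu_c),
                    kappa_h * np.sin(mu_h) + kappa_c * np.sin(mu_c))
```

Agreeing cues with concentrations 3 and 2 give a combined concentration of 5; opposed cues give 1, the absolute difference.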

*w*_M = 2*w*_D, then motion perturbations will have twice the effect on perceived depth as disparity perturbations, regardless of the standard deviations of the motion and disparity cues (Young, Landy, & Maloney, 1993). Equation 17 shows that in the von Mises approximation to the circular model, the decision variable mean is independent of the cue concentrations, as in the linear model. In the exact circular model, though, the cue concentrations do affect the mean of the decision variable. Figure 4 shows the mean of the decision variable *L* for several cue means, concentrations, and weights, along with the von Mises approximation. First, we see that relative to the von Mises approximation, the decision variable mean is biased toward the mean of the cue with the higher concentration (Figure 4a). Second, again relative to the von Mises approximation, the decision variable mean is biased toward the mean of the cue with the greater weight (Figure 4b). (These two biases may be in opposite directions.) Third, these two biases are strongest when the difference between the cue means is large (Figures 4c and 4d) and when the concentrations are low (Figure 4, all panels).

(*κ* > 2). Cue concentrations can be estimated from the slopes of psychometric functions, e.g., the slope of a psychometric function for discriminating between lighting directions defined by shading gives an upper limit on the noisiness of the shading cue, since discrimination performance is limited by noise in the shading cue plus any other internal decision noise. Independent estimates like this can show whether cues in a given experiment are concentrated enough to make the biases illustrated in Figure 4 negligible.

*μ* is the vector's direction and *κ* is its length. That is, the sum of two polar-coordinate vectors (*κ*_H, *μ*_H) and (*κ*_C, *μ*_C) is the polar-coordinate vector (*κ*_{L*}, *μ*_{L*}). This means we can visualize the von Mises approximation to optimally weighted circular cue combination as a vector sum, with a vector for each cue pointing in the direction *μ* of the cue, and the length of the vector equal to the cue's concentration parameter *κ*. The vector representing the von Mises approximation to the combined distribution is the sum of the vectors representing the individual cues. This heuristic is only as good as the von Mises approximation, so it fails when we combine equally strong cues indicating opposite directions. In this case, the cue vectors sum to the zero vector, i.e., *κ*_{L*} = 0, representing the uniform distribution, whereas the combined decision variable *L* is bimodal, as mentioned earlier. The vector sum heuristic also fails when observers use nonoptimal cue weights, as the von Mises approximation for nonoptimal weights does not describe vector addition in polar coordinates; specifically, Equation 18 does not give the correct length of the summed vector.

*angle* of a combined direction estimate is a nonlinear function of the *angles* of the individual cues. However, optimal circular cue combination can be represented as a simple weighted sum, if we represent individual cues as *vectors* instead of angles. As in Equation 14, on each trial we have a shading cue angle *h*_i and its associated concentration *κ*_H, and a cast shadow cue angle *c*_i and its concentration *κ*_C. We can find the maximum-likelihood combined angle estimate using Equation 14. Alternatively and equivalently, we can represent the two cues as polar-coordinate vectors (*κ*_H, *h*_i) and (*κ*_C, *c*_i), and then the combined estimate is represented by the polar vector (*κ*_L, *l*_i) that is the vector sum of the cue vectors. Of course, we must still use Equations 14 and 15, which are equations for vector summation in polar coordinates, to find the summed vector (*κ*_L, *l*_i). Nevertheless, this alternate view highlights a similarity to optimal linear cue combination, where the maximum-likelihood estimate is also a weighted sum of individual cues.
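The equivalence of the atan2 combination rule and polar vector summation can be verified directly (a sketch; both functions are our transcriptions of the rule described above):

```python
import numpy as np

def combine_angles(kappa_h, h_i, kappa_c, c_i):
    """Combined angle via the weighted atan2 rule described in the text."""
    return np.arctan2(kappa_h * np.sin(h_i) + kappa_c * np.sin(c_i),
                      kappa_h * np.cos(h_i) + kappa_c * np.cos(c_i))

def combine_vectors(kappa_h, h_i, kappa_c, c_i):
    """Same combination as a Cartesian sum of the polar vectors
    (kappa_h, h_i) and (kappa_c, c_i), converted back to polar form."""
    vx = kappa_h * np.cos(h_i) + kappa_c * np.cos(c_i)
    vy = kappa_h * np.sin(h_i) + kappa_c * np.sin(c_i)
    return np.arctan2(vy, vx), np.hypot(vx, vy)
```

Both routes give the same combined angle; the vector route also returns the combined concentration as the length of the summed vector.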

*n*-dimensional sphere (Fisher, Lewis, & Embleton, 1987), which in three dimensions has probability density function

*κ* exp(*κ* **μ** • **x**) / (4π sinh *κ*)

where **x** is a unit vector, **μ** is a unit-vector location parameter indicating the peak direction, *κ* is a scalar concentration parameter, and • is the vector dot product. Suppose the shading cue **H** and the cast shadow cue **C** are now unit vectors ranging over all three-dimensional directions (and so we write them in bold). Repeating the derivation in Equations 9 to 15 with a von Mises–Fisher distribution shows that the maximum-likelihood combined lighting direction estimate is

(*κ*_H **h**_i + *κ*_C **c**_i) / ∥*κ*_H **h**_i + *κ*_C **c**_i∥
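In the spherical case the weighted vector sum becomes ordinary three-dimensional vector addition. A minimal sketch, assuming the combined estimate is the renormalized concentration-weighted sum of the cue unit vectors (our function name):

```python
import numpy as np

def combine_sphere(kappa_h, h_vec, kappa_c, c_vec):
    """Combined direction for two von Mises-Fisher cues: the
    concentration-weighted sum of the cue unit vectors, renormalized
    to unit length."""
    v = kappa_h * np.asarray(h_vec) + kappa_c * np.asarray(c_vec)
    return v / np.linalg.norm(v)
```

Two equally concentrated cues at right angles combine to the direction bisecting them.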

*g*(*x*; *μ*_M, *σ*_M) and *g*(*x*; *μ*_D, *σ*_D). Equation A1 shows that the pointwise product of these densities is proportional to *g*(*x*; *μ*_T, *σ*_T), where

*μ*_T = (*σ*_M^−2 *μ*_M + *σ*_D^−2 *μ*_D) / (*σ*_M^−2 + *σ*_D^−2)

*σ*_T^−2 = *σ*_M^−2 + *σ*_D^−2

*S* from Equation 6:

*S* (Equations A9 and A10), up to a normalization constant.

*S* is not von Mises. The von Mises approximation to the optimally weighted decision variable (Equations 19 and 20) is the pointwise product of the von Mises densities of the individual cues, as can be seen by comparing the approximation to Equations 12 to 15. As we have just shown, this approximation is exact in the linear model, so it should work well in the circular model under conditions where the circular model approximates the linear model: large cue concentrations and small differences between cue means. Figure 3 supports this approximation and shows that even at low concentrations and large differences between the means, the approximation still captures the model's behavior qualitatively, e.g., the width of the combined decision variable grows with the difference between the cue means.

*M* and *D*, the observer optimally combined two random variables *X* and *Y* with means *μ*_X = *μ*_M and *μ*_Y = *μ*_D, and variances

*M* and *D* using weights *w*_M and *w*_D, the decision variable's density is the pointwise product of *g*(*x*; *μ*_X, *σ*_X) and *g*(*x*; *μ*_Y, *σ*_Y), up to a normalization factor.
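The pointwise-product property for normal densities is easy to confirm numerically (our illustration; the grid and parameter values are arbitrary):

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 20001)
mu_x, s_x, mu_y, s_y = 1.0, 1.0, 3.0, 2.0
prod = gauss(x, mu_x, s_x) * gauss(x, mu_y, s_y)

# Predicted parameters of the product: reliabilities add, and the mean
# is the reliability-weighted average of the two means
s_t = (s_x ** -2 + s_y ** -2) ** -0.5
mu_t = (mu_x / s_x ** 2 + mu_y / s_y ** 2) * s_t ** 2

dx = x[1] - x[0]
normalized = prod / (prod.sum() * dx)      # renormalize the product
```

After renormalization, the product matches the predicted normal density to numerical precision.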

*H* and *C* suboptimally as in Equation 16, we approximate the decision variable by taking the pointwise product of von Mises distributions *U* and *V* that have means *μ*_U = *μ*_H and *μ*_V = *μ*_C, and concentrations chosen by analogy with Equations A14 and A15 and using the approximation *κ* = *σ*^−2, which is valid at large concentrations (Fisher, 1993):
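The analogous product property for von Mises densities can be checked the same way: the pointwise product of two von Mises densities, renormalized, is itself von Mises, with parameters given by the polar vector sum of the cue parameters (our numerical illustration; parameter values are arbitrary):

```python
import numpy as np

def vm_pdf(x, mu, kappa):
    # np.i0 is the modified Bessel function of the first kind, order zero
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

x = np.linspace(-np.pi, np.pi, 20001)
mu_h, k_h, mu_c, k_c = 0.3, 4.0, 1.1, 2.0
prod = vm_pdf(x, mu_h, k_h) * vm_pdf(x, mu_c, k_c)

# Predicted parameters of the product: the polar vector sum of
# (k_h, mu_h) and (k_c, mu_c)
vx = k_h * np.cos(mu_h) + k_c * np.cos(mu_c)
vy = k_h * np.sin(mu_h) + k_c * np.sin(mu_c)
mu_l, k_l = np.arctan2(vy, vx), np.hypot(vx, vy)

dx = x[1] - x[0]
normalized = prod / (prod.sum() * dx)      # renormalize the product
```

Because the cue means differ, the combined concentration falls short of the sum of the cue concentrations, as described earlier.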