The local spatiotemporal pattern of light on the retina is often consistent with a single translational velocity, but it may also be interpreted as a superposition of spatial patterns translating with different velocities. Human perception reflects such interpretations, as can be demonstrated using stimuli constructed from a superposition of two drifting gratings. Depending on a variety of parameters, these stimuli may be perceived as a coherently moving plaid pattern or as two transparent gratings moving in different directions. Here, we propose a quantitative model that explains how and why such interpretations are selected. An observer's percept corresponds to the most probable interpretation of noisy measurements of local image motion, based on separate prior beliefs about the speed and singularity of visual motion. This model accounts for human perceptual interpretations across a broad range of angles and speeds. With optimized parameters, its components are consistent with previous results in motion perception.

*transparently* past each other (Adelson & Movshon, 1982; Wallach, 1935). Clearly, both interpretations are physically plausible scenarios. How does the visual system choose? The answer depends on a variety of different attributes of the stimulus (Adelson & Movshon, 1982; Cropper, Mullen, & Badcock, 1996; Hupé & Rubin, 2003; Kim & Wilson, 1993; Kooi, Valois, Switkes, & Grosof, 1992; Krauskopf & Farell, 1990; Krauskopf, Wu, & Farell, 1996; Movshon, Adelson, Gizzi, & Newsome, 1986; Smith, 1992; Stoner & Albright, 1992; Stoner, Albright, & Ramachandran, 1990; Victor & Conte, 1992; Welch & Bowne, 1990). In general, the transparent interpretation becomes more likely with faster component speeds, broader angles, and longer presentation times, as well as with greater differences between the components' attributes, including in speed, spatial frequency, contrast, depth, or hue. The effects of these parameters have generally been examined one at a time, and most proposed models are tied to the specifics of these plaid stimuli and are thus difficult to generalize to natural vision.

$\{\vec{m}_1, \vec{m}_2\}$. Physiologically, we assume that these correspond to the responses of two distinct subsets of noisy orientation- and speed-selective neurons, such as those found in primary visual cortex. Given these noisy measurements, the decoding portion of the model then uses the rules of statistical decision theory to select one of the two hypotheses, $\{H_{\text{coh}}, H_{\text{tran}}\}$, corresponding to coherent and transparent percepts, respectively. Specifically, the model selects the most probable of the two percepts by comparing $p(H_{\text{coh}} \mid \vec{m}_1, \vec{m}_2)$ and $p(H_{\text{tran}} \mid \vec{m}_1, \vec{m}_2)$.

$p(\vec{m}_1, \vec{m}_2)$, when comparing these two percept probabilities, since it is the same for both of them.
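The comparison of the two percept probabilities can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `choose_percept` and its arguments are hypothetical, the two likelihood values are assumed to have been computed elsewhere, and the prior on the transparent hypothesis is assumed to be one minus the prior on the coherent hypothesis.

```python
def choose_percept(lik_coh, lik_tran, prior_coh=0.5):
    """Pick the more probable of the two hypotheses.

    lik_coh and lik_tran stand for the likelihoods of the two measurements
    under the coherent and transparent hypotheses (computed elsewhere; that
    step is outside this sketch). The normalizing term shared by both
    posteriors is omitted because it cannot change which one is larger.
    """
    if not 0.0 <= prior_coh <= 1.0:
        raise ValueError("prior_coh must lie in [0, 1]")
    # Unnormalized posteriors: likelihood times prior.
    score_coh = lik_coh * prior_coh
    # Assumption: the two hypotheses are exhaustive, so the priors sum to 1.
    score_tran = lik_tran * (1.0 - prior_coh)
    # Ties are resolved in favor of "transparent" (an arbitrary choice here).
    return "coherent" if score_coh > score_tran else "transparent"
```

With equal priors the decision reduces to a likelihood comparison; lowering `prior_coh` shifts the same measurements toward a transparent percept.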

*likelihood function*, *p*(

*not* the same function as the measurement noise distribution, which is a function of the normal velocity

$c_4$ determines the speed at which the prior transitions from a constant regime to a power-law regime, and $c_5$ is an exponent that controls the rate of decay. This parametric description is consistent with previous theoretical proposals (Dong & Atick, 1995), with simulations of graphical environments (Roth & Black, 2007), and with perceptual prior models reverse-engineered from human speed discrimination data (Stocker & Simoncelli, 2006).

$s_g$, in directions deviating from vertical by equal (but opposite) amounts. Thus, each stimulus is determined by the grating speed and the angle between the grating normal velocities and the vertical axis, $\theta_g$ (an angle of zero corresponds to an upward-moving horizontal grating). For presentation purposes, we reparameterize the stimuli in terms of the grating speed and the *pattern speed* corresponding to the unique translational velocity that is consistent with the motion of the two gratings: $s_p = s_g / \cos(\theta_g)$. Figure 3a shows the collection of stimuli that we used, plotted in terms of these two speeds.
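The relation between grating speed, grating angle, and pattern speed can be checked numerically. The sketch below is illustrative only; the function names `pattern_speed` and `half_angle_deg` are hypothetical and angles are given in degrees for convenience.

```python
import math


def pattern_speed(grating_speed, theta_g_deg):
    """Pattern speed of a symmetric plaid: s_p = s_g / cos(theta_g)."""
    if not -90.0 < theta_g_deg < 90.0:
        raise ValueError("theta_g must lie strictly between -90 and 90 deg")
    return grating_speed / math.cos(math.radians(theta_g_deg))


def half_angle_deg(grating_speed, pattern_speed_value):
    """Inverse relation: theta_g = arccos(s_g / s_p), returned in degrees."""
    if not 0.0 < grating_speed <= pattern_speed_value:
        raise ValueError("requires 0 < s_g <= s_p")
    return math.degrees(math.acos(grating_speed / pattern_speed_value))
```

For instance, `half_angle_deg(0.5, 1.0)` returns approximately 60 deg and `half_angle_deg(0.5, 5.5)` approximately 84.8 deg.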

$\{s_g, s_p\}$ could be expressed as

$w(\cdot)$ is a Weibull function with four parameters controlling the angle and slope of the transition from coherent to transparent and the saturating (min/max) response levels on either side of the transition. For each subject, we fit these four parameters by maximizing the likelihood (Figure 5, third row). For three of the four subjects (all but s3), this model does not provide a good description of the data. The subjects perceive large-angle plaids with slow components as coherent, and in addition, the boundary between the transparent and coherent regions for larger component speeds does not appear to lie along a straight line emanating from the origin.
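For readers unfamiliar with four-parameter Weibull functions, the sketch below shows one common parameterization. It is an assumption for illustration: the exact form used in the fits is not given in this text, and the name `weibull4` and its argument names are hypothetical.

```python
import math


def weibull4(x, threshold, slope, low, high):
    """One common four-parameter Weibull curve (assumed form).

    threshold: location of the transition (same units as x)
    slope:     steepness of the transition
    low, high: saturating levels for small and large x, respectively
    """
    if threshold <= 0.0 or slope <= 0.0:
        raise ValueError("threshold and slope must be positive")
    if x <= 0.0:
        return low
    return low + (high - low) * (1.0 - math.exp(-((x / threshold) ** slope)))
```

Setting `low` above `high` yields a decreasing curve, as would be needed for a probability of "coherent" responses that falls as the angle grows.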

$w_g$ and $w_p$ are both four-parameter Weibull functions, constrained to have the same transition speed and the same saturating level for large values (i.e., in the transparent region). We fit the six parameters of this model to each subject by maximizing the likelihood of their data (Figure 5, fourth row). For all four subjects, this model does not provide as good a description as the Bayesian model.

$n \ln(\sigma_e^2) + k \ln(n)$, where $n$ is the number of stimulus conditions, $\sigma_e^2$ is the variance of the model error (across stimulus conditions), and $k$ is the number of free parameters in the model. The (negative) BIC values for the models are plotted in Figure 6b. For all subjects, the Bayesian model is seen to do a better job of fitting the data, despite the penalty for having as many, or more, free parameters.
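The criterion above is straightforward to compute from a list of per-condition model errors. The following sketch is illustrative; the function name `bic` is hypothetical, and the error variance is computed here as the population variance about the mean error, which is an assumption about how the variance was defined.

```python
import math


def bic(errors, num_params):
    """Return n*ln(sigma_e^2) + k*ln(n) for per-condition model errors.

    errors:     model error for each stimulus condition (n values)
    num_params: number of free parameters k
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two stimulus conditions")
    mean_error = sum(errors) / n
    # Assumption: population variance of the errors across conditions.
    variance = sum((e - mean_error) ** 2 for e in errors) / n
    if variance <= 0.0:
        raise ValueError("error variance must be positive")
    return n * math.log(variance) + num_params * math.log(n)
```

For the same errors, each additional free parameter raises the value by $\ln(n)$, which is the penalty term referred to in the text.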

*asymmetric* plaids, in which the two gratings differ in speed by a fixed scale factor. Figure 8b shows that the percepts for plaids with grating speeds in a ratio of 1:3 are nearly the same as those for symmetric plaids. Finally, we computed predictions for asymmetric plaids in which the normal velocities of the gratings lie on the same side of the pattern velocity (these are known as “type II” plaids in the literature (Kim & Wilson, 1993)). Figures 8c–8e show predictions for three different grating speed ratios and show a range of behaviors.

$p(H_{\text{coh}} \mid \vec{m}_1, \vec{m}_2)$) would increase variability in coherency/transparency judgments across all stimuli, which seems inconsistent with regions of the stimulus space in which subjects show no variation in interpretation (e.g., where all trials are judged coherent). In conclusion, the means by which probabilistic computations are accomplished with neurons is a topic of many recent theoretical studies. However, the full sequence of computations that underlies perceptual inference, including the representation and learning of prior information, remains a fundamental and unresolved topic for future investigation.

*faster* than the corresponding component speeds, again in 0.5 deg/s increments. For example, plaids with the slowest component speed of 0.5 deg/s were presented with pattern speeds ranging from 1 to 5.5 deg/s. The angle between the normal directions of the components was twice the arccosine of the ratio of component to pattern speed. For example, for a component speed of 0.5 deg/s, the half-angles between the directions of the component motions ranged from 60 to 84.8 deg. The sequence of stimuli presented during the experiment was randomized, with each stimulus presented at least six times.

$\theta$, moving rigidly with 2D velocity $\vec{v}_\theta$, where $\hat{u}_\theta = [\cos(\theta), \sin(\theta)]$, a unit vector in the $\theta$ direction. An observer makes a measurement, $\vec{m}$, with standard deviation $\sigma_s(\vec{v}_\theta)$ in speed and $\sigma_d(\vec{v}_\theta)$ in direction:

$R$ is a $2 \times 2$ matrix that performs a rotation by $\pi/2$. This measurement distribution is illustrated in Figure B1a. We assume that the standard deviations for speed and direction, $\sigma_s(\vec{v}_\theta)$ and $\sigma_d(\vec{v}_\theta)$, are functions of speed (Stocker & Simoncelli, 2006) and parameterize them as:

$c_1$ determines the speed at which this transition occurs, and parameter $c_2$ determines the proportionality factor at high speeds. The standard deviation of the distribution in terms of direction is proportional to that in terms of speed, with parameter $c_3$ controlling this proportionality. A previous review has suggested that $c_3$ is approximately 0.33 (Nakayama, 1985), and we have used 0.35 as an initial value from which to start the optimization.

$p(\theta)$, is uniform. The resulting measurement distribution is illustrated in Figure B1b, corresponding to a probabilistic version of the circle defining the set of normal velocities consistent with a given pattern velocity (see Adelson and Movshon, 1982, Figure 3). Finally, the likelihood function is obtained by evaluating this measurement distribution as a function of

$p(H_{\text{coh}})$. Let

$s$, the model operates by generating two random measurements $\{\vec{m}_1, \vec{m}_2\}$ and then deciding on a percept by comparing the two posterior probabilities $p(H_{\text{coh}} \mid \vec{m}_1, \vec{m}_2)$ and $p(H_{\text{tran}} \mid \vec{m}_1, \vec{m}_2)$. Although this decision process is deterministic, the measurements are stochastic (drawn from the measurement density of Equation B1), and thus repeated presentations of the same stimulus produce “coherent” responses with a probability denoted as $p_s(\cdot)$

$n_s$ denotes the number of “coherent” responses given by the subject over a total of $N_s$ trials in which stimulus $s$ was presented.
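Given these counts and a model probability of a “coherent” response for each stimulus, a log likelihood of the data can be computed as sketched below. This assumes the standard binomial form with the constant binomial coefficient dropped; Equation C1 itself is not reproduced in this text, so the exact expression is an assumption, and the name `log_likelihood` is hypothetical.

```python
import math


def log_likelihood(n_coh, n_trials, p_model):
    """Binomial log likelihood summed over stimuli (constant terms dropped).

    n_coh:    number of "coherent" responses per stimulus
    n_trials: total number of trials per stimulus
    p_model:  model probability of a "coherent" response per stimulus
    """
    total = 0.0
    for n_s, big_n_s, p_s in zip(n_coh, n_trials, p_model):
        if not 0.0 < p_s < 1.0:
            # A probability of exactly 0 or 1 would make the log diverge.
            raise ValueError("model probabilities must lie strictly in (0, 1)")
        total += n_s * math.log(p_s) + (big_n_s - n_s) * math.log(1.0 - p_s)
    return total
```

The guard against probabilities of exactly 0 or 1 reflects the divergence of the logarithm at those values.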

$p_s(\cdot)$

$k$ of these simulated trials produced a “coherent” response. A maximum likelihood estimate of the model probability would be $\hat{p}_s(\cdot) = \arg\max_p \left[ p^k (1 - p)^{50-k} \right] = k/50$. However, this estimate is problematic when optimizing the log likelihood of Equation C1: if the simulated trials produce a value of $k = 0$ (or $k = 50$), the estimated model probability will be 0 (or 1), which can lead to an infinite log likelihood. To avoid this, we used the mean estimate, $\hat{p}_s(\cdot) = \frac{\int_0^1 p \, p^k (1 - p)^{50-k} \, dp}{\int_0^1 p^k (1 - p)^{50-k} \, dp} = (k + 1)/52$, whenever the simulated trials produced $k = 0$ or $k = 50$.
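The estimation rule just described can be written compactly. The sketch below is illustrative; the function name `estimate_prob` is hypothetical, and the number of simulated trials is exposed as a parameter defaulting to the 50 used in the text.

```python
def estimate_prob(k, n_sim=50):
    """Estimate the model's probability of a "coherent" response.

    k:     number of simulated trials that produced "coherent"
    n_sim: total number of simulated trials
    Uses k / n_sim, except at the extremes (k == 0 or k == n_sim), where the
    mean estimate (k + 1) / (n_sim + 2) keeps the value away from 0 and 1.
    """
    if not 0 <= k <= n_sim:
        raise ValueError("k must lie between 0 and n_sim")
    if k == 0 or k == n_sim:
        return (k + 1) / (n_sim + 2)
    return k / n_sim
```

With 50 simulated trials, the extreme cases map to 1/52 and 51/52, so the logarithm of the estimate and of its complement stays finite.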

$p(H_{\text{coh}})$, was easily optimized because it appears only as a multiplicative scale factor in the last step of computation of the probabilities in Equations 1 and 2.

$c_1 = 1$, $c_2 = 0.2$, $c_3 = 0.35$, $c_4 = 1$, $c_5 = 2.4$, $p(H_{\text{coh}}) = 0.5$. Integrals were computed numerically, over a rectangular region of the velocity plane covering $v_x \in [-16, 16]$ deg/s and $v_y \in [-12, 24]$ deg/s, sampled at increments of 0.5 deg/s. We verified that this plane was large enough, and of sufficiently fine spacing, to accurately fit the parameters. For the integration required to compute the likelihoods (Equation B3), we used 120 directional samples over an angle of $\pi$ radians. We also confirmed that this sampling was sufficiently dense, so as not to significantly impact the behavior of the model.
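The sampling described above can be set up as follows. This is a sketch under stated assumptions rather than the authors' code: the names `axis`, `integrate_2d`, `vx`, `vy`, and `directions` are hypothetical, grid endpoints are assumed to be included, the directional samples are assumed to start at 0 and exclude $\pi$, and a plain Riemann sum stands in for whatever quadrature rule was actually used.

```python
import math

STEP = 0.5            # grid spacing in deg/s
NUM_DIRECTIONS = 120  # directional samples spanning pi radians


def axis(lo, hi, step=STEP):
    """Evenly spaced samples from lo to hi inclusive (assumed endpoints)."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]


vx = axis(-16.0, 16.0)  # horizontal velocity samples
vy = axis(-12.0, 24.0)  # vertical velocity samples

# Assumption: directions start at 0 and stop one step short of pi.
directions = [i * math.pi / NUM_DIRECTIONS for i in range(NUM_DIRECTIONS)]


def integrate_2d(f):
    """Riemann-sum approximation of the integral of f(vx, vy) over the grid."""
    return sum(f(x, y) for x in vx for y in vy) * STEP * STEP
```

Halving `STEP` or enlarging the ranges and checking that fitted values do not change is one way to repeat the adequacy checks mentioned in the text.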