**People make surprising but reliable perceptual errors. Here, we provide a unified explanation for systematic errors in the perception of three-dimensional (3-D) motion. To do so, we characterized the binocular retinal motion signals produced by objects moving through arbitrary locations in 3-D. Next, we developed a Bayesian model, treating 3-D motion perception as optimal inference given sensory noise in the measurement of retinal motion. The model predicts a set of systematic perceptual errors, which depend on stimulus distance, contrast, and eccentricity. We then used a virtual-reality headset as well as a standard 3-D desktop stereoscopic display to test these predictions in a series of perceptual experiments. As predicted, we found evidence that errors in 3-D motion perception depend on the contrast, viewing distance, and eccentricity of a stimulus. These errors include a lateral bias in perceived motion direction and a surprising tendency to misreport approaching motion as receding and vice versa. In sum, we present a Bayesian model that provides a parsimonious account for a range of systematic misperceptions of motion in naturalistic environments.**

*lateral bias*: They systematically overestimate angle of approach in 3-D, such that objects moving toward the head are perceived as moving along a path that is more lateral than the true trajectory (Harris & Dean, 2003; Harris & Drga, 2005; Lages, 2006; Rushton & Duke, 2007; Welchman et al., 2004; Welchman et al., 2008). Bayesian models of 3-D motion perception, assuming a slow motion prior, can account for this bias (Lages, 2006; Lages, Heron, & Wang, 2013; Wang, Heron, Moreland, & Lages, 2012; Welchman et al., 2008). However, existing models are restricted to specific viewing situations (stimuli in the midsagittal plane) and have been tested using tasks and stimuli that limit the kind of perceptual errors that can be observed. In addition, these models have not addressed a recently identified perceptual phenomenon in which the direction of motion in depth (but not lateral motion) is frequently misreported: Approaching motion is reported to be receding and vice versa (Fulvio et al., 2015).

*P*(*s*|*r*) specifies the conditional probability of the physical stimulus *s* given a sensory measurement or response *r*. The posterior is determined, according to Bayes's rule, by the product of two probabilistic quantities known as the likelihood and the prior. The likelihood *P*(*r*|*s*) is the conditional probability of the observed sensory response *r* given a physical stimulus *s*. It characterizes the information that neural responses carry about the sensory stimulus. Increased sensory uncertainty, due to ambiguity or noise in the external world or internal noise in the sensory system, manifests as an increase in the width of the likelihood. The prior *P*(*s*) represents the observer's assumed probability distribution of the stimulus in the world. The prior may be based on evolutionary or experience-based learning mechanisms. The relationship between posterior, likelihood, and prior is given by Bayes's rule, which states:
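Written out with the symbols defined above, a standard statement of the rule is the following. The normalizing term *P*(*r*) does not depend on *s*, so it is often dropped, which leaves the proportional form; whether the original equation used the normalized or the proportional form is an assumption here.

$$P(s \mid r) = \frac{P(r \mid s)\,P(s)}{P(r)} \;\propto\; P(r \mid s)\,P(s)$$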

*relationship* of the retinal-velocity signals between the two eyes.

*p* is the position of the object as a function of time *t* in a coordinate system defined over *x*-, *y*-, and *z*-axes. Here, we use a head-centered coordinate system and place the origin at the midpoint between the two eyes of the observer (see icon in upper left corner of Figure 2). In this left-handed coordinate system, the *x*-axis is parallel to the interocular axis (positive rightward), the *y*-axis is orthogonal to the *x*-axis in the plane of the forehead (positive upward), and the *z*-axis extends in front of and behind the observer (positive in front).

*xz*-plane (*y* = 0 for all points; Figure 2). Note, however, that this does not mean that this model is valid only for stimuli in the plane of the interocular axis. As long as retinal angles are represented in an azimuth-longitude coordinate system, the horizontal retinal velocities can be computed from the *x* and *z* components of 3-D motion vectors alone. This geometry is independent of the observer's point of fixation but assumes that fixation does not change over the course of stimulus presentation. In this coordinate system, the (*x*, *z*) coordinates of the left and right eye are defined as (*x*_{L}, 0) and (*x*_{R}, 0), respectively. The distance between the eyes along the interocular axis, denoted by *a*, is *x*_{R} − *x*_{L}.

*x*(*t*), *z*(*t*)) will project to a different horizontal angle in each eye. If we define these angles relative to the *x*-axis in the *xz*-plane, they are given by

*β*_{L}(*t*) and *β*_{R}(*t*) indicate the angle in the left and the right eye, respectively. The object will generally have a different distance from each eye. These distances are given by

*h*_{L}(*t*) and *h*_{R}(*t*) indicate the distance from the left and the right eye, respectively.
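Under the convention stated above (angles measured from the *x*-axis in the *xz*-plane, eyes at (*x*_{L}, 0) and (*x*_{R}, 0)), one consistent way to write the two quantities is the following. This is a reconstruction from the verbal definitions, so the exact form of the original equations is an assumption.

$$\beta_{L,R}(t) = \arctan\frac{z(t)}{x(t) - x_{L,R}}, \qquad h_{L,R}(t) = \sqrt{\bigl(x(t) - x_{L,R}\bigr)^{2} + z(t)^{2}}$$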

*df*(*x*)/*dt* = *f*′(*x*). This yields

*x*′(*t*) and *z*′(*t*) that generated them are unknown. We therefore solve for *x*′(*t*) and *z*′(*t*) as a function of

*h*_{L,R}, *z*_{0}, *z*′, *x*_{0}, and *x*′ refer to *h*_{L,R}(*t*), *z*(*t*), *z*′(*t*), *x*(*t*), and *x*′(*t*), each evaluated at time *t* = *t*_{0}. To determine the velocity *x*′ in terms of retinal velocities, we rearrange Equation 6 for the left eye to solve for *z*′, substitute the result back into Equation 6 for the right eye, and solve for *x*′, yielding

*a* refers to the interocular separation. To determine the equation for *z*′ in terms of retinal velocities, we rearrange Equation 6 for the left eye to solve for *x*′ and substitute this back into Equation 6 for the right eye, yielding the following equation for *z*′, also in terms of retinal velocities:

*where* the observer fixates. This independence occurs because the position of fixation does not affect the angular velocity cast by a moving object at a given head-centric location, assuming (as in previous models) that fixation remains stable during stimulus presentation (Lages, 2006; Wang et al., 2012; Welchman et al., 2008). However, this does not explicitly account for any differences in retinal-velocity estimation across the visual field. For example, while retinal-motion signals may be less reliable at eccentric locations than at fixation (Johnston & Wright, 1986; Levi, Klein, & Aitsebaomo, 1984), we do not explicitly incorporate such differences here.
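The forward and inverse relationships described above can be sketched in a few lines. This is a minimal illustration, not the original implementation: it assumes each eye's angle is `atan2(z, x - x_eye)`, so the sign conventions may differ from those of Equations 6 to 8, and the function and parameter names are hypothetical.

```python
def retinal_velocities(x0, z0, vx, vz, x_left, x_right):
    """Angular velocity (rad/s) in each eye for a point at (x0, z0)
    moving with velocity (vx, vz). Assumed angle: atan2(z, x - x_eye)."""
    rates = []
    for x_eye in (x_left, x_right):
        dx = x0 - x_eye
        h_sq = dx * dx + z0 * z0  # squared distance from this eye
        rates.append((dx * vz - z0 * vx) / h_sq)  # time derivative of the angle
    return rates[0], rates[1]


def world_velocity(x0, z0, beta_dot_l, beta_dot_r, x_left, x_right):
    """Solve the two retinal-velocity equations for (vx, vz)."""
    a = x_right - x_left  # interocular separation
    h_l_sq = (x0 - x_left) ** 2 + z0 ** 2
    h_r_sq = (x0 - x_right) ** 2 + z0 ** 2
    vz = (h_l_sq * beta_dot_l - h_r_sq * beta_dot_r) / a
    vx = ((x0 - x_right) * h_l_sq * beta_dot_l
          - (x0 - x_left) * h_r_sq * beta_dot_r) / (a * z0)
    return vx, vz
```

Passing the output of `retinal_velocities` into `world_velocity` for the same location recovers the original velocity, and neither function takes a fixation point as input, in line with the independence from fixation noted above.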

*z*_{0} and its location relative to each eye, (*x*_{0} − *x*_{L}) and (*x*_{0} − *x*_{R}), is known, we can use Equations 7 and 8 (which specify *x*′ and *z*′ as linear combinations of

*x* and *z* (

*A* is given by

*M* =

*x* and *z* velocity components of a motion trajectory, we plot the sensory uncertainty for each velocity component (the square root of the diagonal elements of Equation 11, denoted

*x*_{0} and distance in depth *z*_{0} in Figure 3A and 3B. Each panel contains an isocontour plot showing the log of the sensory uncertainty at each true spatial location. Several features are notable. The uncertainty in *x*′ is at its minimum for points that fall on or near the midsagittal plane and increases for points to the left and right. The uncertainty in *z*′ is at its minimum for points closest to the eyes and increases radially away from the midpoint between the eyes. Note that uncertainty in *x*′ also increases with distance, but not as steeply as in *z*′. In the central visual field, the uncertainty in *z*′ is generally much greater than the uncertainty in *x*′.

*x*′ and *z*′, we plot the log of the ratio of the two values for a subset of points close to the observer in Figure 3C (within 25 cm left/right and 100 cm in depth). Log ratios greater than 0 (red) indicate that uncertainty in *z*′ is greater than in *x*′, and log ratios less than 0 (blue) indicate the reverse. In the central visual field, the ratio itself is greater than 1. This is consistent with previous work (Welchman et al., 2008). However, the ratio varies considerably as a function of both viewing distance and viewing angle. At steep viewing angles (>45°), the relationship reverses and *x*′ uncertainty is actually greater than *z*′ uncertainty. We should note that our model includes uncertainty only in object speed, not in object location. Uncertainty in object location would likely increase for objects projecting to larger retinal eccentricities.

*x*′ and *z*′ are not independent. To visualize this relationship, in Figure 3D we show the covariance ellipses for a set of locations within 100 cm in depth (the inset shows a zoomed view of nearby points). For most locations, the ellipses are highly elongated, indicating that for each location, uncertainty is anisotropic across directions. As expected from the geometric analysis (Figure 1), the axis of minimal uncertainty is orthogonal to a line connecting each location back to the interocular axis, independent of the direction of gaze. This creates a radial pattern, in which uncertainty is highest for motion extending radially from the observer's location. Along the midsagittal plane (*x*_{0} = 0), the covariance is zero and the axes of minimal and maximal uncertainty align with the *x*- and *z*-axes, respectively.

*z* component of velocity than for the *x* component. However, if *z*_{0} is equal to *a*/2, half the interocular distance, the variances in *x*′ and *z*′ will be equal. Thus, while uncertainty for motion in depth in the midsagittal plane clearly tends to be substantially higher than for lateral motion, the relative uncertainty is reduced for near viewing distances.
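The midsagittal-plane claims above can be checked numerically with a short sketch. It assumes that the noise on the two retinal-velocity measurements is independent with equal standard deviation `sigma`, and that the angle convention is `atan2(z, x - x_eye)`; the function name and the 6.4 cm interocular separation in the usage note are illustrative choices, not values taken from the text.

```python
def velocity_covariance(x0, z0, sigma, x_left, x_right):
    """Return (var_x, var_z, cov_xz) for the estimated world velocity,
    assuming independent retinal-velocity noise of SD sigma in each eye."""
    a = x_right - x_left
    h_l_sq = (x0 - x_left) ** 2 + z0 ** 2
    h_r_sq = (x0 - x_right) ** 2 + z0 ** 2
    # Rows of the linear map from (left, right) retinal velocity to (x', z')
    row_x = ((x0 - x_right) * h_l_sq / (a * z0),
             -(x0 - x_left) * h_r_sq / (a * z0))
    row_z = (h_l_sq / a, -h_r_sq / a)
    s2 = sigma ** 2
    var_x = s2 * (row_x[0] ** 2 + row_x[1] ** 2)
    var_z = s2 * (row_z[0] ** 2 + row_z[1] ** 2)
    cov_xz = s2 * (row_x[0] * row_z[0] + row_x[1] * row_z[1])
    return var_x, var_z, cov_xz
```

With eyes at ±3.2 cm and `x0 = 0`, the covariance term is zero; at `z0 = 3.2` (that is, *a*/2) the two variances coincide, and at `z0 = 90` the variance in *z*′ exceeds the variance in *x*′ by a factor of 4*z*_{0}²/*a*².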

*x*_{0} = 0), such that the uncertainties in *x*′ and *z*′ from the likelihood are independent (Equations 12 and 13).

*x*′, *z*′), is given by a 2-D Gaussian probability density function. For motion originating along the midsagittal plane, this is given by the product of two 1-D Gaussians,

*μ* and *σ* denote the mean and standard deviation of a 1-D Gaussian, respectively. These likelihoods are given by

*h*_{L} and *h*_{R} in these equations can be determined if one knows the interocular separation and has a reliable estimate of the object location from any combination of monocular and binocular cues (that is, the distance from each eye need not be derived from monocular distance information only).

*x*′ and *z*′, as has been done previously (Lages, 2006, 2013; Wang et al., 2012; Welchman et al., 2008). We can then express the prior as

*x*′ and *z*′ are given by

*α* will have its variance scaled by *α*^{2}). This shows that the ideal observer exhibits a reduction in variance even as it exhibits an increase in bias (in this case, bias toward slower speeds).
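The trade-off between bias and variance can be illustrated with a one-dimensional sketch. It assumes a Gaussian likelihood with standard deviation `sigma_likelihood` and a zero-mean Gaussian prior with standard deviation `sigma_prior`, for which the shrinkage factor takes the standard conjugate form; the function names are hypothetical.

```python
def shrinkage_factor(sigma_likelihood, sigma_prior):
    """Factor alpha by which a zero-mean Gaussian prior scales the measurement."""
    return sigma_prior ** 2 / (sigma_prior ** 2 + sigma_likelihood ** 2)


def estimate_mean_and_sd(true_speed, sigma_likelihood, sigma_prior):
    """Mean and SD of the estimate across repeated noisy measurements."""
    alpha = shrinkage_factor(sigma_likelihood, sigma_prior)
    mean_estimate = alpha * true_speed     # biased toward zero (slower speeds)
    sd_estimate = alpha * sigma_likelihood  # variance is scaled by alpha ** 2
    return mean_estimate, sd_estimate
```

For example, with `sigma_likelihood = sigma_prior = 2` the factor is 0.5, so a true speed of 4 cm/s yields estimates centered on 2 cm/s with a standard deviation of 1 cm/s rather than 2 cm/s.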

*x*- and *z*-axes). We already have the noise covariance *M* of the sensory measurements from Equation 11. The covariance of the posterior of the Bayesian ideal observer (denoted by Λ, the covariance of

*C*, a diagonal matrix with variance in *x*′ and *z*′ of

*x*′ and *z*′, that is,

*x*′ and *z*′, is then

*S* = Λ*M*^{−1} provides a joint shrinkage factor on the maximum-likelihood estimate analogous to the role played by *α* in the previous section.
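A small sketch of the 2-D case follows. It assumes the standard Gaussian result Λ = (*M*^{−1} + *C*^{−1})^{−1} and an isotropic zero-mean prior with standard deviation `sigma_prior`; the helper names are hypothetical and the 2×2 algebra is written out by hand to keep the example dependency-free.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mul2(p, q):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def posterior_and_shrinkage(M, sigma_prior):
    """Return (Lambda, S) with Lambda = (M^-1 + C^-1)^-1 and S = Lambda M^-1."""
    m_inv = inv2(M)
    c_inv = 1.0 / sigma_prior ** 2  # C is assumed to be sigma_prior^2 times identity
    precision = ((m_inv[0][0] + c_inv, m_inv[0][1]),
                 (m_inv[1][0], m_inv[1][1] + c_inv))
    lam = inv2(precision)
    return lam, mul2(lam, m_inv)
```

With `M = ((1, 0), (0, 9))` and `sigma_prior = 3`, the shrinkage matrix is diagonal with entries 0.9 and 0.5: the noisier *z*′ component is pulled toward zero more strongly than the *x*′ component, which is the mechanism behind a lateral bias.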

*xz*-plane, and indicated the perceived direction of motion. Raw data from all three experiments are provided in Supplementary Figures S1–S3.

*n* = 19) or wearing glasses inside the VR head-mounted display system (*n* = 4). The experiment was carried out in accordance with the guidelines of the University of Wisconsin–Madison Institutional Review Board. Course credits were given in exchange for participation.

*f* noise pattern that was identical in both eyes to aid vergence. In addition, nonius lines were embedded within a small 1/*f* noise patch near the center of the aperture. All stimulus elements were antialiased to achieve subpixel resolution. The background seen through the aperture was midgray (Figure 4B).

*n* = 15 participants) or 45 cm (*n* = 32 participants). Participants were instructed to fixate the center of the aperture. However, they were free to make head movements, and when they did so, the display updated according to the viewpoint specified by the yaw, pitch, and roll of their head. Translations of the head did not affect the display, such that stimulus viewing distance remained constant.

*x* (lateral) and *z* (motion-in-depth) directions, with no change in *y* (vertical direction), before disappearing. The motion trajectory always lasted for 1 s. Velocities in *x* and *z* were independently chosen from a 2-D Gaussian distribution (*M* = 0 cm/s, *SD* = 2 cm/s) with imposed cutoffs at 6.1 and −6.1 cm/s. This method resulted in motion trajectories whose directions spanned the full 360° space (Figure 4C, left side). Thus, the target came toward the participant (approaching) and moved back behind fixation away from the participant (receding) on approximately 50% of trials each. It is important to note that since *x* and *z* motion were chosen randomly and independently, the amount of perceived lateral movement on each trial did not carry information about the amount of motion in depth, and vice versa. The target was rendered under perspective projection, so that both monocular (looming) and binocular cues to motion in depth were present.
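The velocity sampling described above might look like the following sketch. How the cutoffs were imposed is not stated, so redrawing out-of-range values (rather than clipping them) is an assumption, as are the function and parameter names.

```python
import random


def sample_velocity(rng, sd=2.0, cutoff=6.1):
    """Draw independent x and z velocities (cm/s) from a zero-mean Gaussian,
    redrawing any value whose magnitude exceeds the cutoff (assumed rule)."""
    def draw_component():
        while True:
            v = rng.gauss(0.0, sd)
            if abs(v) <= cutoff:
                return v
    return draw_component(), draw_component()


rng = random.Random(0)  # fixed seed so the trial list is reproducible
trial_velocities = [sample_velocity(rng) for _ in range(2000)]
```

Because the distribution is symmetric about zero, roughly half of the sampled trials have negative *z* velocity (approaching, given that *z* is positive in front of the observer) and half have positive *z* velocity (receding).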

*f* noise pattern, appeared at the edge of the aperture. The paddle dimensions were 0.25 cm × 0.5 cm × 0.25 cm. Participants were asked to extrapolate the target's trajectory and adjust the paddle's position such that the paddle would have intercepted the target if the target had continued along its trajectory. The paddle's position could be adjusted along a circular path that orbited the fixation point in the *xz*-plane using the left and right arrow keys of the keyboard (Figure 4C, right side). As the participant moved the paddle through the visual scene, the paddle was rendered according to the rules of perspective projection. Thus, the stimuli were presented and the responses were made in the same 3-D space. By asking participants to extrapolate the trajectory, we prevented them from setting the paddle to a screen location that simply covered the last seen target location. We did not ask participants to retain fixation during the paddle-adjustment phase of the trial. When participants were satisfied with the paddle setting, they resumed fixation and pressed the space bar to initiate a new trial. Supplementary Movie S1 demonstrates the general procedure.

*xz*-plane. We analyzed the data to determine whether this angular error tended to be toward the fronto-parallel plane (lateral bias) or the midsagittal plane (medial bias; see Figure 4C). We assigned positive values to medial errors and negative values to lateral errors such that the average would indicate the overall directional bias.
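One way to express this sign convention in code is sketched below. It is a simplified illustration rather than the analysis actually used: directions are folded into a single quadrant, so direction confusions are ignored, and the response is assumed to have been converted from a paddle position into a direction vector. The function name is hypothetical.

```python
import math


def signed_direction_bias(true_vx, true_vz, resp_vx, resp_vz):
    """Angular error in degrees: positive means the response is more medial
    (closer to the z-axis) than the stimulus, negative means more lateral."""
    def angle_from_lateral(vx, vz):
        # 0 deg = purely lateral motion, 90 deg = purely motion in depth
        return math.degrees(math.atan2(abs(vz), abs(vx)))
    return angle_from_lateral(resp_vx, resp_vz) - angle_from_lateral(true_vx, true_vz)
```

For a stimulus moving at 45° to the *x*-axis, a response that doubles the lateral component gives an error of about −18.4° (lateral), whereas a response that triples the depth component gives a positive (medial) error.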

*t* tests for multiple comparisons.

^{2}.

*f* noise texture at the depth of the fixation plane (90 cm) to help maintain vergence. No virtual room was present. Additionally, a small square fixation point was placed at the center of the display. The fixation point was surrounded by horizontal and vertical nonius lines, and was placed on a circular 0.1° radius 1/*f* noise pattern.

*xz*-plane was presented on each trial. The positions of the dots were constrained to a single plane fronto-parallel to the display (i.e., perpendicular to the observer's viewing direction). The initial disparity of the plane was consistent with a distance of 93 cm (3 cm behind the fixation plane). The plane then moved for 500 ms with *x* and *z* velocities independently and uniformly sampled from an interval of −4 to 4 cm/s, corresponding to a maximum possible binocular disparity of 0.21° (uncrossed) relative to the fixation plane.

^{2}. Both dot size and dot density changed with distance to the observer according to the laws of perspective projection. Dots had variable contrast, presented at one of three Weber values (7.5%, 15%, or 60%). Weber contrast was computed as the luminance difference between the dots and the background expressed as a percentage of the background luminance level. Half of the dots were darker, and the other half brighter, than the midgray background.
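The contrast definition above translates directly into code. The sketch below follows the stated definition; the helper that derives the darker and brighter dot luminances for a given contrast is an illustrative addition, and the names are hypothetical.

```python
def weber_contrast_percent(dot_luminance, background_luminance):
    """Luminance difference from the background, as a percentage of the background."""
    return 100.0 * (dot_luminance - background_luminance) / background_luminance


def dot_luminances(background_luminance, contrast_percent):
    """Luminances of the darker and brighter dots for a given contrast magnitude."""
    delta = background_luminance * contrast_percent / 100.0
    return background_luminance - delta, background_luminance + delta
```

For a background of 100 (arbitrary luminance units), a 15% contrast corresponds to dot luminances of 85 and 115.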

^{2}) on a black (0.14 cd/m^{2}) background. All other aspects of the target's motion were identical to those in Experiment 1.

*i*:

*σ*_{p}, in cm/s and assumed to be isotropic). Using this method, we fit the model to the motion-direction responses of each individual participant in each experiment. Note that the parameter for the prior was fitted for each participant based on the assumption that individuals do not have identical priors. For Experiments 1 and 2, the measurement noise for each stimulus contrast was fit independently.

*g* of this model was assessed by comparing the log likelihood of the best-fit parameters to the log likelihood achieved if the prior was assumed to be uniform (not Gaussian). This comparison was calculated in terms of the bits per trial gained by fitting with a zero-mean Gaussian prior:

*I* denotes the total number of trials for a given participant.
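A plausible form of this comparison is sketched below. It assumes that both log likelihoods are natural logarithms summed over all trials, so the difference is converted to bits by dividing by ln 2; the exact expression used for *g* is not recoverable here, and the function name is hypothetical.

```python
import math


def bits_per_trial_gained(loglik_gaussian_prior, loglik_uniform_prior, n_trials):
    """Gain in bits per trial from the Gaussian prior over the uniform prior,
    assuming natural-log likelihoods summed across n_trials trials."""
    return (loglik_gaussian_prior - loglik_uniform_prior) / (n_trials * math.log(2))
```

A value of 0 means that the Gaussian prior explains the responses no better than a flat prior; larger values indicate a greater benefit from including the prior.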

*g* is also provided for each experiment in Table 1. Across all three experiments, between 0.29 and 0.87 bits/trial was gained on average through the inclusion of a Gaussian slow-motion prior. For Experiment 3, six (out of 21) participants were best described as having a flat prior (that is, essentially 0 bits/trial were gained with the Gaussian prior, and the best-fit *σ*_{p} > 1,000 cm/s). The noise-parameter mean and standard deviations in the table exclude the fits to these participants (but all participants are included in the goodness of fit and all subsequent analyses). Even excluding these six participants, the estimated sensory noise for the participants in Experiment 3 was greater than for the high-contrast condition in the other experiments. Recall that Experiment 3 had three potential stimulus eccentricities: Although only the central position was used to fit the model, it seems reasonable that the demand to attend to all three locations may have increased the sensory uncertainty in this experiment.

*F*(2, 10494) = 5.8, *p* < 0.01. Multiple comparisons revealed a significant increase in perceptual bias at the greater viewing distance compared to the smaller viewing distance for the mid and high target contrast levels (*p* < 0.01). The difference in perceptual bias at the two viewing distances for the low target contrast was not significant (*p* > 0.05).

*x*′ plotted on the horizontal axis and *z*′ plotted on the vertical axis. These plots demonstrate that a large percentage of the sampling distribution for a stimulus moving toward an observer can occur for trajectories that recede in depth. In other words, the variance of the MAP sampling distribution in the *z* direction can be large enough that it extends into the opposite direction of motion. For rightward motion, however, very little of the distribution occurs for leftward trajectories. To further examine the percentage of trials in which observers are predicted to misreport motion direction, we converted the trajectories in the sampling distribution of the MAP to direction angles and replotted the normalized frequency in polar coordinates as a function of motion direction (Figure 7A and 7B, right panels). Nonzero values in the opposite direction of motion (away or leftward) indicate that the model predicts that a certain percentage of trials will include direction confusions.
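The asymmetry between depth and lateral confusions can be illustrated with a one-dimensional sketch. It treats a single velocity component in the midsagittal plane, with Gaussian measurement noise and a zero-mean Gaussian prior, so that the sampling distribution of the estimate is Gaussian; these simplifications and the function names are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sign_flip_probability(true_velocity, sigma_measurement, sigma_prior):
    """Probability that the estimate has the opposite sign to the stimulus."""
    alpha = sigma_prior ** 2 / (sigma_prior ** 2 + sigma_measurement ** 2)
    mean_estimate = alpha * true_velocity   # shrunk toward zero by the prior
    sd_estimate = alpha * sigma_measurement  # spread of the estimate across trials
    return normal_cdf(-abs(mean_estimate) / sd_estimate)
```

For a 2 cm/s stimulus, measurement noise of 2 cm/s (a depth-like component) gives a confusion rate of about 16%, whereas noise of 0.2 cm/s (a lateral-like component) gives essentially none. In this simplified case the shrinkage factor cancels out of the ratio, so the predicted confusion rate is driven by the measurement noise relative to the stimulus speed.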

*F*(1, 135) = 7.8, *p* < 0.01, as well as a main effect of target contrast, *F*(1, 135) = 26.79, *p* < 0.01, with a reduction in direction confusions for object motion nearer to the head and for higher target contrasts. There was also a significant interaction between viewing distance and target contrast, *F*(2, 135) = 4.4, *p* = 0.014. Multiple comparisons revealed that direction confusions significantly increased for all target contrast levels (*p* < 0.01 low and high; *p* = 0.013 mid) as the viewing distance doubled from 45 cm to 90 cm.

*F*(2, 4) = 160.99, *p* < 0.01.

*F*(1, 135) = 17.35, *p* < 0.01. The interaction between viewing distance and contrast was also statistically significant, *F*(2, 135) = 9.52, *p* < 0.01. Follow-up comparisons revealed that direction confusions significantly increased with viewing distance for the lowest contrast stimulus (*p* < 0.01).

*F*(2, 6) = 3.4, *p* = 0.05.

*z*′ is typically much larger than uncertainty in *x*′ for the same location in the midsagittal plane, the relative uncertainty decreases away from that plane (Figure 3). In fact, at an angle of 45° away from that plane the relative uncertainty becomes the same, predicting unbiased estimates of motion trajectory. Beyond 45° the relationship reverses, such that the model will predict a medial rather than a lateral bias. Another way to think about this is that the axis of maximal uncertainty shifts from being aligned with the *z*-axis in the midsagittal plane to become aligned with the *x*-axis for motion originating directly to the left or right of the observer (see Figure 1). Because of this, estimated motion trajectories predicted by the model will differ between midsagittal and peripheral motion.

*x*) but stay largely the same for motion in depth (*z*; Figure 8C). These model predictions are qualitatively similar to the experimental data (Figure 8B and 8D). Note that for this experiment, the model parameters were fitted to the central data for each participant, and then peripheral predictions were generated based on these parameters.

*t* test on the experimental data revealed significantly less lateral bias in response to peripheral compared to central targets, *t*(20) = −2.5, *p* = 0.02, with a difference of 7.9°. There was a small decrease in motion-in-depth direction confusion at the peripheral locations of ∼1.38% on average, but this difference was not significant, *t*(20) = 0.78, *p* > 0.05. By contrast, there was a substantial and significant increase in lateral motion-direction confusion (20.9% on average) at the peripheral locations, *t*(20) = −10.82, *p* < 0.01, as predicted by the model.

*x*′ and *z*′. That is, although the sampling distribution of the MAP extends into reversed directions (Figure 7A), the average of this distribution is always in the same direction as the stimulus. Prior studies directly comparing a Bayesian ideal observer to 3-D motion perception either did not present both approaching and receding motion or disregarded direction confusions (Lages, 2006; Welchman et al., 2008), and thus this additional bias was not observed. However, there are several ways in which existing Bayesian models, including the one presented here, may be elaborated to account for these effects. For example, extensions to our model might incorporate a prior that is not centered on zero motion for some stimuli, a cost function that reflects the different behavioral consequences of misperceiving approaching and receding motion, or the impact of attentional effects. Of particular interest would be the exploration of a statistical relationship between stimulus contrast and motion direction in natural scenes.

*x*′ and *z*′, where motions toward/away and left/right are continuous with each other (i.e., positive and negative arms of the same axis; see heat maps in Figure 7A and 7B). This type of coordinate system is necessary in order for the model to predict the prevalence of direction confusions in depth, because the resulting posterior distribution often straddles *z*′ = 0 but not *x*′ = 0. From a purely computational perspective, it would be reasonable to consider that the probabilities of motion trajectories might be represented in terms of polar direction and speed. But in such a coordinate system, it is unclear whether the same pattern of direction confusions would result. The clear match between the direction-confusion predictions of our model and the experimental data provides strong support that the current model captures essential features of the inferences that underlie motion perception.

*The Journal of Neuroscience*, 34 (38), 12701–12715.

*The Journal of Neuroscience*, 36 (24), 6563–6582.

*Spatial Vision*, 10 (4), 433–436.

*Journal of Vision*, 16 (10): 5, 1–11, https://doi.org/10.1167/16.10.5.

*The Journal of Neuroscience*, 34 (47), 15522–15533.

*Vision Research*, 26, 973–990.

*Journal of Vision*, 12 (3): 9, 1–13, https://doi.org/10.1167/12.3.9.

*Perception*, 6, 287–293.

*Current Biology*, 20, 757–762.

*Attention, Perception & Psychophysics*, 77 (5), 1685–1696.

*Nature Neuroscience*, 14 (7), 926–932.

*Vision Research*, 46 (10), 1676–1694, https://doi.org/10.1016/j.visres.2005.07.036.

*Journal of Experimental Psychology: Human Perception and Performance*, 29 (5), 869–881.

*Nature Neuroscience*, 8 (2), 229–233.

*Vision Research*, 42 (19), 2253–2257.

*Topics in circular statistics*. Singapore: World Scientific.

*Vision Research*, 26 (7), 1099–1109.

*Current Opinion in Neurobiology*, 13 (2), 150–158.

*Perception*, 36 (14), 1–16.

*Journal of Vision*, 7 (7): 5, 1–24, https://doi.org/10.1167/7.7.5.

*Trends in Neurosciences*, 27 (12), 712–719.

*Perception as Bayesian inference*. Cambridge, UK: Cambridge University Press.

*The Journal of Neuroscience*, 33 (41), 16275–16284.

*Journal of Neurophysiology*, 95 (1), 255–270.

*Journal of Vision*, 6 (4): 14, 508–522, https://doi.org/10.1167/6.4.14.

*Frontiers in Behavioral Neuroscience*, 7 (79), 1–3.

*Proceedings of the National Academy of Sciences, USA*, 105 (51), E117.

*Developing and applying biologically-inspired vision systems: Interdisciplinary concepts* (pp. 90–120). New York: IGI Global.

*Vision Research*, 24 (8), 789–800.

*Psychological Science*, 19 (7), 686–692.

*Current Biology*, 19 (13), 1118–1122.

*Journal of Neurophysiology*, 97 (1), 849–857.

*Journal of Vision*, 7 (13): 11, 1–8, https://doi.org/10.1167/7.13.11.

*PLoS One*, 7 (4), e35705.

*Vision Research*, 26 (4), 609–619.

*Vision Research*, 34, 1595–1604.

*Journal of Neurophysiology*, 93 (3), 1809–1815.

*Spatial Vision*, 10 (4), 437–442.

*Journal of Experimental Psychology: Human Perception and Performance*, 20 (3), 591–612.

*Experimental Brain Research*, 171 (1), 35–46.

*eLife*, 1, e00031.

*Nature Neuroscience*, 12 (8), 1050–1055.

*Vision Research*, 47 (7), 899–912.

*The Journal of Neuroscience*, 34 (47), 15508–15521.

*Science*, 136 (3520), 982–983.

*Bulletin of the Psychonomic Society*, 21, 456–458.

*Journal of Vision*, 5 (8): 139, https://doi.org/10.1167/5.8.139.

*Nature*, 392 (6675), 450.

*Nature Neuroscience*, 9 (4), 578–585.

*Vision Research*, 32 (8), 1535–1549.

*Vision Research*, 22 (3), 377–380.

*The Journal of Neuroscience*, 34 (7), 2592–2604.

*2012 International Conference on 3D Imaging*, 1–8.

*Nature Neuroscience*, 18 (10), 1509–1517.

*Nature Neuroscience*, 5 (6), 598–604.

*Proceedings of the National Academy of Sciences, USA*, 105 (33), 12087–12092.

*Vision Research*, 44 (17), 2027–2042.

*Nature*, 333 (6168), 71–74.