The response of motion-selective neurons in primary visual cortex is ambiguous with respect to the two-dimensional (2D) velocity of spatially extensive objects. To investigate how local neural activity is integrated in the computation of global motion, we asked observers to judge the direction of a rigidly translating natural scene viewed through 16 apertures. We report a novel relative oblique effect: local contour orientations parallel or orthogonal to the direction of motion yield more precise and less biased estimates of direction than other orientations. This effect varies inversely with the local orientation variance of the natural scenes. Analysis of contour orientations across aperture pairings extends previous research on plaids and indicates that observers are biased toward the faster moving contour for Type I pairings. Finally, we show that observers' bias and precision as a function of the orientation statistics of natural scenes can be accounted for by an interaction between naturally arising anisotropies in natural scenes and a template model of MT that is optimally tuned for isotropic stimuli.

*speed and direction* of an object, but also (and less intuitively) with the *orientation distribution of the moving object* (Movshon, Adelson, Gizzi, & Newsome, 1985).

where *ϕ*_{1D} is the speed of 1D motion, and *θ*_{2D} and *ϕ*_{2D} are the direction and speed of 2D motion, respectively. An example is illustrated in Figure 1. The phase and amplitude of the waveform reflect the direction and speed of 2D motion. The 1D velocities stemming from three edge orientations have been highlighted. Each velocity is represented twice on the waveform. Because the two points denote identical 1D velocities (180° apart, with speeds of identical magnitude but opposite sign), it is convenient to ignore the negative side of the waveform. To do so, we calculate the angular separation between each orientation and the 2D direction across the half-circle (Equation 2). Then, by replacing the absolute orientation term in Equation 1 with the relative orientation term (Equation 3), we can constrain our description of 1D velocities to have positive speeds:

cd/m², and the root-mean-square contrast of the image prior to occlusion was fixed at 0.20. The native resolution of the van Hateren images is 1536 × 1024 pixels; images were presented at this resolution. Due to the use of apertures, only a subset of the full image was ever presented—a region contained within a radius of 256 pixels (4°) from the center of the original image.

cd/m², which matched the mean luminance of the stimulus.

*θ*_{2D} and the perceived direction *θ*_{per} was calculated using Equation 4. Negative and positive angular separations denote errors in the perceived direction that are, respectively, clockwise and anticlockwise of the true direction of motion:
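A standard way to express such a signed angular separation (our sketch of the form Equation 4 takes; symbols as defined above) is:

```latex
% Signed direction error, wrapped to (-180 deg, +180 deg]
\theta_{err} = \big((\theta_{per} - \theta_{2D} + 180^{\circ}) \bmod 360^{\circ}\big) - 180^{\circ}
```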

*θ*_{err} was calculated using the four-quadrant arctangent of the sum of the weighted sines and cosines (Equation 5), where *θ* represents the error of each bin and *W*_{θ} represents the weighting given to each error bin:
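The circular averaging step can be sketched as follows; the function name and inputs are ours, not the paper's code. The errors and weights correspond to the histogram bins and their counts:

```python
import math

def circular_mean_error(errors_deg, weights):
    """Weighted circular mean of binned direction errors (degrees), via the
    four-quadrant arctangent (atan2) of summed weighted sines and cosines."""
    s = sum(w * math.sin(math.radians(e)) for e, w in zip(errors_deg, weights))
    c = sum(w * math.cos(math.radians(e)) for e, w in zip(errors_deg, weights))
    return math.degrees(math.atan2(s, c))
```

Unlike a simple weighted mean, this handles bins that straddle the ±180° wrap point: errors of −170° and +170° average to ±180°, not 0°.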

*V*_{err} in each error histogram was then calculated using the following equations:

*V*_{err} (between 0 and 1) was then converted into a more conventional circular standard deviation *σ* term (Mardia & Jupp, 1972):
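With R̄ denoting the weighted resultant length of the error distribution, the standard definitions (a sketch consistent with Mardia & Jupp; not necessarily the paper's exact equations) are:

```latex
% Weighted resultant length of the error histogram
\bar{R} = \frac{\sqrt{\big(\sum_{\theta} W_{\theta}\sin\theta\big)^{2}
                    + \big(\sum_{\theta} W_{\theta}\cos\theta\big)^{2}}}
              {\sum_{\theta} W_{\theta}}

% Circular variance and its conversion to a circular standard deviation
V_{err} = 1 - \bar{R}, \qquad \sigma = \sqrt{-2\,\ln(1 - V_{err})}
```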

*σ* = 6°) was applied across the dimension of direction before the mean and standard deviation of the histograms were computed.

*R* = −0.952, *p* < 0.0001). The near one-to-one relationship between the pairings demonstrates that it is the *reported* direction, not the *physical* direction, that determines where observers' responses are most variable, mirroring earlier findings with plaids and center–surround gratings (Heeley & Buchanan-Smith, 1992; Meng & Qian, 2005).

*relative to the 2D direction of motion,* on observers' performance. In theory, only two differently oriented surfaces are required to compute 2D motion, and it should not matter what the orientations of those surfaces are. However, psychophysical data clearly demonstrate that observers are unable to correctly compute 2D motion under a variety of conditions, and that this inability is linked to the orientation content of the stimuli (Amano et al., 2009; Bowns, 1996; Burke & Wenderoth, 1993; Loffler & Orbach, 2001; Mingolla et al., 1992; Yo & Wilson, 1992). Accordingly, we examine the impact of the orientation content of naturally occurring contours on observers' ability to compute 2D motion, to establish the capacity of the motion stream to overcome the “aperture problem” given the heterogeneous orientation structure of natural scenes.

*sum, mean,* and *variance* of the filter responses (1; Equations A5, A6, and A7). The filter response statistics for image 44 of the van Hateren image set are shown in Figures 6d–6f.

*σ* = 6°) was applied across the relative orientation dimension. Finally, by computing the mean and standard deviation along the error dimension, we estimated observers' bias and precision as a function of the two stimulus dimensions.

*relative oblique* effect, is modulated by orientation variance and is absent for near-isotropic regions (blue lines). These results effectively quantify the extent to which observers suffer from the “aperture problem” when judging the direction of natural scenes. Specifically, local orientations oblique to the global (2D) direction of motion induce biases of between 2° and 5° and increase variability by 20%–25%, relative to observers' performance when local orientations are orthogonal or parallel to the 2D motion vector.

180² histograms were compiled (180 across each relative orientation dimension, again corresponding to relative orientations falling between −90° and +89° at 1° intervals). Finally, a two-dimensional Gaussian function (*σ*_{x,y} = 6°) was used to smooth across the two relative orientation dimensions before the mean and standard deviation of each error population were calculated.

*f* amplitude spectrum, so many spatiotemporal frequencies are typically represented at a given point in space and time. Second, natural images contain structures (edges) whose information is phase aligned across spatial frequency bands (Attneave, 1954; Barlow, 1961). Consequently, gross mismatches between the spatial frequency of the sensor and the stimulus are unlikely, and the pseudo-speed tuning (as defined by Equation 9) is fairly robust.

*x, y,* and *t* (32 by 32 pixels, by 32 frames). Motion energy sensors were constructed from Equation A11 (1) and had a peak sensitivity to spatial structure at 4 c/deg. The spatiotemporal envelope was kept constant across all DS sensors (*x, y* = 0.2 arcmin, *t* = 0.1 s; 7 by 7 pixels by 7 frames). This had the advantage of keeping the directional bandwidth constant at ≈45° (half-width at half-height, as measured from the response to spatial-frequency-matched sine-wave gratings), so that the maximal sensor response was identical across all speeds and directions.

*R* = 0.72, *p* < 0.00001 and *R* = 0.72, *p* < 0.00001, respectively).

*R* = 0.025) but significant at *p* < 0.005 (*N* = 34,000). Thus, while we are able to model the statistical properties of observers' *bias* and *precision,* we are unable to capture observers' trial-by-trial variability.

*σ*) was around 10–13°, and given that the observed biases as a function of relative orientation (Figure 7, row two) are generally smaller (around 3–6°), it may be that observers' stochastic response variability simply swamps the predictable variability caused by motion energy imbalances, producing only a weak correlation. A second contributory factor could be that the model operates in a homogeneous manner as a function of direction, whereas the psychophysical data exhibit a number of anisotropies, such as the oblique effect (Dakin, Mareschal, & Bex, 2005a; Gros et al., 1998), cardinal attraction (Loffler & Orbach, 2001), and reference repulsion (Rauber & Treue, 1998), that are not implemented in the model. A third reason why the model is only weakly correlated with observers' trial-by-trial variation may be the lack of any gain control in the model, which might serve to normalize relative energy across the natural scenes (Mante et al., 2005). This is particularly pertinent because both the psychophysical data and the model reported in this paper demonstrate how imbalances in energy across the orientation structure of a scene can lead to systematic errors in the estimation of 2D motion.

**fft2** function. The product of the log Gabor and the natural scene was calculated in the frequency domain and the results transformed back to the spatial domain using Matlab's **ifft2** function. This procedure is equivalent to performing convolution of the filter and the natural scene in the spatial domain.
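The equivalence invoked here is the convolution theorem, sketched below (I is the image, g the filter's spatial impulse response, and G its Fourier-domain definition); note that for the discrete transforms computed by fft2/ifft2 the resulting convolution is circular, wrapping around at the image borders:

```latex
% Pointwise multiplication in the frequency domain equals
% (circular) convolution in the spatial domain
(I \ast g)(x, y) \;=\; \mathcal{F}^{-1}\big\{\, \mathcal{F}\{I\} \cdot G \,\big\}(x, y)
```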

*G* was constructed in the Fourier domain and was defined as the product of two components: *R*(*f*_{xy}), which specifies the spatial frequency profile of the sensor, and *O*(*θ*_{xy}), which specifies the orientation tuning of the sensor, with *f*_{xy} denoting the spatial frequency of each point in the Fourier domain and *θ*_{xy} denoting the orientation of each point in the Fourier domain.
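The separable form implied by this description (a sketch; this factorization is standard for log Gabor filters defined in the Fourier domain) is:

```latex
% Log Gabor defined as a separable product in the Fourier domain
G(f_{xy}, \theta_{xy}) = R(f_{xy})\,O(\theta_{xy})
```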

*p*(*f*_{xy}) is defined as a log Gaussian function of spatial frequency, where *f*_{peak} is the filter's central frequency and *σ* is the ratio between the filter's central frequency and the standard deviation of the log Gaussian, which is set to 0.65.
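The standard log-Gaussian radial profile consistent with this description (our reconstruction, with the ratio *σ* set to 0.65) is:

```latex
% Log Gaussian spatial frequency profile of the log Gabor
R(f_{xy}) = \exp\!\left( -\,\frac{\big(\ln(f_{xy}/f_{peak})\big)^{2}}
                               {2\,\big(\ln \sigma\big)^{2}} \right)
```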

*O*(*θ*_{xy}) is defined in Equation A3 and is an angular Gaussian function, where *ϕ* (defined in Equation A4) is the angular separation between the orientation tuning of the sensor, *θ*_{peak}, and the orientation of each pixel in the Fourier domain:
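An angular Gaussian of this kind typically takes the form below (a sketch; σ_θ, the orientation bandwidth parameter, is our symbol, not the paper's):

```latex
% Angular Gaussian orientation tuning (cf. Equation A3)
O(\theta_{xy}) = \exp\!\left( -\,\frac{\phi^{2}}{2\,\sigma_{\theta}^{2}} \right)

% Angular separation between sensor tuning and Fourier-domain
% orientation (cf. Equation A4), taken across the half-circle
\phi = \big|\,\theta_{peak} - \theta_{xy}\,\big| \ \text{wrapped onto the half-circle}
```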

*sum* of the orientation energy, the *mean absolute orientation*, and the *orientation variance* were calculated. This was done on a pixel-by-pixel basis for the calculations reported in Results: Relative orientation and orientation variance, and on an aperture-by-aperture basis for the calculations reported in Results: Second-order orientation statistics. The mean orientation was calculated as follows, where *θ* is the orientation of each filter and *E*_{θ} is the filter output:
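Because orientation is 180°-periodic, an energy-weighted mean orientation is usually computed on doubled angles; a sketch of this calculation (the function name is ours, and the double-angle convention is our assumption about the standard circular statistic) is:

```python
import math

def mean_orientation(thetas_deg, energies):
    """Energy-weighted circular mean of orientations (180-degree periodic).
    Angles are doubled so that, e.g., 0 and 180 deg count as one orientation."""
    s = sum(E * math.sin(2 * math.radians(t)) for t, E in zip(thetas_deg, energies))
    c = sum(E * math.cos(2 * math.radians(t)) for t, E in zip(thetas_deg, energies))
    return (math.degrees(math.atan2(s, c)) / 2) % 180
```

For filter outputs of equal energy at 170° and 10°, the mean is 0° (equivalently 180°), not the arithmetic mean of 90°.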

(*x*_{a}, *y*_{a}) is the center of each aperture and *t*_{m} is the middle frame:

*x* and *y*, with a wavelength *λ*_{spatial} and an orientation *θ*. The phase was shifted on each frame by Δ*λ*_{temporal}:

*λ*_{phase} = 0 for even phase and *λ*_{phase} = π/2 for odd phase. *λ* was calculated from the desired pseudo-speed tuning *ϕ*_{1D} of each local motion sensor, given the spatial frequency of the sensor, using
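The conversion from a desired pseudo-speed to a per-frame phase shift presumably follows the usual speed/frequency identity, sketched below (our notation: f_s is the spatial frequency of the sensor and Δt the frame duration):

```latex
% Temporal frequency implied by a pseudo-speed tuning of phi_1D
f_{t} = f_{s}\,\phi_{1D}

% Phase shift applied on each frame
\Delta\varphi = 2\pi f_{t}\,\Delta t
```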

*square root* of the sum of the squares of the *odd*- and *even*-phased neurons' outputs to generate a phase-invariant output (Adelson & Bergen, 1985):
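The quadrature-energy step can be sketched as below (names ours); squaring and summing the odd- and even-phase outputs removes the dependence on stimulus phase:

```python
import math

def motion_energy(odd_response, even_response):
    """Phase-invariant energy from a quadrature pair (Adelson-Bergen style):
    square root of the sum of the squared odd- and even-phase outputs."""
    return math.sqrt(odd_response ** 2 + even_response ** 2)

# For a drifting grating, a quadrature pair responds as A*sin(p) and A*cos(p);
# the combined energy equals the constant amplitude A at every phase p.
amp = 2.5
energies = [motion_energy(amp * math.sin(p), amp * math.cos(p))
            for p in (0.0, 0.7, 1.9, 3.1)]
```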

*Experimental Brain Research, Supplementum*, vol. 11, pp. 117–151, 1986).