Article | March 2011
Quantifying “the aperture problem” for judgments of motion direction in natural scenes
David Kane, Peter Bex, Steven Dakin
Journal of Vision March 2011, Vol. 11(3):25. doi: https://doi.org/10.1167/11.3.25
Abstract

The response of motion-selective neurons in primary visual cortex is ambiguous with respect to the two-dimensional (2D) velocity of spatially extensive objects. To investigate how local neural activity is integrated in the computation of global motion, we asked observers to judge the direction of a rigidly translating natural scene viewed through 16 apertures. We report a novel relative oblique effect: local contour orientations parallel or orthogonal to the direction of motion yield more precise and less biased estimates of direction than other orientations. This effect varies inversely with the local orientation variance of the natural scenes. Analysis of contour orientations across aperture pairings extends previous research on plaids and indicates that observers are biased toward the faster moving contour for Type I pairings. Finally, we show that observers' bias and precision as a function of the orientation statistics of natural scenes can be accounted for by an interaction between naturally arising anisotropies in natural scenes and a template model of MT that is optimally tuned for isotropic stimuli.

Introduction
The firing rate of direction-selective (DS) neurons in primary visual cortex is primarily determined by the image structure falling within the neuron's receptive field; such cells exhibit peak sensitivity for specific orientations and spatiotemporal frequencies (De Valois, De Valois, & Yund, 1979; Hubel & Wiesel, 1968). Recovering the two-dimensional motion of an object from the output of a population of DS cells is non-trivial because DS cells are selective for a number of stimulus dimensions (e.g., orientation and spatial frequency) not directly related to the two-dimensional motion of an object (Basole, White, & Fitzpatrick, 2003; Mante & Carandini, 2005). In particular, DS cells are selective for the one-dimensional component of motion normal to the local contour orientation. Although, for isotropic stimuli such as dot patterns, the activation of DS/V1 neurons will be strongest in the direction of object motion (Snowden, Treue, & Andersen, 1992), for stimuli containing a more limited range of orientations (e.g., lines or plaids) neural activity in area V1 will vary not only with the 2D speed and direction of an object but also (and less intuitively) with the orientation distribution of the moving object (Movshon, Adelson, Gizzi, & Newsome, 1985). 
The problem of estimating 2D motion from anisotropic stimuli is commonly referred to as the “aperture problem”; if a rigidly moving object is viewed through an aperture (such as the small receptive field of a V1 neuron), and only one edge orientation is visible, then the 2D motion of the object is ambiguous. Under such conditions, motion is typically perceived to be in the direction orthogonal to the edge's orientation (Wallach, 1935). Despite the ambiguity associated with the motion of a single edge, the 2D velocity of an object can be recovered by computing the 1D velocity (normal to an edge orientation) from two or more differently oriented elements. The following paragraphs explain how this can be achieved. 
The speed of 1D motion varies sinusoidally with the angular separation between the edge orientation and the underlying 2D direction. As a result, the range of 1D velocities that a 2D velocity can elicit is constrained to lie upon a sine wave defined by 
$$\phi_{1D} = \sin(\theta_{2D} - \vartheta)\,\phi_{2D}, \tag{1}$$
where ϑ denotes the absolute orientation of an edge, φ_1D is the speed of 1D motion, and θ_2D and φ_2D are the direction and speed of 2D motion, respectively. An example is illustrated in Figure 1. The phase and amplitude of the waveform reflect the direction and speed of 2D motion, respectively. The 1D velocities stemming from three edge orientations have been highlighted. Each velocity is represented twice on the waveform. As the two points denote identical 1D velocities (180° apart, with speeds of identical magnitude but opposite sign), it is convenient to ignore the negative side of the waveform. To do so, we calculate the angular separation between each orientation and the 2D direction across the half-circle (Equation 2). Then, by replacing the absolute orientation term in Equation 1 with the relative orientation term (Equation 3), we can constrain our description of 1D velocities to have positive speeds: 
$$\theta_{\mathrm{relative}} = \tan^{-1}\!\left(\frac{\sin(\theta_{2D} - \vartheta)}{\cos(\theta_{2D} - \vartheta)}\right), \tag{2}$$
 
$$\theta_{1D} = \theta_{2D} + \theta_{\mathrm{relative}}, \qquad \phi_{1D} = \sin(\theta_{\mathrm{relative}})\,\phi_{2D}. \tag{3}$$
Only two points are required to uniquely specify the phase and amplitude of the half-cosine. Consequently, a cosine-fitting procedure is able to correctly estimate the 2D velocity by sampling the 1D velocity stemming from two or more differently oriented elements (assuming no noise). 
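To make the cosine-fitting idea concrete, the following sketch (our illustration, not the authors' implementation; it assumes numpy, angles in radians, and noiseless measurements) recovers the 2D direction and speed from the 1D speeds measured at two or more edge orientations, using the identity φ_1D = φ_2D sin(θ_2D − ϑ) = A cos ϑ + B sin ϑ.

```python
import numpy as np

def fit_2d_velocity(edge_orientations_rad, speeds_1d):
    """Recover 2D direction and speed from 1D (normal) speeds by cosine fitting.

    Since phi_1D = phi_2D*sin(theta_2D - theta_edge)
                 = A*cos(theta_edge) + B*sin(theta_edge),
    with A = phi_2D*sin(theta_2D) and B = -phi_2D*cos(theta_2D),
    two or more distinct edge orientations fix A and B by least squares."""
    th = np.asarray(edge_orientations_rad, dtype=float)
    v = np.asarray(speeds_1d, dtype=float)
    X = np.column_stack([np.cos(th), np.sin(th)])
    (A, B), *_ = np.linalg.lstsq(X, v, rcond=None)
    return np.arctan2(A, -B), np.hypot(A, B)       # (theta_2D, phi_2D)

# Example: 2D motion at 30 deg, speed 2, sampled at three edge orientations.
theta_2d, phi_2d = np.deg2rad(30.0), 2.0
edges = np.deg2rad([0.0, 60.0, 110.0])
direction, speed = fit_2d_velocity(edges, phi_2d * np.sin(theta_2d - edges))
print(np.rad2deg(direction), speed)                # ~30.0, ~2.0
```

With noiseless inputs, two distinct orientations suffice to fix the fit; with more samples, the least-squares solution simply averages out any measurement noise.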
Figure 1
 
(Top right) A circle rigidly translating leftward generates a distribution of 1D velocities (measured normal to each contour fragment's orientation) that lies upon a cosine. The 1D velocities stemming from three differently oriented fragments are highlighted; green indicates an orientation orthogonal to the 2D direction, blue indicates an orientation oblique to the 2D direction, and pink indicates an orientation parallel to the 2D direction. Note how the speed of 1D motion varies sinusoidally with the angular separation between the orientation of an edge and the 2D direction.
An alternative approach to the “aperture problem” is the use of an “Intersection of Constraints” (IOC) rule (Adelson & Movshon, 1982). This is an algebraic solution that takes advantage of the fact that the range of possible 2D velocities that are consistent with a given 1D velocity lies upon a line in velocity space. By calculating the point of intersection between two or more lines, the correct 2D velocity can be obtained. 
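A minimal sketch of the IOC computation (again our illustration, under the assumption that each measurement supplies an edge orientation and a signed normal speed): each measurement defines the constraint line v · n = s in velocity space, and the 2D velocity is the point where the lines intersect, solved here by least squares so that more than two constraints are also handled.

```python
import numpy as np

def ioc_velocity(edge_orientations_rad, speeds_1d):
    """Intersection of Constraints: each measurement constrains the 2D velocity
    v to the line v . n = s, where n is the unit normal to the edge and s is
    the measured 1D (normal) speed. The intersection of two or more such lines
    is recovered here by least squares."""
    th = np.asarray(edge_orientations_rad, dtype=float)
    normals = np.column_stack([-np.sin(th), np.cos(th)])   # unit normals to the edges
    s = np.asarray(speeds_1d, dtype=float)
    v, *_ = np.linalg.lstsq(normals, s, rcond=None)        # solve normals @ v = s
    return v                                               # (vx, vy)

# Same example as above: the recovered vector points 30 deg from the x-axis.
theta_2d, phi_2d = np.deg2rad(30.0), 2.0
edges = np.deg2rad([0.0, 60.0, 110.0])
print(ioc_velocity(edges, phi_2d * np.sin(theta_2d - edges)))   # ~[1.73, 1.0]
```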
Psychophysical studies probing the “aperture problem” have typically used plaid stimuli composed of two drifting gratings (the minimum number of oriented components needed to uniquely specify a 2D velocity). The results show that the motion stream is able to correctly estimate 2D velocity under some conditions (Adelson & Movshon, 1982; Amano, Edwards, Badcock, & Nishida, 2009; Lorenceau, 1998) but not others (Amano et al., 2009; Bowns, 1996; Burke & Wenderoth, 1993; Mingolla, Todd, & Norman, 1992; Rubin & Hochstein, 1993; Yo & Wilson, 1992). Specifically, when a distribution of 1D directions is skewed to one side of the 2D direction (known as a Type II configuration; Ferrera & Wilson, 1990), the perception of motion is often biased toward the mean direction of the 1D motion signals (Bowns, 1996; Burke & Wenderoth, 1993; Yo & Wilson, 1992). This pattern of results has been reported when the 1D velocities must be integrated locally (Bowns, 1996; Burke & Wenderoth, 1993; Ferrera & Wilson, 1990; Wilson & Kim, 1994; Yo & Wilson, 1992) and when the 1D velocities must be integrated across space (Amano et al., 2009; Mingolla et al., 1992; Rubin & Hochstein, 1993). These findings are not consistent with either the cosine-fitting or IOC model, both of which produce veridical estimates of 2D direction. This has led some authors to propose that perceived 2D direction is simply the average of the 1D vectors—a solution known as the Vector Average (VA; Mingolla et al., 1992; Rubin & Hochstein, 1993; Wilson, Ferrera, & Yo, 1992). However, the Vector Average solution incorrectly predicts misperceptions of speed for Type I stimuli, in which the 1D motions fall on either side of the 2D direction (Amano et al., 2009; Lorenceau, 1998). In summary, neither the cosine-fitting nor IOC model is able to explain observers' systematic misperceptions of direction for Type II stimuli, and the VA model cannot predict observers' unbiased estimates of speed for Type I stimuli. 
Type II stimuli can also be used to reveal temporal aspects of the computation of global motion. Studies employing perceptual (Lorenceau, 1998; Yo & Wilson, 1992), oculomotor (Masson, Rybarczyk, Castet, & Mestre, 2000), and neurophysiological (Pack & Born, 2001) paradigms indicate that the response of the motion stream is initially biased toward the 1D components of motion but later switches (partially or fully) to the 2D direction. The delay preceding such a switch depends on the nature of the stimulus; for instance, supra-threshold stimuli with distinct but locally overlapping orientations appear to refine relatively quickly (∼160 ms; Yo & Wilson, 1992), while stimuli composed of translating lines that are oriented obliquely to the direction of motion (Lorenceau, 1998; Lorenceau, Shiffrar, Wells, & Castet, 1993; Masson et al., 2000) are resolved more slowly (∼400 ms). These results indicate that the unambiguous 2D motion signals from the line endings take time to propagate to the ambiguous 1D regions of the stimulus. Furthermore, line endings appear to require higher contrast than line elements in order to exert their influence on the motion stream (Lorenceau & Shiffrar, 1992; Lorenceau et al., 1993). Thus, although the visual system is capable of responding to the unambiguous direction signals arising from local elements with broad orientation structure, the detection of motion signals from ambiguous line elements appears to be more immediate. 
Rationale
The majority of studies probing the “aperture problem” have used very simple stimuli, often containing just two oriented sine gratings. In contrast, natural scenes contain a variety of different textures, end points, and contours. The purpose of the present study is to reveal which components of natural images drive observers' judgments of motion direction. To this end, we introduce a novel variant of the image classification paradigm (Eckstein & Ahumada, 2002; Gosselin & Schyns, 2001), the aim of which is to identify which aspects of a stimulus drive performance on a particular task. Such techniques work by degrading the information that is important for the task at hand through the application of additive (reverse correlation; Eckstein & Ahumada, 2002) or multiplicative (Bubbles; Gosselin & Schyns, 2001) noise. By summing the noise fields weighted by the observer's responses—an operation that is formally equivalent to performing a reverse correlation procedure (Chauvin, Worsley, Schyns, Arguin, & Gosselin, 2005)—one can generate “perceptive fields” that map the relationship between each part of the stimulus and the observer's response. 
The image classification paradigm used here required observers to view a natural image that rigidly translated in a random direction on each trial (Figure 2a). The image was viewed through an opaque mask, punctured by 16 randomly positioned apertures (Figure 2b). The observer's task was then to indicate the direction of perceived motion using a method of adjustment (Figure 2c). On each trial, a continuous error signal was generated, corresponding to the angular separation between the reported direction of motion and the real direction of motion. Over many trials, histograms of errors could be compiled and the mean and standard deviation of the distribution used as estimates of observers' bias and precision, respectively. To relate observers' performance to the stimulus, separate histograms were compiled, as a function of particular stimulus attributes (e.g., the orientation variance of the natural scene viewed through each aperture). By weighting the input to each histogram by the presence or absence of a particular stimulus attribute, the histograms can be compiled in a heterogeneous manner that reflects the properties of the stimulus on each trial. For instance, if the histograms were tuned along the dimension of orientation variance, then on trials that predominately expose orientation-rich textures, the error signal on that trial should contribute more strongly to those histograms tuned to high orientation variance than those tuned to low. The calculations used to generate error histograms as a function of the stimulus on each trial are described at the beginning of each Results section and the model used to estimate orientation statistics of the natural scenes can be found in the Scene statistics section. 
Figure 2
 
We measured observers' ability to estimate the direction of motion of rigidly translating natural stimuli viewed through 16 apertures. (a) A linear grayscale natural image from the van Hateren (van Hateren & van der Schaaf, 1998) image set. (b) A sample frame from the movie stimulus presented to observers. (c) The test phase. Observers manipulated the orientation of a line composed of 4 Gaussian patches that radiated from the center of the display to the edge of the potential viewing area until it matched the perceived direction of the translating natural scene. A phase-randomized version of the stimulus was presented during the test phase (and between trials) to mask the transient structure at the onset/offset of the stimulus.
Methods
Psychophysics
Observers
Three psychophysically experienced observers (DK, SD, JG) each with normal or corrected-to-normal vision took part in all experiments. All procedures complied with the tenets of the Declaration of Helsinki and were approved by the Institutional Ethics Review Board. 
Apparatus
Stimuli were generated on an Apple iMac computer running MATLAB (MathWorks) with functions from the Psychtoolbox (Brainard, 1997; Pelli, 1997). Stimuli were displayed on a Dell Trinitron CRT with a spatial resolution of 1024 × 768 pixels and a refresh rate of 85 Hz. The display was viewed at a distance of 97 cm such that 64 pixels subtended 1 degree of visual angle. The video signal from the computer's graphics card was first passed through an attenuator (Pelli & Zhang, 1991) and then amplified and copied (using a line splitter) to the three guns of the monitor to give a pseudo 12-bit monochrome image. The monitor was linearized by recording the relationship between the signal from the graphics card and the monitor luminance (measured using a Minolta LS 110 photometer) to create a linearization lookup table. 
Stimuli
Stimuli were natural images selected from the linear van Hateren “.iml” image set (van Hateren & van der Schaaf, 1998). The mean luminance of the stimuli was 40 cd/m2 and the root-mean-square contrast of the image prior to occlusion was fixed at 0.20. The native resolution of the van Hateren images is 1536 × 1024 pixels; images were presented at this resolution. Due to the use of apertures, only a subset of the full image was ever presented—a region contained within a radius of 256 pixels (4°) from the center of the original image. 
Motion was generated using operations built into the computer's graphics card (NVIDIA GeForce, accessed via OpenGL) that allowed for subpixel resolution via linear interpolation. On each trial, a full-size image was passed to the graphics card buffer, and rigid image translation was generated by shifting the source coordinates of the image on each frame of the movie. The translation proceeded at 1 pixel/frame for 32 frames, corresponding to a speed of 1.33°/s, a total distance of 0.5°, and a duration of 0.3765 s. During each movie, the center of the image was constrained to pass through the point of fixation on the middle frame. Between trials, a static, phase-scrambled version of the natural scene was placed within the viewing area to mask the presence of afterimages and to maintain a fixed display contrast. The observer's response initiated the next trial. 
The translating natural scene was viewed through 16 apertures, each with a radius of 0.25°. The aperture edges were smoothed with a raised cosine over 0.05 arcmin. The apertures were presented at different random locations on each trial (avoiding overlaps). All apertures were placed within a 4° radius from the point of central fixation (Figure 2b). Thus, during each frame, 16% of the full area was visible to the subject. The mask/apertures had a mean luminance of 40 cd/m2, which matched the mean luminance of the stimulus. 
Procedure
On each trial, the underlying natural image was translated in a random direction (0°–360°). After presentation of the stimulus, a mask image appeared. The observers' task was to indicate the perceived 2D direction of the motion, by manipulating the orientation of a probe: four evenly spaced 2D Gaussian elements radiating from the fixation point to the circumference of the global aperture (Figure 2c). Observers took as long as required to manipulate the probe (using the computer's mouse) until it was aligned with the perceived direction of motion. Observers were asked to maintain fixation at all times upon a dot presented in the middle of the stimulus. 
Conditions
Two images were used in the study (Nos. 44 and 206 of the van Hateren set). The images are shown in Figures 3a and 3b. 
Figure 3
 
Distribution of static orientation structure in the test stimuli. (a) Image 44 and (b) Image 206 from the van Hateren image set. (c) Orientation energy as a function of the absolute orientation. (d) The percent of pixels with a specified circular variance. Ten circular variance bins were used between 0 and 1. If the distribution of orientation variance was white, then the expected percentage in each bin would be 10%.
Figure 3c shows the distribution of energy across orientations in the test stimuli; like most natural scenes, there is greater energy on the cardinal (horizontal and vertical) than oblique axes (Switkes, Mayer, & Sloan, 1978). It is possible that these image-based anisotropies or sensitivity-based anisotropies (Campbell, Kulikowski, & Levinson, 1966) could affect performance as a function of the direction of motion. To examine this question, the images were either translated at their original canonical orientation or were randomly rotated between 0° and 360° prior to translation. 
Observers DK and JG completed at least 3000 trials in all conditions, while observer SD completed at least 3000 trials for both images but not for the random rotation conditions. In total, we ran more than 34,000 trials. 
Observers' error
On each trial, the signed angular separation between the real direction of motion θ_2D and the perceived direction θ_per was calculated using Equation 4. Negative and positive angular separations denote errors in the perceived direction that are, respectively, clockwise and anticlockwise of the true direction of motion: 
$$\theta_{err} = \tan^{-1}\!\left(\frac{\sin(\theta_{2D} - \theta_{per})}{\cos(\theta_{2D} - \theta_{per})}\right). \tag{4}$$
In each section of the results, error histograms were compiled. To do so, errors between −90° and +90° were binned at 1-degree intervals (errors with a magnitude greater than 90° were excluded; these accounted for less than 0.1% of trials). To relate observers' errors to the stimulus, separate histograms were compiled and the input to each histogram was weighted according to the presence or absence of particular stimulus features (described at the beginning of each Results section). 
The mean and standard deviation of each histogram quantified observers' bias and precision. The mean error was calculated using the four-quadrant arctangent of the sums of the weighted sines and cosines (Equation 5), where θ represents the error of each bin and W_θ represents the weighting given to each error bin: 
$$\bar{\theta}_{err} = \mathrm{atan2}\!\left(\sum_\theta \sin(\theta)\,W_\theta,\; \sum_\theta \cos(\theta)\,W_\theta\right). \tag{5}$$
The variance V_err in each error histogram was then calculated using the following equations: 
$$R^2 = \frac{\left(\sum_\theta \sin(\theta)\,W_\theta\right)^2 + \left(\sum_\theta \cos(\theta)\,W_\theta\right)^2}{\left(\sum_\theta W_\theta\right)^2}, \tag{6}$$
$$V_{err} = 1 - R. \tag{7}$$
The variance term V_err (between 0 and 1) was then converted into a more conventional circular standard deviation term σ_err (Mardia & Jupp, 1972): 
$$\sigma_{err} = \sqrt{-2\ln(1 - V_{err})}. \tag{8}$$
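Equations 5–8 amount to a weighted circular mean and circular standard deviation. The sketch below (our illustration in numpy, not the authors' code) computes both statistics from a set of signed errors and optional weights.

```python
import numpy as np

def circular_bias_precision(errors_deg, weights=None):
    """Weighted circular mean (bias) and circular SD (precision) of direction
    errors, in degrees; a sketch of Equations 5-8, not the authors' code."""
    th = np.deg2rad(np.asarray(errors_deg, dtype=float))
    w = np.ones_like(th) if weights is None else np.asarray(weights, dtype=float)
    S, C = np.sum(np.sin(th) * w), np.sum(np.cos(th) * w)
    bias = np.rad2deg(np.arctan2(S, C))                   # Equation 5
    R = np.sqrt(S**2 + C**2) / np.sum(w)                  # Equation 6
    V = 1.0 - R                                           # Equation 7
    sigma = np.rad2deg(np.sqrt(-2.0 * np.log(1.0 - V)))   # Equation 8
    return bias, sigma
```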
 
Bootstrapping
Estimates of observers' bias and precision are plotted with 95% confidence intervals estimated using a bootstrapping operation. We assumed that each trial was independent, and 1024 bootstrapped data sets were compiled by resampling (with replacement) from the total number of trials. For each resampled data set, the error histogram was recompiled and bias and precision were recalculated. The estimates were sorted from low to high and the 26th and 998th estimates were used as the lower and upper 95% confidence limits, respectively. 
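A sketch of this percentile bootstrap (again illustrative; it reuses the circular_bias_precision helper from the previous sketch and assumes the per-trial errors are available as a flat array):

```python
import numpy as np

def bootstrap_ci(errors_deg, n_boot=1024, rng=None):
    """Percentile bootstrap over trials for the bias and precision statistics.
    Assumes circular_bias_precision (defined above) and independent trials."""
    rng = np.random.default_rng() if rng is None else rng
    errors_deg = np.asarray(errors_deg, dtype=float)
    stats = np.array([
        circular_bias_precision(rng.choice(errors_deg, size=errors_deg.size, replace=True))
        for _ in range(n_boot)
    ])
    stats.sort(axis=0)               # sort each statistic (bias, sigma) independently
    return stats[25], stats[997]     # 26th and 998th of 1024 resamples: lower/upper 95% limits
```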
Results: Absolute direction of motion
Data analysis
In this section, we relate observers' direction estimates to the true (2D) direction of motion. To do so, 360 histograms were generated, each recording the frequency of errors made for each direction tested (at 1° intervals). Thus, if observers were presented with 135° motion, and reported 90° motion, the histogram for 135° would record one instance of a −45° error. After compilation of the error histograms, the mean and standard deviation of each error histogram were taken as estimates of observers' bias and precision (Figure 4, columns two and three). 
Figure 4
 
Analysis of the reported direction as a function of the presented direction. Each row shows data from one observer. The first column shows the ratio of the frequency of reported directions to presented directions as a function of the presented direction (a ratio of 1 is shown in green). The second and third columns plot observers' bias and variability as a function of the presented direction. The green region in column two shows unbiased performance. Overall, observers' performance is highly dependent on the direction of motion that was presented. Insets compare performance for canonically oriented (blue) and randomly rotated natural scenes (black). The similarity of data across these conditions indicates that performance anisotropies are not due to anisotropies in the stimuli.
Due to the fine sampling across absolute direction, only a few trials were included in each error histogram. To increase the effective number of trials in each histogram, a Gaussian smoothing operation (σ = 6°) was applied across the dimension of direction before the mean and standard deviation of the histograms were computed. 
We also computed the ratio of the frequency of reported directions to presented directions using an analogous methodology (Figure 4, column one). 
Results
The ratio of the frequency of reported directions to the frequency of presented directions is plotted in the first column of Figure 4 and demonstrates that all three observers rarely reported perceiving motion in any of the oblique directions (45°, 135°, 225°, and 315°). There is also a smaller dip in the frequency with which cardinal directions are reported. These data are consistent with two effects previously noted in the literature: one pushing responses away from the cardinal directions toward the oblique directions (Rauber & Treue, 1998) and a second, larger effect that pushes responses away from the oblique directions toward the cardinals (Loffler & Orbach, 2001). 
The second column of Figure 4 plots observer bias as a function of the direction of motion. The pattern of bias is idiosyncratic, but the narrow error bars (gray areas) indicate that it is stable for each observer. Bias is nearly identical for the canonical and randomly oriented conditions (blue and black lines; inset); this demonstrates that the pattern of bias as a function of direction is not determined by the stimulus but by the observer. It is not clear what factors may cause the biases in perception, but it is worth noting that experimental procedures that seek to measure bias for specific directions may be confounded by such idiosyncratic behavior. It is for this reason that we use random directions in this image classification experiment and collapse across the dimension of absolute direction when computing observers' response statistics in the next two sections of this paper. The reader should be aware that a parallel study (Dakin, Apthorp, & Alais, 2010) is aimed at specifically examining perceptual anisotropies in a motion task. 
The third column of Figure 4 plots observer precision as a function of the presented direction of motion. The oblique effect, where precision is lowest around the oblique directions and greatest around the cardinals (Dakin, Mareschal, & Bex, 2005b; Gros, Blake, & Hiris, 1998), is clearly present in the data. A weaker cardinal effect is also present in the data of DK and SD who exhibit a small decrease in precision around the cardinals. This effect is consistent with observers being unwilling to report cardinal directions—an effect that would normally manifest itself as an increase in the precision of a discrimination task that utilized a cardinal direction as a decision boundary (Jazayeri & Movshon, 2007). 
In the present data, the “oblique effect” (the loss of precision away from cardinal directions) is not always centered upon the oblique directions. To examine whether observers' idiosyncratic biases influence the magnitude of the oblique effect, we first estimated the location of the oblique effect in each quadrant. This was achieved by taking the center of mass of each quadrant of the variability statistics. This estimate was then subtracted from the nearest oblique direction (i.e., 45°, 135°, 225°, or 315°) to estimate the extent that the “oblique effect” was offset from the true oblique directions. The oblique offset was then paired with the bias statistic (Figure 4, column 2) at the estimated location of the oblique effect. This process was repeated for each quadrant, for each condition, and for each subject to generate 40 offset bias pairings. Figure 5 shows a scatter plot of bias versus oblique offset and reveals a strong negative relationship (R = −0.952, p < 0.0001). The near one-to-one relationship between the pairings demonstrates that it is the reported direction, not the physical direction, that determines where observers' responses are most variable, mirroring earlier findings with plaids and center–surround gratings (Heeley & Buchanan-Smith, 1992; Meng & Qian, 2005). 
Figure 5
 
Scatter plot of the center of mass of each quadrant of observers' precision against the bias measured at this angle. Results show a negative correlation (R = −0.952) indicating that the oblique effect depends on the perceived, not physical, direction of motion.
Scene statistics
In the next two Results sections, we examine observers' errors as a function of the orientation statistics of the natural scenes. On each trial, only a small region of the natural scene was exposed to the observer. We wanted to examine how the orientation variance and the relative orientation of the exposed natural scenes affected observers' ability to compute 2D motion. To elaborate, the majority of studies probing motion perception use either locally ambiguous stimuli (e.g., translating bars) or locally unambiguous motion stimuli (e.g., translating dots). First, by examining observers' errors as a function of orientation variance, we can assess the relative impact of naturally occurring orientation variation in textures and edges upon observers' ability to compute 2D motion. Second, we wished to assess the impact of the orientation of each element, relative to the 2D direction of motion, on observers' performance. In a theoretical sense, only two differently oriented surfaces are required to compute 2D motion and it should not matter what the orientations of the surfaces are. However, psychophysical data clearly demonstrate that observers are unable to correctly compute 2D motion under a variety of conditions and that this inability is linked to the orientation content of the stimuli (Amano et al., 2009; Bowns, 1996; Burke & Wenderoth, 1993; Loffler & Orbach, 2001; Mingolla et al., 1992; Yo & Wilson, 1992). Accordingly, we examine the impact of the orientation content of naturally occurring contours on observers' ability to compute 2D motion to establish the capacity of the motion stream to overcome the “aperture problem” given the heterogeneous orientation structure of natural scenes. 
Unlike the majority of studies probing the “aperture problem,” the exact orientation content of our stimuli was not under direct experimental control and so had to be estimated using a biologically inspired model of orientation processing. To that end, the two van Hateren images used in the present study were convolved with a bank of polar separable, log-Gabor filters (Appendix 1; Equation A1) tuned to 12 evenly spaced orientations (0°–165°; Figures 6a and 6b). The peak spatial frequency of the filters was 5.333 cycles/degree with a spatial frequency bandwidth of 1.5 octaves (half-width at half-height) and an orientation bandwidth of 22.6° (half-width at half-height). The scene statistics were then computed on a pixel-by-pixel basis by taking the sum, mean, and variance of the filter responses (Appendix 1; Equations A5, A6, and A7). The filter response statistics for image 44 of the van Hateren image set are shown in Figures 6d–6f. 
Figure 6
 
(a) A linear grayscale natural image was convolved with a set of log-Gabor filters tuned to each of 16 orientations evenly spaced over 180°. (b, c) Sample orientation energy distributions for the corresponding pink and blue regions highlighted in (a). The distribution of orientation energy at each pixel was classified in terms of (d) the sum of the energy across orientations ≈ contrast, (e) the mean orientation, and (f) the orientation variance.
The aim of the filtering was to estimate the orientation statistics of the natural scene. Given that the orientation content of natural scenes is highly correlated across spatial scales (Barlow, 1961), we felt that the use of a single spatial frequency channel was sufficient. 
We were not interested in the absolute orientation of each element but rather in its orientation relative to the 2D direction of motion. Accordingly, on each trial, the mean orientation of each pixel was converted into a measure of relative orientation by computing the angular separation between the mean orientation and the 2D direction across the half-circle (Appendix 1; Equation A8). The relative orientation fell between −90° and +90°, where 0° denotes an angle parallel to the 2D direction of motion, ±45° denotes angles oblique to the 2D direction, and ±90° denotes angles orthogonal to the 2D motion. This metric is represented pictorially here (e.g., the labels on the lower row of Figure 7) and in the following sections using a standardized 2D direction (red arrow), a black line (the local orientation structure relative to the standard direction), and a blue arrow (the 1D velocity stemming from the local orientation). 
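For concreteness, the sketch below shows how such per-pixel statistics could be computed from a vector of per-orientation energies (our illustration; the actual filter construction follows the appendix equations, which are not reproduced here). Because orientation is a half-circular variable, angles are doubled before the circular average.

```python
import numpy as np

def orientation_stats(energy, orientations_deg):
    """Circular mean orientation and circular variance from per-orientation
    energy (e.g., log-Gabor responses at one pixel). Orientations live on a
    half-circle, so angles are doubled before the circular average."""
    e = np.asarray(energy, dtype=float)
    th2 = np.deg2rad(2.0 * np.asarray(orientations_deg, dtype=float))
    C, S = np.sum(e * np.cos(th2)), np.sum(e * np.sin(th2))
    mean_orientation = np.rad2deg(np.arctan2(S, C)) / 2.0      # degrees, half-circle
    variance = 1.0 - np.sqrt(C**2 + S**2) / np.sum(e)          # 0 (edge) .. 1 (isotropic)
    return mean_orientation, variance

def relative_orientation(mean_orientation_deg, direction_2d_deg):
    """Wrap the separation between an orientation and the 2D direction to +/-90 deg."""
    return (mean_orientation_deg - direction_2d_deg + 90.0) % 180.0 - 90.0
```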
Results: Relative orientation and orientation variance
Data analysis
In this section, observer errors were related to the orientation statistics within the exposed apertures. To do so, the orientation statistics of the natural scenes were estimated as described in the preceding section. To recap, a bank of oriented log-Gabor filters was convolved with each natural scene, yielding three measurements at each pixel location: mean orientation, orientation variance, and orientation energy. Orientation variance was used to classify each pixel as belonging to a texture (high orientation variance), an edge (low orientation variance), or somewhere in between (medium orientation variance). The mean orientation term was converted into a relative orientation term between ±90° by taking the angular separation between the mean orientation and the stimulus direction on each trial. The relative orientation term is used because it allows us to collapse across the direction dimension and ignore the stimulus and perceptual anisotropies discussed in the preceding section. 
To estimate global orientation statistics on each trial, we computed a two-dimensional histogram of the orientation variance and the relative orientation of the exposed image pixels, weighted by the orientation energy at each pixel. To relate the observers' errors to the image statistics, we simply added the dimension of observer error and compiled across trials. After compiling the 3D histogram, a smoothing operation (σ = 6°) was applied across the relative orientation dimension. Finally, by computing the mean and standard deviation along the error dimension, we estimated observers' bias and precision as a function of the two stimulus dimensions. 
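The following sketch illustrates this weighted accumulation for a single trial (our illustration; the array shapes, bin widths, and names are assumptions rather than the authors' choices):

```python
import numpy as np

# Illustrative bins: relative orientation (-90..89 deg), orientation variance
# (10 bins), signed error (-90..89 deg).
hist = np.zeros((180, 10, 180))

def accumulate_trial(hist, rel_ori_deg, ori_var, energy, error_deg):
    """Add one trial: every exposed pixel votes in its (relative orientation,
    variance) bin along the row of the observer's signed error for that trial,
    weighted by the pixel's orientation energy. Pixel arrays share one shape."""
    o = np.clip(np.round(np.asarray(rel_ori_deg)).astype(int) + 90, 0, 179)
    v = np.clip((np.asarray(ori_var) * 10).astype(int), 0, 9)
    e = int(np.clip(round(error_deg) + 90, 0, 179))
    np.add.at(hist, (o, v, e), np.asarray(energy, dtype=float))  # unbuffered weighted add
    return hist
```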
Results
Figure 7 plots response bias (top row) and precision (bottom row) as a function of relative orientation. The red arrows in the lower section of the plot indicate the true 2D motion vector, the black line denotes the relative orientation of an element, and the blue arrows denote the local (1D) direction of motion orthogonal to each contour orientation. Data are plotted separately for regions of high (textures; blue line), medium (green line), and low (edges; red line) orientation variance. 
Figure 7
 
Bias and variability as a function of the orientation structure of the exposed patches of the natural scene relative to the direction of motion, for 3 observers (DK, JG, and SD). Areas of high local orientation variance (blue lines) induce relatively constant performance across the dimension of relative orientation. In contrast, areas of low orientation variance (red lines) exhibit a periodic dependence on the orientation of the image structure presented relative to the 2D direction of motion. Typically, the bias is orthogonal to the direction of motion, but there is some idiosyncrasy in the pattern of bias. The pattern of precision is stable across observers and indicates that precision is low when image structure is oriented oblique to the 2D direction of motion but is high when image structure is oriented either orthogonal or parallel to the 2D direction of motion.
Data for edges (red lines) show that observer errors are cyclically modulated by the orientation content of the stimulus; observers are more precise when the edges are orthogonal or parallel to the 2D motion vector. In contrast, when the orientation of contour elements is oblique to the 2D motion vector, observers are less precise and are more biased. This effect, which we term a relative oblique effect, is modulated by orientation variance and is absent for near-isotropic regions (blue lines). These results effectively quantify the extent to which observers suffer from the “aperture problem” when judging the direction of natural scenes. Specifically, local orientations oblique to the global (2D) direction of motion induce biases of between 2° and 5° and increase variability by 20%–25% relative to observers' performance when local orientations are orthogonal or parallel to the 2D motion vector. 
Results: Second-order orientation statistics
Data analysis
In this section, we extend our analysis of observers' errors to consider their dependence not just on the orientation statistics of a single image patch (a first-order analysis) but also on the conjoint relative orientation statistics of aperture pairings (a second-order analysis). To reduce processing time, orientation statistics were no longer computed on a pixel-by-pixel basis; instead, they were calculated on the pooled local orientation energy collapsed across all pixels falling within an individual aperture. On each trial, there were 128 unique aperture pairings and observer errors were compiled as a function of the conjoint relative orientation of each aperture pairing. Unlike the preceding section, observer errors were weighted by one minus the orientation variance of each aperture, not by the sum of the orientation energy. This procedure allowed us to use all aperture pairings but reduced the impact of high orientation variance patches (for which the mean orientation statistic is less meaningful). In total, 180² histograms were compiled (180 across each relative orientation dimension, again corresponding to relative orientations falling between −90° and +89° at 1-degree intervals). Finally, a two-dimensional Gaussian function (σ_x,y = 6°) was used to smooth across the two relative orientation dimensions before the mean and standard deviation of each error population were calculated. 
Results
In the previous section, it was demonstrated that the relative orientation of anisotropic image patches strongly influences the perceived direction of motion. In this section, we extend the analysis to examine how observers' bias and precision varies as a function of the conjoint relative orientation of elements across space, i.e., we ask what the impact of relative orientation A is in the presence of relative orientation B. 
Patterns of observer bias and variability are plotted in Figure 9, but to help the reader both understand the space used and relate our results to previous findings, Figure 8 first schematically illustrates the range of aperture pairings. The abscissa and the ordinate of Figure 8b denote the relative orientation of apertures A and B, respectively, where the orientation of an edge (black line) is depicted relative to a standard/constant 2D motion (red arrow). The conjoint relative orientation of each aperture pairing is denoted by the two-dimensional coordinate within this space. A line of symmetry runs through the space from the lower left to the upper right (Figure 8b, purple dashed line) and we note that the results were computed separately for each side of the line of symmetry. At all points along the line of symmetry, the local direction of motion within aperture pairs is identical. Along the magenta dashed line, local directions are the mirror opposite of one another. Figure 8c denotes the location of Type I (green) and Type II (gray) configurations of local motions within the space, and Figure 8d denotes regions in which motion in the abscissa is faster than motion in the ordinate (blue), and vice versa (green). 
Figure 8
 
Representation of the second-order statistics used in Figure 9. (a) A sample stimulus moving vertically upward. The red arrow depicts the 2D direction of motion, while the blue arrows depict local (1D) motions orthogonal to each contour orientation. (b) Schematic representation of the complete set of aperture pairs. Along the green dashed line, aperture pairs have identical orientations; along the purple dashed line, aperture pairs have mirror-reversed orientations. (c) Green areas denote Type I pairings (local motions fall on either side of the global direction of motion) while gray regions denote Type II pairings (local motions fall on the same side of the global (2D) direction of motion). (d) Blue denotes regions in which the local motions are faster in aperture one than aperture two, while the converse is true for green regions.
Figures 9a and 9b depict the bias and precision statistics, respectively, as a function of the conjoint relative orientation of aperture pairings. To highlight trends in our data and allow the reader to examine the 95% confidence intervals, Figures 9c9e plot one-dimensional slices (denoted by the inset). In Figure 9c, the blue line shows bias when the local (1D) motions exposed by aperture pairings are symmetrically opposite the global (2D) motion vector and the green line depicts the bias of observers when the local (1D) motions are identical. In line with previous findings, the bias for symmetric pairing is low (Bowns, 1996; Yo & Wilson, 1992), while for Type II pairings the bias has a greater magnitude and is directed toward the direction of local (1D) motion (Bowns, 1996; Burke & Wenderoth, 1993; Mingolla et al., 1992; Rubin & Hochstein, 1993; Yo & Wilson, 1992). The magnitude of the bias for Type II pairs is reduced when the angular separation between the two components is increased, again consistent with earlier research on plaids (Bowns, 1996; Burke & Wenderoth, 1993). It remains unclear whether this is due to the motion stream being better able to individuate motion signals when they are further apart in velocity space, as shown in motion transparency (Braddick, Wishart, & Curran, 2002; Greenwood & Edwards, 2006), or whether it simply reflects the fact that as separation between the orientations in the stimulus increases, then one or both orientations move closer toward the informative and less biased static or orthogonal components of motion. 
Figure 9
 
Observers' pattern of (a) bias and (b) precision as a function of the second-order relationships among aperture pairings pooled across 3 observers (DK, JG, and SD). (a) Light (positive) regions denote anticlockwise bias and dark regions denote clockwise bias. (b) Light regions denote high variability, while dark regions denote low variability. (c–e) One-dimensional slices from left to right through (a) and (b) as denoted by the insets. (c) Observers' bias for apertures with identical relative orientations (green) and for mirror-symmetric relative orientations (blue). (d) For non-symmetric Type I pairings, the bias tends to be in the direction of the fastest component of motion. (e) Same as (c), but for the variability statistics, there is no improvement for symmetric or identical aperture pairings. (f–h) Bias and (i–k) variability statistics plotted individually for each subject.
Examining the variability statistics (Figure 9b), there appears to be little impact of opposite pairings: performance is variable whenever the orientations within the apertures are oblique relative to the 2D direction of motion, regardless of their relative sign. 
Discussion
The aim of our psychophysical experiment was to investigate how the results of studies investigating the perception of 2D motion with simple stimuli (e.g., plaids or bars) containing one or two orientations generalize to the perception of motion in stimuli composed of naturally occurring textures. In the context of the “aperture problem,” natural scenes differ from plaid or bar stimuli because their orientation bandwidth can be broad and they contain extended contour structure, as well as independent variation in luminance and contrast across the extent of the stimulus (Mante, Frazor, Bonin, Geisler, & Carandini, 2005). 
Our image classification paradigm demonstrates that when low orientation variance image structures (i.e., contours) are included in the stimulus, performance varies as a function of the orientation of the exposed elements relative to the 2D motion vector. Specifically, observers' estimates of direction are relatively imprecise and are biased toward the direction of 1D motion when elements are oriented oblique to the 2D direction. In contrast, observers are relatively unbiased and precise when low variance elements are oriented parallel or orthogonal to motion. In other words, observers are unable to discount the local orientation structure of the natural scene when making global directional judgments. This pattern of responses is consistent with psychophysical paradigms examining the perceived direction of translating bars; when a translating bar is oriented oblique to the 2D motion vector, observers are biased toward the direction orthogonal to the bar's orientation (Loffler & Orbach, 2001), particularly at short time periods (Lorenceau et al., 1993). 
The majority of studies that examine observers' ability to solve the “aperture problem” use stimuli composed of two orientations, as this is the minimum number needed to uniquely specify a 2D motion vector. To parallel this research, we extended our image classification paradigm to examine observers' error distributions as a function of the second-order relationships between the conjoint orientations/directions that are exposed by pairs of apertures. Consistent with previous studies, we report that when the distribution of local (1D) motions is biased to one side of the global (2D) direction (Type II) observers' reports are biased toward the direction of local motion (Amano et al., 2009; Mingolla et al., 1992; Wilson et al., 1992; Yo & Wilson, 1992). We also report that this bias is reduced when there is a greater angular separation between the two orientations (Bowns, 1996; Burke & Wenderoth, 1993). We also make one further novel observation: when local motions fall asymmetrically on either side of the global (2D) motion, observers are biased toward reporting the direction of the faster local motion. 
Global motion model
Our results demonstrate that anisotropies in the orientation structure of a natural scene can affect judgments of 2D direction of motion. Image anisotropies lead to minimally biased directional judgments when the orientation of elements is parallel or orthogonal to motion but can bias direction judgments when elements are oriented oblique to the direction of 2D motion. We next set out to determine if this pattern of results could reflect an assumption of isotropy in the detection of 2D motion. This assumption is common to many models of MT pattern-selective cells that compute 2D velocity by integrating across a plane in spatiotemporal space (e.g., Simoncelli & Heeger, 1998). Such models behave suboptimally with anisotropic stimuli because the model's “template” and the distribution of directions in natural stimuli are not matched. In support of such models, Schrater, Knill, and Simoncelli (2000) demonstrated that the detection of a moving stimulus (masked by spatiotemporal noise) is optimal when the stimulus energy is distributed across a plane in spatiotemporal space rather than confined to specific orientations within the spatiotemporal domain. In this section, we take an analogous approach; we first demonstrate that anisotropic stimuli can lead to directional errors in a model that assumes isotropy when computing 2D velocity from a distribution of 1D velocities. We then investigate whether the pattern of directional errors produced by the model is consistent with the pattern of directional errors produced by our observers. 
The exact distribution of 1D velocities was not under our direct experimental control, so it is necessary for us to estimate the distribution of 1D velocities that arise from the translating natural images used in this study. We did so using a standard motion energy filtering approach (Adelson & Bergen, 1985) that uses spatiotemporal frequency-selective directional filters. The aim of the initial motion-sensing stage is to detect the distribution of 1D velocities elicited by a natural stimulus moving at a known speed but unknown direction. To do so, we created a bank of filters tuned to directions between 0° and 360° and to pseudo-speeds between 0 and 150% of a predefined 2D speed. Here pseudo-speed is defined as the ratio of the temporal and spatial frequency tuning of the sensor: 
$$\mathrm{speed} = \frac{t_{\mathrm{freq}}}{s_{\mathrm{freq}}}. \tag{9}$$
We term this “pseudo-speed,” because speed tuning is a secondary attribute of the conjoint spatial and temporal frequency tuning of a motion energy filter. In isolation, an individual motion energy filter will respond maximally to a pseudo-speed only when the spatial frequency profile of the stimulus and the sensor are matched. Although we acknowledge that the motion energy filters used are not speed tuned, we find the pseudo-speed tuning to be reliable. This point is supported by the finding that the cosine relationship between the speed and direction of 1D motion is clearly present in the output of our filter bank when convolved with the rigidly translating natural scenes used, both on individual trials (Figure 12) and across many trials (Figure 10a). Two properties of natural scenes minimize the problem of decoding speed from motion energy filters with separable spatial and temporal frequency tuning. First, natural images have a broadband and approximately 1/f amplitude spectrum, so many spatiotemporal frequencies are typically represented at a given point in space and time. Second, natural images contain structures (edges) whose information is phase-aligned across spatial frequency bands (Attneave, 1954; Barlow, 1961). Consequently, gross mismatches between the spatial frequency of the sensor and the stimulus are unlikely and the pseudo-speed tuning (as defined by Equation 9) is fairly robust. 
Figure 10
 
Testing the global motion model with anisotropic motion energy distributions. The input to the global motion stage was generated by taking (a) the motion energy for an isotropic stimulus and superimposing (b) additional signals along the orientation structure of the object (denoted by the white dashed arrow) to produce (c) biased motion energy distributions. Signals were added in a pairwise manner to allow us to produce model estimates (d, e) that can be compared to the data in Section III: (d) the angular separation between the model's direction estimates and the veridical direction; (e) the ratio between the veridical 2D speed and the model's estimate of speed.
To generate 2D velocity sensors, we averaged the response of the 1D filter bank over a number of trials. “Prototype” templates were constructed, tuned to speeds spanning a range from 0% (static) to 150% of the known object speed. This was achieved by measuring the response of the 1D filters to natural scenes rigidly translating in a random direction at a specified speed. The response of the 1D filters was then phase-shifted to align with a standard 2D direction and averaged to create a template for a particular speed; this procedure collapses across the dimension of direction and so removes any stimulus anisotropies. In total, 21 speed templates were produced, from zero speed (static) to 2 pixels/frame in steps of 0.1 pixels/frame. Global motion templates tuned to the full range of directions at 0.1° intervals were then constructed by phase-shifting each prototype template to the desired direction; as a result, the templates are homogeneous as a function of direction. 
To generate estimates of global 2D motion, motion energy was summed across all apertures. The summed motion energy on each trial was then multiplied with each of the 2D templates to generate a population of 2D filter responses. The final estimate of 2D velocity was selected using a “winner-take-all” algorithm. 
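The read-out stage can be summarized with the minimal MATLAB sketch below. The array sizes, variable names (energySum, templates, dirAxis, speedAxis), and the random stand-in data are illustrative assumptions rather than our implementation; only the structure of the template multiplication and the winner-take-all selection is intended to reflect the text above.

% Minimal sketch of the template-matching and winner-take-all read-out.
nOri = 24;  nSpeed1D = 21;                        % 1D sensor grid: orientation x pseudo-speed
nDir = 36;  nSpeed2D = 16;                        % 2D template grid (much coarser than in the paper)
dirAxis   = 0:360/nDir:360-360/nDir;              % template direction tunings, deg
speedAxis = linspace(0, 1.5, nSpeed2D);           % template speed tunings, fraction of object speed
energySum = rand(nOri, nSpeed1D);                 % stand-in for motion energy summed over apertures
templates = rand(nOri, nSpeed1D, nDir, nSpeed2D); % stand-in for the trial-averaged templates
resp = zeros(nDir, nSpeed2D);
for d = 1:nDir
    for s = 1:nSpeed2D
        % each 2D velocity sensor is the inner product of the pooled 1D energy with its template
        resp(d, s) = sum(sum(energySum .* templates(:, :, d, s)));
    end
end
[~, idx]       = max(resp(:));                    % winner-take-all selection
[bestD, bestS] = ind2sub(size(resp), idx);
fprintf('estimated direction %.1f deg, speed %.2f x object speed\n', dirAxis(bestD), speedAxis(bestS));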
Model details
Motion energy sensors (Adelson & Bergen, 1985) were multiplied with the stimulus in the space and time domain (i.e., not in the Fourier domain). Sensors were centered upon the middle of each aperture and upon the middle frame. The motion direction-selective (DS) sensors and the movies had dimensions of 0.5°, 0.5°, and 0.37 s in x, y, and t (32 by 32 pixels, by 32 frames). Motion energy sensors were constructed from Equation A11 and had a peak sensitivity to spatial structure at 4 c/deg. The spatiotemporal envelope was kept constant across all DS sensors (x, y = 0.2 arcmin, t = 0.1 s; 7 by 7 pixels by 7 frames). This had the advantage of keeping the directional bandwidth constant at ≈45° (half-width at half-height, as measured from the response to spatial-frequency-matched sine-wave gratings), so that the maximal sensor response was identical across all speeds and directions. 
Global motion integrators were tuned to speeds from 0% to 150% of the actual object speed (1 pixel/frame = 1.33°/s) at 10% intervals and to directions around the clock (0–360°) at 0.1° intervals. Such fine spacing was needed because it sets the limit on the precision of the winner-take-all algorithm used to select the "winning" global motion sensor at the final stage. 
No noise, normalization, or gain was incorporated into the model because we wished to explore the “noise” generated by convolving the GM sensors with anisotropic motion energy profiles, without the complexity introduced by such mechanisms. 
The model was tested with both artificial stimuli and the stimuli used in the psychophysical experiment. Testing across both stimulus classes allowed us to assess which features of the model output are due to the underlying mechanisms of the model and which result from stimulus anisotropies. 
Results: Artificial stimuli
Since the peak tuning of the global motion templates was distributed equally across directions, the 2D model should perform optimally when presented with isotropic stimuli. Here we examine how the model performed when presented with anisotropic motion energy profiles. To generate artificial stimuli and relate the analysis to the psychophysical data (Figure 9), we took the motion energy profile for a rigidly moving object (Figure 10a), whose component motions are represented by the white dashed line. Imbalances in motion energy were added in a pairwise manner along the cosine to allow us to relate the model's behavior to the psychophysical results. In Figure 10b, two Gaussian energy profiles have been constructed lying along the cosine (at −70° and +40° away from the veridical direction) that defines the global motion. 
Figures 10d and 10e depict the direction and speed estimates of the model. The results reveal that the model's estimates of direction and speed vary systematically with motion energy imbalances. The bias results are in good qualitative agreement with the psychophysical data, with the directional estimates being drawn toward the motion energy imbalance for Type II combinations and toward the faster component of motion for Type I combinations. The model's speed estimates become biased toward increasingly low speeds as the two component motions move toward slower speeds. Two factors lead to this result. The first is that motion energy away from the orthogonal orientations peaks at progressively lower temporal frequencies, and the "winning" template is therefore shifted toward lower speeds. The second factor is that the local motions of a faster moving object are spread over a greater range of temporal frequencies/speeds than those of a slower moving object. Given that the total motion energy is constant as a function of speed in our derived templates (without any normalization; Figure 11e), the motion energy (or feed-forward weighting) must therefore be more concentrated for templates tuned to slower speeds. To elaborate, two global motions traveling in the same direction but with different speeds (Figures 11a and 11b) overlap substantially at low-to-static temporal frequencies but are distinct at high temporal frequencies (i.e., at orientations orthogonal to the 2D direction of motion). Thus, if orientations orthogonal to the 2D direction of motion are not well represented, there is little to disambiguate competing speed estimates, and global velocity estimates become biased toward slow speeds, as shown in Figure 10e. 
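The construction of these artificial inputs can be illustrated with the short MATLAB sketch below: motion energy is laid out along the cosine constraint and two Gaussian "imbalances" are added on either side of the veridical direction. The bump width and amplitude are assumptions for illustration, not the values used to generate Figure 10.

% Illustrative anisotropic motion-energy profile for the global motion stage.
dir2D = 0;   v2D = 1;                              % veridical 2D direction (deg) and speed
phi   = -90:1:90;                                  % 1D component directions relative to dir2D, deg
v1D   = v2D * cosd(phi);                           % component speed along the cosine constraint
base  = ones(size(phi));                           % isotropic stimulus: even energy along the cosine
bump  = @(mu) 2 * exp(-(phi - mu).^2 / (2*15^2));  % localized energy imbalance (sigma = 15 deg, assumed)
energy = base + bump(-70) + bump(40);              % Type I pairing: imbalances straddle the 2D direction
plot(phi, energy);
xlabel('component direction relative to 2D direction (deg)'); ylabel('motion energy');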
Figure 11
 
Motion energy for an SF band-pass dot moving at (a) 2 pixels/frame or (b) 1 pixel/frame. (c) The difference between (a) and (b). (d) The absolute motion energy difference of (c) collapsed across the speed dimension. (e) The sum of the motion energy over both speed and direction is largely constant regardless of the underlying object speed. However, as (a) is spread over a greater range of speeds than (b), the concentration of motion energy at each direction must be greater in (b). In (d), the greatest difference between the two signals is found in the direction of 2D motion, but (b) has greater motion energy at the overlapping low speeds; this leads to a global motion stage that is biased toward low speeds for stimuli with a weak orthogonal component of motion. (f–h) Insight into the pattern of motion energy shown in the row above. (f) A series of global motions (green dots) whose component motions (blue lines) pass through the velocity tuning of a DS sensor, denoted by the red dot. (g, h) The response of the DS sensor in (f) to (g) a grating or (h) a dot stimulus. Note how the profile of (g) closely follows the red line of (f), but the motion energy in (h) falls with increasing speed. This is because the component motions that pass through the receptive field of the 1D sensor (red dot, f) are more finely spread.
Results: Natural scenes
The global motion model was next tested with the stimuli from the psychophysical experiment; examples of the model output for typical stimuli are shown in Figure 12. This allowed us to repeat the second-order analysis described in the section "Results: Second-order orientation statistics", replacing the observers' direction estimates with the model's. The patterns of bias and variability generated by the model are shown in Figure 13 and are in good qualitative agreement with the observers' data. The results also highlight anisotropies that are present in the observers' responses but not in the model's predictions for the artificial motion energy profiles. 
Figure 12
 
(a) Example trial in which an image rigidly translates in the leftward (180°) direction. (b) The sum of motion energy across all the apertures. (c) Examples of individual apertured image regions and their associated motion energy.
Figure 13
 
(a) Model bias and (b) model variability as a function of the second-order relative orientations of aperture pairings. (c, d) Scatter plots of the model against the observers' (c) bias and (d) variability.
To provide a more robust statistical analysis, the model and observer bias and variability were recalculated using larger bins (10°) along the two relative-orientation dimensions, with no further smoothing applied. This generated 441 independent measures of bias and variability, and Pearson's correlations between the model and the observers were strong and significant for both bias and variability (R = 0.72, p < 0.00001, in each case). 
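A MATLAB sketch of this binned comparison is given below. The bin edges, the number of pairings, and the random stand-in errors are assumptions, and for simplicity a plain (rather than circular) mean is taken within each bin; only the binning and correlation logic reflects the analysis described above.

% Sketch of the 10-deg binning and correlation analysis (assumed bin edges and stand-in data).
edges  = -105:10:105;                              % 21 bins per relative-orientation axis (assumed span)
nPairs = 5000;
oriA   = 210*rand(nPairs,1) - 105;                 % relative orientation in aperture 1 (stand-in)
oriB   = 210*rand(nPairs,1) - 105;                 % relative orientation in aperture 2 (stand-in)
errObs = 10*randn(nPairs,1);                       % observer direction errors, deg (stand-in)
errMod = errObs + 8*randn(nPairs,1);               % model direction errors, deg (stand-in)
[~, ~, binA] = histcounts(oriA, edges);
[~, ~, binB] = histcounts(oriB, edges);
biasObs = accumarray([binA binB], errObs, [21 21], @mean);   % observer bias map, 21 x 21
biasMod = accumarray([binA binB], errMod, [21 21], @mean);   % model bias map, 21 x 21
r = corrcoef(biasObs(:), biasMod(:));              % Pearson correlation over the 441 bins
fprintf('R = %.2f\n', r(1,2));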
Despite this success in predicting the observers' overall bias and variability, the correlation between the model's and the observers' errors on a trial-by-trial basis was very weak (R = 0.025), though significant at p < 0.005 (N = 34,000). Thus, while we are able to model the statistical properties of observers' bias and precision, we are unable to capture observers' trial-by-trial variability. 
General discussion
In the modeling section of this paper, we developed a two-stage, feed-forward template model of global motion processing that putatively reflects visual processes occurring in areas V1 and MT of the primate brain and that is optimally tuned for isotropic stimuli. When confronted with anisotropic natural stimuli, the model exhibits a pattern of errors (bias and precision) similar to that observed in our psychophysical data. As we restricted our analysis to a single spatial frequency channel, the model amounted to fitting a cosine to a motion energy distribution on each trial. However, if the range of motion energy filters were extended to integrate across multiple spatial frequency channels, then the model would be theoretically similar to models that compute 2D velocity by integrating across a plane in spatiotemporal frequency space (e.g., Simoncelli & Heeger, 1998). 
Unlike previous studies of the “aperture problem” (e.g., Amano et al., 2009), the 1D velocities presented to the observer were not under direct experimental control but were determined by the orientation structure of the exposed natural scene and the 2D velocity of translation. Consequently, we used the motion energy model (Adelson & Bergen, 1985) to estimate the distribution of 1D velocities. This is a critical feature of the model, as a cosine-fitting model (as described in the Introduction section) would not produce errors in velocity estimation when presented with a noiseless and discrete description of the 1D velocity distribution. However, when uncertainty is introduced into the model, the motion signals stemming from differently oriented elements are not weighted equally. Instead, the strength of a signal varies with the contrast of stimulus elements and with their orientation relative to the 2D direction of motion. As a result, the motion energy on each trial is not distributed evenly across the cosine and errors arise because estimates of 2D velocity are drawn toward the 1D velocities stemming from high-contrast orientations. As such, the relative orientations of the stimulus elements play a central role in determining estimates of 2D velocity; if the orientation is oblique to the 2D direction, this is problematic because estimates of 2D direction will be drawn toward an oblique direction. In contrast, if the orientation is orthogonal to motion, then this is beneficial to 2D motion judgments because the 1D and 2D vectors are in the same direction. At the other end of the temporal frequency spectrum, static/parallel orientations are highly informative because they constrain motion judgments to one of two directions 180° apart but to an infinite number of speeds. 
Existing evidence that the motion stream is able to utilize parallel orientations when making direction judgments comes from studies using randomly updated Glass patterns. Such stimuli exhibit a consistent static/parallel signal but a noisy and isotropic motion signal. Observers perceive the stimuli as alternately switching between the two directions of motion consistent with the static/parallel orientations (Ross, Badcock, & Hayes, 2000), while the inclusion of a small percentage of coherently moving dots (∼10%) can stabilize the motion percept (Ross, 2004). Recent neurophysiological research using Glass patterns has revealed that MT and MST cells (commonly believed to underlie 2D motion perception) integrate across both moving and static signals (Krekelberg, Dannenberg, Hoffmann, Bremmer, & Ross, 2003). 
Related work has also indicated that motion streaks play a role in motion processing (Apthorp & Alais, 2009; Burr & Ross, 2002; Geisler, 1999): motion streaks refer to the response induced by a stimulus moving within the integration period of a mechanism tuned for static image structure. Typically, the residual signal is oriented parallel to the direction of motion. To take advantage of motion streaks, Geisler (1999) proposed that motion information is integrated across two filters: one tuned to the orientation orthogonal to the direction of motion and a second, static filter tuned to the parallel orientation, designed to detect the signal from the proposed motion streaks. As this pattern of integration is common to the cosine-fitting model presented in this work and to existing models that integrate across a plane in spatiotemporal space (both approaches assume signal isotropy), we predict that motion streaks will enhance direction estimates from such models. In contrast, as the Vector Average model weights the contribution of each orientation as a cosine function of the relative orientation, the model effectively ignores orientations parallel to motion and is unlikely to be able to account for the current data, the data concerning streaks, or the data using limited-lifetime Glass patterns. 
Although the present model predicts observers' overall bias and variability as a function of the second-order orientation statistics, the model is only weakly (but significantly) able to capture observers' trial-by-trial variability. This may reflect a number of factors. The first is that the relative orientation of aperture patches may not be the main cause of observers' trial-by-trial variability: observer variability (σ) was around 10–13°, and given that the observed biases as a function of relative orientation (Figure 7, row two) are generally smaller (around 3–6°), it may be that observers' stochastic response variability simply swamps the predictable variability caused by motion energy imbalances, producing only a weak correlation. A second contributory factor could be that the model operates in a homogeneous manner as a function of direction, whereas the psychophysical data exhibit a number of anisotropies, such as the oblique effect (Dakin, Mareschal, & Bex, 2005a; Gros et al., 1998), cardinal attraction (Loffler & Orbach, 2001), and reference repulsion (Rauber & Treue, 1998), which are not implemented in the model. A third reason why the model is only weakly correlated with observers' trial-by-trial variation may be the lack of any gain control in the model; gain control may serve to normalize relative energy across the natural scenes (Mante et al., 2005). This is particularly pertinent because both the psychophysical data and the model reported in this paper demonstrate how imbalances in the energy across the orientation structure of the scene can lead to systematic errors in the estimation of 2D motion. 
One critique of the current work and related work using the “bubbles” paradigm (Gosselin & Schyns, 2001) is that the application of a mask can distort both the orientation and the spatial frequency characteristics of the stimulus. This critique is fair; in particular, the mean orientation of a sensor bank whose spatial extent overlaps with the aperture boundary may not reflect the orientation of the underlying scene but the combined orientation of the aperture and the natural scene. Indeed, the spatial configuration of the aperture has been shown to affect the perceived 2D direction of a moving carrier (Castet & Zanker, 1999; Shimojo, Silverman, & Nakayama, 1989). To address this critique, ongoing work (Kane, Bex, & Dakin, 2010) has used a global Gabor stimulus (Amano et al., 2009). This stimulus is composed of randomly oriented Gabor elements, whose temporal frequency is consistent with a given 2D velocity. The use of this stimulus allows for direct control of the orientation content of the stimulus and also avoids complications associated with placing an aperture over the scene. 
Appendix A
Scene statistics
In our scene analysis, the orientation statistics of natural scenes were estimated by convolution with a bank of log-Gabor filters tuned to orientations between 0 and 165° at 15° intervals. The log-Gabor filters were constructed in the Fourier domain (Field, 1987), and the natural scenes were transformed into the Fourier domain using Matlab's fft2 function. The product of the log Gabor and the natural scene was calculated in the frequency domain, and the result was transformed back to the spatial domain using Matlab's ifft2 function. This procedure is equivalent to convolving the filter and the natural scene in the spatial domain. 
Each log Gabor G was constructed in the Fourier domain and was defined by
\[ G = R(f_{xy})\, O(\theta_{xy}), \]
(A1)
where R(f_xy) specifies the spatial frequency profile of the sensor and O(θ_xy) specifies the orientation tuning of the sensor, with f_xy denoting the spatial frequency and θ_xy the orientation of each point in the Fourier domain. 
R(f_xy) is defined as
\[ R(f_{xy}) = \exp\!\left( \frac{-\left[\ln\!\left(f_{xy}/f_{\mathrm{peak}}\right)\right]^{2}}{2\left[\ln\!\left(\sigma/f_{xy}\right)\right]^{2}} \right), \]
(A2)
where f_peak is the filter's central frequency and σ is the ratio between the filter's central frequency and the standard deviation of the log Gaussian, which was set to 0.65. 
O(θ_xy) is defined in Equation A3 and is an angular Gaussian function, where ϕ (defined in Equation A4) is the angular separation between the orientation tuning of the sensor θ_peak and the orientation of each pixel in the Fourier domain:
\[ O(\theta_{xy}) = \exp\!\left( \frac{-\phi^{2}}{2\sigma_{\theta}^{2}} \right), \]
(A3)
\[ \phi = \left| \operatorname{atan2}\!\left( \sin(\theta_{xy} - \theta_{\mathrm{peak}}),\ \cos(\theta_{xy} - \theta_{\mathrm{peak}}) \right) \right|. \]
(A4)
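For concreteness, a minimal MATLAB sketch of one such filter is given below. The image size, peak frequency, and bandwidths are illustrative values, and the radial term is written in the common log-Gabor parameterisation in which σ is the bandwidth ratio; this is a sketch of the procedure in Equations A1–A4, not our exact implementation.

% Sketch of one log-Gabor filter constructed in the Fourier domain (cf. Equations A1-A4).
n      = 256;                           % image size, pixels (illustrative)
fpeak  = 0.1;                           % peak spatial frequency, cycles/pixel (illustrative)
sigma  = 0.65;                          % bandwidth ratio of the radial log Gaussian
thPeak = 30*pi/180;                     % orientation tuning of the sensor, rad
sigTh  = 20*pi/180;                     % angular bandwidth sigma_theta, rad (illustrative)
[fx, fy] = meshgrid((-n/2:n/2-1)/n);    % Fourier-domain coordinates, cycles/pixel
fxy  = sqrt(fx.^2 + fy.^2);             % radial frequency of every Fourier-domain point
fxy(n/2+1, n/2+1) = 1;                  % avoid log(0) at DC
thxy = atan2(fy, fx);                   % orientation of every Fourier-domain point
R = exp(-(log(fxy/fpeak)).^2 ./ (2*log(sigma)^2));             % radial profile (cf. Equation A2)
phiSep = abs(atan2(sin(thxy - thPeak), cos(thxy - thPeak)));   % wrapped angular separation (Equation A4)
O = exp(-phiSep.^2 / (2*sigTh^2));                             % angular profile (Equation A3)
G = R .* O;                                                    % the log Gabor (Equation A1)
G(n/2+1, n/2+1) = 0;                                           % no DC response
img      = randn(n);                                           % stand-in for a natural image
filtered = real(ifft2(ifftshift(G) .* fft2(img)));             % Fourier-domain product = spatial convolution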
Once the energy at each orientation had been calculated, the sum of the orientation energy, the mean absolute orientation θ̄, and the orientation variance were calculated. This was done on a pixel-by-pixel basis for the calculations reported in Results: Relative orientation and orientation variance, and on an aperture-by-aperture basis for the calculations reported in Results: Second-order orientation statistics. The mean orientation θ̄ is calculated by Equation A5, where θ is the orientation of each filter and E_θ is the filter output:
\[ \bar{\theta} = \tfrac{1}{2}\,\operatorname{atan2}\!\left( \sum_{\theta} \sin(2\theta)\,E_{\theta},\ \sum_{\theta} \cos(2\theta)\,E_{\theta} \right). \]
(A5)
The orientation variance was calculated from the following equations:
\[ R^{2} = \frac{\left(\sum_{\theta}\sin(2\theta)\,E_{\theta}\right)^{2} + \left(\sum_{\theta}\cos(2\theta)\,E_{\theta}\right)^{2}}{\left(\sum_{\theta}E_{\theta}\right)^{2}}, \]
(A6)
\[ V = 1 - R. \]
(A7)
On each trial, the mean orientation of a pixel or aperture was converted to a relative orientation term by calculating the angular separation between the 2D direction and the mean orientation of a pixel: 
\[ \theta_{\mathrm{relative}} = \tan^{-1}\!\left( \frac{\sin(\theta_{2D} - \bar{\theta})}{\cos(\theta_{2D} - \bar{\theta})} \right). \]
(A8)
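The orientation statistics of Equations A5–A8 reduce, for a single pixel or aperture, to a few lines of MATLAB. The sketch below uses a random stand-in for the filter outputs and is intended only to make the doubling of angles and the circular-variance computation explicit; it is not our analysis code.

% Orientation statistics for a single pixel or aperture (cf. Equations A5-A8).
th = (0:15:165) * pi/180;              % filter orientations, rad
E  = rand(size(th));                   % stand-in for the filter outputs E_theta
S  = sum(sin(2*th) .* E);              % angles are doubled so that 0 and 180 deg are equivalent
C  = sum(cos(2*th) .* E);
meanOri = 0.5 * atan2(S, C);           % mean orientation (Equation A5), rad
R       = sqrt(S^2 + C^2) / sum(E);    % resultant length R (from Equation A6)
V       = 1 - R;                       % orientation variance (Equation A7): 0 = a single orientation
dir2D   = 45*pi/180;                   % 2D direction of motion on this trial, rad
relOri  = atan(sin(dir2D - meanOri) / cos(dir2D - meanOri));   % relative orientation (Equation A8)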
The 1D motion energy (Adelson & Bergen, 1985) filters were constructed in the spatial domain and were the product of a Gaussian envelope and a carrier signal:
\[ DG = G(x,y,t)\, S(x,y,t). \]
(A9)
The Gaussian envelope was centered upon the middle frame and upon the middle of each aperture on each frame, where (x_a, y_a) is the center of each aperture and t_m is the middle frame:
\[ G(x,y,t) = \exp\!\left(\frac{-(x-x_{a})^{2}}{2\sigma_{x}^{2}}\right) \exp\!\left(\frac{-(y-y_{a})^{2}}{2\sigma_{y}^{2}}\right) \exp\!\left(\frac{-(t-t_{m})^{2}}{2\sigma_{t}^{2}}\right). \]
(A10)
The carrier signal was a sinusoidal modulation in x and y with a wavelength λ_spatial and an orientation θ. The phase was shifted on each frame by Δλ_temporal:
\[ S(x,y,t) = \sin\!\left( \frac{2\pi}{\lambda_{\mathrm{spatial}}}\left(\sin(\theta)\,x + \cos(\theta)\,y\right) + \Delta\lambda_{\mathrm{temporal}}\,t + \lambda_{\mathrm{phase}} \right), \]
(A11)
where λ_phase = 0 for even-phase sensors and λ_phase = π/2 for odd-phase sensors. 
The phase shift per frame Δλ was calculated from the desired pseudo-speed tuning ϕ_1D of each local motion sensor, given the spatial frequency of the sensor, using
\[ t_{\mathrm{freq}} = \phi_{1D}\, s_{\mathrm{freq}}, \]
(A12)
\[ \Delta\lambda = 2\pi\, t_{\mathrm{freq}}. \]
(A13)
Finally, the motion energy at each point was taken to be the square root of the sum of the squares of the odd- and even-phased responses to generate a phase-invariant output (Adelson & Bergen, 1985):
\[ E = \sqrt{G_{\mathrm{even}}^{2} + G_{\mathrm{odd}}^{2}}. \]
(A14)
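A compact MATLAB sketch of one such sensor (Equations A9–A14) follows. The sensor support, envelope widths, carrier parameters, and the random stand-in stimulus are illustrative assumptions rather than the values used in the model.

% One space-time motion-energy sensor (cf. Equations A9-A14); parameter values are illustrative.
n = 32;  nt = 32;                                % sensor support, pixels and frames
[x, y, t] = ndgrid(1:n, 1:n, 1:nt);
xa = (n+1)/2;  ya = (n+1)/2;  tm = (nt+1)/2;     % sensor centered in space and time
sx = 3.5;  sy = 3.5;  st = 3.5;                  % envelope SDs in pixels and frames (assumed)
G = exp(-((x-xa).^2)/(2*sx^2)) .* exp(-((y-ya).^2)/(2*sy^2)) .* exp(-((t-tm).^2)/(2*st^2));  % Equation A10
lambda = 16;                                     % carrier wavelength, pixels (about 4 c/deg at 64 pixels/deg)
theta  = 30*pi/180;                              % carrier orientation, rad
v1D    = 1;                                      % desired pseudo-speed, pixels/frame
tfreq  = v1D * (1/lambda);                       % Equation A12: temporal frequency, cycles/frame
dLam   = 2*pi*tfreq;                             % Equation A13: phase shift per frame, rad
carrier = @(ph) sin((2*pi/lambda)*(sin(theta)*x + cos(theta)*y) + dLam*t + ph);  % Equation A11
DGeven = G .* carrier(0);                        % even-phase sensor (Equation A9)
DGodd  = G .* carrier(pi/2);                     % odd-phase sensor
stim = randn(n, n, nt);                          % stand-in for an apertured movie patch
E = sqrt(sum(DGeven(:).*stim(:))^2 + sum(DGodd(:).*stim(:))^2);   % Equation A14: phase-invariant energy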
 
Acknowledgments
This work was funded by the Wellcome Trust. Supported by R01 EY 019281 and R01 EY 018664. 
Commercial relationships: none. 
Corresponding author: David Kane. 
Email: davidkane@berkeley.edu. 
Address: 11‐43 Bath Street, London EC1V 9EL, UK. 
References
Adelson E. H. Bergen J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2, 284–299. [CrossRef]
Adelson E. H. Movshon J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523–525. [CrossRef] [PubMed]
Amano K. Edwards M. Badcock D. R. Nishida S. y. (2009). Adaptive pooling of visual motion signals by the human visual system revealed with a novel multi-element stimulus. Journal of Vision, 9(3):4, 1–25, http://www.journalofvision.org/content/9/3/4, doi:10.1167/9.3.4. [PubMed] [Article] [CrossRef] [PubMed]
Apthorp D. Alais D. (2009). Tilt aftereffects and tilt illusions induced by fast translational motion: Evidence for motion streaks. Journal of Vision, 9(1):27, 1–11, http://www.journalofvision.org/content/9/1/27, doi:10.1167/9.1.27. [PubMed] [Article] [CrossRef] [PubMed]
Attneave F. (1954). Some informational aspects of visual perception. Psychological Review, 61, 183–193. [CrossRef] [PubMed]
Barlow H. (1961). Possible principles underlying the transformation of sensory messages. In Rosenblith W. A. (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.
Basole A. White L. E. Fitzpatrick D. (2003). Mapping multiple features in the population response of visual cortex. Nature, 423, 986–990. [CrossRef] [PubMed]
Bowns L. (1996). Evidence for a feature tracking explanation of why type II plaids move in the vector sum direction at short durations. Vision Research, 36, 3685–3694. [CrossRef] [PubMed]
Braddick O. J. Wishart K. A. Curran W. (2002). Directional performance in motion transparency. Vision Research, 42, 1237–1248. [CrossRef] [PubMed]
Brainard D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436. [CrossRef] [PubMed]
Burke D. Wenderoth P. (1993). The effect of interactions between one-dimensional component gratings on two-dimensional motion perception. Vision Research, 33, 343–350. [CrossRef] [PubMed]
Burr D. C. Ross J. (2002). Direct evidence that “Speedlines” influence motion mechanisms. Journal of Neuroscience, 22, 8661–8664. [PubMed]
Campbell F. W. Kulikowski J. J. Levinson J. (1966). The effect of orientation on the visual resolution of gratings. The Journal of Physiology, 187, 427–436. [CrossRef] [PubMed]
Castet E. Zanker J. (1999). Long-range interactions in the spatial integration of motion signals. Spatial Vision, 12, 287–307. [CrossRef] [PubMed]
Chauvin A. Worsley K. J. Schyns P. G. Arguin M. Gosselin F. (2005). Accurate statistical tests for smooth classification images. Journal of Vision, 5(9):1, 659–667, http://www.journalofvision.org/content/5/9/1, doi:10.1167/5.9.1. [PubMed] [Article] [CrossRef] [PubMed]
Dakin S. C. Apthorp D. Alais D. (2010). Anisotropies in judging the direction of moving natural scenes. Journal of Vision, 10(11):5, 1–19, http://www.journalofvision.org/content/10/11/5, doi:10.1167/10.11.5. [PubMed] [Article] [CrossRef] [PubMed]
Dakin S. C. Mareschal I. Bex P. J. (2005a). An oblique effect for local motion: Psychophysics and natural movie statistics. Journal of Vision, 5(10):9, 878–887, http://www.journalofvision.org/content/5/10/9, doi:10.1167/5.10.9. [PubMed] [Article] [CrossRef]
Dakin S. C. Mareschal I. Bex P. J. (2005b). Local and global limitations on direction integration assessed using equivalent noise analysis. Vision Research, 45, 3027–3049. [CrossRef]
De Valois K. K. De Valois R. L. Yund E. W. (1979). Responses of striate cortex cells to grating and checkerboard patterns. The Journal of Physiology, 291, 483–505. [CrossRef] [PubMed]
Eckstein M. P. Ahumada A. J.Jr. (2002). Classification images: A tool to analyze visual strategies. Journal of Vision, 2(1):i, 1x, http://www.journalofvision.org/content/2/1/i, doi:10.1167/2.1.i. [PubMed] [Article] [CrossRef]
Ferrera V. P. Wilson H. R. (1990). Perceived direction of moving two-dimensional patterns. Vision Research, 30, 273–287. [CrossRef] [PubMed]
Field D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4, 2379–2394. [CrossRef]
Geisler W. S. (1999). Motion streaks provide a spatial code for motion direction. Nature, 400, 65–69. [CrossRef] [PubMed]
Gosselin F. Schyns P. G. (2001). Bubbles: A technique to reveal the use of information in recognition tasks. Vision Research, 41, 2261–2271. [CrossRef] [PubMed]
Greenwood J. A. Edwards M. (2006). An extension of the transparent-motion detection limit using speed-tuned global-motion systems. Vision Research, 46, 1440–1449. [CrossRef] [PubMed]
Gros B. L. Blake R. Hiris E. (1998). Anisotropies in visual motion perception: A fresh look. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 15, 2003–2011. [CrossRef] [PubMed]
Heeley D. W. Buchanan-Smith H. M. (1992). Directional acuity for drifting plaids. Vision Research, 32, 97–104. [CrossRef] [PubMed]
Hubel D. H. Wiesel T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195, 215–243. [CrossRef] [PubMed]
Jazayeri M. Movshon J. A. (2007). A new perceptual illusion reveals mechanisms of sensory decoding. Nature, 446, 912–915. [CrossRef] [PubMed]
Kane D. Bex P. Dakin S. (2010). Humans assume isotropic orientation structure when solving the “Aperture problem” for motion [Abstract]. Journal of Vision, 10(7):832, 832a, http://www.journalofvision.org/content/10/7/832, doi:10.1167/10.7.832. [CrossRef]
Krekelberg B. Dannenberg S. Hoffmann K. P. Bremmer F. Ross J. (2003). Neural correlates of implied motion. Nature, 424, 674–677. [CrossRef] [PubMed]
Loffler G. Orbach H. S. (2001). Anisotropy in judging the absolute direction of motion. Vision Research, 41, 3677–3692. [CrossRef] [PubMed]
Lorenceau J. (1998). Veridical perception of global motion from disparate component motions. Vision Research, 38, 1605–1610. [CrossRef] [PubMed]
Lorenceau J. Shiffrar M. (1992). The influence of terminators on motion integration across space. Vision Research, 32, 263–273. [CrossRef] [PubMed]
Lorenceau J. Shiffrar M. Wells N. Castet E. (1993). Different motion sensitive units are involved in recovering the direction of moving lines. Vision Research, 33, 1207–1217. [CrossRef] [PubMed]
Mante V. Carandini M. (2005). Mapping of stimulus energy in primary visual cortex. Journal of Neurophysiology, 94, 788–798. [CrossRef] [PubMed]
Mante V. Frazor R. A. Bonin V. Geisler W. S. Carandini M. (2005). Independence of luminance and contrast in natural scenes and in the early visual system. Nature Neuroscience, 8, 1690–1697. [CrossRef] [PubMed]
Mardia K. Jupp P. (1972). Directional statistics. Wiley Series in Probability and Statistics. Chichester, UK: Wiley.
Masson G. S. Rybarczyk Y. Castet E. Mestre D. R. (2000). Temporal dynamics of motion integration for the initiation of tracking eye movements at ultra-short latencies. Visual Neuroscience, 17, 753–767. [CrossRef] [PubMed]
Meng X. Qian N. (2005). The oblique effect depends on perceived, rather than physical, orientation and direction. Vision Research, 45, 3402–3413. [CrossRef] [PubMed]
Mingolla E. Todd J. T. Norman J. F. (1992). The perception of globally coherent motion. Vision Research, 32, 1015–1031. [CrossRef] [PubMed]
Movshon J. A. Adelson E. H. Gizzi M. S. Newsome W. T. (1985). The analysis of moving visual patterns. In Chagas C. Gattass R. Gross C. (Eds.), Experimental brain research supplementum II: Pattern recognition mechanisms (vol. 54, pp. 117–151). (Reprinted in Experimental Brain Research, Supplementum, vol. 11, pp. 117–151, 1986).
Pack C. C. Born R. T. (2001). Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. Nature, 409, 1040–1042. [CrossRef] [PubMed]
Pelli D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. [CrossRef] [PubMed]
Pelli D. G. Zhang L. (1991). Accurate control of contrast on microcomputer displays. Vision Research, 31, 1337–1350. [CrossRef] [PubMed]
Rauber H. J. Treue S. (1998). Reference repulsion when judging the direction of visual motion. Perception, 27, 393–402. [CrossRef] [PubMed]
Ross J. (2004). The perceived direction and speed of global motion in Glass pattern sequences. Vision Research, 44, 441–448. [CrossRef] [PubMed]
Ross J. Badcock D. R. Hayes A. (2000). Coherent global motion in the absence of coherent velocity signals. Current Biology, 10, 679–682. [CrossRef] [PubMed]
Rubin N. Hochstein S. (1993). Isolating the effect of one-dimensional motion signals on the perceived direction of moving two-dimensional objects. Vision Research, 33, 1385–1396. [CrossRef] [PubMed]
Schrater P. R. Knill D. C. Simoncelli E. P. (2000). Mechanisms of visual motion detection. Nature Neuroscience, 3, 64–68. [CrossRef] [PubMed]
Shimojo S. Silverman G. H. Nakayama K. (1989). Occlusion and the solution to the aperture problem for motion. Vision Research, 29, 619–626. [CrossRef] [PubMed]
Simoncelli E. P. Heeger D. J. (1998). A model of neuronal responses in visual area MT. Vision Research, 38, 743–761. [CrossRef] [PubMed]
Snowden R. J. Treue S. Andersen R. A. (1992). The response of neurons in areas V1 and MT of the alert rhesus monkey to moving random dot patterns. Experimental Brain Research, 88, 389–400. [CrossRef] [PubMed]
Switkes E. Mayer M. J. Sloan J. A. (1978). Spatial frequency analysis of the visual environment: Anisotropy and the carpentered environment hypothesis. Vision Research, 18, 1393–1399. [CrossRef] [PubMed]
van Hateren J. H. van der Schaaf A. (1998). Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society B: Biological Sciences, 265, 359–366. [CrossRef]
Wallach H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung, 20, 325–380.
Wilson H. R. Ferrera V. P. Yo C. (1992). A psychophysically motivated model for two-dimensional motion perception. Visual Neuroscience, 9, 79–97. [CrossRef] [PubMed]
Wilson H. R. Kim J. (1994). Perceived motion in the vector sum direction. Vision Research, 34, 1835–1842. [CrossRef] [PubMed]
Yo C. Wilson H. R. (1992). Perceived direction of moving two-dimensional patterns depends on duration, contrast and eccentricity. Vision Research, 32, 135–147. [CrossRef] [PubMed]