Research Article  |   September 2009
The aperture problem in contoured stimuli
Journal of Vision September 2009, Vol.9, 13. doi:https://doi.org/10.1167/9.10.13

      David Kane, Peter J. Bex, Steven C. Dakin; The aperture problem in contoured stimuli. Journal of Vision 2009;9(10):13. https://doi.org/10.1167/9.10.13.

Abstract

A moving object elicits responses from V1 neurons tuned to a broad range of locations, directions, and spatiotemporal frequencies. Global pooling of such signals can overcome their intrinsic ambiguity in relation to the object's direction/speed (the “aperture problem”); here we examine the role of low-spatial frequencies (SF) and second-order statistics in this process. Subjects made a 2AFC fine direction-discrimination judgement of ‘naturally’ contoured stimuli viewed rigidly translating behind a series of small circular apertures. This configuration allowed us to manipulate the scene by randomly switching which portion of the stimulus was presented behind each aperture or by occluding certain spatial frequency bands. We report that global motion integration is (a) largely insensitive to the second-order statistics of such stimuli and (b) is rigidly broadband even in the presence of a disrupted low SF component.

Introduction
Motion perception is an essential component of vision within complex natural environments, and has been a focus of study for more than a century (for review see Bradley & Goyal, 2008). However, the computational issues that motion perception raises, and how these might be addressed in terms of neural modeling, have proven difficult to overcome. Our present understanding rests on the notion that motion is initially encoded by directionally selective neurons in cortical area V1, each operating within a narrow range of spatio-temporal frequencies and over a limited area of space (Anderson & Burr, 1987; Hubel & Wiesel, 1968). Populations of such neurons (collectively tuned to a broad range of spatial and temporal frequencies) operate in parallel across the visual field. While such highly spatially-localized and specific signals accurately reflect local retinal motion (i.e. over a small area), they fail to reflect global motion (i.e. of larger moving objects), which elicits responses from neurons tuned to a broad range of spatial frequencies (SF), directions and spatial locations. Consequently, local motion signals must be combined to signal global motion. The present work explores the global pooling of local signals in terms of the integration of signals across SF channels (Experiments 1, 2, and 3) and the role of second-order spatial statistics (Experiment 2). 
In order to effectively encode motion in the natural environment, the integration of V1 signals must reflect the correlations that exist across space and time. We focus on two statistical regularities, which we review in turn. The first concerns the second-order correlations within the contours of natural scenes; the second concerns the spatial alignment of structure across spatial frequency (SF) channels (Attneave, 1954; Barlow, 1961). 
Contour structure
Natural scenes contain a preponderance of edges (Attneave, 1954; Barlow, 1961) whose properties tend to vary smoothly across a scene, a characteristic termed ‘good continuity’ by the Gestalt psychologists (Wertheimer, 1958). More formally, the relationship has been defined in terms of the probability that one edge point predicts the occurrence of another edge point at a given distance, orientation difference (ϕ) and contour angle (θ) (Geisler, Perry, Super, & Gallogly, 2001). Broadly speaking, the smaller the distance, ϕ and θ, the more likely one is to encounter another edge point. Psychophysicists have examined if and how the visual system exploits such regularities using paradigms in which small oriented elements (typically Gabors) are used to build contours with particular second-order relations (e.g. co-circularity), which are then embedded in a field of randomly-oriented distracter elements (e.g. Field, Hayes, & Hess, 1993). In this paradigm, contour detection must involve global integration since it operates over spatial distances and across spatial phase in a manner that could not be achieved by conventional V1 neurons (Hess & Dakin, 1997). Sensitivity to contours has been shown to increase with lower curvature (smaller ϕ & θ) and contour length (Field et al., 1993), consistent with the statistics of natural scenes. 
While it is clear that the second-order distribution of orientations across the visual field is critical for determining our ability to see static extended contours, the role of such statistics in motion processing is less clear. Second-order orientation statistics can certainly influence motion processing when the underlying elements are locally ambiguous. Lorenceau and Shiffrar (1992) demonstrated that the perceived directions of four moving bars (Figure 1) can be dramatically altered by changing the appearance of occluding elements. Although the bars in Figures 1a and 1b move in an identical fashion (sinusoidally translating in the direction perpendicular to their orientation), the perceived directions of motion are different. In Figure 1a the bars are perceived to move as independent pairs, but when the occluders are present in Figure 1b, the individual components ‘cohere’ and appear to move as a rotating diamond whose vertices are occluded. This dramatic change in percept is thought to arise from a change in the classification of the end points from ‘intrinsic’ (i.e. part of the object) to ‘extrinsic’ (arising from occlusion by another object). This argument is intuitive. When the endpoints are considered part of the object eliciting motion there is only one physically realistic interpretation: independent motion. However, if the endpoints are due to an occluding object, the motion signal generated at the intercept bears no relation to object motion. In isolation, this leaves the speed and direction (velocity) of each bar ambiguous, potentially consistent with an infinite range of speeds (as shown in Figure 1d). However, by combining multiple vectors, a unique solution can be found where the constraint lines meet, known as the intersection of constraints (IOC) solution (Movshon, Adelson, Gizzi, & Newsome, 1986). 
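The constraint-line argument can be made concrete with a small numerical sketch. It is not taken from the study: the function name `ioc_velocity`, the bar orientations and the object velocity are all hypothetical, and the least-squares solve is simply one convenient way to find where two or more constraint lines meet. Each bar with unit normal n and measured normal speed s constrains the object velocity v to satisfy n · v = s.

```python
import numpy as np

def ioc_velocity(normal_dirs_deg, normal_speeds):
    """Least-squares intersection of the constraint lines n_i . v = s_i.

    normal_dirs_deg: direction of each bar's normal (perpendicular to the
                     bar's orientation), in degrees from the x-axis.
    normal_speeds:   speed measured along each normal.
    """
    theta = np.deg2rad(np.asarray(normal_dirs_deg, dtype=float))
    normals = np.column_stack([np.cos(theta), np.sin(theta)])
    speeds = np.asarray(normal_speeds, dtype=float)
    velocity, *_ = np.linalg.lstsq(normals, speeds, rcond=None)
    return velocity

# Hypothetical example: an object translating straight up at 1 deg/s,
# seen only through two bars whose normals point at 45 and 135 degrees.
true_velocity = np.array([0.0, 1.0])
bar_normals_deg = np.array([45.0, 135.0])
theta = np.deg2rad(bar_normals_deg)

# Each bar only reveals the component of motion along its own normal.
measured_speeds = np.cos(theta) * true_velocity[0] + np.sin(theta) * true_velocity[1]

recovered = ioc_velocity(bar_normals_deg, measured_speeds)
print(recovered)
```

With a single bar the system is underdetermined, which is the ambiguity illustrated in Figure 1c; with two non-parallel bars the rigid-translation velocity is recovered.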
Figure 1
 
The influence of form on motion integration (Lorenceau & Shiffrar, 1992). The movement of the bars is identical in (a) and (b) (sinusoidally translating in the direction perpendicular to their orientation). In (a) the bars appear to move independently of each other, but in (b), when the apertures are made explicit, the individual components ‘cohere’ and appear to move in directions consistent with a rotating diamond. (c) The ambiguity associated with a moving bar. The exact speed and direction (velocity) of the bar is unknown; however, the veridical velocity must fall on a ‘constraint’ line that can be inferred from the speed perpendicular to the bar's orientation, as shown in (d). By solving for two or more such lines, a unique vector can be found; in the case of a rigid object moving in 2D space this vector reflects the veridical velocity.
The manner in which occlusion affects motion processing is consistent with recent research demonstrating that one-dimensional (1D) and two-dimensional (2D) signals are treated quite differently by the motion-processing stream. By measuring the perceived direction of multiple Gabor stimuli, Amano, Edwards, Badcock, and Nishida (2009) have shown that integration of 1D plaids occurs in a manner consistent with the IOC rule, whilst integration of 2D plaids produces answers in line with predictions from a vector average (VA) rule. Furthermore, Bowns and Alais (2006) have shown that adaptation to stimuli yielding a VA solution generates a large shift in the perceived direction towards the IOC interpretation and vice versa. Such adaptation suggests that the two solutions operate independently and compete to determine the overall percept of motion. Although differential processing of 1D and 2D stimuli has been demonstrated in both the psychophysical (Amano et al., 2009) and neurophysiological (Adelson & Movshon, 1982; Albright, 1984) literature, it remains unclear whether the aperture problem is solved locally (e.g. IOC; Adelson & Movshon, 1982; Simoncelli & Heeger, 1998), by the non-Fourier component of motion (Wilson, Ferrera, & Yo, 1992) or by the change in the position of features over time (Bowns, 1996). 
This uncertainty also abounds in the interpretation of ‘component’ and ‘pattern’ cells in motion-sensitive areas of primate and feline brains. While the majority of V1 neurons are selective for the motion of the individual sine-wave components of moving plaids, the majority of cells in area MT respond to the direction of the plaid (Albright, 1984; Movshon et al., 1986). Again, it is unclear what mechanism mediates the response of the ‘pattern’ selective cells and a number of models have been proposed (Adelson & Bergen, 1985; Adelson & Movshon, 1982; Perrone, 2004; Rust, Mante, Simoncelli, & Movshon, 2006; Simoncelli & Heeger, 1998; Wilson et al., 1992). Recent research by Majaj, Carandini, and Movshon (2007) has shown that the ‘pattern’ selectivity of MT cells is contingent upon local interactions, because ‘pattern’ selectivity is only observed when the two components overlap within the receptive field of MT neurons. If two components are spatially separated, the response profiles of MT ‘pattern’ cells are similar to those of ‘component’ cells. This suggests that the response profiles of ‘pattern’ cells are mediated by local 2D cues, not by the integration of spatially disparate 1D cues (e.g. Figure 1b). 
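To illustrate why the IOC and VA rules can be told apart, the following sketch computes the vector average of the component (normal) velocities for a rigidly translating pattern. The helper names and the two component orientations are hypothetical and chosen only so that both normals fall on the same side of the true direction, where the two rules disagree most clearly; it does not reproduce any stimulus from the cited studies.

```python
import numpy as np

def normal_velocities(true_v, normal_dirs_deg):
    """Velocity vector seen for each 1D component: the projection of the
    true velocity onto that component's unit normal."""
    theta = np.deg2rad(np.asarray(normal_dirs_deg, dtype=float))
    normals = np.column_stack([np.cos(theta), np.sin(theta)])
    speeds = normals @ np.asarray(true_v, dtype=float)
    return normals * speeds[:, None]

def vector_average(vectors):
    """Vector average (VA) rule: the mean of the component velocity vectors."""
    return np.mean(vectors, axis=0)

def direction_deg(v):
    """Direction of a 2D vector in degrees from the x-axis."""
    return np.rad2deg(np.arctan2(v[1], v[0]))

# Hypothetical pattern moving straight up (90 deg), with component normals
# at 50 and 80 deg, i.e. both to one side of the true direction.
true_v = np.array([0.0, 1.0])
components = normal_velocities(true_v, [50.0, 80.0])
va = vector_average(components)

print(direction_deg(true_v))  # IOC recovers the true direction for a rigid pattern
print(direction_deg(va))      # VA is pulled towards the component normals
```

In this configuration the VA direction lies between the two component normals and therefore away from the true (IOC) direction, which is the kind of discrepancy that lets perceived direction distinguish the two rules.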
Co-incidence of structure across spatial frequencies
As well as the second-order spatial regularities discussed so far, natural scenes have the tendency for content across spatial frequencies to be spatially aligned (Attneave, 1954; Barlow, 1961). The early decomposition of retinal signals cannot fully encapsulate this property, as the spatial frequency tuning of early channels is too narrow (Anderson & Burr, 1987; Blakemore & Campbell, 1969; Hubel & Wiesel, 1968). Accordingly, signals must be recombined to achieve the broad SF tuning observed in the integration of both static (Dakin & Hess, 1998) and moving contours (Bex & Dakin, 2003; Ledgeway & Hess, 2002, 2006; Ledgeway, Hess, & Geisler, 2005). Such broadband integration is not without danger: an inflexible integration mechanism increases the risk that inappropriate or noisy signals may be integrated. Variation in the extent of integration across scale has been shown in static contour tasks, with contour integration being spatially broadband along straight elements but narrowband at areas of high curvature (Dakin & Hess, 1998). Functionally, this arrangement should reduce the impact of noise by integrating where the signals are likely to be the same across scale (straight edges) but selectively integrating when the signal will vary across frequency (curved edges). 
In the motion domain, global integration has been shown to be broadband in detection tasks (Bex & Dakin, 2002) and in motion after effects (MAE) when isotropic flickering test stimuli are employed (Ashida & Osaka, 1994; von Grunau & Dube, 1992). For instance, while participants are unable to detect the motion of locally band-pass dots whose spatial frequency content does not overlap (Bex & Dakin, 2003; Ledgeway, 1996), the perception of global motion (rotation, translation & expansion) can be masked by noise elements at spatial frequencies that are remote from signal elements (Bex & Dakin, 2002). This suggests that global motion detectors are not only SF broadband but are unable to selectively tune their input with respect to the stimulus type at hand. Such ‘rigid’ integration has also been observed in the orientation domain, where Schrater, Knill, and Simoncelli (2000) found that thresholds for a signal embedded in white noise are near optimal when the energy is uniformly spread around one speed plane, but sub-optimal when the energy is confined to isolated sub-sets of the space. 
The study of apparent motion has established that d_max (the greatest distance over which motion may be detected across two successive frames) scales inversely with SF under most conditions (Baker, Baydala, & Zeitouni, 1989; Cleary & Braddick, 1990; Eagle & Rogers, 1996; Morgan, 1992), but much less work has studied the influence of SF in global motion tasks. There is some evidence that low SFs play a special role. In Bex and Dakin (2002), masking was strongest for low SF noise elements, even when matched for visibility, suggesting that coarse information is preferentially integrated. Further evidence that low SFs are used to ‘bind’ high SFs comes from the phenomenon of ‘motion capture’, where high SF structure is perceived as moving in the direction of the low SFs, even when the directional signals are centered on opposing directions (Ramachandran & Cavanagh, 1987). 
The present work uses a novel stimulus to explore the influence of second-order statistics and SF in a 2AFC direction discrimination task. Stimuli are generated by band-pass filtering white noise and then performing a thresholding operation. The results are binary blob images (Figure 2) containing smooth and relatively sparse contours, which we term “naturalistic” simply because this form of contour structure is more commonly observed in natural scenes than in e.g. two-dimensional noise. The SF profile may be described as low-cut, with the maximum amplitude defined by the filter used (0.75 c/deg), while the fall off in amplitude with increasing SF is shallow due to the thresholding operation. 
Figure 2
 
Examples of the stimuli employed. (a) Broadband. (b) Low-pass, Gaussian filtered version of (a). (c) “Leaky” high-pass, generated by subtracting a Gaussian blurred version of (a) from (a). (d) (Strictly) high-pass stimulus, generated by further subtracting a Gaussian blurred version of (c) (see Methods). (e) Amplitude spectra of (a, b, c and d). Note how the low frequency component of (c) is leaky, whereas the no-illusion stimulus (d) reaches an amplitude of zero at a low SF.
The stimuli used are a significant deviation from the type of stimuli used in most motion studies. For instance, while dot-stimuli may be resolved locally, the use of drifting-Gabors or straight edges forces the visual system to combine signals over space to disambiguate local motions (Amano et al., 2009; Lorenceau & Alais, 2001). In contrast, the ability of the visual system to locally resolve motion signals stemming from curved elements is unclear. In this regard there is an interesting interplay between the ability of the motion stream to accurately identify component motion (presumably easier for straight edges) and the ability to resolve signals locally (presumably easier for areas of high curvature). Furthermore, areas of low curvature may aid the binding of spatially disparate elements (as shown in contour detection paradigms e.g. Geisler et al., 2001). For the purpose of the current study it is worth noting that increasing the area of the carrier signal exposed to the observer leads to large improvements in discrimination thresholds. Thus, if any disambiguation is occurring on a local level, the precision of such estimates is poor. 
Like many studies designed to probe the aperture problem, we restrict our analysis to motion within two dimensions. We concede that this excludes many of the spatiotemporal relationships present in natural environments, but note that 2D motion is consistent with the sub-set of naturally occurring motions that occur within the fronto-parallel plane. 
Given the non-fractal nature of our stimuli, Experiment 1 probes the role of the low and high SF components of our stimulus. This experiment provides an essential control for Experiments 2 and 3, which explore the effect of disrupting the second-order statistics in a direction discrimination paradigm. 
Methods
Subjects
Three psychophysically experienced observers (DK, SD, JG) with normal or corrected-to-normal vision took part in all experiments except Experiment 2, in which subject JG was replaced with JC. 
Apparatus
Stimuli were generated on an Apple iMac running MATLAB (MathWorks) using elements of the Psychtoolbox (Brainard, 1997; Pelli, 1997). Stimuli were displayed on a Dell Trinitron CRT with spatial and temporal resolution set to 1080 × 768 pixels and 85 Hz respectively. The screen was viewed from a distance of 1.5 m so that one pixel subtended 0.35 arc min of visual angle. The monitor signal was passed through an attenuator (Pelli & Zhang, 1991), following which the signal was amplified and copied (using a line-splitter) to the three guns of the monitor, resulting in a pseudo 12-bit monochrome image. Monitor linearization was achieved by recording the relationship between the signal and the monitor intensity (Minolta LS 110 photometer) to create a linearization look-up table that was passed to the Psychtoolbox internal color look-up table. 
Stimuli
The mean luminance of the stimuli was 30.5 cd/m² with a root-mean-square contrast of 0.20. Stimuli were viewed through a large 2D raised cosine aperture (tapered annulus radius: 1.38 arc min) presented in the center of the display. The radius of the aperture was either 2.95° or 1.17° (two viewing areas were employed to control for ceiling effects; see below). The smaller aperture size was equal to the total signal area in the locally apertured condition in Experiment 2. Due to the tapered annulus used, the visible area was taken to be the area above contrast detection threshold, in keeping with the detectable area of Gabor stimuli (Fredericksen, Bex, & Verstraten, 1997). 
Stimuli were generated by spatially band-pass filtering random noise using a 2D Laplacian-of-Gaussian filter (σ = 22.8 arc min) and then thresholding the result at mean luminance to generate binary “blob” images. An example stimulus is illustrated in Figure 2a. This procedure allowed us to rapidly generate complex shapes with a broad SF profile. 200 such images were generated. On each trial a random image was selected, with replacement. Low-pass images were generated by convolving the broadband images with a Gaussian filter (σ = 5.4 arc min, Figure 2b). One set of high-pass images (Figure 2c) was generated by subtracting a Gaussian (σ = 2.1 arc min) filtered version of the broadband images from the source image. This process is “leaky”: it allows some low-frequency information to pass, which causes the Craik-Cornsweet-O'Brien (CCOB) illusion (Cornsweet, 1970; Craik, 1966; O'Brien, 1958) to be present in our stimuli (observe how the areas within the contours appear to be light or dark even though the luminance of each patch is equal). To control for the potential influence of this illusory coarse-scale structure, the low-pass image was subtracted (Figure 2d). This procedure has previously been shown (Dakin & Bex, 2003) to completely abolish the CCOB effect by attenuating the low frequencies and so reversing the polarity of the centres of the “blobs”. 
The carrier component of stimuli translated at a speed of 3.93 deg/s for 0.3 seconds (26 frames at a refresh rate of 85 Hz) in near-upwards directions. Motion was generated using operations built in to the computer's graphics card, accessed via OpenGL. During each trial the stimulus was passed to the graphics card buffer. Stimuli (11.5 × 11.5 deg) were greater in size than the viewing aperture (radius 2.95 deg/1.17 deg), but during each frame only a segment of the original image was displayed. By smoothly varying the region of the original image presented, a percept of rigid translation of the image through the aperture was generated. To avoid the potential effects of an orientation bias, the underlying image was randomly flipped from left to right on each trial. Between trials a phase-scrambled version of the original broadband stimuli was placed within the viewing area, and the following trial was initiated immediately following the observer's response. 
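The filtering pipeline described above can be sketched as follows. This is an illustrative approximation rather than the authors' MATLAB code: all function names are hypothetical, the filters are applied as circular convolutions in the Fourier domain, the default sampling (`arcmin_per_px=2.8`, `size_px=256`) is much coarser than the 0.35 arc min pixels of the experiment purely to keep the example fast, and the blur used for the strictly high-pass step (here the 5.4 arc min low-pass filter applied to the leaky high-pass image) is an assumption, since the text does not state it explicitly.

```python
import numpy as np

def _radial_freq_sq(shape):
    """Squared radial frequency, in (radians/pixel)^2, on the FFT grid."""
    wy = 2 * np.pi * np.fft.fftfreq(shape[0])
    wx = 2 * np.pi * np.fft.fftfreq(shape[1])
    return wy[:, None] ** 2 + wx[None, :] ** 2

def gaussian_blur(img, sigma_px):
    """Circular Gaussian blur applied as a multiplication in the Fourier domain."""
    w2 = _radial_freq_sq(img.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.exp(-0.5 * sigma_px**2 * w2)))

def log_filter(img, sigma_px):
    """Laplacian-of-Gaussian filter; transfer function -w^2 * exp(-sigma^2 w^2 / 2)."""
    w2 = _radial_freq_sq(img.shape)
    transfer = -w2 * np.exp(-0.5 * sigma_px**2 * w2)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def make_stimuli(size_px=256, arcmin_per_px=2.8, seed=0):
    """Return (broadband, low_pass, leaky_high, strict_high) images.

    Filter sizes in arc min follow the text (22.8, 5.4, 2.1); the image size
    and sampling density are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size_px, size_px))
    filtered = log_filter(noise, 22.8 / arcmin_per_px)
    # The LoG output has zero mean, so thresholding at zero plays the role
    # of thresholding at mean luminance.
    broadband = np.where(filtered > 0, 1.0, -1.0)
    low_pass = gaussian_blur(broadband, 5.4 / arcmin_per_px)
    leaky_high = broadband - gaussian_blur(broadband, 2.1 / arcmin_per_px)
    # Assumed second stage: remove the residual low frequencies from the leaky image.
    strict_high = leaky_high - gaussian_blur(leaky_high, 5.4 / arcmin_per_px)
    return broadband, low_pass, leaky_high, strict_high

broadband, low_pass, leaky_high, strict_high = make_stimuli()
```

Because the second subtraction multiplies the leaky high-pass spectrum by a further factor that is at most one and vanishes at zero frequency, the strictly high-pass image carries less energy at low SFs than the leaky one, in the spirit of Figure 2e.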
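As a worked check on these display parameters, the sketch below converts the stated speed into a per-frame displacement in pixels and lists the position of the displayed image segment on each frame. The constants come from the Apparatus and Stimuli sections; the function name `window_offsets`, the rounding of 25.5 refreshes up to 26 frames, and the sign convention for the direction offset are assumptions made for illustration.

```python
import math
import numpy as np

ARCMIN_PER_PIXEL = 0.35   # from the Apparatus section
REFRESH_HZ = 85.0
SPEED_DEG_PER_S = 3.93
DURATION_S = 0.3

# 0.3 s at 85 Hz is 25.5 refreshes; rounding up gives the 26 frames quoted.
n_frames = math.ceil(DURATION_S * REFRESH_HZ)

# deg/s -> arc min/s -> pixels/s -> pixels/frame
px_per_frame = SPEED_DEG_PER_S * 60.0 / ARCMIN_PER_PIXEL / REFRESH_HZ

def window_offsets(direction_offset_deg):
    """(x, y) displacement in pixels of the image on each frame.

    direction_offset_deg is the offset from vertical-upwards motion;
    treating positive values as clockwise is an assumed convention.
    """
    angle = math.radians(direction_offset_deg)
    steps = np.arange(n_frames) * px_per_frame
    return np.column_stack([steps * math.sin(angle), steps * math.cos(angle)])

offsets = window_offsets(7.0)
print(n_frames, round(px_per_frame, 3))
```

On these figures the carrier moves roughly 8 pixels per frame, so sub-pixel positioning (or rounding to whole pixels) would be needed in practice; how the original display code handled this is not stated.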
Procedure
The method of constant stimuli (MCS) was used to assess fine direction discrimination with such patterns. A small clockwise (CW) or anticlockwise (ACW) offset was added relative to vertical upwards motion. The observer's 2AFC task was to fixate a continuously present cross at the center of the monitor and to indicate the direction of motion (CW or ACW of vertical upwards motion), guessing if necessary. Audio feedback was provided for incorrect answers. The offset spanned ±7° (large radius) or ±10° (small radius) in 17 equally spaced steps. Each point was measured 17 times per run and all participants completed at least 2 runs (i.e. 578 trials per condition); extra trials were added if the psychometric function was under- or over-constrained. All conditions were randomly interleaved. 
The procedure for deriving thresholds was identical to Dakin, Mareschal, and Bex (2005); the psychometric function was fit with a cumulative wrapped Gaussian and the standard-deviation parameter of the best fitting function was taken as the estimated threshold. A bootstrapping technique was employed to estimate 95% confidence intervals on these estimates; data were re-sampled with replacement across each point (assuming binomial error) in the psychometric function a total of 1024 times and the function refit. In all plots, error bars indicate 95% confidence intervals on the threshold estimates. 
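The threshold estimation and bootstrap can be sketched as below. Several simplifications are assumed and should not be read as the authors' procedure: an ordinary (not wrapped) cumulative Gaussian is used, which is a close approximation for offsets of only a few degrees; the fit is a maximum-likelihood grid search rather than whatever optimizer was originally used; and the observer data are simulated with a hypothetical true threshold of 2°. All function names are illustrative.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def make_grid(offsets, mus, sigmas):
    """Log-probabilities of CW / ACW responses for every (mu, sigma) on the grid."""
    x = np.asarray(offsets, dtype=float)[None, None, :]
    z = (x - mus[:, None, None]) / (sigmas[None, :, None] * math.sqrt(2.0))
    p = np.clip(0.5 * (1.0 + _erf(z)), 1e-6, 1.0 - 1e-6)
    return np.log(p), np.log1p(-p)

def fit(n_cw, n_trials, log_p, log_q, mus, sigmas):
    """Maximum-likelihood (bias, threshold) under binomial error."""
    k = np.asarray(n_cw, dtype=float)
    loglik = log_p @ k + log_q @ (n_trials - k)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return mus[i], sigmas[j]

def bootstrap_ci(n_cw, n_trials, log_p, log_q, mus, sigmas, n_boot=1024, seed=1):
    """95% CI on the threshold: resample each point binomially, then refit."""
    rng = np.random.default_rng(seed)
    p_obs = np.asarray(n_cw, dtype=float) / n_trials
    boot = np.empty(n_boot)
    for b in range(n_boot):
        resampled = rng.binomial(n_trials, p_obs)
        boot[b] = fit(resampled, n_trials, log_p, log_q, mus, sigmas)[1]
    return np.percentile(boot, [2.5, 97.5])

# Design from the text: 17 offsets x 17 repeats x 2 runs = 578 trials.
offsets = np.linspace(-7.0, 7.0, 17)
n_trials = 17 * 2

# Simulated observer (hypothetical): no bias, threshold of 2 degrees.
rng = np.random.default_rng(0)
p_true = 0.5 * (1.0 + _erf(offsets / (2.0 * math.sqrt(2.0))))
n_cw = rng.binomial(n_trials, p_true)

mus = np.linspace(-2.0, 2.0, 41)
sigmas = np.linspace(0.25, 10.0, 79)
log_p, log_q = make_grid(offsets, mus, sigmas)

bias_hat, threshold_hat = fit(n_cw, n_trials, log_p, log_q, mus, sigmas)
ci_low, ci_high = bootstrap_ci(n_cw, n_trials, log_p, log_q, mus, sigmas)
print(threshold_hat, ci_low, ci_high)
```

Precomputing the log-probability grid once keeps each of the 1024 refits to two small matrix products, which is why a grid search is workable here.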
Experiment 1: Dependence of direction discrimination in naturally contoured stimuli on spatial frequency structure
In the first experiment we sought to determine the relative influence of information across SF channels in our ‘naturally’ contoured stimuli. Figure 3 plots direction discrimination thresholds for DK, JG & SD, measured with four underlying carrier signals (broadband, “leaky” high-pass, strictly high-pass and low-pass). Error bars indicate 95% confidence intervals. Note how performance is worse with the smaller aperture (dark gray), indicating that performance in the smaller aperture conditions is not at ceiling. In Figure 3d, thresholds for DK, JG & SD were first mean adjusted to zero to correct for biases, then pooled across participants. Thresholds were broadly similar for the high-pass and broadband conditions, but thresholds were significantly higher for low-pass stimuli in the smaller aperture condition. Thus direction sensitivity increases either with increasing spatial frequency or with increasing aperture size (at least for the conditions tested). This indicates that the signal is less reliable at low SFs, despite there being an identical number of cycles in the contour structure of each SF channel. Finally, these results reveal no special role for low SFs, unlike that observed in motion capture (Ramachandran & Cavanagh, 1987). Given that we tested only four spatial frequency and two aperture-size conditions, we cannot make more general assertions about the relationship between these parameters and direction discrimination performance. There may be, for example, subtle interactions between parameters, effects that saturate with increasing SF, etc. 
Figure 3
 
(a, b and c) Direction discrimination thresholds for three observers (DK, JG & SD), measured with four underlying carrier signals (broadband, leaky high-pass, strictly high-pass and low-pass—see text for description). Error bars indicate 95% confidence intervals. Note that performance was worse over the smaller aperture (dark gray) condition indicating that performance was not at ceiling. (d) Mean thresholds for the three observers after normalization to zero to correct biases, then pooled across participants. Thresholds were lower for high-pass than broadband conditions, but not significantly so. Thresholds were significantly higher for low-pass stimuli in the smaller aperture condition.
Experiment 2: The role of second-order statistics
Given that removing the low SF information from broadband images did not substantially impair direction discrimination thresholds, we next asked if the “naturalistic” contour structure within our stimuli was promoting the integration of high SF motion signals. It is certainly the case that motion signals can inform observers about the form of objects as shown in slit-motion studies (e.g. Nishida, 2004) and studies of spatiotemporal boundary formation (e.g. Shipley & Kellman, 1993) but much less work has demonstrated the influence of form on motion processing (but see, Lorenceau & Alais, 2001; McDermott, Weiss, & Adelson, 2001). To test this hypothesis, we assessed the impact of disrupting the second-order motion/orientation statistics of our stimuli by placing apertures over the stimuli (Figure 4a). Global structure could then be disrupted by randomly switching the signals passing under each aperture with the signal that would have passed behind another randomly chosen aperture (Figures 4b and 4d). Scrambling in this manner across all apertures preserved local signals but disrupted global structure. Note that breaking global structure in this way disrupts both the second-order statistics and the low SF components of the signal. Therefore the effect of scrambling can only be identified by comparing performance across both the high-pass and broadband stimuli. Thus if motion processing exploits the statistical regularities of second-order structure in naturalistic images, then performance should deteriorate in both the high-pass and broadband conditions as this structure is abolished. Alternatively, if a detriment to performance is observed only in broadband stimuli, then disruption to the low SFs is driving any observed reduction in performance. 
Figure 4
 
(a–d) Middle frames of the four conditions used in Experiment 2 (contrast has been maximized to improve visibility). (a) Underlying stimuli were similar to Experiment 1, but were viewed through a series of small stationary apertures that were centered on the contours in the middle frame of the sequence. (b) Global structure was disrupted by randomly swapping the signals viewed behind each aperture. (c, d) show a high-pass filtered version of the same image. (e–g) depict the first, middle and last frames of an example broadband unscrambled trial. For illustration purposes the underlying image is superimposed upon the occluding surface of the apertures. Note that apertures were densely placed over the whole contour structure of the image and that the contour passes through the middle of each aperture during the middle frame (f).
Stimuli
Stimuli were identical to the broadband (Figure 2a) and high-pass (Figure 2c) stimuli of Experiment 1 but were viewed through a mask consisting of a series of circular raised-cosine apertures (radius 16.2 arc min; tapered region radius 1.38 arc min). All apertures were positioned within a circular region (radius of 2.95 deg) centered upon the fixation point. The underlying noise carrier translated upwards and each contour passed through the middle of each aperture during the middle frame of the trial (Figure 4f). This arrangement of the apertures and contours rendered the global structure of the stimuli readily apparent to the observer. Furthermore, centering the apertures over the contours reduced the between-trial variability that would have resulted from a random placement of the apertures. Due to the random nature of the stimuli the number of apertures varied, with a mean of 86.4 and a standard deviation of 6.8. Scrambling was achieved by swapping the signal under one aperture with that of another randomly chosen aperture. Scrambling in this manner preserved local signals but disrupted global structure. Example stimuli can be viewed in Movie 1 (un-scrambled) and Movie 2 (scrambled). 
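The scrambling manipulation can be sketched as a reassignment of per-aperture image sequences to aperture locations. The representation below (one small movie per aperture) and the use of a single random permutation are assumptions made for illustration; the text specifies only that each aperture's signal was swapped with that of another randomly chosen aperture. The array sizes are hypothetical and kept deliberately small.

```python
import numpy as np

def scramble_apertures(patches, rng):
    """Reassign per-aperture movies to aperture locations at random.

    patches: array of shape (n_apertures, n_frames, height, width) holding
             the image sequence seen through each aperture.
    Returns the scrambled array and the permutation that was applied, where
    scrambled[i] is the movie originally seen through aperture order[i].
    """
    order = rng.permutation(patches.shape[0])
    return patches[order], order

rng = np.random.default_rng(0)
n_apertures, n_frames = 86, 26      # about the mean aperture count; 26 frames per trial
patches = rng.standard_normal((n_apertures, n_frames, 8, 8))  # placeholder content

scrambled, order = scramble_apertures(patches, rng)
```

Each local movie is moved intact, so the local motion signal within every aperture is unchanged; only the spatial relationship between apertures, and hence the global contour structure, is disrupted.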
 
Movie 1
 
Example trial: Un-scrambled condition.
 
Movie 2
 
Example trial: Scrambled condition.
Results
Results from Experiment 2 are plotted in Figure 5 and show two main effects. First, thresholds for the broadband stimuli (gray triangles) are higher than thresholds obtained without the obscuring apertures (dashed lines). Second, scrambling increased thresholds two-fold in the broadband condition but had only a weak effect on the high-pass stimuli. Interestingly, participants reported a percept of rigid translation under all conditions except the broadband scrambled condition, where a small amount of spatial incoherence was observed. This pattern of results suggests that the second-order statistics do not significantly influence motion processing in our experiment, because a reliance on those statistics would predict an equivalent effect of scrambling in both the high-pass and broadband signals. Instead, our results are consistent with a global motion mechanism that pools directional information across space and SFs but is insensitive to the relative motion information in nearby locations. In this model, scrambling increases the directional bandwidth at low SFs (Figures 5d and 5e), leading to a loss of sensitivity. The weaker effect observed in the high-pass conditions reflects the weak signal in the low SFs (see Figure 2e). Later sections attempt to justify this position further by isolating the low SF component of the signal (Experiment 3) and assessing the variability in the signal through a model of V1 neurons (see 
Figure 5
 
Direction discrimination thresholds measured with locally apertured stimuli for three observers (DK, JG & SD). Dashed lines indicate the mean direction discrimination threshold for each subject for the broadband stimuli from Experiment 1. Thresholds for the broadband stimuli (gray triangles) are always higher than those for the high-pass stimuli (black circles). The effect of scrambling is highly significant in the broadband stimuli whilst only a small effect is observed in the high-pass stimuli. This suggests that ‘coherent’ global structure is not necessary to achieve low discrimination thresholds but that disrupting global structure is detrimental to performance when the low frequencies are present. (d, e) depict the motion energy at 3.6 c/deg and 0.75 c/deg respectively across a channel of V1 neurons tuned to the object speed. Note how the distribution of motion energy is identical in (d) but not in (e), highlighting how scrambling dramatically increases the direction bandwidth of the signal at low SFs (see 1 for model details).
Experiment 3: Low SFs and the effect of scrambling carrier location
Experiment 3 was designed to probe the role of low SFs in the scrambling effect observed in Experiment 2. To isolate the low SF component, the high SF component of the broadband signal was progressively attenuated by convolving the carrier signal with Gaussian kernels of progressively larger spatial extent. Examples of the scrambled and unscrambled stimuli are shown in Movies 3, 4, 5, and 6.
 
Movie 3
 
Example trial: Un-scrambled, blur σ = 5.4 arc min.
 
Movie 4
 
Example trial: Scrambled, blur σ = 5.4 arc min.
 
Movie 5
 
Example trial: Un-scrambled, blur σ = 22.0 arc min.
 
Movie 6
 
Example trial: Scrambled, blur σ = 22.0 arc min.
Methods
Subjects, procedure and apparatus were identical to Experiment 2. Stimuli were low-pass versions of the broadband stimuli of Experiment 1: five low-pass conditions were created by convolving the broadband images with Gaussian filters with σ = 5.4, 7.8, 11.4, 16.2 and 22.2 arc min. After convolution the contrast for all conditions was set to a root-mean-square contrast of 0.20 (6.0 cd/m²). The five new stimuli were then tested in both the scrambled and unscrambled conditions of Experiment 2 to generate 10 new conditions.
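The blurring and contrast normalization can be sketched as below. This is an illustrative sketch under stated assumptions rather than the original stimulus code: the blur is applied in the Fourier domain (equivalent to circular convolution with a unit-area Gaussian), images are treated as zero-mean contrast values, and the pixel scale PIXELS_PER_ARCMIN and the random test image are hypothetical.

```python
import numpy as np

PIXELS_PER_ARCMIN = 1.0  # hypothetical display scale, not given in the text
BLUR_SIGMAS_ARCMIN = [5.4, 7.8, 11.4, 16.2, 22.2]


def gaussian_lowpass(image, sigma_px):
    """Blur with a Gaussian of standard deviation sigma_px (circular boundaries)."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]  # cycles per pixel
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    # Fourier transform of a unit-area Gaussian exp(-x^2 / (2 sigma^2))
    transfer = np.exp(-2.0 * np.pi**2 * sigma_px**2 * (fx**2 + fy**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))


def set_rms_contrast(image, target_rms=0.20):
    """Remove the mean and rescale so the standard deviation equals target_rms."""
    centered = image - image.mean()
    return centered * (target_rms / centered.std())


# Stand-in for a broadband carrier image (random noise, purely for illustration)
carrier = np.random.default_rng(0).standard_normal((128, 128))
conditions = [
    set_rms_contrast(gaussian_lowpass(carrier, s * PIXELS_PER_ARCMIN))
    for s in BLUR_SIGMAS_ARCMIN
]
```

Renormalizing after the blur matters because low-pass filtering removes contrast energy; without it, the most blurred conditions would also be the faintest.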
Results
Figure 6 shows the results of Experiment 3, which are in good agreement with the results of Experiment 2. Scrambling induced a twofold increase in thresholds at low levels of stimulus blur (σ = 0.09 deg, i.e. 5.4 arc min). To examine the effects of increasing blur, a straight line was fit to the log of thresholds in the scrambled and unscrambled conditions. The exponent of the fit was recorded and error bars were generated using a bootstrapping procedure with 1024 iterations. The results of the fitting procedure (Figures 6d–6f) show that the exponent is higher in the scrambled condition (significantly so for DK and SD). This means that motion discrimination thresholds increase more quickly with blurring for scrambled than for unscrambled conditions. Since increasing the level of blur in the images does not alter the second-order statistics, we conclude that it is the disruption of low SF components of the signal that is driving the effect of scrambling. An alternative interpretation of the data is that lateral interactions occur over increasing distance with decreasing SF (e.g. Polat & Sagi, 1993); given the fixed radius of the display this may lead to an increased impact of lateral interactions with increasing blur.
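A fit of this kind, with bootstrapped confidence limits, might look like the sketch below. Several details are assumptions because the text does not specify them: the line is fit in log-log coordinates (so the slope is a power-law exponent), the bootstrap resamples repeated threshold estimates within each blur level, and the threshold values are synthetic placeholders rather than data from the study.

```python
import numpy as np


def fit_exponent(blur, thresholds):
    """Slope of the least-squares line through log10(threshold) vs log10(blur)."""
    slope, _ = np.polyfit(np.log10(blur), np.log10(thresholds), 1)
    return slope


def bootstrap_exponent_ci(blur, repeats, n_boot=1024, seed=0):
    """95% confidence interval on the exponent.

    `repeats` has shape (n_blur_levels, n_repeats): several threshold
    estimates per blur level, resampled with replacement within each level.
    """
    rng = np.random.default_rng(seed)
    n_levels, n_rep = repeats.shape
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_rep, size=(n_levels, n_rep))
        resampled = np.take_along_axis(repeats, idx, axis=1)
        slopes[b] = fit_exponent(blur, resampled.mean(axis=1))
    return np.percentile(slopes, [2.5, 97.5])


# Synthetic example: thresholds following a power law with multiplicative noise
blur = np.array([5.4, 7.8, 11.4, 16.2, 22.2])  # arc min
rng = np.random.default_rng(1)
repeats = (2.0 * blur[:, None] ** 0.5) * rng.lognormal(0.0, 0.1, size=(5, 6))
low, high = bootstrap_exponent_ci(blur, repeats)
```

Two conditions would then be judged to differ when their bootstrapped intervals on the exponent do not overlap, which is one common reading of "significantly so".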
Figure 6
 
Results for Experiment 3 for three observers (DK, SD & JG). Direction discrimination thresholds for scrambled (black circles) and unscrambled (gray triangles) apertured stimuli are shown as a function of the standard deviation of Gaussian blur applied to the underlying contour image. The curves show the line of best fit generated by fitting a straight line to the log of the data, the slope of which is shown in (d–f) for unscrambled (gray bars) and scrambled (black bars) conditions, for observers DK, SD and JG respectively. Error bars show 95% confidence intervals on all graphs. The exponent is always greater in the scrambled condition (significantly so for DK and SD). This suggests that increasing reliance upon the low frequency component is of greater detriment to the scrambled stimuli, further indicating that it is the low-frequency component of the signal rather than the second-order statistics that is driving the effect of scrambling.
Discussion
The accurate estimation of motion direction is trivial for objects containing isotropic orientation structure. Under such conditions the distribution of motion energy is predictable and veridical estimates of the direction of motion can be obtained by simply calculating the center of motion energy. However, in natural, unconstrained environments this is rarely, if ever, the case, and biases in motion energy render such a strategy unreliable. The paradigm we have described is able to probe the influence of imbalances in motion energy simply because the stimuli used exhibited anisotropies in the orientation structure that varied randomly from trial to trial. In Experiments 2 and 3 scrambling will induce ‘spurious’ correlations in the low SF component of the signal (see model), increasing anisotropies in the motion energy and in turn raising psychophysical thresholds.
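The "center of motion energy" strategy, and the way an orientation anisotropy biases it, can be illustrated with a short sketch. The energy profiles here are invented for the example (a raised cosine around the true direction, optionally multiplied by an anisotropic weighting) and the function name is hypothetical; the centre is taken as the direction of the energy-weighted vector sum.

```python
import numpy as np


def center_of_motion_energy(directions_deg, energy):
    """Direction (deg) of the energy-weighted vector sum across direction channels."""
    vectors = energy * np.exp(1j * np.deg2rad(directions_deg))
    return np.rad2deg(np.angle(vectors.sum())) % 360.0


directions = np.arange(32) * (360.0 / 32)  # 32 evenly spaced direction channels
true_direction = 90.0                      # upward motion

# Isotropic structure: energy falls off symmetrically around the true direction
symmetric = 1.0 + np.cos(np.deg2rad(directions - true_direction))
unbiased = center_of_motion_energy(directions, symmetric)

# Anisotropic structure: extra energy for components near 0 deg skews the profile
anisotropy = 1.0 + 0.5 * np.cos(np.deg2rad(directions))
biased = center_of_motion_energy(directions, symmetric * anisotropy)
```

With the symmetric profile the estimate coincides with the true direction, whereas the skewed profile pulls the estimate toward the over-represented components.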
The lack of an effect of disrupting the second-order statistics is surprising considering the importance of second-order statistics in the detection of static (e.g. Field et al., 1993) and moving contours (Bex, Simmers, & Dakin, 2001; Ledgeway & Hess, 2002, 2006; Ledgeway et al., 2005). More directly our work appears to contradict the findings of Lorenceau and Alais (2001) who show performance on a motion discrimination task is better for ‘closed’ forms than ‘open’ forms. Although both studies used similar paradigms, the stimuli employed differed in terms of their perceptual ambiguity: The class of stimuli employed by Lorenceau and Alais (2001) has been well studied and the percept of global motion is ambiguous and bi-stable (McDermott et al., 2001) reflecting the potential of such displays to be consistent with more than one physical interpretation (see Figure 1). In contrast, the signal presented in the current paradigm was consistent with only one interpretation. This suggests that global second-order statistics may only influence performance in a motion discrimination task when there are very high levels of uncertainty in the binding of spatially disparate elements. The finding also implies that studies of global motion with random second-order orientation statistics (e.g. Amano et al., 2009) are designed to an appropriate level of abstraction. 
Although our results suggest no role for the second-order statistics (within one SF channel) that determine performance in contour detection paradigms (e.g. Field et al., 1993), the effect of scrambling highlights the importance of the low SF component of motion and how manipulations of spatially disparate elements can dramatically influence the directional signal at such frequencies. This observation has implications for a number of other studies using apertured but broadband stimuli (e.g. Lorenceau & Alais, 2001; Mingolla et al., 1992) where the directional signal of the low-pass component may play an important role.
The rigid integration of the disrupted low SF component observed in our study indicates that the motion stream is unable to filter out or ‘ignore’ SF channels on the basis of a high directional bandwidth in the distribution of motion energy. Although in the experiments reported ‘ignoring’ the low frequency component of motion would likely improve psychometric thresholds, the relationship between signal bandwidth and reliability is not straightforward. For instance, a broad directional bandwidth is often the hallmark of an unambiguous directional signal (e.g. small dot stimuli)—an observation that has been incorporated into the model of Weiss and Adelson (1998) where signals with a broad directional bandwidth are able to constrain estimates of global judgements to a greater extent than signals with narrow directional bandwidths. 
The present work does not distinguish between the predictions of IOC or VA theories, as the stimuli used are essentially Type I. Using the aperture positions of Experiment 2 to restrict the range of orientations presented to the observer may provide a promising route through which this issue could be investigated. 
Controls
The above analysis has implicitly assumed that the psychophysical data are the result of local signals being combined across space to yield a global estimate of direction. However, the stimulus used is theoretically resolvable at the local level, since local patches window a two-dimensional stimulus and are therefore not strictly one-dimensional. To ascertain what level of disambiguation is being achieved at the local level, we performed a control in which we varied the number of apertures. The control experiment was identical in all regards to the broadband unscrambled condition of Experiment 2, except that we varied the area of the image presented to the observer by presenting 1, 4, 16 or 32 apertures. In all conditions the spatial positioning of the apertures was random but constrained to fall within a radius of 2.95° from fixation. Results are shown in Figure 7. Discrimination thresholds improve with increasing aperture number, strongly suggesting that the degree of precision achieved in Experiment 2 could not have resulted from a local analysis alone and that information must have been combined across space. Note that performance in the single aperture condition is better than if the information were truly ambiguous (i.e. straight edges), in which case a simple model that detects the direction orthogonal to an element's orientation would produce discrimination thresholds of around 65 degrees. Thus some level of local disambiguation is being achieved.
Figure 7
 
Control Experiment 1 examined the ability of the observers to locally resolve the information presented in each aperture of Experiment 2. Results demonstrate that performance improves rapidly with increasing aperture number and strongly suggest that a global analysis is needed to achieve the level of precision observers achieved in Experiment 2.
A second potential criticism is that the second-order statistics of Experiment 2 are only presented during the middle frames of our stimuli as the apertures largely obscure the contour structure during the beginning and end frames. The criticism is valid because the strength of the second-order relations falls with increasing distance between elements (Geisler et al., 2001). Since the full contour structure of the stimulus is only exposed during the middle frames of the trial, the mean distance between elements will be larger during the first and last frames thus reducing the strength of the second-order statistics. To answer this criticism we repeated Experiment 2 in full, but slowed down the translation of the underlying carrier from 3.93 to 1.00 deg/s so that the contour structure was exposed for the full duration of the trial. The results are shown in Figure 8 for subjects DK, JG and SD. Results are consistent with Experiment 2 and reveal no significant difference between the high-pass un-scrambled and scrambled conditions but again reveal that the scrambling significantly lowers the precision of observers in the broadband conditions. It should be noted that performance is worse at the slower carrier speeds of the control experiment, a finding that is expected because a slower carrier speed and identical trial duration will reveal much less of the carrier to the observer. 
Figure 8
 
Control Experiment 2 repeats Experiment 2 using a slower carrier speed so that the full contour structure is presented to the observer on each frame. Results follow the same pattern as Experiment 2 with scrambling always causing a significant increase in observers' threshold in the broadband (gray triangles) but not the high-pass condition (black circles).
Model
We have applied a model of V1 directionally selective (DS) neurons to the stimuli employed in Experiment 2 with the aim of examining the effect of scrambling and high-pass filtering on our stimulus. It is worth noting that initial attempts to model the data using DS filters tuned only to the object speed were unable to match psychophysical thresholds, even in the absence of noise. This suggests that for ‘natural’ stimuli (exhibiting irregular orientation structure) motion signals must be integrated across a broad range of DS filters to capture the full expression of component motion (i.e. stemming from all orientations and speeds in the stimulus). Accordingly, the model included DS filters tuned to speeds and SFs above and below that of the carrier signal.
The aim of the model was to illustrate the interaction between the motion energy model of V1 neurons (Adelson & Bergen, 1985) and the stimulus employed in Experiment 2. In this regard the approach taken was similar to that of Mante and Carandini (2005), who demonstrated that the motion energy model was effective in predicting how the signal in an optical imaging study varies as a function of a translating bar's orientation (relative to motion), length and speed (Basole, White, & Fitzpatrick, 2003).
Model details
In this section we explore the interaction between the motion energy model of V1 directionally selective (DS) neurons (Adelson & Bergen, 1985) and the stimuli used in Experiment 2. To explore how the motion energy varied across all the orientations, speeds and SFs present in our stimulus it was helpful to define a V1 neuron's speed tuning as the ratio between the (peak) temporal and spatial tuning of the neuron (Equation A2). The full battery of DS filters could then be defined by their direction, speed and spatial-frequency tuning as follows:
  1. Thirty-two directions evenly spaced around the clock.
  2. Thirteen evenly spaced speeds from 0% (static) to 150% of the carrier signal speed (3.95 deg/s).
  3. Eight SFs from 50% to 700% of the peak SF of the broadband carrier signal (0.75 c/deg) in eight half-octave steps.
This resulted in the temporal frequency tuning of the DS filters varying across each SF channel (see Figure 9). The spatial frequency and directional bandwidths of all the model neurons were held constant at 1.5 octaves and 45° (half width and full height) respectively, in keeping with the observed bandwidths of primate area V1 (De Valois, Yund, & Hepler, 1982; Snowden, Treue, & Andersen, 1992).
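The tuning grid of the filter battery can be written out as a short sketch. The variable names are illustrative, and the SF spacing is an assumption: the channels here start at 50% of the carrier SF and rise in exact half-octave steps, which places one channel at the carrier SF but puts the top channel at about 566% rather than at 700%. The temporal frequency of each filter follows from the speed definition as speed multiplied by spatial frequency.

```python
import numpy as np

CARRIER_SPEED = 3.95  # deg/s, carrier speed quoted in the list above
CARRIER_SF = 0.75     # c/deg, peak SF of the broadband carrier

# 32 directions evenly spaced around the clock (deg)
directions = np.arange(32) * (360.0 / 32)

# 13 evenly spaced speeds from 0% (static) to 150% of the carrier speed (deg/s)
speeds = np.linspace(0.0, 1.5, 13) * CARRIER_SPEED

# 8 SF channels in half-octave steps starting at 50% of the carrier SF (c/deg).
# Assumption: exact half-octave spacing, so the top channel is ~566% of the carrier SF.
sfs = 0.5 * CARRIER_SF * 2.0 ** (0.5 * np.arange(8))

# Speed = TF / SF, so the peak temporal frequency of each filter is speed * SF (Hz).
# Rows index speed, columns index SF channel.
tfs = speeds[:, None] * sfs[None, :]
```

Reading along a row of `tfs` shows why temporal frequency tuning differs across SF channels for a fixed speed tuning: the same speed corresponds to a higher temporal frequency in every finer channel.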
Figure 9
 
(a) Hypothetical motion energy of a rigidly translating isotropic stimulus plotted in the speed-direction space used in Figure 10. The x-axis depicts the angular separation between the veridical object direction and the direction tuning of the DS filters whilst the y-axis plots the speed tuning of the DS filters as a percentage of the object speed. (b) Plot of the changing temporal frequencies used as a function of the spatial frequency tuning of the DS filters. (c) The temporal frequency tuning of the DS filters minus the peak temporal frequency of the stimulus. Note that the pattern of motion energy shown in Figure 10 closely follows the peak temporal frequency tuning of the stimulus.
The stimuli were accurate reconstructions of trials used in Experiment 2 in terms of the aperture positions and the spatial (256 * 256) and temporal resolution (26 frames). However, to avoid the artifacts introduced by the horizontal/vertical pixel raster, the direction of motion on each trial was randomized. 
Convolution of the signal and sensor took place in the Fourier domain and the result was inverse-transformed back into the spatial domain. The square root of the sum of the squares of the real and imaginary components was taken to represent the motion energy at each point in space for each DS filter, a computation that is formally equivalent to the full rectified square of odd and even phase neurons to generate a phase invariant output (Adelson & Bergen, 1985). A global motion analysis was achieved by collapsing the spatial domain and summing across all DS filters tuned to the same spatiotemporal frequency and direction. Each spatial frequency channel could then be represented as a 2D speed ‘vs.’ direction image (as illustrated in Figure 9a), in which the intensity of each region represents the global sum of motion energy across DS filters whose velocity tuning is denoted by the region's position in the image. The only filter normalization employed was to divide the output of each neuron by the sum of the absolute values of the receptive field across space and time; this had the effect of evening out the expected 1/f spatiotemporal frequency spectrum. No gain control, normalization or inhibition occurred between neurons.
Noise and the sampling rate of neurons were not considered essential to the model output because discrimination thresholds were not derived from the output of the neurons. Additional factors such as the addition of Poisson noise (e.g. Dakin et al., 2005) would have been necessary if direction discrimination thresholds were to be predicted. Further complexity could have been added by varying the bandwidths of the V1 neurons as a function of spatial or temporal frequency, as both the physiology (e.g. Bair & Movshon, 2004) and the psychophysics (e.g. Burr, 1981) suggest is necessary, but this would make the resulting motion energy more complex to analyze. For instance, it would be more difficult to ascertain whether the directional bandwidth of the signal was the result of the stimulus or the sensor. By keeping the bandwidth of the sensor constant (in octaves) in the SF domain and constant across the speed tuning of the sensor, the changes in signal bandwidth across these dimensions could be attributed to the stimulus, not the sensor.
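The pipeline just described (Fourier-domain convolution, modulus of the complex response, spatial collapse, and normalization by the summed absolute receptive field) can be sketched for a single DS filter as follows. This is a simplified illustration and not the model code: the filter is a complex space-time Gabor in pixel and frame units, the parameter values are arbitrary, and a drifting grating stands in for the apertured stimulus.

```python
import numpy as np


def grid(shape):
    """Centred (t, y, x) coordinate arrays for a movie of the given shape."""
    axes = [np.arange(n) - n // 2 for n in shape]
    return np.meshgrid(*axes, indexing="ij")


def ds_filter(shape, sf, speed, theta, sigma_xy, sigma_t):
    """Complex space-time Gabor tuned to direction theta (rad), sf (c/px), speed (px/frame)."""
    tt, yy, xx = grid(shape)
    x_theta = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / sigma_xy**2 - tt**2 / sigma_t**2)
    carrier = np.exp(2j * np.pi * sf * (x_theta - speed * tt))
    g = envelope * carrier
    # Normalize by the summed absolute receptive field across space and time
    return g / np.abs(g).sum()


def global_motion_energy(movie, g):
    """Convolve in the Fourier domain, take the modulus, then collapse space and time."""
    kernel = np.fft.ifftshift(g)  # move the filter centre to the array origin
    response = np.fft.ifftn(np.fft.fftn(movie) * np.fft.fftn(kernel))
    energy = np.abs(response)     # sqrt(real^2 + imag^2): phase-invariant output
    return energy.sum()


def drifting_grating(shape, sf, speed, theta):
    """Sinusoidal grating translating in direction theta."""
    tt, yy, xx = grid(shape)
    x_theta = xx * np.cos(theta) + yy * np.sin(theta)
    return np.cos(2 * np.pi * sf * (x_theta - speed * tt))


shape = (16, 32, 32)  # frames, rows, columns
movie = drifting_grating(shape, sf=0.125, speed=1.0, theta=0.0)
preferred = global_motion_energy(movie, ds_filter(shape, 0.125, 1.0, 0.0, 4.0, 3.0))
opposite = global_motion_energy(movie, ds_filter(shape, 0.125, 1.0, np.pi, 4.0, 3.0))
```

Repeating the last step over a grid of direction and speed tunings within one SF channel would give the kind of speed versus direction image described in the text; here the filter tuned to the grating's direction returns far more energy than the filter tuned to the opposite direction.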
Model results
Figure 10 reveals the interaction between the stimulus used in Experiment 2 and the motion energy model of V1 directionally sensitive neurons (Adelson & Bergen, 1985). Each row illustrates the motion energy averaged across 256 example trials for one of the four conditions of Experiment 2 (depicted in the leftmost column). The illustrations in each column show the motion energy as a function of the speed and direction of the DS filter within each SF channel (see Figure 9a). For image clarity the motion energy across each condition (each row of Figure 10) was normalized between 0 and 1, and the conditions and sensors are depicted at the same spatial scale in the leftmost column and bottom row respectively. Note that the spatial frequencies of the broadband carrier signal and DS filter are matched at 0.75 c/deg.
Figure 10
 
‘Raw’ motion energy plots for the 4 conditions used in Experiment 2. Motion energy is plotted as a function of the speed and direction tuning of the model DS filters as illustrated in Figure 9a; each SF is plotted separately in each column from ‘fine’ to ‘coarse’ scale. (a) Motion energy for the band-pass unscrambled condition of Experiment 2. Note how the peak of motion energy follows the temporal frequency tuning of the DS filters, not the speed tuning, and that the motion energy is centered on the veridical direction only when the SFs of the carrier signal and DS filters are matched (0.75 c/deg). (b) Motion energy for the band-pass scrambled condition. Note how the directional bandwidth is higher at the low SFs relative to the unscrambled condition. (c and d) Motion energy for the high-pass conditions; the motion energy is concentrated in the high-SF channel, and the directional bandwidth is least in the high-pass conditions, reflecting decreased superposition of signals from the lower SF channels.
Initial inspection reveals the motion energy of the high-pass condition to be (unsurprisingly) concentrated in the high SF channels. However, the pattern of motion energy in the broadband condition is more complex. To understand the distribution of motion energy in the broadband conditions it is important to note that spatial and temporal frequencies are independently coded in many V1 neurons (Foster, Gaska, Nagler, & Pollen, 1985; Priebe, Lisberger, & Movshon, 2006; Tolhurst & Movshon, 1975). When there is a mismatch between the SF of a stimulus and the sensor, the speed tuning of the neuron is lost and the motion energy (in this SF channel) will be greatest when the temporal frequencies of the DS filter and the stimulus are matched. For a rigidly translating band-pass (or low-cut) stimulus such as ours this results in component motion (occurring at slower speeds) only being captured in the high-SF channels, in accordance with Equation A2. To highlight this point Figure 9c plots the difference between the temporal frequency tuning of the DS neurons and the peak temporal frequency tuning of the stimulus. Note how the peak of motion energy in Figure 10 closely follows the zero temporal frequency difference.
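The temporal frequency difference described here can be worked through numerically. The sketch below assumes that the stimulus's peak temporal frequency is the carrier speed multiplied by the carrier's peak SF, and expresses filter tuning as fractions of those carrier values; the function names are illustrative.

```python
CARRIER_SPEED = 3.95  # deg/s
CARRIER_SF = 0.75     # c/deg


def tf_difference(speed_fraction, sf_fraction):
    """Filter temporal frequency minus the stimulus peak temporal frequency (Hz).

    Both arguments are fractions of the carrier values (1.0 means 100%).
    """
    filter_tf = (speed_fraction * CARRIER_SPEED) * (sf_fraction * CARRIER_SF)
    stimulus_tf = CARRIER_SPEED * CARRIER_SF
    return filter_tf - stimulus_tf


def matched_speed_fraction(sf_fraction):
    """Speed tuning (fraction of carrier speed) at which the difference is zero."""
    return 1.0 / sf_fraction


# In a channel tuned to twice the carrier SF, the temporal frequencies match
# for filters tuned to half the carrier speed.
example = matched_speed_fraction(2.0)
```

Under this assumption the zero-difference contour slides toward slower speed tunings as the SF channel becomes finer, which is the pattern the peak of motion energy is said to follow.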
Comparison of Figures 10a and 10b shows that the effect of scrambling is to dramatically increase the bandwidth of the signal in the lower SFs indicating that scrambling leads the motion sensors to detect spurious correlations at low SF's. Finally, the directional bandwidths are sharper in the high-pass conditions, reflecting a lower superposition of signals across SF channels. This suggests the low frequency component of the broadband stimuli leads to ‘masking’ of the high frequencies and provides a plausible explanation for the higher psychophysical thresholds observed in the broadband conditions. 
Implications for models of global motion processing
The changing nature of the signal across SF channels highlights the independence of the spatial and temporal tuning of V1 neurons (Foster et al., 1985; Priebe et al., 2006; Tolhurst & Movshon, 1975) while conversely showing that stimulus variables such as orientation and speed are not independently coded in area V1 (see Mante & Carandini, 2005). It should be noted that the distinct pattern of motion signals across SF channels is determined by the low-cut SF profile of our stimuli. In contrast, if the stimulus were fractal and isotropic the full expression of component motion would be found within each SF channel. However, in naturally occurring stimuli the SF profile is likely to vary between a broadband and a band-pass profile and is unlikely to be isotropic. Accordingly, the broadband integration of signals across spatial frequencies observed in global motion studies (Bex & Dakin, 2002; Schrater et al., 2000) appears necessary to capture the full expression of component motion (occurring across a range of speeds and orientations) despite the increased vulnerability to noise that such broadband integration brings (Bex & Dakin, 2002).
Appendix A
 
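\[
g_\theta = \frac{1}{\sigma_x^2 \sigma_y^2 \sigma_t^2} \exp\left( -\frac{x^2}{\sigma_x^2} - \frac{y^2}{\sigma_y^2} - \frac{t^2}{\sigma_t^2} + i\left( \omega_x x_\theta + \omega_t t \right) \right)
\]
(A1)
 
\[
s = \frac{tf}{sf}
\]
(A2)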
 
Acknowledgments
We thank John Greenwood for his support. 
This research was supported by the Wellcome Trust. 
Commercial relationships: none. 
Corresponding author: D. Kane. 
Email: d.kane@ucl.ac.uk. 
Address: UCL Institute of Ophthalmology, University College London, 11-43 Bath Street, London EC1V 9EL, UK. 
References
Adelson, E. H., & Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, Optics and Image Science, 2, 284–299.
Adelson, E. H., & Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523–525.
Albright, T. D. (1984). Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology, 52, 1106–1130.
Amano, K., Edwards, M., Badcock, D. R., & Nishida, S. Y. (2009). Adaptive pooling of visual motion signals by the human visual system revealed with a novel multi-element stimulus [Abstract]. Journal of Vision, 9, (3):4, 1–25, http://journalofvision.org/9/3/4/, doi:10.1167/9.3.4.
Anderson, S. J., & Burr, D. C. (1987). Receptive field size of human motion detection units. Vision Research, 27, 621–635.
Ashida, H., & Osaka, N. (1994). Difference of spatial frequency selectivity between static and flicker motion aftereffects. Perception, 23, 1313–1320.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review, 61, 183–193.
Bair, W., & Movshon, J. A. (2004). Adaptive temporal integration of motion in direction-selective neurons in macaque visual cortex. Journal of Neuroscience, 24, 7305–7323.
Baker, C. L., Jr., Baydala, A., & Zeitouni, N. (1989). Optimal displacement in apparent motion. Vision Research, 29, 849–859.
Barlow, H. (1961). Sensory communication. (pp. 217–234). Cambridge, MA: MIT Press.
Basole, A., White, L. E., & Fitzpatrick, D. (2003). Mapping multiple features in the population response of visual cortex. Nature, 423, 986–990.
Bex, P. J., & Dakin, S. C. (2002). Comparison of the spatial-frequency selectivity of local and global motion detectors. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 19, 670–677.
Bex, P. J., & Dakin, S. C. (2003). Motion detection and the coincidence of structure at high and low spatial frequencies. Vision Research, 43, 371–383.
Bex, P. J., Simmers, A. J., & Dakin, S. C. (2001). Snakes and ladders: The role of temporal modulation in visual contour integration. Vision Research, 41, 3775–3782.
Blakemore, C., & Campbell, F. W. (1969). On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. The Journal of Physiology, 203, 237–260.
Bowns, L. (1996). Evidence for a feature tracking explanation of why type II plaids move in the vector sum direction at short durations. Vision Research, 36, 3685–3694.
Bowns, L., & Alais, D. (2006). Large shifts in perceived motion direction reveal multiple global motion solutions. Vision Research, 46, 1170–1177.
Bradley, D. C., & Goyal, M. S. (2008). Velocity computation in the primate visual system. Nature Reviews Neuroscience, 9, 686–695.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.
Burr, D. C. (1981). Temporal summation of moving images by the human visual system. Proceedings of the Royal Society of London B: Biological Sciences, 211, 321–339.
Cleary, R., & Braddick, O. J. (1990). Direction discrimination for band-pass filtered random dot kinematograms. Vision Research, 30, 303–316.
Cornsweet, T. (1970). Visual perception. New York: Academic.
Craik, K. (1966). The nature of psychology: A selection of papers, essays and other writings by the late K. J. W Craik. London: Cambridge University Press.
Dakin, S. C., & Bex, P. J. (2003). Natural image statistics mediate brightness ‘filling in’. Proceedings of the Royal Society B: Biological Sciences, 270, 2341–2348.
Dakin, S. C., & Hess, R. F. (1998). Spatial-frequency tuning of visual contour integration. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 15, 1486–1499.
Dakin, S. C., Mareschal, I., & Bex, P. J. (2005). Local and global limitations on direction integration assessed using equivalent noise analysis. Vision Research, 45, 3027–3049.
De Valois, R. L., Yund, E. W., & Hepler, N. (1982). The orientation and direction selectivity of cells in macaque visual cortex. Vision Research, 22, 531–544.
Eagle, R. A., & Rogers, B. J. (1996). Motion detection is limited by element density not spatial frequency. Vision Research, 36, 545–558.
Field, D. J., Hayes, A., & Hess, R. F. (1993). Contour integration by the human visual system: Evidence for a local “association field”. Vision Research, 33, 173–193.
Foster, K., Gaska, J., Nagler, M., & Pollen, D. (1985). Spatial and temporal frequency selectivity of neurones in visual cortical areas V1 and V2 of the macaque monkey. The Journal of Physiology, 365, 331–363.
Fredericksen, R. E., Bex, P. J., & Verstraten, F. A. (1997). How big is a Gabor patch, and why should we care? Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14, 1–12.
Geisler, W. S., Perry, J. S., Super, B. J., & Gallogly, D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711–724.
Hess, R. F., & Dakin, S. C. (1997). Absence of contour linking in peripheral vision. Nature, 390, 602–604.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195, 215–243.
Ledgeway, T. (1996). How similar must the Fourier spectra of the frames of a random-dot kinematogram be to support motion perception? Vision Research, 36, 2489–2495.
Ledgeway, T., & Hess, R. F. (2002). Rules for combining the outputs of local motion detectors to define simple contours. Vision Research, 42, 653–659.
Ledgeway, T. Hess, R. F. (2006). The spatial frequency and orientation selectivity of the mechanisms that extract motion-defined contours. Vision Research, 46, 568–578. [PubMed] [CrossRef] [PubMed]
Ledgeway, T. Hess, R. F. Geisler, W. S. (2005). Grouping local orientation and direction signals to extract spatial contours: Empirical tests of “association field” models of contour integration. Vision Research, 45, 2511–2522. [PubMed] [CrossRef] [PubMed]
Lorenceau, J. Alais, D. (2001). Form constraints in motion binding. Nature Neuroscience, 4, 745–751. [PubMed] [CrossRef] [PubMed]
Lorenceau, J. Shiffrar, M. (1992). The influence of terminators on motion integration across space. Vision Research, 32, 263–273. [PubMed] [CrossRef] [PubMed]
Majaj, N. J. Carandini, M. Movshon, J. A. (2007). Motion integration by neurons in macaque MT is local, not global. Journal of Neuroscience, 27, 366–370. [PubMed] [Article] [CrossRef] [PubMed]
Mante, V. Carandini, M. (2005). Mapping of stimulus energy in primary visual cortex. Journal of Neurophysiology, 94, 788–798. [PubMed] [Article] [CrossRef] [PubMed]
McDermott, J. Weiss, Y. Adelson, E. H. (2001). Beyond junctions: Nonlocal form constraints on motion interpretation. Perception, 30, 905–923. [PubMed] [CrossRef] [PubMed]
Mingolla, E. Todd, J. T. Norman, J. F. (1992). The perception of globally coherent motion. Vision Research, 32, 1015–1031. [PubMed] [CrossRef] [PubMed]
Morgan, M. J. (1992). Spatial filtering precedes motion detection. Nature, 355, 344–346. [PubMed] [CrossRef] [PubMed]
Movshon, J. A. Adelson, E. H. Gizzi, M. S. Newsome, W. T. (1986). The analysis of moving visual patterns.
Nishida, S. (2004). Motion-based analysis of spatial patterns by the human visual system. Current Biology, 14, 830–839. [CrossRef] [PubMed]
(1958). Contour perception, illusion and reality. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 48, 112–119.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. [PubMed] [CrossRef] [PubMed]
Pelli, D. G. Zhang, L. (1991). Accurate control of contrast on microcomputer displays. Vision Research, 31, 1337–1350. [PubMed] [CrossRef] [PubMed]
Perrone, J. A. (2004). A visual motion sensor based on the properties of V1 and MT neurons. Vision Research, 44, 1733–1755. [PubMed] [CrossRef] [PubMed]
Polat, U. Sagi, D. (1993). Lateral interactions between spatial channels: Suppression and facilitation revealed by lateral masking experiments. Vision Research, 33, 993–999. [PubMed] [CrossRef] [PubMed]
Priebe, N. J. Lisberger, S. G. Movshon, J. A. (2006). Tuning for spatiotemporal frequency and speed in directionally selective neurons of macaque striate cortex. Journal of Neuroscience, 26, 2941–2950. [PubMed] [Article] [CrossRef] [PubMed]
Ramachandran, V. S. Cavanagh, P. (1987). Motion capture anisotropy. Vision Research, 27, 97–106. [PubMed] [CrossRef] [PubMed]
Rust, N. C. Mante, V. Simoncelli, E. P. Movshon, J. A. (2006). How MT cells analyze the motion of visual patterns. Nature Neuroscience, 9, 1421–1431. [PubMed] [CrossRef] [PubMed]
Schrater, P. R. Knill, D. C. Simoncelli, E. P. (2000). Mechanisms of visual motion detection. Nature Neuroscience, 3, 64–68. [PubMed] [CrossRef] [PubMed]
Shipley, T. F. Kellman, P. J. (1993). Optical tearing in spatiotemporal boundary formation: When do local element motions produce boundaries, form, and global motion? Spatial Vision, 7, 323–339. [PubMed] [CrossRef] [PubMed]
Simoncelli, E. P. Heeger, D. J. (1998). A model of neuronal responses in visual area MT. Vision Research, 38, 743–761. [PubMed] [CrossRef] [PubMed]
Snowden, R. J. Treue, S. Andersen, R. A. (1992). The response of neurons in areas V1 and MT of the alert rhesus monkey to moving random dot patterns. Experimental Brain Research, 88, 389–400. [PubMed] [CrossRef] [PubMed]
Tolhurst, D. J. Movshon, J. A. (1975). Spatial and temporal contrast sensitivity of striate cortical neurones. Nature, 257, 674–675. [PubMed] [CrossRef] [PubMed]
von Grunau, M. Dube, S. (1992). Comparing local and remote motion aftereffects. Spatial Vision, 6, 303–314. [PubMed] [CrossRef] [PubMed]
Weiss, Y. Adelson, E. H. (1998). Slow and smooth: A Bayesian theory for the combination of local motion signals in human vision. Technical Report 1624, MIT AI lab, 1998.
Wertheimer, M. (1958). Principles of perceptual organization.
Wilson, H. R. Ferrera, V. P. Yo, C. (1992). A psychophysically motivated model for two-dimensional motion perception. Visual Neuroscience, 9, 79–97. [PubMed] [CrossRef] [PubMed]
Figure 1
 
The influence of form on motion integration (Lorenceau & Shiffrar, 1992). The movement of the bars is identical in (a) and (b) (sinusoidally translating in the direction perpendicular to their orientation). In (a) the bars appear to move independently of each other, but in (b), when the apertures are made explicit, the individual components ‘cohere’ and appear to move in directions consistent with a rotating diamond. (c) The ambiguity associated with a moving bar. The exact speed and direction (velocity) of the bar are unknown; however, the veridical velocity must fall on a ‘constraint’ line that can be inferred from the speed perpendicular to the bar's orientation, as shown in (d). By solving for two or more such lines, a unique vector can be found; in the case of a rigid object moving in 2D space, this vector reflects the veridical velocity.
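The constraint-line logic of panels (c) and (d) can be illustrated with a minimal sketch. It assumes each contour contributes one constraint of the form v · n = s, where n is the unit normal to the contour and s is the measured speed along that normal, and it combines the constraints by least squares. The function name, the example orientations, and the example velocity are hypothetical and are not taken from the paper.

```python
import math

def intersect_constraints(normals_deg, normal_speeds):
    """Least-squares velocity (vx, vy) satisfying v . n_i = s_i for each contour.

    normals_deg   -- direction of each contour's unit normal, in degrees
    normal_speeds -- speed measured along each normal (the only locally visible motion)
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for theta_deg, s in zip(normals_deg, normal_speeds):
        nx = math.cos(math.radians(theta_deg))
        ny = math.sin(math.radians(theta_deg))
        # Accumulate the 2x2 normal equations: (sum n n^T) v = sum s n
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += s * nx
        b2 += s * ny
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # A single orientation gives one constraint line only: the aperture problem
        raise ValueError("constraint lines are parallel: velocity is ambiguous")
    vx = (a22 * b1 - a12 * b2) / det
    vy = (a11 * b2 - a12 * b1) / det
    return vx, vy

# Hypothetical rigid object translating with velocity (3, 1)
true_v = (3.0, 1.0)
normals = [45.0, 135.0]  # two differently oriented contours (assumed values)
speeds = [
    true_v[0] * math.cos(math.radians(t)) + true_v[1] * math.sin(math.radians(t))
    for t in normals
]
recovered = intersect_constraints(normals, speeds)
```

With a single orientation the system is singular and the function raises an error, which mirrors the ambiguity shown in (c); two or more distinct orientations yield a unique vector.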
Figure 2
 
Examples of the stimuli employed. (a) Broadband. (b) Low-pass, Gaussian filtered version of (a). (c) “Leaky” high-pass, generated by subtracting a Gaussian blurred version of (a) from (a). (d) (Strictly) high-pass stimulus, generated by further subtracting a Gaussian blurred version of (c) (see Methods). (e) Amplitude spectra of (a), (b), (c) and (d). Note how the low-frequency component of (c) is leaky, whereas the no-illusion stimulus reaches an amplitude of zero at a low SF.
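The filtering steps named in (b) to (d) can be sketched on a toy one-dimensional luminance profile. This is only an illustration of the blur-and-subtract operations described in the caption: the step-edge signal, the blur widths, the reuse of the same width for the second blur, and the wrap-around boundary handling are all assumptions, and the actual two-dimensional procedure is the one given in Methods.

```python
import math

def gaussian_blur(signal, sigma):
    """Circular convolution of a 1D signal with a unit-sum Gaussian kernel."""
    radius = int(math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    n = len(signal)
    return [
        sum(w * signal[(i + k) % n] for k, w in zip(offsets, weights)) / total
        for i in range(n)
    ]

# Toy broadband profile: a single step edge standing in for a contour (assumed)
broadband = [0.0] * 32 + [1.0] * 32

# (b) Low-pass: Gaussian blurred version of the broadband profile
low_pass = gaussian_blur(broadband, sigma=4.0)

# (c) "Leaky" high-pass: broadband minus its blurred version
leaky_high_pass = [a - b for a, b in zip(broadband, low_pass)]

# (d) Strictly high-pass: subtract a blurred version of (c) from (c)
blurred_leaky = gaussian_blur(leaky_high_pass, sigma=4.0)  # same sigma is an assumption
strict_high_pass = [c - b for c, b in zip(leaky_high_pass, blurred_leaky)]
```

By construction the low-pass and leaky high-pass profiles sum back to the broadband profile, so the two carry complementary parts of the original signal.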
Figure 3
 
(a, b and c) Direction discrimination thresholds for three observers (DK, JG & SD), measured with four underlying carrier signals (broadband, leaky high-pass, strictly high-pass and low-pass; see text for description). Error bars indicate 95% confidence intervals. Note that performance was worse in the smaller aperture (dark gray) condition, indicating that performance was not at ceiling. (d) Mean thresholds for the three observers, normalized to zero to correct biases and then pooled across participants. Thresholds were lower for high-pass than broadband conditions, but not significantly so. Thresholds were significantly higher for low-pass stimuli in the smaller aperture condition.
Figure 4
 
(a–d) Middle frames of the four conditions used in Experiment 2 (contrast has been maximized to improve visibility). (a) Underlying stimuli were similar to Experiment 1, but were viewed through a series of small stationary apertures that were centered on the contours in the middle frame of the sequence. (b) Global structure was disrupted by randomly swapping the signals viewed behind each aperture. (c, d) High-pass filtered versions of the same images. (e–g) The first, middle and last frames of an example broadband unscrambled trial. For illustration purposes the underlying image is superimposed upon the occluding surface of the apertures. Note that apertures were densely placed over the whole contour structure of the image and that the contour passes through the middle of each aperture during the middle frame (f).
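The scrambling manipulation in (b) amounts to a random permutation of which image portion is shown behind each aperture, while the aperture positions themselves stay fixed. A minimal sketch under that reading follows; the function name, the number of apertures, and the fixed seed are hypothetical.

```python
import random

def scramble_apertures(n_apertures, rng):
    """Return source, where source[i] is the index of the image portion
    displayed behind the aperture at position i."""
    source = list(range(n_apertures))
    rng.shuffle(source)  # random swap of contents; aperture positions are unchanged
    return source

rng = random.Random(1)  # fixed seed only so the example is repeatable (assumed)
source = scramble_apertures(8, rng)

# Each aperture position i would show the moving patch originally at source[i]
assignment = list(enumerate(source))
```

Because the result is a permutation, every local patch still appears exactly once, so the set of local signals is preserved and only their spatial arrangement changes.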
Figure 5
 
Direction discrimination thresholds measured with locally apertured stimuli for three observers (DK, JC & SD). Dashed lines indicate the mean direction discrimination threshold for each subject for the broadband stimuli from Experiment 1. Thresholds for the broadband stimuli (gray triangles) are always higher than those for the high-pass stimuli (black circles). The effect of scrambling is highly significant in the broadband stimuli, whilst only a small effect is observed in the high-pass stimuli. This suggests that ‘coherent’ global structure is not necessary to achieve low discrimination thresholds but that disrupting global structure is detrimental to performance when the low frequencies are present. (d, e) Motion energy at 3.6 c/deg and 0.75 c/deg respectively across a channel of V1 neurons tuned to the object speed. Note how the distribution of motion energy is identical in (d) but not in (e), highlighting how scrambling dramatically increases the direction bandwidth of the signal at low SFs (see 1 for model details).
Figure 6
 
Results for Experiment 3 for three observers (DK, SD & JG). Direction discrimination thresholds for scrambled (black circles) and unscrambled (gray triangles) apertured stimuli are shown as a function of the standard deviation of Gaussian blur applied to the underlying contour image. The curves show the line of best fit generated by fitting a straight line to the log of the data, the slope of which is shown in (d–f) for unscrambled (gray bars) and scrambled (black bars) conditions, for observers DK, SD and JG respectively. Error bars show 95% confidence intervals on all graphs. The exponent is always greater in the scrambled condition (significantly so for DK and SD). This suggests that increasing reliance upon the low-frequency component is of greater detriment to the scrambled stimuli, further indicating that it is the low-frequency component of the signal rather than the second-order statistics that is driving the effect of scrambling.
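The fit described in this caption can be sketched as an ordinary least-squares line through log-transformed data, whose slope is the exponent. The sketch assumes that both the blur standard deviation and the threshold are log-transformed (a power-law reading); the caption does not say whether the fit was log-log or semi-log, so this is one possible interpretation. The data values are synthetic and are not the paper's results.

```python
import math

def fit_log_log_slope(blur_sd, thresholds):
    """Fit log(threshold) = slope * log(blur_sd) + intercept by least squares."""
    xs = [math.log(x) for x in blur_sd]
    ys = [math.log(y) for y in thresholds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic thresholds following threshold = 2 * sd ** 0.5 (illustrative only)
blur_sd = [1.0, 2.0, 4.0, 8.0]
thresholds = [2.0 * sd ** 0.5 for sd in blur_sd]
slope, intercept = fit_log_log_slope(blur_sd, thresholds)
```

Comparing the slope obtained for the scrambled data with that for the unscrambled data would correspond to the bar plots in (d–f).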
Figure 7
 
Control Experiment 1 examined the ability of the observers to locally resolve the information presented in each aperture of Experiment 2. Results demonstrate that performance improves rapidly with increasing aperture number and strongly suggest that a global analysis is needed to achieve the level of precision observers achieved in Experiment 2.
Figure 8
 
Control Experiment 2 repeats Experiment 2 using a slower carrier speed so that the full contour structure is presented to the observer on each frame. Results follow the same pattern as Experiment 2, with scrambling always causing a significant increase in observers' thresholds in the broadband condition (gray triangles) but not in the high-pass condition (black circles).
Figure 9
 
(a) Hypothetical motion energy of a rigidly translating isotropic stimulus plotted in the speed-direction space used in Figure 10. The x-axis depicts the angular separation between the veridical object direction and the direction tuning of the DS filters whilst the y-axis plots the speed tuning of the DS filters as a percentage of the object speed. (b) Plot of the changing temporal frequencies used as a function of the spatial frequency tuning of the DS filters. (c) The temporal frequency tuning of the DS filters minus the peak temporal frequency of the stimulus. Note that the pattern of motion energy shown in Figure 10 closely follows the peak temporal frequency tuning of the stimulus.
Figure 10
 
‘Raw’ motion energy plots for the 4 conditions used in Experiment 2. Motion energy is plotted as a function of the speed and direction tuning of the model DS filters as illustrated in Figure 9a; each SF is plotted separately in each column from ‘fine’ to ‘coarse’ scale. (a) Motion energy for the band-pass unscrambled condition of Experiment 2. Note how the peak of motion energy follows the temporal frequency tuning of the DS filters, not the speed tuning, and that the motion energy is centered on the veridical direction only when the SFs of the carrier signal and DS filters are matched (0.75 c/deg). (b) Motion energy for the band-pass scrambled condition. Note how the directional bandwidth is higher at the low SFs relative to the unscrambled condition. (c and d) Motion energy for the high-pass conditions; the motion energy is concentrated in the high-SF channel, and the directional bandwidth is least in the high-pass conditions, reflecting decreased superposition of signals from the lower SF channels.