The perception and neural processing of frontoparallel two-dimensional (2D) motion have been studied extensively. In the classical primate motion-processing circuit, directionally selective neurons in striate cortex (V1) respond to the one-dimensional (1D) velocity component orthogonal to their preferred orientation. The disambiguated 2D direction of complex objects and patterns is then encoded explicitly by later stages of motion processing, most notably in area MT (Adelson & Movshon,
1982; Rodman & Albright,
1989; Stoner & Albright,
1992). V1 neurons are also known to exhibit high degrees of selectivity for other stimulus features, including spatial and temporal frequency, and their responses increase gradually with contrast. MT neurons, on the other hand, are notably insensitive to many visual features (Sclar, Maunsell, & Lennie,
1990), and their responses saturate more quickly with respect to contrast.
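As a compact formalization of this component-motion account (standard textbook notation, not taken from any of the cited studies): a V1 unit tuned to orientation θ senses only the projection of the true pattern velocity onto the grating normal,

```latex
% 1D component-motion constraint sensed by an orientation-tuned V1 unit:
\[
  \hat{n} = (\cos\theta,\ \sin\theta), \qquad s = \mathbf{v}\cdot\hat{n} .
\]
% Every pattern velocity v on the constraint line {v : v . n-hat = s}
% is consistent with the measured component speed s, so the 2D pattern
% direction must be disambiguated by combining constraints across
% orientations at a later stage (e.g., MT).
```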
It is widely accepted that the initial extraction of 1D component motion occurs within the relatively local spatial scale of V1 receptive fields. Although MT receptive fields are considerably larger (Felleman & Kaas,
1984), the later encoding of pattern motion does not map as cleanly onto the spatially large and generally coarse tuning properties of MT neurons. Indeed, both psychophysical and physiological studies have suggested that 2D pattern-motion encoding in MT probably reflects computations on the afferent signals from earlier visual areas. For example, the contrast and spatial frequency of the individual component gratings composing a pattern affect coherent pattern-motion perception (Adelson & Movshon,
1982). The pattern-direction selectivity of MT neurons also becomes weaker when component gratings are placed in different spatial locations within the receptive field of an MT neuron (Majaj, Carandini, & Movshon,
2007) or when component gratings are presented to each eye (Tailby, Majaj, & Movshon,
2010). Furthermore, a cascade model based on the information flow from V1 to MT captures the pattern-direction selectivity of MT neurons using simple pooling and input–output mappings (Rust, Mante, Simoncelli, & Movshon,
2006).
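To make the pooling idea concrete, here is a minimal, illustrative sketch of such a cascade in Python; the tuning widths, normalization, pooling weights, and output exponent are hypothetical choices for illustration, not the fitted components of the Rust et al. (2006) model:

```python
import numpy as np

# Minimal, illustrative sketch of a V1 -> MT cascade in the spirit of
# Rust et al. (2006): direction-tuned V1 responses are divisively
# normalized, pooled linearly by an MT unit, and passed through an
# output nonlinearity. All tuning widths, weights, and exponents here
# are hypothetical choices for illustration, not the fitted model.

directions = np.linspace(0, 2 * np.pi, 24, endpoint=False)

def v1_responses(component_dirs, kappa=2.0):
    """Von Mises direction tuning of a bank of V1 units, summed over
    the 1D component motions present in the stimulus, then normalized."""
    r = sum(np.exp(kappa * (np.cos(directions - d) - 1.0))
            for d in component_dirs)
    return r / (1.0 + r.mean())  # divisive normalization

def mt_pattern_cell(component_dirs, preferred=0.0):
    """MT unit: broad, signed (cosine) pooling of V1 afferents,
    with excitation near the preferred direction and suppression
    opposite, followed by rectification and an expansive output."""
    weights = np.cos(directions - preferred)
    drive = weights @ v1_responses(component_dirs)
    return max(drive, 0.0) ** 1.5

# A plaid whose components drift 60 deg on either side of rightward
# drives the rightward-preferring pattern unit more strongly than a
# single component alone does:
print(mt_pattern_cell([np.pi / 3, -np.pi / 3]),
      mt_pattern_cell([np.pi / 3]))
```

The point mirrored here is that pattern selectivity emerges from broad, signed pooling of normalized component responses rather than from any new measurement made at the MT stage.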
Compared with this rather thorough understanding of 2D motion processing for a single input stream, our understanding of the perception and processing of three-dimensional (3D) motion coming through a binocular input stream is quite nascent (Cormack, Czuba, Knöll, & Huk,
2017). Conventionally, 3D motion has often been treated as cyclopean motion, in which motion integration occurs at the point of binocular combination and disparity extraction (Julesz,
1960,
1971; Carney & Shadlen,
1993; Shadlen & Carney,
1986). These models suggest that 3D motion is extracted at the level of the binocular disparity computation, early in the visual-processing pathway (prior to area MT), and that this 3D motion information is fed into standard stages of later motion processing (Patterson,
1999). A key piece of evidence for this is that 3D motion can be seen in dynamic random-element stereograms, which contain no coherent monocular motion signals (Norcia & Tyler,
1984; Tyler & Julesz,
1978). Others, however, have argued that the motion seen in such stimuli is actually the result of high-level feature tracking and thus outside the scope of the canonical motion pathway (Lu & Sperling,
1995). Regardless, the spatial scale of binocular integration is still “local” under this view, especially for binocular combination, which requires binocular correspondence (i.e., stimulation of corresponding local retinal regions within the upper disparity limit).
There is, however, a growing body of work inconsistent with this local, disparity-based account of 3D motion processing, indicating instead that at least some eye-specific motion signals are exploited at a relatively late stage. Recent findings have demonstrated 3D direction selectivity based on either disparity or velocity cues in the extrastriate area MT (Joo et al.,
2016; Sanada & DeAngelis,
2014). Disparity-based and velocity-based 3D motion signals appear to be processed separately in area MT (Joo et al.,
2016). Critically, eye-specific velocity signals—even without conventional binocular correspondence—may be available after the canonical point of binocular combination in primary visual cortex for a velocity-based computation of 3D direction in extrastriate cortex (Rokers, Czuba, Cormack, & Huk,
2011).
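The two binocular cues at issue can be summarized with the usual small-angle geometry (a standard formalization, not the notation of the cited studies). With horizontal retinal positions x_L and x_R and velocities v_L and v_R in the two eyes:

```latex
% Changing-disparity (CD) and interocular-velocity-difference (IOVD)
% cues to motion in depth:
\[
  \delta = x_R - x_L, \qquad \frac{d\delta}{dt} = v_R - v_L,
\]
\[
  \dot{x} \propto \tfrac{1}{2}(v_L + v_R), \qquad
  \dot{z} \propto v_R - v_L .
\]
% The CD cue differentiates a disparity signal that requires local
% binocular correspondence; the IOVD cue compares eye-specific
% velocities directly, and so can survive without such correspondence.
```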
In the present study, we sought to characterize more directly how eye-specific velocity signals are integrated between the eyes and across space to support perceptual sensitivity to 3D direction, and to test whether 3D motion processing depends on mechanisms that differ from those for processing frontoparallel 2D motion, or that perhaps integrate local 2D motion signals in a way that is unique to eye-specific processing for 3D motion computations. First, based on previous findings of strong 3D motion selectivity in area MT, whose receptive fields are relatively large compared with those of V1 (Sanada & DeAngelis,
2014; Rokers, Cormack, & Huk,
2009), we expected that 3D motion processing would be spatially global, showing robust direction-selective adaptation effects regardless of the precise spatial match between the adaptor and test stimulus. Such global processing is consistent with the larger receptive fields in extrastriate visual areas and contrasts with more spatially local 2D motion processing, which corresponds to the smaller receptive fields of earlier visual areas (Hedges et al.,
2011; Kohn & Movshon,
2003). We used a psychophysical motion-adaptation paradigm with visual arrays of small oriented Gabor patches (Amano, Edwards, Badcock, & Nishida,
2009; Hisakata, Hayashi, & Murakami,
2016; Lee & Lu,
2010; Rokers et al.,
2011; Scarfe & Johnston,
2011), in which we measured motion aftereffects (MAEs). We manipulated the spatial congruency between adapting and test stimuli and tested whether there were strong 3D MAEs when test stimuli were presented at unadapted locations (Snowden & Milne,
1997).
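For concreteness, the congruency manipulation can be sketched as follows; the annular grid, patch counts, and complementary split between adapted and unadapted locations are hypothetical stand-ins, not the actual display parameters:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical layout for the spatial-congruency manipulation: Gabor
# patches on an annular grid are split into two interleaved sets, so
# the test array can reuse the adapted locations (congruent) or fall
# entirely on unadapted ones (incongruent).
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
radii = np.array([3.0, 5.0])  # eccentricities in deg (illustrative)
locs = np.array([(r * np.cos(a), r * np.sin(a))
                 for r in radii for a in angles])

adapt_idx = rng.permutation(len(locs))[: len(locs) // 2]
adapt_locs = locs[adapt_idx]
unadapted_locs = np.delete(locs, adapt_idx, axis=0)

test_locs = {"congruent": adapt_locs, "incongruent": unadapted_locs}
# A spatially global 3D MAE predicts comparable aftereffects in both
# test conditions; a local one predicts MAEs mainly in the congruent case.
print({name: xy.shape for name, xy in test_locs.items()})
```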
Furthermore, to test whether eye-specific velocity signals are combined for 3D direction selectivity at later stages of motion processing, we then used “pseudoplaid” stimuli comprising small Gabor patches with random orientations whose velocities were specified by the intersection of constraints (Adelson & Movshon,
1982) to yield consistent global pattern motion. If eye-specific velocity signals were combined only at the site of binocular combination (i.e., V1), 3D direction information would be lost, because comparing spatially local velocities between the two eyes would not yield a consistent or coherent 3D direction. In contrast, if eye-specific velocity signals remain available at or after the stage at which pattern motion is estimated, 3D direction could still be recovered by comparing more spatially global estimates of eye-specific pattern motion. Together, the results from these applications of a psychophysical motion-adaptation paradigm point to the existence of global eye-specific motion signals that are indeed exploited in the binocular computation of a 3D motion signal at a relatively late stage, quite likely past V1, the canonical site of binocular combination for static stereopsis.
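As a sketch of the logic behind the pseudoplaid construction (parameter values are illustrative assumptions, not the stimulus values used in the experiments): each randomly oriented Gabor is assigned the drift speed that its orientation's constraint line picks out for a single global pattern velocity, and that velocity is recoverable only by pooling the constraints across patches:

```python
import numpy as np

# Assign drift speeds to randomly oriented Gabor patches so that all
# components are consistent with one global pattern velocity via the
# intersection of constraints (Adelson & Movshon, 1982).
rng = np.random.default_rng(0)
v_pattern = np.array([1.5, 0.0])  # global velocity (deg/s), rightward

thetas = rng.uniform(0, np.pi, size=20)  # random carrier orientations
normals = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)

# Each Gabor's drift speed is the projection of the pattern velocity
# onto its grating normal (its 1D component of the global motion):
speeds = normals @ v_pattern

# Conversely, the global velocity is recoverable from the component
# constraints n_i . v = s_i by least squares:
v_hat, *_ = np.linalg.lstsq(normals, speeds, rcond=None)
print(np.allclose(v_hat, v_pattern))  # True
```

In the binocular version, each eye's array would be built from its own pattern velocity (e.g., opposite horizontal velocities for motion through depth), so a consistent 3D direction exists only at the level of the recovered eye-specific pattern motions, not at the level of the local, randomly oriented components.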