Article  |   November 2013
Temporal channels and disparity representations in stereoscopic depth perception
Journal of Vision November 2013, Vol.13, 26. doi:10.1167/13.13.26
      Takahiro Doi, Maki Takano, Ichiro Fujita; Temporal channels and disparity representations in stereoscopic depth perception. Journal of Vision 2013;13(13):26. doi: 10.1167/13.13.26.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Stereoscopic depth perception is supported by a combination of correlation-based and match-based representations of binocular disparity. It also relies on both transient and sustained temporal channels of the visual system. Previous studies suggest that the relative contribution of the correlation-based representation (over the match-based representation) and the transient channel (over the sustained channel) to depth perception increases with the disparity magnitude. The mechanisms of the correlation-based and match-based representations may receive preferential inputs from the transient and sustained channels, respectively. We examined near/far discrimination by observers using random-dot stereograms refreshed at various rates. The relative contribution of the two representations was inferred by changing the fraction of dots that were contrast reversed between the two eyes. Both representations contributed to depth discrimination over the tested range of refresh rates. As the rate increased, the correlation-based representation increased its contribution to near/far discrimination. Another experiment revealed that the match-based representation was constructed by exploiting the variability in correlation-based disparity signals. Thus, the relative weight of the transient over sustained channel differs between the two representations. The correlation-based representation dominates depth perception with dynamic inputs. The match-based representation, which may be a nonlinear refinement of the correlation-based representation, exerts more influence on depth perception with slower inputs.

Introduction
Two types of neural representations of binocular disparity—correlation-based and match-based representations—underlie stereoscopic depth perception (Cumming & Parker, 1997; Doi, Tanabe, & Fujita, 2011; Janssen, Vogels, Liu, & Orban, 2003; Krug, Cumming, & Parker, 2004; Kumano, Tanabe, & Fujita, 2008; Parker, 2007; Tanabe, Umeda, & Fujita, 2004). These representations are characterized by disparity tuning functions of the neurons underlying perceptual decisions. In correlation-based representation, the amplitude and sign of the disparity tuning function follow the cross-correlation between images projected to the left and right eyes. In match-based representation, the matched features between the images from the two eyes determine the amplitude of the tuning function. Neurons involved in the two representations have contrasting tuning functions in response to anticorrelated random-dot stereograms (RDSs). In anticorrelated RDSs, the luminance contrast of the dots is reversed between the two eyes: White dots are replaced with black dots, and black dots are replaced with white dots against a gray background. The anticorrelation inverts the sign of cross-correlation and eliminates the matched features between the two eyes. Accordingly, the correlation-based representation has inverted tuning functions for anticorrelated RDSs relative to those for correlated RDSs (Cumming & Parker, 1997; Krug et al., 2004; Takemura, Inoue, Kawano, Quaia, & Miles, 2001), while the match-based representation has flat (zero or decreased amplitude) functions (Janssen et al., 2003; Kumano et al., 2008; Tanabe et al., 2004; Theys, Srivastava, van Loon, Goffin, & Janssen, 2012). The correlation-based and match-based representations refer to those distinctive sets of tuning curves, but they do not refer to underlying neuronal mechanisms. 
Specifically, our correlation-based representation does not necessarily employ a disparity energy model (Ohzawa, DeAngelis, & Freeman, 1990), and our match-based representation does not imply that the underlying mechanism is feature matching, such as the matching of second-derivative zero crossings of filtered left-eye and right-eye images (Marr & Poggio, 1979). 
The relative role of the two representations differs between fine and coarse depth perception (Doi, Tanabe, & Fujita, 2011). The match-based representation dominates fine depth perception, whereas both match-based and correlation-based representations contribute to coarse depth perception. Thus, a classic dissociation between fine and coarse stereopsis (Bishop & Henry, 1971; Tyler, 1990; Wilcox & Allison, 2009) is partly ascribed to different neural representations of disparity. Fine and coarse stereopsis also differ in the temporal properties of their neural substrate (Edwards & Schor, 1999; Kontsevich & Tyler, 2000). A sustained mechanism subserves fine disparity processing, whereas a transient mechanism increases its contribution for larger disparities. From these two sets of observations, we hypothesize that the match-based and correlation-based representations have different temporal properties: The match-based representation may be related to the sustained channel of the visual system, whereas the correlation-based representation may be related to the transient channel. 
By using the graded anticorrelation technique (Doi et al., 2011; see below), we examined the contribution of the two disparity representations to depth perception at various refresh rates of random-dot patterns. As the refresh rate was increased, we found that the relative contribution to depth perception of the correlation-based representation over the match-based representation increased (Experiment 1). Control experiments confirmed the effects even when the number of refreshes during a trial was fixed (Experiment 2). In Experiment 3, we explored the mechanisms of the match-based representation and demonstrated the role of the variability of local binocular correlation calculated through small receptive fields (RFs). 
Experiment 1: Psychometric functions shift systematically with the refresh rate of random dot patterns
Introduction
We examined the disparity representation underlying the performance of a near/far discrimination task by manipulating the fraction of anticorrelated dots (Doi et al., 2011). Graded anticorrelation dissociates the match-based and correlation-based representations by providing stimuli for which only one of the two representations can contribute to task performance. One example of graded anticorrelation is the half-matched RDS, which isolates the contribution of the match-based representation to stereopsis. In half-matched RDSs, half of the dots are contrast reversed (Figure 1A; Figure 1B, middle). Thus, 50% of the features are matched, providing the signal available for the match-based representation. In contrast, the correlation-based representation cannot signal disparity with this stimulus. The cross-correlation of the correlated dots and that of the anticorrelated dots cancel each other out, so the stimulus as a whole yields 0% correlation. Half-matched RDSs complement fully anticorrelated RDSs, which isolate the contribution of the correlation-based representation to stereopsis. Anticorrelated RDSs have a 0% matching signal and a nonzero correlation signal. Overall, the neural system for the correlation-based representation loses disparity selectivity for half-matched RDSs and reverses the tuning curve for fully anticorrelated RDSs (Figure 1C; blue, dashed curves). The neural system for the match-based representation retains disparity selectivity for half-matched RDSs, but loses selectivity for fully anticorrelated RDSs (Figure 1C; red, solid curves). An odd-symmetric tuning curve from a single unit is shown as an example in Figure 1C, but our definitions of the two representations generalize to any tuning shape, spatial scale, and other properties, as long as the signed amplitude of the tuning curve follows the match signal or correlation signal as described in Figure 1C.
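The relation between percent match and percent correlation described above can be illustrated with a minimal sketch. The function names and the dot-wise contrast coding (+1 for a bright dot, -1 for a dark dot) are illustrative assumptions and not the authors' stimulus code; the "correlation" here is simply the mean product of left-eye and right-eye dot contrasts.

```python
import random

def graded_anticorrelated_dots(n_dots, match_fraction, rng):
    """Return (left, right) dot contrasts: +1 = bright, -1 = dark (assumed coding)."""
    # Equal numbers of bright and dark dots in the left eye.
    left = [1 if i % 2 == 0 else -1 for i in range(n_dots)]
    rng.shuffle(left)
    # Contrast-reverse a randomly chosen subset of dots in the right eye.
    n_matched = round(n_dots * match_fraction)
    reverse = [False] * n_matched + [True] * (n_dots - n_matched)
    rng.shuffle(reverse)
    right = [-c if r else c for c, r in zip(left, reverse)]
    return left, right

def percent_match(left, right):
    """Percentage of dots with the same contrast in both eyes."""
    return 100.0 * sum(l == r for l, r in zip(left, right)) / len(left)

def percent_correlation(left, right):
    """Mean product of dot contrasts, in percent (dot-wise proxy for correlation)."""
    return 100.0 * sum(l * r for l, r in zip(left, right)) / len(left)

rng = random.Random(0)
for fraction in (0.0, 0.5, 1.0):
    l, r = graded_anticorrelated_dots(1000, fraction, rng)
    print(fraction, percent_match(l, r), percent_correlation(l, r))
```

Under this coding, percent correlation equals 2 × percent match − 100, so a half-matched stimulus carries a 50% match signal but a 0% correlation signal.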
Figure 1
 
Experimental rationale. (A) Screenshot of a half-matched random-dot stereogram (RDS). The RDS consists of a center disk and a surrounding annulus. Red crosses are fixation markers. During the experiments, we used red phosphors for stimuli and background while using white fixation crosses. A scale bar indicates 1°. (B) Graded contrast-reversal (anticorrelation) of random-dot stereograms (RDSs). Three representative RDSs are shown schematically. Pairs of circles denote the luminance contrast of stimulus dots projected to the left and right eyes. (Right) All pairs have matched contrast between the two eyes. Thus, the stereo half images are correlated perfectly. (Left) All pairs have reversed contrast; the stereo half images are anticorrelated perfectly. (Center) Half of the pairs have matched contrast. The half images are uncorrelated because correlated and anticorrelated dots cancel each other. (C) Ideal disparity tuning functions of the match-based (red, solid curve) and correlation-based (blue, dashed curve) representations for the three RDSs shown in (B). Only one of the two representations shows disparity-dependent modulation at the left (0% match) and center (50% match). (D) Hypothetical psychometric functions in a two-alternative near/far discrimination task. The probability of a correct choice is derived from percent match values for the match-based representation (red, solid curve) and from percent correlation values for the correlation-based representation (blue, dashed curve).
Figure 2
 
The effects of pattern refresh rates on the relative activation of transient and sustained channels. (A) Simulated time courses of the luminance contrast at a given pixel for one eye in the RDSs. Time courses with low refresh rate (5.3 Hz, eight repetitions of the same image) and high refresh rate (42.5 Hz, only one repetition of the same image) are shown in the left and right, respectively. The contrast of three corresponds to a bright dot, one to the background, and −1 to either a dark dot or the closure of the shutter glass that occurs in every other frame. (B) The outputs of the sustained temporal channel (shown in the inset) for the example stimulus time courses shown in (A). A tick interval for the inset is 0.1 s for the x axis and 0.2 for the y axis. (C) The outputs of the transient temporal channel for the same example time courses shown in (A). The convention is the same as in (B). (D) The energy of the channel outputs as a function of refresh rate. We generated a random dot pattern 1,000 times and calculated the trial average of the signal energy (squared channel output followed by time integral) for each channel.
Psychometric functions for a near/far discrimination task can be derived by gradually varying the percentage of matched features (percent match) or equivalently the percentage of binocular correlation (percent correlation). In a perceptual decision model, the decisions of near/far discrimination are made by comparing the activities of near-preferring neurons with those of far-preferring neurons (DeAngelis, Cumming, & Newsome, 1998; Poggio & Fischer, 1977; Poggio, Gonzalez, & Krause, 1988; Poggio & Talbot, 1981; Prince & Eagle, 2000; Shadlen, Britten, Newsome, & Movshon, 1996; Shiozaki, Tanabe, Doi, & Fujita, 2012; Uka & DeAngelis, 2004). The decision variable is a linear combination of the sensory activities of the two populations of neurons (Shadlen et al., 1996). This model leads to the prediction that the reversed tuning curves produce reversed near/far judgment, which is detected as a performance below chance level (50%) in a near/far discrimination task. The sensory activity has Poisson variability (Dean, 1981). Thus, the quality (i.e., the ratio of mean to standard deviation) of the decision variable is a monotonic function of the amplitude of sensory tuning curves (see appendix and figure 8A in Doi et al., 2011). 
Given this model, the match-based representation predicts that the percent correct of a near/far discrimination monotonically decreases from the perfect to chance level as the percent binocular match is reduced from 100% to 0% (Figure 1D). In contrast, the correlation-based representation predicts that the percent correct decreases to chance level as the percent match is reduced to 50% (0% correlation), and further decreases below chance as the percent match is reduced from 50% to 0% (0% to −100% correlation). Near/far judgment should be reversed relative to the disparity when the percent correlation is negative. The psychometric function should be odd symmetric with respect to the center of the plot. The match-based and correlation-based representations thus predict contrasting psychometric curves as a function of graded anticorrelation. These predictions are not specific to units with odd-symmetric tuning but can be generalized to a variety of units with diverse tuning shapes and spatial scales. In order to examine the temporal property of the two representations, we analyzed how the shape of psychometric functions changes with RDS pattern refresh rates. 
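The two contrasting predictions can be sketched numerically. This is a deliberately simplified stand-in for the decision model: it assumes that the decision variable is Gaussian and that its sensitivity scales linearly with the signal (percent match for one representation, percent correlation for the other). The sensitivity constant `D_MAX` and the function names are hypothetical; the Poisson-based derivation in Doi et al. (2011) is not reproduced here.

```python
import math

D_MAX = 3.0  # hypothetical sensitivity (d') at full signal strength

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_correct_match(match_fraction, d_max=D_MAX):
    """Match-based prediction: signal is the fraction of matched dots (0..1)."""
    return phi(d_max * match_fraction)

def p_correct_correlation(match_fraction, d_max=D_MAX):
    """Correlation-based prediction: signal is the correlation (-1..1)."""
    correlation = 2.0 * match_fraction - 1.0
    return phi(d_max * correlation)

for i in range(9):
    m = i / 8.0  # match levels from 0% to 100% in 12.5% steps
    print(f"{100 * m:5.1f}%  match-based {p_correct_match(m):.3f}  "
          f"correlation-based {p_correct_correlation(m):.3f}")
```

In this sketch the match-based curve never falls below chance, whereas the correlation-based curve passes through chance at 50% match and is odd symmetric about that point.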
Methods
Task and testing procedure
We administered a single-interval, forced-choice near/far discrimination task: Subjects viewed a circular, bipartite dynamic RDS for 1.5 s and judged whether the center disk was nearer or farther than the surrounding annulus. The disparity sign and the binocular match level (i.e., the percentage of the dots with matched contrasts between the two eyes) of the center disk were varied across trials, and the magnitude of disparity was 0.24°. We chose the value 0.24° in an attempt to maximize our chance to detect effects of refresh rates on the balance between the match-based and correlation-based representations. Our previous study showed that decreasing and increasing disparity magnitudes around 0.24° shift the balance toward the match-based and correlation-based representations, respectively (Doi et al., 2011). The match level ranged from 0% to 100% with 12.5% increments. The surrounding annulus always consisted of 100% contrast-matched dots with zero disparity. Stimuli with different disparity signs and match levels were presented in pseudo-randomized order. We tested four refresh rates for the random dot pattern (42.5, 21.3, 10.6, and 5.3 Hz) in separate blocks of trials; these values correspond to one refresh in the random dot pattern every two, four, eight, and 16 monitor refreshes (85 Hz), respectively. A pattern refresh per two monitor refreshes was the highest possible rate because we used a shutter system for the dichoptic image presentation (Doi et al., 2011). Each subject was tested with different refresh rates in random order. The psychometric function was based on data from two blocks of 270 trials (2 Disparity Signs × 9 Match Levels × 15 Repetitions), which were carried out within a single day. 
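The arithmetic of the design can be checked with a short sketch. The variable names, the sign convention for near versus far, and the shuffling scheme are assumptions made for illustration; only the numbers (85 Hz monitor, nine match levels, 15 repetitions, 0.24° disparity) come from the text.

```python
import itertools
import random

MONITOR_HZ = 85.0
MONITOR_FRAMES_PER_PATTERN = [2, 4, 8, 16]
# Pattern refresh rate = monitor rate / monitor refreshes per pattern.
refresh_rates = [MONITOR_HZ / n for n in MONITOR_FRAMES_PER_PATTERN]

match_levels = [12.5 * i for i in range(9)]   # 0% to 100% in 12.5% steps
disparities = [-0.24, 0.24]                   # degrees; sign convention is assumed
REPETITIONS = 15

def make_block(rng):
    """One block: every (disparity, match level) pair repeated, in shuffled order."""
    trials = [cond
              for cond in itertools.product(disparities, match_levels)
              for _ in range(REPETITIONS)]
    rng.shuffle(trials)
    return trials

block = make_block(random.Random(0))
print([round(r, 1) for r in refresh_rates])   # refresh rates in Hz
print(len(block))                             # trials per block
```

With two such blocks per refresh rate, each match level is tested 2 blocks × 2 signs × 15 repetitions = 60 times per subject.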
Subjects
Six subjects (TD, MT, NK, TO, SK, and SA) participated in Experiment 1. Subjects TD and MT are two of the authors. The others were naïve to the purpose of this experiment. All subjects had normal or corrected-to-normal vision and provided written informed consent prior to the experiments. All experiments were performed in accordance with the Declaration of Helsinki. 
Apparatus
Visual stimuli were presented on a full-flat cathode-ray-tube (CRT) display (Multiscan E230, Sony, Minato), which subtended 32.1° × 23.9° of the visual field at a viewing distance of 57 cm. The luminance values of bright dots, background, and dark dots were 1.9, 0.95, and 0.01 cd/m², respectively. The luminance was measured through the active shutter glasses (RE7-CANE, Elsa, Aachen). Visual stimuli were generated by using the OpenGL Utility Toolkit (GLUT) and presented with a spatial resolution of 1152 × 864 pixels at a frame rate of 85 Hz. The apparatus included a graphics board supporting quad buffer stereo display (Wildcat VP 990, 3Dlabs, Milpitas, CA) and a workstation (PWS360, Dell, Round Rock, TX). Images were anti-aliased with GLUT functions so that visual stimuli had subpixel resolution. We used only red phosphors to minimize interocular crosstalk (<3% of the background). 
Visual stimuli
The center disk of the RDSs had a radius of 2.5°, and the surrounding annulus had a width of 1°. The annulus served as the zero-disparity reference for near/far discrimination. It also masked the shift of the center disk associated with disparity change, so that no monocularly detectable feature changed with disparity. The size of the center disk was held constant across crossed and uncrossed disparities. The RDS center was located 3° below a fixation cross so that the cross was presented in the middle of the annulus. RDSs contained an equal number of bright and dark dots, each sized 0.14° × 0.14°, and the dot density was 24% (13 dots per square degree). One dot occluded another dot where dots overlapped; however, this occlusion did not bias the mean luminance or binocular match level. 
Analysis
We fitted a psychometric function (P) to the behavioral data describing the proportion of correct choices by the following equation:

$$P(x) = 1 - (1 - \gamma)\exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right] \qquad (1)$$

where x is the match level, γ is the y intercept, i.e., P(0), α is the value of x when the proportion correct is 1 – (1 – γ) × 0.368, and β is proportional to the slope at P(α). We searched for a set of parameters that maximized the likelihood of observing the data by assuming a binomial distribution. 
We then measured the similarity between the fitted psychometric functions and the prediction of the correlation-based representation (blue curve in Figure 1D). The similarity was quantified as the fractional area (A) defined as

$$A = \frac{2\int_{0}^{x_c}\left|P(x) - 0.5\right|\,dx}{\int_{0}^{100}\left|P(x) - 0.5\right|\,dx} \qquad (2)$$

where x is the match level in percent and xc is the value of x satisfying P(x) = 0.5 (Doi et al., 2011). By substituting in Equation 1, the final equation is

$$A = \frac{2\int_{0}^{x_c} W(x)\,dx}{\int_{0}^{x_c} W(x)\,dx - \int_{x_c}^{100} W(x)\,dx}, \qquad W(x) = (1 - \gamma)\exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right] - 0.5$$

When P(x) did not cross 0.5, xc was set to zero. The numerator in Equation 2, i.e., the odd-symmetric component, measures twice the area of the downward deviation from chance level (Figure 4A), whereas the denominator measures the total area of the deviation. The value of A thus measures the relative contribution of the odd-symmetric component, centered at 50% match and 50% correct. The fractional area is zero for the prediction by the match-based representation and unity for the prediction by the correlation-based representation (Figure 1D). 
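A numerical sketch of the fractional area follows, based on the verbal definition above: twice the area of the below-chance deviation divided by the total area of the deviation from chance. It assumes match levels expressed in percent (0 to 100), a monotonically increasing psychometric function, and a Weibull form for the example curve; the function names, bisection, and trapezoid integration are illustrative choices.

```python
import math

def weibull(x, gamma, alpha, beta):
    """Assumed Weibull psychometric function with y intercept gamma."""
    return 1.0 - (1.0 - gamma) * math.exp(-((x / alpha) ** beta))

def chance_crossing(p, lo=0.0, hi=100.0):
    """Match level where p(x) = 0.5; zero when p does not cross 0.5."""
    if p(lo) >= 0.5 or p(hi) <= 0.5:
        return 0.0
    for _ in range(60):  # bisection on a monotonically increasing function
        mid = 0.5 * (lo + hi)
        if p(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def integrate(f, a, b, steps=2000):
    """Trapezoid rule."""
    if b <= a:
        return 0.0
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

def fractional_area(p):
    xc = chance_crossing(p)
    below = integrate(lambda x: abs(p(x) - 0.5), 0.0, xc)      # below-chance area
    total = integrate(lambda x: abs(p(x) - 0.5), 0.0, 100.0)   # total deviation
    return 2.0 * below / total

print(fractional_area(lambda x: weibull(x, 0.5, 50.0, 3.0)))   # never below chance
print(fractional_area(lambda x: x / 100.0))                    # odd symmetric line
print(fractional_area(lambda x: weibull(x, 0.2, 50.0, 3.0)))   # intermediate shape
```

The first two curves stand in for the match-based and correlation-based extremes, and the third is an intermediate case of the kind a mixed contribution would produce.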
Figure 3
 
Psychometric functions have different shapes for low and high refresh rates of dot patterns (Experiment 1). The average functions from all subjects tested (top left) and individual functions are shown. Continuous and dashed curves are best-fitted functions of the data for low (open circles) and high (solid circles) refresh rates, respectively. The correct percentages were calculated from 360 choices for the average data and 60 choices for the example data. Error bars indicate standard errors of the means across six subjects for the average data and across two blocks of trials for the example data.
Figure 4
 
Odd symmetry of the psychometric function increases with the pattern refresh rate (Experiment 1). (A) The odd symmetry was quantified as an area-based index. The area of the odd-symmetric component (blue area, odd symmetric with respect to the central point at 50% match and 50% correct) is divided by the total area that represents the total performance deviation from chance level (blue area plus gray area) to give an index, denoted the fractional area. The match-based and correlation-based representations predict zero and one fractional area, respectively. (B) The fractional area is plotted against the log refresh rate for the six subjects.
Nested regression models and F tests were used to test the effects of the refresh rate on the shape of the psychometric function. A dependent variable (e.g., fractional area) was fitted with two nested linear functions of the common log of the refresh rate: one with a free slope parameter, which was shared across subjects, and the other with zero slope. F statistics were calculated by comparing the residual sums of squares from the two fitted models, with appropriate adjustment of degrees of freedom. If the fit with the free-slope model was significantly better than that with the zero-slope model, we concluded that the refresh rate had an effect on the shape of the psychometric function. Both models had a separate offset parameter for each subject to account for the individual variability of the offset. The fitted slopes obtained in Experiment 1 were used to compare the results between Experiment 2 and Experiment 1.
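The nested-model comparison can be sketched as follows. With one offset per subject and a single shared slope, ordinary least squares reduces to regressing subject-centered values on subject-centered log refresh rates, which is what the code does. The function name and the example data are hypothetical, and the p value (which would come from an F distribution with 1 and N − S − 1 degrees of freedom) is not computed here.

```python
import math

def nested_f_test(data):
    """data: {subject: [(log10_rate, value), ...]}.

    Compares 'subject offsets + shared slope' against 'subject offsets only'.
    Returns (slope, F statistic, residual degrees of freedom)."""
    sxx = sxy = rss_zero_slope = 0.0
    n_obs = 0
    for points in data.values():
        mean_x = sum(x for x, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        for x, y in points:
            sxx += (x - mean_x) ** 2
            sxy += (x - mean_x) * (y - mean_y)
            rss_zero_slope += (y - mean_y) ** 2
        n_obs += len(points)
    slope = sxy / sxx
    rss_free_slope = rss_zero_slope - slope * sxy
    df_resid = n_obs - len(data) - 1          # N - (S offsets) - (1 slope)
    f_stat = (rss_zero_slope - rss_free_slope) / (rss_free_slope / df_resid)
    return slope, f_stat, df_resid

# Hypothetical data: two subjects, four refresh rates, small fixed residuals.
log_rates = [math.log10(85.0 / n) for n in (16, 8, 4, 2)]
residuals = [0.01, -0.01, -0.01, 0.01]
data = {
    "S1": [(x, 0.1 + 0.36 * x + e) for x, e in zip(log_rates, residuals)],
    "S2": [(x, 0.3 + 0.36 * x + e) for x, e in zip(log_rates, residuals)],
}
slope, f_stat, df_resid = nested_f_test(data)
print(slope, f_stat, df_resid)
```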
Simulations
We performed simulations to examine how the sustained and transient channels of the visual system respond to our RDSs of different refresh rates. To this end, we modeled the time course of luminance contrast at a given pixel in our RDSs and calculated how the responses of the sustained and transient channels are influenced by the pattern refresh rate. The closure of the stereo shutter glasses was modeled as a contrast value of −1 that occurred every other frame. Bright and dark dots were modeled as three and −1, respectively, and the background as one. These contrast values might seem peculiar, but averaging each with the shutter closure (−1) gives contrast values of one, zero, and −1 for a bright dot, background, and a dark dot, respectively. For a given trial, we generated a random-dot sequence over 128 frames (corresponding to 1.5 s at 85 Hz). The probability of a dot occurring at the pixel in a given frame was 0.24 (0.12 for a dark dot and 0.12 for a bright dot). The pixel value was (or was not) carried over to the next few frames, depending on the pattern refresh rate, with the closed-shutter frames inserted every other frame. We added 22 frames consisting of alternating background and shutter-closure frames to both the beginning and end of the stimulus sequence to mimic the pre- and post-stimulus periods in the actual experiments. 
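The stimulus time course described above can be sketched as follows. The contrast codes (3, 1, −1), the 128 stimulus frames, the 22 padding frames, and the dot probabilities come from the text; the function name, the choice that the eye sees the image on even-numbered frames, and the seeding are assumptions.

```python
import random

BRIGHT, BACKGROUND, DARK, SHUTTER_CLOSED = 3, 1, -1, -1
N_STIMULUS_FRAMES = 128   # about 1.5 s at 85 Hz
N_PAD_FRAMES = 22         # pre- and post-stimulus padding

def pixel_time_course(monitor_frames_per_pattern, rng, p_dot=0.24):
    """Contrast at one pixel for one eye, one value per monitor frame."""
    stimulus = []
    for _ in range(0, N_STIMULUS_FRAMES, monitor_frames_per_pattern):
        # Draw the pixel content once per pattern refresh.
        u = rng.random()
        if u < p_dot / 2:
            value = BRIGHT
        elif u < p_dot:
            value = DARK
        else:
            value = BACKGROUND
        # Hold the value until the next refresh; the shutter closes on
        # every other monitor frame (odd frames here, by assumption).
        for k in range(monitor_frames_per_pattern):
            stimulus.append(value if k % 2 == 0 else SHUTTER_CLOSED)
    pad = [BACKGROUND if k % 2 == 0 else SHUTTER_CLOSED
           for k in range(N_PAD_FRAMES)]
    return pad + stimulus + pad

slow = pixel_time_course(16, random.Random(0))   # 5.3 Hz pattern refresh
fast = pixel_time_course(2, random.Random(0))    # 42.5 Hz pattern refresh
print(len(slow), len(fast))
```

At the lowest refresh rate each drawn value is held for 16 monitor frames, that is, eight open-shutter repetitions of the same image for this eye.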
The sustained and transient channels were modeled based on the low-pass and band-pass temporal filters of monocular receptive fields reported in the primary visual cortex (V1) of primates, respectively. The channel shape was adjusted based on the characteristics available in time-domain representation (De Valois, Cottaris, Mahon, Elfar, & Wilson, 2000) and in frequency-domain representation (Hawken, Shapley, & Grosof, 1996). Our sustained (low-pass) channel had a time-to-peak of 94 ms and a high-cut frequency of 9.3 Hz (the frequency at which the amplitude decreases to 50% of the peak value as frequency is increased, Hawken et al., 1996). The transient (band-pass) channel had a time-to-peak of 77 ms, a time-to-trough of 99 ms, a negative-to-positive peak ratio of 0.63, a peak frequency of 14.2 Hz, and a high-cut frequency of 23.9 Hz. The channels were defined as the difference of Gamma functions with the following form (Cai, DeAngelis, & Freeman, 1997):

$$G(t) = k_1\,\frac{\left[c_1(t - t_1)\right]^{n_1} e^{-c_1(t - t_1)}}{n_1^{n_1} e^{-n_1}} - k_2\,\frac{\left[c_2(t - t_2)\right]^{n_2} e^{-c_2(t - t_2)}}{n_2^{n_2} e^{-n_2}}$$

where the parameters (k1, c1, t1, n1, k2, c2, t2, n2) = (0.28, 1.03, 19.69, 59.82, 0.17, 0.50, 30.42, 32.97) for the transient channel and (k1, c1, t1, n1, k2, c2, t2, n2) = (0.21, 0.29, 17.93, 21.91, 0.02, 0.14, 1.53, 14.17) for the sustained channel. The amplitudes (k parameters) of the two channels were adjusted for each filter to have unit energy. Both channels are defined over 20 frames. The response of a channel was quantified as the signal energy that passes through the channel. A random dot sequence was convolved with a channel, and the 22 frames were removed from the beginning and end of the resulting time-series signals. The signal was squared and integrated over time to obtain the energy. We reported the average energy across 1,000 trials (Figure 2D). 
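The channel model and energy measure can be sketched as follows. The sketch assumes the peak-normalized difference-of-Gamma form of Cai et al. (1997) with time in milliseconds, each lobe set to zero before its onset time, and kernels sampled at the start of each 85-Hz frame; these are assumptions, chosen because they reproduce the reported times-to-peak with the listed parameters. All function names are illustrative.

```python
import math

FRAME_MS = 1000.0 / 85.0
TRANSIENT = (0.28, 1.03, 19.69, 59.82, 0.17, 0.50, 30.42, 32.97)
SUSTAINED = (0.21, 0.29, 17.93, 21.91, 0.02, 0.14, 1.53, 14.17)

def gamma_lobe(t, c, t0, n):
    """[c(t - t0)]^n exp(-c(t - t0)) / (n^n exp(-n)), computed in log space."""
    u = c * (t - t0)
    if u <= 0.0:
        return 0.0   # assumed: no response before the lobe's onset time
    return math.exp(n * math.log(u) - u - n * math.log(n) + n)

def channel(t, params):
    """Difference of two Gamma lobes at time t (ms, assumed unit)."""
    k1, c1, t1, n1, k2, c2, t2, n2 = params
    return k1 * gamma_lobe(t, c1, t1, n1) - k2 * gamma_lobe(t, c2, t2, n2)

def sampled_kernel(params, n_frames=20):
    """Kernel sampled once per frame and rescaled to unit energy."""
    taps = [channel(i * FRAME_MS, params) for i in range(n_frames)]
    norm = math.sqrt(sum(v * v for v in taps))
    return [v / norm for v in taps]

def channel_energy(signal, kernel, n_pad=22):
    """Causal convolution, padding removed, then sum of squared output."""
    out = [sum(k * signal[i - j] for j, k in enumerate(kernel) if i - j >= 0)
           for i in range(len(signal))]
    core = out[n_pad:len(out) - n_pad]
    return sum(v * v for v in core)

def time_to_peak_ms(params):
    """Time of the maximum response, searched at 1-ms resolution."""
    return max(range(0, 236), key=lambda t: channel(float(t), params))

print(time_to_peak_ms(TRANSIENT), time_to_peak_ms(SUSTAINED))
```

To approximate the energy analysis, `channel_energy` would be applied to many simulated pixel time courses per refresh rate and the results averaged; that loop is omitted here because it depends on the stimulus generator.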
Results
Simulations
We first simulated the responses of the transient and sustained channels to our RDSs to estimate how the manipulation of pattern refresh rates affects the relative activation of the two channels. We modeled the sustained and transient channels after the two types of neurons in the primate V1 (insets in Figures 2B and 2C, respectively) (De Valois et al., 2000; Hawken et al., 1996). We simulated the temporal profile of luminance contrast at a given pixel with low (5.3 Hz) and high (42.5 Hz) refresh rates (Figure 2A, left and right, respectively). In the example time course of the low refresh rate, a bright dot was presented at the beginning of the trial and a dark dot in the middle of the trial. The shutter glass was closed in every other frame, producing a jagged profile. In the example of the high refresh rate, the bright and dark dots were more scattered over the course of stimulus presentation. The sustained and transient channels responded differently; this is evident in the responses to the low refresh example. The sustained channel integrated the incoming signal over a period of time (Figure 2B, left). The transient channel responded more to the dot onset and offset (Figure 2C, left). When the refresh rate was increased, the output of the sustained channel decreased (Figure 2B, right) but that of the transient channel increased (Figure 2C, right), so that the relative activation was more balanced between the two channels for the high refresh than for the low refresh rate. We quantified the channel activation by calculating the energy (i.e., squared channel output followed by time integral) across four refresh rates. As the pattern refresh rate was increased, the energy for the sustained channel decreased while the energy for the transient channel increased. Thus, the relative contributions of the sustained and transient channels in visual processing can be varied by changing the refresh rate of the RDSs. 
We capitalize on this manipulation to probe the perceptually relevant representation of disparity under different temporal conditions. 
Experiments
We compared the psychometric functions for the highest (42.5 Hz) and lowest (5.3 Hz) refresh rates (Figure 3). Both functions fell between the predictions of the match-based and correlation-based representations, suggesting a mixed contribution from the two representations. However, the relative contributions differed between the two refresh rates. The average psychometric function from all six subjects showed a rightward and downward shift as the refresh rate was increased. The function obtained at 42.5 Hz was more odd symmetric about the center of the plot. The reversed near/far judgments at match levels below 50% (negative percent correlation) were stronger for 42.5 Hz than for 5.3 Hz. Moreover, the performance around 50% match was lower for the higher refresh rate. These observations were largely consistent across the four individual subjects shown (NK, TD, MT, and TO). The observations suggest that the psychometric functions tested with the more rapid pattern refresh agree more closely with the prediction of the correlation-based representation. 
We characterized these observations by quantifying the degree of odd symmetry of the psychometric functions (Equation 2; Figure 4A). A fractional area of zero indicates that the function is consistent with the match-based representation, and a fractional area of one indicates the pure correlation-based representation. The fractional area increased with the pattern refresh rate (Figure 4B; regression slope was 0.36 [1/(log10 Hz)] for the data aggregated across all subjects, p = 1.3 × 10⁻⁴, H0: the slope is zero), indicating that the functions became more odd symmetric with higher refresh rates. This tendency was clear in five out of six subjects, although a nonmonotonic increase was seen in three of them. The minimum and maximum values of the fractional area were 0.14 and 0.76, respectively, indicating that the observed psychometric function always had an intermediate shape between the two contrasting predictions from the correlation-based and match-based representations. Overall, the psychometric function shifted systematically with pattern refresh rate such that the correlation-based representation increased its relative contribution to depth perception for a more dynamic visual input. 
We next examined which feature(s) of the psychometric functions were affected by the pattern refresh rate and led to the increase in odd symmetry. We focused on three characteristic points for which the transition from the match-based representation to the correlation-based representation provides specific predictions (Figure 1D). First, the match level that gives the chance-level performance should increase toward 50%. Indeed, the match level with the chance performance increased from 33.8 ± 5.4% (mean ± standard error) to 45.0 ± 0.8% as the refresh rate was increased from 5.3 Hz to 42.5 Hz (Figure 5A; regression slope was 13.32 [% Match/(log10 Hz)] for the aggregate data across subjects, p = 2.5 × 10⁻⁴). The second and third points of interest were the performances at 0% match and 50% match. Shifting from the match-based to correlation-based representations, the performance at 0% match level should decrease to 0% (complete reversal of near/far judgment), while the performance at 50% match level should decrease to 50% (chance level). We found that the performance at both 0% match (Figure 5C; slope was −0.086 [%/(log10 Hz)], p = 3.7 × 10⁻²) and 50% match (Figure 5B; slope was −0.10 [%/(log10 Hz)], p = 2.9 × 10⁻²) decreased with the refresh rate, although the offset was different across subjects. The values at the highest refresh rate were larger than the prediction of the correlation-based representations (mean ± standard error, 23.4 ± 5.8% at 0% match and 70 ± 5.7% at 50% match). Thus, all three features of the psychometric functions contributed to the increase in odd symmetry with refresh rate, indicating that the shift of the psychometric functions closely followed the prediction of transition from the match-based representation toward the correlation-based representation. 
We suggest that the contribution of the correlation-based representation over the match-based representation increases as the relative responses of the transient channel over the sustained channel are increased. 
Figure 5
 
Detailed characterization of the shape of the psychometric function (Experiment 1). Three quantities are plotted against the log refresh rate for individual subjects and the average across them. The values predicted by the correlation-based representation are shown with horizontal dotted lines. (A) The match level with chance performance increased with the refresh rate. (B) The percentage of correct choices with 50% match decreased with the refresh rate on average, with an offset depending on subjects. (C) The percentage of correct choices with 0% match decreased with the refresh rate on average, with an offset depending on subjects. The percent correct values were based on the fitted values but not on the raw data.
Experiment 2: The overall pattern of results is preserved when the number of dot patterns is fixed
Introduction
An increase in the refresh rate was accompanied by an increase in the number of independent dot patterns presented during a trial. This was inevitable because we fixed the stimulus duration to exclude contributions from duration-dependent factors. The number of independent patterns, however, could also affect the task performance without changing the relative weights of the correlation-based and match-based representations. If more patterns are available, the “noise” (the neural responses to task-unrelated stimulus variables such as spatial frequency) can be averaged out. According to this explanation, the task performance should deviate more from chance level with higher refresh rates across match levels. This leads to an amplification of the psychometric function without changing the match level that gives chance performance (see Figure 8A in Doi et al., 2011). Contrary to this prediction, as the refresh rate was increased, the performance at 50% match level decreased toward chance level (Figure 5B), and the match level with chance performance increased (Figure 5A). The only consistent result was that the reversed near/far judgment at 0% match became stronger for higher refresh rates (Figure 5C). Therefore, the number of independent dot patterns should not be a major cause of the observed shifts of the psychometric function; it cannot explain two of the three observations made in Experiment 1. To test this reasoning experimentally, we measured the psychometric functions at a high and a low refresh rate while keeping the number of dot patterns fixed by varying the stimulus duration. 
Methods
We used refresh rates of 10.6 Hz and 21.3 Hz, which were the middle refresh rates used in Experiment 1. We chose the neighboring two rates to minimize the difference in the stimulus duration. The stimuli with the rate of 10.6 Hz were presented for 1.5 s, as in Experiment 1, and those with 21.3 Hz were presented for 0.75 s. The order of the two rates was randomly chosen for each subject. Other experimental procedures were the same as in Experiment 1. One of the authors (TD) and seven naïve subjects participated in Experiment 2. However, we excluded the data from one subject because of his unstable performance; this subject, unlike the others, demonstrated a markedly different performance between two blocks of trials with the same refresh rate. Data from another subject (HS) could not be used in the analysis of the match level that gives chance performance, because the performance was always higher than chance level with the low refresh rate. The apparatus used in Experiment 2 was different from that used in Experiment 1. Visual stimuli were presented on a full-flat CRT display (FlexScan T965, EIZO, Hakusan) that subtended 37.6° × 28.6° of the visual field with a viewing distance of 57 cm. The pixel number, monitor refresh rate, and luminance values were adjusted to be the same as those used in Experiment 1. The graphics board was a Quadro FX3700 (NVIDIA, Santa Clara, CA), and the workstation was a Precision T3400 (Dell, Round Rock, TX). The interocular crosstalk was less than 2% of the background luminance. 
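The matching of pattern counts across the two conditions can be checked with simple arithmetic. The sketch below assumes that the number of independent patterns in a trial is approximately the refresh rate multiplied by the stimulus duration; the function name `n_patterns` is introduced here for illustration only.

```python
def n_patterns(refresh_rate_hz, duration_s):
    """Approximate number of independent dot patterns in one trial.

    Assumption: one new pattern per refresh, so count = rate x duration.
    """
    return refresh_rate_hz * duration_s


# Experiment 2: the duration is halved when the refresh rate is doubled.
low = n_patterns(10.6, 1.5)
high = n_patterns(21.3, 0.75)
print(f"10.6 Hz for 1.5 s : about {low:.1f} patterns")
print(f"21.3 Hz for 0.75 s: about {high:.1f} patterns")

# For comparison, a fixed 1.5 s duration at the extreme rates of Experiment 1.
print(f"5.3 Hz for 1.5 s  : about {n_patterns(5.3, 1.5):.1f} patterns")
print(f"42.5 Hz for 1.5 s : about {n_patterns(42.5, 1.5):.1f} patterns")
```

Under this assumption, both Experiment 2 conditions contain about 16 patterns, whereas a fixed duration lets the count grow in proportion to the refresh rate.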
Results
We compared the psychometric functions tested with high and low refresh rates while the number of dot patterns was fixed (Figure 6). We excluded one subject out of eight from all analyses due to abnormally unstable performance. The functions shifted with the refresh rate in a manner similar to that observed in Experiment 1 (Figure 3). The odd symmetry of the functions (about the center of the plot) increased with the refresh rate. The amount of shift appears smaller in Experiment 2 (Figure 6) than in Experiment 1 (Figure 3), but we note that the high and low rates had closer values in Experiment 2. A notable difference between the two experiments was observed in the range of low match levels (0% to 25%). When the number of dot patterns was fixed, the two psychometric functions almost overlapped at low match levels. Nonetheless, the overall odd symmetry of the functions was larger with the higher refresh rate, which was consistent with the original observation. 
Figure 6
 
Psychometric functions have different shapes for low and high refresh rates even when the number of independent dot patterns is fixed (Experiment 2). The notation is the same as in Figure 3; however, the refresh rates are different from those in Experiment 1 (Figure 3). The correct percentages were calculated from 420 choices for the average data (leftmost) and 60 choices for the data from example subjects.
We performed the same quantitative characterization as in Experiment 1. The fractional area was larger for the higher refresh rate (Figure 7A; mean ± standard error, 0.085 ± 0.031; p = 0.032, t test). As the refresh rate was increased, the match level with chance performance increased (Figure 7B; mean ± standard error, 5.4 ± 1.5; p = 0.015, t test), and the performance at 50% match decreased (Figure 7C; mean ± standard error, 0.17 ± 0.037; p = 0.0037, t test). Subject HS's data could not be included in the analysis of chance performance (Figure 7B) because his performance was always above chance level with the low refresh rate. These effects observed in Experiment 2 were comparable to or larger than those obtained in Experiment 1. The average change across subjects (Figure 7, black diamond) was compared to the change derived from the best-fitted line obtained in Experiment 1 (Figure 7, long gray line). The observed magnitudes in Experiment 2 relative to those in Experiment 1 were 80%, 140%, and 550% for the fractional area, the match level with chance performance, and the performance at 50% match, respectively. 
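The relative magnitudes can be approximated from the numbers reported in the text. The sketch below assumes that the change predicted from Experiment 1 equals the fitted regression slope multiplied by the log10 ratio of the two refresh rates used in Experiment 2; this is an illustrative reading of the comparison, not necessarily the exact computation behind the reported percentages. The function name and dictionary keys are hypothetical, and effect sizes are compared as magnitudes.

```python
import math


def predicted_change(slope_per_log10_hz, low_hz, high_hz):
    """Change predicted by a line fitted against log10(refresh rate)."""
    return slope_per_log10_hz * math.log10(high_hz / low_hz)


# Regression slopes reported for Experiment 1 (per log10 Hz).
exp1_slopes = {
    "fractional area": 0.36,
    "match level at chance": 13.32,
    "performance at 50% match": -0.10,
}
# Mean effect magnitudes reported for Experiment 2 (21.3 Hz vs. 10.6 Hz).
exp2_changes = {
    "fractional area": 0.085,
    "match level at chance": 5.4,
    "performance at 50% match": 0.17,
}

relative = {}
for name, slope in exp1_slopes.items():
    expected = predicted_change(slope, 10.6, 21.3)
    # Compare magnitudes only; the sign of each effect is given in the text.
    relative[name] = abs(exp2_changes[name]) / abs(expected)
    print(f"{name}: {100 * relative[name]:.0f}% of the Experiment 1 prediction")
```

With these inputs the ratios fall near the reported 80%, 140%, and 550%; small differences are expected because the published slopes and means are rounded.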
Figure 7
 
The results in Experiment 1 were mostly replicated even when the number of dot patterns was fixed (Experiment 2). The odd symmetry and the three point-based characterizations of the psychometric functions are plotted as a function of refresh rate. (A) Fractional area, an index for odd symmetry, was higher for higher refresh rate. (B) Percent match with chance performance was higher for higher refresh rate. (C) Percent correct choice at 50% match was lower for higher refresh rate. (D) Percent correct at 0% match had no significant difference. The gray long line in each plot is the best fitted line to the data obtained in Experiment 1. The fitted model had a common slope parameter across subjects and variable offset parameters, which were averaged in this figure. The horizontal dotted lines represent the predicted level from the pure correlation-based representation.
Figure 8
 
Dot pairing of half-matched RDSs to manipulate the statistics of the local binocular correlation (Experiment 3). (A) Two example RDSs with positive pairing (right) and negative pairing (left). The RDSs consist of a center disk with a certain disparity and a surrounding annulus with zero disparity. Red crosses are the fixation marker, and its vertical bar was a nonius line. Scale bar: 1° (shown for only illustrative purpose, not present in the experiments). (B) Schematic illustration of the dot-paired and unpaired RDSs. (Right) 100% paired RDSs in which all the contrast-matched dots (blue outline) and all the contrast-reversed dots (orange outline) are vertically paired. The combination of monocular contrasts (white or black) is random. (Center) 0% paired RDSs in which no pairs are made. (Left) −100% paired RDSs in which all the contrast-matched dots are paired with contrast-reversed dots vertically. (C) All possible combinations of monocular contrasts and binocular correlation. In a single combination, the left and right columns represent the dots for the left eye and right eye, respectively. The positive and negative pairing cannot be distinguished from the combination of monocular contrasts, because every combination is equally likely to occur irrespective of the sign of the pairing.
The stronger depth reversal at 0% match for the higher refresh rate in Experiment 1 (Figure 5C) was not replicated in Experiment 2. We found no significant difference in the performance between the high and low refresh rates (Figure 7D; mean ± standard error, −0.011 ± 0.036; p = 0.75, t test; 42% of the magnitude derived from the fit in Experiment 1). The stronger depth reversal for higher refresh rates observed in Experiment 1 might have been mediated by improved signal quality with a larger number of dot patterns. Nevertheless, the overall pattern in the shift of psychometric functions was preserved even when the number of dot patterns was fixed. 
Experiment 3: Match-based representation is constructed through the variability of local binocular correlation
Introduction
We have shown that the match-based representation is more influential on depth perception with more sustained stimuli. Visual mechanisms with sustained temporal properties tend to process fine spatial details. For example, neurons preferring lower temporal frequency tend to prefer higher spatial frequency in the visual cortex (DeAngelis, Ohzawa, & Freeman, 1993). From these observations, we speculated that fine-scale processing might be a key to solving the problem of how the match-based representation is constructed. The problem is how to build this representation starting from energy models. Energy models describe the initial encoding of binocular disparity in the cortex (Cumming & Parker, 1997; Ohzawa et al., 1990). Thus, the match-based representation should use energy-model-like mechanisms as the initial stage of disparity encoding. However, the disparity tuning of the energy mechanisms is flat for half-matched RDSs. It is unclear how a tuned disparity function of the match-based representation (Figure 1C; middle; red, solid curve) can be constructed from the flat tuning function of the energy-mechanism output (Figure 1C; middle; blue, dashed line). 
Here, we propose that fine-scale processes could solve this problem by signaling depth with the variability of visual responses. A tuning function in neurophysiology and modeling is typically based on the first-order statistic (mean) of sensory responses, but information useful for sensory processing may be available in higher order statistics (e.g., variance; Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999). The response variability considered here originates in stimulus variability (i.e., random dot patterns). The variability is substantial for fine-scale processes such as disparity detectors with only small RFs. Even when the true (asymptotic) tuning curve is flat, disparity detectors with small RFs, e.g., those in the primate V1, have variable, instantaneous responses. This response variability would have the potential to signal disparity information of half-matched RDSs. Suppose that a detector has an RF small enough to be stimulated at a time either by a group of perfectly correlated dots or perfectly anticorrelated dots within a half-matched RDS. When the stimulus disparity is at the detector's preferred disparity, the response variability over time must be large because a correlated pattern evokes a large response, but an anticorrelated pattern evokes a null response. When the stimulus disparity is outside of the detector's dynamic range, the response variability must be small because both correlated and anticorrelated patterns evoke similar baseline responses. This reasoning can be generalized to the variability of responses to a given stimulus across a population of neurons with different RF positions. 
A stereoscopic system may construct the match-based representation by exploiting the response variability of the energy mechanisms with small RFs. This hypothesis leads to a prediction that the variability of local binocular correlation influences the depth perception to half-matched RDSs. To test this prediction, we examined how near/far discrimination in half-matched RDSs depends on the binocular correlation of dots in a local visual field. First, we checked whether the response variability of energy-model units has potential to signal binocular disparity. Second, we showed how a new stimulus manipulation, i.e., making vertically adjacent pairs of dots, affects the variability of local binocular correlation. Finally, we showed that near/far discrimination by human subjects for half-matched RDSs critically depends on the correlation variability. 
Methods
Paired RDSs to manipulate local binocular correlation
We created a variant of half-matched RDSs by pairing random dots vertically as in the Glass pattern stimuli (Figure 8A; Glass, 1969; Glass & Perez, 1973) to manipulate locally defined binocular correlation. The local correlation varies over space and time in half-matched RDSs. The variability of local correlation can be manipulated by two parameters: the percentage of dots that form the vertical pairs in the RDSs (ρ1) and the percentage that the two dots in the pairs have the same binocular polarities (ρ2). When ρ2 = 100%, a contrast-matched dot is always paired with another contrast-matched dot, and a contrast-reversed dot is always paired with another contrast-reversed dot. When ρ2 = 0%, a contrast-matched dot is always paired with a contrast-reversed dot. We increased the variability of local correlation by setting ρ2 to 100% and decreased the variability by setting ρ2 to 0%. We referred to the former stimulus as the positively paired RDS (Figure 8A, right; Figure 8B, rightmost) and the latter stimulus as the negatively paired RDS (Figure 8A, left; Figure 8B, leftmost). For either 100% or 0% ρ2, we gradually controlled the strength of correlation variability by varying ρ1 from 100% to 0% (Figure 8B, middle; unpaired RDSs). Overall, we represented a given paired RDS using a signed percentage, percent paired dots, which is ρ1 for positively paired RDSs (ρ2 = 100) and –ρ1 for negatively paired RDSs (ρ2 = 0). The percentage ranges from −100% to 100%, and a larger percentage corresponds to a larger variability in local binocular correlation. 
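The pairing scheme can be sketched in code. The example below is a minimal illustration, not the stimulus-generation program used in the experiments: it only assigns binocular polarity (contrast matched or contrast reversed) to dots, and it omits dot positions and monocular contrasts. The function names, the returned dictionary layout, and the choice to balance matched and reversed dots exactly at 50% are assumptions made for illustration.

```python
import numpy as np


def balanced_flags(n, rng):
    """Boolean array of length n with half True (rounded down), shuffled."""
    flags = np.zeros(n, dtype=bool)
    flags[: n // 2] = True
    rng.shuffle(flags)
    return flags


def paired_match_flags(n_dots, percent_paired, rng):
    """Assign binocular polarity (True = contrast matched) in a half-matched RDS.

    percent_paired is the signed percentage from the text: rho1 for positive
    pairing (rho2 = 100%) and -rho1 for negative pairing (rho2 = 0%).
    """
    rho1 = abs(percent_paired) / 100.0
    n_pairs = int(round(rho1 * n_dots / 2))
    n_single = n_dots - 2 * n_pairs
    if percent_paired >= 0:
        # Positive pairing: both dots of a pair share the same polarity.
        upper = balanced_flags(n_pairs, rng)
        lower = upper.copy()
    else:
        # Negative pairing: a matched dot is paired with a reversed dot.
        upper = rng.random(n_pairs) < 0.5
        lower = ~upper
    single = balanced_flags(n_single, rng)  # unpaired dots stay half matched
    return {"upper": upper, "lower": lower, "single": single}


rng = np.random.default_rng(0)
positive = paired_match_flags(200, 100, rng)
negative = paired_match_flags(200, -100, rng)
unpaired = paired_match_flags(200, 0, rng)
```

In this sketch the overall match level stays at 50% for every pairing level, so only the local grouping of matched and reversed dots changes.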
Importantly, monocular vision cannot discriminate positive and negative signs of the pairing, because each combination of monocular contrasts (i.e., bright-bright, bright-dark, dark-bright, dark-dark) occurs with the same probability (0.25). Each contrast combination for the upper and lower dots of a pair is equally likely in both positive and negative pairings (Figure 8C). Thus, we can attribute a difference, if any, in the performance of near/far discrimination between positive and negative pairings to binocular processing. 
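The equal probability of the monocular contrast combinations can be verified by exhaustive enumeration. The sketch below assumes that the left-eye contrasts of the two dots in a pair are independent and equiprobable, and that a dot is contrast matched or contrast reversed with equal probability (50% match). It then counts the right-eye contrast combinations for each pairing sign; the left-eye combinations are uniform by construction. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def right_eye_pair_distribution(positive_pairing):
    """Probability of each (upper, lower) contrast combination in the right eye.

    Contrasts are +1 (bright) or -1 (dark). All eight combinations of the two
    left-eye contrasts and the polarity of the upper dot are equally likely.
    """
    counts = Counter()
    for upper_left, lower_left, upper_matched in product((1, -1), (1, -1), (True, False)):
        # Positive pairing: same polarity; negative pairing: opposite polarity.
        lower_matched = upper_matched if positive_pairing else not upper_matched
        upper_right = upper_left if upper_matched else -upper_left
        lower_right = lower_left if lower_matched else -lower_left
        counts[(upper_right, lower_right)] += 1
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


dist_positive = right_eye_pair_distribution(True)
dist_negative = right_eye_pair_distribution(False)
print(dist_positive)
print(dist_negative)
```

Both pairing signs give a probability of 0.25 for each of the four combinations, so the sign of the pairing is carried only by the relationship between the two eyes' images.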
The distance between the upper and lower dots in a pair was 0.15°, the minimum distance that prevented the dots from overlapping in our display. The surrounding annulus had 50% match level and the same pairing level as the center disk to make the center-surround edge monocularly invisible. We used a refresh rate of 10.6 Hz and a disparity magnitude of 0.03°. The contribution of the match-based representation dominates with this fine disparity magnitude (Doi, Tanabe, & Fujita, 2011). The center position of the RDSs was located 3.25° below a fixation cross, which was 0.25° more peripheral than in Experiments 1 and 2. This position was chosen to make the task more difficult. Without this, the task is so easy that an additional stimulus manipulation may not affect the task performance (Doi, Tanabe, & Fujita, 2011). Indeed, the performance of a subject (MT) was saturated even with an eccentricity of 3.25°, so we tested this subject further with a larger eccentricity (5.00°). 
Subjects and testing procedure
Three subjects (TD, TO, and MT) participated in Experiment 3. Subjects TD and MT are two of the authors. Because subject MT was not involved in the planning and preparation of Experiment 3, subjects MT and TO were naïve to the purpose of the experiment. As in the previous experiments, the task was to judge whether the center region of RDSs was nearer or farther than the surrounding annulus. We changed the percentage of vertically paired dots pseudorandomly from −100% to 100% with steps of 25% across trials. Subjects carried out two blocks of 270 trials (9 Percentages of Paired Dots × 2 Disparity Signs × 15 Repetitions) with a short break in between. At the beginning of each block (i.e., two trials per subject), we showed 100%-matched and 0%-paired RDSs with crossed and uncrossed disparities each for 2 s to inform subjects of the position of the boundary between the center and surrounding parts of the RDSs. This procedure was necessary to reduce the spatial uncertainty regarding the center patch and reference annulus, since the annulus had only 50% match in Experiment 3. Other testing procedures and the apparatus were the same as those used in Experiment 1.
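The block structure described above can be laid out as follows. This is a minimal sketch that assumes a simple shuffle of the full factorial design; the actual pseudorandomization procedure is not specified in the text, and the names `make_block`, `percent_paired_levels`, and `disparity_signs` are hypothetical.

```python
import random
from itertools import product

# Signed percentages of paired dots: -100% to 100% in steps of 25%.
percent_paired_levels = list(range(-100, 101, 25))
disparity_signs = ("crossed", "uncrossed")
repetitions = 15


def make_block(seed=None):
    """Return one block of trials as (percent_paired, disparity_sign) tuples."""
    trials = [
        (percent, sign)
        for percent, sign, _ in product(percent_paired_levels, disparity_signs, range(repetitions))
    ]
    # Assumption: trial order is a plain random permutation of the design.
    random.Random(seed).shuffle(trials)
    return trials


block = make_block(seed=0)
print(len(percent_paired_levels), "pairing levels;", len(block), "trials per block")
```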
Simulations with disparity energy models
We simulated the responses of V1 neurons using the disparity energy model (Ohzawa et al., 1990) to gain insight into whether the response variability has the potential to signal binocular disparity. We generated RDSs with the algorithms used in Experiments 1 and 2. Stereo shutter glasses were not considered for simplicity without loss of generality. The model response (E) to a given dot pattern was calculated as follows:

$$E = \left(r_L^{S1} + r_R^{S1}\right)^2 + \left(r_L^{S2} + r_R^{S2}\right)^2,$$

where $r_L^{S1}$ and $r_R^{S1}$ indicate the responses of left and right monocular RFs, respectively, for one of the two simple-cell subunits, and $r_L^{S2}$ and $r_R^{S2}$ for the other subunit. The subscripts L and R denote the eye, and the superscripts S1 and S2 denote the subunits throughout this section. $r_L^{S1}$ was given by the dot product of the RF and a pattern of bright dots (+1) and dark dots (−1) as follows:

$$r_L^{S1} = \sum_{i=1}^{n_b} g_L^{S1}\left(x_{i,L}, y_{i,L}\right) - \sum_{j=1}^{n_d} g_L^{S1}\left(\tilde{x}_{j,L}, \tilde{y}_{j,L}\right),$$

where $g_L^{S1}$ indicates the left-eye spatial RF as a function of horizontal and vertical positions; $n_b$ and $n_d$ were the total numbers of bright and dark dots, respectively, in a given dot pattern; $x_{i,L}$ and $y_{i,L}$ indicate horizontal and vertical positions of the ith bright dot, respectively, in the left eye; and $\tilde{x}_{j,L}$ and $\tilde{y}_{j,L}$ indicate the positions of the jth dark dot in the left eye. The responses of the other RFs, $r_R^{S1}$, $r_L^{S2}$, and $r_R^{S2}$, were calculated by replacing $g_L^{S1}$ with $g_R^{S1}$, $g_L^{S2}$, and $g_R^{S2}$, respectively. 
The left-eye RF of Subunit 1 was defined as a two-dimensional Gabor with the following form:

$$g_L^{S1}(h, v) = \exp\left(-\frac{\left(h - u_L^{S1}\right)^2}{2\sigma^2} - \frac{(v - y)^2}{2\zeta^2}\right) \cos\left(2\pi f \left(h - u_L^{S1}\right) + \phi_L^{S1}\right),$$

where h and v indicate horizontal and vertical positions, respectively; $u_L^{S1}$ and y horizontal and vertical positions of the Gaussian envelope, respectively; σ and ζ horizontal and vertical widths of the envelope, respectively; f sinusoidal frequency; and $\phi_L^{S1}$ sinusoidal phase. The other RFs, $g_R^{S1}$, $g_L^{S2}$, and $g_R^{S2}$, were obtained by replacing the subscript and superscript of the horizontal envelope position $u_L^{S1}$ and the sinusoidal carrier phase $\phi_L^{S1}$.
We simulated the responses of a position-shift unit (even-symmetric disparity tuning) and a phase-shift unit (odd-symmetric tuning), both of which prefer crossed (near) disparities. For the position-shift unit, the horizontal envelope positions (u parameters) of the two subunits were −0.015° in one eye and 0.015° in the other. The phases of Subunit 1 ($\phi_L^{S1}$, $\phi_R^{S1}$) were zero, and those of Subunit 2 ($\phi_L^{S2}$, $\phi_R^{S2}$) were 0.5π. The vertical envelope position y was −3°. The envelope widths σ and ζ were 0.05°, and the frequency f was three. We chose the width and frequency values from the range of macaque V1 data (Prince, Cumming, & Parker, 2002), with a bias toward smaller width and higher frequency to emphasize the response variability. The tested disparities were −0.25, −0.2, −0.15, −0.105, −0.08, −0.05, −0.03, −0.005, 0.02, 0.045, 0.09, 0.14, and 0.19°. For the phase-shift unit, the horizontal envelopes of all the RFs (u parameters) were positioned at zero. The phases had the following combination: ($\phi_L^{S1}$, $\phi_R^{S1}$, $\phi_L^{S2}$, $\phi_R^{S2}$) = (0, 0.5π, 0.5π, π). The other parameter values were the same as those used for the position-shift unit. The tested disparities were ±0.2, ±0.145, ±0.1, ±0.105, ±0.055, ±0.03, ±0.015, and 0°. Disparity tuning functions were calculated for the response mean and standard deviation (SD) with 0%, 50%, and 100% match levels. The mean and SD were based on the responses to 10,000 dot patterns at each disparity and match level. 
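A simulation of this kind can be sketched for the position-shift unit as follows. This is a simplified illustration under several assumptions that are not taken from the text: dots are treated as points, each pattern has a fixed number of dots (13) in a 1° × 1° patch centered on the RF, each dot is contrast matched independently with probability equal to the match level, stimulus disparity is defined as right-eye minus left-eye horizontal position, and the left-eye and right-eye envelopes are placed at 0.015° and −0.015° so that the preferred disparity is −0.03° under that convention. All function and parameter names are hypothetical.

```python
import numpy as np


def gabor(h, v, u, y, sigma, zeta, f, phi):
    """Two-dimensional Gabor RF evaluated at positions (h, v)."""
    envelope = np.exp(-((h - u) ** 2) / (2 * sigma**2) - ((v - y) ** 2) / (2 * zeta**2))
    return envelope * np.cos(2 * np.pi * f * (h - u) + phi)


def energy_responses(disparity, match, n_patterns=10000, n_dots=13, half_width=0.5,
                     u_left=0.015, u_right=-0.015, sigma=0.05, zeta=0.05, f=3.0, seed=0):
    """Energy-model responses of a position-shift unit to random-dot patterns."""
    rng = np.random.default_rng(seed)
    shape = (n_patterns, n_dots)
    x_left = rng.uniform(-half_width, half_width, size=shape)
    y_pos = rng.uniform(-half_width, half_width, size=shape)
    x_right = x_left + disparity                      # assumed sign convention
    c_left = rng.choice([-1.0, 1.0], size=shape)      # bright (+1) or dark (-1)
    matched = rng.random(shape) < match               # independent per dot
    c_right = np.where(matched, c_left, -c_left)
    energy = np.zeros(n_patterns)
    for phi in (0.0, 0.5 * np.pi):                    # Subunit 1, Subunit 2
        r_left = (c_left * gabor(x_left, y_pos, u_left, 0.0, sigma, zeta, f, phi)).sum(axis=1)
        r_right = (c_right * gabor(x_right, y_pos, u_right, 0.0, sigma, zeta, f, phi)).sum(axis=1)
        energy += (r_left + r_right) ** 2
    return energy


preferred, far = -0.03, 0.19
for match in (1.0, 0.5, 0.0):
    e_pref = energy_responses(preferred, match)
    e_far = energy_responses(far, match)
    print(f"match {match:.0%}: mean {e_pref.mean():.3f} vs {e_far.mean():.3f}, "
          f"SD {e_pref.std():.3f} vs {e_far.std():.3f} (preferred vs far)")
```

In this sketch the mean response at 50% match is about the same at the preferred and the far disparity, whereas the SD is larger at the preferred disparity, which is the property the hypothesis relies on.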
In a separate set of simulations, we tested the effects of RF size on the tuning curve of response SD. We tested only position-shift units, whose RF disparity was zero. The tested envelope widths (σ and ζ) were 0.05°, 0.11°, 0.21°, 0.42°, and 0.85°. The frequency f was adjusted to keep the tuning shape constant (3, 1.5, 0.75, 0.38, and 0.19, corresponding to the envelope widths above). The set of stimulus disparities was scaled according to each envelope width. The match level was 50%. The SD tuning curve was normalized by the response mean averaged across tested disparities for each envelope width. Mean responses were fairly constant across disparities, since the tested match level was 50%. The expected number of dots falling within a monocular RF was calculated from the dot density (13 dots/deg²) and the area of a circle with a radius of two RF SDs.
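The expected dot count follows from multiplying the dot density by the area of the circle. The sketch below assumes only that relationship; the function name is hypothetical, and because the envelope widths are reported as rounded values, the outputs may differ slightly from counts quoted elsewhere in the article.

```python
import math


def expected_dots_in_rf(dot_density_per_deg2, rf_sd_deg):
    """Expected number of dots within a circle of radius two RF SDs."""
    radius = 2.0 * rf_sd_deg
    return dot_density_per_deg2 * math.pi * radius**2


# Rounded envelope widths (SD, in degrees) used here for illustration.
for rf_sd in (0.05, 0.11, 0.21, 0.42):
    n = expected_dots_in_rf(13.0, rf_sd)
    print(f"RF SD {rf_sd:.2f} deg: about {n:.2f} dots expected")
```

Doubling the RF SD quadruples the expected count, so the smallest RFs often contain no dot or a single dot in a given pattern.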
Effects of dot pairing on local binocular correlation
We then checked how our pairing manipulation affects locally defined binocular correlation. We calculated local binocular correlation instead of the energy-model responses to pinpoint the quantity we intended to manipulate: the variability of local correlation. We generated random-dot patterns with the same algorithm as used for psychophysical experiments, and calculated the binocular correlation locally using a small spatiotemporal window. The stimulus and window had zero disparities. The window had a circular, uniform weight with a radius of 0.42° over a single frame. This value was roughly equal to the width (two SD) of RFs of macaque V1 neurons (Nienborg, Bridge, Parker, & Cumming, 2004) in the vertical dimension, along which we paired dots. Other window sizes (0.11°, 0.21°, 0.85°, and 1.70°) were also tested. The local correlation in units of percent (percent local correlation) was calculated as $200 \times (n_m/n_a - 0.5)$, where $n_m$ indicates the number of binocularly matched-contrast dots within the window, and $n_a$ indicates the number of all the dots within the window. If no dots fell within the window ($n_a = 0$), we assigned a 0% correlation. We constructed the distribution of the local correlation based on 10,000 random-dot patterns at each pairing level. 
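The percent local correlation and its dependence on pairing can be illustrated with a toy calculation. The sketch below implements the formula given above and then compares fully paired and unpaired patterns under simplifying assumptions that are not part of the reported analysis: the window always contains a fixed number of dots, both dots of a pair always fall inside the window, and polarity is assigned independently with probability 0.5. The function names are hypothetical.

```python
import numpy as np


def percent_local_correlation(n_matched, n_all):
    """200 * (n_matched / n_all - 0.5), with 0% assigned to empty windows."""
    n_matched = np.asarray(n_matched, dtype=float)
    n_all = np.asarray(n_all, dtype=float)
    safe_n_all = np.where(n_all > 0, n_all, 1.0)  # avoid division by zero
    return np.where(n_all > 0, 200.0 * (n_matched / safe_n_all - 0.5), 0.0)


def local_correlation_sd(n_pairs_in_window, pairing, n_patterns=10000, seed=0):
    """SD of percent local correlation in a window holding 2 * n_pairs dots."""
    rng = np.random.default_rng(seed)
    n_all = 2 * n_pairs_in_window
    if pairing == "positive":      # both dots of a pair share a polarity
        n_matched = 2 * rng.binomial(n_pairs_in_window, 0.5, size=n_patterns)
    elif pairing == "negative":    # each pair has one matched, one reversed dot
        n_matched = np.full(n_patterns, n_pairs_in_window)
    elif pairing == "unpaired":    # every dot is assigned independently
        n_matched = rng.binomial(n_all, 0.5, size=n_patterns)
    else:
        raise ValueError(f"unknown pairing: {pairing}")
    return float(percent_local_correlation(n_matched, n_all).std())


for pairing in ("negative", "unpaired", "positive"):
    print(pairing, round(local_correlation_sd(4, pairing), 1))
```

Under these assumptions the SD is zero for negative pairing, intermediate for unpaired dots, and largest for positive pairing, while the mean stays at 0% in all three cases.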
Results
Simulations with disparity energy models
By simulating responses of disparity energy units at 0%, 50%, and 100% match levels, we tested an assumption underlying the psychophysical test in Experiment 3: The response variability of the disparity energy mechanism has the potential to signal binocular disparity. A critical issue is whether the variability of the responses to half-matched RDSs (50% match) varies as a function of disparity, because the response mean does not signal disparity at this match level. The mean of simulated responses closely followed the defined tuning curves for the correlation-based representation irrespective of tuning symmetry, i.e., even symmetric or odd symmetric (Figures 9A, C; compare the tuning curves to blue dashed curves in Figure 1C). The tuning curves were flat for half-matched RDSs and inverted for 0% match RDSs (anticorrelated RDSs). The response variability quantified as the standard deviation (SD) was also sensitive to binocular disparity (Figures 9B, D). Most importantly, the disparity tuning of the SD was not flat for half-matched RDSs (gray curves in Figures 9B, D). For even-symmetric units, the disparity tuning of the SD for half-matched RDSs peaked at the same disparity as the mean response for 100% match, providing consistent disparity signals. This is not the case for odd-symmetric units; the SD disparity tuning with half-matched RDSs had a different shape from the mean tuning with perfectly correlated RDSs (100% match). We next examined the effects of RF size on the SD disparity tuning to confirm that the disparity encoding through response variability is an advantage of fine-scale processing. In addition to the original RF size (i.e., two SD of RF envelope, 0.11°), we tested three larger sizes of RF (0.21°, 0.42°, and 0.85°). The expected numbers of dots falling within each size of RF were 0.46, 1.84, 7.36, and 29.4, respectively. 
As expected, both the baseline and amplitude of the SD disparity tuning curve decreased as the RF size was increased (Figure 9E). Overall, the response variability of the energy mechanism can encode disparity. The disparity encoding through the response variability is consistent with the encoding through the response mean when the RFs have position disparity. Smaller RFs produce more pronounced disparity signals. 
Figure 9
 
Predicted disparity tuning of the response mean and the standard deviation with the energy model. (A) Even-type disparity tuning of the response mean. Three functions were disparity-tuning curves of the same model at 100% (black, open circle), 50% (gray, solid circle), and 0% (black, solid circle) match levels. The tuning function was flat at 50% match. (B) The disparity tuning of the standard deviation (SD) of the responses with the same model used in (A). The function at 50% match (gray, solid circle) has a similar tuning to the mean tuning function at 100% match. The mean and SD were calculated based on 10,000 simulated trials. (C) (D) Odd-type disparity tunings plotted with the same convention used in (A) and (B). (E) The SD tuning curves of even-type units with a range of RF sizes. The match level was 50%. The response SD was normalized by the response mean at each RF size. The disparity was scaled by the SD of RF envelope (the RF size inserted indicates two SD). Each data point was based on 10,000 trials.
Effects of dot pairing on local binocular correlation
We then attempted to manipulate this response variability by making vertically adjacent pairs of dots (Figure 8A). Instead of the energy-model response, we hereafter turn our attention to the stimulus variability that causes the response variability that is selective for disparity. The key stimulus variable is the binocular correlation calculated for a local visual field, since the disparity selectivity of the energy model arises from the binocular cross-correlation of the images weighted by local receptive fields (Anzai, Ohzawa, & Freeman, 1999; Tanabe, Doi, Umeda, & Fujita, 2005). As we intended, the dot pairing influenced the distribution of the local binocular correlation. The local correlation was distributed around 0% when correlated dots were paired with anticorrelated dots (Figure 10A, top; −100% paired dots), and the distribution became progressively wider as the percentage of the pairing was increased from 0% (Figure 10A, middle) to 100% (Figure 10A, bottom). With 100% paired dots, the local correlation on many frames was as high as 100% (only correlated dots) or −100% (only anticorrelated dots). The gradual increase in the percentage of vertically paired dots increased the SD of the distribution steadily, but did not change the mean (Figure 10B). 
Figure 10
 
The effects of dot pairing on the distribution of local binocular correlation. (A) The simulated distribution of local correlation is shown for three levels (−100%, 0%, 100%) of percent paired dots. The number of trials was 10,000 for each histogram. (B) The mean (solid circle) and standard deviation (open circle) of the distributions are plotted as a function of percent paired dots. The standard deviation increased with the percent paired dots. (C) The standard deviation was calculated and plotted for different window sizes from which the local correlation was calculated. The data with 0.42° (white circles) were the same as those shown in (B).
These results were obtained by using the local circular window with a radius of 0.42°. This value was chosen to match the vertical RF size of macaque V1 neurons (Nienborg et al., 2004). We next examined how the size of the circular window influenced the correlation variability as a function of percent paired dots. We found that the offset of the SD function was higher for a smaller window (Figure 10C). The slope of the function also increased as the window radius was decreased from 1.70° to 0.42°. However, the slope decreased to zero as the radius was decreased from 0.42° to 0.11°, likely because the vertical distance of paired dots was 0.15°. Overall, this dot pairing is an effective way to manipulate the variability of local correlation, especially at the spatial scale of RFs in the primate V1. 
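The effect of pairing on the spread of local correlation can be sketched with a toy model. The sketch below assumes a window that always contains a fixed number of dots (`n_dots`, a hypothetical parameter), treats each dot as contributing +1 (contrast matched) or −1 (contrast reversed), and assumes that both members of a pair always fall inside the window. It ignores the spatial layout, including the 0.15° vertical separation of paired dots, so it cannot reproduce the window-size dependence shown in Figure 10C.

```python
import random
import statistics

def local_correlation(n_dots, pairing, rng):
    """Local binocular correlation (-1 to 1) in a window of n_dots dots at 50% match.

    pairing: "positive" (partners share a match type), "negative" (partners have
    opposite match types), or "none" (every dot is assigned independently).
    """
    if pairing == "none":
        signs = [rng.choice((1, -1)) for _ in range(n_dots)]
    else:
        signs = []
        for _ in range(n_dots // 2):
            first = rng.choice((1, -1))
            second = first if pairing == "positive" else -first
            signs += [first, second]
    return sum(signs) / n_dots

def summarize(pairing, n_dots=8, n_trials=10000, seed=0):
    """Mean and SD of the local correlation across simulated trials."""
    rng = random.Random(seed)
    values = [local_correlation(n_dots, pairing, rng) for _ in range(n_trials)]
    return statistics.fmean(values), statistics.pstdev(values)

summary = {p: summarize(p) for p in ("negative", "none", "positive")}
for pairing, (mean, sd) in summary.items():
    print(f"{pairing:>8}: mean = {mean:+.3f}, SD = {sd:.3f}")
```

Under these assumptions the mean stays near zero for all three conditions, while the SD is zero for negative pairing, about 1/√N without pairing, and about √(2/N) for positive pairing, where N is the number of dots in the window. In the actual stimuli, pairs can straddle the window boundary, so the distribution for −100% paired dots is narrow rather than degenerate.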
Effects of dot pairing on psychophysical performance
We examined how our pairing manipulation affects the performance of near/far discrimination for half-matched RDSs. The performance increased as the percentage of paired dots was increased from −100% to 100% (Figure 11A), suggesting that the variability in local correlation underlies the performance. For the average data, the percent correct was at near chance level with −100% paired dots and increased steadily up to near perfect level with 100% paired dots (Figure 11A, upper left). The fitted line had a slope of 0.14, which was significantly different from zero (likelihood ratio test; H0: zero slope; p < 1 × 10⁻¹⁵). This was also true when the line was fitted separately for the range from −100% to 0% paired dots (slope, 0.24; p = 1.6 × 10⁻¹³) or from 0% to 100% paired dots (slope, 0.09; p = 9.2 × 10⁻⁵). These features were evident in individual data with subjects TD and TO (Figure 11, upper right and lower left, respectively). 
Figure 11
 
(A) The performance increases with the percent paired dots. The percent correct on the ordinate was calculated from 240 choices for the average data (upper left panel), 60 choices for subjects TD and TO (upper right and lower left panels, respectively), and 120 choices for subject MT (large circles in lower right panel). Error bars indicate ± standard error of the means (SEMs) across three subjects for the average data and ± SEMs across blocks of trials for individual data (two blocks for subjects TD and TO, four blocks for subject MT). For subject MT, the performance was tested with easy (upper small circles) and difficult (lower small circles) conditions and averaged (middle large circles). (B) The average performance was replotted against the standard deviation (SD) of local correlation at a corresponding percent paired dots (Figure 10B).
The performance of subject MT stayed high and did not change over the tested range of percent paired dots (Figure 11A, lower right, the upper dotted line). To decrease overall performance, we repeated the same experiment with subject MT by placing the stimulus at a larger eccentricity (5.00°). The performance with this stimulus position was at chance level for 0% paired dots and increased with percent paired dots (Figure 11A, lower right, the lower dotted line). The average performance now increased in a similar way to the performances from the other subjects (Figure 11A, lower right, large blue circles). Thus, the depth perception for half-matched RDSs was a monotonically increasing function of the variability of local binocular correlation (Figure 11B). These results are in close agreement with the idea that the match-based representation is constructed by exploiting the response variability of disparity energy mechanisms. 
Discussion
We examined each subject's performance of near/far discrimination by varying the refresh rates and binocular correlation of RDSs. We plotted the percent correct choice against the percent contrast-matched dots separately for different refresh rates. As the refresh rate was increased, the psychometric function shifted downward and rightward, becoming more odd-symmetric relative to the center of the plot (Figures 3, 4, 5). The shift was largely preserved when the number of dot patterns presented during a trial was fixed (Figures 6, 7; note that out of eight subjects one subject was excluded from all the analyses due to unstable performance, and another subject could not be included in the analysis reported in Figure 7B). These observations indicate that the relative contribution of the correlation-based representation over the match-based representation increases for more rapidly changing stimuli. We suggest that the ratio of inputs from the transient and sustained temporal channels in the early visual system is different between the two disparity representations that reside in the extrastriate cortices. The sustained channel may provide major inputs to the neural system for match-based representation, whereas the transient channel may be preferentially linked to the system for correlation-based representation. We further examined the mechanisms of the match-based representation by focusing on the response variability of disparity-energy units with small RFs (Figure 9). The performance for half-matched RDSs monotonically increased with an increase of the variability in local binocular correlation (Figures 10, 11). In summary, we suggest that match-based stereoscopic depth depends on temporally sustained and spatially fine-scale processes. This system may exploit disparity information represented in the response variability of the disparity-energy mechanisms with small RFs. 
Correlation-based stereo representation depends more on transient processes and does not need further transformation of disparity energy signals. 
Possible artifacts from the stereo shutter glasses
The effective correlation of our RDSs can be lower than the notional correlation value, because the stereo shutter glasses introduce a temporal offset between the images presented to the left and right eyes. We checked the effective correlation from the point of view of the primary visual cortical neurons, because the refresh-rate manipulation was intended to control these “neuronal” effective correlations: A higher rate would increase the effective correlation calculated by neurons with the transient channel but decrease the effective correlation with the sustained channel. We confirmed these ideas by simulating the disparity tuning of the energy models with the RDSs viewed through the shutter glasses and the mirror stereoscope (data not shown). We extended the single-pixel simulation shown in Figure 2 to include a spatial extent and RDSs modeled after those used in Experiments 1 and 2. The temporal filters were the same as those used in the simulations presented in Figure 2. The spatial RFs were the same as those used in the simulations presented in Figure 9. We found that the disparity tuning was almost indistinguishable between the shutter glasses and mirror stereoscope for the units with the sustained channel. This is suggested by Figure 2B, where the luminance oscillation from the shutter glasses is averaged out in the outputs of the sustained channel. The tuning was more prominent for a lower refresh rate. For the transient channel, the basic trend was preserved between the shutter glasses and the mirror stereoscope: The tuning was more prominent for a higher refresh rate. The only difference found between the two stereo presentations was that the refresh-rate effects on the disparity tuning were larger for the mirror stereoscope than for the shutter glasses. Therefore, the use of shutter glasses may have reduced the observed effects of the refresh rate on the psychometric function but would not have artificially enhanced the effects. 
These exercises confirmed that the energy-model units receiving inputs from the sustained channel have more prominent disparity tuning for a lower refresh rate and those receiving inputs from the transient channel have more prominent tuning for a higher refresh rate for the range of refresh rates used in this study. This finding remains consistent regardless of whether the stereo presentation device is the shutter glasses or the mirror stereoscope. 
Crosstalk between the left-eye and right-eye images is another possible source of artifact. A small amount of crosstalk is inevitable even with the use of the fast red phosphor, and it affects the effective contrast of a dot. When the stimulus disparity is zero or smaller than the width of a dot, the effective contrast of bright, correlated dots is increased by the bright dots in the other eye, creating an artificial increase in the binocular correlation. In contrast, anticorrelated dots would have an artificial decrease in the correlation, since the effective contrast of the dark, anticorrelated dots is decreased by the bright dots in the other eye. In Experiments 1 and 2, the disparity magnitude (0.24°) was larger than the dot size (0.14°), so the artifact did not occur. In Experiment 3, the disparity magnitude was 0.03°, and there was a small overlap between the dot for the left eye and the corresponding dot for the right eye. The binocular correlation would have been slightly higher than the notional value, since the effective contrast would have been increased for correlated dots and decreased for anticorrelated dots. However, we suggest that the artifact would not confound the interpretation. First, according to our measurement of the crosstalk size, the contrast change due to this artifact would be around 3%. Second, the upper limit should be even smaller than 3% because the disparities tested were not zero. Third, the disparity tunings of the primate V1 neurons have the amplitudes biased toward correlated RDSs against anticorrelated RDSs, unlike the energy models (Cumming & Parker, 1997). An artificial bias of less than 3% in our RDSs would be negligible in light of the bias already present in V1. Fourth, even if the artifact creates a bias toward positive correlation, this would not affect our stimulus manipulation in Experiment 3, which targeted the correlation variability. 
Even with a small bias toward positive correlation, the variability of local correlation is higher for larger values of percent paired dots. 
Match-based and correlation-based representations in stereoscopic depth
Neither the correlation-based nor the match-based representation alone can explain the change in the psychometric function depending on the stimulus refresh rate. The above-chance performance for half-matched RDSs (50% match) is evidence of the match-based representation. It cannot be accounted for by the correlation-based representation, because no simple (linear) combination of the correlation-based mechanisms provides depth information. In contrast, reversed depth for anticorrelated RDSs (0% match) is evidence of the correlation-based representation (Tanabe, Yasuoka, & Fujita, 2008). An alternative explanation of reversed depth, based on vergence eye movements, has been recently refuted (Doi, Tanabe, & Fujita, 2011). The data also cannot be explained by fixed contributions of the two representations across different refresh rates. If a higher refresh rate merely reduced signal quality, the psychometric function would shrink only in amplitude (deviation from chance level) and would not change in any other way. We, however, found that as the refresh rate was increased, the psychometric function shifted from the prediction of the match-based representation toward the prediction of the correlation-based representation. We suggest that stereoscopic depth perception combines the two representations and weights the correlation-based representation more for higher refresh rates. 
We previously showed that a spatial property of visual stimuli, the magnitude of binocular disparity, influences the relative weight of the two representations. The match-based representation dominates fine depth, while the correlation-based representation contributes to coarse depth (Doi et al., 2011). The present results complement our previous finding by showing the effects of the temporal properties of visual stimuli. Stereo depth perception recruits the two disparity representations for different inputs: a simple, correlation-based representation for more dynamic (faster) and coarser features, and a complex, match-based representation for less dynamic (slower or stationary) and finer features. 
What advantages do such multiple disparity representations have for the stereoscopic system? We speculate that an answer resides in a proposed role of attention and eye movements in vision: Given the limited capacity, the visual system cannot perform detailed analyses on all of the incoming signals, so it must allocate resources to selected aspects of the inputs to generate timely behavior (Itti & Koch, 2001; Neri & Heeger, 2002). Thus, the correlation-based representation is well suited for the system that guides attention and eye movements. The spatial scale of processing is coarse, but the temporal process is fast. Simple mechanisms such as the disparity energy model are sufficient to construct the correlation-based representation. The disparity energy signal is, indeed, used for guiding vergence eye movements (Masson, Busettini, & Miles, 1997; Takemura et al., 2001). In contrast, the match-based representation seems suited for more time-demanding, detailed analysis of visual scenes in three dimensions. The spatial scale is fine, but the temporal process is slow. Complex mechanisms are used for the refinement of the initial disparity signals, such as reducing the disparity selectivity for anticorrelated images (Haefner & Cumming, 2008; Janssen et al., 2003; Kumano et al., 2008; Lippert & Wagner, 2001; Read, 2002; Read & Cumming, 2007; Tanabe & Cumming, 2008; Tanabe et al., 2004). The perceptual latency may be shorter for the depth mediated by the correlation-based representation than for that mediated by the match-based representation, as the latency of the energy mechanism is shorter than that of matched filtering for monocular luminance stimuli (Neri & Heeger, 2002). 
Relationship to sustained and transient stereopsis
The refresh-rate dependent use of the correlation-based and match-based representations is distinct from a previously described dichotomy between “sustained stereopsis” and “transient stereopsis” (Schor, Edwards, & Pope, 1998). Their transient stereopsis is often considered to be based on second-order, contrast mechanisms, and the sustained stereopsis on the first-order, luminance mechanisms (Hess & Wilcox, 2008; Wilcox & Allison, 2009; but see Edwards, Pope, & Schor, 2000 for an involvement of the first order mechanisms in the transient stereopsis). Transient stereopsis depends less on stimulus orientation (Edwards, Pope, & Schor, 1999), spatial frequency (Schor et al., 1998), and binocular contrast polarity (Pope, Edwards, & Schor, 1999) than does the sustained stereopsis. A difference of the present study from these previous studies is that we only targeted the first-order, luminance mechanisms. The second-order mechanisms operating on the contrast envelope were unlikely to play a major role in our experiments, because our RDSs always had an annulus at a fixed position and consisted of relatively dense (24%) dots. The contrast envelope was fixed and provided no cues for the depth of our RDSs. Fine-scale contrast mechanisms (Wilcox & Hess, 1997) were also unlikely to contribute, because such mechanisms using full-wave or half-wave rectification should be blind to the graded anticorrelation of RDSs. The present study reveals that even within the first-order luminance mechanisms, multiple representations are recruited differently for slowly and rapidly changing stimuli. 
Conflict across spatial-frequency channels with 0%-match RDSs
We modeled the decision-making process as the subtraction between near-preferring and far-preferring neurons. An alternative model is “winner-take-all,” where the maximally activated unit dictates the decision. For example, the near decision is made when a neuron with a crossed-disparity preference has the largest response across a population of neurons with various preferred disparities. In this framework, the conflict across spatial-frequency channels is a key issue with 0%-match, anticorrelated RDSs (Read, 2002; Read & Eagle, 2000) and may explain the performance shift observed as the refresh rate is varied. Suppose that a disparity value is reported by a population of neurons with the tuned-excitatory tuning shape and various tuning-curve centers (i.e., various preferred disparities). The reported disparity is the preferred disparity of the neuron maximally activated within the population. When the tuning curves are inverted with 0%-match RDSs, the reported disparity changes depending on the population's spatial-frequency sensitivity (i.e., tuning-curve width). For high-frequency populations (i.e., populations with narrow disparity tunings), reported disparities have the same sign as the stimulus disparity of a 0%-match RDS. For low-frequency populations (i.e., populations with broad tunings), reported disparities have the opposite sign from the stimulus disparity, given a bias in favor of a smaller disparity magnitude (McKee & Mitchison, 1988). The high- and low-frequency neurons prefer slow and fast temporal modulations, respectively (DeAngelis et al., 1993). It is possible that both high-frequency and low-frequency neurons are activated in a low refresh-rate condition, reporting conflicting signs of disparities. In contrast, only low-frequency neurons may be activated in a high refresh-rate condition, reporting reversed disparities. 
Thus, the conflict across spatial-frequency channels can also explain the reduction of reversed depth perception for slower refresh rates with 0%-match RDSs. We, however, note that this mechanism is insufficient to explain the observed changes in the entire psychometric function. 
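The winner-take-all argument can be made concrete with a small sketch. Everything specific in it is an assumption made for illustration: the Gabor-shaped tuning function (whose side lobes become peaks when the curve is inverted), the rule that tuning width scales as `0.4 / freq`, the Gaussian weighting `prior_sd` that implements the bias toward smaller disparity magnitudes, and the two frequencies (4 and 1 cycles/°). Only the stimulus disparity of 0.24° is taken from Experiments 1 and 2.

```python
import math

def tuning_modulation(offset, freq, sigma):
    """Gabor-shaped disparity tuning around the preferred disparity (baseline removed)."""
    envelope = math.exp(-offset ** 2 / (2.0 * sigma ** 2))
    return envelope * math.cos(2.0 * math.pi * freq * offset)

def winner_take_all(stimulus_disparity, freq, match_sign, prior_sd=0.5):
    """Preferred disparity of the most strongly activated unit in one frequency channel.

    match_sign = +1 for correlated (100% match) RDSs and -1 for anticorrelated
    (0% match) RDSs, which inverts every tuning curve.
    """
    sigma = 0.4 / freq  # assumed: tuning width inversely proportional to frequency
    best_pref, best_score = None, -math.inf
    for i in range(-100, 101):  # preferred disparities from -1.00 to 1.00 deg
        pref = i / 100.0
        response = match_sign * tuning_modulation(stimulus_disparity - pref, freq, sigma)
        weight = math.exp(-pref ** 2 / (2.0 * prior_sd ** 2))  # favors small disparities
        score = response * weight
        if score > best_score:
            best_pref, best_score = pref, score
    return best_pref

stimulus = 0.24  # deg
reports = {
    ("high", "correlated"): winner_take_all(stimulus, 4.0, +1),
    ("high", "anticorrelated"): winner_take_all(stimulus, 4.0, -1),
    ("low", "correlated"): winner_take_all(stimulus, 1.0, +1),
    ("low", "anticorrelated"): winner_take_all(stimulus, 1.0, -1),
}
for (channel, stim_type), reported in reports.items():
    print(f"{channel}-frequency channel, {stim_type}: reported disparity {reported:+.2f} deg")
```

With these assumed parameters, both channels report a disparity of the same sign as the stimulus for correlated RDSs. For anticorrelated RDSs, the high-frequency channel still reports the stimulus sign, whereas the low-frequency channel reports the opposite sign, which is the conflict described above.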
Neural substrates for the correlation-based and match-based representations
The correlation-based and match-based representations may have neural correlates in the dorsal and ventral visual pathways, respectively. Along the dorsal pathway, disparity tuning curves to anticorrelated stereograms are inverted relative to those to correlated ones. This observation was made in the medial temporal (MT) and medial superior temporal (MST) areas (Krug et al., 2004; Takemura et al., 2001; but see Bridge & Parker, 2007; Preston, Li, Kourtzi, & Welchman, 2008, for discrepant results in human imaging studies). The transient component of vergence eye movement also shows an inverted pattern to anticorrelated RDSs (Masson et al., 1997). Thus, we speculate that the inverted representation to anticorrelated stereograms propagates up to some later stages of the dorsal pathway, especially eye-movement-related areas such as the lateral intraparietal area (Gnadt & Mays, 1995). We note, however, that the inverted representation to anticorrelated RDSs is attenuated in the anterior intraparietal area (Theys et al., 2012), which is crucial for visually guided grasping (Gallese, Murata, Kaseda, Niki, & Sakata, 1994). We suggest that the dorsal-pathway areas with inverted disparity tunings form neural substrates of the correlation-based representation. Along areas V4 and the inferior temporal area (IT) in the ventral visual pathway, the false-match responses to anticorrelated stereograms are gradually suppressed (Janssen et al., 2003; Kumano et al., 2008; Tanabe et al., 2004), suggesting that the disparity representation is based on binocular match. 
The present findings are consistent with the view that the two representations of disparity reside separately in the MT-MST and V4-IT pathways. The correlation-based representation has stronger impacts on depth perception with more dynamic inputs, while the match-based representation contributes more with slower inputs. Areas MT and MST are implicated in the analysis of dynamic inputs such as motion. The disparity selectivity in MT is stronger for moving dots than for stationary dots (Palanca & DeAngelis, 2003). In contrast, the ventral pathway including V4 and IT is implicated in object recognition and the analysis of static scenes. Performances for detecting visual objects are better for lower presentation rates and drop at a presentation rate of 8 Hz during rapid sequential visual presentation (Potter, 1975). The hemodynamic correlates of this temporal limitation are found in higher stages of the ventral pathway (McKeeff, Remus, & Tong, 2007). The response latency of the ventral-pathway neurons is longer than that of the dorsal-pathway neurons (Schmolesky et al., 1998). Therefore, the differential temporal properties of the MT-MST and V4-IT pathways may explain the refresh-rate dependent contributions of the correlation-based and match-based representations to depth perception. 
An alternative possibility is that the decision mechanism directly reads out disparity information from the primary visual cortex. The disparity representations in V1 are diverse: About half of the tested cells respond as predicted by the disparity-energy model, constituting a neural substrate of the correlation-based representation, while the other half respond with attenuated disparity modulation, possibly constituting a neural substrate of the match-based representation (Haefner & Cumming, 2008). The former and latter groups of neurons might have transient and sustained temporal properties, respectively. 
A patient with ventral-stream damage (patient DF) could not perform stereoscopic tasks with anticorrelated RDSs, just as control subjects could not perform those tasks (Read, Phillipson, Serrano-Pedraza, Milner, & Parker, 2010). The results might appear to contradict our proposal based on the ventral and dorsal pathways, because without the ventral-stream contribution, the dorsal stream should govern stereoscopic perception and give rise to reversed depth for anticorrelated RDSs. However, we note that the reversed depth for anticorrelated RDSs is only perceived when the reference surface is binocularly correlated (Doi et al., 2011; Tanabe et al., 2008). It will be interesting to examine how patients with ventral-pathway damage perform with our graded anticorrelation and correlated reference. 
Role of temporal channels in stereoscopic depth perception
Temporal channels may play an important role in how the match-based and correlation-based representations are weighted according to the refresh rate. We suggest MT-MST and V4-IT pathways as likely neural substrates of the correlation-based and match-based representations, respectively. These pathways receive different inputs from early visual pathways in terms of temporal channels. Inputs to area MT predominantly originate from the magnocellular pathway (Maunsell, Nealey, & DePriest, 1990), which has a transient, band-pass response property (De Valois et al., 2000). The inputs to V4 originate from both magnocellular and parvocellular pathways (Ferrera, Nealey, & Maunsell, 1992), suggesting that the V4-IT pathway inherits more sustained, low-pass response characteristics than the MT-MST pathway. Psychophysical studies have shown that both the transient and sustained channels contribute to stereopsis (Kontsevich & Tyler, 2000; Lee, Shioiri, & Yaguchi, 2007). Our simulation showed that an increase in the refresh rate of our RDSs increases the responses that pass through the transient channel and decreases those that pass through the sustained channel (Figure 2D). Therefore, we suggest that both match-based and correlation-based representations combine inputs from the two temporal channels, but the relative weight of the transient over sustained channels differs between the two representations. The transient channel, which prefers rapid pattern updates, may be the dominant input source for the correlation-based representation. The sustained channel, which prefers static images, may predominantly underlie the match-based representation. 
The transient nature of the correlation-based representation is consistent with a previous observation in our laboratory. Depth perception for anticorrelated RDSs is in the reversed direction at zero interocular delay but in the correct direction with an approximately 100 ms interocular delay (Tanabe et al., 2008). The mechanisms mediating reversed depth perception must have a biphasic, transient temporal property (Figure 2C, inset). The present study suggests that the match-based representation has a different temporal property so that the relative influence of the two representations changes depending on the input temporal characteristics. 
Match-based representation through a nonlinear pooling of local disparity energy
The performance for half-matched RDSs improved when spatially adjacent pairs were made between contrast-matched dots and between contrast-reversed dots (positive pairing; Figure 11A). The performance decreased when the pairs were made between contrast-matched and contrast-reversed dots (negative pairing). The sign of pairing was indistinguishable in monocular vision (Figure 8C), suggesting that the performance change must be caused by the binocular process. The variability of locally defined binocular correlation was well correlated with behavioral performance (Figure 11B); it increased for positive pairing and decreased for negative pairing (Figure 10B). This variability of local correlation causes the variability of energy model responses (Anzai et al., 1999; Qian & Zhu, 1997; Tanabe et al., 2005). Our simulation showed that the variability of energy model responses signals disparity information (Figure 9). Taken together, we suggest that the performance for half-matched RDSs, a manifestation of the match-based representation, relies on the response variability of disparity energy units. 
These results provide insight into the process that mediates match-based representation. First, the spatial scale should be fine in the initial disparity detection by energy mechanisms. In half-matched RDSs, contrast-matched dots and contrast-reversed dots are spatially intermingled. With a larger (frontoparallel) RF, the signals from these opposite types of dots are more likely to cancel out within the RF, limiting the ability to signal disparity information in response variability (Figure 9E). Indirect evidence supporting this notion is that the performance deteriorated with stimulus eccentricity (see subject MT's data in Figure 11A; also compare the nonperfect performance from the other two subjects at 0% paired dots, tested at the eccentricity of 3.25°, to the perfect performance at 50% match reported in Doi et al., 2011, tested at the eccentricity of 3.00°). Second, it is not sufficient to simply average the signals detected at the initial stage across space and/or time, because such an average gives only the baseline response that corresponds to the zero correlation of half-matched RDSs. Psychophysical performance should be at chance level regardless of the percentage of paired dots. To extract disparity information, a nonlinear process must operate on the disparity energies computed for a small visual area before those local signals are pooled. An example solution is the threshold nonlinearity that operates on the outputs of energy mechanisms (Lippert & Wagner, 2001). The threshold nonlinearity, followed by spatiotemporal pooling, can encode the width of the energy-model's response distribution. If the response distribution is wider, the threshold pooling should give a larger value. Thus, neurons that nonlinearly transform the responses of energy-model-like neurons would have a mean firing rate that varies with the disparity of half-matched RDSs, which qualifies as a neural substrate of the match-based representation (Figure 1C, middle, red curve). 
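The contrast between linear pooling and threshold pooling can be sketched numerically. The sketch below assumes, purely for illustration, that local energy responses are Gaussian around a fixed baseline and that the threshold sits at that baseline; real energy responses are nonnegative and skewed, and the names `pooled_outputs`, `baseline`, and `response_sd` are hypothetical. Under these assumptions, the linear average returns only the baseline, whereas the average of the thresholded responses is expected to be SD/√(2π), so it grows with the width of the response distribution.

```python
import random
import statistics

def pooled_outputs(response_sd, baseline=10.0, n_samples=20000, seed=0):
    """Linear and threshold-pooled outputs for a population of local energy responses.

    Responses are drawn from a Gaussian around `baseline` (an illustrative
    assumption); the threshold is placed at the baseline.
    """
    rng = random.Random(seed)
    responses = [rng.gauss(baseline, response_sd) for _ in range(n_samples)]
    linear = statistics.fmean(responses)
    # Threshold nonlinearity applied to each local response before pooling.
    rectified = [max(r - baseline, 0.0) for r in responses]
    thresholded = statistics.fmean(rectified)
    return linear, thresholded

linear_narrow, thresholded_narrow = pooled_outputs(response_sd=0.5)
linear_wide, thresholded_wide = pooled_outputs(response_sd=2.0)
print(f"linear pooling:    {linear_narrow:.3f} (narrow) vs {linear_wide:.3f} (wide)")
print(f"threshold pooling: {thresholded_narrow:.3f} (narrow) vs {thresholded_wide:.3f} (wide)")
```

The linear outputs should be nearly identical for the narrow and wide distributions, while the threshold-pooled output should be several times larger for the wide distribution. This is the sense in which a nonlinearity applied before pooling can convert response variability into a mean signal.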
The same threshold mechanism is also effective in reducing the disparity selectivity of disparity energy units for anticorrelated RDSs (Lippert & Wagner, 2001). This mechanism works only for disparity tuning with an even-symmetric shape. It is an interesting coincidence that the even-symmetric tuning contributes to extracting useful disparity signals from variable responses to half-matched RDSs. The signals in the response variability were consistent with those in the response mean only when the disparity tuning had an even-symmetric shape (Figure 9). The extension to odd-symmetric tuning requires a further combination of energy-model units (Haefner & Cumming, 2008; Tanabe & Cumming, 2008). Therefore, we suggest that the match-based representation is constructed through a nonlinear pooling of the disparity energies computed with a fine spatial scale, although this view should be considered tentative because of mixed support from physiological studies (Tanabe & Cumming, 2008; Tanabe, Haefner, & Cumming, 2011). 
From the results of our experiments, we suggest that a temporally sustained and spatially fine process contributes to match-based stereoscopic depth. Energy-model-like neurons in the primary visual cortex with sustained properties may send their outputs to the V4-IT pathway to compute binocular matching. Small RFs should be advantageous for representing disparity information in the form of response variability when correlated and anticorrelated dots are spatially intermingled. This information can be read out by a nonlinearity that comes before pooling. In contrast, a more transient process contributes to correlation-based stereoscopic depth. Energy-model-like neurons with transient properties may rapidly pass the signals to the MT-MST pathway without further nonlinear refinement of the signals. 
Can a single temporal channel or a single disparity representation explain our data?
We proposed above that our data can be explained by considering both sustained and transient channels as well as correlation-based and match-based representations. Here, we discuss alternative explanations based on a single temporal channel or a single disparity mechanism. First, the observed performance with half-matched RDSs suggests a process such as a sustained channel → the disparity-energy model → an additional nonlinearity → decision circuit (Figure 12, red pathway). If the outputs of the energy-model units were passed on to the decision circuit without undergoing a nonlinear transformation, the disparity information in the response variability would be lost, resulting in chance-level performance. Thus, the nonlinearity should come after the energy model. If the dominant input source to the energy model were the transient channel, the response variability of the energy model and the percentage of correct near/far decisions would increase with the pattern refresh rate, contrary to what we found experimentally (Figure 5B). Thus, the dominant input source should be a sustained channel. Second, the reversed depth with 0%-match RDSs suggests another process such as a transient channel → the energy model → decision circuit (Figure 12, blue pathway). If this process had the nonlinearity after the energy model, the reversed disparity tuning should be attenuated (Lippert & Wagner, 2001), resulting in only weak reversed depth perception. Moreover, feeding the transient-channel output into the nonlinear pathway (the energy model followed by a nonlinearity) cannot explain the data with 50%-match RDSs, as discussed above. Thus, this process should not have an additional nonlinearity after the energy model. If the dominant input source were a sustained channel, the reversed depth perception should become stronger as the refresh rate is decreased, contrary to what we observed (Figure 5C). 
Overall, we speculate that sustained and transient temporal channels feed inputs differently to a correlation-based representation, which is mediated by the pure disparity-energy mechanism, and a match-based representation, which is mediated possibly by the energy mechanism followed by nonlinear signal transformation. 
Figure 12
 
Schematic illustration of a possible mechanism underlying the observed data. Visual inputs are processed with transient temporal channels (blue) and sustained temporal channels (red). The signals from the left and right eyes are combined according to the disparity-energy model. The energy model is the mechanism of the correlation-based representation. One way to construct the match-based representation is to process the outputs of the energy model with an additional nonlinearity. The correlation-based representation and the match-based representation are combined in the decision-making circuit. The arrows from the temporal channels to the energy models indicate the presumably dominant pathways but are not intended to exclude other functional connections. Also, we do not mean to imply that the nonlinear transformation of disparity energy signals is the only mechanism underlying the match-based representation (see Discussion, conflict across spatial-frequency channels with 0%-match RDSs).
Conclusion
We identified a new property of the disparity representations underlying stereopsis. When the refresh rate of visual patterns was increased, the contribution of a correlation-based representation increased, while that of a match-based representation decreased. This refresh-rate-dependent switching may be consistent with the following model: Both correlation-based and match-based representations combine the transient and sustained temporal channels for luminance modulation, but the relative weight of the two channels differs between the two representations. The relative weight of the transient channel is larger for the correlation-based representation than for the match-based representation. We further suggest that the match-based representation exploits the response variability of fine-scale disparity-energy units, which has the capacity to signal disparity information. The match-based representation may involve a nonlinear mechanism for reading out the disparity energy signals, a process not needed for the correlation-based representation. Therefore, the stereoscopic system derives depth information using a simple, correlation-based representation with transient temporal properties and a more complex, match-based representation with sustained properties. The parallel mechanism in the stereoscopic domain may represent an instance of a general strategy by which the primate visual system processes sensory inputs efficiently (Nassi & Callaway, 2009). 
Acknowledgments
We thank Hiroshi Shiozaki, Mohammad Abdolrahmani, Seiji Tanabe, and Shuntaro Aoki for helpful comments on the manuscript, and Keiko Abe for the discussion on temporal channels. This work was supported by grants to I. F. from the Ministry of Education, Culture, Sports, Science and Technology of Japan (23135522, 23240047), the Japan Science and Technology Agency (CREST), and the Takeda Science Foundation. T. D. was supported by the Japan Society for the Promotion of Science Research Fellowship for Young Researchers. 
Commercial relationships: none. 
Corresponding author: Takahiro Doi. 
Email: takadoi@mail.med.upenn.edu. 
Address: Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA. 
References
Anzai A. Ohzawa I. Freeman R. D. (1999). Neural mechanisms for processing binocular information I. Simple cells. Journal of Neurophysiology, 82 (2), 891–908. [PubMed]
Bishop P. O. Henry G. H. (1971). Spatial vision. Annual Review of Psychology, 22, 119–160. [CrossRef] [PubMed]
Bridge H. Parker A. J. (2007). Topographical representation of binocular depth in the human visual cortex using fMRI. Journal of Vision, 7 (14): 15, 1–14, http://www.journalofvision.org/content/7/14/15, doi:10.1167/7.14.15. [CrossRef] [PubMed]
Cai D. DeAngelis G. C. Freeman R. D. (1997). Spatiotemporal receptive field organization in the lateral geniculate nucleus of cats and kittens. Journal of Neurophysiology, 78 (2), 1045–1061. [PubMed]
Cumming B. G. Parker A. J. (1997). Responses of primary visual cortical neurons to binocular disparity without depth perception. Nature, 389 (6648), 280–283. [CrossRef] [PubMed]
De Valois R. L. Cottaris N. P. Mahon L. E. Elfar S. D. Wilson J. A. (2000). Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vision Research, 40 (27), 3685–3702. [CrossRef] [PubMed]
Dean A. F. (1981). The variability of discharge of simple cells in the cat striate cortex. Experimental Brain Research, 44 (4), 437–440. [PubMed]
DeAngelis G. C. Cumming B. G. Newsome W. T. (1998). Cortical area MT and the perception of stereoscopic depth. Nature, 394 (6694), 677–680. [PubMed]
DeAngelis G. C. Ohzawa I. Freeman R. D. (1993). Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. Journal of Neurophysiology, 69 (4), 1091–1117. [PubMed]
Doi T. Tanabe S. Fujita I. (2011). Matching and correlation computations in stereoscopic depth perception. Journal of Vision, 11 (3): 1, 1–16, http://www.journalofvision.org/content/11/3/1, doi:10.1167/11.3.1. [CrossRef] [PubMed]
Edwards M. Pope D. R. Schor C. M. (1999). Orientation tuning of the transient-stereopsis system. Vision Research, 39 (16), 2717–2727. [CrossRef] [PubMed]
Edwards M. Pope D. R. Schor C. M. (2000). First- and second-order processing in transient stereopsis. Vision Research, 40 (19), 2645–2651. [CrossRef] [PubMed]
Edwards M. Schor C. M. (1999). Depth aliasing by the transient-stereopsis system. Vision Research, 39 (26), 4333–4340. [CrossRef] [PubMed]
Ferrera V. P. Nealey T. A. Maunsell J. H. (1992). Mixed parvocellular and magnocellular geniculate signals in visual area V4. Nature, 358 (6389), 756–761. [CrossRef] [PubMed]
Gallese V. Murata A. Kaseda M. Niki N. Sakata H. (1994). Deficit of hand preshaping after muscimol injection in monkey parietal cortex. Neuroreport, 5 (12), 1525–1529. [CrossRef] [PubMed]
Glass L. (1969). Moire effect from random dots. Nature, 223 (5206), 578–580. [CrossRef] [PubMed]
Glass L. Perez R. (1973). Perception of random dot interference patterns. Nature, 246 (5432), 360–362. [CrossRef] [PubMed]
Gnadt J. W. Mays L. E. (1995). Neurons in monkey parietal area LIP are tuned for eye-movement parameters in three-dimensional space. Journal of Neurophysiology, 73 (1), 280–297. [PubMed]
Haefner R. M. Cumming B. G. (2008). Adaptation to natural binocular disparities in primate V1 explained by a generalized energy model. Neuron, 57 (1), 147–158. [CrossRef] [PubMed]
Hawken M. J. Shapley R. M. Grosof D. H. (1996). Temporal-frequency selectivity in monkey visual cortex. Visual Neuroscience, 13 (3), 477–492. [CrossRef] [PubMed]
Hess R. F. Wilcox L. M. (2008). The transient nature of 2nd-order stereopsis. Vision Research, 48 (11), 1327–1334. [CrossRef] [PubMed]
Itti L. Koch C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2 (3), 194–203. [CrossRef] [PubMed]
Janssen P. Vogels R. Liu Y. Orban G. A. (2003). At least at the level of inferior temporal cortex, the stereo correspondence problem is solved. Neuron, 37 (4), 693–701. [CrossRef] [PubMed]
Kontsevich L. L. Tyler C. W. (2000). Relative contributions of sustained and transient pathways to human stereoprocessing. Vision Research, 40 (23), 3245–3255. [CrossRef] [PubMed]
Krug K. Cumming B. G. Parker A. J. (2004). Comparing perceptual signals of single V5/MT neurons in two binocular depth tasks. Journal of Neurophysiology, 92 (3), 1586–1596. [CrossRef] [PubMed]
Kumano H. Tanabe S. Fujita I. (2008). Spatial frequency integration for binocular correspondence in macaque area V4. Journal of Neurophysiology, 99 (1), 402–408. [PubMed]
Lee S. Shioiri S. Yaguchi H. (2007). Stereo channels with different temporal frequency tunings. Vision Research, 47 (3), 289–297. [CrossRef] [PubMed]
Lippert J. Wagner H. (2001). A threshold explains modulation of neural responses to opposite-contrast stereograms. Neuroreport, 12 (15), 3205–3208. [CrossRef] [PubMed]
Marr D. Poggio T. (1979). A computational theory of human stereo vision. Proceedings of the Royal Society B: Biological Sciences, 204 (1156), 301–328. [CrossRef]
Masson G. S. Busettini C. Miles F. A. (1997). Vergence eye movements in response to binocular disparity without depth perception. Nature, 389 (6648), 283–286. [PubMed]
Maunsell J. H. Nealey T. A. DePriest D. D. (1990). Magnocellular and parvocellular contributions to responses in the middle temporal visual area (MT) of the macaque monkey. The Journal of Neuroscience, 10 (10), 3323–3334. [PubMed]
McKee S. P. Mitchison G. J. (1988). The role of retinal correspondence in stereoscopic matching. Vision Research, 28 (9), 1001–1012. [CrossRef] [PubMed]
McKeeff T. J. Remus D. A. Tong F. (2007). Temporal limitations in object processing across the human ventral visual pathway. Journal of Neurophysiology, 98 (1), 382–393. [CrossRef] [PubMed]
Nassi J. J. Callaway E. M. (2009). Parallel processing strategies of the primate visual system. Nature Reviews Neuroscience, 10 (5), 360–372. [CrossRef] [PubMed]
Neri P. Heeger D. J. (2002). Spatiotemporal mechanisms for detecting and identifying image features in human vision. Nature Neuroscience, 5 (8), 812–816. [PubMed]
Nienborg H. Bridge H. Parker A. J. Cumming B. G. (2004). Receptive field size in V1 neurons limits acuity for perceiving disparity modulation. The Journal of Neuroscience, 24 (9), 2065–2076. [CrossRef] [PubMed]
Ohzawa I. DeAngelis G. C. Freeman R. D. (1990). Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science, 249 (4972), 1037–1041. [CrossRef] [PubMed]
Palanca B. J. DeAngelis G. C. (2003). Macaque middle temporal neurons signal depth in the absence of motion. The Journal of Neuroscience, 23 (20), 7647–7658. [PubMed]
Parker A. J. (2007). Binocular depth perception and the cerebral cortex. Nature Reviews Neuroscience, 8 (5), 379–391. [CrossRef] [PubMed]
Poggio G. F. Fischer B. (1977). Binocular interaction and depth sensitivity in striate and prestriate cortex of behaving rhesus monkey. Journal of Neurophysiology, 40 (6), 1392–1405. [PubMed]
Poggio G. F. Gonzalez F. Krause F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. The Journal of Neuroscience, 8 (12), 4531–4550. [PubMed]
Poggio G. F. Talbot W. H. (1981). Mechanisms of static and dynamic stereopsis in foveal cortex of the rhesus monkey. The Journal of Physiology, 315, 469–492. [CrossRef] [PubMed]
Pope D. R. Edwards M. Schor C. M. (1999). Extraction of depth from opposite-contrast stimuli: Transient system can, sustained system can't. Vision Research, 39 (24), 4010–4017. [CrossRef] [PubMed]
Potter M. C. (1975). Meaning in visual search. Science, 187 (4180), 965–966. [CrossRef] [PubMed]
Preston T. J. Li S. Kourtzi Z. Welchman A. E. (2008). Multivoxel pattern selectivity for perceptually relevant binocular disparities in the human brain. The Journal of Neuroscience, 28 (44), 11315–11327. [CrossRef] [PubMed]
Prince S. J. Cumming B. G. Parker A. J. (2002). Range and mechanism of encoding of horizontal disparity in macaque V1. Journal of Neurophysiology, 87 (1), 209–221.
Prince S. J. Eagle R. A. (2000). Weighted directional energy model of human stereo correspondence. Vision Research, 40 (9), 1143–1155. [CrossRef] [PubMed]
Qian N. Zhu Y. (1997). Physiological computation of binocular disparity. Vision Research, 37 (13), 1811–1827. [CrossRef] [PubMed]
Read J. C. (2002). A Bayesian model of stereopsis depth and motion direction discrimination. Biological Cybernetics, 86 (2), 117–136. [CrossRef] [PubMed]
Read J. C. Cumming B. G. (2007). Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, 10 (10), 1322–1328. [CrossRef] [PubMed]
Read J. C. Eagle R. A. (2000). Reversed stereo depth and motion direction with anti-correlated stimuli. Vision Research, 40 (24), 3345–3358. [CrossRef] [PubMed]
Read J. C. Phillipson G. P. Serrano-Pedraza I. Milner A. D. Parker A. J. (2010). Stereoscopic vision in the absence of the lateral occipital cortex. PLoS One, 5 (9), e12608. [CrossRef] [PubMed]
Rieke F. Warland D. de Ruyter van Steveninck R. Bialek W. (1999). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Schmolesky M. T. Wang Y. Hanes D. P. Thompson K. G. Leutgeb S. Schall J. D. Leventhal A. G. (1998). Signal timing across the macaque visual system. Journal of Neurophysiology, 79 (6), 3272–3278. [PubMed]
Schor C. M. Edwards M. Pope D. R. (1998). Spatial-frequency and contrast tuning of the transient-stereopsis system. Vision Research, 38 (20), 3057–3068. [CrossRef] [PubMed]
Shadlen M. N. Britten K. H. Newsome W. T. Movshon J. A. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. The Journal of Neuroscience, 16 (4), 1486–1510. [PubMed]
Shiozaki H. M. Tanabe S. Doi T. Fujita I. (2012). Neural activity in cortical area V4 underlies fine disparity discrimination. The Journal of Neuroscience, 32 (11), 3830–3841. [CrossRef] [PubMed]
Takemura A. Inoue Y. Kawano K. Quaia C. Miles F. A. (2001). Single-unit activity in cortical area MST associated with disparity-vergence eye movements: Evidence for population coding. Journal of Neurophysiology, 85 (5), 2245–2266.
Tanabe S. Cumming B. G. (2008). Mechanisms underlying the transformation of disparity signals from V1 to V2 in the macaque. The Journal of Neuroscience, 28 (44), 11304–11314. [CrossRef] [PubMed]
Tanabe S. Doi T. Umeda K. Fujita I. (2005). Disparity-tuning characteristics of neuronal responses to dynamic random-dot stereograms in macaque visual area V4. Journal of Neurophysiology, 94 (4), 2683–2699.
Tanabe S. Haefner R. M. Cumming B. G. (2011). Suppressive mechanisms in monkey V1 help to solve the stereo correspondence problem. The Journal of Neuroscience, 31 (22), 8295–8305. [CrossRef] [PubMed]
Tanabe S. Umeda K. Fujita I. (2004). Rejection of false matches for binocular correspondence in macaque visual cortical area V4. The Journal of Neuroscience, 24 (37), 8170–8180. [CrossRef] [PubMed]
Tanabe S. Yasuoka S. Fujita I. (2008). Disparity-energy signals in perceived stereoscopic depth. Journal of Vision, 8 (3): 22, 1–10, http://www.journalofvision.org/content/8/3/22, doi:10.1167/8.3.22. [CrossRef] [PubMed]
Theys T. Srivastava S. van Loon J. Goffin J. Janssen P. (2012). Selectivity for three-dimensional contours and surfaces in the anterior intraparietal area. Journal of Neurophysiology, 107 (3), 995–1008. [CrossRef] [PubMed]
Tyler C. W. (1990). A stereoscopic view of visual processing streams. Vision Research, 30 (11), 1877–1895. [CrossRef] [PubMed]
Uka T. DeAngelis G. C. (2004). Contribution of area MT to stereoscopic depth perception: Choice-related response modulations reflect task strategy. Neuron, 42 (2), 297–310. [CrossRef] [PubMed]
Wilcox L. M. Allison R. S. (2009). Coarse-fine dichotomies in human stereopsis. Vision Research, 49 (22), 2653–2665. [CrossRef] [PubMed]
Wilcox L. M. Hess R. F. (1997). Scale selection for second-order (non-linear) stereopsis. Vision Research, 37 (21), 2981–2992. [CrossRef] [PubMed]
Figure 1
 
Experimental rationale. (A) Screenshot of a half-matched random-dot stereogram (RDS). The RDS consists of a center disk and a surrounding annulus. Red crosses are fixation markers. During the experiments, we used red phosphors for stimuli and background while using white fixation crosses. A scale bar indicates 1°. (B) Graded contrast-reversal (anticorrelation) of random-dot stereograms (RDSs). Three representative RDSs are shown schematically. Pairs of circles denote the luminance contrast of stimulus dots projected to the left and right eyes. (Right) All pairs have matched contrast between the two eyes. Thus, the stereo half images are correlated perfectly. (Left) All pairs have reversed contrast; the stereo half images are anticorrelated perfectly. (Center) Half of the pairs have matched contrast. The half images are uncorrelated because correlated and anticorrelated dots cancel each other. (C) Ideal disparity tuning functions of the match-based (red, solid curve) and correlation-based (blue, dashed curve) representations for the three RDSs shown in (B). Only one of the two representations shows disparity-dependent modulation at the left (0% match) and center (50% match). (D) Hypothetical psychometric functions in a two-alternative near/far discrimination task. The probability of a correct choice is derived from percent match values for the match-based representation (red, solid curve) and from percent correlation values for the correlation-based representation (blue, dashed curve).
Figure 2
 
The effects of pattern refresh rates on the relative activation of transient and sustained channels. (A) Simulated time courses of the luminance contrast at a given pixel for one eye in the RDSs. Time courses with a low refresh rate (5.3 Hz, eight repetitions of the same image) and a high refresh rate (42.5 Hz, only one repetition of the same image) are shown on the left and right, respectively. A contrast of three corresponds to a bright dot, one to the background, and −1 to either a dark dot or the closure of the shutter glass that occurs in every other frame. (B) The outputs of the sustained temporal channel (shown in the inset) for the example stimulus time courses shown in (A). The tick interval for the inset is 0.1 s for the x axis and 0.2 for the y axis. (C) The outputs of the transient temporal channel for the same example time courses shown in (A). The convention is the same as in (B). (D) The energy of the channel outputs as a function of refresh rate. We generated a random dot pattern 1,000 times and calculated the trial average of the signal energy (squared channel output followed by time integral) for each channel.
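The energy computation described in (D) can be sketched as follows. This is only an illustration under several assumptions that are not taken from the paper: the stimulus at a pixel is simplified to a binary contrast of ±1 (the background level and shutter closure are ignored), the sustained channel is modeled as a first-order low-pass filter with a 50-ms time constant, and the transient channel as the difference of two low-pass filters (10 ms and 30 ms). All function names and parameter values are hypothetical.

```python
import math
import random

DT_MS = 1.0  # simulation time step in milliseconds (assumed)


def lowpass(signal, tau_ms):
    """First-order low-pass filter, started at the first stimulus value."""
    alpha = 1.0 - math.exp(-DT_MS / tau_ms)
    y = signal[0]
    out = []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def stimulus(refresh_hz, duration_ms, rng):
    """Binary contrast time course that is redrawn at the refresh rate."""
    hold = max(1, round(1000.0 / refresh_hz))  # samples per pattern
    values = []
    while len(values) < duration_ms:
        values.extend([rng.choice((-1.0, 1.0))] * hold)
    return values[:duration_ms]


def channel_energies(refresh_hz, n_trials=100, duration_ms=1000, seed=2):
    """Trial-averaged energy (time integral of squared output) per channel."""
    rng = random.Random(seed)
    sustained_sum = 0.0
    transient_sum = 0.0
    for _ in range(n_trials):
        s = stimulus(refresh_hz, duration_ms, rng)
        sustained = lowpass(s, 50.0)                    # assumed sustained filter
        fast, slow = lowpass(s, 10.0), lowpass(s, 30.0)
        transient = [f - g for f, g in zip(fast, slow)]  # assumed band-pass filter
        sustained_sum += sum(v * v for v in sustained) * DT_MS / 1000.0
        transient_sum += sum(v * v for v in transient) * DT_MS / 1000.0
    return sustained_sum / n_trials, transient_sum / n_trials


results = {rate: channel_energies(rate) for rate in (5.3, 42.5)}
for rate, (sustained, transient) in results.items():
    print(f"{rate} Hz: sustained={sustained:.3f}, transient={transient:.3f}")
```

With these assumed filters, raising the refresh rate should lower the energy of the sustained channel and raise that of the transient channel, so the transient-to-sustained ratio grows with refresh rate. The absolute values depend entirely on the filter choices made here.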
Figure 3
 
Psychometric functions have different shapes for low and high refresh rates of dot patterns (Experiment 1). The average functions from all subjects tested (top left) and individual functions are shown. Continuous and dashed curves are best-fitted functions of the data for low (open circles) and high (solid circles) refresh rates, respectively. The correct percentages were calculated from 360 choices for the average data and 60 choices for the example data. Error bars indicate standard errors of the means across six subjects for the average data and across two blocks of trials for the example data.
Figure 4
 
Odd symmetry of the psychometric function increases with the pattern refresh rate (Experiment 1). (A) The odd symmetry was quantified as an area-based index. The area of the odd-symmetric component (blue area, odd symmetric with respect to the central point at 50% match and 50% correct) is divided by the total area that represents the total performance deviation from chance level (blue area plus gray area) to give an index, denoted the fractional area. The match-based and correlation-based representations predict zero and one fractional area, respectively. (B) The fractional area is plotted against the log refresh rate for the six subjects.
Figure 5
 
Detailed characterization of the shape of the psychometric function (Experiment 1). Three quantities are plotted against the log refresh rate for individual subjects and the average across them. The values predicted by the correlation-based representation are shown with horizontal dotted lines. (A) The match level with chance performance increased with the refresh rate. (B) The percentage of correct choices with 50% match decreased with the refresh rate on average, with an offset depending on subjects. (C) The percentage of correct choices with 0% match decreased with the refresh rate on average, with an offset depending on subjects. The percent correct values were based on the fitted values rather than on the raw data.
Figure 6
 
Psychometric functions have different shapes for low and high refresh rates even when the number of independent dot patterns is fixed (Experiment 2). The notation is the same as in Figure 3; however, the refresh rates are different from those in Experiment 1 (Figure 3). The correct percentages were calculated from 420 choices for the average data (leftmost) and 60 choices for the data from example subjects.
Figure 7
 
The results in Experiment 1 were mostly replicated even when the number of dot patterns was fixed (Experiment 2). The odd symmetry and the three point-based characterizations of the psychometric functions are plotted as a function of refresh rate. (A) Fractional area, an index for odd symmetry, was higher for the higher refresh rate. (B) Percent match with chance performance was higher for the higher refresh rate. (C) Percent correct choice at 50% match was lower for the higher refresh rate. (D) Percent correct at 0% match showed no significant difference. The long gray line in each plot is the best-fitting line to the data obtained in Experiment 1. The fitted model had a common slope parameter across subjects and variable offset parameters, which were averaged in this figure. The horizontal dotted lines represent the predicted level from the pure correlation-based representation.
Figure 8
 
Dot pairing of half-matched RDSs to manipulate the statistics of the local binocular correlation (Experiment 3). (A) Two example RDSs with positive pairing (right) and negative pairing (left). The RDSs consist of a center disk with a certain disparity and a surrounding annulus with zero disparity. The red crosses are the fixation markers, and their vertical bars served as nonius lines. Scale bar: 1° (shown for illustrative purposes only; not present in the experiments). (B) Schematic illustration of the dot-paired and unpaired RDSs. (Right) 100% paired RDSs in which all the contrast-matched dots (blue outline) and all the contrast-reversed dots (orange outline) are vertically paired. The combination of monocular contrasts (white or black) is random. (Center) 0% paired RDSs in which no pairs are made. (Left) −100% paired RDSs in which every contrast-matched dot is vertically paired with a contrast-reversed dot. (C) All possible combinations of monocular contrasts and binocular correlation. Within a single combination, the left and right columns represent the dots for the left eye and right eye, respectively. Positive and negative pairing cannot be distinguished from the combination of monocular contrasts, because every combination is equally likely to occur irrespective of the sign of the pairing.
Figure 9
 
Predicted disparity tuning of the response mean and the standard deviation with the energy model. (A) Even-type disparity tuning of the response mean. The three functions are disparity-tuning curves of the same model at 100% (black, open circle), 50% (gray, solid circle), and 0% (black, solid circle) match levels. The tuning function was flat at 50% match. (B) The disparity tuning of the standard deviation (SD) of the responses of the same model used in (A). The function at 50% match (gray, solid circle) has a tuning similar to that of the mean tuning function at 100% match. The mean and SD were calculated from 10,000 simulated trials. (C, D) Odd-type disparity tuning plotted with the same conventions as in (A) and (B). (E) The SD tuning curves of even-type units with a range of RF sizes. The match level was 50%. The response SD was normalized by the response mean at each RF size. The disparity was scaled by the SD of the RF envelope (the RF size given in each inset is two SDs). Each data point was based on 10,000 trials.
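Why the mean tuning is flat at 50% match can be seen from a short calculation. This is a sketch under simplifying assumptions that are not taken from the source: dot contrasts are zero-mean and independent across dots, and the response is the squared sum of left-eye and right-eye linear filter outputs. The symbols m, L, R, and C(d) are introduced here for illustration only.

```latex
% m    : fraction of contrast-matched dots (match level)
% d    : stimulus disparity
% L, R : left- and right-eye linear filter outputs
% C(d) : mean of the product LR at 100% match
E = (L + R)^2 = L^2 + R^2 + 2LR
\langle LR \rangle = m\,C(d) + (1 - m)\,\bigl(-C(d)\bigr) = (2m - 1)\,C(d)
\langle E \rangle = \langle L^2 \rangle + \langle R^2 \rangle + 2\,(2m - 1)\,C(d)
```

Under these assumptions the disparity-dependent part of the mean response vanishes at m = 0.5 and changes sign at m = 0. The calculation constrains only the mean: the product LR still varies from trial to trial, which is consistent with the disparity-tuned SD at 50% match shown in (B).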
Figure 10
 
The effects of dot pairing on the distribution of local binocular correlation. (A) The simulated distribution of local correlation is shown for three levels (−100%, 0%, 100%) of percent paired dots. The number of trials was 10,000 for each histogram. (B) The mean (solid circle) and standard deviation (open circle) of the distributions are plotted as a function of percent paired dots. The standard deviation increased with the percent paired dots. (C) The standard deviation was calculated and plotted for different window sizes from which the local correlation was calculated. The data with 0.42° (white circles) were the same as those shown in (B).
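The increase in standard deviation with the percentage of paired dots can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the authors' simulation: a window holds a fixed number of dots, each dot counts as +1 (contrast matched) or −1 (contrast reversed) with equal probability, pairs never straddle the window edge, and local correlation is the mean of these values. The function name and its parameters are hypothetical.

```python
import random
import statistics


def local_correlation_sd(percent_paired, dots_per_window=16, trials=10000, seed=0):
    """SD of local binocular correlation in a toy window of +1/-1 dots.

    percent_paired > 0: that share of dot pairs has the same sign.
    percent_paired < 0: that share of dot pairs has opposite signs.
    Remaining dots are drawn independently.
    """
    rng = random.Random(seed)
    n_pairs = dots_per_window // 2
    n_linked = round(abs(percent_paired) / 100 * n_pairs)
    samples = []
    for _ in range(trials):
        total = 0
        for pair in range(n_pairs):
            first = rng.choice((1, -1))
            if pair < n_linked:
                # Linked pair: same sign (positive pairing) or opposite sign.
                second = first if percent_paired > 0 else -first
            else:
                # Unlinked dots are independent of each other.
                second = rng.choice((1, -1))
            total += first + second
        samples.append(total / dots_per_window)
    return statistics.pstdev(samples)


sd_negative = local_correlation_sd(-100)
sd_unpaired = local_correlation_sd(0)
sd_positive = local_correlation_sd(100)
```

In this toy model with n dots per window, the SD is exactly 0 at −100% pairing (every pair cancels), 1/√n at 0% pairing, and √(2/n) at 100% pairing, so the simulated values rise with the percentage of paired dots.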
Figure 11
 
(A) Performance increased with the percentage of paired dots. The percent correct on the ordinate was calculated from 240 choices for the average data (upper left panel), 60 choices for subjects TD and TO (upper right and lower left panels, respectively), and 120 choices for subject MT (large circles in lower right panel). Error bars indicate ± standard error of the mean (SEM) across three subjects for the average data and ± SEM across blocks of trials for individual data (two blocks for subjects TD and TO, four blocks for subject MT). For subject MT, performance was tested under easy (upper small circles) and difficult (lower small circles) conditions and averaged (middle large circles). (B) The average performance replotted against the standard deviation (SD) of local correlation at the corresponding percentage of paired dots (Figure 10B).
Figure 12
 
Schematic illustration of a possible mechanism underlying the observed data. Visual inputs are processed by transient temporal channels (blue) and sustained temporal channels (red). The signals from the left and right eyes are combined according to the disparity-energy model. The energy model is the mechanism of the correlation-based representation. One way to construct the match-based representation is to process the outputs of the energy model with an additional nonlinearity. The correlation-based representation and the match-based representation are combined in the decision-making circuit. The arrows from the temporal channels to the energy models indicate the presumably dominant pathways but are not intended to exclude other functional connections. Also, we do not mean to imply that the nonlinear transformation of disparity energy signals is the only mechanism underlying the match-based representation (see Discussion, conflict across spatial-frequency channels with 0%-match RDSs).