Human stereopsis has two well-known constraints: the disparity-gradient limit, which is the inability to perceive depth when the change in disparity within a region is too large, and the limit of stereoresolution, which is the inability to perceive spatial variations in disparity that occur at too fine a spatial scale. We propose that both limitations can be understood as byproducts of estimating disparity by cross-correlating the two eyes' images, the fundamental computation underlying the disparity-energy model. To test this proposal, we constructed a local cross-correlation model with biologically motivated properties. We then compared model and human behaviors in the same psychophysical tasks. The model and humans behaved quite similarly: they both exhibited a disparity-gradient limit and had similar stereoresolution thresholds. Performance was affected similarly by changes in a variety of stimulus parameters. By modeling the effects of stimulus blur and of using different sizes of image patches, we found evidence that the smallest neural mechanism humans use to estimate disparity is 3–6 arcmin in diameter. We conclude that the disparity-gradient limit and stereoresolution are indeed byproducts of using local cross-correlation to estimate disparity.

*D*_{max}. Finally, the spatial variation in disparity from one part of the stimulus to another must not occur at too fine a scale. The finest perceptible variation is the stereoresolution limit. These limits to stereopsis are summarized in Figure 1. The upper panel is a stereogram of sinusoidal corrugations in which disparity amplitude increases from left to right and spatial frequency increases from bottom to top. View the stereogram at a distance of 40 cm and cross-fuse or divergently fuse to see the corrugations. One can perceive the sinusoidal depth variation in the middle of the stereogram but not elsewhere. The lower panel is a graph, replotted from Tyler (1975), showing the combinations of disparity amplitude and spatial frequency for which the corrugation in depth is perceived and the combinations for which it is not. Our purpose is to better understand the determinants of the boundary conditions for stereopsis.

*S*_{L,R} are the responses of simple cells in the left and right eyes, and even and odd refer to the symmetry of the simple-cell receptive fields (Prince & Eagle, 2000). In the last two terms, the left eye's response is multiplied by the right eye's response. A bank of such cells, each tuned to a different disparity, therefore performs the equivalent of windowed, or local, cross-correlation.
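As a sketch of where the interocular products come from, the commonly cited form of the disparity-energy response (Ohzawa et al., 1990) can be expanded as shown here. The symbol *E* and the ordering of terms are assumptions made for illustration and may differ from the equation this passage originally accompanied.

```latex
\begin{aligned}
E &= \left(S_{L,\mathrm{even}} + S_{R,\mathrm{even}}\right)^2
   + \left(S_{L,\mathrm{odd}} + S_{R,\mathrm{odd}}\right)^2 \\
  &= S_{L,\mathrm{even}}^2 + S_{R,\mathrm{even}}^2
   + S_{L,\mathrm{odd}}^2 + S_{R,\mathrm{odd}}^2
   + 2\,S_{L,\mathrm{even}}\,S_{R,\mathrm{even}}
   + 2\,S_{L,\mathrm{odd}}\,S_{R,\mathrm{odd}}
\end{aligned}
```

Only the last two terms combine the two eyes' responses, which is why the computation behaves like a cross-correlation of the left and right inputs.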

For two points *P* and *Q* in a stereogram, the coordinates in the left eye's image are (*x*_{PL}, *y*_{PL}) and (*x*_{QL}, *y*_{QL}), and the coordinates in the right eye are (*x*_{PR}, *y*_{PR}) and (*x*_{QR}, *y*_{QR}). The separation is the vector **S** from the average position of *P* to the average position of *Q*; its magnitude is ∣**S**∣. The disparity is the vector **D**; its magnitude is ∣**D**∣. The disparity gradient is ∣**D**∣/∣**S**∣. In Burt and Julesz (1980), the direction of **S** was varied, but **D** was always horizontal. They found that two-element stereograms could not be fused when the disparity gradient exceeded 1 regardless of the direction of **S**. In other words, they found that the disparity-gradient limit was unaffected by the tilt of the stimulus (Stevens, 1979).
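The definition above can be sketched in a few lines of Python. This is an illustrative reading of the verbal definition, not the authors' code: the function name `disparity_gradient`, the tuple layout, and the left-minus-right sign convention for disparity are all assumptions (the sign does not affect the magnitude).

```python
import math

def disparity_gradient(p_left, p_right, q_left, q_right):
    """Return |D| / |S| for two points given as (x, y) tuples in each eye's image."""
    # Average (cyclopean) position of each point across the two eyes.
    p_avg = ((p_left[0] + p_right[0]) / 2, (p_left[1] + p_right[1]) / 2)
    q_avg = ((q_left[0] + q_right[0]) / 2, (q_left[1] + q_right[1]) / 2)
    # Separation vector S: from the average position of P to that of Q.
    s = (q_avg[0] - p_avg[0], q_avg[1] - p_avg[1])
    # Disparity of each point (left minus right; sign convention assumed).
    p_disp = (p_left[0] - p_right[0], p_left[1] - p_right[1])
    q_disp = (q_left[0] - q_right[0], q_left[1] - q_right[1])
    # Disparity vector D: difference between the two points' disparities.
    d = (q_disp[0] - p_disp[0], q_disp[1] - p_disp[1])
    # Note: undefined (division by zero) if the two points coincide.
    return math.hypot(*d) / math.hypot(*s)

# Hypothetical example: vertical separation of 10 units, horizontal disparity of 1 unit.
gradient = disparity_gradient((0.0, 0.0), (0.0, 0.0), (0.5, 10.0), (-0.5, 10.0))
```

In this example **S** is vertical and **D** is horizontal, and the gradient is well below the limit of 1 reported by Burt and Julesz (1980).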

**S** will be parallel to tilt. The definition of the disparity gradient for a horizontally oriented sawtooth corrugation is schematized in Figure 2.

*A* is the amplitude of the sawtooth wave and *SF* is the spatial frequency. The disparity gradient for the discontinuities between the slats is infinite. While sine waves do not have a constant disparity gradient, the disparity gradient of the steepest part of the waveform will have a similar relationship to the amplitude and spatial frequency. The corrugations in Figure 1 are horizontal (as they were in Tyler, 1973, 1974, 1975) and are defined by horizontal disparity only; thus, **S** is vertical and **D** is horizontal.

^{2} and an extent of 15° horizontally and vertically. The average luminous intensity of a dot was 1.72 × 10^{−6} cd and the size was 0.53 arcmin. The dots were randomly distributed in the half-images. Two methods were used to create disparity. In the first, we shifted the dots horizontally in screen coordinates (which correspond to horizontal in Helmholtz coordinates). This is the most common method for creating stereograms, but such stimuli presented in a haploscope do not have the vertical disparities that are produced by real-world stimuli at finite distances (Held & Banks, 2008). In the second method, we used “back projection” to create the appropriate horizontal and vertical disparities (Backus et al., 1999).

*h*(*x, y*)): *a* = 0.583, *s*_{1} = 0.443 arcmin, and *s*_{2} = 2.04 arcmin (Geisler & Davila, 1985). The resulting images were scaled such that the spacing between rows and columns was 0.6 arcmin, corresponding roughly to the spacing between foveal cones (Geisler & Davila, 1985). These values were chosen to best approximate the analogous viewing situation for the human observers.
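The point-spread function itself is not reproduced in this text. A minimal sketch, assuming the sum-of-two-Gaussians form commonly associated with these parameters, shows how *a*, *s*_{1}, and *s*_{2} could define a sampled blur kernel. The functional form, the function name `psf_kernel`, the kernel extent `half_width`, and the final renormalization are assumptions; only the three parameter values and the 0.6-arcmin spacing come from the text.

```python
import numpy as np

def psf_kernel(a=0.583, s1=0.443, s2=2.04, spacing=0.6, half_width=10):
    """Sample an assumed two-Gaussian point-spread function on a square grid.

    a, s1, s2 are the values quoted in the text (s1 and s2 in arcmin);
    spacing is the 0.6-arcmin sample spacing; half_width (in samples)
    is a hypothetical choice for the kernel extent.
    """
    coords = np.arange(-half_width, half_width + 1) * spacing
    x, y = np.meshgrid(coords, coords)
    r2 = x**2 + y**2
    # Narrow component with weight a, broad component with weight (1 - a).
    narrow = a / (2 * np.pi * s1**2) * np.exp(-0.5 * r2 / s1**2)
    broad = (1 - a) / (2 * np.pi * s2**2) * np.exp(-0.5 * r2 / s2**2)
    kernel = narrow + broad
    # Renormalize so that blurring preserves mean intensity on the sampled grid.
    return kernel / kernel.sum()

kernel = psf_kernel()
```

Convolving each half-image with such a kernel would approximate optical blur before the images reach the correlation stage.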

*L*(*x, y*) and *R*(*x, y*) are the image intensities in the left and right half-images, *W*_{L} and *W*_{R} are the windows applied to the half-images, *μ*_{L} and *μ*_{R} are the mean intensities within the two windows, and *δ*_{x} is the displacement of *W*_{R} relative to *W*_{L} (where the displacement is disparity). The normalization by mean intensity assures that the correlation is always between −1 and 1; the correlation for identical images is 1. Without the normalization, the resultant would depend on the contrast and average intensity of the half-images.
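A normalized, windowed correlation of this kind can be sketched as follows. This is one plausible reading of the quantities defined above, not the paper's exact equation, which is not reproduced in this text: the function name `windowed_correlation` is hypothetical, and the choice to subtract the mean of each windowed patch and divide by the root sum of squares is an assumption.

```python
import numpy as np

def windowed_correlation(left_patch, right_patch, window):
    """Normalized correlation of two equally sized patches under one window."""
    wl = window * left_patch
    wr = window * right_patch
    # Remove the mean of each windowed patch (assumed meaning of mu_L, mu_R).
    dl = wl - wl.mean()
    dr = wr - wr.mean()
    # Dividing by the root sum of squares bounds the result to [-1, 1].
    denom = np.sqrt((dl**2).sum() * (dr**2).sum())
    return float((dl * dr).sum() / denom) if denom > 0 else 0.0

# Toy check with a random patch and a small Gaussian window.
rng = np.random.default_rng(1)
patch = rng.random((11, 11))
yy, xx = np.mgrid[-5:6, -5:6]
window = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))
same = windowed_correlation(patch, patch, window)
inverted = windowed_correlation(patch, -patch, window)
```

Identical patches give a correlation of 1 and contrast-inverted patches give −1, regardless of the patches' contrast or mean intensity.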

*W*_{L} and *W*_{R} were identical two-dimensional Gaussian weighting functions in which *σ*_{x} and *σ*_{y} had the same values (for an example of the use of anisotropic functions, see Kanade & Okutomi, 1994). These weighting functions were used to select patches of the left and right half-images, which were then cross-correlated. Throughout the manuscript, we refer to the size of these weighting functions as “window size”; the window size we report is the diameter of the part of the Gaussian containing ±1*σ*. The actual windows used in the simulations extended to ±3*σ*, where they were truncated. Our weighting functions mimic the envelopes associated with cortical receptive fields, but not the even- and odd-symmetric weighting functions of the disparity-energy model (Ohzawa et al., 1990).
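The relationship between the reported window size and the Gaussian's parameters can be made concrete with a short sketch. It assumes an unnormalized isotropic Gaussian with peak value 1, sampled at the 0.6-arcmin spacing mentioned earlier; the function name `gaussian_window` and the peak normalization are assumptions.

```python
import numpy as np

def gaussian_window(window_size, spacing=0.6):
    """Isotropic Gaussian window; window_size is the +/-1 sigma diameter in arcmin."""
    sigma = window_size / 2.0  # the reported size spans -sigma to +sigma
    # Truncate at +/-3 sigma (small epsilon guards against rounding at the edge).
    half = int(np.floor(3 * sigma / spacing + 1e-9))
    coords = np.arange(-half, half + 1) * spacing
    x, y = np.meshgrid(coords, coords)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2))

# A 6-arcmin window has sigma = 3 arcmin and extends to +/-9 arcmin.
window = gaussian_window(6.0)
```

Under these assumptions, a reported window size of 6 arcmin corresponds to a truncated window 18 arcmin across, or 31 × 31 samples.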

*W*_{L} along a vertical line perpendicular to the sawtooth corrugations in the middle of the left eye's half-image. For each position of *W*_{L}, we then computed the correlation for different horizontal positions of *W*_{R} relative to *W*_{L} (Equation 9; horizontal defined in Helmholtz coordinates). The restriction of shifting *W*_{L} along one vertical line greatly reduced computation time but did not affect the main results.

*W*_{L} along the vertical search line and the ordinate represents the horizontal position of *W*_{R} relative to *W*_{L}; thus, the ordinate is the horizontal disparity. Red corresponds to high correlations, green to correlations near 0, and blue to negative correlations.
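A correlation map of the kind described here, with window position along one axis and candidate disparity along the other, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' simulation: the function name `correlation_map`, the integer-sample disparities, and the uniformly shifted random test image are all hypothetical.

```python
import numpy as np

def correlation_map(left, right, window, rows, col, disparities):
    """Correlation for each candidate disparity (rows of output) and each
    vertical position of the left-eye window (columns of output)."""
    h = window.shape[0] // 2
    out = np.zeros((len(disparities), len(rows)))
    for j, r in enumerate(rows):
        # Left-eye patch centered on the vertical search line at column col.
        lp = window * left[r - h:r + h + 1, col - h:col + h + 1]
        lp = lp - lp.mean()
        for i, d in enumerate(disparities):
            c = col + d  # right-eye window displaced horizontally by d samples
            rp = window * right[r - h:r + h + 1, c - h:c + h + 1]
            rp = rp - rp.mean()
            denom = np.sqrt((lp**2).sum() * (rp**2).sum())
            out[i, j] = (lp * rp).sum() / denom if denom > 0 else 0.0
    return out

# Toy stimulus: the right half-image is the left one shifted by 3 samples.
rng = np.random.default_rng(0)
left = rng.random((64, 64))
right = np.roll(left, 3, axis=1)
yy, xx = np.mgrid[-5:6, -5:6]
window = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))
disparities = list(range(-6, 7))
cmap = correlation_map(left, right, window, rows=range(20, 44), col=32,
                       disparities=disparities)
# Disparity estimate at each window position: the shift with peak correlation.
estimates = [disparities[k] for k in cmap.argmax(axis=0)]
```

Taking the disparity with the highest correlation at each window position yields a disparity estimate along the search line; for a frontoparallel toy stimulus the ridge of high correlation is a horizontal line at the imposed shift.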

^{2} was too low to provide sufficient luminance variation over that small a region. We conclude that disparity estimation via correlation suffers from a disparity-gradient limit, and that the limit cannot be avoided by choosing smaller window sizes.

^{2}. The disparity amplitudes were 4.8 and 16 arcmin.

*N* is the number of dots and *A* is the stimulus area. The diagonal line in Figure 13 is *f*_{N} for the various dot densities. Human stereoresolution followed this sampling limit up to a density of ∼30 dots/deg^{2}, so the highest resolvable spatial frequency was determined by dot sampling in the half-images at low to medium densities. At higher densities, it was restricted by something else. Banks et al. (2004) showed that the restriction was not the disparity-gradient limit; they showed this by reducing disparity and thereby reducing the disparity gradient, and observing that the asymptotic spatial frequency was unaffected.
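The sampling limit can be illustrated numerically. The expression for *f*_{N} is not reproduced in this text, so the sketch below assumes the usual Nyquist form for randomly placed samples, *f*_{N} = ½√(*N*/*A*); that form, the function name `nyquist_frequency`, and the example dot count are assumptions.

```python
import math

def nyquist_frequency(n_dots, area_deg2):
    """Assumed sampling limit f_N = 0.5 * sqrt(N / A), in cycles/deg."""
    return 0.5 * math.sqrt(n_dots / area_deg2)

# Hypothetical example: 900 dots in a 15 x 15 deg stimulus (4 dots/deg^2).
f_n = nyquist_frequency(900, 15 * 15)

# Sampling limit at the density where human performance departs from it.
f_n_at_30 = nyquist_frequency(30, 1.0)
```

Under this assumption, a density of 30 dots/deg^{2} corresponds to a sampling limit of roughly 2.7 cpd, and the limit grows only with the square root of density.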

^{2}. Peak-to-trough disparity amplitude was fixed at 10 arcmin, a value midway between the values of 4.8 and 16 arcmin used in Banks et al. (2004). For some modeling runs, we blurred the half-images with isotropic Gaussians before presenting them to the rest of the model.

- Window size was varied over a different range. As before, portions of the half-images were selected with isotropic Gaussian windows, but window sizes were now varied from 1.5 to 60 arcmin.
- The window in the left eye was translated in two directions, +10° and −10° from horizontal, corresponding to directions parallel to the two stimulus orientations.
- The decision templates were also oriented +10° and −10° from horizontal.
- The templates included both the tested spatial frequencies and half those values.

*f*_{A} ∝ 1/*w*, where *f*_{A} is the asymptotic spatial frequency and *w* is the window size. The model should be able to detect disparity variation at a finer scale simply by using smaller windows. However, for each window size there should also be a dot density below which estimation fails: the number of dots falling within the correlation window becomes too small on average, so correlation rises for false matches and disparity estimation fails. From this argument, the limiting dot density would be inversely proportional to window area: *d* ∝ 1/*w*^{2}. Thus, for each combination of corrugation frequency and dot density, there should be an optimal size for the correlation window, a size constrained by the two limiting factors.

*f*_{A} = 45/*w* (diagonal line; *f*_{A} in cpd and *w* in arcmin). (The absolute values of the spatial frequency plateaus can depend on other stimulus parameters, such as disparity amplitude, but a description of the effects of those parameters is beyond the scope of this paper.) For smaller window sizes, however, the asymptotic frequency leveled off at values lower than predicted. Those asymptotic frequencies depended strongly on blur magnitude: lower asymptotes for greater blur. Thus, there are two things that limit the highest discriminable corrugation frequency: the size of the correlation window (summarized by *f*_{A} = 45/*w*), and the blur associated with the images sent to the cross-correlator (with greater blur, the luminance variation is insufficient to yield robust correlations between the two eyes' images).
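As a worked example of the window-size relation, the following sketch evaluates *f*_{A} = 45/*w* for a few window sizes within the tested range. The function names are hypothetical, and the values are upper bounds only: as noted above, blur lowers the asymptote for small windows.

```python
def asymptotic_frequency(window_arcmin):
    """f_A = 45 / w, with f_A in cycles/deg and w in arcmin (relation from the text)."""
    return 45.0 / window_arcmin

def limiting_window(frequency_cpd):
    """Inverse relation: the largest window that still supports frequency_cpd."""
    return 45.0 / frequency_cpd

# Window-limited asymptotes for several window sizes (arcmin -> cpd).
frequencies = {w: asymptotic_frequency(w) for w in (3, 6, 15, 60)}
```

By this relation, windows of 3 and 6 arcmin would support corrugation frequencies up to 15 and 7.5 cpd, respectively, if blur were not also limiting.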

*w*^{2} (where dot density is in dots/deg^{2} and *w* is in arcmin) for the model to estimate disparity reliably. Thus, we were able to confirm that there is indeed an optimal window size for each combination of corrugation frequency and element density (Banks et al., 2004; Kanade & Okutomi, 1994).

*S*) are uniformly distributed in the world, the distribution of slants that stimulate a region in the retina is proportional to cos(*S*) defined from −90° to +90° (Arnold & Binford, 1980; Hillis, Watt, Landy, & Banks, 2004). As a consequence, it is quite uncommon for a given part of the retina to be stimulated by slants sufficiently large to exceed the disparity-gradient limit at distances of 40 cm and beyond. We conclude that the gradient limit is generally not problematic for everyday viewing of opaque surfaces. Gradients exceeding the disparity-gradient limit are much more likely when viewing transparent surfaces (Akerstrom & Todd, 1988).
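A rough calculation illustrates how rare such slants are under the cos(*S*) distribution. The sketch rests on two assumptions that are not from the text: the small-angle approximation that the disparity gradient of a slanted plane is about (*I*/*d*)·tan(*S*), where *I* is the interocular distance and *d* the viewing distance, and an interocular distance of 6.5 cm. The function names are hypothetical.

```python
import math

def prob_slant_exceeds(critical_slant_deg):
    """P(|S| > S_c) when the density of S is cos(S)/2 on [-90, 90] deg."""
    return 1.0 - math.sin(math.radians(critical_slant_deg))

def critical_slant_deg(distance_cm, gradient_limit=1.0, interocular_cm=6.5):
    """Slant at which the gradient reaches the limit, under the assumed
    approximation gradient ~ (interocular / distance) * tan(slant)."""
    return math.degrees(math.atan(gradient_limit * distance_cm / interocular_cm))

# Viewing distance of 40 cm and a disparity-gradient limit of 1.
s_c = critical_slant_deg(40.0)
p = prob_slant_exceeds(s_c)
```

With these assumptions, the critical slant at 40 cm is about 81° and only about 1% of the slants stimulating a retinal region exceed it; the proportion shrinks further at greater distances.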

∣**D**∣/∣**S**∣, where **D** is the vector representing the binocular disparity between the points and **S** is the vector representing the separation between them. As far as we know, all previous investigations of the disparity-gradient limit have only considered horizontal disparities. We have argued here that the disparity-gradient limit is caused by the decrease in local cross-correlation that occurs when the two eyes' images become too different, as happens with large disparity gradients. From this point of view, it should make little difference whether the images differ horizontally or vertically. We wondered, therefore, whether a similar disparity-gradient limit applies for vertical disparity. Figure 19 demonstrates that such a limit does exist for vertical disparity and that the critical value is about the same as it is for horizontal disparity.