Binocular stereo cues are important for discriminating 3D surface orientation, especially at near distances. We devised a single-interval task where observers discriminated the slant of a densely textured planar test surface relative to a textured planar surround reference surface. Although surfaces were rendered with correct perspective, the stimuli were designed so that the binocular cues dominated performance. Slant discrimination performance was measured as a function of the reference slant and the level of uncorrelated white noise added to the test-plane images in the left and right eyes. We compared human performance with an approximate ideal observer (planar matching [PM]) and two subideal observers. The PM observer uses the image in one eye and back projection to predict a test image in the other eye for all possible slants, tilts, and distances. The estimated slant, tilt, and distance are determined by the prediction that most closely matches the measured image in the other eye. The first subideal observer (local planar matching [LPM]) applies PM over local neighborhoods and then pools estimates across the test plane. The second subideal observer (local frontoparallel matching [LFM]) uses only location disparity. We find that the ideal observer (PM) and the first subideal observer (LPM) outperform the second subideal observer (LFM), demonstrating the additional benefit of pattern disparities. We also find that all three model observers can account for human performance, if two free parameters are included: a fixed small level of internal estimation noise, and a fixed overall efficiency scalar on slant discriminability.

*d*′ values) are scaled by an arbitrary efficiency parameter. However, if we include another plausible factor, a fixed level of internal estimation noise, then all three models make good quantitative predictions. Although the LPM observer does not predict the pattern of human thresholds significantly better than the LFM observer, its absolute performance is substantially better and more robust across analysis patch size (e.g., receptive-field size), and thus there may have been evolutionary pressure to incorporate similar structural-disparity computations into the early visual system. We also measured depth discrimination, in addition to slant discrimination, with the same stimuli and found that there was a trend for human observers to be more efficient (relative to ideal) at slant discrimination than at depth discrimination.

All experiments and analyses were done using custom code written in MATLAB using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997).

where Δ*s* is the test slant minus the reference slant, σ is the standard-deviation parameter, and β is the bias parameter. The bias parameter corresponds to the 50% point of the psychometric function, and the value of the standard deviation was defined to be the threshold. Note that the discriminability, *d*′, equals \({{\Delta s} / \sigma }\). We define threshold to be the value of Δ*s* for which *d*′ = 1.
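For illustration, the psychometric function and threshold definition can be written directly (Python; the paper's analyses used MATLAB, and the function names here are ours):

```python
from math import erf, sqrt

def psychometric(delta_s, sigma, beta):
    """Cumulative-Gaussian psychometric function of the slant difference.

    delta_s : test slant minus reference slant (degrees)
    sigma   : standard-deviation parameter; defined to be the threshold
    beta    : bias parameter, the 50% point of the function
    """
    z = (delta_s - beta) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def dprime(delta_s, sigma):
    """Discriminability d' = delta_s / sigma; threshold is where d' = 1."""
    return delta_s / sigma
```

With no bias, a slant difference equal to σ gives *d*′ = 1 and a proportion of about 0.84 on the fitted function.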

where *R*_{t} and *L*_{t} are the images of the sample of 3D texture in the right and left eyes, and *R*_{n} and *L*_{n} are the independent samples of white noise added to the right and left images. Let *f*(*x*, *y*) be the optimal filtering kernel in the space domain; then the filtered right and left images are given by the following:

**θ** = (θ_{1}, ⋅⋅⋅, θ_{m}). For planar surfaces, there are three parameters: distance, slant, and tilt. In our experiment, there are only two parameters: distance and slant (the tilt was fixed at zero). The goal of the observer is to estimate the surface geometry from the left and right images. Even after filtering, the added noise remains statistically independent across the two images and is the dominant noise source. Thus, using Bayes' rule, the maximum posterior estimate of the surface geometry is given approximately by the following:

where *p*(**θ**) is the prior over surface geometry, σ_{f} is the standard deviation of the filtered samples of white noise, *n* is the number of pixels in the left image, **R**_{f} and **L**_{f} are the filtered right and left images represented in vector notation, and \({{{\bf \hat{L}}}_f}( {{{\bf \theta }},{{{\bf R}}_f}} )\) is the predicted left image given the right image and a specific surface geometry. The predicted left image is obtained by back projecting the right image to the 3D surface specified by the parameters and then forward projecting to the left eye. Note that one can also project from the left image to the right image (or both ways), but we have found that it makes little difference.
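As a concrete illustration of this matching computation, here is a minimal grid-search sketch in Python (the paper's code is MATLAB; `map_estimate` and `predict_left` are our names, and a toy horizontal-shift predictor stands in for the actual back-/forward-projection):

```python
import numpy as np

def map_estimate(L_f, R_f, thetas, predict_left, sigma_f, log_prior=None):
    """Grid-search MAP estimate of surface geometry.

    For each candidate geometry theta, predict the left-eye image from the
    right-eye image; with independent Gaussian noise on the filtered
    images, the log posterior is (up to a constant)
    log p(theta) - ||L_f - Lhat_f(theta, R_f)||^2 / (2 sigma_f^2).
    """
    best_theta, best_lp = None, -np.inf
    for theta in thetas:
        Lhat = predict_left(theta, R_f)
        lp = -np.sum((L_f - Lhat) ** 2) / (2.0 * sigma_f ** 2)
        if log_prior is not None:
            lp += log_prior(theta)  # with a uniform prior this term drops out
        if lp > best_lp:
            best_theta, best_lp = theta, lp
    return best_theta
```

With a uniform prior (as assumed in the experiments), the MAP estimate reduces to minimizing the squared prediction error over the parameter grid.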

where the slant θ_{1} = *s* and the intercept distance θ_{2} = ζ. Because the tilt was fixed in our experiments, we assumed that the tilt is known. The specific equations for the backward and forward projection are given in the Appendix, for the case where the planar surfaces can have arbitrary slant, tilt, and distance. For present purposes, we assumed a uniform prior over slant (±70 degrees) and over intercept distance (100 cm ± 1 cm), which covers the full range of possibilities in the main experiment.

*w*. The computations for each patch are basically the same as described above, except now Equation 7 is applied to local patches within the test region.

*s*_{i} and distance of the patch *z*_{i}, rather than the slant *s*_{i} and intercept distance ζ_{i} (see Figure 5). In other words, in Equation 7, we take **θ**_{i} = (*s*_{i}, *z*_{i}) rather than **θ**_{i} = (*s*_{i}, ζ_{i}). The reason is that the estimates of slant and intercept distance become more correlated the larger the horizontal distance (*x*_{i}) of the patch from the cyclopean axis (i.e., changes in the estimate of slant cause changes in the estimate of intercept distance). On the other hand, slant and distance are nearly statistically independent everywhere. The relationship between the intercept distance and distance of the patch is given by ζ_{i} = *z*_{i} − *x*_{i} tan *s*_{i}, which can be substituted into the equations in the Appendix to express the backward and forward projections in terms of slant and distance.

where *r*_{z} and *r*_{s} are the reliabilities of the two estimates. To measure the reliability of the slant estimates for each cue, we computed slant estimates (for many test patches) separately for each reference slant. From these estimates we then computed the bias in the slant estimates for each reference slant. Finally, the reliability of the estimates was determined from the standard deviation of a cumulative Gaussian fit to bias-corrected slant estimates, as a function of test-patch slant. The reliability of the slant estimates was taken to be one over the square of this standard deviation (i.e., the reciprocal variance).
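Given the reliabilities, the two estimates can be pooled by reliability weighting; a minimal sketch (Python for illustration; `combine` is our name):

```python
import numpy as np

def combine(estimates, reliabilities):
    """Reliability-weighted combination of independent estimates.

    Each reliability is the reciprocal variance (1 / sigma^2) of its
    estimate, so the weighted mean is the minimum-variance combination
    under independent Gaussian errors; the combined reliability is the
    sum of the individual reliabilities.
    """
    e = np.asarray(estimates, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    return float(np.sum(r * e) / np.sum(r)), float(np.sum(r))
```

For example, a slant estimate of 20 degrees with variance 4 (reliability 0.25) and one of 26 degrees with variance 1 (reliability 1.0) combine to 24.8 degrees with combined reliability 1.25.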

**θ** = (0, *z*_{i}), which for the geometry in Figures 3 and 5 is equivalent to horizontally translating each patch in the right eye to find the best match in the left eye to obtain an estimated disparity, and then computing the estimated distance given the separation between the eyes. The estimated slant and intercept distance of the test plane are then obtained by applying Equation 8 to the set of estimated distances. These computations are illustrated in the bottom row of Figure 4. Note that there are errors in the estimated local distances (visible as the lumpiness in the \({\hat{z}_i}\) map). These failures to perfectly solve the local correspondence problem occur because of the model's assumption that the local slant is zero.
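The translation-and-match step of the LFM observer can be sketched in one dimension (Python; `best_horizontal_shift` is our name, and the actual model matches filtered 2D patches):

```python
import numpy as np

def best_horizontal_shift(patch_R, strip_L):
    """Slide a right-eye patch along a left-eye strip and return the
    offset with the smallest squared error. The patch is assumed to be
    frontoparallel (local slant = 0), so only translation is searched."""
    n = len(patch_R)
    errs = [float(np.sum((strip_L[i:i + n] - patch_R) ** 2))
            for i in range(len(strip_L) - n + 1)]
    return int(np.argmin(errs))
```

The estimated distance then follows from the disparity and the separation between the eyes, and Equation 8 converts the set of local distances into an estimated slant and intercept distance.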

(i) a fixed level of internal estimation noise with standard deviation σ_{0} added to the slant estimates, and (ii) an overall efficiency scale factor ε that scales down all discriminability (*d*′) values. Specifically, the discriminability of the model observers with these two free parameters is given by the following:

where Δ*s* is the mean difference in estimated slant between the test and reference plane for a model observer with no free parameters, and σ is the standard deviation of these estimated slant differences. Note that the efficiency scalar could correspond to scaling down the numerator, scaling up the denominator, or some combination. Given that threshold is defined as a discriminability of 1.0, the slant-discrimination threshold of the model observers is given by the following:
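Under the definitions above, the internal noise adds in quadrature to the estimate variability and ε scales *d*′; a sketch of this reading (our function names, not the paper's verbatim equations):

```python
from math import hypot

def model_dprime(mean_slant_diff, sigma, sigma0, eff):
    """d' of a model observer with the two free parameters: internal
    estimation noise sigma0 adds in quadrature to the external estimate
    variability sigma, and the efficiency scalar eff scales d' down."""
    return eff * mean_slant_diff / hypot(sigma, sigma0)

def model_threshold(sigma, sigma0, eff):
    """Slant difference at which d' = 1 under the same two parameters."""
    return hypot(sigma, sigma0) / eff
```

By construction, evaluating the model *d*′ at the model threshold gives exactly 1, and lowering the efficiency scalar raises the threshold.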

*z* were varied. For more details about the model predictions see https://github.com/CanOluk/Stereo-Slant-Discrimination.

*w*. The value of this parameter was set so as to maximize the performance of the LFM model. The value of the patch-width parameter in the LPM model was identical to the patch width in the best-performing LFM observer. All three models outperform the best-performing human participant in the experiment (black symbols). As expected, the thresholds of the PM observer (blue symbols) are lowest in all conditions. The performance of the LPM model is similar to the LFM model; however, if the patch width used for the LPM observer is made larger, its performance improves, and (of course) asymptotes to the performance of the PM observer.

the patch width (*w*), estimation-noise standard deviation (σ_{0}), and overall efficiency scale factor (ε) were allowed to vary (only σ_{0} and ε were allowed to vary for the PM model). Note that although the fits were obtained by maximizing likelihood, we report the root-mean-squared error (RMSE) in the figure because it is more intuitive. The predicted rate of fall-off in the thresholds with reference slant is similar to that in the human observers. Note that the best-fitting values of σ_{0} are quite small, on the order of 1 degree to 1.5 degrees, and hence are in a plausible range. Surprisingly, the predictions are about equally good for the three models, so the data are not sufficient to differentiate between the models.

Specifically, humans are most precise at computing differences in the slant and distance of surfaces (which is why we used the stimuli illustrated in Figure 2). What the current models do not predict are changes in performance with increased separation in space and time between the test and reference. To make plausible predictions for such experiments would require including other factors, such as memory limitations, disparity contrast mechanisms, and reduced spatial resolution in the periphery. The importance of the proximity of the test and reference surfaces may explain the surprising observation that humans appear to be more efficient at slant discrimination than distance discrimination (see Figure 12). In the slant task, the depth information (relative to the reference plane) is concentrated near the edges of the test plane, whereas for depth discrimination the relative depth information is uniformly distributed across the test plane. If humans are better able to integrate information near the reference plane, then their efficiency (relative to the model observers) should be higher in the slant task.

*PLoS Computational Biology,*7(8), e1002142. [CrossRef]

*Vision Research,*39, 1143–1170. [CrossRef]

*Journal of the Optical Society of America A,*2, 1211–1216. [CrossRef]

*Journal of Neuroscience,*24, 2077–2089. [CrossRef]

*Vision Research,*10(11), 1181–1199. [CrossRef]

*The Journal of Physiology,*211(3), 599–622. [CrossRef]

*PLoS One,*8 (12), e82999. [CrossRef]

*Journal of Neuroscience*, 21(18), 7293–7302. [CrossRef] [PubMed]

*Current Opinion in Neurobiology,*18(4), 425–430. [CrossRef]

*Visual Neuroscience*, 18(6), 879–891. [CrossRef] [PubMed]

*Annual Review of Vision Science,*6: 491–517. [CrossRef]

*Journal of Vision,*14(2): 1. [CrossRef]

*Journal of Neuroscience,*30(22): 7714–7721. [CrossRef]

*PLoS Computational Biology,*13(2), e1005281. [CrossRef]

*Journal of Vision,*16(13), 1–25. [CrossRef]

*Vision Research,*33, 2189–2201. [CrossRef]

*Nature Communications,*11, 6390.

*Journal of Neurophysiology,*107, 3281–3295. [CrossRef]

*Data Fusion for Sensory Information Processing Systems*. New York, NY: Kluwer Academic Publishers.

*Supplement to the Journal of the Royal Statistical Society,*4, 102–118. [CrossRef]

*Vision Research,*31, 2195–2207. [CrossRef]

*Ophthalmology and Physiological Optics,*13, 3–7. [CrossRef]

*Annual Review of Neuroscience,*24, 203–238. [CrossRef]

*Nature,*352, 156–159. [CrossRef]

*Computing differential properties of 3-D shapes from stereoscopic images without 3-D models*(pp. 208–213). Sophia Antipolis, France: INRIA.

*Journal of Vision,*9(1): 8.1–18. [CrossRef]

*Vision Research,*11, 1299–1305.

*Vision Research,*51, 771–781.

*Journal of Vision,*9(13): 17.1–16.

*Perception & Psychophysics*36, 559–564. [PubMed]

*Journal of Vision,*9(9): 8.1–20.

*Frontiers in Psychology,*4, 1014.

*Signal Detection Theory and Psychophysics*. New York, NY: Wiley & Sons.

*Neural Computation,*21(9), 2581–2604.

*Vision Research,*36, 2263–2270.

*Spatial Vision*, 16(2), 183–207. [PubMed]

*Vision Research,*38(8), 1073–1084.

*Journal of Experimental Psychology: Human Perception and Performance,*28(2), 469.

*Journal of Vision,*4(12), 967–992.

*IEEE Conference on Computer Vision and Pattern Recognition,*pp. 1–8, https://doi.org/10.1109/CVPR.2007.383248.

*Perceiving in Depth Vol. 2: Stereoscopic Vision*. Oxford, UK: Oxford University Press.

*Journal of Vision,*17(12): 16.

*Proceedings of the European Conference on Computer Vision,*Lecture notes in Computer Science, vol. 588. Berlin, Heidelberg, Germany: Springer, 661–669.

*Philosophical Transactions of the Royal Society B,*371, 20150266.

*eLife,*7: e31148

*PLoS Computational Biology,*16 (6), e1007947.

*Vision Research*, 38, 1655–1682. [PubMed]

*Vision Research,*43(24), 2539–2558.

*International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition*(pp. 617–632). Berlin, Heidelberg, Germany: Springer.

*Proceedings of the Royal Society of London. Series B. Biological Sciences*, 204(1156), 301–328.

*Vision Research,*30, 1781–1791.

*Journal of Neuroscience,*24(9), 2065–2076.

*International Journal of Computer Vision*65(3), 147–162.

*Archives of Ophthalmology,*20(4), 604–623.

*Science,*249(4972), 1037–1041.

*Journal of Neurophysiology,*77, 2879–2909.

*Journal of Vision,*20(11), 578–578.

*Vision Research,*43, 2451–2468.

*Journal of Neurophysiology,*107, 1857–1867.

*Spatial Vision,*10, 437–442.

*Journal of Neurophysiology,*90, 946–960.

*Journal of Neurophysiology,*95(5), 2768–2786.

*International Journal of Computer Vision,*47(1-3), 7–42.

*Vision Research,*24, 533–542.

*Vision Research,*32, 1685–1694.

*Perception & Psychophysics,*33(3), 241–250.

*IEEE Transactions on Pattern Analysis And Machine Intelligence,*19, 247–252.

*Nature Neuroscience,*6(9), 989–995.

*Vision Research,*18, 101–105.

*Frontiers in Computational Neuroscience,*https://doi.org/10.3389/fncom.2012.00047.

*Perception,*28(9), 1121–1145.

*Journal of Vision,*16(5), 16.

*Nature Neuroscience,*5, 598–604.

*Philosophical Transactions of the Royal Society of London,*128, 371–394.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 13, 761–774.

*Vision Research,*16, 983–989.

*s* and a tilt of τ, for the imaging geometry illustrated in Figure 3. The equations are shown in Figure A2. These equations are for arbitrary slant, tilt, and distance, but for the current experiments the tilt was set to zero.

(*x*_{R}, *y*_{R}): \({\hat{I}_L}( {{{\hat{x}}_L},{{\hat{y}}_L}| {s,\zeta } } ) = {I_R}( {{x_R},{y_R}} )\). The estimates of distance and surface orientation are the values of distance, slant, and tilt that give the most accurate prediction of the left-eye image (smallest mean squared error).

*z* rather than intercept distance ζ (see Figure 5). The equations in Figure A2 can be expressed in terms of distance by setting ζ = *z* − *x* tan *s* cos τ + *y* tan *s* sin τ. Note that in the present experiment, where tilt is zero, ζ = *z* − *x* tan *s*, and that for the LFM model ζ = *z*, because of the assumption that *s* = 0.
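This substitution is a one-line function (Python for illustration; angles in radians):

```python
from math import tan, cos, sin

def intercept_distance(z, x, y, slant, tilt):
    """Intercept distance of the plane with the cyclopean axis, given the
    patch distance z at point (x, y):
        zeta = z - x tan(s) cos(tau) + y tan(s) sin(tau).
    With tilt = 0 this reduces to zeta = z - x tan(s); with slant = 0
    (the LFM assumption) it reduces to zeta = z."""
    return z - x * tan(slant) * cos(tilt) + y * tan(slant) * sin(tilt)
```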

where *z*_{f} is the distance from the nodal point to the image plane, and (*x*′, *y*′) is the point in the image plane. If the nodal point is shifted to the left by a distance of *a* (left-eye point in Figure 3), then the point in the image plane (*x*_{L}, *y*_{L}) is given by

(*x*, *y*, *z*) are defined here in global Euclidean coordinates as the slant, tilt, and intercept distance (*s*, τ, ζ) of the plane passing through that surface point (Figure 3), where the slant, tilt, and distance are with respect to the cyclopean axis. The intercept distance ζ is the intercept of the plane with the cyclopean axis. The slant *s* is defined as the magnitude of the angle (0–90 degrees) between the surface normal and the cyclopean (or optic) axis, and the tilt τ is defined as the direction (−180 degrees to +180 degrees) around that axis in which distance is changing most rapidly (the counter-clockwise angle of the parallel projection of the surface normal vector; see Figure 1). Using these definitions, the equation of the plane is

**n** · (**x** − **x**_{p}) = 0, where **n** is the normal vector and the point on the plane is taken to be **x**_{p} = **x**_{ζ} = (0, 0, ζ). The normal vector is obtained by rotating the unit normal vector of the frontoparallel plane, (0, 0, −1), around the vertical (*y*) axis by angle *s*, and then rotating the resulting vector around the distance (*z*) axis by angle τ. Substituting the rotated normal vector into the equation for a plane gives Equation A7.
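The two-rotation construction of the normal can be sketched as follows (Python/NumPy; the rotation signs below are our choice, picked to be consistent with the relation ζ = *z* − *x* tan *s* cos τ + *y* tan *s* sin τ above, and Figure A2's conventions may differ in sign):

```python
import numpy as np

def plane_normal(slant, tilt):
    """Normal of the slanted plane: rotate the frontoparallel normal
    (0, 0, -1) about the vertical (y) axis by the slant, then about the
    distance (z) axis by the tilt. The rotation signs are chosen so that
    the plane n . (p - (0, 0, zeta)) = 0 satisfies
    zeta = z - x tan(s) cos(tau) + y tan(s) sin(tau)."""
    s, t = slant, tilt
    Ry = np.array([[np.cos(s), 0.0, -np.sin(s)],
                   [0.0,       1.0,  0.0],
                   [np.sin(s), 0.0,  np.cos(s)]])
    Rz = np.array([[ np.cos(t), np.sin(t), 0.0],
                   [-np.sin(t), np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])
    return Rz @ Ry @ np.array([0.0, 0.0, -1.0])
```

A quick check that a point satisfying the ζ relation lies on the plane confirms the sign choice.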

*a* from both sides of Equation A8 and then taking the ratio with Equation A9, we get the equation for *x* in Figure A2, and likewise the equation for *y* in Figure A2: