Abstract
Many animals, including humans, have substantial binocular overlap within their visual field. In the binocular zone, each eye's viewpoint yields a slightly different image of the same part of the scene. Binocular disparity – the local differences between the images – is a powerful cue for estimating the depth structure of the scene. But before disparity can be used to estimate depth, it must itself be estimated from the images. Psychophysical, neurophysiological, and computational studies have discovered many of the computational principles, cellular mechanisms, and behavioral limits of disparity estimation. However, methods for optimally estimating disparity in natural stereo-images, given a vision system's constraints, remain to be determined. Here, we describe a principled procedure for determining how to optimally estimate disparity given a set of natural stereo-images, an inter-ocular separation, a wave-optics model of each eye, and two photosensor arrays. First, we randomly selected a large set of patches from well-focused natural stereo-images; all had disparities within Panum's fusional range (±30 arcmin). Next, we passed the patches through each eye's optics. Then, we removed image detail predicted to be undetectable by the human retinal contrast detection threshold. Finally, we used a task-focused Bayesian statistical learning method to discover the spatial filters that are optimal for estimating disparity in natural stereo-images. We found the filters to be spatial-frequency bandpass, with characteristics similar to those of disparity-sensitive receptive fields in early visual cortex. We used the filters to obtain unbiased, high-precision estimates of disparity in natural stereo-image patches 0.5 deg or smaller. The optimal filters and estimation performance provide rigorous benchmarks against which existing behavioral, neurophysiological, and computational results can be evaluated.
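As a concrete illustration of the final step, the following is a minimal sketch (not the authors' implementation) of task-focused Bayesian decoding: given a set of linear filters and labeled stereo-image patch vectors, disparity is estimated as the posterior mean over a discrete set of disparity levels. The Gaussian model of filter responses, the function names, and all parameters here are illustrative assumptions.

```python
# Minimal sketch of Bayesian disparity decoding from linear filter responses.
# Assumptions (not from the source): responses to patches at each disparity
# level are modeled as multivariate Gaussian; levels form a discrete grid.
import numpy as np

def fit_response_statistics(filters, patches, labels, levels):
    """Mean and covariance of filter responses at each disparity level.

    filters: (n_filters, n_dims), patches: (n_patches, n_dims),
    labels: (n_patches,) disparity of each patch, levels: grid of disparities.
    """
    R = patches @ filters.T                      # (n_patches, n_filters)
    stats = {}
    for d in levels:
        Rd = R[labels == d]                      # responses at this disparity
        stats[d] = (Rd.mean(axis=0), np.cov(Rd, rowvar=False))
    return stats

def estimate_disparity(filters, patch, stats, levels, prior=None):
    """Posterior-mean disparity estimate for one patch via Bayes' rule."""
    levels = np.asarray(levels, dtype=float)
    r = filters @ patch                          # filter responses (n_filters,)
    if prior is None:
        prior = np.full(len(levels), 1.0 / len(levels))
    log_post = np.empty(len(levels))
    for i, d in enumerate(levels):
        mu, C = stats[d]
        diff = r - mu
        _, logdet = np.linalg.slogdet(C)
        # Gaussian log-likelihood up to a constant shared across levels
        log_post[i] = (-0.5 * (diff @ np.linalg.solve(C, diff) + logdet)
                       + np.log(prior[i]))
    post = np.exp(log_post - log_post.max())     # normalize stably
    post /= post.sum()
    return float(np.dot(levels, post))           # posterior mean
```

In the full procedure, the filters themselves would be learned by maximizing the accuracy of such estimates over the labeled training set; this sketch covers only the read-out given a fixed filter set.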