Abstract
Current understanding of stereopsis emphasises the detection of matching features between the eyes (i.e., 'solving the correspondence problem') so that the depth of objects in the world can be triangulated and 'false matches' discarded. While this seems intuitive, binocular images naturally give rise to multiple mismatched features, and many V1 neurons appear optimised for binocularly incongruent stimuli. Here we propose an alternative approach, based on optimal information encoding, that mixes disparity detection with proscription: actively ruling out alternative interpretations by exploiting dissimilar features. We developed a psychophysical demonstration in which participants discriminated a step edge depicted in random dot stereograms (RDS). We quantified the masking effect of adding anticorrelated dots (aRDS: e.g., a bright dot in one eye paired with a dark dot in the other) to an edge depicted using correlated stimuli (cRDS: e.g., bright dots match bright dots). While V1 neurons encode disparities in aRDS (Cumming & Parker, 1997), these stimuli are traditionally understood to stimulate 'false matches' that are discarded by the brain. Participants judged which side of the step was closer to them. We measured thresholds by varying the relative proportion of correlated vs. anticorrelated dots in the display. Our critical manipulation changed the disparity configuration depicted by the anticorrelated dots: unbeknownst to the viewer, this was either the same as or opposite to that specified by the cRDS. Masking was much stronger when correlated and anticorrelated dots specified the 'same' disparity configuration. This is expected from proscription: anticorrelation drives suppression of the encoded disparity, thereby making the correlated depth harder to see. Control measurements ruled out explanations based on residual perceptual sensitivity to aRDS disparity per se. We capture these findings in a Binocular Likelihood Model that provides a principled means of translating between disparity detection and proscription when estimating the most likely depth structure of a viewed scene.
Meeting abstract presented at VSS 2017
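To make the critical stimulus manipulation concrete, the sketch below (Python; not the authors' code) generates dot positions for a stereogram depicting a disparity step edge in which a chosen proportion of dots is anticorrelated (contrast-inverted between the eyes) and those dots carry either the 'same' or the 'opposite' disparity configuration to the correlated dots. The function name make_rds_pair and all parameter values (dot count, frame size, disparity magnitude) are illustrative assumptions, not the published stimulus parameters.

# Illustrative sketch of the manipulation described in the abstract:
# mixing correlated and anticorrelated dots in a step-edge RDS, with the
# anticorrelated dots assigned either the same or the opposite disparity
# configuration. All names and values here are assumptions for illustration.

import numpy as np

def make_rds_pair(n_dots=500, frame_size=200, disparity_px=4,
                  prop_anticorrelated=0.3, anti_config='same', seed=0):
    """Return (left, right) dot arrays; columns are x, y, contrast polarity."""
    rng = np.random.default_rng(seed)

    # Random dot positions and random bright/dark polarity for the left eye.
    xy = rng.uniform(0, frame_size, size=(n_dots, 2))
    polarity = rng.choice([-1.0, 1.0], size=n_dots)           # dark vs. bright

    # Step edge: dots on the left half of the frame are shifted one way,
    # dots on the right half the other way, so observers can judge which
    # side of the step appears nearer.
    side = np.where(xy[:, 0] < frame_size / 2, -1.0, 1.0)
    corr_shift = side * disparity_px / 2.0

    # Choose which dots are anticorrelated (contrast-inverted in the right eye).
    is_anti = rng.random(n_dots) < prop_anticorrelated

    # Critical manipulation: anticorrelated dots depict either the same
    # disparity configuration as the correlated dots, or the opposite one.
    anti_sign = 1.0 if anti_config == 'same' else -1.0
    shift = np.where(is_anti, anti_sign * corr_shift, corr_shift)

    left = np.column_stack([xy[:, 0], xy[:, 1], polarity])
    right_polarity = np.where(is_anti, -polarity, polarity)   # invert contrast
    right = np.column_stack([xy[:, 0] + shift, xy[:, 1], right_polarity])
    return left, right

# Example: a display with 30% anticorrelated dots in the 'opposite' configuration.
left_eye, right_eye = make_rds_pair(prop_anticorrelated=0.3, anti_config='opposite')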