Abstract
Many visual tasks rely on multiple sources of sensory information to estimate environmental properties. In this work, we present a model of how the visual system combines disparity and velocity information. We propose that, in a first stage of processing, the best possible estimate of the affine structure is obtained by computing a composite score from the disparity and velocity signals. In a second stage, a maximum-likelihood Euclidean interpretation is assigned to the recovered affine structure. Observers were asked to match the perceived amount of depth of two smooth 3D surfaces: one defined by disparity or motion alone, the other presenting both cues together. Condition 1: In each 2AFC trial, the combined-cue stimulus simulated a constant amount of depth, while three separate staircases varied the simulated depth of (i) a disparity-velocity stimulus (with consistent simulated depth magnitudes from each cue), (ii) a disparity-only stimulus, and (iii) a velocity-only stimulus. Condition 2: Three separate staircases varied the simulated depth of the combined disparity-velocity stimulus, while holding constant the simulated depth of (i) a disparity-velocity stimulus (with consistent simulated depth magnitudes from each cue), (ii) a disparity-only stimulus, and (iii) a velocity-only stimulus. Our results are consistent with the predictions of our model (Domini, Caudek, & Tassinari, in press), both for the PSEs and for the variability of observers' judgments. The present findings are also discussed within the framework of another theoretical approach to depth-cue combination, termed Modified Weak Fusion (MWF).
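The two combination schemes contrasted in the abstract can be caricatured in a few lines of code. The formulas below are illustrative assumptions only, not the published models: the composite score is sketched here as the norm of noise-normalized disparity and velocity signals, and the MWF-style estimate as an inverse-variance weighted average of single-cue depth estimates; the noise parameters `sigma_d` and `sigma_v` are hypothetical.

```python
import math

def composite_score(disparity, velocity, sigma_d=1.0, sigma_v=1.0):
    # First-stage sketch (assumption, not the authors' formula):
    # combine the two noise-normalized signals into a single
    # composite magnitude from which affine structure is estimated.
    return math.hypot(disparity / sigma_d, velocity / sigma_v)

def mwf_depth(z_disparity, z_velocity, sigma_d=1.0, sigma_v=1.0):
    # MWF-style sketch: average the single-cue depth estimates,
    # each weighted by its reliability (inverse variance).
    w_d = 1.0 / sigma_d ** 2
    w_v = 1.0 / sigma_v ** 2
    return (w_d * z_disparity + w_v * z_velocity) / (w_d + w_v)
```

In this caricature, the composite-score account first fuses the raw signals and only then assigns a metric (Euclidean) interpretation, whereas the MWF-style account combines depth estimates that each cue has already produced on its own.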
This work was supported by NSF grant BCS 0345763.