Abstract
The visual system is conventionally assumed to be optimized for recovering the ‘veridical’ structure of the external 3D world. A major paradox for this view is the perception of pictorial depth: pictures generate a robust impression of depth and 3D structure that is contrary to physical reality. This paradox is often overlooked because it is conventionally assumed that pictorial depth is the statistically optimal outcome of a computation involving conflicting, independent depth signals. Surprisingly, this assumption has never been tested. Here we examined whether the perception of depth in pictorial images can be explained by any of the variants of statistically optimal cue combination (maximum likelihood estimation (MLE), statistical robustness, cue promotion, etc.). Methods: Subjects viewed simple pictures consisting of a textured elliptical hemi-cylinder and judged the perceived curvature-in-depth. We measured both the magnitude of perceived depth (points of subjective equality, PSEs) and the thresholds (just-noticeable differences, JNDs) for depth discrimination under binocular and monocular viewing for different base curvatures. We also measured depth discrimination thresholds for disparity specifying a flat surface. Results: As predicted, base curvature affected depth magnitude judgments, but surprisingly there was no difference between monocular and binocular viewing at any base curvature. Disparity thresholds were an order of magnitude lower than those for texture. These results are contrary to MLE, which predicts that disparity should receive a much higher weight than texture, yielding smaller depth magnitudes under binocular viewing (especially at low curvatures). Statistical robustness also predicts little or no depth under binocular viewing, since disparity (and not texture) is consistent with the other potentially available cues, all of which specify a flat surface (visible surface microtexture, sequential vergence, defocus blur, motion parallax from small movements). Invoking other popular statistical free parameters such as the "flatness prior" makes the situation worse. We discuss the results in the context of alternative theories of depth cue combination based on signal-to-noise ratios.
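The MLE prediction referred to above can be illustrated with a minimal sketch of the standard reliability-weighted combination rule, in which each independent cue is weighted by its inverse variance. The sketch assumes that discrimination thresholds are proportional to the cue noise (standard deviation). All numerical values and function names below are hypothetical placeholders chosen only to reflect a tenfold threshold difference; they are not the measured data.

```python
def mle_weights(sigmas):
    """Normalized inverse-variance weights for independent cues."""
    reliabilities = [1.0 / (s ** 2) for s in sigmas]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]


def mle_combine(estimates, sigmas):
    """Reliability-weighted average of the single-cue estimates."""
    weights = mle_weights(sigmas)
    return sum(w * e for w, e in zip(weights, estimates))


def combined_sigma(sigmas):
    """Predicted noise of the combined estimate (never above the best cue)."""
    return (1.0 / sum(1.0 / (s ** 2) for s in sigmas)) ** 0.5


# Hypothetical values (assumptions, not data from the study):
# the disparity threshold is ten times lower than the texture threshold.
sigma_disparity = 1.0
sigma_texture = 10.0

# Disparity signals a flat picture surface; texture signals a curved
# surface. Depth is expressed in arbitrary normalized units.
depth_disparity = 0.0
depth_texture = 1.0

w_disparity, w_texture = mle_weights([sigma_disparity, sigma_texture])
predicted_depth = mle_combine(
    [depth_disparity, depth_texture],
    [sigma_disparity, sigma_texture],
)

print(f"disparity weight: {w_disparity:.3f}")
print(f"texture weight:   {w_texture:.3f}")
print(f"predicted binocular depth: {predicted_depth:.3f}")
```

Under these assumptions the disparity weight is close to 1, so the combined estimate is close to the flat surface signalled by disparity. This is the sense in which MLE predicts much smaller depth magnitudes under binocular than under monocular viewing.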
Meeting abstract presented at VSS 2013