Abstract
How does the visual system derive estimates of 3D shape properties? Typically, models assume that sources of depth information such as texture gradients and binocular disparities are processed independently to produce a set of unbiased estimates sharing a common spatial metric. Under this assumption, the independent estimates can be optimally combined into a unified, accurate depth representation via Maximum Likelihood Estimation (MLE). However, this approach tacitly assumes that the visual system has learnt a veridical mapping between the individual image signals and the 3D properties they encode. An alternative approach, termed the Intrinsic Constraint (IC) model, does not require this assumption and instead treats the raw image signals as vector components of a multidimensional signal. Critically, as long as the raw signals are proportional to physical depth, the vector magnitude will be as well. Assuming a fixed but generally non-veridical scaling of the image signals (a normalization step), the IC model directly maps, for any mixture of signals, (1) a vector magnitude to a perceived depth and (2) a fixed change in vector magnitude to a Just-Noticeable Difference (JND). We tested these predictions in two related studies. In the first, participants adjusted a 2D probe to indicate the perceived depth of stimuli with simulated depths defined by disparity only, texture only, or disparity and texture together. In the second, participants adjusted the depths of cue-conflict stimuli with five fixed conflict ratios (texture depth relative to disparity depth: 0, 0.41, 1, 2.41, ∞) until all appeared equally deep. In both studies, JNDs were then measured for personalized sets of stimuli that evoked the same perceived depth. The results were incompatible with the assumption of accurate depth perception that is central to the MLE approach, whereas the IC model predicted the data of each task well without free parameters.
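
For concreteness, the contrast between the two frameworks can be sketched as follows; the notation is illustrative rather than taken from the main text (D denotes simulated depth, the subscripts S and T index the stereo and texture signals, and k_S, k_T are fixed signal gains). Under the MLE scheme, unbiased single-cue depth estimates are averaged with reliability-based weights,

\[
\hat{D}_{\mathrm{MLE}} = w_S \hat{D}_S + w_T \hat{D}_T,
\qquad
w_i = \frac{1/\sigma_i^2}{1/\sigma_S^2 + 1/\sigma_T^2},
\]

whereas the IC model combines the normalized raw signals into a vector whose magnitude carries the depth estimate,

\[
\bar{x} = \sqrt{x_S^2 + x_T^2},
\qquad
x_S = k_S D,\; x_T = k_T D
\;\;\Rightarrow\;\;
\bar{x} = D\sqrt{k_S^2 + k_T^2},
\]

so that perceived depth grows with \(\bar{x}\) and a JND corresponds to a fixed increment \(\Delta\bar{x}\), irrespective of which cues contribute.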