In natural settings, monocular information about depth is very imprecise. Our results show that, even for objects in rich local surroundings, monocular depth thresholds are as much as a log unit higher than binocular depth thresholds. We argue that, for static viewing, this imprecision follows from the viewing geometry; monocular information about relative depth along the z-axis depends on the projected distance separating features along the x-axis. To produce a detectable change in depth monocularly, the associated change in the x-axis projection has to reach threshold levels (see Figure 5). Thresholds for lateral separation are 1–2%. To produce a 1–2% increment in the x-axis projection, the change in the viewing distance to the test object has to be roughly 1–2%. Our monocular depth thresholds in the austere setting correspond to the 1–2% change in the z-axis distance needed to produce the 1–2% change in the x-axis projection.
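The geometric argument can be made concrete with a short numerical sketch. It assumes a feature pair centered on the line of sight, so that the projected angle is approximately proportional to the lateral separation divided by the viewing distance. The 100-cm viewing distance and 5-cm separation are illustrative values rather than dimensions taken from the experiment, and the function names are hypothetical.

```python
import math


def projected_angle_deg(separation_cm: float, distance_cm: float) -> float:
    """Visual angle (deg) subtended by a lateral separation centered on the line of sight."""
    return math.degrees(2.0 * math.atan(separation_cm / (2.0 * distance_cm)))


def depth_change_for_weber(distance_cm: float, weber_fraction: float) -> float:
    """Decrease in viewing distance (cm) that enlarges the projected angle by weber_fraction.

    Small-angle assumption: angle is proportional to separation / distance, so
    angle_new / angle_old = distance_old / distance_new = 1 + weber_fraction.
    """
    return distance_cm * weber_fraction / (1.0 + weber_fraction)


if __name__ == "__main__":
    distance = 100.0   # illustrative viewing distance (cm), an assumption
    separation = 5.0   # illustrative lateral separation (cm), an assumption
    for weber in (0.01, 0.02):
        dz = depth_change_for_weber(distance, weber)
        before = projected_angle_deg(separation, distance)
        after = projected_angle_deg(separation, distance - dz)
        print(f"Weber {weber:.0%}: move {dz:.2f} cm closer "
              f"({dz / distance:.2%} of distance), angle ratio {after / before:.4f}")
```

Under these assumptions, a 2% criterion at 100 cm corresponds to a change of roughly 2 cm in viewing distance, and in the small-angle case the result does not depend on the lateral separation chosen.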
In our enriched setting, the textured paper provided many marks that served as additional reference points. Thresholds in the enriched setting were therefore somewhat lower than thresholds for the austere setting. Would monocular thresholds be even lower if the rods were superimposed directly on the textured surface, or better yet, superimposed on a ruler with demarcations specifying numbered units? If an observer were estimating the position of a test rod positioned on a ruler lying on the z-axis, then determining where the rod fell on the ruler, e.g., where exactly the rod was sitting between the 5- and 6-cm marks, would still be imprecise for the same geometrical reasons described above. Of course, the optimum strategy for the monocular observer is simple. Walk to one side of the display, so that the z-axis is directly converted into an x-axis. Then, reading the position from a ruler is limited by the exquisite human sensitivity for lateral separation. In fact, in any natural setting, the optimum strategy for utilizing monocular cues is to view the depth relationships off the line of sight, so that the z-axis is converted into an x-axis judgment. This strategy obviously will not work if the objects are very far away; it also takes time. Fine stereopsis provides a rapid, precise assessment of depth differences along the line of sight without any need to change position.
In the Introduction section, we noted that thresholds are usually limited by internal sources of noise. From the poor thresholds, one might guess that all monocular processing is inherently noisier than binocular processing. This conclusion is incorrect. The Weber fraction for width is about 1–2%, much better than the Weber fraction for disparity, which is 5–6%. If the monocular noise is so low, then why are the monocular thresholds so bad? Keep in mind that we are not measuring monocular thresholds for incremental changes along the x-axis, i.e., width. Instead, we are measuring the ability to discriminate changes along the z-axis from the information in the monocular image. Changes along the z-axis necessarily produce angular changes in x-axis dimensions (in the projected lateral separation between the rods or in the angular subtense of the rods), but as noted above, these changes are remarkably small. In short, monocular thresholds for real objects are largely limited by the lack of physical information, rather than by internal noise.