There are many situations in which retinal information by itself is insufficient to specify the layout of objects in 3D space. For example, the perception of 3D shape from horizontal binocular disparity requires an estimate of viewing distance. This estimate is needed because depth (Z) is related to disparity (δ) by the following approximate equation (Howard & Rogers, 2002):

$$\delta \approx \frac{IZ}{D^{2}}$$

where I is the interocular separation and D the viewing distance. Notice that if D is changed (e.g., by moving away from an object), the same depth, Z, will be specified by a different binocular disparity. How does the visual system obtain an estimate of viewing distance that can allow it to scale disparity for depth and shape perception? The vergence state of the eyes is potentially useful, as there is a direct relationship between vergence angle and object distance. If observers had knowledge of the vergence posture of their eyes, estimates of object distance could be made from 10 cm to 6 m (Foley, 1980). Some evidence in the literature has cast doubt on the utility of vergence information as a cue to distance (Erkelens & Collewijn, 1985; Collewijn & Erkelens, 1990). However, extra-retinal cues to distance are no noisier than extra-retinal cues to visual direction (Brenner & Smeets, 2000), extra-retinal cues appear to contribute to distance perception (Brenner, van den Berg & van Damme, 1996), and some evidence suggests that extra-retinal cues can support reliable judgments of distance (Mon-Williams & Tresilian, 1999).
The two tasks used in this study differed in whether their successful completion required subjects to estimate viewing distance. Our first task was devised to require an estimate of viewing distance. This DEPTH task involved observers making a judgment about the shape of three patches positioned in depth to form an isosceles triangle (see Figure 1). The apex of the triangle was closer to the observers than the base, and observers were required to indicate whether the base of the triangle was longer than its height (apex-to-base distance). (This task was modeled on the Apparently Circular Cylinder task and its variants used elsewhere to investigate the use of depth cues: Johnston, 1991; Glennerster, Rogers & Bradshaw, 1996; Bradshaw, Parton & Glennerster, 2000.) This task requires that observers make judgments about the metric depth structure of the information presented to them.
There are potentially three ways in which observers could exploit vergence eye movements to obtain information about the distance of objects in a scene. First, Enright (1996) noted that observers judging the relative distances of objects tend to look at them in turn. One way to obtain information about the layout of objects in a scene would be to lock the vergence state of the eyes across a saccade, so that when the eyes landed on a target at a different distance, the object would fall at disparate points on the two retinas. The difference between the retinal position of the object following the saccade (its absolute disparity with respect to the fovea) and its position before the eye movement could be used to measure disparities (Enright, 1991). This could be advantageous because very large disparities between objects far apart in both depth and lateral distance could be registered with respect to the fovea rather than being measured as much more peripheral relative disparities. It would also be useful if the extra-retinal vergence information were not accurate or precise: it could be easier to monitor maintained vergence than to measure shifts of vergence. Under this scheme, observers would be expected to make isovergent saccades when making distance judgments.
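The geometry behind this scheme can be sketched under the small-angle approximation, for two targets near the median plane at distances D₁ and D₂. Fixating each target requires a convergence angle of approximately I/D₁ and I/D₂, respectively. If vergence remains locked at I/D₁ across the saccade, the second target lands with an absolute (foveal) disparity of approximately

$$\delta_{\text{abs}} \approx I\left(\frac{1}{D_{1}}-\frac{1}{D_{2}}\right),$$

which is the same quantity as the relative disparity between the two targets; holding vergence constant across the saccade therefore transfers their relative disparity onto the fovea.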
Second, Foley (1980, 1985) suggested that the visual system uses a single estimate of distance, a notional reference point, relative to which differences in binocular disparity are scaled to obtain estimates of depth. It is not known whether observers specifically fixate on their chosen reference point. If observers were to use such a scheme they might be expected to fixate that reference point, obtain an estimate of the distance of the point, perhaps through sustained vergence, and then make judgments about the position of other objects with respect to this fixated point. It would not be necessary to move the eyes again.
Third, observers often combine changes of version with changes of vergence (e.g., Ono, Nakamizo & Steinbach, 1978; Enright, 1984, 1986, 1992; Collewijn, Erkelens & Steinman, 1988a, 1988b). Observers may be sensitive to changes in ocular convergence across saccades (Wright, 1951; Brenner & van Damme, 1998). If this is the case, observers making judgments requiring distance scaling could simply make saccades that coordinate changes of version and vergence and register the changing vergence component. Under this scheme, observers making distance judgments would be predicted to make regular saccades that combine vergence changes with shifts in version.
The second task we employed was one that would not benefit from accurate vergence changes. In the RATIO task, observers examined three rectangular patches and determined which of them had an aspect ratio different from the other two (see Figure 1). This task can be performed on the basis of retinal information alone: observers could simply compare the retinal width of a patch to its retinal height. In this task, the distance of each rectangle from the observer is functionally irrelevant. To be precise, although the physical aspect ratios of the targets might be equal, on the retina they would be slightly distorted by perspective. We calculated the perspective distortion of the aspect ratios for our task configuration and found it to be very small (maximum distortion 0.8%). Such distortions would be indiscriminable, as they are well below aspect ratio discrimination thresholds of around 3–5% (Regan & Hamstra, 1994). We are therefore confident that all the useful information for performing this task is specified on the retina. For this task, we expected observers to fixate each patch in turn to examine its aspect ratio with the fovea. Control of vergence state across saccades would need to be sufficient only to prevent diplopia.
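A calculation of this kind can be sketched as follows; the patch size, viewing distance, and lateral offsets below are nominal values chosen for illustration rather than the actual stimulus configuration, so the numbers will not reproduce the 0.8% figure exactly.

```python
import math

def angular_aspect_ratio(width_m, height_m, distance_m, lateral_offset_m):
    """Ratio of angular width to angular height for a frontoparallel
    rectangle whose centre sits at a lateral offset from the line of
    sight (single cyclopean viewpoint, straight-ahead gaze)."""
    half_w, half_h = width_m / 2.0, height_m / 2.0
    # Horizontal angular extent: difference between the azimuths of the
    # left and right edges of the rectangle.
    ang_w = (math.atan2(lateral_offset_m + half_w, distance_m)
             - math.atan2(lateral_offset_m - half_w, distance_m))
    # Vertical angular extent at the rectangle's centre: the effective
    # distance to an eccentric patch is the slant range sqrt(D^2 + x^2).
    slant = math.hypot(distance_m, lateral_offset_m)
    ang_h = 2.0 * math.atan2(half_h, slant)
    return ang_w / ang_h

# Nominal geometry: 4 x 4 cm square patches viewed at 50 cm, centred at
# lateral offsets of 0, 5, and 10 cm (illustrative values only).
for offset in (0.00, 0.05, 0.10):
    retinal_ratio = angular_aspect_ratio(0.04, 0.04, 0.50, offset)
    distortion = 100.0 * abs(retinal_ratio - 1.0)  # physical ratio is 1.0
    print(f"offset {offset:.2f} m: retinal aspect ratio {retinal_ratio:.4f}"
          f" ({distortion:.2f}% distortion)")
```

The distortion grows with the eccentricity of the patch but remains a fraction of a percent for geometries of roughly this scale, consistent with it lying well below discrimination threshold.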
Using the two tasks described above, the aims of the study were, first, to characterize the eye movement strategies produced by naïve observers when performing a task requiring judgments about shape that involve the scaling of disparities; second, to compare these measurements with those obtained when subjects performed a task that did not require accurate binocular eye movements, to determine whether there were any task-specific effects (cf. Epelboim et al., 1997; Malinov et al., 2000); and, finally, to confirm that the eye movement dynamics measured when observers perform a task unrelated to eye movement are similar to those measured when subjects are given specific instructions about how to move their eyes.