Our perception of depth in a three-dimensional world relies on the visual system’s interpretation of the information from our two-dimensional retinal surface. While many different depth cues have been enumerated, binocular stereopsis and motion parallax are arguably the most important. Binocular stereopsis uses the slight differences between the images falling upon the two retinas, known as binocular disparity (BD), to recover depth information. The stimulus conditions for motion parallax (MP) are created when an observer translates while viewing a rigid environment. While the observer’s fixation is automatically maintained on a specific point, objects nearer or farther than the fixation point move relative to each other on the observer’s retina. The visual system uses this relative movement of objects on the retina, motion parallax, as a cue to the relative depth of these objects in the environment. Observer movements may be abrupt lateral head translations or more sustained observer translations such as those generated when looking out the side window of a vehicle, a stimulus condition originally called motion perspective (Gibson, 1950).
In contrast to binocular stereopsis, surprisingly little is known about the processing mechanisms essential for MP. The role of head movement has been assumed to be of central importance (Steinbach, Ono & Wolf, 1991). Most recently, MP sensitivity has been quantified with regard to observer head translation velocity (Ujike & Ono, 2001), suggesting a primary role of head movement in the perception of depth from MP. However, there remains disagreement on whether head movement provides a required extra-retinal signal for the perception of depth from MP (Braunstein & Tittle, 1988; Rogers & Rogers, 1992).
In their original work demonstrating the importance of motion parallax as an independent depth cue, Rogers and Graham (1979) pioneered an experimental paradigm wherein shearing movement within a random-dot display was linked to translations of the observer’s head parallel to the interaural axis. To an observer making a translational head movement, the stimulus appears to be a stationary corrugated surface with peaks extending out from the computer monitor and valleys extending back into the monitor. When head movements and stimulus shearing motion both stop, no depth is perceived. Rogers and Graham (1979) also reported that the perception of depth was just as compelling, and unambiguous, with a fixed head when stimulus shearing movement was yoked to translation of the display monitor. Therefore, observer head movement does not appear to be a necessary condition for MP.
Recently, Nawrot (2003) proposed that the slow eye movement system provides the extra-retinal signal required for the unambiguous perception of depth from MP. This proposal recognizes that all the stimulus conditions creating MP share a single common demand: the observer’s eyes must move to maintain fixation on the stimulus. Using the Ono and Ujike (1994) motion aftereffect paradigm, Nawrot (2003) dissociated the roles of head movements, vestibularly driven eye movements (specifically the translational vestibulo-ocular response, or TVOR), and visually driven eye movements that will here be referred to as the optokinetic response (OKR). These visually driven eye movements could also be considered smooth pursuit, or the early, direct phase of optokinetic nystagmus (OKNe) (see Miles & Busettini, 1992, for a review). Although these terms describe eye movements in response to slightly different stimulus conditions, the movements all share functional and physiological similarities, and further study is undoubtedly required to understand their similarities or differences with respect to MP (e.g., Post & Leibowitz, 1985). Regardless of the specific terminology, Nawrot (2003) showed that these OKR eye movements provide the extra-retinal signal required for the perception of unambiguous depth from MP.
One important problem in understanding the perception of depth from motion parallax is understanding how MP scales with depth, otherwise known as depth constancy (see Howard & Rogers, 2002, Chapter 26, for a review). Similar to the perception of depth from binocular disparity, the perception of depth from MP appears to scale with viewing distance, or more specifically with apparent distance (Rivest et al., 1989). However, this scaling is quite imperfect in typical laboratory conditions, leading Ono et al. (1986) to ask: “Why does the effectiveness of parallax decrease as a function of viewing distance?” Or, phrased in terms of depth constancy later in their paper, “Why does the compensation fail as the viewing distance increases?” That is, laboratory conditions for studying MP with side-to-side head movements appear to work best with short viewing distances. At larger viewing distances either no depth is perceived, or, if depth is perceived, it is ambiguous, fluctuates between reversed depth interpretations, and shows no consistent relationship with the direction of observer head translation. For this reason most MP experiments in the literature use a viewing distance of 40 cm to 60 cm. The study by Ono et al. (1986) is quite unusual in including viewing distances farther than 114 cm, and it was at these distances that they reported MP becoming less effective.
The link between OKR eye movements and motion parallax suggests a way to study the question of depth scaling in MP. Central to this approach is the consideration of the observer’s head and eye movements occurring in tandem with the MP on the observer’s retina. To maintain fixation during an abrupt lateral head movement, the eyes move in the opposite direction, compensating for the head movement. The magnitude of the compensatory eye movement scales inversely with viewing distance. These compensatory eye movements typically have a gain very close to 1.0, relying on a combination of TVOR and OKR. Studies conducted in dark (non-visual) conditions show that the TVOR eye movement scales with the distance to the remembered or imagined fixation point, instead of remaining constant as one might expect of a response that occurs even without visual input (Schwarz et al., 1989; Bronstein & Gresty, 1988; Oas et al., 1992; Paige & Tomko, 1991; Paige et al., 1998). At near viewing distances, TVOR gain is typically less than 1, meaning that a large OKR component is required to maintain fixation. As viewing distance increases, TVOR gain approaches 1, meaning that smaller OKR eye movements are required. However, at much larger viewing distances TVOR gain is greater than 1, meaning that OKR eye movements must now suppress, cancel, or counteract the TVOR eye movements if fixation is to be maintained (Paige & Tomko, 1991). The current study investigates whether these changes in OKR with viewing distance are related to the changes in MP depth constancy with viewing distance.
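The geometry behind this paragraph can be illustrated with a minimal Python sketch. It is not part of the original study. It assumes a fixation point that starts straight ahead, a simple additive combination of TVOR and OKR with a total gain of 1.0, and purely hypothetical distances and TVOR gain values; the function names are illustrative only.

```python
import math


def required_eye_rotation_deg(head_translation_cm, viewing_distance_cm):
    """Eye rotation (deg) that keeps a straight-ahead point fixated
    after a lateral head translation (simple geometric assumption)."""
    return math.degrees(math.atan2(head_translation_cm, viewing_distance_cm))


def okr_share(tvor_gain):
    """Fraction of the required eye movement left for OKR, assuming the
    two components add and the total gain is 1.0. A negative value means
    OKR must counteract an over-compensating TVOR."""
    return 1.0 - tvor_gain


# Same head translation, increasing (hypothetical) viewing distances:
# the required eye rotation shrinks as distance grows.
for distance_cm in (40, 80, 160, 320):
    rotation = required_eye_rotation_deg(6.5, distance_cm)
    print(f"{distance_cm:>4} cm: {rotation:.2f} deg of eye rotation")

# Hypothetical TVOR gains for near, intermediate, and far viewing.
for tvor_gain in (0.6, 1.0, 1.2):
    print(f"TVOR gain {tvor_gain:.1f} -> OKR share {okr_share(tvor_gain):+.1f}")
```

Under these assumptions the OKR share is positive at near distances, zero when TVOR alone stabilizes gaze, and negative when TVOR over-compensates, which is the qualitative pattern described above.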
One problem is how to measure a subjective experience such as perceived depth from MP. How does an observer report the magnitude of depth perceived in a specific condition? In the experiment presented here, the magnitude of depth perceived from binocular disparity (BD) is used to index the magnitude of depth perceived from MP. The most important reason for using this technique is that very similar visual stimuli can be used for both. Moreover, the two types of stimuli can be quantified in very similar ways. Binocular disparity may be quantified in terms of the difference in the horizontal angles subtended at the two eyes between an object point and the fixation point. Motion parallax is commonly quantified in terms of disparity equivalence (DE), which is the amount of local stimulus translation or displacement in the frontal plane for a head translation, along the interaural axis, equal to the interocular distance. To compare MP and BD, an assumed interocular distance of 6.5 cm was used.
To model this comparison between BD and MP, we must consider the stimulus parameters that affect perceived depth. The distance-squared law, which specifies the relationships between these stimulus parameters (Cormack & Fox, 1985), provides a useful starting point. For the BD stimulus, the distance-squared law is:

\[ d_S = \frac{\delta \, D_S^{2}}{i} \qquad (1) \]

where dS is the specified depth, DS is the distance to the stimulus, δ is the binocular disparity, and i is the inter-ocular distance. For the MP stimulus, the commonly used distance-squared law (Rogers & Graham, 1982) is:

\[ d_M = \frac{\mu \, D_M^{2}}{t} \qquad (2) \]

where dM is the specified depth, DM is the distance to the stimulus, μ is the disparity equivalence given by stimulus translation or displacement, and t is the distance the head translated laterally. The psychophysical study described here will determine the disparity of the BD stimulus that generates perceived depth matching the perceived depth in the MP stimulus; that is, the stimulus parameters giving dS = dM. We can model this comparison of BD and MP by equating Equation 1 and Equation 2:

\[ \frac{\delta \, D_S^{2}}{i} = \frac{\mu \, D_M^{2}}{t} \qquad (3) \]
If all the variables in Equation 3 maintained the same relationships over changes in viewing distance, the perceived depths in the BD and MP stimuli would be equal when the specified parameters were equal. As the findings of Ono et al. (1986) tell us, this does not occur. So we have to consider which of the variables in Equation 3 might differ between the BD and MP stimuli.
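As a numerical illustration of the distance-squared law, the following Python sketch computes the depth specified by a BD stimulus and by an MP stimulus. It assumes the standard small-angle forms d = δD²/i and d = μD²/t, with angles converted from minutes of arc to radians. The function names and the example values (8 minarc, 57 cm) are chosen for illustration and are not results from the study.

```python
import math

ARCMIN_TO_RAD = math.pi / (180 * 60)


def depth_from_disparity(delta_arcmin, distance_cm, interocular_cm=6.5):
    """Depth (cm) specified by binocular disparity: d = delta * D^2 / i."""
    return delta_arcmin * ARCMIN_TO_RAD * distance_cm ** 2 / interocular_cm


def depth_from_parallax(mu_arcmin, distance_cm, head_translation_cm):
    """Depth (cm) specified by motion parallax: d = mu * D^2 / t."""
    return mu_arcmin * ARCMIN_TO_RAD * distance_cm ** 2 / head_translation_cm


# Equal angular parameters, equal viewing distance, and t equal to the
# assumed 6.5 cm interocular distance specify the same depth.
bd_depth = depth_from_disparity(8.0, 57.0)
mp_depth = depth_from_parallax(8.0, 57.0, head_translation_cm=6.5)
print(f"BD: {bd_depth:.2f} cm, MP: {mp_depth:.2f} cm")
```

With these inputs both stimuli specify a depth of a little over 1 cm. Any term that differs between the two sides, such as a larger t, breaks the equality, which is why each variable is examined in turn.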
In the experiment presented here, and in those by Ono et al. (1986), it is assumed that there is no systematic difference between the internal representations of viewing distance, DS and DM. Such a difference is unlikely given the unobstructed view of the experimental apparatus. In the current study, the difference between BD and MP viewing was whether the observer’s eyes were occluded sequentially by the shutter glasses (BD) or a single eye was briefly occluded (MP). Indeed, Bradshaw et al. (1998, 2000) conclude from a BD and MP matching paradigm that BD and MP use “the same estimate of viewing distance to scale size and depth estimates.” If DS = DM, then they cancel in Equation 3 and do not explain the failure of constancy with motion parallax.
It is also assumed that there is no systematic difference in the perception of disparity and motion parallax, δ and μ, over viewing distance. Both parameters are quantified as proximal retinal stimuli, and the effect of viewing distance is only apparent when these proximal stimuli are used in the interpretation of depth. Moreover, the cue combination paradigm of Rogers and Collett (1989) suggests a very close perceptual equivalence for equivalent δ and μ parameters, at least when presented at a single 57 cm viewing distance. Therefore, if δ = μ, then they also cancel in Equation 3. (The reader should not confuse this theoretical equivalence in the discussion of the distance-squared law with the following study, which uses a variable value of δ to match a standard value of μ.)
Finally, since i (the observer’s interocular distance) remains constant over changes in viewing distance, the only term in Equation 3 that can produce a difference in the perceived-depth matches as a function of viewing distance is t, the measured lateral translation of the head. Why might t be mis-estimated?
The hypothesis is that the effective t (meaning the internal parameter that affects the perceived depth in an MP display) is provided by the OKR eye movement signal. We have known since the original study by Rogers and Graham (1979) that head movements are not required for the unambiguous perception of depth from MP. Instead, Nawrot (2003) proposes that the model parameter t is served by an OKR eye movement signal. The current study investigates whether changes in viewing distance (DM) produce a change in the perception of depth (dM) from motion parallax (μ) that co-varies with changes in the OKR signal.
While it is unclear what metric the visual system uses for the OKR signal, for the current study we use OKR gain to reflect the magnitude of the OKR signal. However, OKR gain is inversely proportional to the model parameter t. Consider that a fixed-magnitude head movement (t) generates a smaller OKR eye movement as viewing distance increases; the use of OKR gain in the model preserves this relationship. When OKR gain is high (which occurs with near viewing distances and when the gain of TVOR is low), the resulting depth estimate is similar to the depth estimate generated by a smaller effective t. The predictions illustrated in Figure 1 stem from this hypothesis.
Figure 1 illustrates some possible results from a procedure in which BD is used to match the perception of depth from MP at various viewing distances. Assume motion parallax DE is fixed at 8 minarc at all viewing distances. The black line describes the result if observers require 8 minarc of BD to match the 8 minarc DE standard at each distance. The blue line describes the result if MP has less than perfect constancy and is perceived as compressed in depth. In this case, only 7 minarc of BD would be needed to match the depth portrayed by 8 minarc of DE. However, imagine that MP was matched by decreasing amounts of BD with increasing viewing distance. The red line in Figure 1A is one description of this hypothetical result. To illustrate this in regard to perceived depth, Figure 1B shows the disparity matches in Figure 1A transformed into perceived depth values using the distance-squared law. Incomplete depth constancy is commonly observed with MP; behavior like this is represented by the red lines in Figure 1A and 1B.
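The transformation from disparity matches to perceived depth can be sketched in Python as follows. This is an illustration only: it assumes the small-angle distance-squared law d = δD²/i with a 6.5 cm interocular distance, and both the viewing distances and the declining disparity matches are hypothetical values chosen to mimic the three cases described above, not data from the figure.

```python
import math

ARCMIN_TO_RAD = math.pi / (180 * 60)
INTEROCULAR_CM = 6.5  # assumed interocular distance


def disparity_to_depth_cm(delta_arcmin, distance_cm):
    """Depth (cm) implied by a disparity match: d = delta * D^2 / i."""
    return delta_arcmin * ARCMIN_TO_RAD * distance_cm ** 2 / INTEROCULAR_CM


distances_cm = (40, 80, 160, 320)  # hypothetical viewing distances

# Disparity matches (minarc) at each distance for the three cases.
matches_arcmin = {
    "veridical (8 minarc)": (8.0, 8.0, 8.0, 8.0),
    "compressed (7 minarc)": (7.0, 7.0, 7.0, 7.0),
    "declining (hypothetical)": (8.0, 6.0, 4.0, 2.0),
}

# Transform each disparity match into a depth value at its distance.
depths_cm = {
    label: [disparity_to_depth_cm(m, d) for m, d in zip(matches, distances_cm)]
    for label, matches in matches_arcmin.items()
}

for label, depths in depths_cm.items():
    print(label, [round(depth, 2) for depth in depths])
```

A constant disparity match implies a depth that grows with the square of viewing distance, whereas a declining match implies depth that grows more slowly than the distance-squared law would specify.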
Finally, what results are predicted if an OKR eye movement signal provides the necessary extra-retinal information required for recovery of unambiguous depth order in MP displays? Because OKR magnitude changes inversely with TVOR magnitude, which changes with viewing distance, OKR magnitude decreases with increasing viewing distance. If there is a connection between OKR and MP, depth scaling in an MP display should mirror changes in OKR gain (red lines). Considered with respect to Equation 2, the red line in Figure 1 also describes an increase in t (a decrease in OKR) with viewing distance.
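This prediction can be made concrete with a short Python sketch. It rests on two assumptions that are not taken from the paper. First, with equal viewing distances, the matching relation reduces to δ/i = μ/t, so the matching disparity is δ = μi/t. Second, the effective t is taken to be a hypothetical constant divided by OKR gain, with the constant set to 6.5 cm so that a gain of 1.0 corresponds to t = i. The OKR gain values are likewise hypothetical.

```python
def effective_translation_cm(okr_gain, scale_cm=6.5):
    """Hypothetical link: effective t is inversely proportional to OKR gain.
    The scale constant is an assumption chosen so that gain 1.0 gives t = i."""
    return scale_cm / okr_gain


def matching_disparity_arcmin(mu_arcmin, effective_t_cm, interocular_cm=6.5):
    """Disparity that specifies the same depth as the parallax,
    from delta / i = mu / t with equal viewing distances."""
    return mu_arcmin * interocular_cm / effective_t_cm


# Hypothetical OKR gains falling as viewing distance increases.
for okr_gain in (1.0, 0.75, 0.5, 0.25):
    t_eff = effective_translation_cm(okr_gain)
    match = matching_disparity_arcmin(8.0, t_eff)
    print(f"OKR gain {okr_gain:.2f}: effective t = {t_eff:.1f} cm, "
          f"matching BD = {match:.1f} minarc")
```

Under these assumptions the matching disparity is simply the disparity equivalence multiplied by OKR gain, so a falling OKR gain yields the declining matches sketched by the red line.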