When observers move their heads while watching two points placed at different distances from them, relative motion between the two points is produced on the retina. This relative motion caused by the observers' self motion is called motion parallax, and it generates compelling subjective impressions of depth (Rogers & Graham, 1979). The main objective of this paper is to examine the temporal characteristics of depth perception from motion parallax.
Relative motion on the retina is produced not only by an observer's motion but also by an object's motion. An observer sees relative motion when objects move with respect to the observer. The visual system has to resolve this ambiguity; that is, it has to sort the retinal motion into two components, one caused by the observer's self motion with respect to the objects, and the other caused by the objects' motion relative to the observer. The visual system resolves this problem mainly by relying on knowledge about the observer's own motion. If self motion is known and can be related to retinal motion, the motion components caused by self motion and by object motion can be segregated. Depth structure can also be perceived when objects move and the observer does not, a situation called “structure from motion” (SfM). In SfM, the motion component caused by an object's global motion, such as rotation or translation, has to be segregated from the relative motions caused by depth structure. To perform this segregation, the visual system supposedly relies heavily on the assumption that objects are rigid (rigidity assumption). In addition, it has been proposed that motion input must be accumulated over a relatively long period to obtain SfM information (e.g., Hildreth, Grzywacz, Adelson, & Inada, 1990; Ullman, 1984). Considering the similarity between motion parallax and SfM, it is plausible that depth from motion parallax also requires relatively long temporal accumulation. However, although significant knowledge has been accumulated on the temporal characteristics of SfM, the temporal characteristics of depth perception from motion parallax have not been studied systematically. Therefore, we begin by reviewing SfM studies relevant to the temporal characteristics of motion parallax.
Treue, Husain, and Andersen (1991) manipulated the lifetime of dots in an SfM display representing a rotating cylinder and found a lifetime threshold of 50 to 100 ms, which is similar to the threshold for velocity estimation. They claimed that this similarity suggests that velocity measurements are used to process SfM. At the same time, in an experiment in which the duration was manipulated, they found that performance in distinguishing cylindrical from scrambled displays increased with presentation time up to 1000 ms when the lifetime was 100 ms. They also found that the reaction time for detecting an SfM cylinder was about 1000 ms. In a similar study, Eby (1992) reported that although depth from SfM could be perceived for durations as short as 100 ms, it was underestimated for such short durations. Perceived depth increased as the stimulus duration increased up to 500 to 1000 ms, at which point it reached a plateau. More recently, Domini, Vuong, and Caudek (2002) reported that perceived depth from SfM at a given moment is affected by stimulus changes presented up to 1 s before that moment. All these results suggest that although the low-level motion parameters on which SfM perception relies are detected rather quickly, several such measurements have to be integrated over time to obtain reliable SfM perception. Based on these results, Caudek, Domini, and Di Luca (2002) proposed a model for SfM perception with two stages. In their model, it is assumed that the first stage calculates depth from the local motion distribution within 150 ms, and the second stage calculates global consistency by integrating the output of the first stage over a period of approximately 1 s.
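The temporal consequence of such a second stage can be illustrated with a toy calculation. The sketch below is not the model of Caudek, Domini, and Di Luca; it only assumes, for illustration, that the second stage is a running mean over the preceding 1 s of first-stage depth estimates. The function name, the 50 ms sampling step, and the boxcar window are all hypothetical choices.

```python
def second_stage_output(first_stage_depth, dt_ms=50, window_ms=1000):
    """Running mean of first-stage depth estimates over the last window_ms.

    Assumption: the integration is a simple boxcar average. The text above
    specifies only the approximate duration (about 1 s), not its form.
    """
    n = max(1, window_ms // dt_ms)  # number of samples inside the window
    output = []
    for i in range(len(first_stage_depth)):
        start = max(0, i - n + 1)
        window = first_stage_depth[start:i + 1]
        output.append(sum(window) / len(window))
    return output


# First-stage depth steps from 0 to 1 (arbitrary units) at t = 1000 ms.
signal = [0.0] * 20 + [1.0] * 40
response = second_stage_output(signal)
```

Under these assumptions the integrated output rises gradually and reaches the new depth value only about 1 s after the step, which is the kind of sluggishness reported for SfM displays.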
It is plausible that the mechanism for motion parallax has a second-stage structure similar to that for SfM, and thus has similar temporal characteristics. For depth detection from motion parallax, local motions and relative motions between adjacent local motions have to be detected before depth is reconstructed. This motion detection stage, probably together with some pooling process, might correspond to the first stage proposed for depth from SfM. In addition, when parallax is very small, the motion itself is not visible even though depth is clearly visible (e.g., Ono & Ujike, 2005). These detected motion signals then have to be bound with information about self-motion to be converted to depth signals.
Recently, Nawrot and Stroyan (2012) reported that reaction time for depth-order discrimination was as short as 32 ms. However, their study did not employ head movement. They examined depth perception when observers moved only their eyes, and they demonstrated that, with this paradigm, depth polarity (near/far) could be discriminated at a very short stimulus duration. They did not examine whether the amount of depth could be perceived. Their results, nonetheless, could be understood as evidence for a two-stage mechanism for depth from motion parallax. However, as already mentioned, knowledge about the temporal characteristics of depth perception from motion parallax is very limited, and more should be accumulated before we discuss the configuration of the mechanisms in detail. Therefore, the main objective of this study was to examine the temporal characteristics of depth perception from motion parallax in more general terms.
In the first experiment, we measured the perceived amplitude of a one-dimensional sinusoidal depth modulation induced by stimulus movement that was yoked to observers' side-to-side head movement. The motion parallax was presented (i.e., the stimulus motion was yoked to the head movement) for only a part of each component head movement. The yoke-ratio, which was the ratio of head to stimulus movements, geometrically corresponded to the depicted, or parallax-specified, depth. In other words, the parallax-specified depth was turned on and off in this experiment. The percentage of the yoked period and the yoke-ratio were systematically manipulated. If depth is detected by a mechanism with short temporal integration (i.e., high temporal resolution), perceived depth should appear to change and should be independent of the percentage of parallax presentation; however, if depth is determined by a slowly integrating mechanism, perceived depth should be some sort of average when the percentage is lower. In the second experiment, to examine the response characteristics of parallax detection, we introduced an abrupt change in parallax during head movements to see whether observers could detect the change.
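The two predictions for the first experiment can be made concrete with a minimal sketch. It assumes, purely for illustration, that a slowly integrating mechanism reports a time-weighted mean and that the unyoked period contributes zero depth; the text above commits only to "some sort of average". The function name and the mechanism labels are hypothetical.

```python
def predicted_depth(specified_depth, yoked_fraction, mechanism):
    """Depth predicted during the yoked period under two toy hypotheses.

    specified_depth: parallax-specified depth while the stimulus is yoked.
    yoked_fraction:  proportion of each head movement that is yoked (0 to 1).
    mechanism:       "fast" (short integration) or "slow" (long integration).
    """
    if not 0.0 <= yoked_fraction <= 1.0:
        raise ValueError("yoked_fraction must lie between 0 and 1")
    if mechanism == "fast":
        # High temporal resolution: depth follows the current parallax,
        # so it does not depend on the yoked percentage.
        return specified_depth
    if mechanism == "slow":
        # Assumption: time-weighted mean, with zero depth when unyoked.
        return specified_depth * yoked_fraction
    raise ValueError("mechanism must be 'fast' or 'slow'")


fast_25 = predicted_depth(2.0, 0.25, "fast")
slow_25 = predicted_depth(2.0, 0.25, "slow")
```

In this toy version, lowering the yoked percentage leaves the fast prediction unchanged and scales the slow prediction down in proportion, which is the qualitative contrast the experiment was designed to test.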