Accurate and precise estimation of scene-relative object motion is important during self-movement. When one moves through the world, the projected image of objects in the scene forms a moving pattern on the eye, which is termed “optic flow” (
Gibson, 1950;
Gibson, 1958). If an object moves in the scene during one's self-movement, its retinal motion is due to both scene-relative object motion and one's self-movement in the world. To estimate scene-relative object motion accurately, the visual system must remove the self-movement component from the retinal motion of the object.
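In vector terms, the idea can be sketched as follows (the notation here is ours, for illustration only): if \( \mathbf{r} \) denotes the retinal motion of the object, \( \mathbf{o} \) the component due to scene-relative object motion, and \( \mathbf{s} \) the component due to self-movement, then
\[ \mathbf{r} = \mathbf{o} + \mathbf{s} \quad\Rightarrow\quad \hat{\mathbf{o}} = \mathbf{r} - \hat{\mathbf{s}}, \]
so the accuracy of the recovered object motion \( \hat{\mathbf{o}} \) depends on how completely and accurately the self-movement component \( \hat{\mathbf{s}} \) is estimated and removed.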
Warren and Rushton (2007,
2008,
2009; see also
Rushton & Warren, 2005) proposed that the visual system can parse out the self-movement component and attribute the remaining retinal motion to scene-relative object movement. In a series of experiments, they found that optic flow plays an important role in enabling the visual system to identify and parse out the self-movement component, and they thus termed this process “flow parsing.” However, the parsing out of the self-movement component is incomplete when it is based on visual information (such as optic flow) alone (e.g.,
Matsumiya & Ando, 2009;
Niehorster & Li, 2017;
Swanston & Wade, 1988;
Wexler, 2003). Non-visual information (such as vestibular, somatosensory, and proprioceptive information) generated during self-movement has also been reported to contribute to the identification of scene-relative object motion. For example,
Dyde and Harris (2008) found that self-movement in the form of sinusoidal oscillation affected the extent to which a fixated object had to move in space to appear earth-stationary to the observer.
MacNeilage, Zhang, DeAngelis, and Angelaki (2012) asked observers to discriminate object motion during simulated self-movement through a star-field scene with or without scene-consistent vestibular stimulation. They found that observers performed better with added vestibular information than with visual information alone. Likewise,
Dokka, MacNeilage, DeAngelis, and Angelaki (2015) asked observers to judge object motion trajectory and found a larger compensation for the self-movement component with combined visual and vestibular information than with visual information alone.
Dupin and Wexler (2013) asked observers to judge the speed of object rotation and found similar results even though they placed visual and non-visual information in conflict. Finally, Fajen and his colleagues (
Fajen & Matthis, 2013;
Fajen, Parade, & Matthis, 2013) had participants walk to avoid moving obstacles in a virtual environment and found that participants relied on both visual and non-visual information about self-movement to judge whether they could safely pass.
Despite the above findings, to the best of our knowledge, no study has quantitatively and systematically examined the contribution of visual versus non-visual information to the estimation of scene-relative object motion during walking, a common form of self-movement in daily life. Accordingly, it remains an open question how accurately people perceive scene-relative object motion during walking using visual information alone, non-visual information alone, or combined visual and non-visual information. In this study, we aimed to address this question. We placed a probe object on the ground that moved in the scene during simulated or real walking and used a nulling method developed by
Niehorster and Li (2017) to directly measure the extent to which the retinal motion component of the probe due to self-movement could be removed, and thus to determine the accuracy of the estimation of scene-relative object motion during walking. We tested three stimulus conditions: (1) In the visual-information condition, participants stood still and passively viewed through a head-mounted display (HMD) the scene that simulated smooth forward walking over a random-dot ground plane with no head bobbing or body sway (
Figure 1a and
Supplementary Movie S1). Participants were thus provided with noise-free visual information (such as optic flow from the random-dot ground plane) about translational self-movement in this condition. (2) In the non-visual-information condition, participants walked straight in the real world while viewing through the HMD the online-generated scene that corresponded to their walking straight over an empty ground (
Figure 1b and
Supplementary Movie S2) in a virtual environment. The empty ground provided no optic flow and effectively eliminated any visual information about translational self-movement. Participants were thus provided mainly with non-visual information about self-movement (such as vestibular, somatosensory, and proprioceptive information) generated by walking in this condition. (3) In the combined-information condition, participants walked straight while viewing through the HMD the online-generated scene that corresponded to their walking in a virtual environment over a random-dot ground that generated optic flow (
Supplementary Movie S3). Participants were thus provided with combined visual and non-visual information about self-movement in this condition. By measuring the proportion of the translational self-movement component subtracted from the retinal motion of the probe in these three conditions, we systematically examined the accuracy and precision of the estimation of scene-relative object motion during walking based on visual information alone, non-visual information alone, or combined visual and non-visual information.
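As a rough sketch of how such a nulling measurement can be summarized (the symbols below are illustrative and are not taken from Niehorster and Li, 2017): if \( \mathbf{s} \) denotes the full translational self-movement component of the probe's retinal motion and \( \hat{\mathbf{s}} \) the component that is actually removed, the proportion subtracted can be expressed as a gain
\[ g = \frac{\lVert \hat{\mathbf{s}} \rVert}{\lVert \mathbf{s} \rVert}, \]
where \( g = 1 \) indicates complete removal (and thus accurate estimation of scene-relative object motion), \( g < 1 \) indicates under-compensation, and \( g > 1 \) indicates over-compensation for self-movement; the trial-to-trial variability of \( g \) indexes the precision of the estimate.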