Abstract
As we move through the world, the pattern of light projected onto our eyes is complex and dynamic, yet we are nevertheless able to distinguish moving from stationary objects. One might hypothesize that this is achieved by detecting discontinuities in the spatial pattern of retinal velocities; however, this computation is also sensitive to the velocity discontinuities that arise at the boundaries of stationary objects. We instead propose that humans make use of the specific constraints that self-motion imposes on retinal velocities. When an eye translates and rotates within a rigid 3D world, the velocity at each location on the retina is constrained to a line segment in the 2D space of retinal velocities (Longuet-Higgins & Prazdny, 1980). The slope and intercept of this segment are determined by the eye's translation and rotation, and the position along the segment is determined by the depth of the scene at that location. Since all possible velocities arising from a rigid world must lie on this segment, velocities off the segment must correspond to moving objects. We hypothesize that humans make use of these constraints by partially inferring self-motion from the global pattern of retinal velocities and using deviations of local velocities from the resulting constraint lines to detect moving objects. Using a head-mounted virtual reality display, we simulated forward translation through two virtual environments: one consisting of textured cubes above a textured ground plane, and one of scattered depth-matched dots. Participants judged whether a cued cube or dot moved relative to the scene. Consistent with the hypothesis, we found that performance depended on the deviation of the object's velocity from the constraint segment, not on the difference between the retinal velocities of the object and its surround. Our findings contrast with previous, inconclusive results that relied on impoverished stimuli with a limited field of view.
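A minimal sketch of the constraint, written in the standard instantaneous motion-field form of Longuet-Higgins and Prazdny (1980) and assuming perspective projection with unit focal length (the notation below is ours, not the abstract's): for an image location $(x,y)$, eye translation $\mathbf{t}$, eye rotation $\boldsymbol{\omega}$, and scene depth $Z(x,y)$,

\[
\mathbf{v}(x,y) \;=\; \frac{1}{Z(x,y)}
\underbrace{\begin{pmatrix} -1 & 0 & x \\ 0 & -1 & y \end{pmatrix}}_{A(x,y)}\mathbf{t}
\;+\;
\underbrace{\begin{pmatrix} xy & -(1+x^2) & y \\ 1+y^2 & -xy & -x \end{pmatrix}}_{B(x,y)}\boldsymbol{\omega}.
\]

As inverse depth $1/Z$ varies over its admissible range, $\mathbf{v}(x,y)$ traces a segment in velocity space with offset $B(x,y)\,\boldsymbol{\omega}$ and direction $A(x,y)\,\mathbf{t}$; a local velocity lying off this segment cannot be produced by a rigid scene and therefore signals an independently moving object.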