Abstract
Past studies found that the perception of 3D structure from moving images involves only the analysis of the first-order optic flow. Accordingly, current structure-from-motion models map the properties of the instantaneous velocity field onto 3D properties such as orientation, depth, and motion. The present study shows that this purely instantaneous analysis of the optic flow cannot completely account for human 3D perception. Instead, we propose a model that, at each instant of time, combines the incoming information from the instantaneous velocity field with a preexisting representation of the 3D orientation of the derived surface. To test this model, we conducted three experiments in which observers compared the slant of two rotating surfaces specified by the motion of random dots and judged which surface was more rotated in depth. These surfaces had different past histories (Experiment 1), oscillated at different frequencies (Experiment 2), or had velocity fields that changed over time, e.g., decreased (Experiment 3). Simulations show that this model predicts human performance better than an instantaneous model does. This suggests that temporal integration is involved in the derivation of 3D structure from the dynamical properties of retinal images.
Supported by National Science Foundation grant 78441.
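
The contrast drawn in the abstract between an instantaneous model and one that carries a prior representation forward can be illustrated with a toy sketch. This is not the model proposed in the study: the blending rule (a simple exponentially weighted update), the function names, the `weight` parameter, and the numeric slant values are all assumptions chosen for illustration only. The sketch shows how two surfaces with identical final-frame input can yield different estimates once past history is taken into account.

```python
# Toy illustration only: the update rule, names, and values below are
# assumptions, not the model or data from the study.

def instantaneous_estimates(measurements):
    """Memoryless model: each estimate depends only on the current frame."""
    return list(measurements)


def integrated_estimates(measurements, weight=0.5, initial=0.0):
    """Blend each new measurement with the estimate carried over from
    the previous frame (hypothetical exponentially weighted update)."""
    estimate = initial
    estimates = []
    for measurement in measurements:
        estimate = (1.0 - weight) * estimate + weight * measurement
        estimates.append(estimate)
    return estimates


# Two hypothetical surfaces whose per-frame slant signals (arbitrary units)
# end at the same value but have different past histories.
history_a = [10.0, 20.0, 30.0, 40.0]
history_b = [40.0, 40.0, 40.0, 40.0]

final_instant_a = instantaneous_estimates(history_a)[-1]
final_instant_b = instantaneous_estimates(history_b)[-1]
final_integrated_a = integrated_estimates(history_a)[-1]
final_integrated_b = integrated_estimates(history_b)[-1]

print("instantaneous:", final_instant_a, final_instant_b)
print("integrated:   ", final_integrated_a, final_integrated_b)
```

Under these assumptions the memoryless model assigns both surfaces the same final estimate, whereas the integrating model assigns a larger estimate to the surface with the longer history of high input. That is the kind of history dependence a comparison of surfaces with different past histories is designed to detect.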