Abstract
Most models of local motion estimation in human and computer vision assume translational motion. For this class of models, the accuracy of velocity estimates depends on several factors, including the sensor noise, the magnitude of the stimulus velocity, the extent to which the motion breaks the assumption of translation, the ambiguity of the stimulus (e.g., the aperture problem), and the spatial area (aperture) over which information is pooled. When the aperture size of a motion estimator increases, the signal-to-noise ratio improves and the probability of encountering an ambiguous motion stimulus decreases. On the other hand, the larger the aperture, the greater the likelihood that it contains complex motions that break the assumption of translational motion. We therefore propose that at each velocity there is an ideal aperture size that balances these two constraints. The size of this pooling area depends on the frequency and nature of the non-translational motions that occur in the world.
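As a rough illustration of this trade-off (not the model used in this work), one can write the total estimation error as the sum of a noise/ambiguity term and a model-violation term and ask which aperture size minimizes it. The functional forms and constants below are assumptions chosen only to make the trade-off concrete: the noise term is taken to shrink with aperture radius but grow with speed, and the violation term to grow with aperture radius.

```python
import numpy as np

# Toy cost (an illustrative assumption, not the study's model):
#   noise/ambiguity term sigma * v / r : shrinks with aperture radius r,
#     but grows with speed v (a faster stimulus traverses more of the
#     aperture per frame)
#   violation term k * r : grows with r, since larger apertures are more
#     likely to contain non-translational motion
def total_error(r, v, sigma=1.0, k=0.05):
    return sigma * v / r + k * r

radii = np.linspace(0.5, 50.0, 500)
for v in (1.0, 4.0, 16.0):                      # speeds in arbitrary units
    best = radii[np.argmin(total_error(radii, v))]
    print(f"speed {v:5.1f} -> error-minimizing aperture radius {best:.1f}")
```

Under these assumed forms the error-minimizing radius grows with speed (here as the square root of v), qualitatively matching the result reported below.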
To measure how aperture size should vary with velocity, we have developed a simple world model based on the statistics of natural scenes and rendered movies from this world using a ray tracer. For locomotion through this world, we calculate the error in local velocity estimates as a function of aperture size. For this particular world and measurement task, we find, as predicted, that the ideal aperture size increases monotonically with velocity magnitude. This principle may be used by biological visual systems and may be a useful addition to machine vision systems.
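A minimal sketch of this kind of measurement, under assumed details that stand in for the actual pipeline (a synthetic expansion flow field in place of the rendered locomotion sequences, additive Gaussian sensor noise, and a mean-flow translational estimator), is:

```python
import numpy as np

# Sketch of the measurement described above; the flow field, noise model,
# and estimator are assumptions for illustration, not the actual pipeline.
def aperture_error(flow, center, radius, sensor_sigma, rng):
    """flow: (H, W, 2) ground-truth velocity field.
    Returns the error of a noisy mean-flow (translational) estimate
    pooled over a circular aperture of the given radius."""
    h, w, _ = flow.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    samples = flow[mask] + rng.normal(0.0, sensor_sigma, size=(mask.sum(), 2))
    estimate = samples.mean(axis=0)           # translational estimate
    return np.linalg.norm(estimate - flow[center[0], center[1]])

rng = np.random.default_rng(0)
H, W = 128, 128
# Hypothetical non-translational flow: expansion about the image center,
# standing in for optic flow during forward locomotion.
yy, xx = np.mgrid[0:H, 0:W]
flow = np.stack([(yy - H / 2) * 0.05, (xx - W / 2) * 0.05], axis=-1)

for radius in (2, 4, 8, 16, 32):
    err = aperture_error(flow, (40, 40), radius, sensor_sigma=0.5, rng=rng)
    print(f"aperture radius {radius:2d} -> velocity error {err:.3f}")
```

In such a simulation, small apertures leave the estimate noise-limited while large apertures average over non-translational flow; sweeping the radius for stimuli of different speeds traces out the error curves from which an ideal aperture size can be read off.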
Supported by NIH grant EY11247.