Abstract
Motion perception depends on the spatial integration of local motion signals, which requires solving the correspondence problem created by the ambiguity inherent in matching features across successive frames. We describe a computational theory of how the visual system solves the correspondence problem. We derived a Bayesian ideal observer for detecting coherent motion in random dot kinematograms, in which a proportion of dots move coherently and the rest move randomly. We obtained Barlow and Tripathy's classic model as a good approximation to the Bayesian ideal over an intermediate range of dot densities. We confirmed previous findings that the ideal observer qualitatively predicts how human performance changes with increasing dot density, but that the absolute level of human performance is far worse than the ideal.
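As an illustration only (a minimal sketch, not the paper's derivation), the code below shows one way a likelihood-ratio ideal observer for a two-frame kinematogram can be set up: count the dots whose positions, displaced by a candidate coherent motion, have a partner in the next frame, and compare that count against the binomial distributions expected with and without a coherent signal. The function names (`count_matches`, `log_likelihood_ratio`) and the parameters `dx`, `eps`, and `p_chance` are illustrative assumptions, not notation from the paper.

```python
import numpy as np
from scipy.stats import binom

def count_matches(frame1, frame2, dx, eps=1e-3):
    """Count frame-1 dots with a frame-2 partner at displacement dx (within eps)."""
    shifted = frame1 + dx                                    # predicted positions of signal dots
    dists = np.linalg.norm(shifted[:, None, :] - frame2[None, :, :], axis=-1)
    return int(np.sum(dists.min(axis=1) < eps))              # dots with at least one match

def log_likelihood_ratio(n_match, n_dots, coherence, p_chance):
    """log P(count | coherent motion) - log P(count | pure noise).

    Under noise alone, each dot matches by chance with probability p_chance
    (the source of correspondence noise); with coherence c, a fraction c of
    dots match by construction and the remainder match by chance.
    """
    p_signal = coherence + (1.0 - coherence) * p_chance
    return (binom.logpmf(n_match, n_dots, p_signal)
            - binom.logpmf(n_match, n_dots, p_chance))

# Example: decide "coherent" when the log-likelihood ratio exceeds 0.
rng = np.random.default_rng(0)
frame1 = rng.random((100, 2))                                # 100 dots in a unit square
frame2 = np.where(rng.random((100, 1)) < 0.2,                # 20% coherent dots
                  frame1 + np.array([0.05, 0.0]),            # displaced by dx
                  rng.random((100, 2)))                      # the rest redrawn at random (a simplification)
n = count_matches(frame1, frame2, dx=np.array([0.05, 0.0]))
p_chance = 1 - (1 - np.pi * 1e-3**2) ** 100                  # chance of an accidental match within eps
print(n, log_likelihood_ratio(n, 100, coherence=0.2, p_chance=p_chance) > 0)
```

The same counting logic is what makes correspondence noise the limiting factor: as dot density grows, accidental matches at the signal displacement become more likely and the two binomial distributions overlap more.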
To account for this discrepancy, we propose that humans use generic, general-purpose models of motion. In particular, we impose a prior constraint favoring slow and smooth motion patterns, consistent with the statistics of motion in the natural world. We found that the slow-and-smooth model not only predicts the qualitative pattern of human performance but also provides a quantitative fit to the absolute level of performance. Most remarkably, the slow-and-smooth model achieved over 70% accuracy in predicting human perception of random motion stimuli.
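For concreteness, one standard way to formalize a slow-and-smooth prior (following Yuille and Grzywacz's slowness-and-smoothness functional; the weights $\lambda$ and $\mu$ below are illustrative, not the paper's fitted values) is to prefer velocity fields $\mathbf{v}$ that minimize an energy combining data fidelity with penalties on speed and its spatial variation:

```latex
E(\mathbf{v}) \;=\; \sum_{i} \bigl\|\mathbf{v}(\mathbf{x}_i) - \mathbf{u}_i\bigr\|^2
\;+\; \lambda \int \Bigl( \|\mathbf{v}(\mathbf{x})\|^2
\;+\; \mu\,\|\nabla \mathbf{v}(\mathbf{x})\|^2 \;+\; \cdots \Bigr)\, d\mathbf{x}
```

Here $\mathbf{u}_i$ are the locally measured motions at positions $\mathbf{x}_i$; the $\|\mathbf{v}\|^2$ term penalizes fast motion (slowness) and the derivative terms penalize spatially varying motion (smoothness), with the full regularizer including higher-order derivatives.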
Our analysis shows that the Bayesian framework allows the derivation of ideal observers for complex visual stimuli. It also suggests that human performance on psychophysical tasks may be based on generic models with general prior assumptions, as exemplified by the slow-and-smooth model.