Abstract
Accurately estimating the motion of objects and of the self through the environment requires accurate estimates of the speed of local retinal image motion. Here, we consider the task of estimating the speed of retinal image motion created by natural image movies. This task is difficult because natural images are enormously complex and variable. We developed an ideal observer model of speed estimation with movies derived from natural images. Our analysis determines the set of linear space-time receptive fields that encode the retinal image information most useful for speed estimation. It also determines the rules for optimally combining the responses of these linear receptive fields to obtain a population of speed-selective units with arbitrary speed preferences. Standard population decoding of these speed-selective units yields precise, unbiased speed estimates. Next, we pitted ideal and human observers against each other in matched speed discrimination tasks. Both types of observers were shown a large random set of natural image movies, and no observer saw the same movie twice. Human performance parallels ideal performance, although ideal performance is somewhat more precise. More remarkably, adding a single free parameter, which captures computational inefficiency, yielded a close quantitative match between human and ideal performance (accounting for 95% of the variance). Our analysis and results suggest that many properties of speed-selective neurons in V1 and MT, as well as many details of human speed estimation performance, can be predicted from a task-specific analysis of natural signals.