Abstract
Because of their small receptive fields, neurons in area V1 can convey only limited information about the velocity of a moving object: a neuron that views a straight contour through its small aperture can signal only the component of motion perpendicular to that contour. This “aperture problem” is thought to be resolved in the middle temporal area (MT), where receptive fields cover much larger regions of visual space. However, the accuracy of MT responses depends critically on both the spatial structure of the stimulus and the temporal interval over which the response is measured. When probed with plaid stimuli, many MT neurons fail to integrate the component motions into the direction of the pattern as a whole, whereas for stimuli consisting of tilted bars the responses of the vast majority of MT neurons show little error. In all cases the correct motion direction is signalled after a delay of roughly 60 ms.
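The aperture problem reduces to a vector projection: a unit that sees only a straight contour recovers only the velocity component perpendicular to it. The sketch below is illustrative plain Python; the helper names `normal_velocity` and `direction_deg` are hypothetical and not taken from the study. It shows that for a bar tilted by 45° and moving rightward, a small-aperture unit viewing the bar's edge signals a direction 45° away from the true one, which is the kind of error an integrating stage has to correct for tilted bars.

```python
import math


def normal_velocity(velocity, edge_orientation_deg):
    """Velocity component visible through a small aperture (hypothetical helper).

    velocity: true 2D velocity (vx, vy) of the object.
    edge_orientation_deg: orientation of the straight contour inside the
        aperture, in degrees (0 = horizontal, counterclockwise positive).
    Returns the projection of `velocity` onto the unit normal of the edge;
    motion along the edge produces no change inside the aperture.
    """
    theta = math.radians(edge_orientation_deg)
    # Unit vector perpendicular to the edge.
    nx, ny = -math.sin(theta), math.cos(theta)
    speed = velocity[0] * nx + velocity[1] * ny
    return (speed * nx, speed * ny)


def direction_deg(v):
    """Direction of a 2D vector in degrees (0 = rightward)."""
    return math.degrees(math.atan2(v[1], v[0]))


# A bar tilted by 45 degrees that translates rightward at unit speed.
true_velocity = (1.0, 0.0)
seen = normal_velocity(true_velocity, 45.0)
print(round(direction_deg(true_velocity)), round(direction_deg(seen)))  # prints: 0 -45
```

Line endings are not subject to this ambiguity, because their position changes along both axes, which is one reason units that respond to them are thought to be useful for recovering the true direction.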
We simulated a model in which MT neurons simply integrate the output of V1 neurons that inhibit one another, in a manner similar to that proposed in models based on divisive normalization. We found that the spatial structure of the inhibitory pool was critical for obtaining accurate results. Specifically, when the inhibition was aligned with the preferred orientation of a given V1 cell, the cell became powerfully endstopped: it responded to line endings and short bars, and its receptive field size depended on stimulus contrast. This assumption proved sufficient to capture both the accurate motion computation observed in MT for tilted bars and the temporal dynamics of MT responses. By varying the directional bandwidth of the MT integration stage, we were able to capture the range of pattern motion selectivity seen in MT with plaid stimuli (as in Rust et al., 2006). These results suggest that a simple model can account for most findings on motion integration in MT, provided that the inhibition at the V1 stage has the appropriate spatial structure.
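The endstopping mechanism can be illustrated with a toy, single-unit version of divisive normalization. This is a hypothetical sketch, not the model simulated in the study: stimulus positions are sampled only along the unit's preferred-orientation axis, the receptive field and inhibitory pool are Gaussian, and all names and parameter values (`rf_sd`, `pool_sd_along_bar`, `sigma`, `weight`, `ALIGNED`, `ORTHOGONAL`) are assumptions chosen for illustration. A pool aligned with the preferred orientation keeps growing as the bar lengthens, so long bars are suppressed; a pool elongated across the preferred orientation overlaps the bar only near the centre and produces little suppression. Because the semisaturation constant `sigma` dominates at low contrast, the preferred bar length grows as contrast falls, a simple form of contrast-dependent receptive field size.

```python
import math

# Sample points along the V1 unit's preferred-orientation axis (assumption).
POSITIONS = range(-20, 21)


def gaussian(x, sd):
    return math.exp(-x * x / (2.0 * sd * sd))


def bar_drive(length, contrast):
    """Stimulus drive at each position for a centred bar of odd `length`."""
    half = (length - 1) // 2
    return {x: (contrast if abs(x) <= half else 0.0) for x in POSITIONS}


def v1_response(length, contrast, pool_sd_along_bar,
                rf_sd=3.0, sigma=1.0, weight=1.0):
    """Normalized response of a hypothetical V1 unit centred on the bar.

    excitation: bar drive weighted by the classical receptive field.
    pool:       bar drive weighted by the inhibitory pool's profile along
                the preferred-orientation axis.
    response = excitation**2 / (sigma**2 + weight * pool**2)
    """
    s = bar_drive(length, contrast)
    excitation = sum(gaussian(x, rf_sd) * s[x] for x in POSITIONS)
    pool = sum(gaussian(x, pool_sd_along_bar) * s[x] for x in POSITIONS)
    return excitation ** 2 / (sigma ** 2 + weight * pool ** 2)


def length_tuning(contrast, pool_sd_along_bar):
    """Responses to centred bars of lengths 1, 3, ..., 41."""
    lengths = list(range(1, 42, 2))
    return lengths, [v1_response(n, contrast, pool_sd_along_bar) for n in lengths]


def preferred_length(contrast, pool_sd_along_bar):
    lengths, rs = length_tuning(contrast, pool_sd_along_bar)
    return lengths[rs.index(max(rs))]


def end_stopping_index(contrast, pool_sd_along_bar):
    """1 - (response to the longest bar) / (peak response); 0 = no endstopping."""
    _, rs = length_tuning(contrast, pool_sd_along_bar)
    return 1.0 - rs[-1] / max(rs)


ALIGNED = 8.0     # pool extends far along the preferred orientation
ORTHOGONAL = 1.0  # pool elongated across it, so it barely overlaps the bar

for name, pool_sd in [("aligned", ALIGNED), ("orthogonal", ORTHOGONAL)]:
    print(name,
          round(end_stopping_index(1.0, pool_sd), 2),  # suppression of long bars
          preferred_length(1.0, pool_sd),              # high contrast
          preferred_length(0.01, pool_sd))             # low contrast
```

Squaring both terms echoes the squaring often used in normalization models; pooling the linear drive before squaring, and collapsing the 2D pool to its profile along the bar, are further simplifications made only to keep the sketch short.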