Abstract
We have recently shown that human visual speed perception is based on the combination of sensory information across independent spatiotemporal frequency channels with a prior expectation for slow speeds. Here, we present experimental results that allow us to determine the exact form of information integration across these channels. We tested subjects in a two-alternative forced-choice (2AFC) speed discrimination task using stimuli moving at 3 degrees/second that selectively targeted four spatiotemporal frequency channels. Stimuli were either gratings with narrow-band motion energies (centered at 0.5, 1, 2, or 4 cycles/degree), each activating a single channel, or multiple-band gratings that activated two, three, or four channels simultaneously. Importantly, we calibrated the individual motion energy levels of the narrow-band stimuli such that they were matched in their response noise. First, we estimated the noise characteristics of all four channels and the local slope of the speed prior from the stimulus conditions that targeted single channels. We then used these parameters to predict the full psychometric curves for the remaining stimulus conditions that targeted multiple channels. We found that a Bayesian model with optimal cue combination successfully predicted both the gradual decrease in perceptual thresholds and the gradual decrease in biases with an increasing number of activated channels. This model clearly outperformed two alternative Bayesian models, which assumed that subjects either considered only the most reliable channel or formed an average percept across all channels. Our results provide strong evidence that the human visual system combines speed cues in a statistically optimal way based on cue reliability and prior expectations.