Abstract
When multiple visual stimuli are presented simultaneously in the receptive fields of neurons, the neurons respond surprisingly less than when the identical stimuli are presented sequentially. Why does a simple change in presentation sequence lower the response? The neural computations underlying this simultaneous suppression remain elusive. To answer this question, we used fMRI and computational modeling to test the extent to which linear spatial summation, compressive spatial summation (CSS), or compressive spatiotemporal summation (CST) within population receptive fields (pRFs) of visual areas predicts simultaneous suppression. Ten subjects participated in two sessions: (i) retinotopy, to independently estimate spatial pRF parameters, and (ii) a simultaneous/sequential experiment, in which colorful squares were presented either simultaneously or sequentially in the periphery. To separate nonlinearities in space and time, we also varied stimulus size and the number of transients (onsets or offsets) for both presentation types. We found that the amount of suppression increased along the visual hierarchy, starting already in V1. This suppression could not be explained by linear summation within pRFs, and its level did not vary systematically with pRF size. Across all visual areas, CST pRFs, but not CSS pRFs, best predicted individual voxel responses. The CST model predicted not only simultaneous suppression in pRFs overlapping multiple stimuli, but also the enhanced responses to shorter, transient stimuli and the modest increase in response to larger stimuli. These results indicate that spatial integration alone is insufficient to explain simultaneous suppression and underscore the crucial role of time in visual processing. Ventral visual areas, thought to process complex static image properties, were especially time-driven, forcing us to rethink their role in spatiotemporal visual processing.
Our computational framework provides a much-needed temporal extension for visual fMRI encoding models and a foundation for understanding spatiotemporal processing of dynamic visual stimuli in human visual cortex more broadly.
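The spatial half of the contrast tested here (linear versus compressive summation within a pRF) can be illustrated in a few lines. The following is a minimal Python sketch under stated assumptions, not the authors' model code: it assumes an isotropic 2D Gaussian pRF, two hypothetical non-overlapping squares, an illustrative exponent of 0.5, and treats the sequential response as the sum of responses to each square shown alone, ignoring temporal dynamics and the hemodynamic response. The helper names (`gaussian_prf`, `square_mask`, `prf_drive`, `suppression_ratio`) are hypothetical. Under linear summation (exponent 1) the simultaneous and sequential responses are equal; a compressive exponent below 1, as in CSS, makes the simultaneous response smaller. The CST model additionally compresses responses over time, which this sketch does not implement.

```python
import numpy as np

def gaussian_prf(x, y, x0=0.0, y0=0.0, sigma=1.0):
    """Isotropic 2D Gaussian pRF on a grid (illustrative parameters)."""
    return np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

def square_mask(x, y, cx, cy, half_width=0.5):
    """Binary aperture for one hypothetical square stimulus centered at (cx, cy)."""
    inside = (np.abs(x - cx) <= half_width) & (np.abs(y - cy) <= half_width)
    return inside.astype(float)

def prf_drive(prf, stim, exponent=1.0):
    """Stimulus/pRF overlap raised to an exponent.
    exponent == 1 -> linear spatial summation; exponent < 1 -> compressive (CSS-like)."""
    overlap = np.sum(prf * stim) / np.sum(prf)
    return overlap ** exponent

def suppression_ratio(prf, stims, exponent):
    """Simultaneous response divided by summed sequential responses.
    Assumption: the sequential response is the sum of single-stimulus responses."""
    simultaneous = prf_drive(prf, sum(stims), exponent)
    sequential = sum(prf_drive(prf, s, exponent) for s in stims)
    return simultaneous / sequential

# Visual-field grid (arbitrary units) and one pRF centered at the origin.
coords = np.linspace(-4, 4, 201)
x, y = np.meshgrid(coords, coords)
prf = gaussian_prf(x, y, sigma=1.5)

# Two hypothetical, non-overlapping squares placed symmetrically inside the pRF.
squares = [square_mask(x, y, -1.0, 0.0), square_mask(x, y, 1.0, 0.0)]

for n in (1.0, 0.5):
    print(f"exponent={n}: simultaneous/sequential = {suppression_ratio(prf, squares, n):.3f}")
```

Because the two squares drive this pRF equally, the compressive ratio is analytically 1/sqrt(2), about 0.71, while the linear ratio is 1. Linear summation therefore cannot produce simultaneous suppression, whereas spatial compression can.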