Abstract
The visual system integrates sensory input across space and time. To investigate how spatial and temporal information are processed in the human visual system, we developed and tested a spatiotemporal population receptive field (pRF) model. The spatiotemporal pRF model describes spatial responses with 2D Gaussians (Dumoulin & Wandell, 2008) and temporal responses with nonlinear neural impulse functions at millisecond resolution (Stigliani et al., 2019; Zhou et al., 2019). First, we developed software that synthesizes fMRI responses given a spatiotemporal pRF, a spatiotemporal visual stimulus, and a noise level. Conversely, given an fMRI time series and the stimulus, the software solves for the spatiotemporal pRF parameters. This allowed us to compare model solutions to ground truth: our model recovered spatiotemporal pRF parameters from noiseless synthetic fMRI time series with 99% accuracy. Second, we evaluated how well different pRF models (the conventional spatial pRF, a spatiotemporal pRF with compressive temporal summation, and a two-channel spatiotemporal pRF) predict empirical fMRI data. In the experiment, ten observers fixated while viewing a bar containing colored cartoon stimuli presented at varying temporal rates (33 ms to 5 s durations, 33 ms to 200 ms interstimulus intervals) that swept across the visual field (12° radius, 4 directions, 9 steps). Across ventral and lateral visual streams (V1, V2, V3, hV4, LO1, LO2, and TO1), the spatiotemporal pRF model predicted single-voxel responses more accurately than the spatial pRF model, and this advantage grew progressively from earlier to higher visual areas. This indicates that incorporating temporal nonlinearities into the spatial pRF is especially important for higher visual areas. Additionally, estimated pRF sizes were larger for the spatial model than for the spatiotemporal models, suggesting that the standard spatial pRF method may overestimate pRF size to account for extended temporal responses.
Together, this work provides a new framework and computational model for synthesizing and predicting responses in the visual system to any dynamic visual stimulus.
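To make the synthesize-then-solve workflow concrete, here is a minimal Python/NumPy sketch. It is not the authors' software. It assumes a forward model in the spirit of compressive temporal summation: a 2D Gaussian spatial pRF pools the stimulus at 1 ms resolution, the pooled drive is filtered by a neural impulse response, passed through a static power-law compression, and convolved with a hemodynamic response function. It then recovers the parameters from noiseless synthetic data by a coarse grid search. Everything specific here is a hypothetical choice made for illustration: the function names (`make_bar_stimulus`, `predict`, `synthesize`, `fit_grid`), the gamma-shaped impulse and hemodynamic responses, the exponent of 0.3, the 15×15 pixel field, the bar timing, and the grid-search fitting.

```python
import numpy as np

# Hypothetical sketch of a spatiotemporal pRF forward model (not the authors' code).
DT, BIN, TR = 0.001, 0.1, 1.0   # 1 ms model step, 100 ms bins for the HRF, 1 s TR (assumptions)
coords = np.arange(-7, 8, dtype=float)  # 15 x 15 visual field, 1 deg per pixel (assumption)
X, Y = np.meshgrid(coords, coords)


def gamma_kernel(t, n, tau):
    """Gamma-shaped kernel t^(n-1) * exp(-t/tau), normalized to unit sum."""
    k = t ** (n - 1) * np.exp(-t / tau)
    return k / k.sum()


IRF = gamma_kernel(np.arange(300) * DT, n=9, tau=0.005)        # assumed neural impulse response (peaks ~40 ms)
t_hrf = np.arange(200) * BIN
HRF = gamma_kernel(t_hrf, 6, 1.0) - gamma_kernel(t_hrf, 16, 1.0) / 6  # assumed double-gamma HRF


def make_bar_stimulus(step_s=1.0, on_s=0.1, off_s=0.1, blank_s=6.0, width=3):
    """Binary movie (frames x pixels): a flickering bar sweeps left-to-right, then top-to-bottom."""
    n_step = int(round(step_s / DT))
    period, on = int(round((on_s + off_s) / DT)), int(round(on_s / DT))
    flicker = (np.arange(n_step) % period) < on       # on/off flicker within one bar position
    movie = []
    for axis in (X, Y):                                # horizontal sweep, then vertical sweep
        for center in range(-6, 7, 2):
            mask = np.abs(axis - center) <= width // 2
            movie.append(flicker[:, None] & mask.ravel()[None, :])
    movie.append(np.zeros((int(round(blank_s / DT)), X.size), dtype=bool))  # blank so the HRF can decay
    return np.concatenate(movie).astype(np.float32)


def predict(params, stim, exponent=0.3):
    """Predicted BOLD time series (one sample per TR) for pRF params (x0, y0, sigma)."""
    x0, y0, sigma = params
    prf = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * sigma ** 2)).ravel()
    drive = stim @ prf                                       # spatial summation, one value per ms
    linear = np.convolve(drive, IRF)[: drive.size]           # causal temporal filtering
    neural = np.maximum(linear, 0.0) ** exponent             # static compressive nonlinearity
    binned = neural.reshape(-1, int(round(BIN / DT))).mean(axis=1)
    bold = np.convolve(binned, HRF)[: binned.size]           # hemodynamic blurring
    return bold[:: int(round(TR / BIN))]                     # sample once per TR


def synthesize(params, stim, noise_sd=0.0, seed=0):
    """Synthetic fMRI time series: model prediction plus Gaussian noise."""
    clean = predict(params, stim)
    return clean + noise_sd * np.random.default_rng(seed).standard_normal(clean.size)


def fit_grid(data, stim, xs, ys, sigmas):
    """Return the grid point whose prediction correlates best with the data."""
    best, best_r = None, -np.inf
    for x0 in xs:
        for y0 in ys:
            for s in sigmas:
                r = np.corrcoef(predict((x0, y0, s), stim), data)[0, 1]
                if r > best_r:
                    best, best_r = (x0, y0, s), r
    return best, best_r


if __name__ == "__main__":
    stim = make_bar_stimulus()
    grid = [-4.0, -2.0, 0.0, 2.0, 4.0]
    data = synthesize((2.0, -2.0, 1.5), stim)
    print(fit_grid(data, stim, grid, grid, [1.0, 1.5, 2.5]))
```

Because the ground-truth parameters lie on the search grid and the data are noiseless, the true pRF reproduces the data exactly and wins the grid search; a real solver would refine estimates continuously (for example, with a coarse-to-fine search) and would fit the temporal parameters as well, which this sketch holds fixed.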