Abstract
Brain-encoding models can be trained to learn the correspondence between visual stimuli and the brain's responses to those stimuli. To learn meaningful visual features, input images are commonly aligned with the participant's fovea, which complicates the use of training data acquired with free-viewing paradigms. Here, we tested whether an end-to-end brain encoder could be trained on movie-viewing fMRI data without requiring gaze fixation or image recentering. We trained a Neural Information Flow (NIF) model to predict responses in visual areas V1, V2, V3, hV4, V3a and V3b, using data from a Courtois-Neuromod project subject who watched three seasons of the sitcom Friends. With video stimuli as input, NIF couples each brain area with a tensor that encodes the spatiotemporal features represented in that area's activity. Because no eye-tracking data were acquired during viewing, gaze position over movie frames was estimated with DeepGaze MR. Input images were either recentered around this estimated gaze position or left uncentered. Preliminary results indicate that NIF learned biologically plausible features predictive of visual cortical activity without gaze fixation or input-image realignment. The distribution of variance explained across predicted voxels was similar regardless of whether gaze alignment was applied, indicating that training on recentered movie frames did not improve performance. Eye-tracking data acquired post hoc for individual Friends episodes showed that DeepGaze MR predicted gaze about as well as the assumption of central fixation. Moreover, 67% of the measured eye positions fell within the central 3.5 degrees of visual angle. These results suggest that recentering may not be required, and that brain-encoding models may learn visual representations from free-viewing data even without correcting for eye movements, albeit possibly with lower predictive performance than when eye-tracking data are available.
Our findings have implications for future work that trains models on free-viewing data collected without eye-tracking.
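The recentering step and the central-fixation statistic described above can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline. It rests on several assumptions. Gaze coordinates are in frame pixels. Regions shifted in from outside the frame are padded with a constant. A single pixels-per-degree factor converts on-screen distance to visual angle. "Central 3.5 degrees" is read as a 3.5-degree eccentricity radius, although it could also mean a 3.5-degree-wide central window. The function names `recenter_frame` and `fraction_within_radius` and the `pixels_per_degree` parameter are hypothetical.

```python
import numpy as np


def recenter_frame(frame: np.ndarray, gaze_xy, fill_value=0) -> np.ndarray:
    """Shift a frame (H, W) or (H, W, C) so the gaze pixel lands at the frame centre.

    Hypothetical helper: pixels shifted out of view are dropped, and the
    uncovered border is padded with `fill_value`.
    """
    h, w = frame.shape[:2]
    gx, gy = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    # Offset that maps the gaze pixel (gy, gx) onto the centre (h // 2, w // 2).
    dy, dx = h // 2 - gy, w // 2 - gx
    out = np.full_like(frame, fill_value)
    # Source rows/columns that stay inside the frame after shifting by (dy, dx).
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    if y0 < y1 and x0 < x1:
        out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = frame[y0:y1, x0:x1]
    return out


def fraction_within_radius(gaze_xy, center_xy, pixels_per_degree, radius_deg=3.5):
    """Fraction of gaze samples whose eccentricity from `center_xy` is <= radius_deg.

    Assumes a constant pixels-per-degree factor (a small-angle approximation).
    """
    offsets = np.asarray(gaze_xy, dtype=float) - np.asarray(center_xy, dtype=float)
    eccentricity_deg = np.hypot(offsets[:, 0], offsets[:, 1]) / pixels_per_degree
    return float(np.mean(eccentricity_deg <= radius_deg))


if __name__ == "__main__":
    frame = np.arange(25).reshape(5, 5)
    # The top-left pixel (the "gaze" point) is moved to the centre; the rest is padded with -1.
    print(recenter_frame(frame, gaze_xy=(0, 0), fill_value=-1))
    samples = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 0.0]])
    print(fraction_within_radius(samples, center_xy=(0, 0), pixels_per_degree=10))
```

Padding with a constant keeps every recentered frame at the original input size, so a model can consume recentered and non-recentered frames interchangeably. Cropping a fixed window around the gaze point would be an alternative design choice.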