Abstract
Allocentric (landmark-centered) and egocentric (eye-centered) visual information are optimally integrated for goal-directed movements. This integration has been observed within the supplementary and frontal eye fields, but its underlying mechanisms remain a puzzle, mainly because current theoretical models cannot explain data at different levels (i.e., behavior, single neuron, and distributed network). The purpose of this study was to create and validate a theoretical framework for this process using physiologically constrained inputs and outputs. To implement a general framework, we modelled the visual system as a Convolutional Neural Network (CNN) and the sensorimotor transformation as a Multilayer Perceptron (MLP). The network was trained on a task in which a landmark shifted relative to the saccade target. These visual stimuli (target and shifted landmark) were input to the CNN, the CNN output and initial gaze position were input to the MLP, and a decoder transformed the MLP output into saccade vectors. Networks were trained on both idealized and actual monkey gaze behavior. Decoded saccade output replicated the idealized training sets with various allocentric weightings, as well as actual monkey data (Bharmauria et al., Cerebral Cortex, 2020) in which the landmark shift had a partial influence (R² = 0.80). Furthermore, MLP output units accurately simulated motor response field shifts recorded from monkeys during the same paradigm, including open-ended response fields that shifted partially with the landmark. These results suggest that our framework is valid and provides a suitable tool for studying the underlying mechanisms of allocentric-egocentric integration and other complex visuomotor behaviors.
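To illustrate the pipeline described above, the following is a minimal sketch (not the authors' implementation; the PyTorch framing, layer sizes, and image dimensions are assumptions): a CNN encodes a retinal image containing the target and shifted landmark, its output is concatenated with the 2D initial gaze position and passed to an MLP, and a linear decoder reads out the saccade vector.

```python
import torch
import torch.nn as nn

class VisuomotorNet(nn.Module):
    """Sketch of the CNN -> MLP -> decoder pipeline (hypothetical sizes)."""
    def __init__(self, img_channels=1, mlp_hidden=128):
        super().__init__()
        # CNN: stand-in for the visual system encoding target + landmark
        self.cnn = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),      # -> 32 * 4 * 4 = 512 features
        )
        # MLP: sensorimotor transformation; input = CNN features + 2D gaze position
        self.mlp = nn.Sequential(
            nn.Linear(512 + 2, mlp_hidden), nn.ReLU(),
            nn.Linear(mlp_hidden, mlp_hidden), nn.ReLU(),
        )
        # Decoder: maps MLP (motor) units to a 2D saccade vector
        self.decoder = nn.Linear(mlp_hidden, 2)

    def forward(self, image, gaze):
        visual = self.cnn(image)                        # encode target + shifted landmark
        motor = self.mlp(torch.cat([visual, gaze], dim=1))
        return self.decoder(motor)                      # decoded saccade vector (x, y)

# Usage: a batch of retinal images and initial gaze positions
net = VisuomotorNet()
saccade = net(torch.randn(8, 1, 64, 64), torch.randn(8, 2))  # -> shape (8, 2)
```

In such a setup, the decoded saccade vectors would be compared against idealized training sets (with varying allocentric weights) or against recorded monkey gaze endpoints, and the MLP units would serve as the analogue of the recorded motor response fields.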