Abstract
Visual perception remains stable even though the retinal image shifts with every saccade. This raises the question of how the brain combines visual inputs that are separated in time and space into a unified, continuous representation of the world. The brain could solve this problem by retaining, updating and integrating visual feature information across saccades. However, no single model currently accounts for this process at the computational or algorithmic level. Previously, feedforward convolutional neural network (CNN) models, inspired by the hierarchical structure of visual processing in the ventral stream, have shown promising performance in object recognition (Bengio 2013). Here, we present a recurrent CNN that models the spatiotemporal mechanism of feature integration across saccades. Our network comprises five layers: an input layer that receives a sequence of gaze-centered images; a recurrent layer of neurons with V1-like receptive fields (the feature memory); a pooling layer over the feature maps, which reduces the spatial dependency of the feature information (similar to higher levels of the ventral stream); a convolutional map layer; and an output layer, fully connected to the convolutional map layer, that performs a categorization task. The network is trained on a memory-based feature integration task, in which it must categorize feature information integrated across different time points. Once trained, the model showed how feature representations are retained in the feature memory layer during a memory period and integrated with newly entering features. The next step is to incorporate internal eye movement information (intended eye displacement, eye velocity and eye position) into the model to examine the effect of intended eye movements on the updating of the feature maps.
Our preliminary results suggest that recurrent CNNs provide a promising model of human visual feature integration and may explain the spatiotemporal aspects of this phenomenon across both fixations and saccades.
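As a rough illustration of the five-layer architecture described above, the following is a minimal, untrained sketch in plain Python. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the class name RecurrentCNN, the Gabor-like oriented_kernel filters standing in for V1-like receptive fields, the leaky self-recurrence with a fixed decay factor, the per-channel 3x3 map kernels, the 12x12 image size and the random readout weights. It shows only the forward pass, in which features from one frame persist in the feature memory across a blank memory period and are combined with features from a later frame.

```python
import math
import random

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a 2-D list with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def oriented_kernel(theta, size=5):
    """Hypothetical V1-like filter: oriented cosine under a Gaussian envelope."""
    half = size // 2
    return [[math.exp(-(x * x + y * y) / (2.0 * 1.5 ** 2))
             * math.cos(2 * math.pi * (x * math.cos(theta) + y * math.sin(theta)) / 4.0)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

class RecurrentCNN:
    """Illustrative forward pass only; weights are random, not trained."""

    def __init__(self, image_size=12, n_classes=2, decay=0.8, seed=0):
        rng = random.Random(seed)
        self.decay = decay  # assumed leak factor of the feature memory
        self.v1_kernels = [oriented_kernel(k * math.pi / 4) for k in range(4)]
        # Assumed: one 3x3 kernel per feature channel in the map layer.
        self.map_kernels = [[[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(3)]
                            for _ in self.v1_kernels]
        side = (image_size - 4) // 2 - 2  # 5x5 conv, 2x2 pool, 3x3 conv
        n_features = len(self.v1_kernels) * side * side
        self.readout = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                        for _ in range(n_classes)]

    def forward(self, frames):
        memory = None
        for frame in frames:  # layer 1: sequence of gaze-centered images
            drive = [relu(conv2d(frame, k)) for k in self.v1_kernels]
            if memory is None:
                memory = drive
            else:  # layer 2: recurrent feature memory (leaky accumulation)
                memory = [[[self.decay * m + d for m, d in zip(mrow, drow)]
                           for mrow, drow in zip(mmap, dmap)]
                          for mmap, dmap in zip(memory, drive)]
        pooled = [max_pool(m) for m in memory]  # layer 3: pooling
        maps = [relu(conv2d(p, k))  # layer 4: convolutional map
                for p, k in zip(pooled, self.map_kernels)]
        features = [v for fmap in maps for row in fmap for v in row]
        scores = [sum(w * f for w, f in zip(ws, features))  # layer 5: output
                  for ws in self.readout]
        return softmax(scores), memory

# A vertical bar, a blank "memory period" frame, then a horizontal bar.
size = 12
vertical = [[1.0 if c == 6 else 0.0 for c in range(size)] for r in range(size)]
blank = [[0.0] * size for _ in range(size)]
horizontal = [[1.0 if r == 6 else 0.0 for c in range(size)] for r in range(size)]

model = RecurrentCNN(image_size=size)
probs, memory = model.forward([vertical, blank, horizontal])
```

In this sketch the blank frame contributes no new drive, so the memory layer simply holds a decayed copy of the first frame's features until the third frame arrives. A trained network of the kind described in the abstract would learn its recurrent and readout weights rather than use fixed or random ones.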
Meeting abstract presented at VSS 2016