Abstract
In less than the blink of an eye, the human brain processes visual sensory input, interprets the visual scene, identifies faces, and recognizes objects. Decades of neurophysiological studies have demonstrated that the brain accomplishes these complicated tasks through a dense network of feedforward and feedback neural processes in the ventral visual cortex. To date, these visual processes have been modeled primarily with feedforward hierarchical neural networks, and the computational role of feedback processes remains poorly understood. In this study, we developed a generative autoencoder neural network model and adversarially trained it on a large, categorically diverse data set of images (objects, scenes, faces, and animates). We hypothesized that the feedback processes in the ventral visual pathway can be represented by the reconstruction of visual information performed by the generative model. To test this hypothesis, we compared the representational similarity of the activity patterns in the internal layers of the proposed model with magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) data acquired while participants (N=15) viewed a set of 156 images organized into four categories: objects, scenes, faces, and animates. Our proposed model identified two segregated neural dynamics in the ventral visual pathway. The representational comparison with MEG data revealed a temporal hierarchy of processes transforming low-level visual information into high-level semantics in the feedforward sweep, followed by a temporally subsequent dynamic of inverse processes reconstructing low-level visual information from a high-level latent representation in the feedback sweep.
Further, representational comparison of the model's encoder and decoder layers with two fMRI regions of interest, namely early visual cortex (EVC) and the inferior temporal area (IT), revealed a growing categorical representation (similar to IT) along the encoder layers (feedforward sweep) and a progression toward detailed visual representations (akin to EVC) along the decoder layers (feedback sweep).