Michele Winter, Michael Eickenberg, Michael Oliver, Jack L. Gallant; Comparison of generic convolutional networks versus biologically inspired networks as models of V4 neurons. Journal of Vision 2020;20(11):461. doi: https://doi.org/10.1167/jov.20.11.461.
© ARVO (1962-2015); The Authors (2016-present)
Area V4 is an intermediate visual area for which no accurate computational models exist. Previous attempts to produce good V4 models were limited by the available neurophysiology data. We therefore performed long-term Utah array recordings in area V4 of two fixating macaques while presenting videos that simulated saccadic eye movements over natural moving scenes. The resulting data set comprises over 400 neurons, each stimulated with up to several million video frames. We used two types of neural network models to predict V4 neuron responses to these video sequences: a baseline model adapted from convolutional networks typically used in computer vision, and a deep convolutional energy (DCE) model that incorporates prior knowledge about the structure of the primate visual system. To construct the baseline model, each stimulus video frame was first passed through the standard pre-trained VGG-16 image classification network (Simonyan & Zisserman, 2014), and the resulting activations were then regressed against neuronal responses at multiple temporal delays. The baseline model provides a lower bound on prediction accuracy that more realistic models should exceed. To construct the DCE model, we designed a deep network consisting of several biologically inspired components arranged in sequence: a log-polar transform; quadrature-pair filters; pooling into a second stage of quadrature-pair filters; and a final pooling stage. The neurophysiology data were then used to train this network. Training influenced the structure and distribution of the quadrature-pair filters and the pooling between layers of the network. Our results show that both the baseline and DCE models can accurately predict V4 neuronal responses even at the video frame rate (16.6 ms per frame), and that the DCE model usually predicts responses significantly more accurately than the baseline model does.
However, the baseline model performs surprisingly well given that the source features were extracted by a network trained independently from our experiment.
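The regression stage of the baseline model, in which network activations are regressed against responses at multiple temporal delays, can be illustrated with a minimal sketch. The abstract does not specify the regression method, the delays, or the layers used, so everything below is an assumption made for illustration: ridge regression stands in for the unspecified regression, random arrays stand in for VGG-16 activations and recorded responses, and the names `make_lagged` and `fit_ridge` are hypothetical.

```python
import numpy as np

def make_lagged(features, delays):
    """Stack copies of `features` (frames x features), each shifted by one delay in frames."""
    T, F = features.shape
    lagged = np.zeros((T, F * len(delays)))
    for i, d in enumerate(delays):
        # Row t of this block holds the features of frame t - d;
        # the first d rows have no history and stay zero.
        lagged[d:, i * F:(i + 1) * F] = features[:T - d]
    return lagged

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic stand-ins (assumption): random "activations" and a simulated
# neuron whose response depends on the features shown two frames earlier.
rng = np.random.default_rng(0)
T, F = 2000, 8
features = rng.standard_normal((T, F))
true_w = rng.standard_normal(F)
response = np.zeros(T)
response[2:] = features[:-2] @ true_w
response += 0.1 * rng.standard_normal(T)

# Build the delayed design matrix and fit on the first 1500 frames.
delays = [0, 1, 2, 3]
X = make_lagged(features, delays)
train, test = slice(0, 1500), slice(1500, T)
w = fit_ridge(X[train], response[train], alpha=1.0)

# Evaluate on held-out frames with the correlation between
# predicted and simulated responses.
pred = X[test] @ w
r = np.corrcoef(pred, response[test])[0, 1]
print(f"held-out correlation: {r:.3f}")
```

Reshaping `w` to `(len(delays), F)` gives one weight vector per delay, so the delay at which the simulated neuron is driven shows up as the block with the largest weights. In a real analysis the regularization strength would typically be chosen by cross-validation rather than fixed.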