October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
Comparison of generic convolutional networks versus biologically inspired networks as models of V4 neurons
Author Affiliations & Notes
  • Michele Winter
    University of California, Berkeley
  • Michael Eickenberg
    University of California, Berkeley
  • Michael Oliver
    Allen Institute for Brain Science
  • Jack L. Gallant
    University of California, Berkeley
  • Footnotes
    Acknowledgements  This work was funded by the following sources: NIH NEI R01 EY012241-05, NIH NEI R01 EY019684-01A1, ONR 60744755-114407-UCB, ONR N00014-15-1-2861, NIH National Eye Institute Training Grant T32EY007043
Journal of Vision October 2020, Vol.20, 461. doi:https://doi.org/10.1167/jov.20.11.461
      Michele Winter, Michael Eickenberg, Michael Oliver, Jack L. Gallant; Comparison of generic convolutional networks versus biologically inspired networks as models of V4 neurons. Journal of Vision 2020;20(11):461. doi: https://doi.org/10.1167/jov.20.11.461.

      © ARVO (1962-2015); The Authors (2016-present)

Area V4 is an intermediate visual area for which no accurate computational models exist. Previous attempts to build good V4 models were limited by the available neurophysiology data. We therefore performed long-term Utah array recordings in area V4 of two fixating macaques while presenting videos that simulated saccadic eye movements over natural moving scenes. The resulting data set consists of over 400 neurons, each stimulated with up to several million video frames. We used two types of neural network models to predict V4 neuron responses to these video sequences: a baseline model adapted from convolutional networks typically used in computer vision, and a deep convolutional energy (DCE) model that incorporates prior knowledge about the structure of the primate visual system. To construct the baseline model, each stimulus video frame was first passed through the standard pre-trained VGG-16 image classification network (Simonyan & Zisserman, 2014), and the resulting activations were then regressed against neuronal responses at multiple temporal delays. The baseline model provides a lower bound on prediction accuracy that more realistic models should beat. To construct the DCE model, we designed a deep network consisting of several biologically inspired components arranged in sequence: a log-polar transform; quadrature-pair filters; pooling into a second stage of quadrature-pair filters; and a final pooling stage. The neurophysiology data were then used to train this network. Training influenced the structure and distribution of the quadrature-pair filters and the pooling between layers of the network. Our results show that both the baseline and DCE models can accurately predict V4 neuronal responses even at the video frame rate (16.6 ms), and that the DCE model usually predicts responses significantly more accurately than the baseline model does. However, the baseline model performs surprisingly well given that its source features were extracted by a network trained independently of our experiment.
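
The abstract does not give the equations of the DCE model, so the following is only a minimal sketch of what a quadrature-pair stage computes in general. It assumes the standard energy-model form, in which the outputs of two filters that differ in phase by 90 degrees are squared and summed. The sketch is one-dimensional, uses fixed Gabor filters rather than filters learned from data, and omits the log-polar transform and pooling stages. All function names and parameter values (`gabor_pair`, `energy`, `grating`, `SIZE`, `FREQ`, `SIGMA`) are illustrative assumptions, not taken from the study.

```python
import math


def gabor_pair(size, freq, sigma):
    """Return (even, odd) 1-D Gabor filters that differ in phase by 90 degrees."""
    center = (size - 1) / 2.0
    even, odd = [], []
    for i in range(size):
        x = i - center
        envelope = math.exp(-(x * x) / (2.0 * sigma * sigma))
        even.append(envelope * math.cos(2.0 * math.pi * freq * x))
        odd.append(envelope * math.sin(2.0 * math.pi * freq * x))
    return even, odd


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))


def energy(signal, even, odd):
    """Quadrature-pair energy: sum of the squared even and odd filter outputs."""
    return dot(signal, even) ** 2 + dot(signal, odd) ** 2


def grating(size, freq, phase):
    """1-D sinusoidal grating sampled on the same grid as the filters."""
    center = (size - 1) / 2.0
    return [math.cos(2.0 * math.pi * freq * (i - center) + phase)
            for i in range(size)]


# Illustrative parameter values (assumptions for this sketch).
SIZE, FREQ, SIGMA = 65, 0.1, 8.0
even, odd = gabor_pair(SIZE, FREQ, SIGMA)

# Probe with gratings at the filters' preferred frequency and eight phases.
phases = [k * math.pi / 4.0 for k in range(8)]
linear = [dot(grating(SIZE, FREQ, p), even) for p in phases]
energies = [energy(grating(SIZE, FREQ, p), even, odd) for p in phases]

# The single-filter output swings with phase; the energy stays nearly constant.
print("linear range:", max(linear) - min(linear))
print("relative energy spread:", (max(energies) - min(energies)) / max(energies))
```

In this sketch the output of the even filter alone changes sign as the grating phase shifts, whereas the summed squared outputs of the pair barely change. That phase tolerance is the usual motivation for quadrature-pair stages in models of visual neurons.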

