Dean Pospisil, Wyeth Bair; Comparing response properties of V1 neurons to those of units in the early layers of a convolutional neural net. Journal of Vision 2017;17(10):804. doi: https://doi.org/10.1167/17.10.804.
© ARVO (1962-2015); The Authors (2016-present)
Deep convolutional neural networks (CNNs) trained for object recognition contain units in their later layers that reflect an encoding somewhat similar to that in cortical areas V4 and IT. If the earlier stages of these CNNs also reflect response properties commonly observed in V1, then CNNs could offer a compelling image-computable model for understanding computations in the ventral stream. To test this, we measured the responses of the CNN known as AlexNet (Krizhevsky et al., 2012) to sinusoidal grating stimuli like those used extensively to characterize V1. We evaluated tuning for orientation, spatial frequency (SF) and color, as well as the F1/F0 ratio and cross-orientation suppression. In a complementary approach, we directly analyzed the weights (rather than the responses) of AlexNet for these properties. We found that the early layers evenly cover orientation and SF, with bandwidths similar to those in V1, and that the second layer contains only complex cells. The first layer consists primarily of a group of luminance filters and a smaller group of chromatic filters, with the luminance filters tending to prefer higher spatial frequencies. Consistent with cross-orientation suppression, the second-layer weights vary sinusoidally with the preferred orientation of their first-layer inputs. We also tested an untrained network and found that nearly all of these V1-like properties were absent. We conclude that a CNN can approximate several V1 response properties, and that optimizing for object recognition is sufficient to produce them. Our results support the use of CNNs as a tool to understand how sophisticated cortical representations, for which we do not yet have good biologically plausible image-computable models, may arise from early cortical representations, for which a richer set of models already exists.
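The grating-based F1/F0 and orientation measurements can be illustrated with a minimal sketch. It does not load AlexNet. Instead, as an assumption, a Gabor filter followed by a ReLU stands in for a first-layer unit, and a sum of four rectified, phase-shifted Gabor units stands in for a complex, second-layer-like unit. All function names and parameter values (image size, SF in cycles/pixel, envelope width) are hypothetical. F1/F0 is computed from the response to a grating stepped through one cycle of spatial phase; by the usual convention, values above 1 indicate simple cells and values below 1 indicate complex cells.

```python
import numpy as np

# Hypothetical stimulus/filter parameters (not taken from the study).
SIZE = 64      # image size in pixels
SF = 0.1       # spatial frequency in cycles per pixel
SIGMA = 8.0    # s.d. of the Gabor envelope in pixels


def _grid(size):
    """Pixel coordinates centred on the image midpoint."""
    y, x = np.mgrid[0:size, 0:size] - (size - 1) / 2.0
    return x, y


def grating(size, sf, ori, phase):
    """Full-contrast sinusoidal grating with values in [-1, 1]."""
    x, y = _grid(size)
    return np.cos(2 * np.pi * sf * (x * np.cos(ori) + y * np.sin(ori)) + phase)


def gabor(size, sf, ori, phase, sigma):
    """Gabor filter: Gaussian envelope times a sinusoidal carrier."""
    x, y = _grid(size)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * grating(size, sf, ori, phase)


def relu(v):
    return np.maximum(v, 0.0)


def make_simple_unit(ori):
    """Hypothetical first-layer-like unit: one linear filter, then a ReLU."""
    w = gabor(SIZE, SF, ori, 0.0, SIGMA)
    return lambda img: relu(np.sum(w * img))


def make_complex_unit(ori):
    """Hypothetical second-layer-like unit: sum of rectified outputs of four
    simple units whose carriers differ in phase by 90 degrees."""
    ws = [gabor(SIZE, SF, ori, p, SIGMA) for p in np.arange(4) * np.pi / 2]
    return lambda img: sum(relu(np.sum(w * img)) for w in ws)


def f1_f0_ratio(unit, ori, n_phases=64):
    """Ratio of the first-harmonic amplitude (F1) to the mean (F0) of the
    response to a grating stepped through one cycle of spatial phase."""
    phases = np.linspace(0, 2 * np.pi, n_phases, endpoint=False)
    r = np.array([unit(grating(SIZE, SF, ori, p)) for p in phases])
    spectrum = np.fft.rfft(r) / n_phases
    f0 = spectrum[0].real
    f1 = 2 * np.abs(spectrum[1])
    return f1 / f0


def orientation_tuning(unit, oris, n_phases=16):
    """Mean response over grating phase at each test orientation."""
    phases = np.linspace(0, 2 * np.pi, n_phases, endpoint=False)
    return np.array([np.mean([unit(grating(SIZE, SF, o, p)) for p in phases])
                     for o in oris])


if __name__ == "__main__":
    simple, complex_ = make_simple_unit(0.0), make_complex_unit(0.0)
    print(round(f1_f0_ratio(simple, 0.0), 2))    # ~1.57 (pi/2, half-wave rectified sinusoid)
    print(round(f1_f0_ratio(complex_, 0.0), 2))  # ~0.0 (phase-invariant response)
    oris = np.linspace(0, np.pi, 12, endpoint=False)
    print(oris[np.argmax(orientation_tuning(simple, oris))])  # 0.0, the preferred orientation
```

Pooling over quadrature phases is one common way to build a phase-invariant (complex) unit; whether AlexNet's second layer achieves its complex-cell behavior in this particular way is not claimed here.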
Meeting abstract presented at VSS 2017