Chen-Ping Yu, Talia Konkle; Map-CNN: A Convolutional Neural Network with Map-like Organizations. Journal of Vision 2017;17(10):809. doi: https://doi.org/10.1167/17.10.809.
© ARVO (1962-2015); The Authors (2016-present)
Deep convolutional neural networks (CNNs) are currently the best computational models of visual processing. A core operation of these models is convolution: each artificial neuron (i.e., each learned filter) of a CNN is swept across the entire input image to produce a response map. In contrast, neurons in the visual cortex have receptive fields tuned to particular features at particular locations, although a common assumption is that a small set of features is replicated in hypercolumns uniformly across all positions in the retinotopic map. Here we examined this assumption using a computational model with map-like early layers. We constructed a map-CNN in which the artificial neurons in the map layer have a spatial organization and receptive field scaling similar to those of human V1. First, retinotopy was implemented with local convolutions with unshared weights, with neurons arranged in a grid-like layout. Second, a retina-like transformation was applied to the input image, such that the image is increasingly compressed with distance from the center. Together, these two design choices naturally capture both the cortical magnification of the fovea and the scaling of receptive field size with eccentricity. Finally, the network was trained on 1000-way object classification using the ImageNet dataset. We found that the features learned at different positions of the visual field were not uniform, violating the convolutional assumption that the same features are represented across the visual field. Further exploration of these tunings showed that foveal map units (< 5°) had more Gaussian-blob tuning than peripheral map units, and that while edge filters were learned uniformly across the visual field, the orientations of those edge features exhibited substantial positional biases.
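The contrast between shared-weight convolution and "local convolutions with unshared weights" (often called a locally connected layer) can be illustrated with a minimal single-channel NumPy sketch. It assumes stride 1, no padding, and one input and one output channel; the function names are hypothetical and this is not the authors' implementation. Copying one kernel to every grid position makes the unshared layer reduce to ordinary convolution, while giving each position its own kernel is what allows the learned features to differ across the visual field.

```python
import numpy as np


def conv2d_shared(image, kernel):
    """Valid 2D cross-correlation: one kernel is reused at every position
    (the weight-sharing assumption of a standard CNN layer)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def conv2d_unshared(image, kernels):
    """Locally connected layer (hypothetical helper): kernels[i, j] is a
    separate filter owned by the output unit at grid position (i, j)."""
    out_h, out_w, kh, kw = kernels.shape
    if image.shape != (out_h + kh - 1, out_w + kw - 1):
        raise ValueError("image shape does not match the kernel grid")
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[i, j])
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

# Copying one kernel to all 6x6 positions recovers ordinary convolution.
tiled = np.broadcast_to(kernel, (6, 6, 3, 3))
assert np.allclose(conv2d_unshared(image, tiled), conv2d_shared(image, kernel))

# Independent per-position kernels: 36 times as many weights as the shared layer.
per_position = rng.standard_normal((6, 6, 3, 3))
print(kernel.size, per_position.size)  # 9 324
```

The price of unshared weights is the parameter count, which grows with the number of grid positions; that trade-off is what lets such a model test whether position-specific features emerge from training.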
These results demonstrate that features learned from natural image statistics to support successful object recognition are naturally heterogeneous across the visual field, and they make testable predictions about the spatial distribution of feature tuning in retinotopic areas.
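The retina-like input transformation, in which the image is compressed with increasing distance from the center, can be sketched as a radial resampling. The abstract does not specify the warping function, so this NumPy sketch assumes an exponential radial mapping, g(r) = (e^(a·r) − 1)/(e^a − 1) on a normalized radius r, with nearest-neighbor sampling; `foveate` and its parameter `a` are hypothetical. Because g is convex, neighboring output pixels near the center sample the input densely (magnifying the fovea) while those near the edge sample it sparsely (compressing the periphery). A fixed-size kernel applied to the warped image therefore covers a larger region of the original image at greater eccentricity, which is one way to see how such a design yields receptive field size scaling.

```python
import numpy as np


def foveate(image, a=2.0):
    """Resample a 2D image so the center is magnified and the periphery is
    compressed. Assumed radial mapping (not from the source):
    r_in = (exp(a * r_out) - 1) / (exp(a) - 1), with radii normalized so
    the midpoints of the image edges lie at r = 1."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    r_out = np.hypot(xs, ys)
    r_in = np.expm1(a * r_out) / np.expm1(a)
    # Radial scale factor r_in / r_out; at the exact center use its limit a / (e^a - 1).
    safe_r = np.where(r_out > 0, r_out, 1.0)
    scale = np.where(r_out > 0, r_in / safe_r, a / np.expm1(a))
    # Corners (r_out > 1) map beyond the image, so clamp them to the border.
    x_in = np.clip(xs * scale, -1, 1)
    y_in = np.clip(ys * scale, -1, 1)
    # Nearest-neighbor lookup back into integer pixel indices.
    cols = np.rint((x_in + 1) / 2 * (w - 1)).astype(int)
    rows = np.rint((y_in + 1) / 2 * (h - 1)).astype(int)
    return image[rows, cols]


# A horizontal ramp whose value equals its column index makes the sampling visible.
ramp = np.tile(np.arange(101, dtype=float), (101, 1))
warped = foveate(ramp)
row = warped[50]
# Step between neighboring output pixels: small at the center, larger at the edge.
print(row[51] - row[50], row[100] - row[99])
```

As `a` approaches 0 the mapping approaches the identity, so `a` acts as a single knob for the strength of foveal magnification in this sketch.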
Meeting abstract presented at VSS 2017