Abstract
Any visual system, biological or artificial, faces an inherent trade-off between the number of units used to represent the visual environment and the spatial resolution of the sampling array. The human visual system can locally allocate attention to reconfigure its sampling array of cortical receptive fields (RFs), enhancing spatial resolution at attended locations in the visual field without changing the overall number of sampling units.
Here, we examine how features in a convolutional neural network interact and interfere with each other in an eccentricity-dependent RF pooling array and how these interactions are influenced by dynamic changes in spatial resolution across the array. We study feature interactions within the framework of visual crowding, a well-characterized perceptual phenomenon in which target objects in the visual periphery that are easily identified in isolation are much more difficult to identify when flanked by similar nearby objects.
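To make the pooling scheme concrete, the following is a minimal NumPy sketch of an eccentricity-dependent pooling array, assuming Gaussian pooling windows whose width grows linearly with distance from fixation. The function name, the linear scaling rule, and all parameter values are illustrative assumptions, not the model's actual implementation.

```python
# Minimal sketch: pool a 1D feature map with Gaussian RFs whose
# width grows linearly with eccentricity (distance from fixation).
# Scaling constants and the 1D layout are illustrative assumptions.
import numpy as np

def eccentricity_pooling(features, fixation, n_units=32, slope=0.1, base_sigma=1.0):
    """Pool a 1D feature map with eccentricity-scaled Gaussian RFs."""
    n = features.shape[0]
    centers = np.linspace(0, n - 1, n_units)            # RF centers across the image
    x = np.arange(n)
    pooled = np.empty(n_units)
    for i, c in enumerate(centers):
        sigma = base_sigma + slope * abs(c - fixation)  # RF size grows with eccentricity
        w = np.exp(-0.5 * ((x - c) / sigma) ** 2)
        pooled[i] = (w * features).sum() / w.sum()      # normalized Gaussian pooling
    return pooled

features = np.random.rand(256)
out = eccentricity_pooling(features, fixation=128)      # fine pooling near fixation, coarse in periphery
```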
Our model replicates basic properties of human visual crowding, including anisotropies based on inner/outer and radial/tangential spatial configurations of targets and flankers. Moreover, by separately simulating the effects of spatial attention on RF size and on the density of the pooling array, we demonstrate that increased RF density benefits target classification in crowded stimuli more than changes in RF size. Finally, we compare the effects of attention and of target-flanker spacing on visual crowding and find that enhanced redundancy of feature representation (due to the increased density of RFs at the target location under attention) influences target classification more than enhanced fidelity of the feature representations themselves (due to increased target-flanker spacing).
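A hedged sketch of how the two attentional manipulations could be expressed over such a pooling array: one shrinks RF widths near the attended location (the size manipulation), the other pulls RF centers toward it (the density manipulation). The Gaussian attention field and the gain parameters are assumptions made for illustration, not the paper's fitted values.

```python
# Two attentional reconfigurations of the pooling array above.
# The Gaussian attention field (strength, spread) is an illustrative choice.
import numpy as np

def attend_rf_size(sigmas, centers, attended, strength=0.5, spread=20.0):
    """Shrink RF widths near the attended location (size manipulation)."""
    gain = strength * np.exp(-0.5 * ((centers - attended) / spread) ** 2)
    return sigmas * (1.0 - gain)                  # smaller sigma -> finer local pooling

def attend_rf_density(centers, attended, strength=0.5, spread=20.0):
    """Pull RF centers toward the attended location (density manipulation)."""
    gain = strength * np.exp(-0.5 * ((centers - attended) / spread) ** 2)
    return centers + gain * (attended - centers)  # centers migrate toward the target

centers = np.linspace(0, 255, 32)
sigmas = 1.0 + 0.1 * np.abs(centers - 128)        # baseline eccentricity scaling
sigmas_att = attend_rf_size(sigmas, centers, attended=200.0)
centers_att = attend_rf_density(centers, attended=200.0)
```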
These results provide (1) insights into the use of dynamic RF pooling arrays in artificial neural networks and (2) testable hypotheses for future perceptual and physiological studies of visual crowding.