Abstract
Neurons in visual cortex are sensitive to context. Neural responses to stimuli presented within their classical receptive fields (CRFs) are modulated by the presence of other stimuli, both within the CRF and in the surrounding extra-classical receptive field (eCRF). While these effects have been extensively studied since the inception of visual neuroscience, the circuit mechanisms that shape them and their role in visual perception are not yet understood. We start from the recurrent neural network by Serre et al. (VSS 2020), which can be optimized on visual tasks to implement hierarchical contextual interactions through horizontal (within a layer) and top-down (between layers) connections. When optimized for object segmentation, the model rivals human performance and exhibits CRF and eCRF phenomena associated with primate vision, despite having no explicit constraints to do so. With this model, we ask (i) how CRF and eCRF phenomena relate to task performance and (ii) what circuit mechanisms are necessary for their emergence. First, to achieve human-level performance along with CRF and eCRF phenomena, a model must learn to implement divisive operations, such as shunting inhibition, through its horizontal and top-down connections. Critically, models learn to do this only when they are trained on images with intact semantic-level features. This means that task performance and CRF and eCRF phenomena are consequences of horizontal and top-down connections optimized for parsing natural scenes. We validated this finding by developing a procedure for generating contextual stimuli from optimized models. Models that exhibit CRF and eCRF effects can generate stimuli that cause lightness or tilt illusions in humans. These effects are concomitant with spatially distinct near-excitatory vs. far-inhibitory eCRFs in the model, resembling the spatial arrangement long speculated for primate visual cortex.
Overall, our work demonstrates how recurrent circuitry and visual statistics combine to cause both human-level performance and neural biases.
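
To make the notion of a divisive operation concrete, the sketch below contrasts shunting (divisive) inhibition with subtractive inhibition for a single unit. It is not the model described above: it assumes a generic textbook membrane equation, and the function names and the `leak` and `ceiling` parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def shunting_steady_state(excitation, inhibition, leak=1.0, ceiling=1.0):
    """Steady state of dv/dt = -leak*v + (ceiling - v)*excitation - v*inhibition.

    Setting dv/dt = 0 gives v = ceiling*excitation / (leak + excitation + inhibition),
    so inhibition appears in the denominator and rescales the response (divisive).
    This equation is an assumed, generic form, not the paper's circuit.
    """
    excitation = np.asarray(excitation, dtype=float)
    inhibition = np.asarray(inhibition, dtype=float)
    return ceiling * excitation / (leak + excitation + inhibition)


def subtractive_steady_state(excitation, inhibition, leak=1.0):
    """Steady state of dv/dt = -leak*v + excitation - inhibition, rectified at zero.

    Here inhibition shifts the response down and can silence the unit entirely.
    """
    excitation = np.asarray(excitation, dtype=float)
    inhibition = np.asarray(inhibition, dtype=float)
    return np.maximum(excitation - inhibition, 0.0) / leak


if __name__ == "__main__":
    center_drive = 2.0                      # hypothetical CRF (center) input
    surround = np.array([0.0, 1.0, 3.0])    # hypothetical eCRF (surround) inhibition
    print("shunting:   ", shunting_steady_state(center_drive, surround))
    print("subtractive:", subtractive_steady_state(center_drive, surround))
```

Under these assumptions, increasing surround inhibition scales the shunting response down without ever driving it to zero, whereas the subtractive response is abolished once inhibition exceeds the center drive.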