Abstract
As visual information propagates along the ventral pathway, individual neurons respond selectively to stimulus features of increasing complexity. Neurons in primary visual cortex (V1) respond to oriented gratings, while many neurons of the second visual area (V2) respond to more complex patterns, such as the pseudo-periodic structure of visual texture. Although V1 responses are well explained by receptive field models localized in space, orientation, and scale, it is unknown how a neuron in V2, receiving input from a subset of V1 afferents, achieves selectivity for the complex visual features common to natural scenes.
Recently, we have shown that, by computing differences between populations of V1-like units, V2-like units can become sensitive to higher-order statistics of natural texture, beyond the oriented-energy (spectral) features relayed by V1.
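As a toy illustration of this idea (our own sketch for exposition: the filter, energy computation, and difference operation below are assumptions, not the model itself), a unit that differences the oriented energy of two V1-like subunits responds only when spectral content varies across its receptive field:

```python
# Toy sketch (illustrative assumptions throughout): a V2-like unit that
# differences the oriented energy of two V1-like subunits at neighboring
# positions. Matched energy cancels, so the unit signals spatial variation
# in spectral content, a simple higher-order texture statistic.
import numpy as np

def oriented_energy(patch, filt):
    """V1-like energy: squared response of an oriented linear filter."""
    return float((patch * filt).sum() ** 2)

def v2_like_difference(patch_a, patch_b, filt):
    """Half-rectified difference of two V1-like energies: zero whenever
    both locations carry equal energy at this orientation and scale."""
    return max(oriented_energy(patch_a, filt)
               - oriented_energy(patch_b, filt), 0.0)

rng = np.random.default_rng(0)
filt = rng.standard_normal((8, 8))      # stand-in oriented filter
textured = rng.standard_normal((8, 8))  # patch with structure
blank = np.zeros((8, 8))                # patch without
print(v2_like_difference(textured, blank, filt))  # > 0: energies differ
```

Such a unit responds to imbalances of energy across its inputs rather than to the total energy itself, which is what makes it sensitive to structure that spectral features alone cannot capture.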
Here we test these theoretical predictions against single-unit recordings from areas V1 and V2 of an awake, fixating macaque. After characterizing each unit’s receptive field with standard methods, we presented novel stimuli generated by superimposing patches of oriented gratings at multiple positions and scales. We fit a two-stage convolutional linear-nonlinear model to the resulting responses: stimuli are first processed by a convolutional bank of V1-like filters, each localized in position, orientation, and scale, whose outputs are rectified; we then jointly optimize a linear-nonlinear combination of these rectified units to explain the observed neural responses. The model often explains a substantial fraction of the response variance, comparable to or better than that explained by existing models. Qualitative differences emerge between models fit to V1 and V2 data. As expected, V1 models respond to a narrow range of positions, orientations, and scales, whereas models fit to V2 cells often exhibit sensitivity to differences in oriented energy across stimulus position, scale, and orientation. By explicitly combining V1-like afferent activity, our two-stage model can explain V2’s selectivity for higher-order stimulus features.
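For concreteness, here is a minimal sketch of the two-stage model’s forward computation. The specific choices (Gabor filters for the V1-like stage, half-wave rectification, a squaring output nonlinearity, and all parameter values) are illustrative assumptions, not the fitted model described above:

```python
# Minimal sketch of a two-stage convolutional linear-nonlinear model.
# Gabor filters, half-wave rectification, and the squaring output
# nonlinearity are assumed choices for illustration.
import numpy as np
from scipy.signal import convolve2d

def gabor(size, theta, scale, phase=0.0):
    """Oriented Gabor filter, a standard stand-in for a V1-like unit."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # axis rotated to theta
    envelope = np.exp(-(x**2 + y**2) / (2.0 * scale**2))
    carrier = np.cos(2.0 * np.pi * xr / (2.0 * scale) + phase)
    return envelope * carrier

def v1_stage(image, thetas, scales, size=15):
    """Stage 1: convolve with a bank of oriented filters at several scales,
    then half-wave rectify. Output: (n_filters, H, W), one map of V1-like
    afferent activity over position per orientation/scale pair."""
    maps = [np.maximum(convolve2d(image, gabor(size, t, s), mode='same'), 0.0)
            for t in thetas for s in scales]
    return np.stack(maps)

def v2_stage(afferents, weights):
    """Stage 2: signed linear combination of the rectified V1-like afferents
    (negative weights implement differences across position, orientation,
    and scale), followed by an output nonlinearity."""
    drive = np.sum(weights * afferents)   # weights has the afferents' shape
    return np.maximum(drive, 0.0) ** 2.0  # assumed output nonlinearity

# In fitting, the weights would be jointly optimized so that the output of
# v2_stage matches the recorded responses across the stimulus ensemble.
```

In this sketch, a V1-like fit would concentrate weight on a single channel and location, whereas a V2-like fit would spread signed weight across channels and positions, implementing the energy differences described above.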