Timothy D Oleskiw, Eero P Simoncelli; A canonical computational model of cortical area V2. Journal of Vision 2019;19(10):14b. doi: https://doi.org/10.1167/19.10.14b.
As visual information propagates along the ventral pathway, individual neurons respond selectively to stimulus features of increasing complexity. Neurons in primary visual cortex (V1) respond to oriented gratings, while many neurons of the second visual area (V2) respond to more complex patterns, such as the pseudo-periodic structure of visual texture (Freeman et al., 2013; Yu et al., 2015). Although V1 neurons are well explained by simple receptive field models localized in space, orientation, and scale, it is unknown how a neuron in V2, receiving input from a subset of V1 afferents, achieves selectivity for complex visual features common to natural scenes. We recently leveraged the statistics of natural images to learn a low-dimensional computational model of visual texture selectivity (Oleskiw & Simoncelli, SfN 2018). Our model, trained to detect naturalistic statistics across stimuli matched in local orientation energy, learns computational units resembling localized differences (i.e., derivatives) spanning the 4-dimensional space of V1 selectivity, namely horizontal and vertical position, orientation, and scale.

In the present work, we use this observation to construct a population of V2-like units that captures statistics of natural images. Images are first encoded in the rectified responses of a population of V1-like units localized in position, orientation, and scale. We then explicitly compute derivatives of local spectral energy via linear combinations over the 4D space of V1 activity. A low-dimensional population of simple (rectified linear) and complex (squared) V2-like units is shown to accurately distinguish natural textures from spectrally matched noise, outperforming even the optimal Fisher linear discriminant trained over the full set of V1 afferents (91% vs. 86% accuracy, using only 0.2% of available dimensions).
We conclude by demonstrating how this canonical and physiologically plausible model of V2 computation selectively captures complex features of natural images from the local spectral energy conveyed by V1.
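The pipeline described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the array sizes, the use of simple finite differences as the "derivative" operator, and the global spatial pooling are all illustrative assumptions. It shows only the structure of the computation: a rectified V1 energy tensor indexed by the four dimensions of V1 selectivity, linear derivative units over that tensor, and rectified ("simple") and squared ("complex") V2-like readouts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the V1 stage: rectified responses of localized oriented
# filters, arranged as a 4-D tensor over (horizontal position, vertical
# position, orientation, scale). Sizes here are arbitrary.
v1_energy = np.maximum(rng.standard_normal((16, 16, 8, 4)), 0.0)

# Derivatives of local spectral energy: finite differences along each of
# the four dimensions of V1 selectivity. Each difference is a linear
# combination of V1 afferent responses, as described in the abstract.
derivs = [np.diff(v1_energy, axis=ax) for ax in range(4)]

# V2-like units: "simple" units half-wave rectify the derivative
# responses; "complex" units square them. Pooling by averaging over the
# tensor is an assumption made here for compactness.
simple_units = np.array([np.maximum(d, 0.0).mean() for d in derivs])
complex_units = np.array([(d ** 2).mean() for d in derivs])

# A low-dimensional V2-like feature vector (8-D in this toy example),
# which could feed a texture-vs-noise classifier.
v2_features = np.concatenate([simple_units, complex_units])
```

In this sketch, `v2_features` plays the role of the low-dimensional V2 population response; in the actual study such features are compared against a Fisher linear discriminant trained on the full set of V1 afferents.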