Abstract
Humans and monkeys can effortlessly recognize objects in natural scenes. This ability relies on neural computations in the ventral stream of visual cortex, which culminates in inferotemporal cortex (IT), where neurons respond selectively to natural scenes and objects. The intermediate computations in V2 and V4 that lead to IT object selectivity are not well understood, but previous studies implicate V4 as an early site of selectivity for object shape. To explore the mechanisms of this selectivity, we generated “scrambled” Portilla-Simoncelli textures from natural images using techniques that preserve the local statistics of the original image while discarding structural information about scene and shape. To create a continuum of images (an “image family”) that varies smoothly between fully scrambled textures and natural images, we varied the size of the scrambling regions from small localized regions to the whole image. For all sizes, these scrambling regions seamlessly covered the whole image, with modest overlap. We measured the responses of single units in awake macaque V4 to these images. Responses in V4 varied widely across cells and across image families. On average, V4 neurons were slightly more active in response to natural images than to their scrambled counterparts. However, the subset of cells that responded most strongly to each image family showed both 1) a much stronger difference between natural and scrambled images, and 2) a graded level of modulation for images with intermediate scrambling-region sizes. Similarly, the subset of the most effective images for each cell evoked stronger, more graded modulation. The population preference for natural images emerged slowly, roughly 80 ms after the onset of neuronal activity. Our results suggest that, for a subset of cells and for favored images, object-selective responses emerge dynamically in V4.