Abstract
Inferotemporal (IT) cortex contains neurons that respond selectively to complex visual images such as faces, hands, and bodies. Such neural tuning is typically probed in highly controlled experiments in which stimuli are presented in isolation. However, our everyday visual experience is far more complex. Here, we explored the tuning properties of IT neurons under naturalistic conditions. We recorded from microelectrode arrays implanted in the middle and posterior face patches of two adult monkeys. Tuning was first characterized with a rapid serial visual presentation of 3,000 images. Consistent with prior studies, neurons in these face patches responded more strongly to human and monkey faces than to any other image category. We then probed tuning to large natural scene images. To identify the scene components that modulated these neurons, we presented each image centered at different positions relative to each neuron’s receptive field, so that responses to all parts of each scene were sampled across the experiment. Consistent with our initial results, neurons responded most strongly when faces were centered within their receptive fields. However, neurons also often responded to non-face images, such as cookies, that contained features typical of faces: dark, round things on a tan background. Strikingly, neurons also responded when occluded faces were presented within their receptive fields. For example, neurons did not respond to a picture of Stuart Anstis’s hat held in his hand, but did respond when the same hat occluded his face. Together, our results demonstrate that face-selective neurons are sensitive not only to features common to faces but also to contextual cues. We propose that visual features are the building blocks of face-selective neurons, and that context-dependent responses arise from learning statistical regularities in the environment. In support of this, visual features that co-occur with faces, such as bodies, appear to be critical for these contextual responses.