Abstract
The act of finding a specific visual object begins with our brains forming a representation of the identities of the objects in view, “invariant” to identity-preserving transformations such as changes in the objects' position, size, and background context. Next, our brains must combine this visual representation with a working memory representation of target identity to determine whether the currently viewed scene contains the target object. Both of these computations are thought to be implemented in inferotemporal cortex (IT); however, the means by which they interact is not well understood. To assess this, we trained subjects to perform an invariant delayed-match-to-sample object search task in which objects appeared under a variety of identity-preserving transformations and the same objects appeared as targets and as distractors in different blocks of trials. While subjects performed this task, we recorded the simultaneous activity of small populations of neurons in IT. We found that population performance, assessed by a linear read-out of object identity, was significantly higher when objects were targets than when the same objects were distractors. Further, we found that the increased population performance could be attributed simply to higher firing-rate responses to targets and did not depend on other factors such as changes in trial-by-trial variability or changes in the structure of noise correlations. These results extend previous descriptions of attentional modulation in IT and reveal that visual and working memory signals are combined via a mechanism that enhances invariant object representations of targets relative to distractors.