Abstract
The dynamics of object population representations in inferior temporal (IT) cortex are not well understood. Here we analyze single-unit recordings from monkey IT to investigate how representational distinctions emerge over time and how well they can be explained by computational features of differing complexity (GIST and the layers of a deep neural network). Single-unit activity was recorded from 989 neurons in the inferior bank of the superior temporal sulcus of two adult macaques passively viewing 100 grayscale object images from five categories (faces, body parts, fruit, objects, indoor scenes) presented for 300 ms at fixation at 5 degrees visual angle (Bell et al., 2011). We analyzed activity patterns across all visually responsive neurons with dynamic representational similarity analysis. Both within- and between-category distinctions arise early, with the earliest distinctions evident 50-80 ms after stimulus onset. Between-category distinctions tend to peak earlier (130-160 ms after stimulus onset) than within-category distinctions (about 10-50 ms later), possibly because between-category pairs of exemplars are also visually more distinct. When the effect of low-level visual similarity is removed by regressing out the pattern of distinctions predicted by the GIST model, within-category distinctions are entirely explained for most categories. However, between-category distinctions remain substantial and significant at longer latencies after stimulus onset. We also used the layers of a deep neural network (Krizhevsky et al., 2012) to explain the representational dissimilarities as a function of time, using a linear model containing one dissimilarity predictor for each layer. The high-level semantic layer 7 explained distinctions arising late (around 130-200 ms), while lower layers explained both early and late distinctions.
Taken together, these results suggest that the IT code reflects both visual similarity and category distinctions and that the latter require some recurrent processing to emerge.
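For readers unfamiliar with the analysis, the steps named above (time-resolved dissimilarity matrices, regressing out a model's predicted dissimilarities, and a linear model with one predictor per network layer) can be sketched roughly as follows. This is an illustration under assumptions, not the authors' pipeline: the abstract does not state the dissimilarity measure or the fitting procedure, so correlation distance and ordinary least squares are used here as stand-ins, and all function names are hypothetical.

```python
import numpy as np

def rdm_vector(patterns):
    """Pairwise dissimilarities for one time bin.

    patterns: (n_stimuli, n_neurons) response matrix.
    Returns the upper triangle of the 1 - Pearson r matrix
    (correlation distance is an assumption, not from the abstract).
    """
    r = np.corrcoef(patterns)
    iu = np.triu_indices(patterns.shape[0], k=1)
    return 1.0 - r[iu]

def dynamic_rdms(responses):
    """responses: (n_time, n_stimuli, n_neurons) -> (n_time, n_pairs)."""
    return np.stack([rdm_vector(responses[t]) for t in range(responses.shape[0])])

def regress_out(neural_rdms, model_rdm):
    """Remove an intercept and one model predictor (e.g. a GIST RDM)
    from the neural RDM at every time bin; returns the OLS residuals."""
    X = np.column_stack([np.ones_like(model_rdm), model_rdm])
    beta, *_ = np.linalg.lstsq(X, neural_rdms.T, rcond=None)
    return neural_rdms - (X @ beta).T

def layer_weights(neural_rdms, layer_rdms):
    """Fit neural RDMs with one dissimilarity predictor per layer.

    layer_rdms: (n_layers, n_pairs).
    Returns OLS weights of shape (n_time, n_layers), intercept dropped.
    """
    X = np.column_stack([np.ones(layer_rdms.shape[1]), layer_rdms.T])
    beta, *_ = np.linalg.lstsq(X, neural_rdms.T, rcond=None)
    return beta[1:].T
```

Given binned firing rates of shape (time, stimuli, neurons), `dynamic_rdms` yields one dissimilarity vector per time bin; `regress_out` and `layer_weights` then operate on those vectors, so the time course of each residual or weight can be inspected bin by bin.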
Meeting abstract presented at VSS 2016