Abstract
We have a remarkable ability to remember the images that we have seen. At the same time, we remember some images better than others. Image memorability could arise from multiple sources, including differences in how images are encoded in high-level brain areas such as inferotemporal cortex (IT), where the identities of objects and scenes are thought to be reflected as patterns of spikes across the neural population. We hypothesized that image memorability arises from a complementary coding scheme in IT, as a consequence of the total number of spikes evoked by an image. To test this hypothesis, we recorded IT neural responses as two rhesus monkeys performed a visual memory task in which they reported whether images were novel or familiar. The memorability of each image was determined by a deep convolutional neural network (CNN) designed to predict image memorability for humans. We found that the images predicted to be the most memorable for humans were also the most memorable for the monkeys. Additionally, the correlation between image memorability scores and IT population response magnitudes was strong (r = 0.68; p = 10⁻¹⁵), consistent with our hypothesis. Finally, to determine the degree to which this variation follows naturally from a system designed to identify (as opposed to remember) image content, we probed CNN models trained to categorize objects but not explicitly trained to predict memorability. We found that correlations between image memorability and population response magnitudes emerged at the higher stages of these networks, where visual representations are most analogous to those in IT. Together, these results suggest that image memorability is directly related to variation in the magnitude of the IT population response, and that this variation is a natural consequence of visual systems designed to identify objects.
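As a rough illustration of the central analysis summarized above, the sketch below correlates per-image memorability scores with the summed spike count across a recorded neural population. It is not the authors' analysis code: the array names, shapes, and simulated data are assumptions made purely for illustration.

```python
# Hypothetical sketch: correlate CNN-predicted memorability scores with
# IT population response magnitude (summed spikes per image).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_images, n_neurons = 100, 50

# memorability[i]: assumed CNN-predicted memorability score for image i (0-1 range)
memorability = rng.uniform(0.4, 1.0, size=n_images)

# spike_counts[i, j]: simulated spike count of IT neuron j in response to image i,
# generated so that more memorable images evoke slightly more spikes on average
spike_counts = rng.poisson(lam=10 + 5 * memorability[:, None],
                           size=(n_images, n_neurons))

# Population response magnitude: total spike count across the population per image
population_magnitude = spike_counts.sum(axis=1)

# Pearson correlation between memorability and population response magnitude
r, p = pearsonr(memorability, population_magnitude)
print(f"Pearson r = {r:.2f}, p = {p:.1e}")
```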