Fenil Doshi, Hrag Pailian, George A. Alvarez; Using Deep Convolutional Neural Networks to Examine the Role of Representational Similarity in Visual Working Memory. Journal of Vision 2020;20(11):149. doi: https://doi.org/10.1167/jov.20.11.149.
© ARVO (1962-2015); The Authors (2016-present)
To what extent does representational similarity between items affect our ability to store them in visual working memory? This question has been addressed for simple perceptual features, such as color (Lin & Luck, 2009), but has been difficult to address for higher-level shape representations, where the nature of the feature space is unknown. Here we approach this challenge by leveraging deep neural networks to generate stimuli designed to produce maximal responses in higher-level primate visual cortex. This approach enables us to generate stimuli that vary in their degree of similarity in high-level feature space.
To generate synthetic images that differentially drive V4 neural activity in a predictable fashion, we adapted the methods of Bashivan et al. (2019), who created a V4 encoding model by mapping AlexNet conv-3 activations onto macaque V4 activity. Because this image-computable model is fully differentiable, the gradients of its artificial neurons can be computed with respect to the image pixels (initialized as random noise), and the pixels can then be adjusted to maximize the predicted response of a neural site. Indeed, Bashivan et al. showed that, when presented to macaques, the resulting synthetic images increased V4 activity relative to the previously best-known drivers of this area.
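The pixel-optimization idea described above can be illustrated with a minimal sketch. It does not use the actual V4 encoding model: the single `tanh` unit, the names `predicted_response`, `response_gradient`, and `synthesize`, the step size, and the pixel range are all assumptions chosen so the example stays small and self-contained.

```python
import numpy as np

def predicted_response(pixels, weights):
    """Toy stand-in for an encoding model: one smooth unit, tanh(w . x)."""
    return np.tanh(weights @ pixels)

def response_gradient(pixels, weights):
    """Analytic gradient of tanh(w . x) with respect to the pixels."""
    return (1.0 - np.tanh(weights @ pixels) ** 2) * weights

def synthesize(weights, n_steps=200, step_size=0.1, seed=0):
    """Gradient ascent on the pixels, starting from low-amplitude random noise."""
    rng = np.random.default_rng(seed)
    pixels = rng.normal(scale=0.01, size=weights.shape)
    for _ in range(n_steps):
        # Move the pixels in the direction that increases the predicted response.
        pixels = pixels + step_size * response_gradient(pixels, weights)
        # Keep pixel values inside an assumed valid range of [-1, 1].
        pixels = np.clip(pixels, -1.0, 1.0)
    return pixels

# Hypothetical 64-pixel "image" and randomly chosen unit weights.
toy_weights = np.random.default_rng(1).normal(size=64)
noise_start = np.random.default_rng(0).normal(scale=0.01, size=64)
optimized = synthesize(toy_weights, seed=0)

print(predicted_response(noise_start, toy_weights))
print(predicted_response(optimized, toy_weights))
```

In the real setting the gradient would come from automatic differentiation through the network and the neural mapping rather than from a hand-written formula, but the update loop has the same shape.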
Using the encoding model of Bashivan et al., we generated a family of textures, estimated their predicted neural responses, and computed a representational dissimilarity matrix (the difference in predicted neural response for every image pair). We then had participants perform a change detection task and tested whether changes were easier to detect between images with higher predicted neural dissimilarity than between images predicted to evoke more similar neural responses. Performance was consistent with the model predictions: change detection was more accurate for displays containing representationally distant vs. close images, t(7) = -3.80, p < .01. This neural-network-guided approach may prove instrumental for generating biologically plausible hypotheses about VWM architecture and its underlying constraints.
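As an illustration of the dissimilarity computation, the sketch below builds a representational dissimilarity matrix from a small array of predicted responses. The abstract does not state which distance measure was used, so Euclidean distance is an assumption here, and the function name `dissimilarity_matrix` and the toy response values are hypothetical.

```python
import numpy as np

def dissimilarity_matrix(responses):
    """Pairwise Euclidean distances between rows.

    responses: array of shape (n_images, n_sites), one predicted
    response vector per image. Returns an (n_images, n_images) matrix.
    """
    # Broadcast to get every pairwise difference vector, then take its norm.
    diffs = responses[:, None, :] - responses[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# Three hypothetical images, each with predicted responses at two sites.
toy_responses = np.array([[0.0, 0.0],
                          [3.0, 4.0],
                          [0.0, 1.0]])
rdm = dissimilarity_matrix(toy_responses)
print(rdm)
```

With such a matrix, image pairs can be sorted into representationally "distant" and "close" sets by thresholding or ranking the off-diagonal entries.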