Abstract
Familiar concepts can be described by their visual and semantic features, and these two types of information are difficult to dissociate in mental representations. In a recent study we used visual and language deep neural networks (DNNs) to disentangle and quantify the unique contributions of visual and semantic information to human mental representations of familiar stimuli. We found a larger contribution of visual than semantic information during stimulus presentation in perception, but a reversed pattern when stimuli were recalled from memory based on their names. Here we adopt the same methodology to ask how long after stimulus offset visual dominance shifts to semantic dominance, as the duration for which visual information is retained following stimulus offset has been debated. To that end, across two studies, we manipulated the delay between stimulus offset and recall from memory. In Study 1, participants rated the visual similarity of pairs of familiar faces under simultaneous presentation and under sequential presentation with 2-, 5-, or 10-second delays. We extracted representations of the faces from a face-trained DNN and of their Wikipedia descriptions from a language model. In Study 2, we used data collected by Bainbridge et al. (2019), in which participants were presented with an image of a scene and asked to copy it while viewing it, to draw it 1 second or 10 minutes after it was removed, or to draw the scene based on its name alone, with no prior exposure. We extracted representations of the drawings from an object-trained DNN fine-tuned on drawings and of their Wikipedia descriptions from a language model. Both studies revealed visual dominance after stimulus offset across all delays, and semantic dominance when stimuli were retrieved from memory based on their names. We conclude that visual information remains dominant even 10 minutes after stimulus offset, whereas semantic information dominates the representation when a stimulus is recalled based on verbal information.
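For readers unfamiliar with this approach, the following is a minimal sketch of how the unique contributions of visual (DNN-derived) and semantic (language-model-derived) similarity to human similarity ratings could be quantified. It assumes a simple hierarchical-regression (variance-partitioning) scheme with synthetic per-pair data; the variable names, the regression approach, and the data are illustrative assumptions, not the exact analysis reported in the studies.

```python
# Illustrative sketch (not the authors' exact pipeline): estimating the unique
# contributions of visual and semantic similarity to human pairwise ratings.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical per-pair predictors and ratings (one row per stimulus pair).
n_pairs = 200
visual_sim = rng.normal(size=n_pairs)     # e.g., cosine similarity of face-DNN embeddings
semantic_sim = rng.normal(size=n_pairs)   # e.g., cosine similarity of language-model embeddings
human_ratings = 0.6 * visual_sim + 0.3 * semantic_sim + rng.normal(scale=0.5, size=n_pairs)

def r2(X, y):
    """R-squared of a linear regression of y on the columns of X."""
    return LinearRegression().fit(X, y).score(X, y)

full = r2(np.column_stack([visual_sim, semantic_sim]), human_ratings)
unique_visual = full - r2(semantic_sim.reshape(-1, 1), human_ratings)   # R2 lost when visual is dropped
unique_semantic = full - r2(visual_sim.reshape(-1, 1), human_ratings)   # R2 lost when semantic is dropped

print(f"full R2 = {full:.2f}, unique visual = {unique_visual:.2f}, "
      f"unique semantic = {unique_semantic:.2f}")
```

In this scheme, "visual dominance" in a given condition would correspond to the unique visual term exceeding the unique semantic term, and vice versa for semantic dominance.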