Abstract
The ability to accurately retrieve visual details of past events is a fundamental cognitive function in daily life. While a visual stimulus contains an abundance of information, only some of it is later encoded into long-term memory. An ongoing challenge, however, has been to objectively define and isolate which representations of past visual experiences are maintained in memory over time. To address this question, we leveraged the hierarchical structure of convolutional neural networks (CNNs) and its correspondence to human visual processing. Participants, recruited via the Amazon Mechanical Turk platform, performed the task online: they first encoded a set of images and were then tested with a two-alternative forced-choice recognition memory test at one of three encoding-retention intervals (immediate, 24 hours, or 7 days). Importantly, to objectively isolate different levels of visual processing, distractors were selected according to their similarity to the target along specific layers of the CNN (VGG-16) hierarchy: each distractor was assigned to share high similarity with the target in early, intermediate, or late network layers. Preliminary results suggest that high-level representations (corresponding to late network layers) are better retained over time, whereas low-level representations (corresponding to early network layers) decay faster. This experimental approach and the resulting findings provide novel insights into the dynamics of different levels of visual memory representation over time.
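The abstract does not specify how layer-wise similarity was implemented. A minimal sketch of the general approach follows, assuming torchvision's pretrained VGG-16, cosine similarity as the metric, and hypothetical layer indices standing in for the early, intermediate, and late stages (the actual layers and metric used in the study are not stated here).

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical stage choices: indices into vgg16.features marking the ends of
# convolutional blocks 1, 3, and 5. The study's actual layer selection may differ.
LAYER_STAGES = {"early": 4, "intermediate": 16, "late": 30}

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

def activation_at(image: torch.Tensor, layer_idx: int) -> torch.Tensor:
    """Propagate a preprocessed 1x3x224x224 image tensor through vgg16.features
    and return the flattened activation at `layer_idx`."""
    x = image
    for i, layer in enumerate(model.features):
        x = layer(x)
        if i == layer_idx:
            return x.flatten(start_dim=1)
    raise IndexError(f"layer index {layer_idx} out of range")

def layerwise_similarity(target: torch.Tensor, candidate: torch.Tensor) -> dict:
    """Cosine similarity between a target and a candidate distractor image
    at each stage of the network hierarchy."""
    with torch.no_grad():
        return {
            stage: F.cosine_similarity(
                activation_at(target, idx), activation_at(candidate, idx)
            ).item()
            for stage, idx in LAYER_STAGES.items()
        }
```

Under this scheme, an "early-layer" distractor would be a candidate image scoring high on early-stage similarity to the target, and analogously for the intermediate and late conditions.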