Abstract
We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive image properties (e.g., memorability, aesthetics, valence), properties for which we often lack concrete visual definitions. Starting from input noise, GANs generate a manifold of natural-looking images with fine-grained differences in their visual attributes. By navigating this manifold, we can visualize what it looks like for a particular GAN-generated image to become more (or less) memorable.
Specifically, we trained a Transformer module to learn the direction in which to move a BigGAN image’s noise vector in order to increase (or decrease) its memorability, as assessed by an off-the-shelf Assessor (MemNet). After training, we generated a test set of 1.5K “seed images”, each with four “clone images”: two modified to be more memorable (one and two “steps” forward along the learned direction) and two to be less memorable (one and two steps backward; examples in Supplemental). Assessed memorability increased significantly with each step along the learned direction (β = 0.68, p < 0.001), indicating that training was successful. Through a behavioral repeat-detection memory experiment, we then verified that our method’s manipulations causally affect human memory performance (β = 1.92, p < 0.001).
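To make the training step concrete, the following is a minimal sketch of this objective under one plausible reading of the method. All names, dimensions, and hyperparameters are illustrative assumptions; the toy generator and assessor merely stand in for BigGAN and MemNet.

```python
# Hedged sketch of the latent "stepping" idea (not the paper's code).
# A learned direction theta shifts a noise vector z by alpha "steps";
# the objective asks the assessor's score of the shifted image to
# change by alpha relative to the seed image's score.
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed noise dimensionality

class ToyGenerator(nn.Module):
    """Stand-in for BigGAN: maps noise to an image tensor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 3 * 8 * 8)

    def forward(self, z):
        return self.net(z).view(-1, 3, 8, 8)

class ToyAssessor(nn.Module):
    """Stand-in for MemNet: maps an image to a scalar property score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3 * 8 * 8, 1)

    def forward(self, img):
        return self.net(img.flatten(1)).squeeze(-1)

class DirectionModule(nn.Module):
    """Learns one latent-space direction; alpha scales the step size."""
    def __init__(self):
        super().__init__()
        self.theta = nn.Parameter(0.01 * torch.randn(LATENT_DIM))

    def forward(self, z, alpha):
        return z + alpha * self.theta  # broadcast: (B,1) * (D,) -> (B,D)

def learn_direction(generator, assessor, steps=100, batch=32):
    """Fit a direction so stepping by alpha shifts the score by alpha."""
    for p in list(generator.parameters()) + list(assessor.parameters()):
        p.requires_grad_(False)  # only the direction is trained
    module = DirectionModule()
    opt = torch.optim.Adam(module.parameters(), lr=1e-3)
    for _ in range(steps):
        z = torch.randn(batch, LATENT_DIM)
        alpha = torch.empty(batch, 1).uniform_(-0.5, 0.5)
        # Target: the seed image's score, shifted by alpha steps.
        target = assessor(generator(z)) + alpha.squeeze(-1)
        loss = ((assessor(generator(module(z, alpha))) - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return module
```

Under this sketch, a seed and its clones would correspond to decoding the same z at alpha values such as -2, -1, +1, and +2 steps.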
The seeds and their clones (i.e., “visual definitions”) surfaced candidate image properties (e.g., “object size”, “colorfulness”) that may underlie memorability but were previously overlooked. These candidates correlated with the learned memorability direction. We furthermore demonstrate that stepping along a separately learned “object size” direction indeed increases human memory performance, though less strongly (β = 0.11, p < 0.001). This showcases how the individual, causal effect of each candidate property can be studied further within the same framework.
Finally, we show that by substituting the Assessor, our framework also provides visual definitions for aesthetics (β = 0.72, p < 0.001) and emotional valence (β = 0.44, p < 0.001).
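In the sketch above, this substitution would require no other changes: the property being “defined” is determined entirely by which scoring network plays the Assessor role. The assessors below are hypothetical placeholders, not real aesthetics or valence models.

```python
# Continuing the sketch: swapping the assessor swaps the property being
# "defined"; the generator and objective stay fixed (placeholders only).
G = ToyGenerator()
memorability_direction = learn_direction(G, ToyAssessor())  # MemNet's role
aesthetics_direction = learn_direction(G, ToyAssessor())    # an aesthetics scorer's role
```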