Abstract
Memorability, or the likelihood that a stimulus will be remembered, is an intrinsic stimulus property that is highly consistent across viewers; i.e., people tend to remember and forget the same images as one another. The consistency of this property has been established across a wide range of stimulus types, including faces (Bainbridge et al., 2013), scenes (Isola et al., 2014), words (Xie et al., 2020), and even dance moves (Ongchoco et al., 2022). However, research on stimulus memorability has until now been limited entirely to the visual domain. Using stimuli from a large-scale voice database (Garofolo, 1993), we tested whether this consistency in what individuals remember and forget extends to auditory stimuli, and if so, whether similar factors influence memory for voices and for images. Over 1,000 Amazon Mechanical Turk workers participated in a continuous recognition task in which they heard a sequence of different speakers speaking the same sentence and pressed a key whenever they heard a repeated voice. We found that participants were indeed significantly consistent in the voices they remembered and forgot, allowing us to calculate an intrinsic memorability score for each voice. Next, we predicted the memorability of voices from a mix of low-level properties (e.g., fundamental frequency, harmonic amplitude) and high-level properties (e.g., gender, dialect). Our model was significantly predictive of voice memorability, successfully cross-validating across independent sets of stimuli. A model containing only low-level features explained more variance than comparable models of image memorability (Kramer et al., 2022). In sum, our results reveal parallels between the memorability of visual and auditory stimuli and suggest subtle differences in the factors that make a voice or an image memorable.