Abstract
Prior work has suggested that the memorability of an image is consistent across people and can therefore be treated as an intrinsic property of the image. Using computer vision models, we can make specific predictions about what people will remember or forget. While earlier work used a now-outdated deep learning architecture to predict image memorability, innovations in the field have provided new techniques to apply to this problem. Here, we propose and evaluate five deep learning alternatives to MemNet that exploit developments from the last five years, chiefly the introduction of residual neural networks. We also evaluate the pre-existing implementation of MemNet on a broader set of images. The five new models, which differ architecturally from one another, were implemented and tested on a mixture of MemNet's original training set, LaMem, and a more recent dataset, MemCat. LaMem is a large database of objects and scenes, many of which are designed to have high memorability; MemCat complements it with a large number of exemplars per object category. All of the new models use residual neural networks, whose skip connections are intended to mimic the structure of pyramidal cells, in their feature-extraction stages, allowing the models to use semantic information when estimating memorability. The most complex model also uses semantic segmentation, which assigns a semantic category to each pixel. Our findings suggest that the original paper overstated MemNet's generalizability and that MemNet was likely overfitting on LaMem. Our new models all outperform MemNet and achieve similar scores to one another, but when retraining is allowed, the semantic-segmentation-based model outperforms the rest. We conclude that residual networks outperform simpler convolutional neural networks in memorability regression, which will in turn improve memory researchers' ability to predict memorability for a wider range of images.
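The skip connections mentioned above can be illustrated with a minimal sketch of a residual block: the block computes a transformation F(x) and adds the unchanged input back, so features (and gradients during training) can bypass the transformation. This is an illustrative toy in NumPy, not the paper's implementation; all names here are hypothetical.

```python
import numpy as np

def relu(x):
    # Standard rectified-linear nonlinearity.
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: y = relu(x + W2 @ relu(W1 @ x)).

    The identity shortcut (the "+ x") is the skip connection that
    distinguishes residual networks from plain convolutional stacks.
    """
    f = w2 @ relu(w1 @ x)   # the learned transformation F(x)
    return relu(x + f)      # skip connection adds the input back

# With zero weights, F(x) = 0 and the block reduces to the identity
# on non-negative input, which is what makes deep residual stacks
# easy to optimize: each block only has to learn a correction to x.
x = np.array([1.0, 2.0, 3.0])
w = np.zeros((3, 3))
y = residual_block(x, w, w)  # equals relu(x + 0) = x
```

Real residual networks replace the matrix products with convolutions and insert normalization layers, but the additive shortcut is the same.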