Abstract
[Introduction] People typically feel uneasy when observing robots and computer-graphics characters that resemble humans but are not perfectly human-like, an effect known as the “uncanny valley.” Several empirical studies examining affective responses to images morphed between human and non-human categories have suggested that visual cues from two different categories elicit conflicting inferences about the entity, producing feelings of eeriness. However, the detailed relationship between visual representations and emotional responses remains unclear. Artificial neural networks (ANNs) that can predict a relevant text description for a given image are promising models for providing insight into the processes underlying human cognition, because their training exposes them to vast numbers of instances of human affective responses to visual concepts. In this study, we investigated how an ANN evaluates the match between morphed images and the affective words used to describe uncanny valley effects in previous studies.

[Methods] We created stimulus images by morphing between human faces and non-human objects at five morph levels and assessed how well each image matched a set of words using CLIP (Contrastive Language–Image Pre-training), a state-of-the-art ANN that estimates the semantic match between an image and a caption. Ho and MacDorman proposed indices of humanness, eeriness, and attractiveness, measured on semantic differential scales, for evaluating the affective responses of human observers in studies of the uncanny valley. We calculated CLIP scores for the adjectives comprising these three indices and examined how the indices changed across morph levels.

[Results and Conclusions] The eeriness index was highest at the midpoint of the morph continuum, where the conflict between visual cues was maximal. This result indicates that, through training on an enormous amount of data covering our daily visual experiences, CLIP has come to associate visual cue conflicts in images with eerie impressions. The present study thus used an ANN to explore how visual representations relate to human observers’ sentiments.
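The Methods describe scoring each morphed image against the adjectives comprising Ho and MacDorman’s three indices and tracking the per-index averages across morph levels. The following is a minimal sketch of that scoring step, assuming the Hugging Face transformers implementation of CLIP; the checkpoint name, the prompt template, the two-adjective lists standing in for each index, and the image file names are all illustrative assumptions, not the authors’ actual configuration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; the abstract does not specify which CLIP variant was used.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical subsets of the adjectives in Ho and MacDorman's indices,
# used here only to show the mechanics of per-index scoring.
INDICES = {
    "humanness": ["humanlike", "natural"],
    "eeriness": ["eerie", "creepy"],
    "attractiveness": ["attractive", "pleasant"],
}

def index_scores(image_path: str) -> dict:
    """Average CLIP image-text matching scores per index for one image."""
    image = Image.open(image_path)
    scores = {}
    for name, adjectives in INDICES.items():
        # Assumed prompt template; the paper's exact captions may differ.
        prompts = [f"a face that looks {adj}" for adj in adjectives]
        inputs = processor(text=prompts, images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            # logits_per_image: scaled image-text similarities, shape (1, n_prompts)
            logits = model(**inputs).logits_per_image
        scores[name] = logits.mean().item()
    return scores

# Score each of the five morph levels (hypothetical file names,
# e.g. 0 = non-human object, 4 = human face).
for level in range(5):
    print(level, index_scores(f"morph_level_{level}.png"))
```

Averaging raw matching scores within an index is only one simple aggregation; the study’s actual index construction follows Ho and MacDorman’s semantic differential scales, which pair opposing anchor adjectives rather than averaging unipolar prompts as done here.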