Abstract
Animacy may be a fundamental dimension in human object perception. We applied dimensionality reduction to deep layer activations in a convolutional neural network (CNN) trained for object classification to construct a two-dimensional visual object space. One of these two dimensions approximates an animate-inanimate distinction. Most inanimate objects map onto the inanimate part of object space, and most animate objects onto the animate part. However, some images are projected onto the “wrong” side of an animacy classification boundary. In these instances, an object's identity is incongruent with its location in object space. In a series of studies, we explored what happens during these visuo-semantic clashes in visual object discrimination and classification. In studies 1 (N = 70), 2 (N = 43, preregistration: https://osf.io/g7kuy), and 3 (N = 511, preregistration: https://osf.io/tqgdk), we found that the degree to which distance in object space accounts for object discrimination speed depends on both real and predicted animacy, i.e., whether the object truly is animate and whether it looks animate according to the CNN. Furthermore, semantic relationships between to-be-discriminated objects greatly affected performance in cases of a visuo-semantic clash. In study 4 (N = 32, preregistration: https://osf.io/4sc5x), people classified images, presented either very briefly (33 ms) or for longer (500 ms), as animate or inanimate. In cases of a visuo-semantic clash, people were less accurate in their categorization specifically when presentation was short and masked, a manipulation that may interrupt recurrent visual object processing. Overall, we suggest that studies 1-4 reflect the integration of top-down information with an initial feedforward sweep of visual information through the visual pathway.
A visual dimension that closely, but not fully, maps onto the animate-inanimate distinction is a main dimension along which objects truly differ from one another, and CNNs and humans may extract this information directly from visual statistics.
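
To make the notion of an object space and a visuo-semantic clash concrete, the following is a minimal sketch rather than the pipeline used in the studies. It assumes principal component analysis as the dimensionality reduction step (the abstract does not name the method), uses synthetic activations in place of real CNN layer outputs, and places the animacy boundary at zero on the first dimension. The function names `build_object_space` and `find_clashes` are hypothetical.

```python
import numpy as np


def build_object_space(activations, n_dims=2):
    """Project activations (n_images x n_units) onto their top principal axes.

    PCA is an assumption here; the source only says "dimensionality reduction".
    """
    centered = activations - activations.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_dims].T


def find_clashes(coords, is_animate, animacy_dim=0, boundary=0.0):
    """Return predicted animacy and a mask of images on the 'wrong' side.

    The boundary value of 0.0 is an illustrative assumption.
    """
    scores = coords[:, animacy_dim]
    # The sign of a principal axis is arbitrary, so orient it such that
    # animate images have the higher mean score.
    if scores[is_animate].mean() < scores[~is_animate].mean():
        scores = -scores
    predicted_animate = scores > boundary
    return predicted_animate, predicted_animate != is_animate


# Synthetic stand-in for deep layer activations (not real CNN output).
rng = np.random.default_rng(0)
n_images, n_units = 100, 64
is_animate = np.repeat([True, False], n_images // 2)

# Plant ten images whose visual appearance contradicts their true animacy.
planted = np.array([0, 1, 2, 3, 4, 50, 51, 52, 53, 54])
looks_animate = is_animate.copy()
looks_animate[planted] = ~looks_animate[planted]

# One strong direction in activation space carries visual animacy.
direction = rng.normal(size=n_units)
direction /= np.linalg.norm(direction)
signs = np.where(looks_animate, 1.0, -1.0)
activations = rng.normal(size=(n_images, n_units)) + 6.0 * signs[:, None] * direction

coords = build_object_space(activations)
predicted, clash = find_clashes(coords, is_animate)
print("number of visuo-semantic clashes:", int(clash.sum()))
```

In this toy setup, real animacy is the label `is_animate`, predicted animacy is the side of the boundary on which an image falls, and a clash is any image for which the two disagree. With real data, `activations` would be replaced by the layer outputs of a trained network.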