Abstract
When one triangle moves closer to another triangle on a screen, we readily perceive it as approaching the other. What we see in dynamic scenes transcends physical features such as position and velocity, extending into social realms. Here, we explore the rich representations people extract from viewing Heider-Simmel-type (HS) animations by modeling human similarity judgments. In an odd-one-out task, observers viewed three HS animations at a time, each showing two moving shapes that depicted a social interaction (such as “hug”, “follow”, or “fight”). Similarity scores were derived from these odd-one-out judgments for 27 animations with different semantic labels. Human similarity scores revealed animation clusters, including an approaching-related cluster (e.g., hug, huddle) and a distance-keeping cluster (escape, avoid, throw). These results were replicated in a second group of participants whose instructions explicitly emphasized social interactions, and similarity judgments showed high consensus between the two groups (r=.75). We next derived similarity scores for the animations from three types of models. Models based on low-level motion features (average velocity, speed, acceleration, and speed change) predicted human similarity judgments to some extent, with average speed performing best (r=.30). Three deep learning models (social graphic neural network, LSTM, and bag-of-words), trained on hundreds of comparisons of HS animations, were weaker predictors (r=.21, r=.21, and r=.24, respectively). Human similarity judgments were best predicted by a social force model (r=.35), which estimates latent forces capturing the interactions of two entities that repel at close distances, attract at moderate distances, and do not interact at far distances. Critically, the latent force representations remained significant contributors after controlling for average speed (semipartial correlation r=.17).
Thus, when viewing scenes consisting of moving shapes, people may perceive the dynamics through latent representations based on social forces. These findings also offer insight into how social perception may have developed by scaffolding onto the perceptual processes that support intuitive physics.
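
As an illustration of how pairwise similarity scores can be derived from odd-one-out judgments, the following is a minimal sketch under stated assumptions. The abstract does not give the exact formula used, so the scoring rule here is an assumption: the similarity of a pair is the proportion of trials containing both animations in which neither was chosen as the odd one out. The function name `similarity_from_odd_one_out` and the toy trials are hypothetical and are not data from the study.

```python
from collections import defaultdict
from itertools import combinations


def similarity_from_odd_one_out(trials):
    """Estimate pairwise similarity from odd-one-out choices.

    trials: iterable of (triplet, odd), where triplet is a tuple of three
    animation labels and odd is the label chosen as the odd one out.
    Assumed scoring rule (not stated in the source): for each pair,
    similarity = (# trials where the pair was shown and neither member
    was chosen as odd) / (# trials where the pair was shown together).
    """
    kept_together = defaultdict(int)
    shown_together = defaultdict(int)
    for triplet, odd in trials:
        if odd not in triplet:
            raise ValueError(f"odd item {odd!r} is not in triplet {triplet!r}")
        # Sort labels so each unordered pair has a single canonical key.
        for a, b in combinations(sorted(triplet), 2):
            shown_together[(a, b)] += 1
            if odd != a and odd != b:
                kept_together[(a, b)] += 1
    return {
        pair: kept_together[pair] / shown_together[pair]
        for pair in shown_together
    }


# Hypothetical toy judgments, for illustration only.
trials = [
    (("hug", "follow", "fight"), "fight"),
    (("hug", "follow", "escape"), "escape"),
    (("hug", "fight", "escape"), "hug"),
]
sim = similarity_from_odd_one_out(trials)
for pair, score in sorted(sim.items()):
    print(pair, round(score, 2))
```

Scores from a rule of this kind form a symmetric similarity matrix over the animations, which can then be correlated with model-derived similarity scores.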