Abstract
How can we tell that a falling leaf is an object? That a ball hitting a basket was likely launched by an intelligent mind? The spontaneity of social perception hides a rich complexity of inferred agency: agents' intentions and plans; their states of mind and relationships; reasoning about physical forces and constraints (Heider and Simmel 1944). Many studies have examined aspects of perceived agency and proposed models of joint belief-desire inference. However, most are limited to simple displays. Here, we introduce a system for generating Heider-Simmel-like animations of social interactions in a physical world. The animations can be synthesized automatically, using a hierarchical planner and a physics engine, or via an online interface in which humans control geometric shapes to enact social interactions. The resulting animations depict agents and objects in a continuous physical world with landmarks and obstacles. Agents have a limited field of view and can interact in ways such as helping, fighting, chasing, cooperating, and carrying. Our system enables procedural generation of hundreds of unique animations, which can be used for human studies or for benchmarking machine perception. The system provides a record of the trajectories of all entities, the forces exerted by agents, and the agents' goals, relationships, and strengths. Experimental evaluation shows that humans describe the depicted scenarios as a wide range of real-life social interactions, rate the simulated agent behaviors as highly human-like, and infer the agents' goals and relationships accurately. While human inferences of the agents' goals and relationships are predicted with high accuracy by a Bayesian inverse-planning method, state-of-the-art DNN models fail to achieve similar results. In addition, we train a DNN to detect animacy using the synthesized stimuli and probe which visual cues of animacy it learns and whether they match well-known cues used by humans.