Abstract
Human vision supports social perception by efficiently detecting agents and extracting rich information about their actions, goals, and intentions. Here we explore the cognitive architecture of perceiving intention by constructing Bayesian models that integrate domain-specific hypotheses of social agency with domain-general cognitive constraints on sensory, memory, and attentional processing. Our model posits that the perception of intention combines a bottom-up, feature-based, parallel search for goal-directed movements with a top-down selection process for intent inference. The interaction of these architecturally distinct processes makes the perception of intentionality fast, flexible, and yet cognitively efficient. In the context of chasing, in which a predator (the “wolf”) pursues a prey (the “sheep”), our model addresses the computational challenge of identifying target agents among varying numbers of distractor objects, despite a quadratic increase in the number of possible interactions as more objects appear in a scene. By comparing modeling results with human psychophysics in several studies, we show that the effectiveness and efficiency of human perception of intention can be explained by a Bayesian ideal observer model with realistic cognitive constraints. These results provide an understanding of perceived animacy and intention at the algorithmic level: how such perception is achieved by cognitive mechanisms such as attention and working memory, and how it can be integrated with higher-level reasoning about social agency.