Abstract
Making sense of the hierarchical arrangement of form and motion is central to visual scene perception. For example, while driving, other vehicles’ locations must be anticipated from the traffic flow even if they are temporarily occluded. Despite its ubiquity in everyday reasoning, surprisingly little is known about how exactly humans and animals employ motion structure knowledge when perceiving dynamic scenes. To investigate this question, we propose a formal framework for characterizing structured motion and generating structured motion stimuli, which supports a wide range of hierarchically arranged real-world motion relations among stimulus features. A key benefit is that the joint distribution of generated stimulus trajectories is analytically tractable, which allowed us to compare human performance to ideal observers. To do so, we first introduced structured motion in the well-established multiple object tracking task. We found that humans performed better in conditions with structured than with independent object motion, indicating that they benefitted from structured motion. A Bayesian observer model furthermore revealed that the observed performance gain is not due to the stimulus itself becoming simpler, but due to active use of motion structure knowledge during inference. A second experiment, in which trajectories of occluded objects had to be predicted from the remaining visible objects, provided fine-grained insight into which exact structure human predictions relied on in the face of uncertainty: Bayesian model comparison suggests that humans employed the correct or close-to-correct motion structure, even for deep motion hierarchies. Overall, we demonstrated, to our knowledge for the first time, that humans can make use of hierarchical motion structure when perceiving dynamic scenes, and flexibly employ close-to-optimal motion priors.
Our proposed formal framework is compatible with existing neural network models of visual tracking, and can thus facilitate theory-driven designs of electrophysiology experiments on motion representation along the visual pathway.