Abstract
INTRODUCTION: Humans are highly skilled at interpreting intent or social behavior from strongly impoverished stimuli (Heider & Simmel, 1944). It has been hypothesized that this visual function is based on high-level cognitive processes, such as probabilistic reasoning. We demonstrate that several classical observations on animacy and interaction perception can be accounted for by simple and physiologically plausible neural mechanisms, using an appropriately extended hierarchical (deep) model of the visual pathway.
METHODS: We extend classical biologically inspired models of object and action perception (Riesenhuber & Poggio, 1999; Giese & Poggio, 2003) with a front-end that exploits deep learning (VGG-16) for the construction of low- and mid-level feature detectors, and propose a learning-based hierarchical neural network model that analyzes shape and motion features from video sequences. The model consists of streams for form and object motion in a retinal frame of reference. With this model, we attempt to account simultaneously for several experimental observations on the perception of animacy and social interaction.
RESULTS: Based on input video sequences, the model successfully reproduces the results of Tremoulet and Feldman (2000) on the dependence of perceived animacy on motion parameters and the body axis. In addition, the model correctly classifies six categories of social interaction that have been frequently tested in the psychophysical literature (following, fighting, chasing, playing, guarding, and flirting; e.g., Scholl & McCarthy, 2012; McAleer et al., 2008). Finally, we show that the model can be extended to process simple interactions in real-world movies.
CONCLUSION: Since the model accounts simultaneously for a variety of effects related to animacy and interaction perception using physiologically plausible mechanisms, without requiring complex computational inference and optimization processes, it might serve as a starting point for the search for neurons that form the core circuit of the perceptual processing of animacy and interaction.
Acknowledgement: HFSP RGP0036/2016, the European Commission H2020 COGIMON H2020-644727, BMBF FKZ 01GQ1704, and BW Stiftung NEU007/1.