Abstract
Social perception is an important part of everyday life that develops early and is shared with non-human primates. To understand the spatiotemporal dynamics of naturalistic social perception in the human brain, we first curated a dataset of 250 video clips (500 ms each) of two people performing everyday actions. We densely labeled these videos with features of the visual social scene, including scene and object features, visual social primitives, and higher-level social/affective features. To investigate when and where these features are represented in the brain, patients with implanted stereoelectroencephalography (sEEG) electrodes viewed the videos. We fit time-resolved encoding models to individual channels to investigate the time course of representations across the human brain. We find that an encoding model based on the full set of social scene features predicts responses in a subset of channels around 400 ms after video onset. The channels best predicted by the social scene model are largely non-overlapping with those best predicted by a model of early visual responses (the second convolutional layer of an ImageNet-trained AlexNet). Future analyses will investigate when and where individual features of the social scene model predict neural responses, and how these channels interact with visually selective channels to extract high-level social information from visual input.
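
For concreteness, the general logic of a time-resolved encoding analysis can be sketched as below. This is an illustrative example only, not the authors' implementation: the use of ridge regression, the feature and response dimensions, the cross-validation scheme, and all variable names are assumptions.

```python
# Sketch of a time-resolved encoding model: at each time point, regularized
# regression maps per-video features to one channel's response, and prediction
# accuracy is scored with cross-validated Pearson correlation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_videos, n_features, n_timepoints = 250, 20, 100          # illustrative sizes
features = rng.standard_normal((n_videos, n_features))     # social scene features per clip
responses = rng.standard_normal((n_videos, n_timepoints))  # one channel's response over time

def time_resolved_encoding(features, responses, n_splits=5, alpha=1.0):
    """Cross-validated prediction accuracy (Pearson r) at each time point."""
    scores = np.zeros(responses.shape[1])
    for t in range(responses.shape[1]):
        y = responses[:, t]
        preds = np.zeros_like(y)
        for train, test in KFold(n_splits, shuffle=True, random_state=0).split(features):
            model = Ridge(alpha=alpha).fit(features[train], y[train])
            preds[test] = model.predict(features[test])
        scores[t] = np.corrcoef(preds, y)[0, 1]
    return scores

scores = time_resolved_encoding(features, responses)
print(f"Peak prediction at time bin {scores.argmax()}, r = {scores.max():.2f}")
```

In this framing, comparing the time courses of scores obtained from different feature sets (e.g., the social scene features versus early visual network activations) is what allows channels to be characterized as better explained by one model or the other.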