Abstract
During everyday tasks like navigating a crowded store or deciding where to sit on a bus, we perceive rich details about the social interactions of others. The ability to perceive others’ social interactions is a core component of human cognition, but its neural computations are only beginning to be understood. While recent work has made progress in understanding the neural mechanisms of social interaction perception with tightly controlled stimuli, continued progress requires a dataset of social and nonsocial actions that is representative of everyday life and captures variance along a range of social dimensions, e.g., valence, cooperativity, and interpersonal relationships. However, using naturalistic stimuli to investigate social interaction perception presents significant challenges. First, the number of people in a given video is highly predictive of the presence of a social interaction: a video of one person likely shows a nonsocial action, while a video of three or more people likely shows a social interaction. Second, existing datasets of social actions are not representative of everyday actions; they dramatically oversample sporting activities, for instance. We addressed these challenges by curating a dataset of two-person social and nonsocial actions from the Moments in Time dataset. Our videos were selected from common action categories based on the American Time Use Survey, and we balanced the dataset on visual features such as whether the action occurred indoors or outdoors. Finally, we selected videos such that sociality was not predictable from the early layers of a deep convolutional neural network. This dataset will be an important tool for benchmarking human social perception against computer vision models and can pave the way for the same progress that has been achieved in object and scene perception.
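The final curation step, screening out videos whose sociality is linearly decodable from early network layers, can be illustrated with a minimal sketch. The abstract does not specify the network, layer, or readout, so the choices below (a single frame per video, conv1 of an ImageNet-pretrained AlexNet, and a cross-validated logistic-regression classifier) are assumptions for illustration only, not the authors' procedure.

```python
# Minimal sketch: test whether sociality is decodable from early CNN features.
# Assumptions (not stated in the abstract): AlexNet conv1+ReLU as the "early"
# layer, one frame per video, and a cross-validated linear readout.
import numpy as np
import torch
from torchvision.models import alexnet, AlexNet_Weights
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from PIL import Image

weights = AlexNet_Weights.DEFAULT
model = alexnet(weights=weights).eval()
early_layers = model.features[:2]      # conv1 + ReLU (hypothetical choice)
preprocess = weights.transforms()      # standard ImageNet preprocessing


def early_features(frame_path: str) -> np.ndarray:
    """Return flattened early-layer activations for a single video frame."""
    img = preprocess(Image.open(frame_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = early_layers(img)
    return feats.flatten().numpy()


def sociality_decodable(frame_paths, labels, chance=0.5, margin=0.05) -> bool:
    """True if a linear readout of early features beats chance by `margin`."""
    X = np.stack([early_features(p) for p in frame_paths])
    y = np.asarray(labels)             # 1 = social, 0 = nonsocial
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    return acc > chance + margin
```

Under this kind of check, candidate video sets in which the linear readout performs reliably above chance would be revised or rebalanced, so that the resulting dataset does not let models (or observers) infer sociality from low-level visual features alone.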