Abstract
Social perception is ubiquitous in daily life. Prior work has revealed that a region in the right posterior superior temporal sulcus (pSTS) selectively supports social interaction perception for controlled stimuli such as point-light displays. In contrast, both the left and right STS have been shown to support social interaction perception for naturalistic stimuli. However, previous work has not accounted for the rich verbal signals that co-occur with visual social signals in natural settings, nor has it compared naturalistic and controlled experiments within individual subjects. Do social interaction and language selectivity generalize across simple, controlled experiments and a more naturalistic movie stimulus? In an fMRI experiment, 12 participants completed controlled tasks previously shown to identify social interaction- and language-selective voxels in the STS (viewing interacting versus independent point-light figures, and listening to spoken versus scrambled language). Participants also viewed a 45-minute naturalistic movie. We fit a voxel-wise encoding model that included low- and mid-level visual and auditory features, as well as higher-level social and language features, including the presence of a social interaction and language model embeddings of the spoken language in the movie. Despite the drastically different nature of our controlled and movie experiments, voxel-wise preference mapping and variance partitioning revealed spatial and functional overlap between the two for both social interaction- and language-selective voxels. However, the controlled experiments and the movie also differed: the movie stimulus enabled a richer characterization of voxel-wise processing and elicited stronger bilateral social responses in the pSTS, even when accounting for spoken language.
Overall, these results show that controlled and naturalistic stimuli recruit similar areas for social processing, but naturalistic stimuli can give a richer understanding of the neural underpinnings of simultaneous visual and verbal social processing in real-world settings.