Abstract
The superior temporal sulcus (STS) is crucially involved in social perception, selectively processing visual and auditory stimuli in partly overlapping sub-regions. However, social interaction perception in the posterior STS (pSTS) has been studied solely with visual stimuli. Here, we sought to elucidate its response profile to auditory social interactions. In Experiment 1, we scanned 23 monolingual participants while they listened to recordings of either two speakers (coherent or scrambled conversations) or one speaker (narrations) in both their native language and an unfamiliar language. Additionally, we separately localised visual-interaction-selective and voice-selective regions of interest (ROIs). We explored the effect of speaker number in both languages, and of conversation coherence in participants’ native language. In Experiment 2, 12 bilingual participants underwent similar procedures to further explore the role of language comprehension. ROI analyses revealed significantly greater activation in the bilateral, visually defined, interaction-selective pSTS (I-pSTS) for two speakers compared to one across languages and experiments. In bilinguals, the I-pSTS responded robustly to all two-speaker conditions, whereas in monolinguals this held only for their native language. The bilateral STS voice area (V-STS) showed a similar response pattern but, unsurprisingly, was strongly activated by both languages in both experiments. Multivariate pattern analysis (MVPA) discriminated two speakers from one across languages in the bilateral V-STS and the right I-pSTS. Crucially, although both the right I-pSTS and the left V-STS responded slightly more strongly to scrambled than to coherent conversations, MVPA discrimination between scrambled and coherent conversations was significantly above chance only in the right I-pSTS. Altogether, these results tentatively suggest that the right I-pSTS is not exclusively visual but also represents information about auditory interactions that goes beyond the mere number of speakers. However, interaction-selectivity in the visually defined I-pSTS was significantly weaker for auditory than for visual stimuli. Future research should clarify the role of the V-STS in auditory interaction perception and further probe I-pSTS interaction-selectivity using non-semantic prosodic cues.