Abstract
Background and objective: The brain has separate specialized units for processing faces and voices, located in occipital and temporal cortices, respectively. Yet humans seamlessly integrate signals from the face and the voice of others for optimal social interaction. How is redundant information delivered by faces and voices, such as emotional expressions, integrated in the brain? We characterized the neural basis of face-voice integration, with a specific emphasis on how face- or voice-selective regions interact with multisensory regions, and how emotional expression affects integration properties. Method: We presented 24 subjects with 500 ms stimuli containing visual-only, auditory-only, or combined audio-visual information, which varied in emotional expression (neutral or fearful). We searched the whole brain for regions responding more to bimodal than to unimodal stimuli, and also examined responses in face- and voice-selective regions of interest defined by independent localizer scans. Finally, these regions of interest were entered into dynamic causal modeling (DCM) to determine, using Bayesian model selection, the direction of information flow between them. Results: Using a whole-brain approach, we found that only the right posterior STS responded more to bimodal stimuli than to face or voice alone, and only when the stimuli contained an emotional (fearful) expression. No region responded more to bimodal than to unimodal neutral stimuli. Region-of-interest analysis of face- and voice-selective regions extracted from the independent functional localizers likewise revealed multisensory integration only in the face-selective right posterior STS. DCM analysis revealed that the right STS receives unidirectional input from the face-selective fusiform face area (FFA) and the voice-selective middle temporal gyrus (MTG), with emotional expression modulating the strength of these connections. Conclusion: Our study supports a hierarchical model of face-voice integration, with a convergence zone in the right STS, and indicates that such integration depends on the (emotional) salience of the stimuli.
Meeting abstract presented at VSS 2016
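As an illustration of the bimodal-versus-unimodal criterion described in the Methods, the following minimal Python sketch tests whether an ROI's audio-visual (AV) response exceeds both unimodal responses (AV > A and AV > V) across subjects. All variable names and values are hypothetical placeholders; this is not the authors' data or analysis code.

```python
# Illustrative sketch only: conjunction-style test that an ROI responds more
# to audio-visual (AV) stimuli than to either unimodal condition (A or V).
# Per-subject beta estimates are simulated placeholders, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 24  # sample size reported in the abstract

# Hypothetical per-subject ROI responses (e.g., % signal change)
betas_a = rng.normal(0.4, 0.2, n_subjects)   # auditory only
betas_v = rng.normal(0.5, 0.2, n_subjects)   # visual only
betas_av = rng.normal(0.8, 0.2, n_subjects)  # audio-visual

# Paired tests: AV must exceed each unimodal condition (one-tailed)
t_av_a, p_av_a = stats.ttest_rel(betas_av, betas_a)
t_av_v, p_av_v = stats.ttest_rel(betas_av, betas_v)

is_multisensory = (t_av_a > 0 and p_av_a / 2 < 0.05 and
                   t_av_v > 0 and p_av_v / 2 < 0.05)
print(f"AV > A: t={t_av_a:.2f}; AV > V: t={t_av_v:.2f}; "
      f"multisensory ROI: {is_multisensory}")
```

Requiring AV to exceed the stronger of the two unimodal responses (rather than their sum) is a conservative way to flag candidate integration sites such as the right posterior STS reported above.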