Abstract
Human speech consists of visual information from the talker’s mouth and auditory information from the talker’s voice. A key question is whether the neural computations that integrate visual and auditory speech are additive, superadditive (excitatory) or subadditive (suppressive). To answer this question, we recorded brain activity from 7 patients implanted with electrodes for the treatment of medically intractable epilepsy. We examined 33 intracranial EEG (iEEG) electrodes located over the posterior superior temporal gyrus (pSTG), a key brain area for multisensory speech perception. Patients were presented with audiovisual words in three formats: natural asynchrony between auditory and visual speech onset, auditory speech onset advanced by 300 ms (A300V), and visual speech onset advanced by 300 ms (V300A). We used deconvolution to decompose the measured iEEG responses to audiovisual speech into unisensory auditory and visual speech responses. Manipulating the asynchrony of the auditory and visual speech allowed us to separately estimate the responses to auditory and visual speech, and hence the rule by which their neural responses were combined. The deconvolved responses were then fit to two models. The additive model sums the deconvolved unisensory responses and was a poor fit to the measured data (RMSE=41). The non-additive model sums the deconvolved unisensory responses plus an auditory-visual interaction term and was a better fit to the measured data (RMSE=20). We also examined the sign of the interaction term. A positive interaction indicates a measured response greater than the summed unisensory responses (superadditivity), while a negative interaction indicates a measured response less than the summed unisensory responses (subadditivity). The interaction was negative for 25 of 33 electrodes.
These data indicate a suppressive interaction between visual and auditory speech information, consistent with a cross-modal suppression model of speech perception in which early-arriving visual speech information inhibits the responses of neurons selective for incompatible auditory phonemes.
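
The deconvolution and model comparison described above can be illustrated with a minimal sketch on simulated data. It is not the analysis code used in this study. Everything specific in it is an assumption made for illustration: the 10 ms sample spacing, the 100 ms natural asynchrony, the kernel length, the Gaussian response shapes, the choice to time-lock the interaction term to the later of the two onsets, and the use of ordinary least squares as the deconvolution method. The sketch only shows why jittering the auditory-visual asynchrony makes the unisensory responses and the interaction term separately estimable, and how the additive and non-additive fits can be compared by RMSE.

```python
import numpy as np

# Hypothetical sizes: 1 sample = 10 ms, so a 300 ms shift is 30 samples.
N_TIME, N_LAGS = 150, 60  # samples per trial, samples per response kernel

# Hypothetical (auditory_onset, visual_onset) in samples for the three formats.
# The natural asynchrony of 10 samples is an assumed value.
CONDITIONS = {"natural": (50, 40), "A300V": (20, 40), "V300A": (50, 10)}


def impulse_block(onset):
    """Design block whose column k is a unit impulse at sample onset + k."""
    block = np.zeros((N_TIME, N_LAGS))
    lags = np.arange(N_LAGS)
    block[onset + lags, lags] = 1.0
    return block


def design_matrix(with_interaction):
    """Stack one design matrix per stimulus format (rows = time samples)."""
    rows = []
    for aud, vis in CONDITIONS.values():
        blocks = [impulse_block(aud), impulse_block(vis)]
        if with_interaction:
            # Assumption: the interaction begins once both streams are present.
            blocks.append(impulse_block(max(aud, vis)))
        rows.append(np.hstack(blocks))
    return np.vstack(rows)


def fit(y, with_interaction):
    """Least-squares deconvolution; returns kernel estimates and fit RMSE."""
    X = design_matrix(with_interaction)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
    return beta, rmse


def simulate(noise_sd=0.0, seed=0):
    """Simulated responses with a negative (suppressive) interaction."""
    lags = np.arange(N_LAGS)

    def bump(center, width):
        return np.exp(-0.5 * ((lags - center) / width) ** 2)

    # Concatenated auditory, visual, and interaction kernels (made-up shapes).
    truth = np.concatenate([bump(15, 5), 0.3 * bump(25, 8), -0.5 * bump(20, 6)])
    rng = np.random.default_rng(seed)
    y = design_matrix(True) @ truth
    return y + noise_sd * rng.standard_normal(y.shape)


y = simulate()
_, rmse_add = fit(y, with_interaction=False)
beta_int, rmse_int = fit(y, with_interaction=True)
interaction = beta_int[2 * N_LAGS:]  # third block holds the interaction kernel

print("additive RMSE:", rmse_add)
print("non-additive RMSE:", rmse_int)
print("summed interaction:", interaction.sum())
```

In this simulation the non-additive model fits better by construction, because the data were generated with an interaction term. With real recordings the comparison would be made on the measured responses of each electrode, and passing a nonzero `noise_sd` to `simulate` gives a rough sense of how noise affects the estimated sign of the interaction.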