Abstract
Visual lip movements improve auditory speech perception in noisy environments (e.g., McGettigan et al., 2012) and crossmodally activate auditory cortex (e.g., Pekkola et al., 2005). What specific information about visual lip movements is relayed to auditory cortex? We investigated this question by recording electrocorticographic (ECoG) activity from electrodes implanted within primary/secondary auditory cortex of epilepsy patients. We presented four representative auditory phonemes (/ba/, /da/, /ta/, and /tha/), or presented the corresponding visemes, the lip movements articulating these phonemes. We constructed an ensemble of deep convolutional neural networks to determine whether the identity of the four phonemes (from auditory trials) and visemes (from visual trials) could be decoded from auditory cortical activity. Reliable decoding of viseme identity would provide evidence of coding of visual lip-movement information in auditory cortex. We first verified that auditory phoneme information was reliably decoded with high accuracy from auditory-evoked activity in auditory cortex. Critically, viseme information was also reliably decoded from visual-evoked activity in both the left and right auditory cortices, indicating that visemes generate phoneme-specific activity in auditory cortex in the absence of any sound. Furthermore, the classifier trained to identify visemes successfully decoded phonemes with comparable accuracy, indicating that the patterns of activity in auditory cortex evoked by visemes (from visual trials) were similar to those evoked by phonemes (from auditory trials). These results suggest that visual lip movements crossmodally activate auditory speech processing in a content-specific manner.
Meeting abstract presented at VSS 2016
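
The cross-decoding logic described above (train a classifier ensemble on viseme trials, then test it on phoneme trials) can be illustrated with a minimal sketch. The abstract does not specify the authors' network architecture, data dimensions, or training procedure; the small 1D CNN, PyTorch framework, trial counts, and synthetic stand-in data below are all illustrative assumptions, not the published pipeline.

```python
# Hypothetical sketch of cross-condition decoding, assuming preprocessed ECoG
# trials shaped (n_trials, n_electrodes, n_timepoints) with integer labels for
# the four syllables. All sizes and settings are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

class SmallECoGCNN(nn.Module):
    """A compact 1D CNN mapping one ECoG trial to 4 syllable logits."""
    def __init__(self, n_electrodes, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_electrodes, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

def train_network(x, y, n_electrodes, epochs=50, seed=0):
    """Train one ensemble member on (trials, electrodes, time) data."""
    torch.manual_seed(seed)
    net = SmallECoGCNN(n_electrodes)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    return net

def ensemble_accuracy(nets, x, y):
    """Average the logits across ensemble members, then score accuracy."""
    with torch.no_grad():
        logits = torch.stack([net(x) for net in nets]).mean(0)
    return (logits.argmax(1) == y).float().mean().item()

# Synthetic stand-in data: 80 visual (viseme) and 80 auditory (phoneme) trials.
rng = np.random.default_rng(0)
n_elec, n_time = 12, 200
x_vis = torch.tensor(rng.standard_normal((80, n_elec, n_time)), dtype=torch.float32)
y_vis = torch.tensor(rng.integers(0, 4, 80))
x_aud = torch.tensor(rng.standard_normal((80, n_elec, n_time)), dtype=torch.float32)
y_aud = torch.tensor(rng.integers(0, 4, 80))

# Train the ensemble on viseme (visual-trial) responses only...
ensemble = [train_network(x_vis, y_vis, n_elec, seed=s) for s in range(5)]
# ...then test it on phoneme (auditory-trial) responses: above-chance transfer
# would indicate that visemes and phonemes evoke similar activity patterns.
print("cross-decoding accuracy:", ensemble_accuracy(ensemble, x_aud, y_aud))
```

On real data, above-chance transfer accuracy in this train-on-visemes, test-on-phonemes direction is what supports the claim that visual lip movements evoke phoneme-specific activity patterns in auditory cortex.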