Abstract
Speech perception under natural conditions requires the integration of auditory and visual information. Understanding how a viewer assimilates the information contained in these two sensory modalities requires a detailed description of both the speech information and the processing of that information. To better understand the process of gathering facial information, we quantified the distribution of gaze fixations of humans performing an audiovisual speech perception task with dynamic talking faces. Here we examined the degree of gaze fixation asymmetry to determine whether left-right biases reflect the face stimulus or the strategy of the viewer. Most participants preferentially fixated the right side of the faces, suggesting a left visual field bias, and this bias persisted even when they viewed horizontally mirrored faces, different talkers, and static faces. The bias was usually present in the very first fixations, but participants showed stronger gaze fixation asymmetries when viewing dynamic faces than when viewing static faces or face-like objects, especially when their gaze was directed to the talker's eyes. Correlation analysis revealed that the degree and side of gaze asymmetry during audiovisual speech (i.e., a dynamic stimulus) are predicted by the same parameters measured during simple face processing (i.e., a static stimulus). Although viewing dynamic faces in these experiments significantly enhanced speech perception, we did not find any correlation between task performance and gaze fixation asymmetry. Similarly, asymmetry did not appear to be related to other measures of laterality (handedness and eye dominance). These results suggest that the asymmetrical distribution of gaze fixations reflects the participants' viewing strategies rather than being a product of asymmetry within the faces themselves.
That these strategies did not predict audiovisual speech perception suggests that the process of gathering information from a talker's face involves a large visual span that is independent of the ability to assimilate information.
This work was supported by funding from the National Institute on Deafness and Other Communication Disorders (grant DC-00594) and the Natural Sciences and Engineering Research Council of Canada to K.G. Munhall, and from the Canadian Institutes of Health Research and the EJLB Foundation to M. Paré. I.T. Everdell holds an Ontario Graduate Scholarship, and M. Paré holds a New Investigator Award from the Canadian Institutes of Health Research.