Abstract
People exhibit remarkable visual sensitivity to point-light displays of human movement. A neural region that has been strongly implicated in the perception of human actions generally, and in the processing of point-light displays specifically, is the posterior region of the superior temporal sulcus (STSp). The STSp has also been implicated in the integration of signals across modalities in the formation of multimodal representations of observed actions.
Because perception of human action is mediated by the STSp, and because the STSp has been implicated as a region where sensory inputs are integrated, we hypothesized that the addition of auditory cues corresponding to human actions depicted in point-light displays should improve visual sensitivity to those actions. Previous research has shown that audiovisual temporal frequency matching improves when auditory cues are phase-locked to upright point-light walkers but not to scrambled or inverted versions of these displays (Saygin, Driver, & De Sa, 2008). Here we further asked whether semantically appropriate auditory cues improve sensitivity more than cues that are coincident with the action but semantically unrelated.
To test this hypothesis, we conducted a psychophysical study in which participants detected the presence of coherent point-light walkers in a mask under unimodal (visual) or audiovisual conditions. Participants in the audiovisual conditions heard either pure tones or veridical footfalls coincident with the walkers' footsteps. Results revealed a significant improvement in detection sensitivity when visual displays were paired with veridical auditory cues (footfalls). The addition of coincident yet non-veridical auditory cues (tones) tended to enhance detection of coherent point-light walkers, but this effect was not significant. The fact that merely coincident auditory cues did not significantly increase detection sensitivity suggests that the effect cannot be due to temporal coincidence alone. Rather, these psychophysical results converge with neurophysiological evidence suggesting that the analysis of human action reflects the integration of high-level auditory and visual processes.