While visual processes have traditionally been studied in isolation, it is becoming increasingly clear that multisensory interactions are the rule rather than the exception (Shimojo & Shams,
2001). In humans, enhancements in visual sensitivity as a function of auditory cues have been found for low-level stimuli (e.g., Bolognini, Frassinetti, Serino, & Làdavas,
2005; Driver & Spence,
1998; Frassinetti, Bolognini, & Làdavas,
2002; Vroomen & de Gelder,
2000), and these effects often depend upon some degree of spatial and/or temporal coincidence between stimuli. Here we find evidence for an enhancement in visual sensitivity as a function of meaningful auditory cues. Prior work has demonstrated a role for meaningful stimulus correspondence in audiovisual detection tasks. For example, semantically congruent multimodal color stimuli (colored circles and color word vocalizations) are detected faster than either unimodal or incongruent multimodal stimuli (Laurienti, Kraft, Maldjian, Burdette, & Wallace,
2004). Reaction times for congruent multimodal stimuli are faster than those predicted by probability summation of the independent unimodal responses, suggesting genuine multimodal integration rather than mere statistical facilitation. Recent evidence points towards audiovisual interactions at higher levels of complexity as well. For instance, semantically congruent sounds speed the visual detection of related pictures (Iordanescu, Guzman-Martinez, Grabowecky, & Suzuki,
2008; Molholm, Ritter, Javitt, & Foxe,
2004). A similar detection benefit is not found when sounds are paired with related
words, which suggests that the effect occurs at the level of visual object processing rather than at some perceptually removed, amodal semantic or decision level (Iordanescu et al.,
2008). Additionally, semantically congruent sounds increase participants' visual sensitivity to briefly presented, backward-masked pictures of animals and tools, whereas incongruent sounds tend to decrease it (Chen & Spence, 2010). Such examples of meaning-based crossmodal enhancement suggest that perceptual representations of objects and events may be fundamentally multimodal. The current results extend this conclusion to representations of human actions. However, as shown in Experiment 2, such representations may be constrained by factors such as stimulus orientation. For these high-level stimuli to be integrated, they likely must match the observer's internal model of the observed actions.