Abstract
Research has shown that early visual processing is enhanced when a visual stimulus is preceded by a spatially co-localized sound (Störmer et al., 2009). While this work has shown that visual perception of low-level stimuli (Gabor patches) is cross-modally influenced by equally low-level auditory stimuli (noise bursts), it is not known whether such cross-modal influences exist at higher processing stages. Here, we asked whether and how complex, real-world sounds facilitate visual processing of real-world objects. We designed a novel cross-modal paradigm that tracks the amount of sensory evidence needed to accurately recognize real-world objects. We then compared object recognition performance when these visual objects were preceded by congruent or incongruent auditory information. Each trial started with the presentation of a 2-s sound (e.g., a train), which was immediately followed by the presentation of a continuous visual stimulus. This stimulus started out as an incoherent, grayscale noise patch whose noise level decreased slowly and continuously until a clearly visible object emerged. On half of the trials the object was congruent with the sound (e.g., a train), and on the remaining trials it was incongruent (e.g., an elephant). Participants were instructed to press a key as soon as they recognized the object, which immediately ended the stimulus presentation. Subsequently, participants indicated which object they had seen by choosing between two highly similar, within-category objects positioned on either side of fixation. Participants (n = 26) showed shorter reaction times when initially recognizing the object on congruent trials than on incongruent trials (p = 0.005). Overall, our results show that visual object recognition can be facilitated when the object is preceded by congruent rather than incongruent auditory information. Broadly, this indicates that naturalistic sounds enhance the perceptual processing of visual objects.