Abstract
Statistical learning theories suggest that people implicitly learn arbitrary stimulus-stimulus associations based solely on the statistics of inter-stimulus contingencies (Fiser and Aslin, 2001). Various studies have supported such learning in the visual, auditory and somatosensory modalities. To date, however, cross-modal statistical learning has not been investigated. Here, we present results of a novel audio-visual statistical learning procedure in which participants are passively exposed to arbitrary audio-visual pairings (comprised of artificial/synthetic auditory and visual stimuli). Following this exposure period, participants' familiarity with the experienced audio-visual pairings is evaluated against novel audio-visual combinations drawn from the same stimulus set. The results of this comparison demonstrate the existence of audio-visual statistical learning.
Additionally, we investigated whether audio-visual associations with an appropriate “gestalt” were learned more robustly than those with less appropriate relationships. We used a procedure in which visual objects disappeared into a “visual abyss” and reappeared as new objects (Fiser and Aslin, 2002). During each disappearance-to-reappearance interval, an upward or downward frequency-modulated sound was played. The Gestalt+ condition consisted of downward frequency sweeps paired with disappearances and upward frequency sweeps paired with appearances; the opposite combinations were assigned to the Gestalt− condition. Subjects showed greater familiarity with Gestalt+ than with Gestalt− audio-visual pairs.
Our results suggest that audio-visual statistical learning occurs naturally, in the absence of any task or explicit attentional engagement, for audio-visual stimuli that are spatio-temporally coincident. More importantly, the degree of learning depends in part on appropriate gestalt relationships between the multisensory events.