Our perceptual system is bombarded with diverse information arriving from different sensory modalities, yet we achieve an integrated, coherent percept of our environment. The perceptual system exploits spatial and temporal correlations among sensory events in order to maximize our sensitivity to external cues (Alais & Burr,
2004a; Stein & Meredith,
1990). The information from one sensory modality may also be weighed more heavily than others in forming an impression of events (Hillis, Ernst, Banks, & Landy,
2002). For example, the sound of an animal in the woods can indicate a potential threat even when the animal can neither be seen nor localized precisely by sound alone. Once the target is foveated, however, the visual system provides the most precise spatial resolution. The perceptual system must therefore be adaptive and dynamic. The study of cross-modal influences provides some insight into the complex processes that coordinate and maintain a reliable representation of the environment.
Much research has shown that for spatial events, when auditory and visual cues are discrepant, the sound can appear to emanate from the visual source. The well-known ventriloquist effect (Bertelson & Radeau,
1981; Thurlow & Jack,
1973) is an excellent example of this phenomenon. We experience this effect in our daily lives when the sound from television loudspeakers appears to derive directly from the actors on the screen. A static auditory stimulus can also appear to move with a visual moving object (Kitajima & Yamashita,
1999; Mateeff, Hohnsbein, & Noack,
1985).
With regard to dynamic auditory and visual stimuli, a moving light stimulus can induce perceptual shifts in the apparent velocity as well as the direction of a sound source moving along horizontal, vertical, or depth directions in three-dimensional space (Kitajima & Yamashita,
1999). Similarly, the perceived direction of auditory apparent motion is strongly modified by visual apparent motion (Soto-Faraco, Lyon, Gazzaniga, Spence, & Kingstone,
2002; Soto-Faraco, Spence, & Kingstone,
2004). Yet, auditory spatial events can, under some conditions, induce shifts in the perceived motion of visual stimuli. Meyer and Wuerger (
2001) manipulated the proportion of dots that moved in a coherent direction while a supra-threshold auditory signal was cross-faded between two loudspeakers located behind the display. They demonstrated a bias in the perceived direction of visual motion that was consistent with the direction of auditory motion.
Recently, Maeda, Kanai, and Shimojo (
2004) described an illusion in which an auditory stimulus containing no information about motion in space altered the perception of visual motion. A tone ascending (or descending) in pitch biases the ambiguous motion of two superimposed, oppositely moving horizontal gratings to be perceived as upward (or downward) motion.
Although there is some evidence that, under certain conditions, auditory stimuli can alter the perception of simultaneously presented visual motion signals, cross-modally induced aftereffects have been obtained following adaptation to visual but not to auditory stimuli. Aftereffects are considered particularly important for demonstrating genuinely sensory effects because they are thought to arise from changes in the neural mechanisms themselves after prolonged exposure to a stimulus that activates them. They are unlikely to reflect changes in response bias, because their direction is opposite to the direction that a response bias based on the observed motion would predict.
Kitagawa and Ichihara (
2002) showed that an auditory aftereffect can be elicited by adaptation to purely visual motion: after adapting to an expanding/contracting square, a steady sound was perceived as becoming softer/louder. The converse, however, did not occur: after adapting to a sound whose intensity grew (or diminished) over time, a static square was not perceived to be changing in size, thus extending visual dominance to cross-modally induced aftereffects. In fact, the visually induced auditory loudness aftereffect described by Kitagawa and Ichihara is sufficiently robust that it can be elicited even when subjects are presented simultaneously with both expanding and contracting discs and direct their attention to one of the competing stimuli. In this case, a significant loudness aftereffect is produced favoring the attended stimulus (Hong & Papathomas,
2006).
Vroomen and de Gelder (
2003) showed that a visual motion cue can influence the strength of the auditory contingent aftereffect first described by Dong, Swindale, and Cynader (
1999). Dong et al. observed that, after subjects adapt to a leftward-moving sound rising in pitch and a rightward-moving sound falling in pitch, a stationary sound with rising pitch is perceived as moving rightward, and vice versa. Vroomen and de Gelder observed that aftereffects were significantly larger when subjects viewed a small bright square that moved in tandem with the auditory apparent-motion stimulus. For incongruent auditory and visual motion, aftereffects were reversed. Thus, aftereffects were contingent on the visual motion stimulus.
Studies such as those of Kitagawa and Ichihara (
2002) and Vroomen and de Gelder (
2003) that examine aftereffects provide evidence that multisensory interactions do occur at a low-level sensory stage, rather than simply reflecting changes in high-level decision processes, such as response-related bias (Vroomen, Bertelson, & de Gelder,
2001). A post-adaptation response bias would favor the direction of the adapting stimulus, rather than the opposite direction of the motion aftereffect (MAE) that subjects actually report perceiving.
Relatively few studies have demonstrated a strong influence of auditory motion stimuli on the perception of visual spatial events. This may be due, in part, to a lack of ambiguity in the visual stimuli. The addition of cross-modal cues may not alter visual perception because the visual system already provides a stable, unambiguous representation of events. The perceptual system may be predisposed to use cross-modal information primarily to resolve ambiguity in sensory impressions. Indeed, multimodal neurons of the superior colliculus exhibit maximal response enhancement with pairs of weak unimodal stimuli. As unimodal stimuli become more effective, there is a decline in the amount by which bimodal stimulation modifies neural responses (Wallace, Wilkinson, & Stein,
1996). Moreover, there is evidence that auditory and visual motion signals are combined best when they are coincident in space and time. For example, Meyer, Wuerger, Röhrbein, and Zetzsche (
2005) used an array of loudspeakers and light-emitting diodes to produce apparent-motion signals and observed that auditory and visual inputs combined to lower bimodal detection thresholds only when the signals were spatially co-localized and temporally coincident.
As outlined above, there have been numerous studies on how a stimulus in one modality affects the perception of a simultaneous stimulus in another modality, but fewer studies on cross-modal motion aftereffects. The present study used a comprehensive set of stimuli and conditions to examine cross-modal influences between auditory and visual motion. We first studied such influences by assessing performance with brief simultaneous stimuli. To investigate further the origins of these influences, we tested cross-modal aftereffects. In all cases, we examined both types of cross-modal influence: auditory-to-visual and visual-to-auditory. For each cross-modal influence type, we examined influences in three configurations:
- x-axis motion, i.e., left-/right-moving visual and auditory stimuli;
- z-axis motion, i.e., perceptually approaching/receding visual stimuli paired with loudness-changing sounds that appear to move in depth;
- y-axis visual motion (up/down) paired with pitch-changing sounds (we also used vertical auditory motion in one condition).
The pairing of auditory motion across frequency with visual motion across space in this last configuration may appear odd at first glance, but it is motivated by Maeda et al.'s (2004) report of cross-modal effects of spectral sound motion on simultaneous vertical visual motion. We chose this pairing deliberately to test two hypotheses that had not yet been examined: first, whether Maeda et al.'s (2004) short-term effect generates aftereffects; second, whether cross-modal effects occur in the opposite direction, from spatial visual motion to auditory spectral motion. Thus, we have 12 conditions, in which we use comparable stimuli: 2 influence types (auditory-to-visual and visual-to-auditory) × 2 modes (simultaneous and aftereffects) × 3 configurations (motion along the x-, y-, and z-axes).
We hypothesized that much of the apparent dominance of visual over auditory information in spatial events arises because of the higher visual spatial resolution. Auditory and visual motion signals may exist that produce a more equitable distribution of relative cross-modal influences. To address this issue, we used visual and auditory motion stimuli for which we could manipulate the degree of ambiguity with respect to motion direction. In much the same way, Alais and Burr (
2004a) manipulated visual spatial ambiguity and concluded that bimodal localization is not captured by visual stimulation but is governed by optimal integration of visual and auditory information. To introduce ambiguity in visual motion, we superimposed two oppositely moving sinusoidal gratings of the same spatial frequency (counterphase flickering motion). Visual signal strength and direction were manipulated by changing the relative contrast of the two components. Auditory motion signal strength along the x-axis was controlled by manipulating the signal amplitude from two laterally placed loudspeakers. Sound intensity was varied to simulate sound motion along the z-axis. Sounds of gliding pitch, as well as cross-fading energy between two vertically placed loudspeakers, were used to accompany visual motion along the y-axis.
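The counterphase construction can be made explicit with one standard formulation (a sketch for illustration, not necessarily the exact parameterization used in the experiments; the contrast symbols are introduced here). With c_R and c_L denoting the contrasts of the rightward- and leftward-moving components, the luminance profile of the superimposed gratings is

```latex
L(x,t) = L_0 \left[ 1 + c_R \sin\!\bigl(2\pi(fx - \omega t)\bigr)
                      + c_L \sin\!\bigl(2\pi(fx + \omega t)\bigr) \right],
```

where f is the shared spatial frequency and ω the temporal frequency. When c_R = c_L = c, the sum reduces to 2c sin(2πfx) cos(2πωt), a counterphase flickering grating with no net directional signal; increasing one contrast relative to the other yields a correspondingly stronger motion signal in that direction.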
Cross-modal influences were examined using a strong, unambiguous motion stimulus in one modality, which we call the primary modality, simultaneously paired with a relatively weaker stimulus possessing some degree of ambiguity in the secondary modality. In one task, subjects indicated the motion direction of the weaker motion signal. In a second task, subjects indicated whether the motion directions in the two modalities were the same. Both tasks yielded similar results: the stimulus carrying the strong motion signal altered perceived motion direction regardless of whether it was visual or auditory. To examine cross-modal aftereffects, subjects adapted to strong motion stimuli in the primary modality and subsequently discriminated motion direction in the secondary modality. We found that visual stimuli generated significant cross-modal auditory aftereffects, yet visual aftereffects could not be elicited using spatial auditory motion as the adapting stimulus. However, auditory spectral motion did induce visual vertical motion aftereffects.