Research Article  |   December 2010
Reaction time facilitation for horizontally moving auditory–visual stimuli
Journal of Vision December 2010, Vol.10, 16. doi:https://doi.org/10.1167/10.14.16
      Neil R. Harrison, Sophie M. Wuerger, Georg F. Meyer; Reaction time facilitation for horizontally moving auditory–visual stimuli. Journal of Vision 2010;10(14):16. https://doi.org/10.1167/10.14.16.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

For moving targets, bimodal facilitation of reaction time has been observed for motion in the depth plane (C. Cappe, G. Thut, B. Romei, & M. M. Murray, 2009), but it is unclear whether analogous RT facilitation is observed for auditory–visual motion stimuli in the horizontal plane, as perception of horizontal motion relies on very different cues. Here we found that bimodal motion cues resulted in significant RT facilitation at threshold level, which could not be explained using an independent decisions model (race model). Bimodal facilitation was observed at suprathreshold levels when the RTs for suprathreshold unimodal stimuli were roughly equated, and significant RT gains were observed for direction-discrimination tasks with abrupt-onset motion stimuli and with motion preceded by a stationary phase. We found no speeded responses for bimodal signals when a motion signal in one modality was paired with a spatially co-localized stationary signal in the other modality, but faster response times could be explained by statistical facilitation when the motion signals traveled in opposite directions. These results strongly suggest that integration of motion cues led to the speeded bimodal responses. Finally, our results highlight the importance of matching the unimodal reaction times to obtain response facilitation for bimodal motion signals in the linear plane.

Introduction
The ability to respond quickly and accurately to one's surroundings is enhanced by combining information from multiple sensory modalities. Auditory and visual signals arising from the same object in the external world are often highly correlated and can convey redundant information about the object's location and movement. Laboratory studies have found that response times are typically faster under dual-stimulus conditions compared to single-stimulus conditions; this is known as the redundant signal effect (RSE; Miller, 1982; Raab, 1962; Todd, 1912). 
Two classes of models have been proposed to explain the RSE: statistical facilitation models and co-activation models. In statistical facilitation models (commonly known as race models), it is assumed that each stimulus (e.g., auditory and visual) feeds into a separate decision unit. In trials where signals in more than one modality are available, the response will then be initiated by the modality that reaches its decision criterion first, that is, the response time on a redundant signal trial is the minimum of the response times for the two separate stimuli (Raab, 1962). Co-activation models comprise an integration stage of the unimodal sensory signals prior to the decision stage; a single decision criterion is assumed, which receives information from both modalities (Miller, 1982). The reduction in response times in bimodal redundant signal trials usually exceeds the redundancy gain predicted by statistical facilitation, providing evidence for co-activation of auditory and visual signals (e.g., Diederich & Colonius, 1987, 2004; Miller & Ulrich, 2003; Molholm et al., 2002). Evidence for co-activation models has commonly been found in tasks where observers were asked to respond as fast as possible to the onset of stationary cues. However, many sensory signals crucial to the survival of organisms are not stationary but vary in space and time due to real object motion, head movements, eye movements, or any combination of these, and both the auditory and visual systems possess special mechanisms to extract the speed and direction of moving stimuli. Whether horizontally moving bimodal signals yield reductions in response times relative to unimodal signals and whether such reductions go beyond statistical facilitation are still open questions and will be addressed here. 
There is a relatively large number of studies addressing the perception of motion in more than one sensory modality; in general, these studies focus on how the perception of motion in one modality is modulated by the presence of motion in a second modality (for reviews, see Soto-Faraco, Kingstone, & Spence, 2003; Soto-Faraco, Spence, & Kingstone, 2004). For instance in the well-known “dynamic capture” effect, visual motion can affect the perceived direction of auditory motion (Sanabria, Spence, & Soto-Faraco, 2007; Soto-Faraco, Lyons, Gazzaniga, Spence, & Kingstone, 2002; Strybel & Vatakis, 2004), and in the “bouncing disks” effect, both stationary (Sanabria, Correa, Lupiáñez, & Spence, 2004; Sekuler, Sekuler, & Lau, 1997) and moving (Sanabria, Lupiáñez, & Spence, 2007) auditory signals can influence the perception of ambiguous visual motion stimuli. It has also been shown that an auditory signal can bias the perceived direction (Brooks et al., 2007; Jain, Sally, & Papathomas, 2008; Maeda, Kanai, & Shimojo, 2004) and speed (Ecker & Heller, 2005) of visual motion cues. 
In contrast, there are relatively few studies that have focused on performance enhancements for bimodal motion stimuli relative to unimodal baselines. Increased detection rates for auditory–visual compared to unimodal motion cues have been observed for strictly co-localized linear motion signals (Hofbauer et al., 2004; Meyer, Wuerger, Roehrbein, & Zetzsche, 2005), and recently, it has been found that visual and auditory motion cues in the depth plane are responded to faster under cross-modal compared to unimodal presentation (Cappe, Thut, Romei, & Murray, 2009). Performance benefits under bimodal stimulation have also been observed for displays of complex human movement patterns (Scheef et al., 2009), and selective attention to motion has been found to improve performance both within and across modalities (Beer & Röder, 2004). In spite of these findings, it remains unclear whether an RSE would be found for horizontally moving auditory–visual cues; therefore, the main aim of the studies here is to test whether strictly co-localized and synchronized apparent motion signals will elicit faster responses under redundant auditory–visual conditions compared to unimodal baselines. 
Enhanced motion detection under bimodal conditions has been reported for signals at threshold (Meyer et al., 2005), but it is unclear whether performance would be similarly enhanced for stimuli at suprathreshold intensity levels. In Experiment 1, we investigate the effect of stimulus intensity on the RSE by varying the visual and auditory stimulus intensities between threshold and suprathreshold levels in a direction-discrimination task. In Experiment 2, we investigate the RSE during a direction-discrimination task for stimuli with roughly equated unimodal reaction times, and Experiments 3A and 3B test for an RSE in a motion-onset paradigm (where the motion phase is preceded by a stationary phase). In Experiment 4, we test whether the integration of motion information specifically is required for speeded bimodal responses to moving cues, or whether the combination of a motion signal in one modality and a stationary, but spatially co-localized, signal in the other modality would also lead to RT facilitation. Additionally, Experiment 4 tests whether the AV motion signals are required to move in the same direction, or whether RT gains are observed when motion cues begin at the same location but travel in opposite directions. 
In all experiments, we generated co-localized and coincident auditory and visual motion signals with a series of point sources to ensure valid motion cues in both the visual and auditory domains; these conditions are known to enhance integration of auditory–visual motion cues (Meyer et al., 2005). 
Experiment 1
The goal of Experiment 1 was to test whether the RSE is observed for horizontally moving auditory–visual stimuli and whether any reaction time gains are greater than would be expected by a statistical facilitation account (i.e., race models; Miller, 1982). Studies using cross-modal stationary signals have found robust RSEs for stimuli well above threshold (see, e.g., Miller, 1982, 1986), but it is unclear whether bimodal facilitation of reaction times for motion stimuli would be present at suprathreshold levels, given the potential dominance of the visual system for spatial information (Posner, Nissen, & Klein, 1976), and the fact that multisensory integration effects are often strongest when the individual signals are weakest (the “inverse effectiveness” rule; see Stein, Huneycutt, & Meredith, 1988). In Experiment 1, intensity levels of the visual, auditory, and auditory–visual motion stimuli were systematically varied in a direction-discrimination task to investigate whether an RSE is level dependent. 
Methods
Participants
Twelve participants took part in the experiment, after giving written informed consent. All participants were University of Liverpool students. The mean age of the participants was 24.1 (range: 19–29) and gender distribution was 5/7 (male/female). All participants reported normal hearing and normal or corrected-to-normal vision. 
Apparatus and stimuli
Stimuli were presented on an array of 13 LEDs and loudspeakers arranged in a horizontal row with a 10-cm gap between each LED/loudspeaker. At a viewing distance of 1 m, each LED/loudspeaker was separated by a 6° visual angle. Each LED was positioned directly (3 cm center to center) above its equivalent loudspeaker to ensure that bimodal stimuli were co-localized (for details on the apparatus, see Meyer et al., 2005). 
Apparent motion was created by feeding a succession of 50-ms white noise bursts to seven consecutive LEDs/loudspeakers (Figure 1). All motion signals began at a central fixation point and moved horizontally with a speed of 120°/s either leftward or rightward by 36°. These stimulus parameters (duration of 50 ms and a stimulus-onset asynchrony of 50 ms between consecutive LEDs/loudspeakers) have been shown to reliably produce the impression of continuous apparent motion (Sanabria, Soto-Faraco, Chan, & Spence, 2005). Participants in a pilot study reported that the auditory and visual motion streams were not perceived as a series of discontinuous flickers/noises but rather as one continuous, smooth stream of motion. Signal presentation was controlled by a Tucker Davis RP2 real-time signal processor (Tucker-Davis Technologies) connected to a personal computer. 
Figure 1
 
The trajectory of the motion stimuli over space and time. The horizontal axis displays spatial eccentricity, and the vertical axis displays time (each unit T denotes a duration of 50 ms). Auditory and/or visual stimuli (shaded squares) begin in the center (0°) at time “T1” and then move by 6° either leftward or rightward at each subsequent 50-ms period. Note that stimuli move either to the left or to the right.
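The stimulus geometry and timing described above can be checked with a short calculation. This is a minimal sketch, not the authors' stimulus code: the constant and function names are illustrative, and the schedule uses the nominal 6° step quoted in the text rather than the exact angle subtended by a 10-cm gap at 1 m.

```python
import math

SPACING_CM = 10.0            # gap between neighbouring LED/loudspeaker pairs
VIEWING_DISTANCE_CM = 100.0  # viewing distance of 1 m
BURST_MS = 50                # burst duration and stimulus-onset asynchrony
N_POSITIONS = 7              # consecutive LED/loudspeaker pairs per sweep
NOMINAL_STEP_DEG = 6.0       # nominal separation quoted in the text


def step_angle_deg(spacing_cm=SPACING_CM, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle subtended by one gap, measured from straight ahead."""
    return math.degrees(math.atan(spacing_cm / distance_cm))


def motion_schedule(direction):
    """Return (onset_ms, eccentricity_deg) pairs.

    direction is +1 for rightward and -1 for leftward motion.
    """
    return [(i * BURST_MS, direction * i * NOMINAL_STEP_DEG)
            for i in range(N_POSITIONS)]


exact_step = step_angle_deg()                    # about 5.7 deg, i.e. roughly 6 deg
rightward = motion_schedule(+1)                  # starts at 0 deg, ends at 36 deg
speed_deg_per_s = NOMINAL_STEP_DEG * 1000 / BURST_MS
```

Seven positions give six 6° steps, which is where the 36° excursion and the 120°/s speed come from.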
Reaction times were measured for three intensity levels, labeled “threshold,” “mid-intensity,” and “high-intensity.” The stimulus intensities and procedure for deriving the intensities are described in the Procedure section. 
Procedure
Participants were seated in front of the audiovisual apparatus at a distance of 1 m in a dark, sound-attenuated chamber. Participants were asked to maintain fixation on an LED positioned 1 cm above the central LED and to try to avoid making any eye movements during each trial. 
In a preliminary session, the threshold intensity level was estimated for each participant separately for visual and auditory stimuli with a simple staircase procedure using a 2AFC direction-discrimination task. On each trial, the participant had to judge whether the stimulus moved to the left or to the right and press the appropriate button on the response pad. In this procedure, the stimulus intensity began at an easily discernible level, and participants continued until at least twelve reversals were made. Threshold was defined as the mean of the intensities at which the final twelve reversals occurred, which corresponds to 75% correct on the associated psychometric function (Levitt, 1971). Mean luminance (across all participants) at threshold was 0.3 cd/m² and mean sound level was 45 dB (A) (measured at the participant's head). 
The mid-intensity and high-intensity levels were determined separately for each participant by increasing their threshold level by 12 dB and 24 dB, respectively. The resulting mean stimulus levels across all participants are displayed in Table 1.
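The staircase logic can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: the text specifies only a "simple staircase" with the threshold taken as the mean of the final twelve reversals, so the one-up/one-down rule, the fixed step size, and the `respond` callback are all hypothetical.

```python
def staircase_threshold(respond, start, step, n_reversals=12):
    """Estimate a threshold with a one-up/one-down staircase (assumed rule).

    respond(intensity) returns True for a correct direction judgment.
    The threshold is the mean intensity at the final n_reversals reversals.
    """
    intensity = start
    last_direction = 0   # -1: intensity was lowered, +1: intensity was raised
    reversals = []
    while len(reversals) < n_reversals:
        correct = respond(intensity)
        direction = -1 if correct else +1   # harder after correct, easier after error
        if last_direction != 0 and direction != last_direction:
            reversals.append(intensity)     # the track changed direction here
        last_direction = direction
        intensity += direction * step
    return sum(reversals[-n_reversals:]) / n_reversals


# Hypothetical deterministic observer: correct whenever the level is >= 45
estimate = staircase_threshold(lambda level: level >= 45, start=60, step=1)
```

With this idealised observer the track settles into alternating reversals at 44 and 45, so the estimate falls between the two.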
Table 1
 
Mean intensity levels.
Intensity level    Auditory (dB (A))    Visual (cd/m²)
Threshold          45                   0.3
Mid-intensity      57                   5
High-intensity     69                   76
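The levels in Table 1 can be related to the 12-dB and 24-dB increments with a short calculation. The auditory values follow directly by addition. For the visual values, the sketch below assumes that the increment is defined as 10·log10 of the luminance ratio; the text does not state which convention was used, and because thresholds were set per participant, the group means need not match this calculation exactly.

```python
THRESHOLD_DB_A = 45      # mean auditory threshold from Table 1
THRESHOLD_CD_M2 = 0.3    # mean visual threshold from Table 1
GAINS_DB = {"threshold": 0, "mid-intensity": 12, "high-intensity": 24}


def luminance_at(threshold_cd_m2, gain_db):
    """Scale luminance by a dB gain, ASSUMING dB = 10*log10(luminance ratio)."""
    return threshold_cd_m2 * 10 ** (gain_db / 10)


auditory_levels = {name: THRESHOLD_DB_A + gain for name, gain in GAINS_DB.items()}
visual_levels = {name: round(luminance_at(THRESHOLD_CD_M2, gain), 1)
                 for name, gain in GAINS_DB.items()}
```

Under this assumption the computed luminances come out near the tabulated means of 5 and 76 cd/m².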
In the experimental session, for five out of the ten experimental blocks, target stimuli (p = 0.9) moved in a rightward direction and participants were asked to respond as quickly as possible following detection by pressing a button on the response pad. Non-target stimuli (catch trials, p = 0.1) moved in a leftward direction and participants were asked to avoid responding to the non-target stimuli. For the other five blocks, the directions were reversed. Within each experimental block, the stimulus conditions (auditory alone (A), visual alone (V), and auditory–visual (AV)), intensity conditions (threshold, mid-intensity, and high-intensity levels), and direction (left or right) were presented in random order. The intertrial interval varied randomly between 1500 and 2000 ms. Each participant received a total of 1980 stimulus presentations (100 presentations of each target condition, and 10 catch trials per condition). Participants were asked to be as accurate as possible while still pressing the button as quickly as possible. Reaction times were recorded up to a latency of 1200 ms following stimulus onset. 
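The total of 1980 presentations follows from the factorial design, as this small check shows. The variable names are illustrative only.

```python
from itertools import product

MODALITIES = ("V", "A", "AV")
INTENSITIES = ("threshold", "mid-intensity", "high-intensity")
DIRECTIONS = ("left", "right")
TARGETS_PER_CONDITION = 100
CATCH_PER_CONDITION = 10

# Every combination of modality, intensity, and direction is one condition
conditions = list(product(MODALITIES, INTENSITIES, DIRECTIONS))
total_trials = len(conditions) * (TARGETS_PER_CONDITION + CATCH_PER_CONDITION)
```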
Data analysis
Mean reaction times, catch trial errors, and misses for left and right directions of motion were compared using two-way ANOVAs with factors stimulus condition and direction of motion, for each intensity level. Subsequent analyses were conducted on data pooled across left and right directions of motion if there were no differences between motion directions. Catch trial errors and misses (target trials where the participant failed to respond) were then analyzed using two-way repeated-measures ANOVAs with factors of intensity level (threshold, mid, high) and modality (V, A, AV). Post-hoc t-tests were used to evaluate pairwise differences. 
To test whether an RSE was present, reaction times were analyzed with a two-way repeated-measures ANOVA (factors: intensity and modality). The magnitude of the RSE (i.e., the redundancy gain) was calculated by subtracting the bimodal RT from the RT to the fastest unimodal condition. 
To compare reaction times while taking into account false alarms, a further procedure was carried out to calculate the “discrimination time” (DT) of correct responses for stimuli in each condition, based on VanRullen and Thorpe (2001). Here the latency at which correct responses significantly outnumbered false alarms for individual participants was calculated for each stimulus condition by dividing the corresponding RT distributions into 20-ms bins and finding the first bin of a run of four consecutive bins in which p < 0.01, as tested with a χ² test (1 df). Discrimination times for different conditions were then compared using a repeated-measures ANOVA. Overall discrimination times (±10 ms) for the complete set of reaction times for all participants were then calculated per condition by finding the first 20-ms bin where correct responses significantly (p < 0.001) outnumbered false positives for at least twenty consecutive bins, as tested with a χ² test (1 df). 
If the response and discrimination times to the bimodal stimuli were significantly faster than the responses to either of the unimodal stimuli, we tested whether this bimodal facilitation could be explained by a race model. The race model consists of two components: (a) the assumption of statistical facilitation and (b) the assumption of context independence (Colonius, 1990). Statistical facilitation means that the response time on a redundant signal trial is the minimum of the decision times for the two separate stimuli. Context independence assumes that the distribution underlying the response times in the single-stimulus trials (auditory or visual) is identical to the distribution underlying the reaction times in the dual-stimulus condition. In other words, one assumes that the process underlying the decision within each modality is not altered when more than one stimulus is presented. When these conditions are met, the testable race model inequality (Miller, 1982) can be derived: the probability that an observer responds to the bimodal stimulus at or before a particular time t is no greater than the probability that the observer responds at or before time t to the visual component plus the probability that the observer responds at or before time t to the auditory component: P(AV ≤ t) ≤ P(A ≤ t) + P(V ≤ t). If the observed bimodal probability at any time t is greater than the sum of the unimodal probabilities, the race model inequality is violated, suggesting that some form of interaction between the two modalities led to the speeded response. The complete set of reaction times (i.e., 0 to 1200 ms) was divided into time bins of width 30 ms, and the cumulative distribution function (CDF) of the reaction times was calculated at each bin separately for each participant for each stimulus condition. The sum of the visual and auditory CDFs was then compared to the auditory–visual CDF at each time bin, to assess whether the race model had been violated, and if so, at which time bins. 
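The discrimination-time search can be sketched as below. Several details are assumptions, since the text does not spell them out: the χ² statistic is computed on a 2 × 2 table (responded in the bin or not, for target versus catch trials), the returned latency is the mid-point of the first bin in the run (consistent with the ±10 ms quoted for 20-ms bins), and all function names are hypothetical.

```python
CHI2_CRIT_P01_DF1 = 6.635  # chi-square critical value for p < 0.01 with 1 df


def bin_counts(latencies_ms, bin_ms=20, max_ms=1200):
    """Count responses falling into consecutive bins of width bin_ms."""
    counts = [0] * (max_ms // bin_ms)
    for t in latencies_ms:
        if 0 <= t < max_ms:
            counts[int(t // bin_ms)] += 1
    return counts


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def discrimination_time(hit_rts, fa_rts, n_targets, n_catch,
                        run=4, bin_ms=20, max_ms=1200, crit=CHI2_CRIT_P01_DF1):
    """Mid-point of the first bin starting `run` consecutive significant bins."""
    hits = bin_counts(hit_rts, bin_ms, max_ms)
    fas = bin_counts(fa_rts, bin_ms, max_ms)
    significant = []
    for h, f in zip(hits, fas):
        chi2 = chi2_2x2(h, n_targets - h, f, n_catch - f)
        more_hits = h * n_catch > f * n_targets   # hit rate exceeds false-alarm rate
        significant.append(more_hits and chi2 > crit)
    for start in range(len(significant) - run + 1):
        if all(significant[start:start + run]):
            return start * bin_ms + bin_ms / 2
    return None


# Hypothetical data: 5 early guesses in both trial types, then hits from 300 ms on
hit_rts = [205] * 5 + [b * 20 + 5 for b in range(15, 20) for _ in range(39)]
fa_rts = [205] * 5
dt = discrimination_time(hit_rts, fa_rts, n_targets=200, n_catch=200)
```

In the example the early guesses occur equally often on target and catch trials, so they do not count as discrimination, and the run of significant bins begins at 300–320 ms.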
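The bin-wise race model test described above can be sketched as follows. The function names and the example reaction times are hypothetical. The bound is left uncapped so that it can rise to 2, matching the description of summing the two unimodal CDFs.

```python
def cdf(rts_ms, n_trials, bin_ms=30, max_ms=1200):
    """Cumulative proportion of trials answered by the end of each bin."""
    edges = range(bin_ms, max_ms + bin_ms, bin_ms)
    return [sum(1 for t in rts_ms if t <= edge) / n_trials for edge in edges]


def race_model_difference(rt_a, rt_v, rt_av, n_trials, bin_ms=30, max_ms=1200):
    """Observed AV CDF minus the race model bound (A CDF + V CDF), per bin."""
    cdf_a = cdf(rt_a, n_trials, bin_ms, max_ms)
    cdf_v = cdf(rt_v, n_trials, bin_ms, max_ms)
    cdf_av = cdf(rt_av, n_trials, bin_ms, max_ms)
    return [av - (a + v) for a, v, av in zip(cdf_a, cdf_v, cdf_av)]


# Hypothetical reaction times (ms) for one observer, four trials per condition
rt_a = [400, 420, 440, 460]
rt_v = [410, 430, 450, 470]
rt_av = [340, 350, 380, 400]

diff = race_model_difference(rt_a, rt_v, rt_av, n_trials=4)
# Bins are numbered from 1, so bin k covers (k - 1) * 30 to k * 30 ms
violated_bins = [i + 1 for i, d in enumerate(diff) if d > 0]
```

Positive differences mark bins where the bimodal CDF exceeds the bound. In practice the per-observer differences at each bin would then be tested across observers, as in the Wilcoxon procedure described in the text.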
A paired Wilcoxon Signed Ranks test was used as the assumption of normality for the RT distribution could not be verified. A measure of effect size (r) was then calculated by dividing Z by the square root of the number of participants. 
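The effect size calculation is simple enough to verify by hand. The sketch below applies it to two of the Z values reported in the Results (N = 12), with a hypothetical function name.

```python
import math


def wilcoxon_effect_size(z, n_participants):
    """Effect size r = Z / sqrt(N), as defined in the text."""
    return z / math.sqrt(n_participants)


r_bin_11 = round(wilcoxon_effect_size(2.040, 12), 3)
r_bin_13 = round(wilcoxon_effect_size(3.059, 12), 3)
```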
The race model can be erroneously accepted if the response time distributions are contaminated by fast guesses, where participants do not actually respond to the stimulus but just press the button at the time they expect the next stimulus to arrive (Eriksen, 1988; Gondan & Heckel, 2008). If the race model was not rejected using the standard procedure described above, the so-called “kill-the-twin” correction procedure was applied to rule out biasing by fast guesses. In this procedure, for each “response” to a catch trial, a response time smaller than or equal to the catch trial latency is eliminated from the distribution of target response times to visual stimuli. This procedure corresponds to a modified race model inequality, which includes response times for catch trials (Gondan & Heckel, 2008): P(AV ≤ t) ≤ P(A ≤ t) + P(V ≤ t) − P(C ≤ t), where P(C ≤ t) denotes the incomplete response time distribution for catch trials. 
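A minimal sketch of the "kill-the-twin" step is given below. The text says only that a target response time no larger than the catch-trial latency is eliminated; removing the closest such response is an assumption made here for concreteness, and the function name and data are hypothetical.

```python
def kill_the_twin(target_rts, catch_rts):
    """Remove one target RT for each catch-trial response.

    For every catch-trial latency, the largest remaining target RT that is
    smaller than or equal to it is dropped (ASSUMED tie-breaking rule).
    Catch responses with no such twin leave the targets unchanged.
    """
    remaining = sorted(target_rts)
    for latency in sorted(catch_rts):
        twins = [t for t in remaining if t <= latency]
        if twins:
            remaining.remove(max(twins))
    return remaining


# Hypothetical target RTs and catch-trial (false alarm) latencies in ms
corrected = kill_the_twin([150, 300, 320, 400], [160, 310])
```

Here the guesses at 160 and 310 ms remove the target responses at 150 and 300 ms, leaving the slower responses for the CDF comparison.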
Results
Our main finding in Experiment 1 was that the RSE was present only at threshold level (Figure 2C). Both unimodal and bimodal reaction times decreased with an increase in intensity; this dependence on intensity was also reflected in the miss rates and false alarms, which are presented first (Figures 2A and 2B). 
Figure 2
 
(A) Mean (N = 12) miss rates as percentages for visual-alone (V), auditory-alone (A), and auditory–visual (AV) conditions at each stimulus intensity level (threshold/mid-intensity/high-intensity). (B) Mean false alarms for catch trials are shown as percentages for each stimulus condition at each stimulus intensity level. (C) Mean reaction times for each stimulus condition at each stimulus intensity level. Error bars display SEM values. The legend applies to all three graphs. *p < 0.05. **p < 0.01. ***p < 0.001. NS: not significant.
Two-way repeated-measures ANOVAs with factors condition (V, A, AV) and direction of motion (left, right) found no differences between left and right directions of motion for miss rates or false alarms at any intensity level (p > 0.2 in all cases); therefore, the data were pooled across motion direction. Both mean miss rates (a failure to respond to a target stimulus; Figure 2A) and mean false alarms (Figure 2B) decreased with an increase in stimulus intensity for both unimodal and bimodal stimuli. Misses decreased significantly with increasing intensity [F(2,22) = 38.53, p < 0.001]; the dependence on intensity was different for the unimodal and the bimodal stimuli [significant interaction; F(4,44) = 4.55, p = 0.004]. T-tests revealed significant differences between conditions at threshold for V vs. AV [t(11) = 3.21, p = 0.008] and A vs. AV [t(11) = 3.50, p = 0.005]; at mid-intensity for V vs. A [t(11) = 2.37, p = 0.037] and for A vs. AV [t(11) = 2.72, p = 0.020]; and at high-intensity for V vs. A [t(11) = 3.27, p = 0.008] and for V vs. AV [t(11) = 2.88, p = 0.015]. 
The false alarm rate also decreased with intensity [F(2,22) = 75.32, p < 0.001], but the decrease was not uniform, i.e., there was a significant interaction between modality and intensity [F(4,44) = 3.12, p = 0.024]. T-tests revealed significant differences between false alarms at the threshold level for V vs. AV [t(11) = 4.71, p < 0.001] and for A vs. AV [t(11) = 3.63, p = 0.004]; for stimuli at the mid- and high-intensity levels, there were no significant differences between false alarm rates among conditions. 
Two-way repeated-measures ANOVAs with factors condition (V, A, AV) and direction of motion (left, right) found no differences between left and right directions of motion for reaction times at any intensity level (p > 0.4 in all cases); therefore, the RT data were pooled across motion direction. Mean RTs are displayed in Figure 2C for threshold level (V = 519.2 ms; A = 509.4 ms; AV = 460.2 ms), for mid-intensity level (V = 391.3 ms; A = 421.9 ms; AV = 383.8 ms), and for high-intensity stimuli (V = 334.7 ms; A = 378.3 ms; AV = 337.3 ms). Reaction times were significantly reduced when the stimulus intensity was increased [F(2,22) = 35.65, p < 0.001]. The decrease in reaction times was not uniform with the auditory modality gaining less than the visual modality [F(4,44) = 9.49, p < 0.001]. Only at threshold level did we observe a significant RSE (of 49 ms): the auditory–visual RTs were faster than both the visual-alone [t(11) = 5.973, p < 0.001] and auditory-alone [t(11) = 7.468, p < 0.001] reaction times, and here the unimodal reaction times did not differ significantly from each other. At both the mid- and high-intensity levels, the bimodal reaction times were driven by the visual modality and only auditory reaction times differed from the bimodal reaction times [mid-intensity: t(11) = 4.927, p < 0.001; high-intensity: t(11) = 5.558, p < 0.001]. The unimodal RTs differed significantly at both mid-intensity [t(11) = 4.92, p < 0.001] and high-intensity [t(11) = 3.488, p = 0.005] levels. 
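The redundancy gain defined in the Data analysis section (fastest unimodal RT minus bimodal RT) can be recomputed from the mean RTs reported above. The sketch only reproduces the arithmetic; whether each gain is statistically reliable is decided by the t-tests in the text, and the variable names are illustrative.

```python
MEAN_RT_MS = {
    "threshold":      {"V": 519.2, "A": 509.4, "AV": 460.2},
    "mid-intensity":  {"V": 391.3, "A": 421.9, "AV": 383.8},
    "high-intensity": {"V": 334.7, "A": 378.3, "AV": 337.3},
}


def redundancy_gain(rt):
    """Fastest unimodal mean RT minus the bimodal mean RT (ms)."""
    return round(min(rt["V"], rt["A"]) - rt["AV"], 1)


gains = {level: redundancy_gain(rt) for level, rt in MEAN_RT_MS.items()}
```

The result is a gain of about 49 ms at threshold, a small numerical difference at mid-intensity, and a slightly negative value at high intensity.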
Discrimination times were calculated to compare the latencies of correct responses in the visual, auditory, and auditory–visual conditions. At threshold level, DTs differed significantly between conditions [F(1,11) = 16.130, p < 0.001]. DTs for auditory–visual stimuli were faster than DTs for visual cues [t(11) = 4.885, p < 0.001] and were faster than DTs for auditory cues [t(11) = 5.069, p < 0.001]. Overall DT was 310 ms for visual, 330 ms for auditory, and 270 ms for auditory–visual cues. There were no significant differences in DTs at the mid-intensity level [F(1,11) = 0.450, p = 0.516] (overall DT for visual = 210 ms, auditory = 230 ms, AV = 210 ms). For high-intensity stimuli, there were significant DT differences between conditions [F(1,11) = 15.660, p = 0.002]; here both visual and AV stimuli were discriminated faster than auditory stimuli, but there were no differences between visual and AV discrimination times (overall DT for visual = 190 ms, auditory = 210 ms, auditory–visual = 190 ms). 
To test for interactive processing between the auditory and the visual inputs, cumulative distributions of the reaction times were calculated for the auditory, the visual, and the auditory–visual conditions, separately for all three intensity levels (Figures 3A–3C). For stimuli close to threshold, the unimodal CDFs are highly overlapping (see Figure 3A); for targets above threshold (Figures 3B and 3C), the CDFs for visual targets are shifted to the left, which reflects faster mean reaction times. Only for stimuli close to threshold were the observed reaction times for the bimodal stimuli faster than the race model prediction; differences between the observed auditory–visual reaction times and the race model predictions are plotted in Figure 3D; positive differences are indicative of deviations from the race model (Miller, 1982). Paired Wilcoxon tests compared the observed and expected CDFs at each time bin where the CDF difference was positive (Miller, 1982); we found significant violations at bin 11 [Z = 2.040, p = 0.041, r = 0.589], bin 12 [Z = 0.746, p = 0.006, r = 1.370], and bin 13 [Z = 3.059, p = 0.002, r = 0.883], corresponding to 300–390 ms post-stimulus onset for near-threshold stimuli; the largest deviation from the race model was found at bin 12 (330–360 ms). There were race model violations for 11 out of 12 observers at near-threshold level. At both suprathreshold intensity levels, there were no RSEs and no significant deviations from the race model. The “kill-the-twin” correction procedure (Gondan & Heckel, 2008) was implemented for the suprathreshold response sets, as the race model may have been erroneously accepted due to contamination by fast guesses. After this correction procedure, the race model remained accepted at all time bins (p > 0.3) for mid-intensity and high-intensity stimuli. 
Figure 3
 
Cumulative distribution functions (CDFs) for RTs are displayed for visual alone (squares), auditory alone (triangles), auditory–visual (circles), and the CDF predicted by the race model (solid trace) for (A) threshold stimuli, (B) mid-intensity stimuli (after correction by the “kill-the-twin” procedure), and (C) high-intensity stimuli (after correction by the “kill-the-twin” procedure). (D) Miller inequality derived by subtracting the CDF predicted by the race model from the CDF of the AV condition at each bin for threshold stimuli (triangles), mid-intensity stimuli (circles), and high-intensity stimuli (squares). Values greater than zero (thin dashed trace) signify a violation of the race model in that bin. CDFs are plotted from 100 to 800 ms after stimulus onset for display purposes, at the mid-point of each time bin. Note that the race model will produce a CDF of 2 eventually, but the key comparison is for the early part of the CDF (Miller, 1982).
Discussion
Reaction times for direction discrimination were measured for visual, auditory, and auditory–visual horizontally moving stimuli at threshold and suprathreshold intensity levels. We found a significant RSE (redundancy gain of around 50 ms) only for stimuli at threshold level (Figure 2C); this RSE could not be explained by statistical facilitation. The peak magnitude of the violation of the race inequality was 0.059, which is in line with previously reported violations of the race inequality for manual response times to stationary cues (Hughes, Reuter-Lorenz, Nozawa, & Fendrich, 1994) for stimuli of roughly similar intensities (visual: 0.04 cd/m²; auditory: 46 dB); saccadic reaction times are known to yield increased violations of the race inequality (about 0.4; Harrington & Peck, 1998; Hughes et al., 1994). The redundancy gain for bimodal reaction times at threshold was accompanied by an improvement in performance; participants' miss rates to the target stimuli, false alarm rates on the catch trials, and discrimination times were significantly lower in the auditory–visual condition at threshold intensity compared to either visual- or auditory-alone signals (Figures 2A and 2B). The RT gains for moving stimuli are in agreement with studies reporting RSEs for stationary stimuli for manual response times (Diederich & Colonius, 2004; Miller, 1982, 1986; Savazzi & Marzi, 2008). 
For mid-intensity (12 dB above threshold) and high-intensity levels (24 dB above threshold), the stimuli were easily detectable (miss rates lower than 5%) and false alarm rates were low across all stimulus conditions (less than 5%); here no RSE was observed (Figure 2C). The race model was not violated at mid- and high-intensity levels, even after the “kill-the-twin” correction procedure had been applied, which is a less conservative test. While a failure to observe speeded responses at suprathreshold intensity levels could be viewed as a consequence of the inverse effectiveness principle (multisensory integration would be less likely when the unimodal signals were easily detectable), robust cross-modal interactions have been observed previously for suprathreshold motion stimuli (Sekuler et al., 1997; Soto-Faraco et al., 2002). A more likely interpretation is that at these intensity levels the unimodal reaction times differed significantly from each other (by about 30 ms for mid-intensity and 44 ms for high-intensity stimuli; see Figure 2C), and consequently, the reaction times in the bimodal case were determined solely by the faster modality. In Experiment 2, we again employed stimuli above threshold but matched the reaction times to the two unimodal stimuli for each individual participant. 
Experiment 2
In Experiment 2, the intensities of the visual and auditory stimuli were adjusted to produce roughly equal unimodal reaction times, effectively creating a large overlap between the RT distributions for the unimodal stimuli, making bimodal speeded responses more likely to occur (Hughes et al., 1994). The aim was to test whether under conditions of similar unimodal reaction times, the RSE observed at threshold in Experiment 1 could be generalized to suprathreshold stimuli. 
Methods
Participants
Twelve new participants took part in the experiment as part of their University of Liverpool course requirement, after giving written informed consent. The mean age of the participants was 20.4 (range: 18–22) and gender distribution was 5/7 (male/female). All participants reported normal hearing and normal or corrected-to-normal vision. 
Apparatus and stimuli
The motion stimuli used in Experiment 2 were identical to those in Experiment 1 except for the intensity levels, which were adjusted for each participant on the basis of a preliminary session to produce equivalent latencies for visual and auditory stimuli. The signal intensities used were as follows: the mean intensity (across all participants) of each LED was 8 cd/m2, and the mean amplitude of the auditory signals was 70 dB (A). 
Procedure
The procedure was similar to Experiment 1, except here instead of matching threshold levels, we explicitly matched response times in the A and V conditions for each participant separately in a preliminary session to identify stimulus intensities that produced roughly equal response latencies to visual and auditory moving stimuli. In the preliminary session, two blocks of stimuli were presented for both visual and auditory conditions; in the first block, target trials moved to the right and catch trials moved to the left, and in the second block, target trials moved to the left and catch trials moved to the right. Participants were required to press a button in response to target stimuli and avoid pressing the button for catch trials. Stimulus intensity varied randomly across trials. Visual and auditory stimulus intensities were selected for each participant that yielded similar response times and accuracy levels greater than 96%; these values were then used in the main experiment. 
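The selection step at the end of the preliminary session can be sketched as a small search over candidate intensities. This is only an illustration under stated assumptions: the text does not say how "similar response times" was operationalized, so the minimum absolute difference between mean RTs is assumed here, and the function name, intensity labels, and all numbers are hypothetical. Only the accuracy criterion (greater than 96%) is taken from the text.

```python
from itertools import product

def match_intensities(visual, auditory, min_accuracy=0.96):
    """Pick the (visual, auditory) intensity pair with the closest mean RTs.

    `visual` and `auditory` map an intensity label to (mean_rt_ms, accuracy).
    Only intensities whose accuracy exceeds `min_accuracy` are considered.
    The closeness rule (smallest absolute RT difference) is an assumption.
    """
    ok_v = {k: rt for k, (rt, acc) in visual.items() if acc > min_accuracy}
    ok_a = {k: rt for k, (rt, acc) in auditory.items() if acc > min_accuracy}
    if not ok_v or not ok_a:
        raise ValueError("no intensity meets the accuracy criterion")
    return min(product(ok_v, ok_a), key=lambda p: abs(ok_v[p[0]] - ok_a[p[1]]))

# Hypothetical preliminary-session summary for one participant (made-up values).
visual = {"2 cd/m2": (412.0, 0.95), "8 cd/m2": (379.0, 0.98), "16 cd/m2": (361.0, 0.99)}
auditory = {"60 dB": (401.0, 0.97), "70 dB": (382.0, 0.98), "80 dB": (350.0, 0.99)}

best = match_intensities(visual, auditory)
print(best)
```

With these made-up values the lowest visual intensity is excluded by the accuracy criterion, and the remaining pair with the smallest RT difference is returned.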
The procedure for the experimental session was the same as for Experiment 1, except here the intensity level of the stimuli did not vary across trials. One hundred target trials were presented for each participant for each of the six stimulus conditions, and there were 10 catch trials per condition. 
Data analysis
Analysis was the same as described in Experiment 1.
Results
Two-way repeated-measures ANOVAs with factors condition (V, A, AV) and direction of motion (left, right) found no differences between left and right directions of motion for miss rates, false alarms, and RTs (p > 0.3 in all cases); therefore, the following analyses were carried out on data pooled across motion direction. The mean response times for the direction-discrimination task for all three conditions (auditory, visual, and bimodal) are shown in Figure 4A. A one-way repeated-measures ANOVA revealed significant differences between AV, A, and V response times [F(2,22) = 28.12, p < 0.001]. A and V reaction times did not differ significantly from each other, but a significant RSE occurred [visual RT = 377.6 ms vs. auditory–visual RT = 347.9 ms: t(11) = 5.175, p < 0.001; auditory RT = 383.9 ms vs. auditory–visual RT: t(11) = 6.863, p < 0.001]. The RSE was 30 ms, i.e., the bimodal reaction times were 30 ms shorter than the shortest unimodal reaction times. Discrimination times differed significantly between conditions [F(1,11) = 9.087, p = 0.001]; participants were on average faster at discriminating AV signals compared to visual [t(11) = 4.780, p = 0.001] and auditory signals [t(11) = 5.234, p < 0.001]. Overall discrimination time was 190 ms for visual and auditory stimuli and 170 ms for auditory–visual signals. Misses (failure to respond to a target stimulus) and false alarms were below 2% and were not analyzed further. 
Figure 4
 
(A) Reaction times for visual (V), auditory (A), and auditory–visual (AV) stimuli. (B) Cumulative distribution functions (CDFs) for RTs are displayed for V (squares), A (triangles), AV (circles), and the CDF predicted by the race model (solid trace). (C) Miller inequality derived by subtracting the CDF predicted by the race model from the CDF of the auditory–visual condition at each bin.
CDFs were calculated for the V, A, and AV conditions (Figure 4B). Bimodal CDFs were then compared with the predictions based on the race model inequality and statistically significant violations of the race model were found at bins 7–9 (180–270 ms post-stimulus onset; Figure 4C) [bin 7: Z = 2.240, p = 0.025, r = 0.647; bin 8: Z = 2.803, p = 0.005, r = 0.809; bin 9: Z = 2.982, p = 0.003, r = 0.861]. The maximum value of the Miller inequality (Figure 4C) was 0.037, observed at bin 9 (240–270 ms). The race model was violated by 10 out of the 12 observers. 
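The comparison between the bimodal CDF and the race model bound can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the full procedure is described in Experiment 1, the function names are hypothetical, and the toy reaction times are made up. The 30-ms bin width is inferred from the reported bin ranges (bin 9 spans 240–270 ms), and the race model bound is taken to be the sum of the unimodal CDFs capped at 1.

```python
import numpy as np

def binned_cdf(rts_ms, edges):
    """Proportion of responses with RT <= each bin's upper edge."""
    rts = np.sort(np.asarray(rts_ms, dtype=float))
    return np.searchsorted(rts, edges, side="right") / rts.size

def miller_inequality(rt_a, rt_v, rt_av, bin_ms=30, n_bins=20):
    """Return bin upper edges and (bimodal CDF - race model bound) per bin.

    The bound is min(1, F_A + F_V); positive differences indicate that the
    bimodal CDF exceeds what independent unimodal races could produce.
    """
    edges = bin_ms * np.arange(1, n_bins + 1)   # upper edges: 30, 60, ... ms
    bound = np.minimum(1.0, binned_cdf(rt_a, edges) + binned_cdf(rt_v, edges))
    return edges, binned_cdf(rt_av, edges) - bound

# Made-up reaction times (ms) for a single hypothetical observer.
rt_a = [300, 320, 340, 360]
rt_v = [310, 330, 350, 370]
rt_av = [250, 260, 270, 280]

edges, diff = miller_inequality(rt_a, rt_v, rt_av)
peak = int(np.argmax(diff))
print(f"peak violation {diff[peak]:.2f} in bin {peak + 1} "
      f"({edges[peak] - 30}-{edges[peak]} ms)")
```

In a group analysis the per-bin differences would be computed for each observer and then tested against zero across observers; that step is omitted here.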
Discussion
In Experiment 2, we measured reaction times for discriminating the direction of motion for suprathreshold auditory, visual, and auditory–visual moving targets; in contrast to Experiment 1, the suprathreshold unimodal targets were equated in terms of response times (in a preliminary session, stimulus intensities for the unimodal visual and auditory stimuli had been determined for each participant such that they resulted in roughly similar unimodal response times). Under these conditions, we observed a significant RSE, i.e., the responses to auditory–visual targets were faster than the responses to either of the unimodal targets (Figure 4A), and the same pattern was observed for the discrimination times. The size of the RSE in Experiment 2 (30 ms) is comparable with that reported in studies using stationary rather than moving stimuli (see, e.g., Gondan & Röder, 2006). 
Furthermore, the RSE could not be explained by statistical facilitation; analysis of the cumulative distributions of the reaction times showed that the RT gains in the bimodal condition were larger than the redundancy gains predicted by the race model (Figures 4B and 4C). The peak magnitude of the race inequality violation was roughly in line with previous reports (Hughes et al., 1994; Leo, Bertini, di Pellegrino, & Ladavas, 2008) for stationary stimuli of comparable intensities and unimodal reaction times. 
False alarms and misses were uniformly low across the visual, auditory, and auditory–visual conditions as the stimuli were well above threshold; this is in contrast to the stimuli near threshold in Experiment 1, where false alarms and misses were significantly lower in the bimodal condition than in the unimodal conditions. 
Experiment 2 extended the findings of Experiment 1 by demonstrating that bimodal RT gains that go beyond statistical facilitation occur for well-above-threshold horizontally moving auditory and visual targets, provided the unimodal reaction times are sufficiently similar. Experiments 1 and 2 used a motion paradigm in which stimulus onset coincided with motion onset. A different type of motion stimulus consists of the onset of motion of a previously stationary object (e.g., a stationary dog begins to move), and further experiments were run to assess whether bimodal RT gains would be observed using this second type of motion paradigm. 
Experiment 3A
Experiments 1 and 2 used a paradigm where the motion signals appeared abruptly, and the onset of the motion signal occurred simultaneously with the onset of the stimulus. The aim of Experiments 3A and 3B was to assess whether an RSE is observed between auditory and visual motion stimuli in a motion-onset paradigm, where motion stimuli are preceded by a stationary phase; this paradigm is frequently used in visual motion research using event-related potentials (for review, see, e.g., Kuba, Kubová, Kremlacek, & Langrova, 2007) and has recently also been reported for auditory stimuli (Krumbholz, Hewson-Stoate, & Schonwiesner, 2007). In the motion-onset paradigm, the influence of stationary cues in the combination of visual and auditory motion signals is minimized, due to the stationary period preceding the onset of motion. In Experiments 1 and 2, speeded response times could have resulted from the presence of an additional cue in the AV condition, regardless of whether it was moving, for instance by directing spatial attention to the location of the motion signal in the other modality. Our aim in Experiment 3A was to assess whether bimodal RT gains were present when the influence of any alerting effect due to simultaneous stimulus and motion onset was minimized. 
Methods
Participants
Twelve new participants took part in the experiment as part of their University of Liverpool course requirement, after giving written informed consent. The mean age of the participants was 21.1 (range: 18–25) and gender distribution was 6/6 (male/female). All participants reported normal hearing and normal or corrected-to-normal vision. 
Apparatus and stimuli
The apparatus was the same as reported for Experiment 1. The motion stimuli used in Experiment 3A were identical to those in Experiment 2, except that the onset of motion was preceded by a stationary period that varied randomly between 1000 and 1200 ms (Figure 5). The modality of the stationary period was always the same as the modality of the motion stimulus (e.g., an auditory motion stimulus was always preceded by an auditory stationary stimulus). 
Figure 5
 
(A) Trajectories of the motion stimuli over space and time. The horizontal axis displays spatial eccentricity, and the vertical axis displays time (each unit Tn and Sn denotes a 50-ms duration). Auditory and/or visual stimuli (shaded squares) begin in the center at time “S1” and remain stationary until time “Sx” (which is between 1000 and 1200 ms from stimulus onset), then move by 6° either leftward or rightward at each subsequent 50-ms period (T1–T6). Motion is either to left or right, never simultaneously in both directions.
Procedure
The procedure was the same as for Experiment 2, except that in the preliminary session two blocks of stimuli were presented in which the unimodal auditory (left/right) and visual (left/right) stimuli were presented in an equiprobable random order while stimulus intensity varied randomly across trials. Within each experimental block, 50% of signals moved to the left and 50% moved to the right, and subjects were asked to respond to the direction of motion by pressing the appropriate button on a response pad. Visual and auditory stimulus intensities were selected that yielded similar response times and accuracy levels greater than 95%. 
In the experimental session, four blocks of stimuli were presented in which the stimulus conditions (auditory (left/right), visual (left/right), or auditory–visual (left/right)) were presented in an equiprobable random order. Within each experimental block, 50% of signals moved to the left, and 50% of signals moved to the right. The observers were instructed to respond to the direction of motion independent of modality (auditory, visual, or audiovisual) by pressing a button on the response pad: the left response button for leftward motion and the right response button for rightward motion. One hundred trials were presented to each participant for each of the six stimulus conditions. 
Data analysis
Data analysis for the RSE and race model was the same as described in Experiment 1. Discrimination times were calculated in the same way as described for Experiment 1, except that the distribution of incorrect responses was used to determine the latency of discrimination of correct responses, instead of the distribution of catch trial errors used in Experiment 1.
Results
Two-way repeated-measures ANOVAs with factors condition (V, A, AV) and direction of motion (left, right) found no differences between left and right directions of motion for RTs, accuracy rates, and misses (p > 0.4 in all cases); therefore, the following analyses were carried out on data pooled across motion direction. Mean response times for all three conditions (auditory, visual, and bimodal) are shown in Figure 6A (left panel). A one-way repeated-measures ANOVA revealed significant differences between AV, A-alone, and V-alone conditions for response times [F(2,22) = 14.092, p < 0.001]. As expected, there was no significant difference between the auditory and visual response times, but the bimodal response times were significantly shorter than the auditory [mean bimodal RT = 341.4 ms; mean auditory RT = 385.8 ms; t(11) = 7.738, p < 0.001] and visual response times [mean visual RT = 373.5 ms; t(11) = 3.78, p = 0.003]; the redundancy gain was 32 ms. Discrimination times differed between conditions [F(1,11) = 21.000, p < 0.001]. It was found that AV stimuli were correctly discriminated faster than visual [t(11) = 5.046, p < 0.001] and auditory stimuli [t(11) = 5.334, p < 0.001]. Overall discrimination time was 190 ms for visual cues, 190 ms for auditory cues, and 170 ms for bimodal signals. The accuracy for all conditions was above 97% and there were no significant differences in accuracy between modalities. Misses were less than 2% and were not analyzed further. CDFs were calculated for the visual, auditory, and auditory–visual conditions (Figure 6B). Bimodal CDFs were then compared with the predictions based on the race model inequality, and Figure 6D (trace with squares) shows the deviation from the race model. 
We found statistically significant violations of the race model at bins 6 to 9 (150–270 ms after motion onset) in the direction-discrimination task [bin 6: Z = 2.201, p = 0.028, r = 0.635; bin 7: Z = 2.521, p = 0.012, r = 0.728; bin 8: Z = 2.936, p = 0.003, r = 0.848; bin 9: Z = 2.847, p = 0.004, r = 0.822]. The maximum value of the Miller inequality was 0.053, observed at bin 9 (240–270 ms). The race model was violated by 11 out of the 12 observers. 
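The effect sizes reported alongside each Z statistic are numerically consistent with the common convention r = Z/√N, with N = 12 observers. The short check below illustrates this; the formula is an assumption inferred from the reported numbers, since this section does not name the test or the effect-size definition, and the function name is hypothetical.

```python
import math

def effect_size_r(z, n):
    """Effect size under the assumed convention r = Z / sqrt(N)."""
    return z / math.sqrt(n)

# (Z, reported r) per bin, transcribed from the Experiment 3A results.
reported = {6: (2.201, 0.635), 7: (2.521, 0.728), 8: (2.936, 0.848), 9: (2.847, 0.822)}
n_observers = 12

for bin_no, (z, r_reported) in reported.items():
    r = effect_size_r(z, n_observers)
    print(f"bin {bin_no}: computed r = {r:.3f}, reported r = {r_reported:.3f}")
```

The same relation also reproduces the values reported for Experiment 2 (N = 12).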
Figure 6
 
(A) Mean (N = 12) reaction times for V, A, and AV conditions, for direction-discrimination RT task (left) and motion-onset task (right). (B) CDFs for the RTs in the V (squares), A (triangles), and AV (circles) conditions, and the CDF predicted by the race model (solid trace), for Experiment 3A. (C) CDFs for the RTs for each stimulus condition and for the race model prediction in the motion-onset task (Experiment 3B). (D) Miller inequality for Experiment 3A (squares) and Experiment 3B (triangles).
Discussion
In Experiment 3A, we found a redundancy gain of 32 ms, which was greater than predicted by statistical facilitation; the peak magnitude of the violation of the race model was about 0.05, which is similar to the violation observed in Experiment 2 for abrupt-onset stimuli. Moreover, in this experiment, we minimized the influence of the energy onset of the stimulus on the RSE by preceding the motion onset with a stationary cue, showing that the RSE was present when the motion onset and the stimulus onset were separated, although the influence of cue features other than motion cannot be ruled out completely in an explanation of the speeded responses found in the bimodal condition. 
Experiment 3B
Many ecologically important functions rely on the detection of motion regardless of direction, for example to detect a predator, invisible while stationary, but visible when beginning to move; in such cases, the direction of object motion may be unimportant. Experiment 3B aims to investigate whether bimodal reaction time benefits are observed for the detection of the onset of motion, regardless of direction, by using a simple reaction time task where the participant is required to press a response button as soon as a stationary stimulus (either visual, auditory, or auditory–visual) begins to move either to the right or to the left. This non-direction-selective task in Experiment 3B will be defined here as the “motion-onset detection” task to distinguish it from the direction-selective tasks in Experiments 1–3A.
Methods
Participants
Twelve new participants took part in the experiment as part of their University of Liverpool course requirement, after giving written informed consent. The mean age of the participants was 21.9 (range: 18–25) and gender distribution was 5/7 (male/female). All participants reported normal hearing and normal or corrected-to-normal vision. 
Apparatus and stimuli
The motion stimuli used in Experiment 3B were identical to those in Experiment 3A.
Procedure
The procedure was the same as Experiment 3A, except here participants were instructed to press a response button as soon as possible following detection of the onset of motion, regardless of the direction of motion of the stimulus (“motion-onset detection”). Catch trials (p = 0.1, i.e., 10 trials per condition) containing no moving stimulus at the end of the stationary phase were included to prevent participants from responding to the offset of the stationary stimulus. 
Data analysis
Data analyses for the RSE, discrimination times, and race model were the same as described in Experiment 1.
Results
Two-way repeated-measures ANOVAs with factors condition (V, A, AV) and direction of motion (left, right) found no differences between left and right directions of motion for RTs, false alarms, and misses (p > 0.5 in all cases); therefore, the following analyses were carried out on data pooled across motion direction. Mean response times for all three conditions (auditory, visual, and bimodal) are shown in Figure 6A (right panel). A one-way repeated-measures ANOVA revealed significant differences between AV, A-alone, and V-alone conditions for response times [F(2,22) = 6.703, p < 0.01]. As expected, there was no significant difference between the A and V response times, but the AV response times were significantly shorter than the A [mean bimodal RT = 292.7 ms; mean auditory RT = 343.0 ms; t(11) = 8.115, p < 0.001] and V response times [mean visual RT = 331.6 ms; t(11) = 9.848, p < 0.001]; the redundancy gain was 39 ms. Discrimination times for motion were significantly different between conditions [F(2,22) = 25.857, p < 0.001]; faster discrimination times were observed for bimodal stimuli compared to visual [t(11) = 6.775, p < 0.001] and auditory stimuli [t(11) = 5.745, p < 0.001]. Overall DT was 170 ms for visual and auditory cues and 150 ms for auditory–visual stimuli. False alarm rates were less than 2%, indicating that overall participants were responding to the onset of motion rather than to the offset of the stationary stimulus. Miss rates were less than 2% and were not analyzed further. CDFs were calculated for the visual, auditory, and auditory–visual conditions (Figure 6C). Bimodal CDFs were then compared with the predictions based on the race model inequality, and Figure 6D (trace with triangles) shows the deviation from the race model. 
We found statistically significant violations of the race model at bins 8 and 9 (210–270 ms post-motion onset) [bin 8: Z = 2.629, p = 0.009, r = 0.759; bin 9: Z = 2.824, p = 0.005, r = 0.815]; the largest value of the Miller inequality was 0.066, observed at bin 8. The race model was violated by 11 out of the 12 observers. 
Discussion
In Experiment 3B, which tested for the presence of an RSE during the motion-onset detection task, we found a redundancy gain of around 40 ms; this reaction time facilitation was greater than expected from statistical facilitation (Figure 6D). Mean RTs to visual, auditory, and auditory–visual stimuli were around 40 ms faster than the mean RTs in Experiment 3A as the task in Experiment 3B was a simple detection task requiring response to the onset of motion regardless of motion direction. Discrimination times for motion were around 20 ms faster in the bimodal condition compared to the fastest unimodal condition. The peak violation occurred in an earlier time bin (bin 8) compared to Experiments 2 and 3A (bin 9). The earlier peak of the Miller inequality may be related to the faster mean reaction times, which were likely a result of less extensive processing of the inputs being necessary to detect the onset of motion compared to the discrimination of direction of motion. 
While Experiments 3A and 3B minimized the influence of cues other than motion as a contribution to the RSE (by including a stationary fore-period, which dissociated the signal onset and the onset of motion), we cannot exclude the possibility that motion in both modalities was not necessary for the speeded bimodal responses; Experiment 4 directly addresses this concern. 
Experiment 4
Faster response times under bimodal motion conditions in Experiments 1–3B may have been based on the addition of a cue in a separate modality at the same spatial location, regardless of whether the additional cue was moving or stationary. In this account, speeded bimodal responses could be explained by integration by spatial location, instead of integration by motion, as the dynamic feature of the second stimulus would be irrelevant. Indeed, previous studies have found that a stationary visual stimulus can modulate the perception of auditory apparent motion (Allen & Kolers, 1981), and that a stationary auditory cue can modulate the perception of visual motion (Sekuler et al., 1997). Speeded responses to bimodal motion signals could then be accounted for by one signal acting in a general alerting role (cf. “preparation enhancement” (Nickerson, 1973) or, more recently, the “warning effect” (Diederich & Colonius, 2008)). Extending this explanation to the current paradigm, it could be argued that the speeded responses in Experiments 1 to 3B resulted from a stationary signal acting as an alerting cue to the moving signal, rather than from the genuine integration of motion. 
Experiment 4 directly addresses the question of whether the RSE for auditory–visual motion signals is due to the presence of motion in both visual and auditory modalities. Several lines of research suggest that behavioral benefits for motion stimuli presented to the visual and auditory modalities may go beyond effects related to a combination of a co-localized stationary signal and a moving cue. For instance, it has been shown that motion in both cues is a necessary condition for the dynamic visual capture effect (Soto-Faraco et al., 2002); dynamic visual capture failed when the signals were perceived as stationary rather than moving. Moreover, it has been reported that auditory–visual motion cues in the depth plane produce speeded responses compared to the combination of a motion signal in one channel and a stationary signal in a second channel (Cappe et al., 2009), while Brooks et al. (2007) found increased detection of visual biological motion cues in the presence of suprathreshold auditory motion, compared to a stationary control condition. Despite these findings it remains unclear whether speeded responses for bimodal linear motion are based on integration of motion across the visual and auditory channels. 
Experiment 4 includes bimodal combinations of stationary and moving cues (either “visual stationary/auditory moving” (VsAm), or “visual moving/auditory stationary” (VmAs)) to specifically test whether the addition of a stationary cue to a motion cue (stationary/moving combinations are henceforth termed “incongruent”) would produce similar RT gains compared to motion in both visual and auditory modalities traveling in the same direction (“bimodal congruent”). A similar design using stationary and motion cues was used by Cappe et al. (2009) to investigate the contribution of motion cues in both modalities to the perception of motion in depth. Crucially, in the current study the stationary signals span the same horizontal range as the motion signals, therefore constituting spatially co-localized, synchronous alerting cues for the duration of the motion stimuli. If congruent bimodal motion cues and incongruent combinations produce similar RT gains compared to unimodal equivalents, this would lend support to the response preparation model of the redundant signal effect (although of course the congruent and incongruent signals could still be integrated by different mechanisms at a neural level but produce similar RTs). If RT gains for congruent stimuli were larger than RT gains for incongruent signals, this would provide evidence that the integration mechanisms responsible for the RSE for auditory–visual motion cues depend on the presence of motion in both modalities, i.e., on spatiotemporal matching between the inputs, rather than purely spatial matching between the inputs. 
Finally, Experiment 4 also tested whether the integration of auditory and visual signals is direction selective, i.e., whether auditory–visual motion signals that begin at the same spatial location but travel in opposite horizontal directions (termed “bimodal opposite”) would produce faster response times in bimodal conditions compared to unimodal equivalents. 
Methods
Participants
Ten new participants took part in the experiment as part of their University of Liverpool course requirement, after giving written informed consent. The mean age of the participants was 23.2 (range: 20–41) and gender distribution was 4/6 (male/female). All participants reported normal hearing and normal or corrected-to-normal vision. 
Apparatus and stimuli
Moving and stationary stimuli were presented on the apparatus as described in Experiment 1. The motion stimuli used in Experiment 4 were identical to those in Experiments 1 and 2. For stationary stimuli, a series of 88 very short (3.98 ms) bursts was presented at randomly varying LED/loudspeaker positions between the central LED/loudspeaker and the 7 LEDs/loudspeakers to the right of fixation for right hemifield stationary stimuli (and vice versa for left hemifield stationary conditions). This created the impression of a stationary stimulus, as a 3.98-ms duration was too short to create the impression of motion between LEDs/loudspeakers (Soto-Faraco, Spence, & Kingstone, 2005; Strybel & Neale, 1994). These parameters ensured that the horizontal extent of the stationary and moving signals was equated across the 350-ms duration of the stimulus. The stimulus intensities were as follows: the mean intensity (across all participants) of each LED was 9 cd/m2, and the mean amplitude of the auditory signals was 70 dB (A). 
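The timing of the stationary stimulus can be checked with a short sketch: 88 bursts of 3.98 ms each add up to roughly the 350-ms stimulus duration. The burst count, burst duration, and the eight candidate positions (the central LED/loudspeaker plus the 7 to one side) are taken from the text; the uniform sampling of positions, the inclusion of the central position, and the function name are assumptions made for illustration only.

```python
import random

N_BURSTS = 88        # from the text
BURST_MS = 3.98      # from the text
N_POSITIONS = 8      # central position plus 7 to one side (inclusion of centre assumed)

def stationary_sequence(hemifield, seed=None):
    """Return signed position indices (0 = centre) for one stationary stimulus.

    Positions are drawn uniformly at random, which is an assumption; the text
    only states that the positions varied randomly.
    """
    rng = random.Random(seed)
    sign = 1 if hemifield == "right" else -1
    return [sign * rng.randrange(N_POSITIONS) for _ in range(N_BURSTS)]

seq = stationary_sequence("left", seed=1)
total_ms = N_BURSTS * BURST_MS
print(len(seq), round(total_ms, 2))
```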
Procedure
The procedure for determining the intensity levels of the stimuli was similar to that described for Experiment 2. In the preliminary session, one block of stimuli was presented for visual and auditory conditions. Participants were required to press a button in response to target stimuli (motion) and avoid pressing the button for catch trials (stationary). Stimulus intensity varied randomly across trials. Visual and auditory stimulus intensities that yielded similar response times and accuracy levels greater than 90% were selected for each participant; these values were then used in the main experiment. In the main experiment, ten blocks of stimuli were presented where stimuli consisted of either (a) moving visual, auditory, or auditory–visual stimuli, which could move either to the left or to the right of fixation, (b) stationary visual, auditory, or auditory–visual stimuli, which could be presented in either the left or the right visual field (catch trials), (c) a bimodal combination of stationary and moving stimuli (bimodal incongruent), or (d) a moving stimulus to the left (right) and a moving stimulus in the opposite direction (bimodal opposite direction). Bimodal opposite cues began in the same central location for both visual and auditory modalities. 
The combination of visual, auditory, and bimodal stimuli, which could either move or remain stationary and be presented in the left or right hemifield, yielded eighteen experimental conditions (Table 2). The experimental task was to press the response button to signals that contained any movement in either modality but to refrain from pressing the button on “catch trials” (p = 0.1) that contained no motion. Each participant was presented with 100 trials per target condition and 20 trials per catch condition. 
Table 2
 
Experimental conditions in Experiment 4. Note: s, stationary; m, motion; l, left; r, right.
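Condition | Target V (response required) | Target A (response required) | Catch V (no response required) | Catch A (no response required)
Unimodal visual | ml | - | sl | -
Unimodal visual | mr | - | sr | -
Unimodal auditory | - | ml | - | sl
Unimodal auditory | - | mr | - | sr
Bimodal congruent | ml | ml | sl | sl
Bimodal congruent | mr | mr | sr | sr
Bimodal incongruent | ml | sl | - | -
Bimodal incongruent | sl | ml | - | -
Bimodal incongruent | mr | sr | - | -
Bimodal incongruent | sr | mr | - | -
Bimodal opposite | ml | mr | - | -
Bimodal opposite | mr | ml | - | -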
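The condition count and trial totals implied by Table 2 and the Procedure can be verified with a brief enumeration. The condition codes are transcribed from Table 2 and the per-condition trial counts (100 per target condition, 20 per catch condition) from the Procedure; the variable and function names are hypothetical.

```python
# Conditions transcribed from Table 2 as (visual, auditory) pairs; None = modality absent.
# m/s = motion/stationary, l/r = left/right.
targets = [
    ("ml", None), ("mr", None),                               # unimodal visual
    (None, "ml"), (None, "mr"),                               # unimodal auditory
    ("ml", "ml"), ("mr", "mr"),                               # bimodal congruent
    ("ml", "sl"), ("sl", "ml"), ("mr", "sr"), ("sr", "mr"),   # bimodal incongruent
    ("ml", "mr"), ("mr", "ml"),                               # bimodal opposite
]
catches = [
    ("sl", None), ("sr", None),                               # unimodal visual
    (None, "sl"), (None, "sr"),                               # unimodal auditory
    ("sl", "sl"), ("sr", "sr"),                               # bimodal
]

def contains_motion(cond):
    """True if either modality carries a motion signal (code starts with 'm')."""
    return any(c is not None and c.startswith("m") for c in cond)

# Task rule from the text: respond to any motion, withhold on stationary-only trials.
assert all(contains_motion(c) for c in targets)
assert not any(contains_motion(c) for c in catches)

n_conditions = len(targets) + len(catches)
n_trials = 100 * len(targets) + 20 * len(catches)
catch_fraction = 20 * len(catches) / n_trials
print(n_conditions, n_trials, round(catch_fraction, 3))
```

The enumeration gives eighteen conditions, as stated, and a catch-trial fraction of about 0.09, close to the stated p = 0.1.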
Data analysis
Reaction times and accuracy were first pooled across motion direction to create the following conditions: As (auditory stationary), Am (auditory motion), Vs (visual stationary), Vm (visual motion), VmAm (bimodal congruent motion), VmAs and VsAm (bimodal incongruent), and VmAm_Opp (bimodal opposite). Analyses of mean RTs, discrimination times, and the race model were the same as described for Experiment 1.
Results
Two-way repeated-measures ANOVAs with factors condition (V, A, AV) and direction of motion (left, right) found no differences between left and right directions of motion for RTs, accuracy rates for targets, and false alarms (p > 0.25 in all cases); therefore, the following analyses were carried out on data pooled across motion direction. 
Mean accuracy rates are displayed in Table 3. A repeated-measures ANOVA found significant differences between conditions [F(5,45) = 7.221, p < 0.001]. Post-hoc t-tests found that unimodal visual stimuli were responded to more accurately than unimodal auditory stimuli [t(9) = 2.555, p = 0.031]. Bimodal congruent motion responses were more accurate than bimodal incongruent responses (VmAm versus VmAs: [t(9) = 4.876, p = 0.001] and VmAm versus VsAm: [t(9) = 5.749, p < 0.001]). There was no difference in accuracy rates between bimodal motion signals traveling in the same direction and bimodal motion signals in the opposite direction. 
Table 3
 
Mean reaction time (RT), mean discrimination time, and mean accuracy per target condition.
Condition | Mean RT (ms) | Overall discrimination time (ms) | Mean accuracy (%)
Vm | 570.13 | 390 | 97.50
Am | 583.28 | 410 | 92.83
VmAs | 581.40 | 410 | 93.75
VsAm | 601.15 | 410 | 92.30
VmAm | 529.91 | 370 | 98.17
VmAm_Opp | 559.69 | 390 | 97.34
Mean RTs for target conditions are displayed in Table 3; a one-way ANOVA found significant RT differences between stimulus conditions [F(5,45) = 10.392, p < 0.001]. Paired t-tests were then used to examine whether there were significant RSEs for congruent and incongruent bimodal conditions. An RSE (40 ms) was revealed for congruent bimodal signals (VmAm versus Vm: [t(9) = 4.893, p = 0.001]). There was no significant difference between Vm and VmAs (p > 0.3) and no significant difference between VsAm and Vm (p > 0.1). Crucially, responses to the bimodal congruent condition were faster than responses to incongruent conditions (VmAm versus VmAs: [t(9) = 5.584, p < 0.001]; VmAm versus VsAm: [t(9) = 4.414, p = 0.002]). There was a small but significant RSE (10 ms) for bimodal signals that moved in opposite directions (VmAm_Opp versus Vm: [t(9) = 2.581, p = 0.030]). Responses to bimodal motion in the same direction were faster than the responses to bimodal motion in opposite directions [t(9) = 4.693, p = 0.001]. Discrimination times (Table 3) for motion differed significantly between conditions [F(5,45) = 6.557, p < 0.001]. DT was faster for VmAm compared to visual alone [t(9) = 6.692, p < 0.001], DT for bimodal motion in the same direction was faster than for bimodal motion in opposite directions [t(9) = 3.582, p = 0.006], and DTs for bimodal motion in the same direction were faster than for incongruent bimodal cues (VmAm versus VmAs: [t(9) = 4.297, p = 0.002]; VmAm versus VsAm: [t(9) = 4.311, p = 0.002]). Mean false alarm rates (responses to catch trials) are displayed in Table 4, and a repeated-measures ANOVA found no significant difference between conditions (p > 0.2). 
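The redundancy gains quoted above follow directly from the mean RTs in Table 3 when each bimodal condition is compared with the fastest unimodal condition (Vm). The sketch below reproduces that arithmetic; it uses only the group means transcribed from Table 3, so it illustrates the size of the gains and says nothing about their statistical significance. The variable names are hypothetical.

```python
# Mean RTs (ms) transcribed from Table 3.
mean_rt = {
    "Vm": 570.13, "Am": 583.28,
    "VmAs": 581.40, "VsAm": 601.15,
    "VmAm": 529.91, "VmAm_Opp": 559.69,
}

# Gain = fastest unimodal mean RT minus bimodal mean RT (positive = faster bimodal).
fastest_unimodal = min(mean_rt["Vm"], mean_rt["Am"])
gains = {
    cond: round(fastest_unimodal - mean_rt[cond], 2)
    for cond in ("VmAm", "VmAm_Opp", "VmAs", "VsAm")
}

for cond, gain in gains.items():
    print(f"{cond}: {gain:+.2f} ms relative to Vm")
```

On these means the congruent and opposite-direction conditions show gains of about 40 ms and 10 ms, whereas both incongruent conditions are numerically slower than Vm.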
Table 4
 
Mean false alarms and standard errors (SEs).
| Condition | False alarms: mean (%) | False alarms: SE |
| --- | --- | --- |
| Vs | 12.43 | 4.98 |
| As | 11.75 | 6.83 |
| VsAs | 10.47 | 5.46 |
The CDFs of the sum of the respective unimodal RTs were compared to the bimodal CDF for same direction motion signals (Figure 7). Race model violations were observed at bins 13–16 (360–480 ms): bin 13 [Z = 2.310, p = 0.021, r = 0.731]; bin 14 [Z = 2.803, p = 0.005, r = 0.886]; bin 15 [Z = 2.708, p = 0.007, r = 0.855]; bin 16 [Z = 2.599, p = 0.009, r = 0.822]. The race model was violated for bimodal congruent motion cues by 8 out of 10 observers. The CDFs of the sum of the respective unimodal RTs were then compared to the bimodal opposite direction RTs, and there were no statistically significant violations of the race model at any time bin. To rule out potential biasing from fast guesses, the “kill-the-twin” correction procedure was applied (see Data analysis section in Experiment 1 for full details), and the race model still held at all time bins for bimodal motion signals moving in opposite directions.
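The comparison described above can be sketched for a single observer as follows. This is an illustrative reading of the race model (Miller) inequality, in which the bimodal CDF should not exceed the sum of the two unimodal CDFs at any time point; it is not the authors' analysis code. The RT values, the time grid, and the function names `ecdf` and `race_model_violation` are all hypothetical, and the sketch omits both the "kill-the-twin" correction and the across-observer statistical test at each bin.

```python
import numpy as np


def ecdf(rts, t_grid):
    """Empirical CDF: proportion of RTs at or below each time in t_grid."""
    rts = np.sort(np.asarray(rts, dtype=float))
    return np.searchsorted(rts, t_grid, side="right") / rts.size


def race_model_violation(rt_a, rt_v, rt_av, t_grid):
    """Return F_AV(t) - min(1, F_A(t) + F_V(t)) for each t in t_grid.

    Positive values mean the bimodal CDF exceeds the race model bound.
    """
    bound = np.minimum(1.0, ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid))
    return ecdf(rt_av, t_grid) - bound


# Hypothetical RTs (ms) for one observer, invented purely for illustration.
rt_v = [420, 450, 480, 510, 540, 570, 600, 630, 660, 690]
rt_a = [440, 470, 500, 530, 560, 590, 620, 650, 680, 710]
rt_av = [360, 380, 400, 420, 440, 460, 480, 500, 520, 540]

# Hypothetical evaluation grid: 300 to 750 ms in 30-ms steps.
t_grid = np.arange(300, 751, 30)

diff = race_model_violation(rt_a, rt_v, rt_av, t_grid)
violated_times = t_grid[diff > 1e-9]  # small tolerance for float noise
print(violated_times)
```

With these made-up values the bound is exceeded only in the early part of the distribution, which is the qualitative pattern the text reports for same-direction motion. In a full analysis, the per-observer differences at each time bin would then be tested across observers.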
Figure 7
 
(A) CDFs for the RTs in the V (squares), A (triangles), and AV same direction (circles), and AV opposite direction (stars) conditions, and the CDF predicted by the race model (solid trace). (B) Miller inequality for bimodal motion in the same direction (triangles) and bimodal motion in the opposite direction (squares).
Discussion
Easily detectable moving congruent bimodal targets were responded to faster and more accurately than their unimodal equivalents in a moving/stationary discrimination task. Speeded responses to bimodal congruent motion signals violated the race model inequality during the early part of the response time distribution (from 360 to 480 ms), signaling that the reduction in RTs was consistent with a co-activation rather than statistical facilitation account. 
To test whether motion in both modalities was necessary for speeded responses, Experiment 4 included conditions where a motion signal in one modality (either visual or auditory) was paired with a stationary cue in the second modality (either visual or auditory). The stationary signals spanned the same amount of visual angle as the motion signals, but their spatiotemporal pattern created a stationary signal that cued spatial attention to the location of the motion signal in the second modality. No RSE was found for these incongruent combinations, and importantly, it was found that responses were slower for incongruent combinations than for congruent pairs (i.e., motion in both modalities). Responses were also more accurate in the bimodal congruent condition compared to the bimodal incongruent conditions. A small RSE of 10 ms was observed for bimodal signals that began at the same central location but traveled in opposite directions; this RSE could be explained by a statistical facilitation model. Responses were faster for bimodal signals moving in the same direction, compared to bimodal signals starting in the same location but moving in opposite directions, suggesting that optimal bimodal behavioral improvements require motion traveling in the same direction, at least for signals that begin at the same location. Interestingly, accuracy for the bimodal signals moving in opposite directions was as high as for bimodal signals moving in the same direction. 
Response times in Experiment 4 were on average slower than for Experiments 1 to 3B, most likely because the stationary/moving discrimination task was more difficult than a direction-discrimination task. False alarm rates for stationary catch trials did not differ across conditions.
In summary, these findings provide evidence that motion signals are necessary in both modalities to explain the redundancy gains observed in the bimodal congruent condition, i.e., that a signal in one modality does not simply act as an alerting stimulus to speed the detection of motion in the second channel, as would be expected in the response preparation account (Nickerson, 1973) or the warning effect account (Diederich & Colonius, 2008). 
General discussion
The experiments reported here explored the conditions under which faster responses occurred for horizontal auditory–visual apparent motion targets compared to unimodal signals. Auditory, visual, and auditory–visual manual response times were assessed for different stimulus intensities (threshold and suprathreshold), different motion paradigms (abrupt onset and motion onset), and different tasks (direction discrimination and onset of motion regardless of direction). We found that an RSE was elicited by auditory–visual horizontally moving stimuli and the size of the RSE was greater than predicted by statistical facilitation for both abrupt-onset and motion-onset detections and for both a direction-discrimination task and for a non-direction-selective task. While evidence for bimodal speeded responses has been observed for looming and receding motion cues (Cappe et al., 2009), these results show bimodal RT gains for horizontally moving auditory and visual cues. We found evidence for co-activation, that is, a violation of the race model inequality, for threshold and suprathreshold intensity levels; the latter effects, however, were only revealed when the unimodal targets were roughly equated in terms of their reaction times. Crucially, we found that motion was necessary in the visual and auditory modalities for speeded responses that went beyond statistical facilitation; the pairing of a motion signal in one modality with a stationary signal in the other modality did not yield faster responses than unimodal baselines. 
For threshold stimuli, faster response times were observed for auditory–visual stimuli than for unimodal stimuli and the degree of response time facilitation was greater than that predicted by statistical combination (see Figure 3D). In addition to faster response times, the direction of bimodal motion at the near-threshold level in Experiment 1 was detected more accurately than the unimodal equivalents; this improved performance in motion direction detection is in agreement with results by Meyer et al. (2005) who—using the same experimental setup and comparable stimulus parameters—found a substantial improvement in direction-discrimination performance for bimodal compared to unimodal moving stimuli. 
At stimulus contrasts significantly above direction-discrimination threshold in Experiment 1, no RSE was observed. However, for the high- and mid-intensity levels, the unimodal reaction times differed from each other significantly (by 31 ms and 44 ms, respectively). A failure to observe a significant RSE could have arisen if the temporal integration window of a putative summation mechanism is much smaller than the differences between the unimodal reaction times. Indeed, the importance of central simultaneity of the auditory and visual information to maximize co-activation has been pointed out by Miller (1986) and prior to that by Hershenson (1962) for manual response times. The need for strictly synchronized auditory and visual targets seems to be at odds with the long integration window of neurons in the superior colliculus (SC), which is often greater than 500 ms (Meredith, Nemitz, & Stein, 1987). This long integration window in SC neurons may explain the robust RSEs and race model violations observed for saccadic reaction times even in cases when the unimodal reaction times differ by as much as 40 ms (Hughes et al., 1994). It is still an open question whether the RSE observed for manual response times originates in the SC or in cortical areas, or in both. Recent evidence using S-cone isolating stimuli suggests that the site of co-activation for spatial orienting responses is the SC (Leo et al., 2008; Maravita, Bolognini, Bricolo, Marzi, & Savazzi, 2008), since race model violations were only observed when the visual signal provided input to the SC. Imaging and electrophysiological measurements suggest that in addition to the SC, both primary and secondary sensory cortices (Gondan, Niederhaus, Rosler, & Röder, 2005; Martuzzi et al., 2007) and polymodal cortical areas (for review, see Calvert & Thesen, 2004) may contribute to the RSE for manual response times.
It is worth noting that Leo et al. (2008) obtained a significant RSE for manual response times and evidence for co-activation even though the unimodal reaction times were very similar to the ones we collected (around 300–400 ms) and the difference between the unimodal RTs was even larger (about 50 ms). The major difference between their study and our experiment was that we used moving stimuli while Leo et al. asked observers to respond to the onset of a stationary target. Hughes et al. (1994; Experiment 2) set out to explicitly investigate the importance of matching auditory–visual reaction times on the RSE by manipulating the intensity of the visual and the auditory targets. The intensity differences (12 cd/m² for visual and 30 dB for auditory targets) yielded reaction time differences of about 40 ms. Since both reaction time differences and stimulus intensities are comparable with the settings in our experiment, we can attempt to draw some direct comparisons. The most interesting result in Hughes et al.'s experiment was that violations of the race model inequality were most pronounced for saccadic eye movements and were present even for non-matched unimodal stimuli; SOAs of up to 40 ms produced robust co-activation effects for saccades. The race model violations were much smaller for simple manual response times for the onset detection of stationary targets but still significant in 2 of the 3 observers. These findings together with our current results suggest that the temporal integration window for auditory–visual moving targets (at least in the linear plane) is much smaller than for detecting the onset of stationary targets: for moving targets, we find no evidence for co-activation for unimodal reaction time differences as small as 30 ms. Clearly, further research is necessary to elucidate why the temporal integration window for dynamic cues appears smaller than for stationary cues.
We therefore matched the unimodal reaction times in Experiments 2–4 by adjusting the intensity of the unimodal stimuli (Hughes et al., 1994; Jaskowski & Sobieralska, 2004). Under these optimized conditions, we observed an RSE for suprathreshold stimuli; the redundancy gains were 30 ms for Experiment 2, 32 ms for Experiment 3A, 39 ms for Experiment 3B, and 40 ms for Experiment 4. These redundancy gains for suprathreshold cues were all smaller than the 49-ms redundancy gain reported for threshold stimuli in Experiment 1; this is in line with the principle of inverse effectiveness, where unimodal cues that are weakest produce the largest response enhancements when presented bimodally (Stein et al., 1988).
The redundancy gains observed under these optimized stimulus conditions (matched unimodal RTs) were all larger than predicted by statistical facilitation (see Figures 4C, 6D, and 7B), indicating a convergence of auditory and visual information prior to the response criterion in our manual response tasks. Reaction time gains inconsistent with statistical facilitation have been observed for suprathreshold stationary signals (Diederich & Colonius, 2004; Gondan et al., 2005; Miller, 1982; Molholm et al., 2002; Savazzi & Marzi, 2008) and suprathreshold motion signals in the depth plane (Cappe et al., 2009); the current results expand on these findings to include dynamic stimuli in the horizontal plane. 
It is important to note that the overall correct discrimination time for auditory–visual stimuli occurred either earlier than or simultaneously with (±10 ms) the earliest point of violation of the race model in all the experiments reported here except Experiment 3A. In Experiment 3A, the first violation of the race model occurred at 150 ms, but the overall discrimination time was 170 ms; a race model violation at a latency before correct discrimination could indicate an erroneous rejection of the race model. While fast guesses or anticipatory responses may explain the race model violation prior to correct discrimination in Experiment 3A, these are generally more likely to lead to erroneous acceptance of the race model (Gondan & Heckel, 2008). An alternative explanation, recently proposed by Otto and Mamassian (2010), is that the distribution in the bimodal condition may contain greater variance due to noise than the unimodal conditions, which could result in spurious rejection of the race model. While it is unclear why this should happen in only one out of the five experiments reported here, future studies using discrimination tasks could usefully check for race model violations before correct discrimination, to rule out potential erroneous violations of the race model.
Experiment 4 addressed the issue of whether the faster bimodal responses are a result of integration of visual and auditory motion information. It was found that speeded responses to bimodal motion signals compared to unimodal equivalents cannot be explained by the presence of a stationary stimulus in one modality paired with motion in the other modality. The redundant signal effect appeared to depend on complementary information in both modalities rather than simply bimodal stimulation, ruling out the explanation that a cue in one modality acted as an alerting cue, or that the signal triggered spatial attention to the location of the motion cue in the other modality. While previous research has mainly focused on how motion cues in one modality modulate the perception of motion in a second modality (for reviews, see Soto-Faraco et al., 2003; Soto-Faraco, Spence et al., 2004), Experiment 4 shows that the integration of easily detectable visual and auditory linear motion cues resulted in faster behavioral performance, which represents a new and important finding. This finding also expands on previous results showing behavioral benefits of integration of visual and auditory horizontal motion cues for signals at threshold level (Meyer et al., 2005) by showing that such benefits are also observed for suprathreshold cues. 
Previously obtained results have found that a stationary cue in one modality can influence the perception of motion in another modality (Sanabria et al., 2004; Sekuler et al., 1997), but here Experiment 4 found that a stationary cue did not lead to a facilitation of response times. One explanation for this lack of faster responses for incongruent cues is that the “assumption of unity” (Welch & Warren, 1980) may have been violated because the information in the visual and auditory modalities was conflicting; therefore, the observer may have perceived the signals as separate events rather than a single fused signal, and consequently, no bimodal gains were observed. 
Auditory–visual response gains that went beyond predictions by statistical facilitation were observed using motion stimuli that were spatially co-localized. Experiment 4 found that AV motion cues beginning at the same central location must travel in the same direction to be effectively integrated; RT gains with AV cues that started in the center and moved in opposite directions were small (around 10 ms) and were consistent with statistical facilitation, rather than with neural integration of visual and auditory motions. This finding is in agreement with the results of Meyer et al. (2005) who found optimal integration only for signals moving in the same direction. While it could be argued that AV motion signals traveling in opposite directions but presented in the same hemifield may be combined, previous research has found decreased performance for opposite direction cues in the same hemifield compared to signals presented in the same hemifield but traveling in the same direction (Meyer et al., 2005), suggesting that dynamic cues must travel in the same direction to be effectively integrated.
The current results are consistent with studies on the dynamic capture effect, which have isolated the contribution of dynamic cues in the interactions between visual and auditory apparent motion cues (Soto-Faraco, Kingstone, & Spence, 2004; Soto-Faraco et al., 2002; Soto-Faraco et al., 2005). Studies using biological motion cues (Brooks et al., 2007) and motion in depth (Cappe et al., 2009) have also found performance benefits from bimodal motion integration, suggesting that integration of visual and auditory dynamic cues may be a common feature of motion processing. While integration of horizontal and depth cues likely involves at least partly different neural mechanisms, imaging studies have begun to identify a distributed network of cortical regions including the higher order posterior parietal, prefrontal, and premotor cortices (Baumann & Greenlee, 2007; Bremmer et al., 2001; Lewis, Beauchamp, & DeYoe, 2000), as well as lower order sensory cortices (Alink, Singer, & Muckli, 2008; Scheef et al., 2009), which are thought to be involved in the integration of auditory–visual moving signals. A recent event-related potential study investigated the stage of processing at which visual and auditory motion cues are integrated and found evidence that dynamic cues are integrated at an early sensory stage of processing (Stekelenburg & Vroomen, 2009). The bimodal speeded responses in the current study are fully compatible with a low-level integration account (Meyer et al., 2005), although it cannot be ruled out that integration at later processing stages (i.e., decisional or motor) may have contributed to the faster auditory–visual responses. Indeed, there is ongoing debate about the level of processing at which integration occurs in the redundant signal effect (Savazzi & Marzi, 2008), and future research will be needed to investigate the locus of processing of the integration of auditory–visual motion signals in the redundant signal effect.
The apparent motion stimuli in the current experiments were generated using a series of point sources that provide all the available cues for auditory stimulus localization (interaural timing and intensity differences and monaural spectral cues). These stimulus factors appear to be a crucial determinant for improved bimodal performance of motion detection in the horizontal plane (Meyer et al., 2005), as no performance benefits were found when a visual cue consisted of a random-dot kinematogram and the auditory motion signal was generated by varying only interaural timing differences (Alais & Burr, 2004) or intensity differences (Meyer & Wuerger, 2001; Wuerger, Hofbauer, & Meyer, 2003). Interestingly, bimodal facilitation was observed for biological motion when the auditory motion signals consisted of interaural timing differences and the visual and auditory signals were not strictly spatially matched (Brooks et al., 2007), suggesting that spatial co-localization and the use of point-source auditory cues may be a necessary precondition for bimodal response enhancement with more abstract auditory–visual signals (such as those used in the current study), whereas these factors may be less crucial for more ecologically valid motion signals, such as point-light walkers. 
Conclusion
In summary, the current experiments have established faster manual reaction times for horizontally moving auditory–visual apparent motion cues compared to unimodal visual and auditory baselines. Crucially, these bimodal gains could not be explained by the presence of a spatially co-localized stationary cue in one modality paired with a motion cue in the other modality, strongly suggesting that speeded response times were the result of integration of dynamic information. Performance enhancements were present in direction-discrimination, motion/stationary discrimination, and motion-onset detection tasks. Redundancy gains were found in bimodal conditions both for stimuli where the appearance of motion coincided with the appearance of the stimulus and where the motion phase was preceded by a stationary fore-period, and the redundancy gains could not be explained by assuming separate processing of visual and auditory signals. The results raise questions for future research on the stage of processing at which the AV motion signals are combined and highlight the importance of equating the salience of the visual and auditory motion cues at suprathreshold levels in order to prevent one cue from dominating the response. Together, these results show that the perceptual system combines information available in the visual and auditory channels to react quickly to the detection and direction of motion in the horizontal plane.
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful comments and suggestions on an earlier version of this manuscript. 
Commercial relationships: none. 
Corresponding author: Neil R. Harrison. 
Email: harrisn@hope.ac.uk. 
Address: Department of Psychology, Liverpool Hope University, Hope Park, Liverpool, UK. 
References
Alais D. Burr D. (2004). No direction-specific bimodal facilitation for audiovisual motion detection. Cognitive Brain Research, 19, 185–194.
Alink A. Singer W. Muckli L. (2008). Capture of auditory motion by vision is represented by an activation shift from auditory to visual motion cortex. Journal of Neuroscience, 28, 2690–2697.
Allen P. G. Kolers P. A. (1981). Sensory specificity of apparent motion. Journal of Experimental Psychology: Human Perception and Performance, 7, 1318–1326.
Baumann O. Greenlee M. W. (2007). Neural correlates of coherent audiovisual motion perception. Cerebral Cortex, 17, 1433–1443.
Beer A. Röder B. (2004). Attention to motion enhances processing of both visual and auditory stimuli: An event-related potential study. Cognition, Affective, and Behavioural Neuroscience, 18, 205–225.
Bremmer F. Schlack A. Shah N. J. Zafiris O. Kubischik M. Hoffmann K. P. et al. (2001). Polymodal motion processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalences between humans and monkeys. Neuron, 29, 287–296.
Brooks A. van der Zwan R. Billard A. Petreska B. Clarke S. Blanke O. (2007). Auditory motion affects visual biological motion processing. Neuropsychologia, 45, 523–530.
Calvert G. A. Thesen T. (2004). Multisensory integration: Methodological approaches and emerging principles in the human brain. The Journal of Physiology, 98, 191–205.
Cappe C. Thut G. Romei V. Murray M. M. (2009). Selective integration of auditory–visual looming cues by humans. Neuropsychologia, 47, 1045–1052.
Colonius H. (1990). Possibly dependent probability summation of reaction times. Journal of Mathematical Psychology, 34, 253–255.
Diederich A. Colonius H. (1987). Intersensory facilitation in the motor component? Psychological Research, 49, 23–29.
Diederich A. Colonius H. (2004). Bimodal and trimodal multisensory enhancement of reaction time: Effects of stimulus onset and intensity. Perception & Psychophysics, 66, 1388–1404.
Diederich A. Colonius H. (2008). Cross-modal interaction in saccadic reaction time: Separating multisensory from warning effects in the time window of integration model. Experimental Brain Research, 186, 1–22.
Ecker A. J. Heller L. M. (2005). Auditory–visual interactions in the perception of a ball's path. Perception, 34, 59–75.
Eriksen C. W. (1988). A source of error in attempts to distinguish coactivation from separate activation in the perception of redundant targets. Perception & Psychophysics, 44, 191–193.
Gondan M. Heckel A. (2008). Testing the race model inequality: A simple correction procedure for fast guesses. Journal of Mathematical Psychology, 52, 322–325.
Gondan M. Niederhaus B. Rosler F. Röder B. (2005). Multisensory processing in the redundant-target effect: A behavioural and event-related potential study. Perception & Psychophysics, 67, 713–726.
Gondan M. Röder B. (2006). A new method for detecting interactions between the senses in event-related potentials. Brain Research, 1073–1074, 389–397.
Harrington L. K. Peck C. K. (1998). Spatial disparity affects visual–auditory interactions in human sensorimotor processing. Experimental Brain Research, 122, 247–252.
Hershenson M. (1962). Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology, 63, 289–293.
Hofbauer M. Wuerger S. Meyer G. Roehrbein F. Schill K. Zetzsche C. (2004). Catching audio-visual mice: Predicting the arrival time of auditory–visual motion signals. Cognitive, Affective & Behavioural Neuroscience, 4, 241–250.
Hughes H. C. Reuter-Lorenz P. A. Nozawa G. Fendrich R. (1994). Visual–auditory interactions in sensorimotor processing: Saccades versus manual responses. Journal of Experimental Psychology: Human Perception and Performance, 20, 131–153.
Jain A. Sally S. L. Papathomas T. V. (2008). Audiovisual short-term influences and aftereffects in motion: Examination across three sets of directional pairings. Journal of Vision, 8, (15):7, 1–13, http://www.journalofvision.org/content/8/15/7, doi:10.1167/8.15.7.
Jaskowski P. Sobieralska K. (2004). Effect of stimulus intensity on manual and saccadic reaction time. Perception & Psychophysics, 66, 535–544.
Krumbholz K. Hewson-Stoate N. Schonwiesner M. (2007). Cortical responses to auditory motion suggest an asymmetry in the reliance on inter-hemispheric connections between the left and right auditory cortices. Journal of Neurophysiology, 97, 1649–1655.
Kuba M. Kubová Z. Kremlacek J. Langrova J. (2007). Motion-onset VEPs: Characteristics, methods, and diagnostic use. Vision Research, 47, 189–202.
Leo F. Bertini C. di Pellegrino G. Ladavas E. (2008). Multisensory integration for orienting responses in humans requires the activation of the superior colliculus. Experimental Brain Research, 186, 67–77.
Levitt H. (1971). Transformed up–down methods in psychoacoustics. The Journal of the Acoustical Society of America, 49, 467–477.
Lewis J. W. Beauchamp M. S. DeYoe E. A. (2000). A comparison of visual and auditory motion processing in human cerebral cortex. Cerebral Cortex, 10, 873–888.
Maeda F. Kanai R. Shimojo S. (2004). Changed pitch induced visual motion illusion. Current Biology, 14, 990–991.
Maravita A. Bolognini N. Bricolo E. Marzi C. A. Savazzi S. (2008). Is audiovisual integration subserved by the superior colliculus in humans? NeuroReport, 19, 271–275.
Martuzzi R. Murray M. M. Michel C. M. Thiran J.-P. Maeder P. Clarke S. et al. (2007). Multisensory interactions within human primary cortices revealed by BOLD responses. Cerebral Cortex, 17, 1672–1679.
Meredith M. A. Nemitz J. W. Stein B. (1987). Determinants of multisensory integration in superior colliculus neurons: 1. Temporal factors. The Journal of Neuroscience, 7, 3215–3229.
Meyer G. F. Wuerger S. M. (2001). Cross-modal integration of auditory and visual motion signals. NeuroReport, 12, 2557–2560.
Meyer G. Wuerger S. M. Roehrbein M. Zetzsche C. (2005). Low-level integration of auditory and visual motion signals requires spatial co-localisation. Experimental Brain Research, 166, 538–547.
Miller J. (1982). Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology, 14, 247–279.
Miller J. (1986). Time course of coactivation in bimodal divided attention. Perception & Psychophysics, 40, 331–343.
Miller J. Ulrich R. (2003). Simple reaction time and statistical facilitation: A parallel grains model. Cognitive Psychology, 46, 101–151.
Molholm S. Ritter W. Murray M. M. Javitt D. C. Schroeder C. E. Foxe J. J. (2002). Multisensory auditory–visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research, 14, 115–128.
Nickerson R. S. (1973). Intersensory facilitation of reaction time: Energy summation or preparation enhancement. Psychological Review, 80, 489–509.
Otto T. U. Mamassian P. (2010). Noise vs. multisensory integration: The return of the race model. Paper presented at the 11th International Multisensory Research Forum (IMRF), Liverpool, UK.
Posner M. I. Nissen M. J. Klein R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157–171.
Raab D. H. (1962). Statistical facilitation of simple reaction times. Transactions of the New York Academy of Sciences, 24, 574–590.
Sanabria D. Correa A. Lupiáñez J. Spence C. (2004). Bouncing or streaming? Exploring the influence of auditory cues on the interpretation of ambiguous visual motion. Experimental Brain Research, 157, 537–541.
Sanabria D. Lupiáñez J. Spence C. (2007). Auditory motion affects visual motion perception in a speeded discrimination task. Experimental Brain Research, 178, 415–421.
Sanabria D. Soto-Faraco S. Chan J. Spence C. (2005). Intramodal perceptual grouping modulates multisensory integration: Evidence from the cross-modal dynamic capture task. Neuroscience Letters, 377, 59–64.
Sanabria D. Spence C. Soto-Faraco S. (2007). Perceptual and decisional contributions to audiovisual interactions in the perception of apparent motion: A signal detection study. Cognition, 102, 299–310.
Savazzi S. Marzi C. A. (2008). Does the redundant signal effect occur at an early visual stage? Experimental Brain Research, 184, 275–281.
Scheef L. Boecker H. Daamaen M. Fehse U. Landsberg M. W. Granath D.-O. et al. (2009). Multimodal motion processing in area V5/MT: Evidence from an artificial class of audio-visual events. Brain Research, 1252, 94–104.
Sekuler R. Sekuler A. B. Lau R. (1997). Sound alters visual motion perception. Nature, 385, 308.
Soto-Faraco S. Kingstone A. Spence C. (2003). Multisensory contributions to the perception of motion. Neuropsychologia, 41, 1847–1862.
Soto-Faraco S. Kingstone A. Spence C. (2004). Cross-modal dynamic capture: Congruency effects in the perception of motion across sensory modalities. Journal of Experimental Psychology: Human Perception and Performance, 30, 330–345.
Soto-Faraco S. Lyons J. Gazzaniga M. Spence C. Kingstone A. (2002). The ventriloquist in motion: Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research, 14, 139–146.
Soto-Faraco S. Spence C. Kingstone A. (2004). Moving multisensory research along: Motion perception across sensory modalities. Current Directions in Psychological Science, 13, 29–32.
Soto-Faraco S. Spence C. Kingstone A. (2005). Assessing automaticity in the audiovisual integration of motion. Acta Psychologica, 118, 71–92.
Stein B. Huneycutt W. S. Meredith M. A. (1988). Neurons and behaviour: The same rules of multisensory integration apply. Brain Research, 17, 355–358.
Stekelenburg J. J. Vroomen J. (2009). Neural correlates of audiovisual motion capture. Experimental Brain Research, 198, 383–390.
Strybel T. Z. Neale W. N. (1994). The effect of burst duration, interstimulus onset interval and loudspeaker arrangement on auditory apparent motion in the free field. Journal of the Acoustical Society of America, 96, 3463–3475.
Strybel T. Z. Vatakis A. (2004). A comparison of auditory and visual apparent motion presented individually and with cross-modal distractors. Perception, 33, 1033–1048.
Todd J. W. (1912). Reaction to multiple stimuli. Archives of Psychology, 25, 1–65.
VanRullen R. Thorpe S. J. (2001). Is it a bird? Is it a plane? Ultra-rapid visual categorisation of natural and artifactual objects. Perception, 30, 655–668.
Welch R. B. Warren D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.
Wuerger S. M. Hofbauer M. Meyer G. (2003). The integration of auditory and visual motion signals at threshold. Perception & Psychophysics, 65, 1188–1196.
Figure 1
 
The trajectory of the motion stimuli over space and time. The horizontal axis displays spatial eccentricity, and the vertical axis displays time (each unit T denotes a duration of 50 ms). Auditory and/or visual stimuli (shaded squares) begin in the center (0°) at time “T1” and then move by 6° either leftward or rightward at each subsequent 50-ms period. Note that on a given trial the stimuli move either to the left or to the right, not in both directions.
Figure 2
 
(A) Mean (N = 12) miss rates as percentages for visual-alone (V), auditory-alone (A), and auditory–visual (AV) conditions at each stimulus intensity level (threshold/mid-intensity/high-intensity). (B) Mean false alarms for catch trials are shown as percentages for each stimulus condition at each stimulus intensity level. (C) Mean reaction times for each stimulus condition at each stimulus intensity level. Error bars display SEM values. The legend applies to all three graphs. *p < 0.05. **p < 0.01. ***p < 0.001. NS: not significant.
Figure 3
 
Cumulative distribution functions (CDFs) for RTs are displayed for visual alone (squares), auditory alone (triangles), auditory–visual (circles), and the CDF predicted by the race model (solid trace) for (A) threshold stimuli, (B) mid-intensity stimuli (after correction by the “kill-the-twin” procedure), and (C) high-intensity stimuli (after correction by the “kill-the-twin” procedure). (D) Miller inequality derived by subtracting the CDF predicted by the race model from the CDF of the AV condition at each bin for threshold stimuli (triangles), mid-intensity stimuli (circles), and high-intensity stimuli (squares). Values greater than zero (thin dashed trace) signify a violation of the race model in that bin. CDFs are plotted at the mid-point of each time bin, from 100 to 800 ms after stimulus onset for display purposes. Note that the race model prediction, being the sum of the two unimodal CDFs, eventually reaches 2, but the key comparison is for the early part of the CDF (Miller, 1982).
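The comparison plotted in panel D can be illustrated with a minimal Python sketch. It assumes the race model prediction is the plain sum of the two unimodal empirical CDFs (Miller, 1982), evaluated at a few time points rather than at the bins used in the paper, and it omits the kill-the-twin correction. The function names and the reaction times are hypothetical and are not data from this study.

```python
from bisect import bisect_right


def ecdf(rts, t):
    """Empirical CDF: fraction of reaction times (ms) that are <= t."""
    ordered = sorted(rts)
    return bisect_right(ordered, t) / len(ordered)


def miller_difference(rt_a, rt_v, rt_av, time_points):
    """Return the AV CDF minus the race-model prediction at each time point.

    Assumption: the prediction is F_A(t) + F_V(t), left uncapped, so it
    approaches 2 at long latencies, as noted in the caption.
    """
    diffs = []
    for t in time_points:
        prediction = ecdf(rt_a, t) + ecdf(rt_v, t)
        diffs.append(ecdf(rt_av, t) - prediction)
    return diffs


# Made-up reaction times in ms, for illustration only.
rt_a = [420, 450, 480, 510, 540]
rt_v = [430, 460, 490, 520, 550]
rt_av = [350, 380, 410, 440, 470]
time_points = [400, 450, 500]

diffs = miller_difference(rt_a, rt_v, rt_av, time_points)

# A positive difference marks a violation of the race model at that time point.
violations = [t for t, d in zip(time_points, diffs) if d > 0]
print(violations)
```

With these made-up values the difference is positive at the two early time points and negative at the late one, which is why the early part of the CDF carries the test.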
Figure 4
 
(A) Reaction times for visual (V), auditory (A), and auditory–visual (AV) stimuli. (B) Cumulative distribution functions (CDFs) for RTs are displayed for V (squares), A (triangles), AV (circles), and the CDF predicted by the race model (solid trace). (C) Miller inequality derived by subtracting the CDF predicted by the race model from the CDF of the auditory–visual condition at each bin.
Figure 5
 
(A) Trajectories of the motion stimuli over space and time. The horizontal axis displays spatial eccentricity, and the vertical axis displays time (each unit Tn and Sn denotes a 50-ms duration). Auditory and/or visual stimuli (shaded squares) begin in the center at time “S1” and remain stationary until time “Sx” (which is between 1000 and 1200 ms from stimulus onset), then move by 6° either leftward or rightward at each subsequent 50-ms period (T1–T6). Motion is either to the left or to the right, never simultaneously in both directions.
Figure 6
 
(A) Mean (N = 12) reaction times for V, A, and AV conditions, for the direction-discrimination RT task (left) and the motion-onset task (right). (B) CDFs for the RTs in the V (squares), A (triangles), and AV (circles) conditions, and the CDF predicted by the race model (solid trace), for Experiment 3A. (C) CDFs for the RTs for each stimulus condition and for the race model prediction in the motion-onset task (Experiment 3B). (D) Miller inequality for Experiment 3A (squares) and Experiment 3B (triangles).
Figure 7
 
(A) CDFs for the RTs in the V (squares), A (triangles), AV same direction (circles), and AV opposite direction (stars) conditions, and the CDF predicted by the race model (solid trace). (B) Miller inequality for bimodal motion in the same direction (triangles) and bimodal motion in the opposite direction (squares).
Table 1
 
Mean intensity levels.
Intensity level   Auditory (dB(A))   Visual (cd/m²)
Threshold         45                 0.3
Mid-intensity     57                 5
High-intensity    69                 76
Table 2
 
Experimental conditions in Experiment 4. Note: s, stationary; m, motion; l, left; r, right.
Condition             Target (response required)   Catch (no response required)
                      V      A                     V      A
Unimodal visual       ml     -                     sl     -
                      mr     -                     sr     -
Unimodal auditory     -      ml                    -      sl
                      -      mr                    -      sr
Bimodal congruent     ml     ml                    sl     sl
                      mr     mr                    sr     sr
Bimodal incongruent   ml     sl                    -      -
                      sl     ml                    -      -
                      mr     sr                    -      -
                      sr     mr                    -      -
Bimodal opposite      ml     mr                    -      -
                      mr     ml                    -      -
Table 3
 
Mean reaction time (RT), mean discrimination time, and mean accuracy per target condition.
Condition   Mean RT (ms)   Overall discrimination time (ms)   Mean accuracy (%)
Vm          570.13         390                                97.50
Am          583.28         410                                92.83
VmAs        581.40         410                                93.75
VsAm        601.15         410                                92.30
VmAm        529.91         370                                98.17
VmAm_Opp    559.69         390                                97.34
Table 4
 
Mean false alarms and standard errors (SEs).
Condition   False alarms: mean (%)   SE
Vs          12.43                    4.98
As          11.75                    6.83
VsAs        10.47                    5.46