In the passive–auditory condition, participants passively awaited the target sound stimulus and reported its onset time. At the beginning of each trial, the clock face was displayed for 500 ms; the red disc then appeared at a random position (one of the 120 gray discs) and began rotating clockwise. After a pseudo-randomized period (2.0, 2.5, 3.0, or 3.5 seconds), a keypress sound (duration 137 ms) was played, indicating that the computer had acted to trigger the subsequent target event. 250 ms after the onset of the keypress sound, a target collision sound (duration 393 ms) was played. The onset of the target sound was followed by a wash-out period, during which the red disc continued to rotate for a pseudo-randomized period (1.0, 1.5, 2.0, or 3.0 seconds) and then disappeared; the clock face remained on the screen for an additional 500 ms. Participants were instructed to fixate on the central fixation point throughout the presentation stage, to covertly attend to the rotating red disc, and to memorize the position of the red disc at the moment they heard the target collision sound. The presentation stage was followed by a 500-ms blank period, after which the clock face was shown again. Participants then reported the remembered position of the red disc by using the cursor to select one of the gray discs. During the response stage, participants were told to relax and were allowed to move their eyes freely.
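The passive–auditory trial sequence can be summarized as a timeline of event onsets. The sketch below is illustrative only and is not the authors' experiment code. It assumes that the 500-ms clock-only period precedes the appearance of the red disc, that the pre-action period is measured from disc onset, and that the wash-out is measured from target onset. The helper `passive_auditory_timeline` and the `TrialTimeline` fields are hypothetical names.

```python
from dataclasses import dataclass

# Fixed durations in ms, taken from the procedure described above.
CLOCK_ONLY_MS = 500        # clock face alone before the red disc appears (assumed ordering)
KEYPRESS_SOUND_MS = 137    # keypress sound duration
ACTION_TO_TARGET_MS = 250  # keypress-sound onset to target-sound onset
TARGET_SOUND_MS = 393      # target collision sound duration
CLOCK_TAIL_MS = 500        # clock face stays on after the red disc disappears
BLANK_MS = 500             # blank screen before the response stage

# Pseudo-randomized durations (ms) listed in the text.
PRE_ACTION_MS = (2000, 2500, 3000, 3500)
WASH_OUT_MS = (1000, 1500, 2000, 3000)


@dataclass
class TrialTimeline:
    disc_onset: int
    keypress_onset: int
    keypress_offset: int
    target_onset: int
    target_offset: int
    disc_offset: int
    clock_offset: int
    response_onset: int


def passive_auditory_timeline(pre_action_ms: int, wash_out_ms: int) -> TrialTimeline:
    """Event onsets (ms from trial start) for one passive-auditory trial (hypothetical helper)."""
    if pre_action_ms not in PRE_ACTION_MS or wash_out_ms not in WASH_OUT_MS:
        raise ValueError("duration not in the listed set")
    disc_onset = CLOCK_ONLY_MS
    keypress = disc_onset + pre_action_ms      # the computer "acts"
    target = keypress + ACTION_TO_TARGET_MS    # target collision sound onset
    disc_offset = target + wash_out_ms         # wash-out assumed to start at target onset
    clock_offset = disc_offset + CLOCK_TAIL_MS
    return TrialTimeline(
        disc_onset=disc_onset,
        keypress_onset=keypress,
        keypress_offset=keypress + KEYPRESS_SOUND_MS,
        target_onset=target,
        target_offset=target + TARGET_SOUND_MS,
        disc_offset=disc_offset,
        clock_offset=clock_offset,
        response_onset=clock_offset + BLANK_MS,
    )


if __name__ == "__main__":
    print(passive_auditory_timeline(2000, 1000))
```

For `passive_auditory_timeline(2000, 1000)`, the target sound starts 2750 ms after trial onset and the response stage begins at 4750 ms. The 137-ms keypress sound ends 113 ms before the target sound begins, so the two sounds do not overlap.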
In the active–auditory condition, the sequence was similar, except that the collision sound was triggered by the participant's voluntary action. On each trial, participants were instructed to press a designated key whenever they chose to trigger the target collision sound, provided they first allowed the red disc to complete at least one full rotation. Participants were advised to press the key spontaneously, without relying on a predetermined position of the red disc or any other countdown strategy. The interval between the participant's keypress and the target collision sound was the same as in the passive condition (250 ms).
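The active-condition rules can be expressed as two small checks: a keypress is admissible only after one full rotation, and the red disc's position at target onset follows from its start index and the elapsed time. This is a minimal sketch and not the authors' procedure. The rotation period is not given in this section, so `rotation_period_ms` is a hypothetical parameter. The sketch also assumes that gray-disc indices increase clockwise, and it models the one-rotation rule as a simple rejection. How the authors enforced that rule is not stated here.

```python
from typing import Optional

N_POSITIONS = 120          # gray discs on the clock face
ACTION_TO_TARGET_MS = 250  # keypress to target sound, same as the passive condition


def active_target_onset(press_ms: float, disc_onset_ms: float,
                        rotation_period_ms: float) -> Optional[float]:
    """Target-sound onset for a keypress, or None if the red disc had not yet
    completed one full rotation (hypothetical enforcement of the instruction)."""
    if press_ms - disc_onset_ms < rotation_period_ms:
        return None
    return press_ms + ACTION_TO_TARGET_MS


def disc_index_at(t_ms: float, disc_onset_ms: float, start_index: int,
                  rotation_period_ms: float) -> int:
    """Nearest gray-disc index (0..119, assumed clockwise) under the red disc at t_ms."""
    steps = round((t_ms - disc_onset_ms) * N_POSITIONS / rotation_period_ms)
    return (start_index + steps) % N_POSITIONS
```

With a hypothetical 3000-ms rotation period and the disc appearing at 500 ms, a press at 2000 ms is rejected. A press at 3700 ms is accepted and yields a target at 3950 ms. At that moment, a disc that started at index 10 sits over index 28.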
In the passive–audiovisual integrated condition, two colored discs (yellow and cyan) appeared together with the red disc at two of four possible locations relative to the fixation point: lower left, lower right, upper left, or upper right. The distance between each colored disc and the central fixation point was 2.85°. After the red disc had rotated for a pseudo-randomized period (the same as in the passive–auditory condition), the computer launched the two colored discs toward the central fixation point. The keypress sound was presented simultaneously with the launch. After 250 ms, the two colored discs met at the center and bounced away upon contact (an elastic collision); the target collision sound was presented at the moment of contact. After the collision, the two colored discs changed trajectories and continued moving for 1,000 ms before disappearing (see Figure 2 for a schematic of the two colored discs' spatial dynamics). This was followed by a wash-out period identical to that in the auditory conditions.

Participants' primary task in the audiovisual integrated conditions was the same as in the auditory conditions: to report the position of the red disc at the moment they heard the target collision sound. They were informed that the target sound coincided with the moment the two discs made contact on the screen. Participants were also required to perform a “trajectory judgment (TJ) task.” During the presentation, they were instructed to keep their gaze on the center of the screen, covertly track the movements of the two colored discs, and memorize their vanishing points. After reporting the onset of the target sound, they were asked to report the final location of one of the two colored discs, selected at random. For this report, the clock face was divided into four quadrants (four pie-shaped gray sectors), and participants selected one sector with the cursor. The sector under the cursor was highlighted in the color of the disc they were asked to report (see Figure 3 for the layout of the TJ task). The TJ task was intended to ensure that participants attended to the visual components rather than ignoring them. Participants were instructed to monitor both the rotating red disc and the movements of the two colored discs throughout the presentation without making eye movements to track them. They were also informed that both tasks were equally important and that they should balance their performance across the two tasks rather than focus exclusively on one. In the active–audiovisual integrated condition, the two colored discs were launched by the participant's keypress; the rest of the sequence was identical to that in the passive–audiovisual integrated condition.
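Scoring a TJ response amounts to checking whether the selected sector matches the quadrant in which the probed disc vanished. The sketch below is a minimal illustration and does not describe the authors' scoring code. It assumes coordinates centered on the fixation point with y increasing upward, so the sign of y must be flipped for pixel coordinates that grow downward. It also assumes an arbitrary tie-break for points lying exactly on an axis and uniform random selection of the probed disc. The function names and sector labels are hypothetical.

```python
import random
from typing import Dict, Tuple


def quadrant(x: float, y: float) -> str:
    """Quadrant of a point relative to fixation (origin), with y positive upward.
    Points exactly on an axis are assigned by an arbitrary tie-break (>= 0)."""
    vertical = "upper" if y >= 0 else "lower"
    horizontal = "right" if x >= 0 else "left"
    return f"{vertical}_{horizontal}"


def choose_probe(rng: random.Random) -> str:
    """Pick which colored disc the participant is asked about (assumed uniform)."""
    return rng.choice(("yellow", "cyan"))


def score_tj_trial(vanish_points: Dict[str, Tuple[float, float]],
                   probed: str, response: str) -> bool:
    """True if the selected sector matches the probed disc's vanishing quadrant."""
    return quadrant(*vanish_points[probed]) == response
```

For example, with hypothetical vanishing points `{"yellow": (-120.0, 85.0), "cyan": (130.0, -90.0)}`, a "lower_right" response is scored correct when cyan is probed and incorrect when yellow is probed.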
The audiovisual irrelevant conditions were similar to the audiovisual integrated conditions, except that the two colored discs were launched in pseudo-randomized directions, making the visual component unrelated to the target collision sound. There were 12 possible combinations of the two discs' movements. In 10 of them, the discs moved along straight paths without making contact; in the remaining 2, the discs briefly overlapped and appeared to stream through each other (see the supplementary material for the URL of a video demonstration). The target collision sound was presented 250 ms after the discs were launched. In both the integrated and irrelevant conditions, the discs moved at a constant speed for a given direction: 4 pixels/frame when moving vertically or horizontally and 5.66 pixels/frame when moving diagonally. The key distinction between the two types of conditions is that in the integrated conditions, the auditory and visual signals jointly depict a single collision event. In the irrelevant conditions, although the computer or the participant triggers the target sound and the visual motion simultaneously, the two modalities are not perceptually related and do not form a single physical event. Comparing the integrated and irrelevant conditions therefore allows us to investigate whether the degree of integration between modalities affects the IB phenomenon.
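The two speeds are mutually consistent. A disc moving diagonally by 4 pixels along each axis per frame covers √(4² + 4²) = 4√2 ≈ 5.66 pixels/frame, so horizontal, vertical, and diagonal motion all advance 4 pixels per frame along each axis they move on. The sketch below illustrates this relationship under stated assumptions: motion is restricted to the eight horizontal, vertical, and diagonal directions and is updated once per frame. The refresh rate is not given in this section, so `refresh_hz` is a hypothetical parameter, and the direction labels are illustrative.

```python
import math
from typing import Tuple

AXIS_STEP_PX = 4.0  # per-frame displacement along each moving axis (from the text)

# Grid directions (dx, dy) with y positive upward; labels are illustrative.
DIRECTIONS = {
    "right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1),
    "up_right": (1, 1), "up_left": (-1, 1),
    "down_right": (1, -1), "down_left": (-1, -1),
}


def velocity(direction: str) -> Tuple[float, float]:
    """Per-frame displacement vector (px/frame) for a direction."""
    dx, dy = DIRECTIONS[direction]
    return (AXIS_STEP_PX * dx, AXIS_STEP_PX * dy)


def speed(direction: str) -> float:
    """Speed in px/frame: 4.0 for axis-aligned, 4*sqrt(2) ~ 5.66 for diagonal."""
    vx, vy = velocity(direction)
    return math.hypot(vx, vy)


def position_after(start: Tuple[float, float], direction: str,
                   n_frames: int) -> Tuple[float, float]:
    """Position after n_frames of constant-velocity motion."""
    vx, vy = velocity(direction)
    return (start[0] + vx * n_frames, start[1] + vy * n_frames)


def frames_in(duration_ms: float, refresh_hz: float) -> int:
    """Number of frames spanning duration_ms at a hypothetical refresh rate."""
    return round(duration_ms * refresh_hz / 1000)
```

At a hypothetical 60 Hz, the 250-ms launch-to-sound interval spans 15 frames, so a disc moving diagonally from the origin would be displaced by (-60, -60) pixels, about 84.9 pixels in total.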