Abstract
The perceptual system builds a coherent percept by integrating sensory inputs that are processed through modality-specific neural pathways. Using the motion aftereffect (MAE; Mather, 1980), our previous study showed that the visual MAE was strengthened when adaptation to visual motion was accompanied by auditory motion in the congruent direction, implying audio-visual interactions in the visual pathways (Park et al., 2019). In the current study, we used fMRI to examine the neural basis of this audio-visual congruence effect. Specifically, we used multi-voxel pattern analysis (MVPA) to test whether audio-visual direction congruence can be decoded in visual areas. During the adaptation phases (30 sec initially and 12 sec thereafter), the MAE was induced by 100%-coherence random-dot kinematograms (RDKs) moving leftward or rightward. Leftward or rightward auditory motion was simulated by modulating the intensity of white noise between the binaural channels of noise-canceling headphones. There were four sound conditions: congruent and incongruent (defined by audio-visual direction congruence), stationary, and no sound. Participants reported the duration and direction of the MAE while viewing a stationary RDK in the 4-sec test phase. The MAE lasted longer in the congruent condition than in the incongruent condition, replicating our previous finding. In all four conditions, the adapting direction of visual motion could be reliably decoded from neural patterns in both the adaptation and test phases in striate and extrastriate visual areas (V1, V2, MT+). Interestingly, audio-visual direction congruence could also be classified from neural patterns in those visual areas, not only in the adaptation phase but also in the test phase, when no physical motion was present.
The current findings point to a neural mechanism underlying the audio-visual congruence effect on MAE duration: neural representations in visual areas are distinguishable according to the relationship between auditory and visual motion.
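
As an illustration of the auditory motion stimulus described above, the following is a minimal sketch of white noise whose intensity shifts from one ear to the other. The abstract states only that intensity was modulated between the binaural channels; the equal-power panning law, the linear pan trajectory, the sample rate, and the function name `panned_noise` are all assumptions made for this example and are not the study's actual stimulus code.

```python
import math
import random


def panned_noise(duration_s=1.0, sample_rate=8000, direction="rightward", seed=0):
    """Return (left, right) lists of white-noise samples panned across the ears.

    Hypothetical helper: equal-power panning is an assumption, not the
    method reported in the abstract.
    """
    if direction not in ("leftward", "rightward"):
        raise ValueError("direction must be 'leftward' or 'rightward'")
    n = int(duration_s * sample_rate)
    if n < 2:
        raise ValueError("need at least two samples")

    rng = random.Random(seed)
    left, right = [], []
    for i in range(n):
        t = i / (n - 1)  # normalized time, 0 at onset and 1 at offset
        # pan = 0 means fully in the left ear, pan = 1 fully in the right ear
        pan = t if direction == "rightward" else 1.0 - t
        sample = rng.uniform(-1.0, 1.0)  # one white-noise sample
        # Equal-power gains: left_gain**2 + right_gain**2 == 1 at every pan
        left.append(sample * math.cos(pan * math.pi / 2))
        right.append(sample * math.sin(pan * math.pi / 2))
    return left, right


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))
```

For a rightward sound, the left channel is louder at onset and the right channel is louder at offset; calling `panned_noise(direction="leftward")` reverses the trajectory. A stationary sound could be approximated by holding `pan` at 0.5, again as an assumption.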