What are the potential neural mechanisms that mediate auditory influences on rivaling visual motion percepts? Dominant visual stimuli are thought to be processed throughout the same cortical hierarchy that is involved during normal non-rivaling viewing. Thus, auditory motion may influence the dynamics of visual motion percepts at multiple levels of the cortical hierarchy (Sadaghiani et al., 2009; Soto-Faraco, Kingstone, & Spence, 2003; Soto-Faraco et al., 2004) via (1) genuine multisensory integration, (2) attention, and (3) decisional processes.

(1) Over the past decade,
multisensory interactions have been shown at multiple cortical levels ranging from primary sensory to higher order association cortices (Bremmer et al., 2001; Kayser et al., 2008; Schroeder & Foxe, 2005; Werner & Noppeney, 2010). Since true multisensory integration breaks down when sensory inputs are brought into conflict, the dominance periods of the congruent visual motion percept may be stabilized by the integration of congruent auditory and visual inputs into a unified percept. Multisensory integration or binding could then be considered a natural extension of well-established perceptual grouping (Sobel & Blake, 2002) to the multisensory domain and may be related to bottom-up attentional effects. Given that binding within a sensory modality is generally stronger than binding between sensory modalities, it is not surprising that our multisensory congruency effect, an approximately 30% increase in dominance time, is smaller than “classical” visual perceptual grouping effects (Sobel & Blake, 2002), which lengthened dominance durations by 40% or more.

(2) In addition to “true” multisensory integration, auditory motion may influence the dominance times of rivaling percepts via endogenous
attentional mechanisms and voluntary control (Chong & Blake, 2006; Chong et al., 2005; Lack, 1978; Maruya et al., 2007; Meng & Tong, 2004). First, the motion sound may withdraw attention from the visual stimulus (Alais, Parker, Boxtel, Paffen, & van Ee, 2007; Paffen et al., 2006; Pastukhov & Braun, 2007). However, such a “withdrawal mechanism” should generally slow down the temporal dynamics and cannot easily explain the congruency-selective effect observed in the current study. Second, a congruent audiovisual motion percept may attract more attention, leading to longer dominance times. Although we cannot exclude this effect of attention, it is interesting to note that under non-rivaling conditions incongruent rather than congruent task-irrelevant auditory inputs lead to protracted response times (Adam & Noppeney,
2010; Laurienti, Kraft, Maldjian, Burdette, & Wallace, 2004; Molholm, Ritter, Javitt, & Foxe, 2004; Noppeney, Josephs, Hocking, Price, & Friston, 2008; Noppeney, Ostwald, & Werner, 2010; Weissman, Warner, & Woldorff, 2004) and increased processing in the fronto-parietal attentional system. Third, auditory input has recently been shown to enhance voluntary control and attentional selection in binocular rivalry. However, this auditory-induced gain in attentional selection was observed only when subjects attended to the sounds, not when they ignored the auditory input as in our paradigm (van Ee et al., 2009). Collectively, these arguments suggest that endogenous top-down attention may not be the primary mechanism but may rather amplify auditory influences initiated by genuine audiovisual integration.

(3) Finally, changes in dominance and suppression times may be attributed to high-level decisional biases induced by the concurrent motion sound. For instance, observers may tend to report rightward motion for piecemeal periods with concurrent rightward sounds. Although such decisional influences cannot be fully excluded, we note that dissociating perceptual from decisional effects on subjects' reports is a rather general problem in studies of binocular rivalry.