Abstract
Separate lines of research have revealed that perceptual decisions about unreliable sensory information are driven by processes that integrate evidence across time or across modalities. Here we investigate the conditions under which subjects will integrate sensory information across both time and modalities. We presented subjects with multi-modal event streams, consisting of a series of noise-masked tones and/or flashes of light. Subjects judged whether the event rate was high or low. Combining across modalities could improve performance in two ways: by improving the detectability of congruent auditory and visual events, or, more abstractly, by combining rate estimates that are generated separately within each modality. Performance improved when stimuli were presented in both modalities (cue-combination condition) compared to when stimuli were presented in a single modality. Importantly, this improvement was evident both when the auditory and visual event streams were played synchronously and when they were played asynchronously. The enhancement of rate estimates we observed for asynchronous streams could not have resulted from improved detection of individual events, which argues strongly that subjects combined estimates of overall rates computed separately for auditory and visual inputs. Moreover, we show that subjects' performance agrees with that of a Bayesian statistical observer that optimally combines separate rate estimates for auditory and visual inputs.
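As a point of reference for the final claim, the standard maximum-likelihood form of optimal cue combination is sketched below; this is the textbook benchmark rather than the paper's specific model, and the symbols (single-modality rate estimates $\hat{r}_A$, $\hat{r}_V$ with variances $\sigma_A^2$, $\sigma_V^2$) are generic notation introduced here for illustration:

$$
\hat{r}_{AV} \;=\; \frac{\sigma_V^2}{\sigma_A^2 + \sigma_V^2}\,\hat{r}_A \;+\; \frac{\sigma_A^2}{\sigma_A^2 + \sigma_V^2}\,\hat{r}_V,
\qquad
\sigma_{AV}^2 \;=\; \frac{\sigma_A^2\,\sigma_V^2}{\sigma_A^2 + \sigma_V^2} \;\le\; \min\!\left(\sigma_A^2,\, \sigma_V^2\right).
$$

Under these assumptions, the combined estimate weights each modality by its reliability, so the bimodal variance is never larger than the better single-modality variance; the improvement in discrimination performance for bimodal streams is compared against this prediction.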