Abstract
When multiple sources of sensory information about a single environmental property are available, more precise estimates of that property can be formed by combining the different sources. To maximize the precision of the combined estimate, each cue must be weighted in proportion to its reliability. For physical dimensions such as object size (Ernst & Banks, 2002), surface slant (Knill & Saunders, 2004) and object location (Alais & Burr, 2005), studies show that humans integrate different sensory sources in a statistically optimal fashion. We investigated the integration of auditory and visual cues for a more complex physical property: beat tempo. Stimuli were created from 3D motion capture data (240 Hz) of a drummer performing swing groove drumming at 90 BPM. The movement data were converted into visual point-light displays with points at the shoulder, elbow, wrist and hand, plus two points on the drumstick. The movie sample rate was 60 Hz. Sounds were synthesized by simulating the first 25 modes of a circular membrane. The inputs to the sound model were the physical parameters of the membrane and the time and impact velocity of each strike. There were three main conditions in the experiment: audio-alone, vision-alone and audio-visual combined. For each of these conditions, we measured tempo discrimination performance in a two-interval forced-choice (2IFC) task. For one of our three observers, discrimination performance improved in the two-cue case as predicted by the statistically optimal cue combination model. However, the other two observers did not show the predicted improvement. These individual differences may reflect differing amounts of practice in audio-visual tempo discrimination. We are currently investigating the effect of expertise and practice on the integration of audio-visual information in this tempo discrimination task.
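
The reliability-weighted combination rule referred to above can be illustrated with a minimal sketch. It assumes two independent cues with Gaussian noise and takes reliability to be the inverse variance, as is standard in the cited cue combination literature. The function name `combine_cues` and all numerical values are hypothetical and chosen only for illustration; they are not data or analysis code from this study.

```python
import math


def combine_cues(est_a, sigma_a, est_v, sigma_v):
    """Reliability-weighted combination of two independent Gaussian cues.

    est_a, est_v: single-cue tempo estimates (e.g. auditory and visual), in BPM.
    sigma_a, sigma_v: standard deviations of the single-cue estimates.
    Returns the combined estimate, its standard deviation, and the weights.
    """
    # Reliability is assumed here to be the inverse variance of each cue.
    r_a = 1.0 / sigma_a ** 2
    r_v = 1.0 / sigma_v ** 2

    # Each cue is weighted in proportion to its reliability; weights sum to 1.
    w_a = r_a / (r_a + r_v)
    w_v = r_v / (r_a + r_v)

    combined = w_a * est_a + w_v * est_v

    # Predicted two-cue variance is 1 / (r_a + r_v), which is never larger
    # than the variance of the more reliable single cue.
    sigma_c = math.sqrt(1.0 / (r_a + r_v))
    return combined, sigma_c, (w_a, w_v)


# Hypothetical single-cue values, not measurements from the experiment.
estimate, sigma_combined, weights = combine_cues(91.0, 3.0, 88.0, 4.0)
print(estimate, sigma_combined, weights)
```

With these illustrative inputs the auditory cue receives the larger weight (about 0.64) and the predicted combined standard deviation (about 2.4 BPM) is smaller than either single-cue value. That reduction is the improvement in discrimination performance the optimal model predicts for the audio-visual condition.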