Abstract
To integrate cues from different perceptual modalities, the cues must first be brought into a common coordinate frame. Previous work on cue combination assumes that no sensorimotor transformation noise (STN) is present during multimodal cue integration. The goal of the present study is to show that, in the presence of STN, the choice of common coordinate frame matters both for generating optimal estimates and for human performance.
Theoretically, we demonstrate that when STN is present, performing cue combination in the coordinate frame of the most reliable individual cue is optimal. In addition, we model the integration of information across trials as Bayesian learning with forgetting, and show that the learned posterior distributions should also be stored in the most reliable coordinate frame. Given that vision is dominant for many tasks, this provides a possible theoretical explanation for the common empirical finding that object position estimates are stored in eye-centered coordinates (e.g., Batista et al., 1999).
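The optimality claim can be sketched with a minimal reliability-weighted (maximum-likelihood) combination model. The sketch assumes independent Gaussian cues and additive Gaussian STN incurred by whichever cue must be transformed into the chosen frame; the numerical values below are illustrative assumptions, not parameters from the study.

```python
def combined_variance(sigma_keep, sigma_move, sigma_stn):
    """Variance of the MLE combined estimate when the 'move' cue
    must be transformed (adding STN variance) into the coordinate
    frame of the 'keep' cue before inverse-variance weighting.
    Assumes independent Gaussian noise sources."""
    var_keep = sigma_keep ** 2
    var_move = sigma_move ** 2 + sigma_stn ** 2  # transformation inflates variance
    return 1.0 / (1.0 / var_keep + 1.0 / var_move)

# Illustrative standard deviations (assumed): vision more reliable than haptics.
sigma_vis, sigma_hap, sigma_stn = 1.0, 2.0, 1.5

# Combining in the visual frame transforms the haptic cue, and vice versa.
var_visual_frame = combined_variance(sigma_vis, sigma_hap, sigma_stn)
var_haptic_frame = combined_variance(sigma_hap, sigma_vis, sigma_stn)

# The frame of the more reliable cue yields the lower combined variance.
print(var_visual_frame < var_haptic_frame)
```

Intuitively, the STN penalty is paid by the cue that gets transformed, so it is cheapest to transform the less reliable cue into the frame of the more reliable one, where the added noise matters least.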
These theoretical results are used to model findings from a reaching task in which participants reached to a cylindrical object whose location was specified by visual and haptic information. STN was varied by changing the spatial relationship between the head, eyes, and object. The results show that reach trajectories vary systematically as a function of STN. Moreover, reaches are consistent with the predictions made by storing estimates of the target's location in the most reliable coordinate frame. Overall, the results of this study suggest that coordinate frames are not interchangeable: for a given task, there may be a preferred frame of reference.