There are many properties of signals that are likely to affect the nervous system's ability to combine information from different senses. In the work presented here, we showed that spatial separation between visual and haptic signals affects this ability. Gepshtein and Banks (2003) showed that the difference in size between visual and haptic signals also affects visual–haptic combination. In that study, observers made size judgments about spatially coincident visual and haptic signals. Gepshtein and Banks varied the conflict between the two signals: the difference in the sizes specified by vision and haptics. Visual–haptic discrimination performance was best when the conflict was zero and became successively poorer as the conflict grew (their Figure S2). Other studies have found that separation in time also affects the ability to combine signals (Bresciani et al., 2005; Shams, Kamitani, & Shimojo, 2000).
Taken together, the present results and those of the previous studies suggest that the nervous system determines when to combine visual and haptic signals based on signal similarity: similarity of spatial position, of size, and of timing. Thus, to decide whether to combine signals from different modalities, the nervous system is solving a classification problem (Duda, Hart, & Stork, 2001). Because signals from different modalities vary along many dimensions, it is a multidimensional classification problem. Such problems are often solved by computing a measure of signal similarity that takes into account signal differences on multiple dimensions (Coombs, Dawes, & Tversky, 1970; Krantz, Luce, Suppes, & Tversky, 1971); the nervous system could use such a measure to decide whether to combine the signals. To investigate further how signal similarity along several dimensions affects the integration of visual and haptic information, one could measure the precision of the multi-modal estimate while varying the stimulus along several sensory dimensions, as we did here for one dimension. A satisfactory model of this process would include a measure of signal similarity that reliably predicts the precision of the multi-modal estimate. In that case, different combinations of signal parameters (e.g., visual and haptic size, location, time of occurrence, etc.) that correspond to the same similarity value should yield the same precision.
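To make this prediction concrete, consider one possible formalization; the Gaussian similarity function, the tolerance constants, and the interpolation rule below are illustrative assumptions rather than results of the present study. Discrepancies along the candidate dimensions (spatial offset $\Delta x$, size conflict $\Delta s$, temporal offset $\Delta t$) could be collapsed into a single similarity value,

$$ S = \exp\!\left[-\left(\frac{\Delta x^{2}}{\tau_{x}^{2}} + \frac{\Delta s^{2}}{\tau_{s}^{2}} + \frac{\Delta t^{2}}{\tau_{t}^{2}}\right)\right], \qquad 0 < S \le 1, $$

where each $\tau$ sets the tolerance for its dimension. If the signals were combined to the degree indicated by $S$, the variance of the bimodal estimate might interpolate between the optimal (maximum-likelihood) fused variance and the variance of the better single-modality estimate:

$$ \sigma_{VH}^{2} = S\,\frac{\sigma_{V}^{2}\,\sigma_{H}^{2}}{\sigma_{V}^{2} + \sigma_{H}^{2}} + (1 - S)\,\min\!\left(\sigma_{V}^{2},\, \sigma_{H}^{2}\right). $$

Under such a scheme, any combination of $\Delta x$, $\Delta s$, and $\Delta t$ that yields the same $S$ yields the same predicted precision, which is exactly the signature property described above.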
It would be interesting to know whether inter-sensory combination is affected by higher-level variables such as occlusion relationships, or only by low-level variables such as spatial proximity. For example, imagine that an occluder is placed in front of the gap between the visual and haptic parts of our stimulus. With amodal completion (Kanizsa, 1979), the two parts might appear to belong to the same object. Would observers then combine more widely separated visual and haptic signals than we observed? Such a finding would suggest that high-level variables are indeed involved in inter-sensory combination.