Research Article | June 2009
Integration of vision and haptics during tool use
Chie Takahashi, Jörn Diedrichsen, Simon J. Watt
Journal of Vision, June 2009, Vol. 9(6):3. https://doi.org/10.1167/9.6.3
Abstract

When integrating signals from vision and haptics the brain must solve a “correspondence problem” so that it only combines information referring to the same object. An invariant spatial rule could be used when grasping with the hand: here the two signals should only be integrated when the estimates of hand and object position coincide. Tools complicate this relationship, however, because visual information about the object, and the location of the hand, are separated spatially. We show that when a simple tool is used to estimate size, the brain integrates visual and haptic information in a near-optimal fashion, even with a large spatial offset between the signals. Moreover, we show that an offset between the tool-tip and the object results in similar reductions in cross-modal integration as when the felt and seen positions of an object are offset in normal grasping. This suggests that during tool use the haptic signal is treated as coming from the tool-tip, not the hand. The brain therefore appears to combine visual and haptic information, not based on the spatial proximity of sensory stimuli, but based on the proximity of the distal causes of stimuli, taking into account the dynamics and geometry of tools.

Introduction
Theories of sensory integration describe the statistically optimal strategy for integrating signals from vision and haptics (active touch). If both estimates are on average unbiased, and have independent, Gaussian noise, the lowest-variance combined estimate is a weighted sum of the estimates from each cue, with each estimate weighted by its normalized reciprocal variance, i.e. its relative reliability (for a review see Oruç, Maloney, & Landy, 2003). The variance of this combined estimate is:
\[
\sigma_{VH}^{2} = \frac{\sigma_{V}^{2}\,\sigma_{H}^{2}}{\sigma_{V}^{2} + \sigma_{H}^{2}}, \qquad (1)
\]
where σ_V² and σ_H² are the variances of the visual and haptic estimates, respectively. The combined estimate always has lower variance than estimates based on either individual cue. Thus a key advantage of sensory integration is that it allows properties of the environment to be estimated with greater precision than can be achieved with either signal alone (Clark & Yuille, 1990; Knill & Pouget, 2004; Landy, Maloney, Johnston, & Young, 1995; Oruç et al., 2003; Yuille & Bülthoff, 1996). 
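To make this weighting concrete, here is a minimal numerical sketch of Equation 1 in Python. The values are purely illustrative (not data from the experiments): it computes the normalized cue weights and the predicted standard deviation of the combined estimate.

```python
# Minimal sketch of Equation 1 with illustrative (hypothetical) values.

def optimal_combination(sigma_v, sigma_h):
    """Return (weight_v, weight_h, sigma_vh) under reliability-weighted integration."""
    rel_v = 1.0 / sigma_v ** 2            # reliability = reciprocal variance
    rel_h = 1.0 / sigma_h ** 2
    w_v = rel_v / (rel_v + rel_h)         # normalized weights sum to 1
    w_h = rel_h / (rel_v + rel_h)
    sigma_vh = (sigma_v ** 2 * sigma_h ** 2 / (sigma_v ** 2 + sigma_h ** 2)) ** 0.5  # Equation 1
    return w_v, w_h, sigma_vh

# Equally precise cues (4 mm each): weights are 0.5/0.5 and the combined sigma
# falls by a factor of sqrt(2), to about 2.83 mm.
print(optimal_combination(4.0, 4.0))
# Unequal cues (3 mm vs. 6 mm): the more reliable cue dominates (weights 0.8/0.2).
print(optimal_combination(3.0, 6.0))
```

Note that the maximum benefit, a √2 reduction in JND, occurs when the two single-cue variances are equal, which is why the single-modality experiment below matches them.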
Clearly, it only makes sense to integrate signals if they provide information about the same property of the world. The brain must therefore solve a “correspondence problem” if it is to combine related sensory information while avoiding combining unrelated information (Ernst, 2005; Helbig & Ernst, 2007). One way this could be achieved is by considering the similarity of signals in different sensory channels, in terms of parameters such as their time of occurrence, magnitude, and spatial proximity (Ernst, 2005). In this paper we concentrate on spatial proximity as a criterion for determining cross-modal correspondence. During normal grasping, the extent to which haptic and visual signals are spatially coincident is directly related to the likelihood that they refer to the same object. So a rule that signals should be combined only if the “felt” and “seen” positions of an object are the same would be effective in this situation. Consistent with this, recent empirical studies have shown that when objects are grasped directly human size-discrimination performance is remarkably close to the statistically optimal predictions (Ernst & Banks, 2002; Gepshtein & Banks, 2003), but the extent of visual-haptic integration decreases systematically when the “felt” and “seen” positions of an object are separated in space (Gepshtein, Burge, Ernst, & Banks, 2005; see also Jackson, 1953; Warren & Cleaves, 1971; Witkin, Wapner, & Leventhal, 1952). 
But there are common situations in which this rule for determining correspondence between visual and haptic signals would fail. One important example is during tool use. Consider using a simple tool consisting of two sticks, attached rigidly to the thumb and index finger ( Figure 1). When an object is grasped with this tool its size can be estimated from the available visual cues, as well as from the “felt” opening of the hand, when the tool touches the object. A statistically optimal tool user would integrate these signals, because they relate to the same object. Yet the tool systematically perturbs the ‘normal’ spatial mapping between visual and haptic signals—in this case they are spatially offset—and a spatial proximity rule would cause them not to be combined. 
Figure 1
 
A cartoon illustrating the spatial offset between visual and haptic signals to object size when using a simple tool, comprised of two sticks rigidly attached to the finger and thumb, respectively. We used a “virtual” tool of this kind in our experiments.
The natural and intuitive way that we use tools suggests that the brain employs a more sophisticated solution to the correspondence problem during cross-modal integration. Here we ask whether, during tool use, humans appropriately integrate visual information at the location of the acting tip of a tool with haptic information (at the hand). To do this, we examined visual-haptic size discrimination under a variety of spatial offset and tool-use conditions. We first confirmed that, when grasping with the finger and thumb, cross-modal integration is optimal when the signals originate in the same location, but reduces with increasing spatial offset between visual and haptic signals (Ernst & Banks, 2002; Gepshtein & Banks, 2003; Gepshtein et al., 2005). To do this we measured single-modality (vision alone and haptic alone) discrimination performance and used this to predict optimal performance when both cues were available (Experiment 1: no tool). We then determined whether cross-modal integration could be restored despite a spatial offset between visual and haptic signals, when using a simple tool, the length of which varied with spatial offset such that it always reached the visual object (Experiment 2: zero tool-object offset). Finally, we examined whether spatial offset between the tool-tip and the object resulted in similar reductions in cross-modal integration as when the hand is offset in natural grasping (Experiment 3: variable tool-object offset). Such a pattern of results would suggest that during tool use the haptic signal is treated as originating from the tool-tip. 
Methods
Participants
The same seven right-handed participants (aged 21–44 years) took part in Experiments 1 and 2. Six of these also took part in Experiment 3 (one was unable to continue). All participants had normal or corrected-to-normal vision, no known motor deficits, and normal stereoacuity. Six of the participants were naive to the purpose of the experiment. 
Apparatus and stimuli
The visual and haptic stimuli were two parallel rectangular planes ( Figures 2 and 3), similar to those used by Gepshtein et al. ( 2005). 
Figure 2
 
Stimulus orientation was defined as the surface slant (relative to the line of sight). As stimulus orientation approaches 0 deg (fronto-parallel) performance is increasingly dependent on a depth estimate from binocular disparity, and thresholds increase (Gepshtein & Banks, 2003).
Figure 3
 
A schematic of the stimuli in Experiment 1 (top row) and Experiment 2 (bottom row). The textured rectangles show the visual object, defined by two random-dot-stereogram planes. The gray rectangles show the haptic stimulus (not visible to participants). The gray spheres indicated the position of the finger and thumb. The hand was not visible. The tool was shown visually as a pair of parallel “sticks” rigidly attached to the finger and thumb, and was constrained to always lie horizontal. Grasping with the tool was simulated by spatially offsetting the visual and haptic stimuli horizontally. Thus when the tool-tip touched the visual object, force was generated at the finger and thumb. We could not simulate rotation forces on the digits because the experimental design required that visual and haptic stimuli were exactly matched across tool and no-tool experiments. Left and right columns show negative and positive spatial offsets between visual and haptic stimuli, respectively (referred to as hand-object offset). The tool and finger-position spheres were extinguished before presentation of the haptic and visual objects (see Procedure). The information available to make size discrimination judgements was therefore identical in both experiments.
The visual stimuli were random-dot stereograms depicting two (transparent) parallel planes, separated along their surface normals (c.f. Gepshtein & Banks, 2003). Each plane consisted of uniformly distributed rectangular ‘dots’ with an average width and height of 2 mm. A random jitter was added to the width and height of each dot (±1 mm, uniform distribution) to disrupt the use of dot size as a cue to the planes' separation. The dots covered approximately 8% of each plane. A new stereogram was generated for each plane on each stimulus presentation. The average width and height of each plane was 50 mm. A random variation (±10 mm, uniform distribution) was added to the height and width of each plane, so that the degree of visible overlap between the two planes was not a reliable cue to their separation. The distance from the cyclopean eye to the (visual and haptic) stimulus, in the mid-sagittal plane, was varied randomly in the range 460–530 mm so that the distance to one plane could not be used to judge the planes' separation. Participants viewed the visual stimuli in a conventional “Wheatstone” mirror stereoscope, consisting of a separate TFT monitor (refresh rate 60 Hz) and mirror for each eye. We used anti-aliasing to achieve sub-pixel accuracy of dot positions. Head position was stabilized using a chin and forehead rest. Participants could not see their hand. Two spheres indicated the positions of their finger and thumb. 
The haptic stimuli were also two rectangular planes, the dimensions and position of which matched those of the visual planes save for a variable horizontal spatial offset. The haptic stimuli were rendered by attaching the finger and thumb of the participant's right hand to separate PHANToM 3.0 force-feedback devices (SensAble Technologies, Inc.). Touching the virtual planes with the finger or thumb resulted in an opposing force, simulating contact with a real plane. 
Procedure
In all conditions we measured size discrimination performance using a two-alternative forced-choice (2AFC) procedure. Visual and/or haptic size (defined as the separation between the planes) was varied according to a method of constant stimuli, and participants indicated which interval contained the larger size. The standard size was 50 mm, and the comparison size was 41, 44, 47, 49, 50, 51, 53, 56 or 59 mm. The presentation order of standard and comparison stimuli was randomized and each interval was presented for 1 sec. In each condition participants completed 30 repetitions of each stimulus level. No feedback was given at any stage during the experiment. 
Single-modality experiment
We measured just-noticeable-differences (JNDs) in the separation of the two planes for vision and haptics alone in order to predict performance when both cues were available. According to Equation 1 the maximum improvement in discrimination performance that results from cross-modal integration occurs when the two signals have equal variance. It is therefore important experimentally to match the variance of estimates from each cue, for each participant, so that the effects of cue integration are most clearly evident (Gepshtein et al., 2005). In both single-cue conditions we measured size JNDs as a function of stimulus orientation ( Figure 2). Varying the orientation of the stimulus relative to the line of sight has a large effect on the variance of size estimates from vision, but leaves estimates from haptics relatively unaffected (Gepshtein & Banks, 2003), and we used this to match the variance of estimates from each cue for each participant in the two-cue experimental conditions (c.f. Gepshtein et al., 2005). For vision we measured size JNDs at stimulus orientations of 40, 50, 60, 70 & 80 deg. Because haptic thresholds vary little with orientation, we used fewer stimulus orientations (40, 60 & 80 deg). Vision-alone and haptic-alone blocks were completed separately. 
In the vision-alone conditions, a fixation-cross appeared at the beginning of each trial, indicating the position and orientation of the upcoming stimulus. The two intervals were then presented for 1 sec each, separated by a 1.6 sec inter-stimulus interval (this time was determined based on the typical inter-stimulus interval when haptic information was available, see below). Participants then indicated which interval contained the larger size by pressing one of two virtual (visual and haptic) buttons. 
In the haptic-alone condition, participants grasped the stimulus with the index finger and thumb. Two visual spheres appeared which indicated the position and orientation (but not the size) of the upcoming stimulus. When participants inserted their finger and thumb these ‘start zones’ changed color from yellow to green indicating that they should begin to grasp the stimulus. The start zones could not be felt. All visual information disappeared immediately when the participants moved their digits inward from the start zone positions. Participants were trained to grasp the stimulus for ∼1 sec in each interval and then release it. If the time both digits touched the surface was less than 1000 msec, or more than 1500 msec, the message “too fast” or “too slow” appeared on the screen, and that trial was discarded. In practice this was a small proportion of trials. This process was repeated for the second interval, and the participant then indicated which interval contained the larger size, as described above. 
Experiment 1: No tool
In Experiment 1 visual and haptic stimuli were presented simultaneously. The procedure was similar to the haptic-alone condition, above, with the following exceptions. First, the visual stimulus was presented only when both haptic planes were touched simultaneously. Second, the visual and haptic stimuli were presented with a variety of horizontal spatial offsets between the finger/thumb and visual object ( Figure 3, top row). We refer to this as hand-object offset. Visual and haptic stimuli were always offset by an equal and opposite amount on either side of the body midline. Hand-object offsets of 0, ±50 and ±100 mm were presented (see Figure 3). The offset was chosen randomly on each trial, because a constant offset could lead to visuo-motor adaptation. The visual start zones were shown on all stimulus intervals and were offset to indicate where the participant should grasp the haptic object. A fixation cross was presented before each interval, at the location of the visual object. The fixation cross, and feedback about finger position, disappeared before the visual and haptic stimuli were presented. They were extinguished when the finger and thumb were moved from the start zones, towards the haptic object. During the actual stimulus presentation only the random-dot planes were visible. 
Experiment 2: Zero tool-object offset
In Experiment 2 visual and haptic stimuli were again presented simultaneously, but instead of only seeing spheres indicating the finger and thumb positions, participants now saw a visual ‘virtual tool’, consisting of a cylindrical stick, rigidly attached to each sphere (Figure 3, bottom row). The tool freely translated with the finger/thumb in the x, y and z dimensions but rotation was prevented so that it was always oriented horizontally. Stimuli were presented with the same hand-object spatial offsets as in Experiment 1 (0, ±50 and ±100 mm). The length of the tool varied with hand-object offset such that the tips of the tool always reached the location of the visual object (Figure 3), indicating to participants that they could feel the remote visual object. As in Experiment 1, the hand-object offset (here, the tool length) was chosen randomly on each trial, to minimize the chance of visuo-motor adaptation (see Appendix B). We refer to the spatial offset between the tool-tip and the visual object as the tool-offset. Therefore in Experiment 2 there was always zero tool-offset. 
The procedure was similar to Experiment 1 except at the start of each trial participants placed the tool tips into the start zones (instead of the finger and thumb), which appeared at the location of the visual object. Once again all visual information, including the tool, disappeared before the visual and haptic stimuli were presented (see above). For a given hand-object offset, therefore, the information available to perform the discrimination task was identical to Experiment 1 (no tool). The tool was somewhat unintuitive at negative hand-object offsets, but these were necessary to prevent visuo-motor adaptation to an offset in a constant direction. Participants were given the opportunity to practice using the tool before the experiment began, and all found it straightforward to use. 
Participants completed Experiments 1 and 2 during the same period. Trials were blocked by experiment. As noted above, within each experiment, the hand-object offsets were randomized. 
Results
Single-modality experiment: Matching the variance of visual and haptic estimates
Figure 4 plots size discrimination performance for visual and haptic modalities as a function of stimulus orientation for one example participant. JNDs were defined as the standard deviation (σ) of the best-fitting cumulative Gaussian to the psychometric data, using a maximum-likelihood criterion. Following previous studies (e.g. Ernst & Banks, 2002; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003), we assumed that the standard deviation of the psychometric function is proportional to the standard deviation of the underlying size estimate in each case. For each participant, we therefore determined the stimulus orientation at which σ_V and σ_H were approximately equal (cf. Gepshtein et al., 2005). These vision- and haptic-alone JNDs were used to compute the improvement in discrimination performance (reduction in JND) that would be expected under statistically optimal cross-modal integration (Equation 1). Each participant completed all of the cross-modal experiments using his or her ‘JND-matched’ stimulus orientation. 
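As an illustration of this analysis step, the following sketch fits a cumulative Gaussian to 2AFC data by maximum likelihood and reads off σ as the JND. The response counts are hypothetical, and the authors' actual fitting code is not specified in the text; this only shows the general form of the computation.

```python
# Sketch: maximum-likelihood fit of a cumulative Gaussian psychometric function
# to 2AFC size-discrimination data (hypothetical counts, 30 trials per level).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

comparison = np.array([41, 44, 47, 49, 50, 51, 53, 56, 59], dtype=float)  # mm
n_trials   = np.full(comparison.shape, 30)
n_larger   = np.array([1, 3, 8, 12, 15, 18, 24, 28, 29])   # "comparison judged larger"

def neg_log_likelihood(params):
    mu, sigma = params
    p = norm.cdf(comparison, loc=mu, scale=sigma)           # P("larger") at each level
    p = np.clip(p, 1e-6, 1 - 1e-6)                          # avoid log(0)
    return -np.sum(n_larger * np.log(p) + (n_trials - n_larger) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[50.0, 5.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x
print(f"PSE = {mu_hat:.1f} mm, JND (sigma) = {sigma_hat:.1f} mm")

# Given matched single-cue JNDs, the optimal two-cue prediction follows from Equation 1:
sigma_v = sigma_h = sigma_hat
print("predicted two-cue JND:", np.sqrt(sigma_v**2 * sigma_h**2 / (sigma_v**2 + sigma_h**2)))
```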
Figure 4
 
Discrimination performance (JND) of one example participant in vision- (blue closed circles) and haptic-alone (red open circles) conditions plotted as a function of stimulus orientation. The dashed line denotes the orientation that provided a close match between the precision of size estimates from the two cues. Each participant completed the cross-modal conditions using his or her ‘JND-matched’ stimulus orientation. Error bars denote ±1 standard error.
Experiment 1: Effect of hand-object offset
Figure 5 plots visual-haptic size discrimination performance (JNDs) in Experiment 1, averaged across all participants (individual data are shown in Appendix A). The solid horizontal lines show the single-modality JNDs for vision-alone (black) and haptics-alone (gray). The dashed line shows the statistically optimal prediction if both signals are fully integrated (Equation 1). 
Figure 5
 
The effects of hand-object offset on discrimination performance in Experiment 1 (no tool). Mean size-discrimination performance (JND) is plotted as a function of spatial offset between the visual and haptic stimuli. The black and gray horizontal lines show mean discrimination performance from vision- and haptics-alone, respectively. The dashed line shows performance if the signals are integrated optimally, calculated from the single-modality JNDs using Equation 1. Error bars denote ±1 SEM.
In general the results of our no-tool experiment replicated those of Gepshtein et al. ( 2005). Discrimination performance was nearly optimal when the hand-object offset was zero, but was increasingly poor with increasing spatial offset. One-tailed t-tests showed that, compared to zero offset, the increase in JNDs did not quite reach statistical significance at ∣50∣ mm offset ( t(6) = 1.73, p = .068), but was significant at ∣100∣ mm ( t(6) = 2.15, p < .05). This result is consistent with previous findings. It shows both that the brain does integrate information from vision and haptics near-optimally when the signals originate from the same object (Ernst & Banks, 2002; Gepshtein & Banks, 2003) and, crucially, that integration reduces systematically when visual and haptic signals are separated, and so are likely to have come from different objects (Gepshtein et al., 2005). 
Experiment 2: Zero tool-object offset
Figure 6 shows the effect of hand-object offset on size discrimination in Experiment 2, when the tool tips always reached the location of the visual object (zero tool-object offset, red circles; see Appendix A for individual data). The results of Experiment 1 (blue diamonds) are re-plotted from Figure 5. Here near-optimal visual-haptic integration was largely restored, despite the hand-object spatial offset. Discrimination performance with the tool was not significantly different from the near-optimal performance observed in Experiment 1, at zero offset, either at ∣50∣ mm (t(6) = 0.36, p > .05) or ∣100∣ mm (t(6) = 0.93, p > .05) hand-object offsets. Moreover, JNDs were significantly lower with the tool than without the tool, both at ∣50∣ mm (t(6) = 2.89, p < .05) and ∣100∣ mm (t(6) = 2.78, p < .05) hand-object offsets. One possible exception to this overall pattern was the performance at the −100 mm offset. Although still showing significant cross-modal integration, discrimination performance at this offset was quite far from the optimal prediction. Interestingly, participants reported that this condition, in which a long tool came out of the “back” of their hand (see Figure 3, left column), felt unnatural, suggesting that if the tool is unintuitive to use, cross-modal integration may be compromised. Overall, however, the difference we observed between the pattern of results with and without the tool indicates that humans do integrate spatially offset visual and haptic signals when using a simple tool. 
Figure 6
 
The effects of hand-object offset on discrimination performance in Experiment 2 (zero tool-object offset, red circles). Mean JNDs are plotted as a function of spatial offset between visual and haptic stimuli. The blue diamonds show the results of the no-tool condition (Experiment 1), re-plotted from Figure 5 (error bars removed for clarity). The horizontal lines again show mean single-modality discrimination performance and predicted optimal performance. Error bars denote ±1 SEM.
Experiment 3: Variable tool-object offset
In Experiment 3 we examined the effect on visual-haptic integration of a spatial offset between the tool tip and the visual object. If, when solving the problem of which signals to integrate, the brain takes into account the geometry of specific tools, we would expect to see a fall-off in cross-modal integration as the tool tip is offset from the visual object. This would presumably be similar to that observed in Experiment 1 (no tool), above, when the hand was spatially offset, because the likelihood that the visual and haptic signals came from the same object would be similarly reduced. Alternatively, it is possible that participants in Experiment 2 adopted a simple “strategy” of always combining signals when using the tool. If so, we would again see cross-modal integration at all tool offsets. To examine this we again measured visual-haptic integration during tool use, and measured the effect of offsetting the tip of the tool from the object by varying amounts, using a fixed-length tool (variable tool-object offset). 
Procedure
We measured size discrimination performance in the same manner as Experiment 2, but using a fixed-length tool (50 mm), the tips of which were spatially offset from the visual object by 0, ±50 or ±100 mm (Figure 7). The different tool-offset conditions were randomly interleaved. All other details were as before. 
Figure 7
 
A schematic of the stimuli in Experiment 3. Here the length of the tool was always 50 mm, and the tips of the tool were spatially offset from the visual stimulus position by 0, ±50 or ±100 mm (variable tool-object offset). It was possible to have zero tool-object offset with the hand to the left or right of the visual stimulus, so we collected data for both configurations. As before, the tool and finger/thumb spheres were extinguished before the to-be-judged stimuli were presented.
Experiment 3: Results
Figure 8 shows the results of Experiment 3 (variable tool-object offset, green squares/triangles) plotted with the earlier data from Experiment 1 (no tool) (see Appendix A for individual data). The data from Experiment 3 are plotted as a function of the tool-object offset. The data from Experiment 1 are plotted, as before, as a function of the hand-object offset. 
Figure 8
 
Mean size-discrimination performance (JND) for the tool with variable tip-offset (Experiment 3). Negative offsets are denoted by green squares, and positive offsets by triangles. The data from the no-tool condition (Experiment 1, blue diamonds) are plotted for comparison purposes. The data for the tool experiment are plotted as a function of the tool-object offset (the offset between the tool-tip and the visual stimulus). The data for the no-tool experiment are plotted as a function of the hand-object offset (the offset between the visual and haptic stimuli). The horizontal lines show the single cue JNDs for vision (black solid line) and haptics (gray solid line), and the predicted two-cue performance (dashed line). Error bars denote ±1 SEM.
Figure 8 shows that the changes in cross-modal discrimination performance with spatial offset were strikingly similar in Experiments 1 and 3. When the tool reached the visual object, discrimination performance was again close to that predicted by optimal integration. However, when the tool tips were offset from the visual object, discrimination performance became systematically poorer, approaching single-modality levels. The similarity between the results in the two conditions was also reflected by statistical analyses. Although discrimination performance was overall slightly poorer with the tool than without it, there were no statistical differences between JNDs in Experiments 1 and 3. Moreover, the pattern of effects observed with the tool was similar to that reported above for Experiment 1. Compared to the zero tool-object offset condition, JNDs were not significantly larger with a tool-object offset of ∣50∣ mm (t(5) = 1.46, p > .05), but they were at ∣100∣ mm (t(5) = 3.65, p < .01). 
Offsetting the tool-tip, or the finger/thumb when grasping without the tool, therefore had a very similar effect on the extent of visual-haptic integration that was observed. These results are consistent with the idea that the brain can take account of the geometry and dynamics of tools, and integrates spatially offset visual and haptic signals only when it is appropriate to do so. 
Discussion
Summary
For integration of information from vision and haptics to be effective, the brain should integrate signals if and only if each modality provides information about the same object. This means that the brain has to solve a correspondence problem: which of the activations in the given sensory modalities belong together? It has previously been reported that a spatial offset between where an object is seen and where an object is felt significantly reduces sensory integration (Gepshtein et al., 2005; see also Jackson, 1953; Warren & Cleaves, 1971; Witkin et al., 1952). Here we observed nearly optimal integration of size information from vision and haptics, when the spatial offset between the fingers and the object was bridged by a tool. In contrast, performance approached single-modality levels when the tool did not reach the object. These results cannot be explained by a fixed cue-integration rule that considers spatial coincidence of visual signals and an estimate of hand position. Instead we argue that the system is more flexible, making use of other information to determine when the “felt” and “seen” perceptual estimates refer to the same object. 
Solving the correspondence problem
Here we have considered a case in which the brain sometimes should combine signals from different sensory modalities and other times should not. This highlights that for sensory integration to be effective, the brain needs to solve a general correspondence problem across sensory modalities. Several related models have successfully characterized this process within a Bayesian framework (e.g. Bresciani, Dammeier, & Ernst, 2006; Ernst, 2007; Knill, 2007; Körding et al., 2007; Roach, Heron, & McGraw, 2006; Shams, Ma, & Beierholm, 2005). This is achieved by including a joint likelihood function—referred to as an “interaction” or “coupling” prior (Ernst, 2005; Roach et al., 2006)—that describes the perceptual system's knowledge of the joint distribution of the stimulus statistics in the two sensory modalities. That is, it describes the prior expectation of how tightly the stimulus properties, as measured by different sensory modalities, are coupled. So, for example, the brain may know that a certain visual size estimate of an object reliably covaries with a certain state of the muscles and Golgi tendon organs that indicates a certain grip size, in which case the coupling prior would reflect the fact that these visual and haptic size estimates are very likely to occur together, and all other combinations are very unlikely. By combining the likelihoods from the sensory estimates with this prior, according to Bayes' rule, one can estimate the probability that the two sensory estimates refer to the same object. If the sensory estimates are likely to occur together, the joint (common source) estimate will have a high probability, and if they differ substantially the joint estimate will have a low probability: it is more probable that the signals came from separate objects (Ernst, 2005, 2007; for different formulations of this approach see Knill, 2007; Körding et al., 2007; Roach et al., 2006; Shams et al., 2005). 
It should be highlighted that this process does not result in a binary output (integrate or do not integrate), but provides an estimate of the probability that the signals have a common source. This can then be used to determine the extent to which signals should be integrated. As such, these types of models give a good account of empirical data showing a gradual transition from complete integration, through partial integration, to no integration of signals, as they become increasingly dissimilar (e.g. Körding et al, 2007; Roach et al., 2006). 
These studies have generally modeled the relationship between two estimates of the to-be-judged stimulus property (e.g. perceived direction). In principle, however, this computation can be carried out over any number of relevant dimensions (Ernst, 2005), including time of occurrence, and the parameter of interest here, namely spatial location. Such a model could therefore provide a good account of the progressive reduction in cross-modal integration with increasing spatial offset observed by Gepshtein et al. ( 2005) and in our Experiment 1 (no-tool). The coupling prior would describe how visual and haptic signals from the same location are most likely to originate from the same object, and signals from different locations are highly unlikely to originate from the same object. Therefore as the spatial offset between stimuli increases, the estimate of the probability that the signals have a common source decreases, and the signals are integrated less. 
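The following toy Python sketch illustrates this kind of computation over spatial location. It loosely follows the causal-inference formulation cited above (e.g. Körding et al., 2007), but the noise levels, workspace size, and prior probability of a common source are all assumed values chosen for illustration, not parameters estimated from the present data.

```python
# Toy causal-inference sketch: how the inferred probability of a common source
# (and hence the degree of integration) falls as the visual-haptic offset grows.
import numpy as np

sigma_v = 30.0     # mm, assumed localization noise of the visual signal
sigma_h = 30.0     # mm, assumed localization noise of the haptic signal
p_common = 0.5     # assumed prior probability that the signals share a cause
workspace = 400.0  # mm, assumed range of possible locations for independent causes

def prob_common_source(x_v, x_h):
    """Posterior probability that the visual and haptic signals came from one object."""
    s2 = sigma_v**2 + sigma_h**2
    # Likelihood under a single cause (true location integrated out over a broad
    # uniform prior): depends only on the discrepancy between the two measurements.
    like_common = np.exp(-(x_v - x_h)**2 / (2 * s2)) / (np.sqrt(2 * np.pi * s2) * workspace)
    # Likelihood under two independent causes, each anywhere in the workspace.
    like_separate = 1.0 / workspace**2
    return p_common * like_common / (p_common * like_common + (1 - p_common) * like_separate)

for offset in [0, 50, 100, 200]:
    print(f"offset {offset:4d} mm -> P(common source) ~ {prob_common_source(0.0, offset):.2f}")
# This posterior can then weight a fused estimate against the single-cue estimates,
# giving the graded transition from full, through partial, to no integration.
```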
The correspondence problem in tool use
The approach described above provides a plausible framework for understanding how the correspondence problem in visual-haptic integration might be solved for normal grasping. On its own, however, it cannot explain our finding that, during tool-use, participants (appropriately) showed near-optimal integration of spatially offset visual and haptic signals. How might this be achieved? 
In motor control an important concept is that of a forward model: the visuo-motor system uses a copy of the motor commands to predict the future state of the body, for example the position and velocity of a moving limb (Wolpert, Ghahramani, & Jordan, 1995). Such an approach can naturally be extended to include a representation of tool position, allowing the system to predict which visual objects the tool will touch. Therefore, at the moment the mechanoreceptors in the fingertips signal contact with an object, the brain has already predicted this stimulus and can attribute the proximal sensory signal to the spatial location of its actual source: the tool-tip. 
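A schematic sketch of this idea is shown below. It is not the authors' model; it simply makes explicit the proposal that the haptic contact signal is attributed to the predicted tool-tip location (a forward model plus the tool geometry), and that this inferred location, rather than the hand position, enters the spatial-correspondence check. The tolerance value and the simple one-dimensional geometry are assumptions for illustration.

```python
# Schematic sketch: attribute the haptic signal to the predicted tool-tip location,
# then test spatial correspondence with the visual object at that inferred location.
from dataclasses import dataclass

@dataclass
class HandState:
    x: float          # predicted horizontal position of the grasp point (mm)
    aperture: float   # predicted finger-thumb opening (mm)

def predict_tool_tip(hand: HandState, tool_length: float) -> float:
    """Forward model of the tool: the tips are displaced from the hand by the tool length."""
    return hand.x + tool_length

def likely_same_object(source_x: float, visual_x: float, tolerance: float = 25.0) -> bool:
    """Crude spatial-correspondence check on inferred source locations (tolerance assumed)."""
    return abs(source_x - visual_x) < tolerance

hand = HandState(x=-100.0, aperture=50.0)        # hand 100 mm to the left of the object
tip_x = predict_tool_tip(hand, tool_length=100.0)

print(likely_same_object(tip_x, visual_x=0.0))   # True: integrate the felt and seen size
print(likely_same_object(hand.x, visual_x=0.0))  # False if correspondence is judged at the hand
```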
This idea is consistent with evidence from several domains. Informally, many of us have experienced the sensation that, when we pick up a tool, we can “feel” with its tip, even though haptic signals originate at the hand. Results of a number of empirical studies also support this idea. Studies of visual spatial attention have suggested that tools are “incorporated” into our sense of where our bodies are in space—the so-called body schema (for a review see Maravita & Iriki, 2004; see also Bonifazi, Farnè, Rinaldesi, & Làdavas, 2007; Farnè & Làdavas, 2000; Holmes, Calvert, & Spence, 2004; Massen & Prinz, 2007). Perhaps most strikingly, in a widely cited physiological study, Iriki and colleagues found that when a monkey uses a tool the receptive fields of cells responding to locations around the hand extended to include the region surrounding the tool (Iriki, Tanaka, & Iwamura, 1996). Thus we hypothesize that the correspondence problem in multi-sensory integration is not solved on a spatial map of the locations of the sensory stimuli, but is instead based on a spatial map of the inferred sources of those stimuli, a process that requires knowledge of the viewing geometry and the dynamics of the manipulated object and tools. 
Alternative explanations
While forward models provide an appealing explanation of our results, we need to consider simpler mechanisms that could underlie the observed behavior. 
First, the decision whether to integrate or not may be made based on visual information alone. Two previous studies have reported integration of visual and haptic signals despite a large spatial discrepancy. Di Luca, Ernst, and Adams ( 2008) found that visual-haptic integration was induced when spatially separated objects were made to appear as two parts of the same object, by covering the gap between them with an occluder (a phenomenon known as amodal completion). Also, Helbig and Ernst ( 2007), in a study much more closely related to ours, found near-optimal integration of visual and haptic shape information when a visual object was seen straight ahead (in a mirror), but the haptic signal was in a different location, provided that the hand was seen touching the object. 
One account of these findings is that visual information provided compelling evidence that the two signals referred to the same object, causing any spatial discrepancy between the signals to be overridden. In terms of the above models, this can be implemented by making the coupling prior for spatial location flat (i.e. all combinations of the spatial locations of the two sensory estimates are considered equally likely). The estimate of the probability that the signals have a common source will be unaffected by spatial offset, and will be made on the basis of other relevant dimensions (size/shape, time of occurrence etc.). 
Such a strategy—in effect a ‘mode’ in which any spatial discrepancy is simply ignored—could account for the results of Experiment 2, in which visual-haptic integration occurred independent of spatial offset. It cannot, however, explain the results of Experiment 3, in which we found a systematic reduction in the degree of cross-modal integration with spatial offset of the tool tip from the visual object. This finding suggests instead that the correspondence problem was solved appropriately, trial-by-trial. 
On first inspection, Helbig and Ernst's ( 2007) result suggests that merely seeing the moving hand apparently in contact with the object is sufficient to determine that visual and haptic signals should be integrated. Could only vision of the finger cursors and the tool have provided this information in our experiments? This seems unlikely for two reasons. First, our procedure differed from Helbig and Ernst's ( 2007) in a key respect. Their participants could see their hand while they actively explored the object, whereas in our experiments, the hand was never visible, and visual feedback about the tool was extinguished prior to contact with the haptic object. Therefore in our task, participants needed to predict the position of the tool tip based on the motor commands to solve the correspondence problem appropriately. Second, and more generally, it is unclear that vision alone is sufficient to identify a hand (or tool) as being one's own. As Helbig and Ernst ( 2007) point out, this process, too, presumably depends on a comparison of the expected and actual visible movements of the hand, given the motor commands. 
No visuo-motor adaptation during tool use
A second alternative explanation of our results is that changes in integration are caused not by the brain predicting the positions of the tool tips, but by visuo-motor adaptation of the estimated hand position (Ernst, 2005). In our experiment, according to this explanation, the estimated hand position is shifted towards the tip of the tool. The fixed rule of comparing hand and object position can then still be applied to solve the correspondence problem. 
The long history of prism adaptation studies demonstrates that the mapping between visual and motor space can readily be changed (Redding & Wallace, 1997). In our tool conditions, therefore, the magnitudes and signs of the spatial offsets were randomly interleaved on a trial-by-trial basis (with a mean of zero), to try to prevent a consistent “error signal” required for adaptation to occur. Nonetheless, it remains possible that adaptation occurred rapidly, on single trials, in the period during which the tool was visible. To explore this we ran a control experiment in which we examined whether the felt position of the fingers adapted while using our tool (for details see Appendix B). Participants were asked to make grasping movements similar to those in Experiments 1–3, under two conditions: (i) a no-tool condition, in which the visible cursors indicating finger position were offset horizontally from the actual finger position (this was expected to induce adaptation), and (ii) a tool condition, similar to Experiments 2 and 3, using tool lengths that matched the offsets in the no-tool condition. Trials were blocked by condition, to provide a good opportunity for adaptation to occur. At regular intervals we probed adaptation by asking participants to place their unseen fingers at the location of two visual crosses. 
As expected, in the no-tool condition participants showed a clear adaptation effect of between 50 and 60% of the spatial offset between the cursors and actual finger position. However, there was no adaptation in the felt position of the fingers in the tool condition, although the offset between the tool-tip and hand was identical to the no-tool condition. Thus, these results argue strongly that no visual-haptic remapping of the felt finger position occurred during tool use, and that the changed integration observed in Experiments 2 and 3 must be due to a process that is dissociable from normal visuo-motor adaptation. 
Causal inference in sensory integration
While the mechanisms underlying visual-haptic integration during tool-use remain to be determined, our data suggest that the visuo-motor system is able to correctly decide when it is appropriate to combine visual and haptic signals under the conditions of spatial transformation imposed by tool use. These findings are consistent with the broader idea that the perceptual system routinely makes inferences about the causal structure of sensory signals (Körding et al., 2007). That is, in this case, the decision to integrate or not is made on the distal causes of the sensory stimuli, rather than simply on their spatial proximity. 
The results of Gepshtein et al.'s ( 2005) study, and of our no-tool experiment, also suggest that this process may be obligatory. In both studies, the visual and haptic stimuli were the only signals presented to participants in an otherwise impoverished sensory environment. Moreover, the stimuli were perfectly correlated on every stimulus interval in terms of magnitude, and time of occurrence, differing only in terms of spatial offset. An ideal observer might therefore be expected to integrate these signals because both were informative about which interval contained the larger “object”. Yet even under these circumstances a (relatively small) spatial offset between the felt and seen locations of an object reliably reduced sensory integration in both studies. This suggests that the solution to the correspondence problem is provided by an automatic and robust process, which cannot simply be “switched off”, even if the task performed would favor such a strategy. 
Conclusions
We have found that humans integrate size information from vision and haptics in a near-optimal fashion when using a simple tool that introduces a spatial offset between the locations of the two signals. Moreover, we observed a fall-off in cross-modal integration with increasing offset of the tool-tip from the visual object. This is consistent with the brain deciding that the haptic signal originated at the tool tip, rather than at the hand. We suggest that the brain therefore appropriately infers the causal structure of the two signals and integrates them only when it is appropriate to do so. Whether this is achieved by constructing a forward model of the tool, or by a more straightforward “strategy”, remains to be determined. Nonetheless, it appears the cross-modal integration process is more sophisticated than previously thought, and can take into account the dynamics and geometry of tools. 
Appendix A
Individual data
Figure A1 plots individual participants' data from Experiment 1 (no-tool condition). The data are plotted in the same format as Figure 5 (see caption for details). 
Figure A1
 
The effects of hand-object offset on discrimination performance in Experiment 1 (no tool), plotted separately for each participant. Mean size-discrimination performance (JND) is plotted as a function of spatial offset between the visual and haptic stimuli. The black and gray horizontal lines show mean discrimination performance from vision- and haptics-alone, respectively. The dashed line shows performance if the signals are integrated optimally, calculated from the single-modality JNDs using Equation 1. Error bars denote the standard error of the estimate of σ, the standard deviation of the psychometric function.
Figure A2 plots individual participants' data from Experiment 2 (zero tool-object offset condition; see caption for details). 
Figure A2
 
The effects of hand-object offset on discrimination performance in Experiment 2 (zero tool-object offset), plotted separately for each participant. Mean JNDs are plotted as a function of spatial offset between visual and haptic stimuli. The horizontal lines show mean single-modality discrimination performance and predicted optimal performance, as in Figure A1. Error bars denote the standard error of the estimate of σ, the standard deviation of the psychometric function.
Figure A3 plots individual participants' data from Experiment 3 (variable tool-object offset condition; see caption for details). 
Figure A3
 
Mean size-discrimination performance (JND) for the tool with variable tip-offset (Experiment 3). Negative offsets are denoted by green squares, and positive offsets by green triangles. The data are plotted as a function of the tool-object offset (the offset between the tool-tip and the visual stimulus). The horizontal lines show mean single-modality discrimination performance and predicted optimal performance, as in previous figures. Error bars again denote the standard error of the estimate of σ, the standard deviation of the psychometric function.
Appendix B
Does the felt position of the fingers adapt during tool use?
We ran an adaptation experiment to examine whether, in our tool conditions, the felt position of the fingers was remapped onto the location of the tool tips. There were four participants, three of whom completed the earlier experiments. An experimental trial was very similar to one interval of Experiment 1 or 2. The visual object was a plain white rectangular object, of a constant size (50 × 50 mm). We used the force-feedback devices to limit movement to a fronto-parallel plane (see Figure 2), to simplify data analysis. The visual stimulus was presented alternately on the left and right side of the workspace. The exact x and y position of the stimulus was selected at random within a 100 × 100 mm region, centered either 75 mm to the left or right of the body midline. This ensured that participants made significant movements on consecutive trials to reach the stimulus. The start zones operated as before, and the tool and/or cursors were again extinguished when the participant started to move their finger towards the object. Participants were not required to make any judgement. 
Along with a baseline condition, in which the cursors indicated the true position of the finger and thumb, participants repeatedly grasped the stimulus under (i) two cursor-offset conditions, in which the cursors were offset to the left of the physical position of the finger/thumb by either 50 or 100 mm but no tool was presented, and (ii), tool conditions, in which the tool length was either 50 or 100 mm. Periodically, participants completed a test trial, in which they were required to place their finger and thumb at the position of two crosses (whose position was determined in the same way as the start zones). Participants received no visual feedback in the test trials, and we recorded the position of each digit. 
The experimental protocol ( Figure B1) started with 10 baseline trials, with a test trial after every five trials. Each tool/offset condition consisted of 36 adaptation trials, with a test trial after every six trials. After each adaptation condition there was an additional 18 trial baseline period, with a test trial after every six trials. This was a ‘washout’ period, designed to eliminate any adaptation from the previous condition. Each participant completed two complete blocks (72 trials in each tool/offset condition). 
Figure B1
 
Example of one ‘block’ of the experiment. The tool (50 or 100 mm) and finger/thumb offset (50 or 100 mm) conditions were randomly ordered within each block. Each participant completed two such blocks. The cursors were always offset to the left of the true hand position, and the tool also extended to the left of the hand. Test trials, in which we measured the amount of adaptation in the felt position of the fingers, occurred after every five trials in the initial baseline period, and after every six trials thereafter (see text for details).
Figure B2 plots the average error (in x-position) of the thumb/finger positions across all test trials, averaged across participants (there were no significant changes in adaptation within conditions, so we averaged across all trials). Positive errors mean the fingers were positioned to the right of the test targets. The blue bars are identical in each case and show the baseline condition. The red bars show the experimental conditions. Figure B2 shows that there was a small overall bias in participants' responses: they positioned their fingers approximately 20 mm to the right of the test target in the baseline condition. Compared to this, it can be seen that offsetting the cursors to the left of the true finger position caused participants to make significant positive errors on the test trials, by approximately 50–60% of the cursor offset. This indicates adaptation of the felt position of the fingers, as was to be expected from conventional visuo-motor adaptation studies. In contrast, the errors made in the tool conditions were indistinguishable from the baseline condition. This indicates that there was no adaptation of the felt position of the fingers when using the tool, even though the same tool was used repeatedly on consecutive trials. We therefore conclude that it is very unlikely that our main tool effects can be attributed to visuo-motor adaptation of the felt hand position (see Discussion). 
Figure B2
 
Average error in the x-position of the finger and thumb on test trials, for all participants. The blue bars represent the baseline condition. They are re-plotted to allow comparison with each adaptation condition (red bars). Error bars denote ±1 SEM.
Acknowledgments
These data were presented at the annual meeting of the Vision Sciences Society in 2008. Supported by the Overseas Research Students Awards Scheme (CT), the Biotechnology and Biological Sciences Research Council (JD) and the Engineering and Physical Sciences Research Council (CT, SJW). Thanks to Kevin MacKenzie for helpful comments. 
Commercial relationships: none. 
Corresponding author: Simon J. Watt. 
Address: School of Psychology, Bangor University, Adeilad Brigantia, Penrallt Rd., Bangor, Gwynedd, Wales, LL57 2AS, UK. 
References
Bonifazi, S., Farnè, A., Rinaldesi, L., & Làdavas, E. (2007). Dynamic size-change of peri-hand space through tool-use: Spatial extension or shift of the multi-sensory area. Journal of Neuropsychology, 1, 101–114.
Bresciani, J. P., Dammeier, F., & Ernst, M. O. (2006). Vision and touch are automatically integrated for the perception of sequences of events. Journal of Vision, 6(5):2, 554–564, http://journalofvision.org/6/5/2/, doi:10.1167/6.5.2.
Clark, J. J., & Yuille, A. L. (1990). Data fusion for sensory information processing systems. Boston: Kluwer Academic Publishers.
Di Luca, M., Ernst, M., & Adams, W. (2008). Amodal multimodal integration [Abstract]. Journal of Vision, 8(6):526, 526a, http://journalofvision.org/8/6/526/, doi:10.1167/8.6.526.
Ernst, M. O. (2005). A Bayesian view on multimodal cue integration. In G. Knoblich, I. M. Thornton, M. Grosjean, & M. Shiffrar (Eds.), Human body perception from the inside out (pp. –131). New York: Oxford University Press.
Ernst, M. O. (2007). Learning to integrate arbitrary signals from vision and touch. Journal of Vision, 7(5):7, 1–14, http://journalofvision.org/7/5/7/, doi:10.1167/7.5.7.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Farnè, A., & Làdavas, E. (2000). Dynamic size-change of hand peripersonal space following tool use. Neuroreport, 11, 1645–1649.
Gepshtein, S., & Banks, M. S. (2003). Viewing geometry determines how vision and haptics combine in size perception. Current Biology, 13, 483–488.
Gepshtein, S., Burge, J., Ernst, M. O., & Banks, M. S. (2005). The combination of vision and touch depends on spatial proximity. Journal of Vision, 5(11):7, 1013–1023, http://journalofvision.org/5/11/7/, doi:10.1167/5.11.7.
Helbig, H. B., & Ernst, M. O. (2007). Knowledge about a common source can promote visual-haptic integration. Perception, 36, 1523–1533.
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):1, 967–992, http://journalofvision.org/4/12/1/, doi:10.1167/4.12.1.
Holmes, N. P., Calvert, G. A., & Spence, C. (2004). Extending or projecting peripersonal space with tools? Multisensory interactions highlight only the distal and proximal ends of tools. Neuroscience Letters, 372, 62–67.
Iriki, A., Tanaka, M., & Iwamura, Y. (1996). Coding of modified body schema during tool use by macaque postcentral neurons. Neuroreport, 7, 2325–2330.
Jackson, C. V. (1953). Visual factors in auditory localization. Quarterly Journal of Experimental Psychology, 5, 52–65.
Knill, D. C. (2007). Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):5, 1–24, http://journalofvision.org/7/7/5/, doi:10.1167/7.7.5.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27, 712–719.
Knill, D. C., & Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research, 43, 2539–2558.
Körding, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS ONE, 2, e943.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.
Maravita, A., & Iriki, A. (2004). Tools for the body (schema). Trends in Cognitive Sciences, 8, 79–86.
Massen, C., & Prinz, W. (2007). Programming tool-use actions. Journal of Experimental Psychology: Human Perception and Performance, 33, 692–704.
Oruç, I., Maloney, L. T., & Landy, M. S. (2003). Weighted linear cue combination with possibly correlated error. Vision Research, 43, 2451–2468.
Redding, G. M., & Wallace, B. (1997). Adaptive spatial alignment. Mahwah, NJ: Erlbaum.
Roach, N. W., Heron, J., & McGraw, P. V. (2006). Resolving multisensory conflict: A strategy for balancing the costs and benefits of audio-visual integration. Proceedings of the Royal Society of London B: Biological Sciences, 273, 2159–2168.
Shams, L., Ma, W. J., & Beierholm, U. (2005). Sound-induced flash illusion as an optimal percept. Neuroreport, 16, 1923–1927.
Warren, D. H., & Cleaves, W. T. (1971). Visual-proprioceptive interaction under large amounts of conflict. Journal of Experimental Psychology, 90, 206–214.
Witkin, H. A., Wapner, S., & Leventhal, T. (1952). Sound localization with conflicting visual and auditory cues. Journal of Experimental Psychology, 43, 58–67.
Wolpert, D. M., Ghahramani, Z., & Jordan, M. I. (1995). An internal model for sensorimotor integration. Science, 269, 1880–1882.
Yuille, A. L., & Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics. In D. C. Knill & W. Richards (Eds.), Perception as Bayesian inference (pp. –161). Cambridge: Cambridge University Press.
Figure 1
 
A cartoon illustrating the spatial offset between visual and haptic signals to object size when using a simple tool, comprised of two sticks rigidly attached to the finger and thumb, respectively. We used a “virtual” tool of this kind in our experiments.
Figure 2
 
Stimulus orientation was defined as the surface slant (relative to the line of sight). As stimulus orientation approaches 0 deg (fronto-parallel) performance is increasingly dependent on a depth estimate from binocular disparity, and thresholds increase (Gepshtein & Banks, 2003).
Figure 3
 
A schematic of the stimuli in Experiment 1 (top row) and Experiment 2 (bottom row). The textured rectangles show the visual object, defined by two random-dot-stereogram planes. The gray rectangles show the haptic stimulus (not visible to participants). The gray spheres indicated the position of the finger and thumb. The hand was not visible. The tool was shown visually as a pair of parallel “sticks” rigidly attached to the finger and thumb, and was constrained to always lie horizontal. Grasping with the tool was simulated by spatially offsetting the visual and haptic stimuli horizontally. Thus when the tool-tip touched the visual object, force was generated at the finger and thumb. We could not simulate rotation forces on the digits because the experimental design required that visual and haptic stimuli were exactly matched across tool and no-tool experiments. Left and right columns show negative and positive spatial offsets between visual and haptic stimuli, respectively (referred to as hand-object offset). The tool and finger-position spheres were extinguished before presentation of the haptic and visual objects (see Procedure). The information available to make size discrimination judgements was therefore identical in both experiments.
Figure 4
 
Discrimination performance (JND) of one example participant in vision- (blue closed circles) and haptic-alone (red open circles) conditions plotted as a function of stimulus orientation. The dashed line denotes the orientation that provided a close match between the precision of size estimates from the two cues. Each participant completed the cross-modal conditions using his or her ‘JND-matched’ stimulus orientation. Error bars denote ±1 standard error.
Figure 5
 
The effects of hand-object offset on discrimination performance in Experiment 1 (no tool). Mean size-discrimination performance (JND) is plotted as a function of spatial offset between the visual and haptic stimuli. The black and gray horizontal lines show mean discrimination performance from vision- and haptics-alone, respectively. The dashed line shows performance if the signals are integrated optimally, calculated from the single-modality JNDs using Equation 1. Error bars denote ±1 SEM.
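For reference, the dashed-line prediction can be computed from the single-cue JNDs as follows (a minimal sketch, assuming JNDs are proportional to the standard deviations of the underlying size estimates so that Equation 1 applies directly; the numerical values are hypothetical):

```python
# Predicted visual-haptic JND from single-cue JNDs via Equation 1:
# sigma_VH^2 = sigma_V^2 * sigma_H^2 / (sigma_V^2 + sigma_H^2).
import math

def predicted_jnd(jnd_v, jnd_h):
    return math.sqrt((jnd_v**2 * jnd_h**2) / (jnd_v**2 + jnd_h**2))

print(predicted_jnd(4.0, 4.0))  # ~2.83, i.e. below either single-cue JND
```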
Figure 6
 
The effects of hand-object offset on discrimination performance in Experiment 2 (zero tool-object offset, red circles). Mean JNDs are plotted as a function of spatial offset between visual and haptic stimuli. The blue diamonds show the results of the no-tool condition (Experiment 1), re-plotted from Figure 5 (error bars removed for clarity). The horizontal lines again show mean single-modality discrimination performance and predicted optimal performance. Error bars denote ±1 SEM.
Figure 7
 
A schematic of the stimuli in Experiment 3. Here the length of the tool was always 50 mm, and the tips of the tool were spatially offset from the visual stimulus position by ±0, 50 or 100 mm (variable tool-object offset). It was possible to have zero tool-object offset with the hand to the left or right of the visual stimulus, so we collected data for both configurations. As before, the tool and finger/thumb spheres were extinguished before the to-be-judged stimuli were presented.
Figure 8
 
Mean size-discrimination performance (JND) for the tool with variable tip-offset (Experiment 3). Negative offsets are denoted by green squares, and positive offsets by triangles. The data from the no-tool condition (Experiment 1, blue diamonds) are plotted for comparison purposes. The data for the tool experiment are plotted as a function of the tool-object offset (the offset between the tool-tip and the visual stimulus). The data for the no-tool experiment are plotted as a function of the hand-object offset (the offset between the visual and haptic stimuli). The horizontal lines show the single cue JNDs for vision (black solid line) and haptics (gray solid line), and the predicted two-cue performance (dashed line). Error bars denote ±1 SEM.
Figure A1
 
The effects of hand-object offset on discrimination performance in Experiment 1 (no tool), plotted separately for each participant. Mean size-discrimination performance (JND) is plotted as a function of spatial offset between the visual and haptic stimuli. The black and gray horizontal lines show mean discrimination performance from vision- and haptics-alone, respectively. The dashed line shows performance if the signals are integrated optimally, calculated from the single-modality JNDs using Equation 1. Error bars denote the standard error of the estimate of σ, the standard deviation of the psychometric function.
Figure A2
 
The effects of hand-object offset on discrimination performance in Experiment 2 (zero tool-object offset), plotted separately for each participant. Mean JNDs are plotted as a function of spatial offset between visual and haptic stimuli. The horizontal lines show mean single-modality discrimination performance and predicted optimal performance, as in Figure A1. Error bars denote the standard error of the estimate of σ, the standard deviation of the psychometric function.
Figure A3
 
Mean size-discrimination performance (JND) for the tool with variable tip-offset (Experiment 3). Negative offsets are denoted by green squares, and positive offsets by green triangles. The data are plotted as a function of the tool-object offset (the offset between the tool-tip and the visual stimulus). The horizontal lines show mean single-modality discrimination performance and predicted optimal performance, as in previous figures. Error bars again denote the standard error of the estimate of σ, the standard deviation of the psychometric function.