In this study, we show that humans form highly similar perceptual spaces when they explore complex objects from a parametrically defined object space in the visual and haptic domains. For this, a three-dimensional parameter space of well-defined, shell-like objects was generated. Participants either explored two-dimensional pictures or three-dimensional, interactive virtual models of these objects visually, or they explored three-dimensional plastic models haptically. In all cases, the task was to rate the similarity between two objects. Using these similarity ratings and multidimensional scaling (MDS) analyses, the perceptual spaces of the different modalities were then analyzed. Looking at planar configurations within this three-dimensional object space, we found that active visual exploration led to a perceptual space highly similar to that obtained from passive exploration, showing that participants were able to reconstruct the complex parameter space from two-dimensional pictures alone. Furthermore, we found that the visual and haptic perceptual spaces had a topology virtually identical to that of the physical stimulus space. Surprisingly, the haptic modality even slightly exceeded the visual modality in recovering the topology of the complex object space when the whole three-dimensional space was explored. Our findings point to a close connection between visual and haptic object representations and demonstrate the high fidelity with which haptic shape processing occurs.

*n*-dimensional space. The similarity between two objects is inversely related to their distance in this space, which can be understood as a topological representation of object properties in the brain. This perceptual space contains information about how many dimensions humans perceive, whether or not these dimensions correspond to the physical measures of the different entities, and how important the different physical measures are to humans.
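As a concrete sketch of this analysis step, classical (Torgerson) MDS can recover such a space from a dissimilarity matrix. The similarity values, the rating-to-distance conversion (dissimilarity = 7 − similarity), and the object count below are illustrative assumptions, not the study's data.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: embed a dissimilarity matrix in n_dims."""
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (dissim ** 2) @ J         # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]  # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Hypothetical 7-point similarity ratings (7 = identical) for four objects;
# one plausible conversion to distances is dissimilarity = 7 - similarity.
sim = np.array([[7., 6., 3., 2.],
                [6., 7., 4., 3.],
                [3., 4., 7., 6.],
                [2., 3., 6., 7.]])
coords = classical_mds(7.0 - sim, n_dims=2)  # one 2D point per object
```

In the resulting configuration, objects rated as similar lie close together, which is exactly the property the perceptual-space analyses below rely on.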

*actively* explore the objects from different viewpoints.

*A*, sin *β*, and *ɛ*^{cot α}) were altered in five defined equidistant steps to construct a three-dimensional object space of 5 × 5 × 5 = 125 objects. The shape parameters can be verbalized as follows:

*A* changes the distance between aperture and tip of the shell, while sin *β* corresponds to the symmetry of the object. Parameter *ɛ*^{cot α} corresponds to the number of convolutions. Note that since the shells have a rather complex shape, this verbalization of single dimensions was only possible *post hoc*, that is, by varying the shape parameters along one dimension of the object space and observing the resulting shape change, in combination with understanding the process of object generation; none of these processes, of course, was transparent to the participants.
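Enumerating the 5 × 5 × 5 object space amounts to taking the Cartesian product of five equidistant steps per parameter. The step values below are placeholders, since the actual parameter ranges of the shell model are not stated here.

```python
import itertools

# Placeholder equidistant steps for the three shape parameters
# (the real ranges used to generate the shells are not given here).
steps_A    = [1.0, 1.5, 2.0, 2.5, 3.0]  # A: distance between aperture and tip
steps_sinb = [0.1, 0.3, 0.5, 0.7, 0.9]  # sin β: symmetry
steps_conv = [1.0, 2.0, 3.0, 4.0, 5.0]  # ε^cot α: number of convolutions

object_space = list(itertools.product(steps_A, steps_sinb, steps_conv))
print(len(object_space))  # → 125
```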

*shell* modifier, resulting in a wall thickness of about 2 mm in the final plastic models. The surface was then smoothed using two iterations of the *meshsmooth* modifier (this operation averages the normal directions of two neighboring vertices to smooth the mesh; the modifier was applied only to remove artifacts in the micro-mesh topology for printing and had *no effect* on the visual appearance). The objects were then printed using the EDEN250™ 16-micron-layer three-dimensional printing system (Objet, Belgium). The manufacturing process was performed in "high quality mode" (resolution: *X*-axis 600 dpi = 42 *μ*m, *Y*-axis 600 dpi = 42 *μ*m, *Z*-axis 1600 dpi = 16 *μ*m) with a white acrylic-based photopolymer material, resulting in a hard, white, and opaque plastic model with a smooth surface. The resulting 3D objects weighed about 40 g; the maximum dimensions were 5 cm in depth, 10 cm in height, and 15 cm in width.

*visual 2D planes*), the two-dimensional pictures of the objects were presented on a screen (Figure 1b). Participants had to fixate a cross for 0.5 s before the first object appeared on the screen for 3 s. Then, the screen turned black for 0.5 s before the second object was presented for 3 s. After seeing both objects, participants rated the similarity between the two objects by pressing one of the buttons 1 to 7.

*visual 3D planes*) was designed to bridge the gap between visual perception of 2D pictures and free exploration of 3D objects. Therefore, the virtual objects were presented on the HMD (Figure 1d). Participants had 8 s to explore the first object visually from different angles by moving the interface. Then, the second object was presented for 8 s; again, participants were able to move the object freely. Afterward, they rated the similarity of the pair of objects by saying a number between 1 and 7 out loud. The rating was recorded by the experimenter.

*haptic 3D planes*), we compared visual object perception with haptic object perception. Three-dimensional plastic models were used (Figure 1c). Participants were blindfolded and seated in front of a table with a sound-absorbing surface. One object was placed between the hands of the participant. The experimenter gave the signal to start, and the participant had 8 s to explore the object with both hands, with no restrictions on the exploratory procedure. After exploring the object, the participant put it back on the table, and it was replaced by the second object. The experimenter again gave the signal to start, and the participant had 8 s to explore the second object. After putting the object back on the table, the participant rated the similarity by saying a number between 1 and 7 out loud. The rating was recorded by the experimenter.

*whole three-dimensional object space* was analyzed to investigate whether the additional stimulus pairs would disrupt the perception of the stimulus configuration of the individual planes or dimensions. Every object was compared once with itself and once with every other object of the object space, resulting in 231 trials (21 × (21 − 1) / 2 + 21 = 231). Again, which object of the pair was presented first was chosen arbitrarily. The 231 pairs were shown in random order in one block. Every participant performed three blocks with different randomizations and could take a break after every block.
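The trial count follows from pairing each of the 21 objects once with itself and once with every other object, which can be verified directly:

```python
from itertools import combinations_with_replacement

n_objects = 21
# Unordered pairs including self-pairs: n * (n - 1) / 2 + n
pairs = list(combinations_with_replacement(range(n_objects), 2))
print(len(pairs))  # → 231
```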

*visual 2D space*) was performed as described for Experiment 1 but with 231 instead of 84 trials.

*haptic 3D space*) was performed analogously to Experiment 3 but was run on two consecutive days due to the length of the experiment. The first session started with the regular 8 test trials to familiarize participants with the task and the range of objects; the second session started with the same 8 test trials to refresh the participants' memory. Participants could take additional breaks within the blocks.

*t*-test on the values listed in Table 1.

| | Visual 2D planes | Visual 3D planes | Haptic 3D planes | Visual 2D space | Haptic 3D space |
|---|---|---|---|---|---|
| Visual 2D planes | R = 0.872, SEM = 0.006 | R = 0.874, SEM = 0.003 | R = 0.848, SEM = 0.004 | R = 0.883, SEM = 0.003 | R = 0.871, SEM = 0.003 |
| Visual 3D planes | | R = 0.886, SEM = 0.005 | R = 0.867, SEM = 0.004 | R = 0.885, SEM = 0.002 | R = 0.878, SEM = 0.003 |
| Haptic 3D planes | | | R = 0.861, SEM = 0.006 | R = 0.860, SEM = 0.004 | R = 0.863, SEM = 0.004 |
| Visual 2D space | | | | R = 0.906, SEM = 0.003 | R = 0.892, SEM = 0.003 |
| | | | | R = 0.861, SEM = 0.003 | R = 0.831, SEM = 0.003 |
| Haptic 3D space | | | | | R = 0.890, SEM = 0.005 |
| | | | | | R = 0.824, SEM = 0.005 |

_{within} = 0.871, mean_{between} = 0.868, *t*(16) = 0.278, *p* = 0.785). This shows that the task does not affect the similarity percept of single object pairs. Moreover, it shows that participants perceive the similarities among the objects in a highly similar fashion regardless of whether the objects are explored visually or haptically (see also Figures A4a and A4b, in which the average visual and haptic similarity ratings were correlated and clearly show that visual and haptic object exploration lead to almost the same similarity percept). In addition, the correlations are high regardless of how many objects were compared within the experiments, which already hints at the robustness of the underlying perceptual processing.

*Procrustes* function of MATLAB. This function fits the points of the perceptual space to the points of the physical space by performing a linear transformation (translation, reflection, and orthogonal rotation; these are valid operations, since MDS yields only relative, not absolute, positions in space). Its output is the distance between the points of the perceptual space and those of the physical space as the sum of squared errors (this value is referred to as the "procrustes value" in the following sections; low values indicate a better fit than high values, with a value of 0 for a perfect fit; all values are listed in Table 2). The resulting stimulus positions in perceptual space were marked with "stars" using the same stimulus colors as in the physical space (Figure 4).
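A minimal numpy stand-in for this fitting step, assuming the same operations as MATLAB's procrustes (translation, uniform scaling, and an orthogonal rotation/reflection found via the SVD, with both configurations normalized so that 0 indicates a perfect fit); the point sets are illustrative:

```python
import numpy as np

def procrustes_fit(physical, perceptual):
    """Fit perceptual to physical via translation, scaling, and
    rotation/reflection; return the residual sum of squared errors
    (the 'procrustes value'); 0 means a perfect fit."""
    X = physical - physical.mean(axis=0)
    Y = perceptual - perceptual.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Optimal orthogonal transform (rotation plus possible reflection).
    U, s, Vt = np.linalg.svd(X.T @ Y)
    R = Vt.T @ U.T
    scale = s.sum()
    return np.sum((X - scale * Y @ R) ** 2)

# A rotated, shifted copy of a point set should fit perfectly.
pts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = pts @ rot.T + np.array([3., -2.])
print(round(procrustes_fit(pts, moved), 6))  # → 0.0
```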

| | Plane 1 | Plane 2 | Plane 3 |
|---|---|---|---|
| Visual 2D planes | 0.0465 | 0.1394 | 0.2586 |
| Visual 3D planes | 0.0807 | 0.0949 | 0.1979 |
| Haptic 3D planes | 0.1542 | 0.1641 | 0.0523 |
| Visual 2D space | 0.0929 | 0.0984 | 0.2400 |
| Haptic 3D space | 0.0620 | 0.0978 | 0.0995 |

*Procrustes* function. The outermost contour line in Figures 4, A1, and A2 encloses 80% of the calculated perturbed MDS solutions. When overlaying two plots of different MDS maps (as done for Figures A1 and A2), the confidence ranges give a good impression of how different the two perceptual spaces really are. In addition, this method allows us to calculate a quantitative measure, namely the overlap between two maps as determined by the enclosed confidence areas.
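The confidence-range idea can be illustrated schematically. In this sketch the 500 perturbed MDS solutions are simulated as Gaussian scatter around hypothetical mean positions (rather than recomputed from perturbed similarity matrices, as in the actual analysis), and the confidence range is taken as the radius enclosing 80% of the perturbed positions per object.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 500 perturbed MDS solutions of 7 objects in 2D: Gaussian
# scatter around hypothetical mean positions (the real analysis perturbs
# the similarity ratings and recomputes the MDS solution each time).
mean_pos = rng.uniform(-3, 3, size=(7, 2))
solutions = mean_pos + rng.normal(scale=0.3, size=(500, 7, 2))

# Confidence range per object: the radius enclosing 80% of the
# perturbed positions around that object's mean position.
dists = np.linalg.norm(solutions - mean_pos, axis=2)  # shape (500, 7)
radius80 = np.percentile(dists, 80, axis=0)           # one radius per object
```

Larger radii correspond to less stable MDS solutions, which is how the plane-wise stability comparisons below should be read.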

*Neighborhood test*: In order to obtain a measure of the faithfulness of the perceptual reconstruction, we determined the nearest neighbors of the perceptual and the physical spaces. That is, we took object 1 of the *physical* space, calculated the distance between this point and all seven points of the average *perceptual* space, and determined which object was nearest. The number of points for which the correspondence was correct was then determined for each plane and each experiment.
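A sketch of this neighborhood test (assuming the perceptual configuration has already been Procrustes-aligned to the physical one) might look as follows; the grid positions are illustrative:

```python
import numpy as np

def neighborhood_test(physical, perceptual):
    """Count objects whose nearest point in the perceptual configuration
    is the corresponding object itself."""
    correct = 0
    for i, p in enumerate(physical):
        dists = np.linalg.norm(perceptual - p, axis=1)
        if np.argmin(dists) == i:
            correct += 1
    return correct

# Illustrative plane of 7 objects; a perfectly reconstructed configuration
# recovers every correspondence.
phys = np.array([[0., 0.], [1., 0.], [2., 0.], [0., 1.],
                 [1., 1.], [2., 1.], [1., 2.]])
print(neighborhood_test(phys, phys))  # → 7
```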

*Random configuration*: Second, we arbitrarily distributed seven points within a circular area with a diameter of six (similarity ratings ranged between 1 and 7, resulting in a maximum distance of 6) to create 500 different random point configurations. The resulting configurations were then fit to the positions of the objects in physical space using the *Procrustes* function, and the procrustes values were determined. This distribution of fit values was then compared to the one obtained by fitting the experimental data to the physical space. Since the two distributions are not normally distributed, a Mann–Whitney *U*-test was performed to compare the fit values. The same was repeated with 500 Y-shaped configurations, for which seven points were randomly distributed along a Y-shaped form.
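The random-configuration baseline can be sketched as follows, using scipy's `procrustes` (whose returned disparity is the residual sum of squares after alignment) and a Mann–Whitney *U*-test; the physical plane positions and the stand-in "observed" fit values are illustrative, not the experimental data:

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)

# Hypothetical physical plane: 7 objects in 2D (placeholder positions).
physical = np.array([[0., 0.], [1., 0.], [2., 0.], [0., 1.],
                     [1., 1.], [2., 1.], [1., 2.]])

def random_disk_points(n, diameter=6.0):
    """Uniformly distribute n points in a circle of the given diameter."""
    r = (diameter / 2) * np.sqrt(rng.uniform(size=n))
    phi = rng.uniform(0, 2 * np.pi, size=n)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi)])

# Fit 500 random configurations to the physical plane; procrustes returns
# (mtx1, mtx2, disparity), and disparity is the residual sum of squares.
random_fits = [procrustes(physical, random_disk_points(7))[2]
               for _ in range(500)]

# Stand-in for the experimentally observed fit values: small residuals.
observed_fits = rng.uniform(0.05, 0.2, size=500)

U, p = mannwhitneyu(observed_fits, random_fits, alternative='two-sided')
```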

*Topology change*: Finally, we tested how the procrustes values would change if participants introduced local errors in the topology, such as changing the order of neighboring points. For this, the 500 MDS solutions representing participants' data were shuffled by swapping two neighboring points. These configurations were again fit to the physical space, and the two fit-value distributions were compared using non-parametric tests.
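The topology change test can be sketched by swapping one pair of neighboring points in an otherwise well-fitting configuration and comparing the resulting procrustes values; the configurations below are illustrative:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(2)

# Placeholder physical plane (7 objects) and a noisy "perceptual" version.
physical = np.array([[0., 0.], [1., 0.], [2., 0.], [0., 1.],
                     [1., 1.], [2., 1.], [1., 2.]])
perceptual = physical + rng.normal(scale=0.1, size=physical.shape)

def swap_neighbors(config, i, j):
    """Introduce a local topology error by swapping points i and j."""
    shuffled = config.copy()
    shuffled[[i, j]] = shuffled[[j, i]]
    return shuffled

# Disparity (residual sum of squares) before and after the swap.
fit_original = procrustes(physical, perceptual)[2]
fit_shuffled = procrustes(physical, swap_neighbors(perceptual, 0, 1))[2]
```

A local swap leaves the overall configuration intact but reliably worsens the fit, which is why this test is sensitive to whether participants recovered the local neighborhood relations.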

*explicitly* asked to do so (see Figure A3).

_{MDS} = 0.111, median_{random} = 0.781, *U* = 3375, *p* < 0.001; see Figure A5). Moreover, it is highly unlikely that random Y-shaped forms were perceived (median_{MDS} = 0.111, median_{Y} = 0.772, *U* = 25,753, *p* < 0.001).

_{MDS} = 0.111, median_{shuffled} = 0.238, *U* = 41,790,684, *p* < 0.001). This test provides an even stronger indication that participants were, indeed, able to reconstruct the Y-shaped stimulus space and that the neighborhood relations were correctly identified.

_{Plane3} = 0.145) than plane 1 (median_{Plane1} = 0.087, *U* = 1,391,413, *p* < 0.001) and plane 2 (median_{Plane2} = 0.108, *U* = 2,218,117, *p* < 0.001), indicating that the combination of sin *β* and *A* seems to be less intuitive than the combination of sin *β* and *ɛ*^{cot α} or of *ɛ*^{cot α} and *A*. This can also be seen in the size of the confidence ranges displayed in Figure 4, which approximate the stability of the MDS solutions. The confidence ranges are largest for plane 3 in the visual 2D and 3D planes experiments, again indicating that the parameter combination of sin *β* and *A* seems to be less intuitive.

*t*(313) = −9.989, *p* < 0.001). This shows, on the one hand, that the different ways of object exploration in the different experiments lead to rather similar perceptual spaces. On the other hand, the three different planes are less congruent and thus topologically more different. The overlaps between the different planes (plane 1 versus plane 2: mean = 26%; plane 1 versus plane 3: mean = 30%; plane 2 versus plane 3: mean = 33%) were compared in two-tailed *t*-tests across experiments but did not reach significance (*t*(68) = −0.517, *p* = 0.606; *t*(68) = −1.011, *p* = 0.315; *t*(68) = −0.512, *p* = 0.610, respectively). Whereas the average data seem to show a difference between the planes, these results indicate that the inherent variance in the data across all conditions is too large to reveal a reliable effect.

_{exp1} = 0.134, median_{exp4} = 0.104, *U* = 1,095,052, *p* = 0.207). Interestingly, the additional trials in the haptic modality seemed to result in an improvement in reconstruction quality (median_{exp2} = 0.121, median_{exp5} = 0.088, *U* = 565,572, *p* < 0.01).

*Procrustes* function, and the distance between the points was calculated. In addition, we performed the three tests described in the previous section to verify the reconstruction quality of the resulting perceptual spaces: the Neighborhood test was used as a coarse measure of the preservation of neighborhood relations in the three-dimensional case, whereas the test against a random configuration and the topology change test were used to obtain a more quantitative measure of topological stability. Finally, after comparing the perceptual spaces of both modalities to the physical object space, the two perceptual spaces were compared to each other using the *Procrustes* function.

*Procrustes* function. As described for the 2D topology in the previous section, the correspondence between the points of the physical and perceptual spaces was determined using the Neighborhood test. For the visual modality, 17 out of 21 objects showed direct correspondence; for the haptic modality, 18 objects showed direct correspondence. Given that the misassigned objects were surrounded by correctly assigned objects, we concluded that, for both the visual and haptic modalities, the topology and the neighborhood relations were correctly recovered. Given the highly complex objects that make up our three-dimensional physical object space (see Figure A3) and the large number of test trials involved, this result again points to a very good reconstruction quality.

_{random} = 0.900, *p*_{visual} < 0.001, *p*_{haptic} < 0.001, Figure A6). In the random-configuration test with Y-shaped forms, in which the 21 objects were randomly distributed along three orthogonal Y-shaped configurations, we also found highly different procrustes values compared to those of the observed perceptual spaces (median_{Y} = 0.898, *p*_{visual} < 0.001, *p*_{haptic} < 0.001). Finally, in the topology change test, we examined how the procrustes values would change when the order of neighboring points was "changed". Again, this analysis led to a significant increase in the procrustes values both for the visual and haptic modalities (median_{visual} = 0.219, *p* < 0.001; median_{haptic} = 0.181, *p* < 0.001), showing (as already stated in the 2D topology section) that participants also locally identified the neighborhood relations correctly.

_{random} = 0.937, *p* < 0.001) and also significantly better than for random Y-shaped configurations (median_{Y} = 0.935, *p* < 0.001). Introducing changes into the topology also led to significantly higher procrustes values (median_{shuffled} = 0.225, *p* < 0.001). The neighborhood test, always comparing seven points per plane, recovered 16 of 21 correctly assigned objects, with the misassigned objects surrounded by correctly assigned ones. Thus, the two-dimensional perceptual space in the visual domain still recovers the Y-shaped configurations correctly. For the two-dimensional haptic perceptual space, we also found a better fit than for 21 arbitrarily distributed points as well as for random Y-shaped configurations (median_{random} = 0.937, *p* < 0.001; median_{Y} = 0.935, *p* < 0.001). For changes in the order on the Y-shaped configuration (median_{shuffled} = 0.224, *p* = 0.238), however, the fit was not significantly better. In the neighborhood test, 16 out of 21 objects were correctly assigned; among the remaining 5 objects, however, the immediate neighborhood was less well preserved, indicating a slightly worse reconstruction of the full topology in the two-dimensional haptic perceptual space.

*three* dimensions. Despite this increased complexity, participants were still able to form a perceptual space based on haptic exploration that has a high degree of similarity to the physical object space.

*ɛ*^{cot α}, the number of convolutions (mostly referred to as "bulkiness" by the participants), while visually sin *β*, the symmetry, corresponds to the first dimension. Haptically, the second dimension corresponds to sin *β*, while visually it is *ɛ*^{cot α}. For both modalities, parameter *A*, the distance between aperture and tip, corresponds to the third dimension. This parameter is well recovered in the haptic data, whereas for the visual modality it plays only a minor role, given that two dimensions might already be enough to explain the data. When going to the three-dimensional solution, however, the third dimension does result in a more faithful reconstruction of the physical space. Since we found that active exploration of the visual stimuli helped to better recover plane 3, which varies along this parameter, such a condition might also yield a better, fully three-dimensional visual reconstruction. Further experiments will need to be run to confirm this hypothesis.