Abstract
Different patterns of recognition performance have been found depending on whether participants recognized 3-D objects in the same modality in which they learned them or in a different one (Ueda & Saiki, 2007). Performance was viewpoint invariant in cross-modal recognition, whereas within-modal recognition showed an advantage for the learned viewpoint. In this study, we investigated eye movements to estimate the strategies that lead to these performance differences between within- and cross-modal 3-D object recognition. An unfamiliar 3-D object was presented visually for 2 seconds, followed by a recognition test. Participants were told the test modality before the study phase, during which their eye movements were recorded. In the recognition test, test stimuli were presented either visually (within-modality) or haptically (cross-modality) from various viewpoints, and participants judged whether each test object was the same as the object presented earlier. The pattern of eye movements during the learning phase differed depending on the prespecified test modality. The distribution of fixations was significantly broader in cross-modal recognition than in within-modal recognition. Clustering of the fixation data also revealed different patterns, possibly reflecting different learning strategies: in within-modal recognition, participants fixated the regions where components connect, suggesting that they used image features or relations among components, whereas in cross-modal recognition they fixated the centroid of each component, suggesting that they treated each component as a single feature. The cross-modal strategy leads to viewpoint-invariant performance because recognition based on part shape is presumably less sensitive to object rotation, whereas the within-modal strategy leads to viewpoint-dependent performance because recognition based on image features or on spatial relations among components is vulnerable to rotation. These results suggest that within-modal and cross-modal recognition demand different information about 3-D objects, leading to different recognition performance.
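
The abstract does not describe the fixation analysis pipeline, so the sketch below is only an illustrative reconstruction, not the authors' method. Assuming fixations are available as (x, y) screen coordinates per condition, it quantifies fixation breadth as the RMS distance from the fixation centroid and groups fixations spatially with k-means (scikit-learn); the variable names, toy coordinates, and the choice of k-means are all assumptions introduced here for illustration.

    import numpy as np
    from sklearn.cluster import KMeans

    def fixation_dispersion(fixations):
        """RMS distance (pixels) of fixations from their centroid: one simple
        measure of how broadly fixations are distributed."""
        centroid = fixations.mean(axis=0)
        return np.sqrt(((fixations - centroid) ** 2).sum(axis=1).mean())

    def cluster_fixations(fixations, n_clusters=3, seed=0):
        """Group (x, y) fixation points into spatial clusters; returns the
        cluster label of each fixation and the cluster centers."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        km.fit(fixations)
        return km.labels_, km.cluster_centers_

    # Hypothetical toy data: (x, y) fixation coordinates in screen pixels.
    rng = np.random.default_rng(0)
    within = rng.normal(loc=(512, 384), scale=30, size=(40, 2))  # tightly clustered
    cross = rng.normal(loc=(512, 384), scale=90, size=(40, 2))   # broadly spread

    print(f"within-modal dispersion: {fixation_dispersion(within):.1f} px")
    print(f"cross-modal dispersion:  {fixation_dispersion(cross):.1f} px")

    labels, centers = cluster_fixations(cross)
    print("cluster centers (cross-modal):\n", centers)

Under these assumptions, a broader cross-modal fixation distribution would show up as a larger dispersion value, and the cluster centers would approximate the regions (e.g., component centroids versus component junctions) that participants preferentially fixated.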