We compared face recognition performance across changes in viewpoint in two-dimensional and three-dimensional conditions and made the following predictions: (a) performance in a task that involves a transformation across views would be facilitated by the addition of 3D depth cues, (b) the volume-rich properties of certain facial features would be attended to more in the 3D condition, and (c) interpupillary distance (IPD) would affect the strength of the depth effect in the 3D images and, consequently, face recognition performance during 3D viewing, and in this condition only.
Both experiments yielded a significant advantage of the stereoscopic viewing condition for recognition accuracy. By conducting Experiment 2 with a new, larger stimulus set, we confirmed that neither the anaglyph image nor the glasses alone (i.e., the low-level differences between viewing conditions), but rather their combination, which produces the “stereo depth” effect, leads to higher accuracy rates. This finding is in accordance with the results of Burke et al. (
2007) and Hill, Schyns, and Akamatsu (
1997), since it shows that face recognition across viewpoint change becomes easier with the addition of stereoscopic depth information. However, the present results differ from the findings of Liu et al. (
2006), who did not find a difference in performance between stereo and mono conditions. We believe that an explanation for these contrasting results may lie in the nature of the tasks used in different studies. In the second study of Liu et al. (
2006), participants performed a sequential matching task in which they were asked to decide whether a face seen at learning and a test face presented afterward belonged to the same individual. As indicated by the reported sensitivity scores (about d′ = 3.0 for both stereo and mono conditions), this task might have been very easy: participants reached a high performance level whether they relied on 2D information alone or also had 3D information available. Therefore, additional stereoscopic information might not have provided any measurable benefit in such an easy task. Likewise, in a similar sequential same–different task employed by Burke et al. (2007), the overall face recognition performance did not differ significantly between stereo and mono conditions. What differed, however, was the ability to recognize faces across viewpoint transformations, which improved with the addition of stereoscopic information.
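To put these sensitivity values in perspective, a back-of-the-envelope estimate may be useful (a simplified sketch that assumes the standard equal-variance signal detection model and an unbiased observer in a yes/no design, rather than the exact same–different model used in that study): with sensitivity defined as d′ = z(H) − z(F), where H and F are the hit and false-alarm rates and z is the inverse of the standard normal cumulative distribution function Φ, an unbiased observer with d′ = 3.0 is expected to be correct on about Φ(d′/2) = Φ(1.5) ≈ 93% of trials. Performance this close to ceiling leaves little room for an additional stereo advantage to emerge.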
Based on the above considerations, we surmise that the effect of depth may be revealed only when the recognition task is perceptually challenging (e.g., requiring generalization across viewpoints). In particular, selecting the correct match for a stimulus face out of multiple choices might have added enough complexity to the task to reveal differences due to depth information. Note also that in our task, the test items were always presented in a different view from the previously seen stimulus image, which forced the observers to generalize across viewpoints. Therefore, based on the results of the current study, we agree with Jiang et al. (2009) that learning the three-dimensional structure of faces over time could yield better invariance to viewpoint transformations than learning based on two-dimensional information alone.
Another goal of the present study was to shed light on whether faces are scrutinized differently by gaze when three-dimensional cues are present than when flat 2D images are viewed. We confirmed the findings from previous research that, when viewing 2D faces, participants spend most of the fixation time on the eyes and the nose, with increased fixation durations on the cheeks in the profile view (Bindemann et al.,
2009; Pelphrey et al.,
2002; Walker-Smith et al.,
1977). More importantly, our region-of-interest analysis revealed increased viewing time for the nose and the cheeks in the stereo condition compared to the two-dimensional condition. Both the nose and the cheeks would seem to provide more volumetric information about facial structure than the eye region (Sæther et al.,
2009). Taken together with the better face recognition performance observed in the 3D condition, this finding supports our conclusion that the availability of three-dimensional structural information enhances face recognition.
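For readers interested in how such a region-of-interest comparison can be computed, the following sketch illustrates one way to aggregate fixation durations by ROI and viewing condition. The column names, ROI labels, and data values are illustrative assumptions and do not reproduce the exact analysis pipeline used in our experiments.

```python
# Illustrative sketch: aggregating fixation durations by region of interest (ROI).
# Column names ("subject", "condition", "roi", "duration_ms") and the example data
# are assumptions for demonstration purposes only.
import pandas as pd

# One row per fixation, already assigned to an ROI.
fixations = pd.DataFrame({
    "subject":     [1, 1, 1, 2, 2, 2],
    "condition":   ["2D", "2D", "3D", "2D", "3D", "3D"],
    "roi":         ["eyes", "nose", "nose", "eyes", "cheeks", "nose"],
    "duration_ms": [310, 180, 260, 295, 210, 240],
})

# Total dwell time per subject, condition, and ROI.
dwell = (fixations
         .groupby(["subject", "condition", "roi"], as_index=False)["duration_ms"]
         .sum())

# Mean dwell time per ROI in each viewing condition, collapsed across subjects;
# these per-condition means could then be compared with an appropriate inferential test.
print(dwell.groupby(["condition", "roi"])["duration_ms"].mean())
```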
Since we hypothesized that three-dimensional information facilitated face recognition in our task, we also asked whether there was any between-subject variation in the subjective strength of the experienced 3D effect. Although it seems clear that IPD plays a crucial role in producing a stereo effect, it is less clear whether having a smaller or larger IPD significantly affects the perception of stereo images. This question has been addressed in the optical engineering literature in attempts to evaluate the accuracy of stereoscopic visual display systems used by military jet pilots for navigation. It has been shown that the most accurate performance was achieved in the condition in which the two viewpoints provided to the pilots' eyes were separated by their normal IPD (Merritt, Cuqlock-Knopp, Kregel, Smoot, & Monaco,
2005).
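A simplified geometric sketch, offered here only as an illustration of standard stereoscopic viewing geometry rather than as an analysis performed in the present study, shows why the match between viewer IPD and the inter-camera distance matters. For a point rendered with uncrossed screen parallax p and viewed from distance D by an observer with interocular distance I, the depth perceived behind the screen is approximately z = pD/(I − p) ≈ pD/I when p is small relative to I. Because the on-screen parallax grows roughly in proportion to the separation between the two camera viewpoints used to create the stereo pair, perceived depth scales approximately with the ratio of that inter-camera distance to the viewer's IPD: observers whose IPD exceeds the inter-camera distance should experience a compressed (weaker) depth effect, whereas observers with a smaller IPD should experience a somewhat exaggerated one.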
Simple regression analyses revealed that larger deviations of participants' IPDs from the inter-camera distance (the separation between the two camera viewpoints used to generate the 3D images) predicted lower accuracy rates, and in the 3D condition only. In Experiment 1, participants with IPDs larger than the inter-camera distance performed worse than those with smaller IPDs. In both experiments, the best performance was observed for those participants whose IPD was slightly smaller (i.e., by about 0.5–1 cm) than the inter-camera distance. Hence, it appears that viewing the 3D images with a reduced stereo effect resulted in lower performance, while experiencing either an optimal or a slightly enhanced stereo effect led to better performance. To conclude, perceiving either veridical or slightly exaggerated binocular cues optimized perception of the three-dimensional face structure, resulting in higher recognition performance.
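As an illustration of the type of simple regression reported above, the following sketch relates 3D-condition accuracy to the absolute deviation between each participant's IPD and the inter-camera distance. The variable names and data values are hypothetical placeholders, not values from the present study.

```python
# Minimal sketch: simple linear regression of 3D-condition accuracy on the absolute
# deviation between each participant's IPD and the inter-camera distance.
# All numbers below are hypothetical placeholders.
import numpy as np
from scipy.stats import linregress

ipd_cm = np.array([5.8, 6.0, 6.2, 6.4, 6.6, 6.8, 7.0])            # participant IPDs
inter_camera_cm = 6.5                                              # assumed rig separation
accuracy_3d = np.array([0.78, 0.82, 0.85, 0.88, 0.86, 0.80, 0.75])

deviation = np.abs(ipd_cm - inter_camera_cm)                       # predictor
result = linregress(deviation, accuracy_3d)

print(f"slope = {result.slope:.3f}, r = {result.rvalue:.2f}, p = {result.pvalue:.3f}")
```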