To evaluate whether DCNN representations are sensitive to human-like, view-invariant critical features, we measured the distance between the representations of four types of face pairs for 25 different identities (not included in the training set).
Figure 3A shows an example of each type of face pair: “Same identity” pairs are different images of the same identity; “Non-critical features” pairs are same-identity face pairs in which noncritical features were replaced; “Critical features” pairs are same-identity face pairs in which critical features were replaced; and “Different identity” pairs are images of two different identities.
Figure 3B shows the Euclidean distances between these face pairs based on their pixel-based representations. A repeated-measures ANOVA revealed a significant effect of face type, F(3, 72) = 7.25, p = 0.001. Post hoc comparisons revealed that the pixel-based distances of same-identity pairs were smaller than the distances of noncritical feature pairs (t(24) = 3.75, p = 0.002), critical feature pairs (t(24) = 4.18, p < 0.001), and different-identity pairs (t(24) = 3.28, p = 0.006). Importantly, there was no difference between the pixel-based distances of face pairs that differ in critical features and those that differ in noncritical features (t(24) = 0.42, p = 0.68). These findings indicate that pixel information is no more sensitive to human-like critical features than to noncritical features.
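As an illustration of the analysis described above, the following is a minimal sketch of a one-way repeated-measures ANOVA written in plain Python. It is not the authors' analysis code: the function name `rm_anova` and the toy distance values are hypothetical. It only shows how a design with 25 identities and four face-pair types yields the degrees of freedom (3, 72) reported for the pixel-based analysis.

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data: list of rows, one row per identity, one value per condition.
    Returns (F, df_effect, df_error).
    """
    n = len(data)       # number of identities (repeated units)
    k = len(data[0])    # number of conditions (face-pair types)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    unit_means = [sum(row) / k for row in data]

    # Partition the total sum of squares into condition, unit, and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_unit

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error


# Hypothetical toy distances: 25 identities x 4 face-pair types.
# These values are made up for illustration and are not the study's data.
base = [1.0, 1.4, 1.5, 1.6]
toy = [[base[j] + 0.1 * i + 0.05 * ((i * j) % 3) for j in range(4)]
       for i in range(25)]

f_value, df_effect, df_error = rm_anova(toy)
print(df_effect, df_error)  # 3 72
```

In practice, a statistics package would also supply the p-value and sphericity corrections; this sketch covers only the F ratio and its degrees of freedom.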
Figure 3C shows the Euclidean distances between representations of the same face pairs, based on the penultimate layer of a fully face-trained DCNN (Abudarham et al., 2019; Abudarham et al., 2021). Here, the distance between faces that differ in critical features is much larger than the distance between faces that differ in noncritical features, indicating that the identity-based representation is sensitive to human-like critical features. Faces that differ in critical features are also as distant from each other as different-identity faces, indicating that changing critical features is similar to changing the identity of a face. A repeated-measures ANOVA across the four face types revealed a significant effect of face type, F(3, 75) = 175.51, p < 0.001. Post hoc comparisons revealed that all conditions differed significantly from one another (p < 0.001; see Supplementary Table S3 for all statistical tests), except for critical feature pairs and different-identity pairs, which did not differ statistically (t(24) = 0.684, p = 0.5). These findings are consistent with our definition of critical features as features whose alteration changes the identity of the face.
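The distance measure used in both analyses can be sketched as follows. This is a minimal illustration, assuming each face image has already been converted to a vector (flattened pixels or penultimate-layer activations). The function name `euclidean_distance` and the four-dimensional toy vectors are hypothetical and do not come from the study.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 4-dimensional "representations" of two face images.
# Real penultimate-layer vectors would be much longer.
face_original = [0.0, 0.0, 0.0, 0.0]
face_altered = [3.0, 0.0, 4.0, 0.0]

distance = euclidean_distance(face_original, face_altered)
print(distance)  # 5.0
```

Computing this distance for every face pair of each type, one value per identity and type, gives the table of distances on which the ANOVA and the post hoc comparisons operate.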