The four comparisons had similar mean accuracies (1: 79.2%, SD = 15.7%; 2: 73.7%, SD = 17.2%; 3: 74.3%, SD = 13.6%; 4: 76.2%,
SD = 14.5%). To assess how each comparison contributed to the difficulty of each VFMT item, a multiple regression was performed with the four comparisons as predictors of the overall VFMT item accuracy. Comparisons 2, 3, and 4 were all significant predictors of VFMT item accuracy. Taken as a whole, and even though they were measured in a separate group of subjects, the comparisons accounted for 48% of the variability in performance in the task (Table 3).
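The regression described above can be sketched as an ordinary least-squares fit of item accuracy on the four comparison accuracies. This is a minimal illustration only: the function name `fit_item_regression`, the item count, and all data are hypothetical placeholders generated at random, not the study's data, and the original analysis software is not stated in the text.

```python
import numpy as np

def fit_item_regression(comparison_acc, item_acc):
    """Fit item_acc ~ intercept + comparison accuracies by OLS.

    comparison_acc: array of shape (n_items, 4), one column per comparison.
    item_acc: array of shape (n_items,), overall accuracy per item.
    Returns (coefficients including intercept, R^2).
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(item_acc)), comparison_acc])
    coefs, *_ = np.linalg.lstsq(X, item_acc, rcond=None)
    residuals = item_acc - X @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((item_acc - item_acc.mean()) ** 2).sum())
    return coefs, 1.0 - ss_res / ss_tot

# Synthetic placeholder data (assumption: NOT the values from the study).
rng = np.random.default_rng(0)
n_items = 90  # arbitrary illustrative item count
comparison_acc = rng.uniform(0.4, 1.0, size=(n_items, 4))
item_acc = (0.2
            + comparison_acc @ np.array([0.0, 0.3, 0.25, 0.2])
            + rng.normal(0.0, 0.05, size=n_items))

coefs, r2 = fit_item_regression(comparison_acc, item_acc)
print("intercept and slopes:", np.round(coefs, 3))
print("proportion of variance explained (R^2):", round(r2, 3))
```

The R² value returned here plays the role of the proportion of variability that the comparisons account for; significance tests for individual predictors would need standard errors, which this sketch omits.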
To examine how the similarity comparisons differed for VFMT items that showed age-related DIF, we compared the average similarity comparison accuracies across items with uniform (as opposed to nonuniform) DIF and items that showed no DIF. Items were categorized into three groups: items with higher difficulty parameters for the older group (Items 9, 10, 19, 22, 23, 29, 38, 50, 51, 58, 59, 60, 65, 77, and 86), items with higher difficulty parameters for the younger group (Items 12, 20, 25, 26, 28, 32, 48, 52, 64, 71, 73, 78, 80, 89, 92, and 93), and items that showed no DIF (Items 1, 2, 3, 4, 5, 7, 13, 14, 15, 16, 17, 18, 21, 24, 27, 30, 31, 33, 35, 36, 39, 40, 41, 43, 44, 45, 46, 47, 55, 61, 63, 66, 68, 72, 74, 79, 81, 82, 83, 91, and 94).
Within each comparison, there was only one statistically significant difference between groups of items: VFMT items that advantaged younger subjects had study targets that were less similar to the study foil (Comparison 1) than those items that advantaged older subjects,
t(29) = 2.192,
p = 0.037,
d = 0.796 (
Figure 7). In addition, when the relative difficulty of the different comparisons was examined (correct rejection for Comparisons 1, 3, and 4, hit rate for Comparison 2), items that showed no DIF or an advantage for older subjects showed no differences, whereas items that advantaged younger subjects showed poorer performance for Comparison 2 relative to both Comparisons 1 and 4 (Comparisons 1 and 2:
t[14] = −2.982,
p = 0.001,
d = −1.594; Comparisons 4 and 2:
t[14] = −2.384,
p = 0.032,
d = −1.274), and the accuracy for Comparison 3 was also lower than for Comparison 1 (
t[14] = 2.619,
p = 0.020,
d = 1.400). In other words, items that advantaged younger subjects were those with relatively distinct study faces and test foils that were not similar to the study foil, but with a test target that was not highly similar to the study target. It is unclear whether this reflects a different strategy (e.g., emphasis on the distinguishing features of the two study faces) and/or a difference in ability to generalize across different images of the same person (for Comparison 2).
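As a rough arithmetic check on the within-group effect sizes reported above, the three d values with 14 degrees of freedom are numerically consistent with the common conversion d = 2t/√df. The text does not state which formula was used, so this conversion is an assumption, and the helper name `d_from_t` is hypothetical.

```python
import math

def d_from_t(t, df):
    """Convert a t statistic to Cohen's d using d = 2t / sqrt(df).

    Assumption: this conversion is used only as a consistency check;
    the source does not say how its effect sizes were computed.
    """
    return 2.0 * t / math.sqrt(df)

# (t, df, reported d) for the three comparisons among items
# that advantaged younger subjects, as given in the text.
reported = [
    (-2.982, 14, -1.594),  # Comparisons 1 and 2
    (-2.384, 14, -1.274),  # Comparisons 4 and 2
    (2.619, 14, 1.400),    # Comparisons 3 and 1
]

for t, df, d_reported in reported:
    d = d_from_t(t, df)
    print(f"t[{df}] = {t:+.3f} -> d = {d:+.3f} (reported {d_reported:+.3f})")
```

Each computed value agrees with the reported one to three decimal places under this assumed formula.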