Subjects rated themselves on their expertise with each of the eight VET categories and with faces on a 9-point scale, considering “interest in, years of exposure to, knowledge of, and familiarity with each category.” Note that here, as in past work, although the question is phrased in terms of “expertise,” these ratings should not be assumed to reflect perceptual skill, because such ratings tend not to be strongly related to perceptual performance (Barton, Hanif, & Ashraf, 2009; McGugin, Richler et al., 2012). We therefore refer to this measure as one of “experience” (a conjecture supported by Part 2).
Part 2). We were able to retest a small number of subjects (
n = 39) an average of 719 days (
SD = 203) after the original test. Test–reretest reliability (Pearson
r) for each category was as follows: faces, 0.20, 95% CI [−0.12, 0.48]; butterflies, 0.56, 95% CI [0.30, 0.74]; cars, 0.59, 95% CI [0.34, 0.76]; leaves, 0.66, 95% CI [0.44, 0.81]; motorcycles, 0.63, 95% CI [0.39, 0.79]; mushrooms, 0.63, 95% CI [0.39, 0.79]; owls, 0.54, 95% CI [0.27, 0.73]; planes, 0.63, 95% CI [0.39, 0.79]; and wading birds, 0.39, 95% CI [0.09, 0.63]. The test–retest reliability of the ratings across nonface categories was 0.69, 95% CI [0.48, 0.83], and with faces included it was 0.80, 95% CI [0.65, 0.89], because even though the reliability of the face ratings is low, the ratings are consistently higher than ratings for the other categories. While the test–retest reliability of these self-reports was low for several of these categories, it reached 0.60, 95% CI [0.35, 0.77] for the measure we use in our main analyses, O-EXP (the summed ratings for all nonface categories). As a comparison, one large sample study reports a test–retest reliability for the CFMT of 0.70 (Wilmer et al.,
2010).
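The confidence intervals reported above follow from the standard Fisher z transformation of the Pearson correlation. As an illustration only, the sketch below (not the authors' analysis code; the helper name and the rating vectors are hypothetical placeholders) shows how a test–retest reliability and its 95% CI might be computed for a single category's self-ratings; O-EXP would simply be the sum of such ratings across the eight nonface categories before this computation.

    # Minimal sketch, assuming NumPy/SciPy and hypothetical placeholder data:
    # test-retest reliability as a Pearson r with a 95% CI from the Fisher z
    # transformation, for one category's time-1 and time-2 self-ratings.
    import numpy as np
    from scipy import stats

    def test_retest_reliability(time1, time2, alpha=0.05):
        """Pearson r between time-1 and time-2 ratings, with a Fisher-z CI."""
        time1 = np.asarray(time1, dtype=float)
        time2 = np.asarray(time2, dtype=float)
        r = stats.pearsonr(time1, time2)[0]
        n = len(time1)
        z = np.arctanh(r)                      # Fisher z transform of r
        se = 1.0 / np.sqrt(n - 3)              # standard error of z
        crit = stats.norm.ppf(1 - alpha / 2)   # ~1.96 for a 95% CI
        lo, hi = np.tanh(z - crit * se), np.tanh(z + crit * se)
        return r, (lo, hi)

    # Hypothetical 9-point ratings from n = 39 retested subjects (not our data)
    rng = np.random.default_rng(0)
    t1 = rng.integers(1, 10, size=39)
    t2 = np.clip(t1 + rng.integers(-2, 3, size=39), 1, 9)
    r, (lo, hi) = test_retest_reliability(t1, t2)
    print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")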