Abstract
How precise is children’s visual concept knowledge, and how does this precision change across development? We created a gamified picture-matching task in which children heard a word (e.g., “swordfish”) and had to choose the picture “that goes with the word.” Critically, we chose distractor items with high, medium, and low similarity to each target word, allowing us to examine the granularity of children’s visual representations. We derived similarity as the cosine similarity between the embeddings of the target and distractor words in CLIP, a contrastive language-image pre-training model (Radford et al., 2021). Photographs were taken from the THINGS+ dataset and combined with age-of-acquisition (AoA) ratings, yielding 108 items, each comprising a unique target and three distractors with estimated AoA ratings within 3 years of each other; we created 2AFC trials with high-similarity distractors, 3AFC trials with high- and medium-similarity distractors, and 4AFC trials that also included a low-similarity distractor. Data were then collected from children in preschools (N=66, 3-5 year-olds), in 6 elementary schools and 9 charter schools across multiple states (N=1369, 6-11 year-olds), and from adults online (N=205). Using linear mixed-effects models, we modeled developmental changes in the proportion of children who chose a given image for a given word. We found gradual developmental changes in children’s ability to identify the correct category. Error analyses of 3AFC and 4AFC trials revealed that children’s errors increasingly favored higher-similarity distractors with age; children’s error patterns were increasingly correlated with CLIP target-distractor similarity. Overall, these analyses suggest a transition from coarse to finer-grained visual representations over early and middle childhood. Children’s visual concept knowledge gradually becomes more refined as they learn what distinguishes similar visual concepts from one another.
Broadly, these findings demonstrate the utility of combining gamified experiments and similarity estimates from computational models to probe the content of children’s evolving visual representations.
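As an illustration of the similarity measure described above, the following is a minimal sketch of ranking candidate distractors by cosine similarity to a target word. It is not the authors' pipeline: the three-dimensional vectors are made-up stand-ins for CLIP embeddings, the distractor words ("shark", "canoe", "teapot") are hypothetical, and the function names are introduced here only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_distractors(target_vec, candidates):
    """Return (word, similarity) pairs sorted from most to least similar."""
    scored = [(word, cosine_similarity(target_vec, vec))
              for word, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for CLIP embeddings (assumed values, not real data).
target = [1.0, 0.0, 0.0]            # e.g., the target word "swordfish"
candidates = {
    "teapot": [0.0, 0.0, 1.0],      # hypothetical low-similarity distractor
    "shark": [0.9, 0.1, 0.0],       # hypothetical high-similarity distractor
    "canoe": [0.5, 0.5, 0.0],       # hypothetical medium-similarity distractor
}

ranked = rank_distractors(target, candidates)
for word, sim in ranked:
    print(f"{word}: {sim:.3f}")
```

Under these assumed vectors, the ranking orders the candidates from the high-similarity to the low-similarity distractor, which mirrors how high, medium, and low similarity tiers could be assigned for a given target.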