Abstract
Artificial neural networks (ANNs) require enormous amounts of data to learn object categories. By contrast, humans can form a category from just one exemplar. This feat of categorization is integral to decision making but, surprisingly, remains poorly understood. Here we tested whether an invariant object structure (namely, an object’s internal skeleton) supports one-shot object categorization in human infants, a population with limited object experience. Across two experiments, 6- to 12-month-olds (mean age = 9.29 months; N = 82) were habituated to a single, never-before-seen object. They were then tested with objects that differed from the habituated object in their surface contours and either matched or mismatched it in skeletal structure. To further constrain the mechanisms implicated in this task, we compared infant performance to computational models of object recognition that do not incorporate a skeletal algorithm. These included top-performing recognition models (ResNet trained on ImageNet or Stylized-ImageNet), a recurrent model designed to approximate recognition processes of the primate visual system (CORnet-S), and a model designed to approximate the visual experience and learning mechanisms of infants (self-supervised ResNeXt trained on infant headcam videos). Importantly, these models were tested using procedures comparable to those used with infants. Because habituation/dishabituation can be conceived as a measure of alignment between the stimulus and the infant’s internal representation, we tested ANNs by attaching an autoencoder to each model and measuring the error signal across the habituation and dishabituation phases. We found that only infants were able to categorize novel objects from one exemplar. By contrast, ANNs failed to categorize objects under the same conditions. Qualitatively similar results were found when models were tested using conventional classification techniques.
Taken together, these findings suggest that single-exemplar categorization reflects an early-developing sensitivity of the human visual system to perceptually invariant object structure.
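The autoencoder-based habituation measure described above can be illustrated with a minimal sketch. The abstract does not specify the autoencoder's architecture or training settings, so everything here is an assumption made for illustration: a small linear autoencoder, random vectors standing in for model features, and the hypothetical names `habituate` and `reconstruction_error`. It shows only the logic of the measure (reconstruction error falls during habituation and is then read out on test items) and does not reproduce the reported results.

```python
import numpy as np


def habituate(features, hidden_dim=8, steps=300, lr=0.05, seed=0):
    """Fit a tiny linear autoencoder to ONE feature vector (the habituation item).

    Returns the learned weights and the reconstruction error at each step,
    which plays the role of the declining "looking time" signal.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[0]
    w_enc = rng.normal(scale=0.1, size=(hidden_dim, d))
    w_dec = rng.normal(scale=0.1, size=(d, hidden_dim))
    errors = []
    for _ in range(steps):
        hidden = w_enc @ features
        diff = w_dec @ hidden - features
        errors.append(float(np.mean(diff ** 2)))
        # Gradients of the mean squared reconstruction error.
        g_recon = 2.0 * diff / d
        g_dec = np.outer(g_recon, hidden)
        g_enc = np.outer(w_dec.T @ g_recon, features)
        w_dec -= lr * g_dec
        w_enc -= lr * g_enc
    return (w_enc, w_dec), errors


def reconstruction_error(weights, features):
    """Error signal for a test item, given the habituated autoencoder."""
    w_enc, w_dec = weights
    recon = w_dec @ (w_enc @ features)
    return float(np.mean((recon - features) ** 2))


# Stand-in data: random vectors in place of real model activations.
rng = np.random.default_rng(1)
habituated = rng.normal(size=32)                      # features of the habituation object
similar = habituated + 0.1 * rng.normal(size=32)      # item close to it in feature space
dissimilar = rng.normal(size=32)                      # item far from it in feature space

weights, curve = habituate(habituated)
print("error at start / end of habituation:", curve[0], curve[-1])
print("test error, similar item:   ", reconstruction_error(weights, similar))
print("test error, dissimilar item:", reconstruction_error(weights, dissimilar))
```

In this sketch, a larger test error corresponds to dishabituation. Under the logic of the study, a model that categorized by skeleton would be expected to show a lower error for skeleton-matched test objects than for mismatched ones; whether it does depends entirely on the features supplied by the underlying model.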