Abstract
One of our most remarkable visual abilities is the capacity to learn novel object classes from very little data. Given just a single novel object, we usually have certain intuitions about what other class members are likely to look like. Such 'one-shot learning' presumably leverages knowledge from previously learned objects, specifically (1) by providing a feature space for representing shapes and their relationships and (2) by teaching us how classes are typically distributed in this space. To test this, we synthesized 20 shape classes, each based on a unique, unfamiliar 2D base shape. Novel exemplars were created by transforming the base shape's skeletal representation to produce new shapes with limbs varying in length, width, position, and orientation. Using crowdsourcing, we then obtained responses from 500 human observers on 20 trials (1 response per base shape). On each trial, observers judged whether a target shape belonged to the same class as 1 or 16 context shape(s) (transformed samples with similar characteristics). Targets came from the same class as the context shape(s) but varied in their similarity to them. The results reveal that participants perceived objects as belonging to the same class only when they differed from one another by a limited amount, confirming that observers have restricted generalization gradients around completely novel stimuli. We then compared human responses to a computational model in which the similarity between target and context shapes was computed from >100 image-computable shape descriptors (e.g., area, compactness, shape context, Fourier descriptors). The findings reveal a surprisingly consistent distance around each base shape in the feature space, beyond which objects are deemed to belong to different classes. Thus, the model predicts one-shot learning remarkably well with only one free parameter, which describes how much objects in the same class tend to differ from one another.
Meeting abstract presented at VSS 2018
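
As a rough illustration of the decision rule such a model implies (a fixed distance threshold around a base shape in a shape-descriptor space), the Python sketch below computes a handful of toy descriptors for binary shape masks and accepts a target as a class member when its mean rescaled distance to the context exemplar(s) falls below a single free threshold. The particular descriptors, the per-feature scale values, and the shape_features / same_class helpers are hypothetical stand-ins chosen for brevity, not the >100-descriptor model reported in the abstract.

import numpy as np

def shape_features(mask: np.ndarray) -> np.ndarray:
    # Toy image-computable descriptors for a boolean shape mask:
    # area, compactness, and bounding-box aspect ratio. These are
    # illustrative stand-ins for the >100 descriptors in the abstract.
    ys, xs = np.nonzero(mask)
    area = float(mask.sum())
    # Crude perimeter estimate: foreground pixels with at least one
    # background 4-neighbour.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = float((mask & ~interior).sum())
    compactness = 4.0 * np.pi * area / max(perimeter, 1.0) ** 2
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect = max(height, width) / min(height, width)
    return np.array([area, compactness, aspect])

def same_class(target, context, threshold, scale):
    # One-shot class judgment: accept the target iff its mean distance
    # to the context exemplar(s), in a rescaled feature space, falls
    # below a single free threshold -- the analogue of the model's one
    # free parameter (how much class members tend to differ).
    f_t = shape_features(target) / scale
    dists = [np.linalg.norm(f_t - shape_features(c) / scale) for c in context]
    return float(np.mean(dists)) < threshold

# Hypothetical usage: one context exemplar (the one-shot case).
# accept = same_class(target_mask, [base_mask],
#                     threshold=1.5, scale=np.array([500.0, 0.2, 1.0]))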