Abstract
How does the human visual system learn object concepts from a series of example images? This is a difficult computational challenge given that natural images are often complex and contain many attributes that are irrelevant to the concept to be extracted. Typical computational schemes for concept learning require that the system be provided with a training set of images showing the target object isolated and normalized in location and scale. While such a “pre-processed” training set simplifies the problem, it also renders such approaches unrealistic and circular, because in order to normalize an image, one must already possess the object concept.
To address these issues, we propose a computational model for visual object concept learning, motivated by experimental studies of object perception in infancy as well as following recovery from blindness. The model is designed to work with non-normalized training data: Given a set of images, each of which contains an object instance embedded in a complex scene, the system attempts to infer what the object concept is. More specifically, the model uses an unsupervised learning strategy at the outset to formulate hypotheses about possible concepts. At this stage, the processing is, of necessity, bottom-up. The only means of complexity reduction are low-level image saliencies and a priori regularity within an object class. As visual experience accumulates, however, the object concept undergoes concurrent refinement, allowing the model to employ a top-down strategy in an effort to reduce search complexity in subsequent images. Such a mixture of bottom-up and top-down modalities represents a plausible computational analogue to the gradual use of prototypes in object recognition as observed experimentally in humans.
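To make the bottom-up/top-down interplay described above concrete, the following is a minimal Python sketch, not the paper's actual model: the functions `saliency_candidates` and `top_down_candidate`, the variance-based saliency score, the patch-based representation, and the running-average prototype update are all illustrative assumptions introduced here for exposition.

```python
import numpy as np

def saliency_candidates(image, num_candidates=5, patch_size=16):
    """Hypothetical bottom-up step: rank fixed-size patches by a crude
    saliency score (local intensity variance) and return the top few."""
    h, w = image.shape
    scored = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            scored.append((patch.var(), patch))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [patch for _, patch in scored[:num_candidates]]

def top_down_candidate(image, prototype, patch_size=16):
    """Hypothetical top-down step: return the single patch most similar
    to the current prototype (smallest mean squared difference)."""
    h, w = image.shape
    best, best_err = None, np.inf
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            err = np.mean((patch - prototype) ** 2)
            if err < best_err:
                best, best_err = patch, err
    return best

def learn_concept(images, confidence_threshold=3):
    """Toy learning loop: begin bottom-up, then switch to top-down search
    once the prototype has been refined by enough images (an assumed
    switching rule, used here only to illustrate the idea)."""
    prototype, seen = None, 0
    for image in images:
        if prototype is None or seen < confidence_threshold:
            # Bottom-up: average the most salient candidates as a crude hypothesis.
            candidates = saliency_candidates(image)
            hypothesis = np.mean(candidates, axis=0)
        else:
            # Top-down: let the current prototype guide the search.
            hypothesis = top_down_candidate(image, prototype)
        # Concurrent refinement: running average of accepted hypotheses.
        prototype = hypothesis if prototype is None else 0.8 * prototype + 0.2 * hypothesis
        seen += 1
    return prototype

# Usage: random grayscale "scenes" stand in for real training images.
images = [np.random.rand(64, 64) for _ in range(10)]
concept = learn_concept(images)
print(concept.shape)  # (16, 16)
```

In this sketch, the fixed switching threshold is a stand-in for the gradual shift from bottom-up to top-down processing; the model described in the paper presumably modulates this transition as the concept estimate improves.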