Abstract
Most current computational models of object learning rely on labeled, pre-normalized collections of static images to guide the extraction of features that are diagnostic of an object class. From the standpoint of the developing visual system, these approaches are deeply problematic: an infant has no reliable labels to attach to newly observed objects, there is no guarantee that objects will be observed at a consistent scale or orientation, and an infant's visual world is dynamic rather than static.
A growing body of results from Project Prakash, and from studies with infants, points to the profound significance of motion information in the early development of visual object concepts. But precisely how do dynamic cues contribute to object learning? Here we address this question in the context of a comprehensive model of object learning named ‘Dylan’ (Dynamic input based Learning in Artificial and Natural systems). Specifically, we show how dynamic information can be used to orient towards, segment, and track candidate objects, without the need for pre-normalization or labeling of the input data. Interestingly, these three processes can be implemented via qualitatively similar computations involving clustering of motion vectors, operating over progressively longer time-scales. By tracking a candidate proto-object over many frames of a video sequence and continuously recording its appearance, we can construct a temporally extended model of that object's appearance. This image series can then serve as the starting point for various feature extraction techniques, and as a basis for building tolerance to variations in scale and illumination. Our system can also be selectively compromised to more closely approximate the perceptual limitations of the developing visual system, providing a useful means of understanding the influence of poor acuity and color sensitivity on high-level object representations.
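To make the motion-vector clustering idea concrete, the sketch below shows one plausible single-frame-pair instantiation, assuming OpenCV's Farneback dense optical flow and scikit-learn's k-means. The function names (segment_by_motion, collect_appearance), the two-cluster figure/ground split, and all parameter values are illustrative assumptions, not Dylan's actual implementation.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans


def segment_by_motion(prev_gray, next_gray, n_clusters=2):
    """Return a boolean mask over the faster-moving motion cluster."""
    # Dense optical flow: one (vx, vy) motion vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = flow.shape[:2]
    # Cluster the raw motion vectors; with two clusters this amounts
    # to a figure/ground split between a moving candidate and the
    # (relatively static) background.
    km = KMeans(n_clusters=n_clusters, n_init=10)
    labels = km.fit_predict(flow.reshape(-1, 2)).reshape(h, w)
    # Treat the cluster with the larger mean speed as the proto-object.
    speed = np.linalg.norm(flow, axis=2)
    means = [speed[labels == k].mean() for k in range(n_clusters)]
    return labels == int(np.argmax(means))


def collect_appearance(gray_frames):
    """Crop the segmented proto-object from each consecutive frame
    pair, yielding a temporally extended series of appearance samples."""
    crops = []
    for prev, nxt in zip(gray_frames, gray_frames[1:]):
        mask = segment_by_motion(prev, nxt)
        ys, xs = np.nonzero(mask)
        if ys.size:  # skip pairs where no motion cluster was found
            crops.append(nxt[ys.min():ys.max() + 1,
                             xs.min():xs.max() + 1])
    return crops
```

In the full model, the same clustering computation would presumably be applied over progressively longer windows of frames for orienting, segmentation, and tracking, respectively; only the shortest time-scale (a single frame pair, iterated across the sequence) is sketched here.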