Abstract
Visual perception emerges from a cortical hierarchy that extracts increasingly complex features, from edges to categories. Considerable computational and neural research suggests that the visual system is biased to extract slowly changing features in its input. However, little is known about the visual statistics of infant experience during the early stages of receptive field formation. Here we provide evidence on the rate of change of low-level and semantic visual features in infants' everyday experience. Infants (2 to 12 months of age, n = 27) wore head cameras at home, yielding 120 hours of egocentric video. We measured the rate of change at three levels of stimulus description: 1) raw pixels, 2) edge features (GIST, a measure of edge content at multiple orientations and spatial scales), and 3) semantic features (derived from a trained CNN object classifier). For each measure, we calculated the Euclidean distance between the vector descriptions of image pairs at a series of time lags. The distribution of distances was unimodal at every lag, with the mode increasing with lag. We then fit an exponential curve to the mode as a function of lag and report the time constants of the fits as a measure of the time scale of change. At all three levels of stimulus description, infant visual experience changed slowly (pixels = 1.3 s; GIST = 1.4 s; semantic = 1.9 s). Change was particularly slow for the youngest infants (2 to 4 months; pixels = 1.8 s; GIST = 2.4 s; semantic = 3.2 s), especially at the edge and semantic levels. These results provide new evidence on the temporal properties of early experience and inform current theories of both unsupervised learning and receptive field formation. The findings also suggest that human altricial motor development may play a functional role in constraining early visual experience.
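The lag-based analysis can be sketched as follows. The snippet below is a minimal illustration, not the authors' code: it assumes frames have already been converted to feature vectors (flattened pixels, GIST descriptors, or CNN activations), estimates the mode of each lag's distance distribution with a simple histogram, and assumes a saturating exponential, mode(t) = a(1 − exp(−t/τ)), since the abstract does not state the exact functional form. All function names and parameters are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit


def distance_mode(features, lag, n_bins=100):
    """Mode of Euclidean distances between frame pairs separated by `lag` frames.

    `features` is an (n_frames, n_dims) array of per-frame feature vectors
    (pixels, GIST, or CNN activations).
    """
    d = np.linalg.norm(features[lag:] - features[:-lag], axis=1)
    counts, edges = np.histogram(d, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[np.argmax(counts)]


def saturating_exp(t, a, tau):
    """Assumed exponential form: the mode rises toward a with time constant tau."""
    return a * (1.0 - np.exp(-t / tau))


def fit_time_constant(features, fps, lags_frames):
    """Fit tau (in seconds) to the mode-vs-lag curve and return it."""
    lags_s = np.asarray(lags_frames) / fps
    modes = np.array([distance_mode(features, lag) for lag in lags_frames])
    (a, tau), _ = curve_fit(saturating_exp, lags_s, modes,
                            p0=[modes.max(), 1.0], bounds=(0, np.inf))
    return tau


# Example on synthetic data: an AR(1) feature trajectory whose distances
# saturate over roughly one second at 30 fps.
rng = np.random.default_rng(0)
phi = 0.97
noise = rng.standard_normal((3000, 512))
feats = np.zeros((3000, 512))
for t in range(1, len(feats)):
    feats[t] = phi * feats[t - 1] + noise[t]

tau = fit_time_constant(feats, fps=30.0,
                        lags_frames=[1, 2, 5, 10, 30, 60, 150, 300])
print(f"estimated time constant: {tau:.2f} s")
```

A design note on this sketch: the histogram-based mode is a crude estimator, used here only because the reported statistic is the mode rather than the mean; a kernel density estimate would give a smoother alternative for real data.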