Abstract
Recent studies suggest that the coherent structures learned from multi-element visual scenes and represented in human memory are better captured by Bayesian model comparison than by traditional iterative pair-wise associative learning. These two learning mechanisms are polar opposites in how their internal representations emerge. The Bayesian method favors the simplest model until additional evidence is gathered, which often yields a global, approximate, low-pass description of the scene. In contrast, pair-wise associative learning, by necessity, first focuses on details defined by conjunctions of elementary features, and only later learns more extended global features. We conducted a visual statistical learning study to test explicitly the process by which humans develop their internal representations. Subjects were exposed to a family of scenes composed of unfamiliar shapes that formed pairs and triplets of elements according to a fixed underlying spatial structure. The scenes were composed hierarchically so that the true underlying pairs and triplets appeared in various arrangements that probabilistically, and falsely, gave rise to more global quadruple structures. Subjects were tested on both true vs. random pairs and false vs. random quadruples at two different points during learning: after 32 practice trials (short) and after 64 trials (long). After short training, subjects were at chance with pairs (51%, p > 0.47) but incorrectly recognized the false quadruples (60%, p0.6). These results are predicted well by a Bayesian model and cannot be captured by an associative learning scheme. Our findings support the idea that humans learn new visual representations by probabilistic inference rather than by pair-wise association, and they provide a principled explanation of coarse-to-fine learning.