Abstract
The dominant view of how humans develop new visual representations is based on the paradigm of iterative associative learning. According to this account, new features are developed based on the strength of the pair-wise correlations between their sub-elements, and complex features are learned by recursively associating previously acquired features. In addition, Hebbian mechanisms of synaptic plasticity seem to provide a natural neural substrate for associative learning. However, this account has two major shortcomings. First, in associative learning even the most complex features are extracted solely on the basis of pair-wise correlations between their sub-elements, whereas it is conceivable that some features can only be learned from higher-order statistics. Second, learning about all pair-wise correlations can already be intractable, since the number of such correlations grows quadratically with the number of elements in a scene, and the storage required for progressively higher-order statistics grows exponentially, exacerbating this combinatorial explosion. We present the results of a series of experiments that assessed how humans learn higher-order statistics. We found that learning in an unsupervised visual task remains above chance even when pair-wise statistics contain no relevant information. We implemented a formal normative model of learning to group elements into features based on statistical contingencies, using Bayesian model comparison, and demonstrated that humans perform close to Bayes-optimally. Although the computational requirements of learning based on model comparison are considerable, they are not incompatible with Hebbian plasticity, and this approach offers a principled solution to the storage-requirement problem by generating optimally economical representations. The close fit of the model to human performance across a large set of experiments suggests that humans learn complex new information by generating the simplest sufficient representation of their experience, not by encoding the full correlational structure of the input.
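The abstract describes the model only at a conceptual level; the sketch below is a hypothetical, minimal Python illustration of Bayesian model comparison for grouping, not the paper's actual implementation. It compares the marginal likelihood of a toy "chunk" model, in which two shapes share a single appearance probability and always co-occur, against an "independent" model, in which each shape appears on its own. All names (`log_marginal_bernoulli`, `log_bayes_factor`) are invented for this example, and the uniform Beta(1, 1) priors are an assumption.

```python
import numpy as np
from scipy.special import betaln

def log_marginal_bernoulli(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli sequence with k successes
    in n trials, with the success probability integrated out under a
    Beta(a, b) prior: B(k + a, n - k + b) / B(a, b)."""
    return betaln(k + a, n - k + b) - betaln(a, b)

def log_bayes_factor(scenes):
    """scenes: (n_scenes, 2) boolean array; column i marks whether shape
    i is present in each scene.  Returns log p(data | chunk) minus
    log p(data | independent); positive values favor grouping."""
    scenes = np.asarray(scenes, dtype=bool)
    n = scenes.shape[0]
    counts = scenes.sum(axis=0)
    # Independent model: each shape has its own appearance probability.
    log_ml_indep = sum(log_marginal_bernoulli(k, n) for k in counts)
    # Chunk model (toy version): both shapes share one appearance
    # probability and must co-occur; any scene where they disagree has
    # zero likelihood under this model.
    if np.any(scenes[:, 0] != scenes[:, 1]):
        return -np.inf
    log_ml_chunk = log_marginal_bernoulli(counts[0], n)
    return log_ml_chunk - log_ml_indep

# Shapes that always appear together yield a large positive log Bayes
# factor, so the learner groups them into a single feature instead of
# storing a table of pair-wise correlations.
rng = np.random.default_rng(0)
together = rng.random(200) < 0.4
scenes = np.column_stack([together, together])
print(log_bayes_factor(scenes))
```

This illustrates the sense in which model comparison yields economical representations: the learner keeps the simplest inventory of features whose marginal likelihood accounts for the observed scenes, rather than storing every pair-wise or higher-order co-occurrence statistic.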