Abstract
Recent studies have established that perceptual learning (PL) is influenced by strong top-down effects and shows flexible generalization depending on context. However, current computational models of PL rely on feedforward architectures and fail to capture these context-dependent and generalization effects parsimoniously in more complex PL tasks. We propose a Bayesian framework that combines sensory bottom-up and experience-based top-down processes in a normative way. Our model uses contextual inference to simultaneously represent multiple learning contexts with their corresponding stimuli. It infers the extent to which each context might have contributed to a given trial and gradually learns the transitions between contexts from experience. In turn, correctly inferring the current context supports efficient neural resource allocation for encoding the stimuli expected to occur in that context, thus maximizing discrimination performance and driving PL. In roving paradigms, where multiple reference stimuli are intermixed across trials, our model explains a broad range of previously described learning effects: (a) disrupted PL when the references are interleaved trial-by-trial, (b) intact PL when the references are separated into blocks, and (c) intact PL when the references are interleaved across trials but follow a fixed temporal order. Our model also provides new predictions for learning and generalization in PL. First, the amount of PL should depend on the extent to which the structure is learned, predicting more PL in roving paradigms that use more predictable temporal structures between reference stimuli. Second, rather than depending solely on the low-level perceptual similarities of stimuli, generalization in PL should also depend on the extent to which higher-order structural knowledge about contexts (e.g., their transition probabilities) generalizes across different tasks. These results demonstrate that higher-level structure learning is an integral part of any perceptual learning process and that a joint treatment of high- and low-level information about stimuli is required for capturing learning in vision.
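To make the core mechanism concrete, the sketch below gives a minimal illustration of contextual inference with learned transitions, not the authors' implementation: a posterior over a small set of contexts is propagated through a transition matrix estimated from soft counts, updated by a stimulus likelihood, and used to predict which reference stimulus to encode most precisely on the next trial. The Gaussian likelihood, the two-context setup, and all parameter values are illustrative assumptions.

```python
# Minimal sketch (assumptions only): Bayesian contextual inference over two
# reference stimuli, with context-transition probabilities learned online.
import numpy as np

rng = np.random.default_rng(0)

K = 2                                   # number of contexts (one reference stimulus each)
refs = np.array([-1.0, 1.0])            # assumed reference stimulus per context
sigma = 0.5                             # assumed sensory noise (likelihood width)

trans_counts = np.ones((K, K))          # soft counts for the context-transition matrix
belief = np.full(K, 1.0 / K)            # posterior over the current context

def step(stimulus):
    """One trial: predict the likely context, observe the stimulus, update beliefs."""
    global belief, trans_counts
    # Normalize counts into an estimated transition matrix (rows: previous context).
    T = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    prior = belief @ T                  # propagate belief through learned transitions
    predicted_ref = prior @ refs        # "resource allocation": expected reference stimulus
    # Gaussian likelihood of the observed stimulus under each context's reference.
    lik = np.exp(-0.5 * ((stimulus - refs) / sigma) ** 2)
    post = prior * lik
    post /= post.sum()
    # Learn transitions from experience (soft-count update from old to new belief).
    trans_counts += np.outer(belief, post)
    belief = post
    return predicted_ref

# Example: references alternate in a fixed temporal order (predictable roving).
for t in range(20):
    context = t % 2
    stim = refs[context] + 0.3 * rng.standard_normal()
    pred = step(stim)
    print(f"trial {t:2d}: predicted reference {pred:+.2f}, true {refs[context]:+.2f}")
```

Under this assumed alternating order, the learned transition matrix increasingly anticipates the upcoming reference before the stimulus arrives, whereas unpredictable trial-by-trial interleaving keeps the prior near uniform, mirroring the contrast between intact and disrupted PL described above.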