Abstract
Human subjects were trained to perform a perceptual matching task that required them to manipulate comparison objects until they matched target objects, using the fewest manipulations possible. Efficient performance of this task requires an understanding of the hidden or latent causal structure governing the relationships between actions and perceptual outcomes. We used two benchmarks to evaluate the quality of subjects' learning. One benchmark was based on optimal performance as calculated by a dynamic programming procedure. The other was based on an adaptive computational agent that used a reinforcement learning method known as Q-learning to learn to perform the task. Our analyses suggest that subjects were indeed successful learners. In particular, by the end of training they learned to perform the perceptual matching task in a near-optimal manner (i.e., using a small number of manipulations). Subjects were able to achieve near-optimal performance because they learned, at least partially, the causal structure underlying the task. In addition, subjects' performances were broadly consistent with those of model-based reinforcement learning agents that built and used internal models of how their actions influenced the external environment. On the basis of these results, we hypothesize that people will achieve near-optimal performance on tasks requiring sequences of actions, especially sensorimotor tasks with underlying latent causal structures, when they can detect the effects of their actions on the environment, and when they can represent and reason about these effects using an internal mental model.
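The abstract refers to Q-learning only at a high level. As an illustration of the kind of update rule such a benchmark agent applies, the sketch below shows tabular Q-learning with an epsilon-greedy policy on a small toy environment. The environment, its interface (reset/step/n_actions), and all parameter values are assumptions chosen for illustration; they are not the agent, task, or parameters used in the study.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Toy environment: reach the rightmost state from state 0 in as few steps
    as possible. Action 0 moves left, action 1 moves right; each step costs -1,
    so maximizing return means minimizing the number of moves (loosely analogous
    to minimizing manipulations)."""
    n_actions = 2

    def __init__(self, n_states=6):
        self.n_states = n_states

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + move))
        done = self.state == self.n_states - 1
        return self.state, -1.0, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    Q = defaultdict(lambda: [0.0] * env.n_actions)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q


if __name__ == "__main__":
    Q = q_learning(ChainEnv())
    # The greedy policy should prefer moving right (action 1) in every state.
    print({s: max(range(2), key=lambda a: Q[s][a]) for s in sorted(Q)})
```

Note that this is a model-free learner: it estimates action values directly from experienced rewards, in contrast to the model-based agents mentioned above, which additionally build an internal model of how actions change the environment.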