Abstract
Human behavior in binary choice tasks can deviate strongly from normative predictions, even in simple repeated tasks with independent trials. Rather than attributing these deviations to decision-making errors (as most previous accounts have assumed), we propose that they arise because people try to learn causal models of both the environment and the task that explain how the outcome statistics are generated, or in other words, try to reduce unexplained variability in outcomes. We show that optimal decision-making models that try to learn the structure of the environment exhibit the same kinds of suboptimality. In particular, models that try to learn environmental dynamics (non-stationarity) and reward outcome generation capture many aspects of choice behavior deemed suboptimal, such as limited memory, probability matching, and under- and over-exploration. We show how probabilistic coupling between rewarding options can be learned, and how models that learn this coupling better capture human behavior in choice tasks. We also show how models that learn dynamics can benefit from making strong prior assumptions about the stochasticity of the environment.
This work was supported by ONR N00014-07-1-0937 and an NIH Neuro-physical-computational Sciences (NPCS) Graduate Training Fellowship.