Abstract
Human decision-making is often suboptimal in sequential tasks in which the goal is to maximize reward and the more rewarding option must be learned through the experience of successes and failures. Rather than learning and exploiting the better option, humans typically persist in selecting the less rewarding option, even after thousands of trials. Because successfully learning the better option requires maintaining highly accurate estimates of reward over time, the fundamental nature of the reward distribution, as well as the fidelity with which rewards are encoded and retained, may play a role in decision suboptimality: distributions that contribute to noisy estimates of mean reward should impair decision-making performance. To test this hypothesis, we provided subjects with either graded rewards (e.g., any value between 0 and 25 is possible) or discrete rewards (e.g., only values of 0 or 25 are possible). Participants played a game in which they piloted a ship and attempted to destroy a target by repeatedly shooting it with one of two types of bullets. Unbeknownst to the subjects, one bullet type did more damage to the target, on average, than the other. Four variants of the task awarded damage points via simulated draws from binomial, normal, uniform, or beta distributions. The distributions were matched so that both the mean damage and the variability of small-sample estimates of the mean were equivalent across distributions. Consistent with our hypothesis, a comparison of decision-making performance across conditions indicates that decision-making is closer to optimal when rewards are drawn from the less noisy continuous distributions.
This work was supported by ONR N00014-07-1-0937.
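The matching criterion described above can be sketched in a short simulation: the noisiness of a small-sample estimate of mean reward is the standard deviation of the n-trial sample mean (sd/√n), so two reward distributions are "matched" when their means and variances agree. The parameters below are illustrative assumptions for a 0–25 damage scale, not the values used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10     # size of a small-sample estimate of mean damage
n_sims = 20000    # Monte Carlo repetitions

# Hypothetical reward generators on a 0-25 damage scale (illustrative
# parameters only; not the experiment's actual settings).
generators = {
    "binomial": lambda size: 25.0 * rng.binomial(1, 0.5, size),  # discrete: 0 or 25
    "normal":   lambda size: rng.normal(12.5, 12.5, size),       # graded
    "uniform":  lambda size: rng.uniform(0.0, 25.0, size),       # graded
    "beta":     lambda size: 25.0 * rng.beta(2.0, 2.0, size),    # graded
}

# SD of the n-trial sample mean for each distribution: distributions are
# matched when these values (and the overall means) are equal.
sem = {}
for name, draw in generators.items():
    sample_means = draw((n_sims, n_trials)).mean(axis=1)
    sem[name] = sample_means.std()
    print(f"{name:8s} SD of {n_trials}-trial mean estimate: {sem[name]:.2f}")
```

With these illustrative parameters the binomial and normal conditions are matched (both have sd 12.5, so both n-trial means have SD 12.5/√n), while the uniform and beta draws are less variable; matching all four, as in the experiment, requires choosing each distribution's parameters so that the printed values coincide.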