Abstract
How do humans choose the more rewarding of two options when the reward values are initially unknown? To maximize reward in this task, initial choices should explore both options to learn which is best, but choices should eventually converge to exclusively selecting the better option. Previous research involving cognitive decision tasks suggests that people tend to under-explore in early choices and over-explore (under-exploit) in later choices relative to the optimal strategy. We investigated whether these decision-making biases persist when the choice task is embedded in an engaging visuo-motor task: a video game (created using Virtools) in which players pilot a spacecraft and attempt to shoot and destroy the more rewarding of two targets. The effect of visuo-motor control was assessed by comparing choice behavior in two versions of the game, one involving full control of the ship and the other involving key-press choices that automatically moved the ship to the target's location. Each target was assigned a different, fixed probability of exploding when hit. Subjects were instructed to destroy as many targets as possible, and an on-screen score counter awarded points for each target that exploded when hit. The optimality of subjects' choice behavior depended on the target explosion probabilities: when the difference between the two explosion probabilities was greater than 0.2, participants converged on the better target, but on average they failed to converge for smaller differences. Even when they converged on the better target, they still over-explored (the asymptotes were less than 100%). Cognitive and visuo-motor decision strategies were similar. Players initially explored more in the full-control condition, but this effect was small and was reduced by experience. Although suboptimal for our task, over-exploration may result from an underlying belief that environments may change over time (that is, be non-stationary).
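The structure of the task can be illustrated with a minimal simulation sketch. It assumes a hypothetical epsilon-greedy player: one who tracks each target's observed explosion rate, usually shoots the target with the higher rate, and chooses at random with probability `explore_rate`. This rule, the function name `simulate_choices`, and all parameter names are illustrative assumptions; they are not the model or analysis used in the study.

```python
import random


def simulate_choices(p_better, p_worse, n_trials, explore_rate, seed=0):
    """Simulate one player in a two-target task with fixed explosion probabilities.

    Assumed (hypothetical) strategy: try each target once, then pick the target
    with the higher observed explosion rate, except that with probability
    `explore_rate` the choice is made at random.
    Returns (score, fraction of choices made on the better target).
    """
    rng = random.Random(seed)
    probs = [p_better, p_worse]  # index 0 is the better target
    hits = [0, 0]                # explosions observed per target
    shots = [0, 0]               # choices made per target
    score = 0
    better_choices = 0
    for _ in range(n_trials):
        untried = [i for i in range(2) if shots[i] == 0]
        if untried:
            choice = untried[0]              # sample each target at least once
        elif rng.random() < explore_rate:
            choice = rng.randrange(2)        # exploratory (random) choice
        else:
            rates = [hits[i] / shots[i] for i in range(2)]
            if rates[0] == rates[1]:
                choice = rng.randrange(2)    # break ties at random
            else:
                choice = 0 if rates[0] > rates[1] else 1
        exploded = rng.random() < probs[choice]
        shots[choice] += 1
        hits[choice] += int(exploded)
        score += int(exploded)
        better_choices += int(choice == 0)
    return score, better_choices / n_trials


if __name__ == "__main__":
    # Compare a small and a large probability difference (values are arbitrary).
    for p_a, p_b in [(0.6, 0.5), (0.8, 0.4)]:
        score, frac = simulate_choices(p_a, p_b, n_trials=200, explore_rate=0.1)
        print(f"p=({p_a}, {p_b}): score={score}, fraction on better target={frac:.2f}")
```

Under this assumed rule, a player who has correctly identified the better target but keeps exploring at rate `explore_rate` selects it on roughly `1 - explore_rate / 2` of trials, which is one simple way an asymptote below 100% can arise.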
This work was supported by ONR N00014-07-1-0937.