Abstract
Neurophysiology experiments have demonstrated that many cortical areas traditionally characterized as encoding visual signals, or covert visual attention, are also sensitive to the reward structure of a task and may represent the relative value of competing actions. However, it is not known whether human observers who learn the reward structure also adapt their gaze and perceptual decisions accordingly. Here, we investigate whether the reward structure of a search task can be learned to optimize performance, and compare human behavior to a Bayesian ideal learner that maximizes reward. Thirteen subjects performed 300 trials in which they searched for a bright target among dimmer distractors with contrast noise (mean RMS 2.38 and 1.75, stdev. 0.25), displayed for 2 s. Each of the six stimuli was surrounded by a colored circle. Observers either localized the target or decided it was absent. Correct localizations and rejections resulted in the delivery of a reward (in cents). The reward magnitude was stochastic and depended on the circle in which the target had been present. Subjects were told that some correct localizations would deliver a greater reward, but neither these values nor their distribution were specified. By the final third of trials, first saccades and perceptual decisions for both humans and the ideal observer were biased towards circles corresponding to a higher average reward (human: 35% vs. 29%; 37% vs. 32%), even on trials when no target was in fact present. However, human learning of expected value was suboptimal, both in the number of trials required for learning to manifest and in the overall yield of reward. Our results suggest that human gaze control and perceptual decisions are sensitive to the reward structure of a search task, and that this modulation may be exploited as a learning signal to resolve competing action plans.
Supported by NIH grant EY015925.
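
As an illustration only, the kind of reward learning attributed to the Bayesian ideal learner can be sketched as follows. This is a minimal sketch, not the model used in the study: it assumes Gaussian rewards with known noise variance, a conjugate normal prior on each circle's mean reward, and a choice rule that picks the location with the highest expected value (target probability times expected reward). The names `RewardBelief` and `best_location` and all numeric defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardBelief:
    """Normal belief about the mean reward (in cents) at one circle."""
    mean: float = 0.0    # assumed prior mean
    var: float = 100.0   # assumed prior variance (a weak prior)

    def update(self, reward: float, noise_var: float = 4.0) -> None:
        # Conjugate normal-normal update with known noise variance (assumed).
        precision = 1.0 / self.var + 1.0 / noise_var
        self.mean = (self.mean / self.var + reward / noise_var) / precision
        self.var = 1.0 / precision


def best_location(target_posterior: Sequence[float],
                  beliefs: Sequence[RewardBelief]) -> int:
    """Return the index of the location with the highest expected value."""
    values = [p * b.mean for p, b in zip(target_posterior, beliefs)]
    return max(range(len(values)), key=values.__getitem__)


# Six circles, as in the task; the observed rewards here are made up.
beliefs = [RewardBelief() for _ in range(6)]
beliefs[0].update(2.0)    # low reward seen at circle 0
beliefs[2].update(10.0)   # high reward seen at circle 2
```

Under equal visual evidence at every location, this sketch chooses the circle with the higher learned reward; strong visual evidence for another location can override that bias. The target-absent response is left out for brevity.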