Abstract
Many human decisions depend on learned associations between the sensory and motor features of choices and value. Little is known about how human learners solve the "curse of dimensionality": the ambiguity of deciding which of numerous features of the environment to associate with rewarding or punishing outcomes. Reinforcement learning (RL) approaches to such decisions often implicitly assume that only relevant, attended feature-value associations are tracked, updated via reward prediction errors (RPEs), and used in decisions. In this study, we examined whether humans truly ignore irrelevant feature-value associations when they are explicitly aware of the task-relevant dimension. Using model-based fMRI, we measured neural responses during a simple reward-learning task (a four-armed bandit) in which participants selected one of four options, represented by colored squares, on each trial. After each choice, participants received reward probabilistically according to the reward probability associated with the chosen color. Reward probabilities were independent across colors and drifted over the course of the experiment, encouraging participants to continue learning the value of each color throughout. Importantly, the locations of the colored items were randomized on every trial and were completely unrelated to value. Consistent with prior work, model-based RPE for the color feature was strongly correlated with activity in several brain regions, including ventral striatum. However, we additionally estimated irrelevant location-value associations and the corresponding prediction errors, which were then orthogonalized with respect to the color RPE. Activity in several regions, again including ventral striatum, was strongly correlated with location RPE, implying latent value signals tied to the irrelevant feature.
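The modeling approach described above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis pipeline: it assumes a simple Rescorla-Wagner (delta-rule) learner with a single learning rate `alpha`, values initialized at the chance reward rate, and orthogonalization implemented as least-squares residualization of the location RPE against the color RPE; the function names and parameters are hypothetical.

```python
import numpy as np

def simulate_rpes(chosen_colors, chosen_locs, rewards, alpha=0.2):
    """Delta-rule value learning over two feature dimensions in parallel.

    chosen_colors, chosen_locs: per-trial indices (0-3) of the chosen
    option's color and screen location; rewards: per-trial 0/1 outcomes.
    Returns per-trial RPEs for the relevant (color) and irrelevant
    (location) feature dimensions.
    """
    v_color = np.full(4, 0.5)  # initialize values at chance reward rate
    v_loc = np.full(4, 0.5)
    rpe_color = np.zeros(len(rewards))
    rpe_loc = np.zeros(len(rewards))
    for t, (c, l, r) in enumerate(zip(chosen_colors, chosen_locs, rewards)):
        rpe_color[t] = r - v_color[c]       # relevant-feature RPE
        rpe_loc[t] = r - v_loc[l]           # irrelevant-feature RPE
        v_color[c] += alpha * rpe_color[t]  # Rescorla-Wagner update
        v_loc[l] += alpha * rpe_loc[t]
    return rpe_color, rpe_loc

def orthogonalize(x, y):
    """Residualize regressor y with respect to x (and an intercept),
    removing any variance in y that x can explain."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

Under these assumptions, the orthogonalized location RPE carries only variance unexplained by the color RPE, so any fMRI correlation with it cannot be attributed to the relevant-feature signal.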
Humans may track multiple feature-value associations in parallel, even when those features are not presumed relevant to action. Such latent signals may serve to guide exploratory actions, or actions taken under high uncertainty.
Meeting abstract presented at VSS 2016