Abstract
Humans are adept at learning from reinforcement history to make more rewarding choices. When the values of two options are correlated, such contingencies are detected and exploited by human decision-makers (Wimmer, Daw, & Shohamy, 2012). We examined whether learned, purely visual contingencies also interact with reward-learning mechanisms to guide behavior. Subjects completed a “4-armed bandit” choice task consisting of 600 choice trials. On each trial, they were presented with 4 shapes placed on a 3×3 grid. They clicked on a shape and then received either a reward or no reward. Subjects were told, truthfully, that each shape was associated with its own probability of reward, and that this probability drifted slowly and independently over time for each item. Unbeknownst to subjects, we also grouped the four shapes into two fixed pairs: one pair always appeared in a fixed horizontally adjacent configuration, while the other always appeared in a fixed vertically adjacent configuration, thus creating the conditions for visual statistical learning to occur. Two reinforcement-learning models were fit to choice histories: one in which rewards modified only the chosen item’s value function, and another in which rewards additionally modified the value function of the chosen item’s statistical associate. Although each item’s value was truly independent of its paired associate’s value, the model that included generalization to the associated item predicted participant choices better than the more basic, nested model (likelihood ratio test, p < .001). Subjects thus showed a tendency to generalize learned value to visual statistical associates, even though such generalization was neither valid nor helpful in this task. These results imply that humans may have a tendency to generalize value estimates on the basis of extrinsic statistical associations.
Meeting abstract presented at VSS 2015
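A minimal sketch of the two nested models is given below, assuming a standard delta-rule update with a softmax choice rule. The parameter names alpha, beta, and w, the neutral initial value of 0.5, and the ASSOCIATE pairing are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

# Hypothetical pairing of the 4 items: item 0 with 1, and item 2 with 3.
ASSOCIATE = np.array([1, 0, 3, 2])

def softmax(q, beta):
    """Turn value estimates into choice probabilities (inverse temperature beta)."""
    e = np.exp(beta * (q - q.max()))  # subtract max for numerical stability
    return e / e.sum()

def update(q, chosen, reward, alpha, w):
    """Delta-rule update of the chosen item's value. The generalization model
    additionally passes a fraction w of the prediction error to the chosen
    item's statistical associate; w = 0 recovers the basic model."""
    q = q.copy()
    delta = reward - q[chosen]                  # reward prediction error
    q[chosen] += alpha * delta                  # standard update
    q[ASSOCIATE[chosen]] += w * alpha * delta   # generalization to the paired item
    return q

def neg_log_likelihood(params, choices, rewards):
    """Negative log-likelihood of one subject's choice sequence under the model."""
    alpha, beta, w = params
    q = np.full(4, 0.5)                         # neutral initial values (assumed)
    nll = 0.0
    for c, r in zip(choices, rewards):
        nll -= np.log(softmax(q, beta)[c])
        q = update(q, c, r, alpha, w)
    return nll
```

Because the basic model is the special case w = 0, the two fits are nested, and twice the difference in maximized log-likelihood can be compared against a chi-square distribution with one degree of freedom, which is the form of likelihood ratio test reported above.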