Angela Radulescu, Yuan Chang Leong, Yael Niv; Predicting trial-by-trial attention dynamics during human reinforcement learning. Journal of Vision 2017;17(10):1098. doi: https://doi.org/10.1167/17.10.1098.
© ARVO (1962-2015); The Authors (2016-present)
Selective attention is thought to facilitate reinforcement learning (RL) in multidimensional environments by constraining learning to the dimensions that are most relevant for the task at hand. But how would agents know which dimensions to attend to in the first place? Here we use computational modeling of human attention data to show that selective attention is sensitive to trial-by-trial dynamics of reinforcement. Twenty-five participants performed a decision-making task with multidimensional stimuli while undergoing functional magnetic resonance imaging (fMRI) and eye-tracking. At any one time, only one of three stimulus dimensions (faces, houses or tools) was relevant to predicting probabilistic reward. Participants had to learn, through trial and error, which dimension was predictive and which feature within that dimension was the most rewarding. We chose this task design to capture real-world learning problems in which only some dimensions of the environment consistently predict noisy reward. In previous work we showed that attention to different dimensions modulates learning in this task. To examine how subjects learn what to attend to, we developed and compared different models that specify how attention changes from trial to trial. Both the neural and eye-tracking data were best explained by an RL model that tracks feature values learned from trial and error, and allocates dimensional attention in proportion to the highest-valued feature along each dimension. This model outperformed models that determined attention based on choice history alone, suggesting that attention changes dynamically as a function of recent reward history. To our knowledge, this is the first account of what determines attention when it is measured directly and simultaneously from neural data and eye-tracking. Our results establish a bidirectional interaction between attention and RL: attention constrains what we learn about, and learned values determine what we attend to.
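As a rough illustration of the best-fitting model described above, the following Python sketch allocates attention to each dimension in proportion to that dimension's highest-valued feature, then uses that attention to weight a value update. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the update rule, so the attention-weighted delta rule, the learning rate, the number of features per dimension, the clipping of negative values, and all function names here are hypothetical.

```python
import numpy as np

N_DIMS = 3      # faces, houses, tools (from the abstract)
N_FEATURES = 3  # assumed number of features per dimension


def attention_from_values(values):
    """Attention per dimension, proportional to its highest-valued feature."""
    # Assumption: negative values are clipped to zero so proportions are valid.
    peak = np.clip(values.max(axis=1), 0.0, None)
    total = peak.sum()
    if total <= 0:
        # Nothing learned yet: fall back to uniform attention (assumption).
        return np.full(values.shape[0], 1.0 / values.shape[0])
    return peak / total


def update_values(values, chosen, reward, attention, learning_rate=0.3):
    """Assumed attention-weighted delta-rule update for the chosen stimulus.

    `chosen[d]` is the index of the chosen stimulus's feature on dimension d.
    """
    dims = np.arange(values.shape[0])
    # Stimulus value is the attention-weighted sum of its feature values.
    stimulus_value = np.sum(attention * values[dims, chosen])
    prediction_error = reward - stimulus_value
    new_values = values.copy()
    # More-attended dimensions learn more from the same prediction error.
    new_values[dims, chosen] += learning_rate * attention * prediction_error
    return new_values


# One simulated trial: start with no knowledge, choose a stimulus, get reward.
values = np.zeros((N_DIMS, N_FEATURES))
attention = attention_from_values(values)
values = update_values(values, np.array([0, 1, 2]), 1.0, attention)
attention = attention_from_values(values)
```

Repeating the last two lines over trials gives the bidirectional loop the abstract describes: attention shapes which feature values are updated, and the updated values determine attention on the next trial.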
Meeting abstract presented at VSS 2017