Abstract
Intelligent agents adapt their behavior based on updated knowledge of reinforcement in visual tasks. However, not all experiences are equally important in driving adaptive behavior, and the mechanism by which a given piece of information influences behavior depends on its nature. We trained four macaque monkeys (one female) to perform a visual stop-signal task in which fluid reward was earned for shifting gaze to a visual stimulus (either left or right), unless a visual stop-signal instructed the monkey to cancel the movement. Stop-signal delay was adjusted to ensure successful stopping on ~50% of stop-signal trials. New here, reward amount was asymmetric for the two directions, and the high-reward location alternated block-wise. As expected, response time (RT) was shorter for high-reward than for low-reward locations. After each block reversal, RT to the new high-reward location (associated with low reward in the previous block) decreased and RT to the new low-reward location (associated with high reward in the previous block) increased, until both plateaued over several trials. Because the reward association was binary, a rational observer could learn about a block switch equally from low-reward trials (negative reward prediction error, RPE) and high-reward trials (positive RPE). Using mixed-effects modeling of RT, we tested whether the monkeys’ performance conformed to this rational observer hypothesis. Specifically, we examined the contribution of negative and positive RPE trials to the rate of RT-speeding and RT-slowing. Positive RPE trials influenced RT-speeding significantly more than negative RPE trials in three of four monkeys. However, the relative effect of positive and negative RPE on RT-slowing was more idiosyncratic across monkeys.
These results provide evidence against the rational observer hypothesis, indicate that RT-speeding and RT-slowing are mediated by different mechanisms, and reveal subject-specific factors necessary for interpreting neural signals related to reinforcement learning of visual tasks.