Abstract
Neurophysiological and psychophysical studies in primates and humans have demonstrated the pervasive role of reward in the learning of visuomotor behavior. Eye movements and hand movements to targets are executed so as to maximize reward. Moreover, reinforcement learning algorithms have been formulated that characterize the responses of dopaminergic neurons to reward-predicting stimuli and to reward delivery across learning. Common to all these experiments is that the reward structure of the task is explicitly controlled by the experimental setup. By comparing the observed behavior with normative models, one can infer how closely participants' performance approaches optimality. But how does this relate to visually guided activities in natural tasks? This question has been difficult to answer because the value functions underlying actions such as fixations or the control of walking direction are not known a priori. We present a new algorithm for Inverse Reinforcement Learning that can infer these value functions from observed human behavior. The algorithm exploits the inherent structure of a task by compactly representing the value functions with a small number of basis functions. It was applied to data collected from human subjects navigating in a virtual environment while approaching and avoiding objects. The value functions underlying these tasks are recovered for each participant, and the validity of the algorithm is verified by evaluating the RMS error between the actual trajectories and those simulated using the recovered parameters. Furthermore, the object rewards inferred from navigation are shown to be correlated with the proportions of fixations on the corresponding object classes (rho=0.72).
We conclude that Inverse Reinforcement Learning can infer the reward structure used by subjects in natural visuomotor tasks and can reveal the connection between eye movements and these rewards.
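The compact value-function representation mentioned above can be illustrated with a minimal sketch: a value function over 2-D positions expressed as a weighted sum of Gaussian basis functions centered on objects, with positive weights for objects to approach and negative weights for objects to avoid. All names, object locations, and weight values here are illustrative assumptions, not the paper's actual parameterization; in the actual algorithm the weights would be inferred from observed trajectories rather than fixed by hand.

```python
import numpy as np

def gaussian_basis(centers, width):
    """Return a function mapping a 2-D position to its basis activations."""
    def phi(pos):
        # Squared distance from the query position to each basis center.
        d2 = np.sum((centers - pos) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * width ** 2))
    return phi

# Hypothetical setup: two objects, one rewarding (approach) and one
# aversive (avoid). The weights stand in for the rewards that IRL
# would recover from behavior.
centers = np.array([[1.0, 0.0], [-1.0, 0.0]])  # assumed object locations
weights = np.array([1.0, -1.0])                # assumed reward weights
phi = gaussian_basis(centers, width=0.5)

def value(pos):
    """Value of a position as a linear combination of basis activations."""
    return float(weights @ phi(np.asarray(pos, dtype=float)))
```

Because the value function is linear in a handful of weights, inferring it reduces to estimating those few parameters, which is what makes the inverse problem tractable.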