Matthew H. Tong, Mary M. Hayhoe; Modeling uncertainty and intrinsic reward in a virtual walking task. Journal of Vision 2014;14(10):5. doi: https://doi.org/10.1167/14.10.5.
© ARVO (1962-2015); The Authors (2016-present)
When acting in the natural world, humans must balance the demands of a variety of goals, such as controlling direction and avoiding obstacles, each of which may require information from different parts of the scene. One theory of how these competing demands are managed, based on reinforcement learning, was proposed by Sprague et al. (ACM Transactions on Applied Perception, 2007). It postulates that gaze targets are chosen depending on the intrinsic reward of the current behavioral goals and the uncertainty about the relevant environmental state. Actions, in turn, are chosen to maximize expected reward, which reflects the intrinsic importance of different behavioral goals for the subject. Despite the importance of the concept of reward in the control of both gaze and actions, there is limited evidence on how such intrinsic rewards control actions and contribute to momentary sensory-motor decisions. We measured both gaze and walking behavior in a virtual environment where subjects walked along a winding path, avoiding obstacles and collecting targets. Instructions varied across trials as to whether the obstacles and targets were relevant. We developed an avatar model, based on Sprague et al.'s algorithm, which performs the same task in the same conditions as the human subjects. The model can produce paths through the environment comparable to those taken by the humans, and the avatar's path varies with task instructions much as the humans' paths do. The model allows us to estimate the intrinsic reward value of different goals by applying Inverse Reinforcement Learning techniques to the routes that subjects take through the targets and obstacles (Rothkopf and Ballard, Biological Cybernetics, 2013). These intrinsic rewards reflect the variation in path as a consequence of different task instructions and may capture path variations between subjects.
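The arbitration idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or Sprague et al.'s published algorithm. It assumes three hypothetical task modules (path following, obstacle avoidance, target collection), each with its own value estimates over a shared set of actions. It also assumes a simplified gaze rule in which priority is the product of a module's reward weight and its current state uncertainty. All function names and numbers are invented for illustration.

```python
# Illustrative sketch only: a simplified, assumed form of reward- and
# uncertainty-based arbitration between task modules. All names and values
# below are hypothetical and are not taken from the abstract.

def choose_action(q_values, weights):
    """Pick the action with the highest reward-weighted sum of module values.

    q_values: one list per module, each holding a value for every action.
    weights:  one intrinsic reward weight per module.
    """
    n_actions = len(q_values[0])
    combined = [
        sum(w * q[a] for w, q in zip(weights, q_values))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: combined[a])


def choose_gaze(weights, uncertainties):
    """Pick the module to look at next.

    Assumed rule: priority = reward weight * state uncertainty, so gaze goes
    to the goal that is both important and poorly known.
    """
    priority = [w * u for w, u in zip(weights, uncertainties)]
    return max(range(len(priority)), key=lambda m: priority[m])


# Hypothetical modules: 0 = path following, 1 = obstacles, 2 = targets.
# Hypothetical actions: 0 = veer left, 1 = go straight, 2 = veer right.
q = [
    [0.2, 0.5, 0.1],  # path following prefers going straight
    [0.0, 0.1, 0.9],  # obstacle avoidance prefers veering right
    [0.6, 0.2, 0.1],  # target collection prefers veering left
]
weights = [1.0, 2.0, 0.5]          # assumed intrinsic reward weights
uncertainties = [0.3, 0.2, 0.9]    # assumed state uncertainty per module

action = choose_action(q, weights)
gaze = choose_gaze(weights, uncertainties)

# Setting the obstacle weight to zero mimics an instruction that obstacles
# are irrelevant; the chosen action changes accordingly.
action_ignoring_obstacles = choose_action(q, [1.0, 0.0, 0.5])
```

In this toy setting, changing the reward weights plays the role of changing the task instructions. Inverse reinforcement learning works in the opposite direction, estimating such weights from observed paths.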
Meeting abstract presented at VSS 2014