September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | August 2017
Modeling Sensorimotor Behavior through Modular Inverse Reinforcement Learning with Discount Factors
Author Affiliations
  • Ruohan Zhang
    Department of Computer Science, The University of Texas at Austin
  • Shun Zhang
    Computer Science and Engineering, University of Michigan, Ann Arbor
  • Matthew Tong
    Center for Perceptual Systems, The University of Texas at Austin
  • Mary Hayhoe
    Center for Perceptual Systems, The University of Texas at Austin
  • Dana Ballard
    Department of Computer Science, The University of Texas at Austin
Journal of Vision August 2017, Vol. 17, 1267. doi: https://doi.org/10.1167/17.10.1267
Abstract

In the context of natural behaviors, humans must gather information to choose among competing task demands, such as avoiding obstacles or heading toward a target. The brain's neural reward machinery has been implicated in these action choices, and a technique called Inverse Reinforcement Learning (IRL) can be used to estimate reward functions from behavioral data. A frequently overlooked variable in IRL is the discount factor: how much a future reward matters compared to the current reward. If future rewards are too heavily discounted, a person could overlook a future obstacle, even though it incurs a large negative reward. We argue that the reward, together with the discount factor, defines a value surface for a single task: the reward controls the maximum height of the surface, and the discount factor controls how fast the reward decreases over time or distance, i.e., the shape of the surface. Value surfaces are computationally easy to compose, so multi-task behaviors can be modeled by combining these surfaces. This leads naturally to a divide-and-conquer approach to IRL, called modular IRL, which estimates the relative rewards of subtasks. We extend previous modular IRL models (Rothkopf and Ballard, 2013) to also estimate the discount factor, and we validate the approach theoretically and experimentally, through computer simulations and human experiments. We collect human navigation data in a virtual reality environment in which subjects are instructed to perform a combination of following a path, collecting targets, and avoiding obstacles. We show that the rewards and discount factors estimated by our algorithm reflect the task instructions and accurately predict human actions (average angular difference = 24°). Furthermore, with only two parameters per objective (a reward and a discount factor), a virtual agent can reproduce long, human-like navigation trajectories through the environment. We conclude that modular IRL with learned discount factors could be a powerful model of multi-task sensorimotor behavior.
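To make the value-surface idea concrete, the following is a minimal sketch, not the authors' implementation, of how per-module surfaces could be composed. The module names, reward magnitudes, and discount factors below are hypothetical placeholders; each module's value is assumed to peak at its reward and decay geometrically with distance, and the composed value of a state is taken as the sum over modules.

import numpy as np

# Hypothetical module parameters: (reward, discount factor).
# The reward sets the height of the module's value surface;
# the discount factor sets how quickly that value decays with distance.
MODULES = {
    "follow_path":    (1.0, 0.95),
    "collect_target": (2.0, 0.90),
    "avoid_obstacle": (-3.0, 0.70),
}

def module_value(reward, gamma, distance):
    # Value surface for one module: reward discounted geometrically
    # by the number of steps (distance) to the module's object.
    return reward * gamma ** distance

def composed_value(distances):
    # Additive composition of the module surfaces; `distances` maps
    # each module name to the agent's distance from that module's object.
    return sum(
        module_value(r, g, distances[name])
        for name, (r, g) in MODULES.items()
    )

# Example: an agent 3 steps from the path, 5 from a target, 2 from an obstacle.
print(composed_value({"follow_path": 3, "collect_target": 5, "avoid_obstacle": 2}))

In this sketch, choosing the action that moves toward the highest composed value trades off the subtasks automatically; fitting the per-module rewards and discount factors to observed trajectories is the role of the modular IRL procedure described in the abstract.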

Meeting abstract presented at VSS 2017
