Vassilios Christopoulos, Paul Schrater; Learning reward functions in grasping objects with position uncertainty via inverse reinforcement learning. Journal of Vision 2010;10(7):1084. doi: 10.1167/10.7.1084.
© ARVO (1962-2015); The Authors (2016-present)
Many aspects of visuomotor behavior have been explained by optimal sensorimotor control, which models actions as decisions that maximize the desirability of outcomes, where desirability is captured by an expected cost or utility assigned to each action sequence. Because costs and utilities quantify the goals of behavior, they are crucial for understanding action selection. However, for complex natural tasks like grasping, which involve applying forces to change the relative position of objects, modeling the expected cost poses significant challenges. We use inverse optimal control to estimate the natural costs for grasping an object with position uncertainty. In a previous study, we tested the hypothesis that people compensate for object position uncertainty in a grasping task by adopting strategies that produce a stable grasp at first contact, in essence using time efficiency as a natural cost function. Subjects reached for an object whose position was made uncertain by moving it with a robot arm while it was out of view. In accord with optimal predictions, subjects compensated by approaching the object along the direction of maximal position uncertainty, thereby maximizing the chance of successful object-finger contact. Although subjects' grasps were near optimal, the exact cost function they used is not clear. We estimated the unknown cost functions that subjects used to perform the grasping task from their movement trajectories. Our method computes the frequency with which trajectories passed through a grid of spatial locations in 2D space and uses the results to estimate the transition probability matrix. Formulating the grasping task as a Markov Decision Process (MDP) and assuming a finite state space as well as a finite set of actions, we can solve for the cost function under which the observed behavior is an optimal solution of the MDP. The estimated costs are consistent with a trade-off between efficient grasp placement and a low probability of object-finger collision.
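The grid-based estimation step can be illustrated with a minimal sketch. It assumes that each trajectory is a sequence of 2D positions, that the workspace is a square discretized into `n_bins` × `n_bins` cells, and that transitions are counted between consecutive samples. The function names, the grid resolution, and the action-agnostic form of the matrix are illustrative assumptions and are not taken from the study; an MDP formulation would typically condition the transitions on actions as well.

```python
import numpy as np

def discretize(points, lo, hi, n_bins):
    """Map 2D points of shape (N, 2) to flat cell indices on an n_bins x n_bins grid.

    lo and hi are the (assumed) lower and upper workspace bounds.
    """
    points = np.asarray(points, dtype=float)
    scaled = (points - lo) / (hi - lo) * n_bins
    # Points on or beyond the boundary are clipped into the outermost cells.
    cells = np.clip(scaled.astype(int), 0, n_bins - 1)
    return cells[:, 0] * n_bins + cells[:, 1]

def estimate_transitions(trajectories, lo, hi, n_bins):
    """Estimate a row-stochastic transition matrix from visit frequencies.

    Entry (i, j) is the empirical probability of moving from cell i to cell j
    between consecutive samples. Rows for never-visited cells stay all zero.
    """
    n_states = n_bins * n_bins
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        states = discretize(traj, lo, hi, n_bins)
        # Accumulate one count per observed (current cell, next cell) pair.
        np.add.at(counts, (states[:-1], states[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical toy data: two short trajectories on the unit square.
toy_trajectories = [
    [[0.1, 0.1], [0.6, 0.1], [0.6, 0.6]],
    [[0.1, 0.1], [0.1, 0.6], [0.6, 0.6]],
]
P = estimate_transitions(toy_trajectories, lo=0.0, hi=1.0, n_bins=2)
```

Given such a matrix and a finite action set, the cost function would then be recovered by an inverse reinforcement learning solver, which this sketch does not cover.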