June 2006
Volume 6, Issue 6
Vision Sciences Society Annual Meeting Abstract  |   June 2006
Getting credit assignment right in visuo-motor behaviors
Author Affiliations
  • Dana H Ballard
    Dept. of Computer Science, University of Rochester
  • Constantin Rothkopf
    Dept. of Brain and Cognitive Science, University of Rochester
Journal of Vision June 2006, Vol.6, 349. doi:https://doi.org/10.1167/6.6.349

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Composite visuo-motor behaviors can be synthesized from simpler behaviors. For example, when walking down a sidewalk, a pedestrian may have the goals of staying on the sidewalk, avoiding other pedestrians, and picking up litter. [Sprague 03] showed in a virtual reality simulation that these individual behaviors can be learned by reinforcement learning. However, that simulation assumed that the rewards associated with the individual behaviors were known. In practice this assumption is unreasonable, as only the total reward for the composite behavior is likely to be available. Apportioning that total among the behaviors that earned it is a long-standing problem in learning, known as the credit assignment problem.

[Chang 03] showed that an estimate of the individual rewards can be obtained by assigning the total reward to each behavior and treating the variations in that reward as noise. This model made sense in their setting, in which the individual behaviors were embedded in different agents, but it introduced a problem: the resulting reward estimates were biased and could be suboptimal.
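To make the bias concrete, the following is a minimal sketch of a deliberately simplified reading of that idea: each active behavior treats the observed total reward as a noisy sample of its own reward and averages it. This is not the estimator used in [Chang 03]. The behavior names, the reward values, and the function `naive_estimates` are all hypothetical and chosen only for illustration.

```python
import random


def naive_estimates(true_rewards, steps=20000, seed=0):
    """Each active behavior averages the TOTAL reward as if it were its own.

    Assumption (not from the abstract): at every step a random non-empty
    subset of behaviors is active, and the only observable signal is the
    sum of their true rewards.
    """
    rng = random.Random(seed)
    names = list(true_rewards)
    est = {n: 0.0 for n in names}
    count = {n: 0 for n in names}
    for _ in range(steps):
        # Pick which behaviors are active on this step.
        active = rng.sample(names, rng.randint(1, len(names)))
        # Only the composite reward is observable.
        total = sum(true_rewards[n] for n in active)
        for n in active:
            count[n] += 1
            # Incremental mean of the total reward seen while n was active.
            est[n] += (total - est[n]) / count[n]
    return est


if __name__ == "__main__":
    # Hypothetical per-behavior rewards for the sidewalk example.
    true = {"stay_on_sidewalk": 1.0, "avoid_pedestrians": 2.0, "pick_up_litter": 0.5}
    for name, value in naive_estimates(true).items():
        print(f"{name}: true={true[name]:.2f} naive_estimate={value:.2f}")
```

With all-positive rewards, every estimate in this sketch ends up above the true value, because each behavior also absorbs the rewards earned by whichever behaviors happened to be active alongside it.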

We show that the credit assignment problem has a solution when the visuo-motor behaviors are all embedded in the same agent. Each behavior needs to know which other behaviors are simultaneously active. It can then keep a running estimate of its own share of the reward: its current estimate, adjusted by the total instantaneous reward minus the reward estimates of the concurrently active behaviors. Simulations show that, as long as the behaviors are updated in a random order, the estimated reward for each behavior converges to its true value.
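The update described above can be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: the step size `lr`, the way active subsets are sampled, the noise-free rewards, and the names `estimate_rewards` and the behavior labels are assumptions introduced for illustration.

```python
import random


def estimate_rewards(true_rewards, steps=20000, lr=0.1, seed=0):
    """Running per-behavior reward estimates from the total reward only.

    Assumptions (not from the abstract): a random non-empty subset of
    behaviors is active at each step, rewards are noise-free, and each
    estimate moves a fraction `lr` toward its target.
    """
    rng = random.Random(seed)
    names = list(true_rewards)
    est = {n: 0.0 for n in names}
    for _ in range(steps):
        # Pick which behaviors are active on this step.
        active = rng.sample(names, rng.randint(1, len(names)))
        # Only the composite (summed) reward is observable.
        total = sum(true_rewards[n] for n in active)
        # Update the active behaviors in a random order.
        rng.shuffle(active)
        for n in active:
            # Subtract what the concurrently active behaviors claim.
            others = sum(est[m] for m in active if m != n)
            # Move the current estimate toward the remaining share.
            est[n] += lr * ((total - others) - est[n])
    return est


if __name__ == "__main__":
    # Hypothetical per-behavior rewards for the sidewalk example.
    true = {"stay_on_sidewalk": 1.0, "avoid_pedestrians": 2.0, "pick_up_litter": 0.5}
    for name, value in estimate_rewards(true).items():
        print(f"{name}: true={true[name]:.2f} estimate={value:.4f}")
```

In this sketch the set of active behaviors varies from step to step, which is what lets the individual shares be separated; if the same behaviors were always active together, only their sum could be identified.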

Ballard, D. H., & Rothkopf, C. (2006). Getting credit assignment right in visuo-motor behaviors [Abstract]. Journal of Vision, 6(6):349, 349a, http://journalofvision.org/6/6/349/, doi:10.1167/6.6.349.
Footnotes
This work was supported by NIH grants EY05729 and RR09283.