Abstract
A pervasive question in perceptual and motor learning concerns the conditions under which learning transfers. In reinforcement learning, an agent can learn either a policy (i.e., a mapping between states and actions) or a predictive model of future outcomes from which a policy can be computed online. The former is computationally less expensive but highly specific to the given task/goal, whereas the latter is computationally more expensive but allows the agent to select appropriate actions even for novel goals. Policy learning is appropriate when forward look-ahead is not required and the number of policies to be learned is small, while model learning is appropriate under the opposite conditions. Therefore, manipulating these factors in a given task should predictably alter the degree of transfer as well. The current study tests this hypothesis with a navigation task requiring subjects to steer an object through a novel flow field to reach visible targets as quickly as possible. We vary the predictive component of the task by manipulating the amount of control subjects have over the object: half steer the object for the entire duration of the experiment, which favors policy learning, while the rest lose control intermittently, which adds a look-ahead component to the task and favors model learning. We vary the number of policies to be learned by manipulating the number of target locations subjects must reach, where larger numbers are expected to favor model learning: half have only two target locations, while the rest have twelve. Performance on transfer tasks, in which the environment is held constant but the goal is altered, is better for subjects trained under conditions that favor model learning. These results suggest that designing training tasks that discourage simple policy learning is critical if generalization is desired.
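To make the policy/model distinction concrete, the following minimal Python sketch (purely illustrative and not part of the study; the one-dimensional flow field, dynamics, and parameter values are all assumptions) contrasts a fixed state-to-action policy tuned for a single goal with a learned forward model used for one-step look-ahead planning toward an arbitrary goal.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a 1-D "flow field" in which a
# steering action nudges the object, and the environment drifts it by a
# position-dependent flow.

def flow(state):
    """Assumed drift acting on the object at a given position."""
    return 0.1 * np.sin(state)

def step(state, action):
    """Assumed dynamics: next position after applying a steering action."""
    return state + action + flow(state)

# (1) Policy learning: a direct state -> action mapping, tuned for ONE fixed goal.
#     Cheap to evaluate, but of no use if the goal changes.
def make_policy(goal, gain=0.5):
    def policy(state):
        return gain * (goal - state)   # steer straight toward the trained goal
    return policy

# (2) Model learning: given the dynamics, plan online for ANY goal by
#     one-step look-ahead over candidate actions.
def model_based_action(state, goal, candidates=np.linspace(-1.0, 1.0, 41)):
    predicted = [step(state, a) for a in candidates]   # forward-model predictions
    errors = [abs(goal - p) for p in predicted]        # distance to the current goal
    return candidates[int(np.argmin(errors))]          # best action for this goal

state = 0.0
trained_policy = make_policy(goal=2.0)
print(trained_policy(state))                  # adequate for the trained goal...
print(model_based_action(state, goal=-3.0))   # ...but only the model handles a novel goal
```

Under these assumptions, the cached policy is cheaper per decision but must be relearned for every new goal, whereas the forward model supports transfer to novel goals at the cost of online computation.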