Abstract
The recent success of artificial intelligence is largely based on reinforcement learning (RL), in which an agent acts to maximize expected reward. RL has deep roots in psychology as a model of animal behavior, and it captures the fact that human actions are driven by desires. Nevertheless, it misses a mental representation highlighted by more recent cognitive Theory-of-Mind (ToM) models: intention. Unlike desires, intentions form stable, partial plans of action concerning the future and demand “commitment” (Bratman, 1987). While holding conflicting desires is part of human nature (e.g., wanting to lose weight while enjoying food), intentions should always be coherent, stable, and admissible. Here we tested the predictions of RL and ToM in a visual navigation task involving conflicting desires. A human participant or an RL model controls an agent to reach one of two equally desirable restaurants on a 2D map. With low probability, random action noise can cause the agent to drift slightly. To test “commitment to an intention,” we created a special trial: once the agent clearly moves toward one restaurant, noise pushes it away so that the alternative restaurant becomes the better “rational” choice. As predicted, the RL agent showed no commitment, still pursuing the original restaurant in close to 0% of trials. In contrast, humans fought the noise and persistently pursued the original restaurant in 70% of trials. In addition, humans formed their commitments with deliberation. In the same task, we performed online prediction of the agent’s destination using well-established Bayesian ToM. The results showed that while the RL agent quickly displayed a preference, humans avoided revealing any preference early on. In conclusion, humans differ from RL agents in that they appreciate the gravity of commitment: they prefer not to rush into forming an intention but, once committed, remain committed despite setbacks. These results collectively demonstrate that intention is an intrinsic mental representation that can forcefully regulate human actions through commitment.
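To make the task structure concrete, below is a minimal sketch of the probe trial, assuming a Manhattan gridworld; the names (GOAL_A, run_probe_trial) and the hard-coded drift burst are hypothetical illustrations, not the study's actual environment. A greedy, value-maximizing policy re-selects whichever restaurant is currently closer (the RL-like prediction), while a committed policy sticks with its initial choice; only the greedy policy switches when the engineered drift makes the alternative restaurant closer.

```python
GOAL_A = (0, 0)    # restaurant A
GOAL_B = (12, 0)   # restaurant B: equally rewarding, slightly farther at start
START = (5, 6)

def dist(p, q):
    # Manhattan distance on the grid
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def step_toward(pos, goal):
    # Move one cell toward `goal`, vertical axis first.
    (x, y), (gx, gy) = pos, goal
    if y != gy:
        return (x, y + (1 if gy > y else -1))
    if x != gx:
        return (x + (1 if gx > x else -1), y)
    return pos

def run_probe_trial(committed):
    # In the real task the drift is random with low probability; here the
    # probe is hard-coded: a drift burst toward B begins once the agent
    # is clearly under way toward A.
    pos = START
    goal = min((GOAL_A, GOAL_B), key=lambda g: dist(pos, g))  # initial pick: A
    for t in range(100):
        if not committed:
            # RL-like value maximizer: re-select the closer goal every step
            goal = min((GOAL_A, GOAL_B), key=lambda g: dist(pos, g))
        pos = step_toward(pos, goal)
        if 3 <= t < 10:                 # engineered noise burst
            pos = (pos[0] + 1, pos[1])  # drift toward B
        if pos == GOAL_A:
            return "A"
        if pos == GOAL_B:
            return "B"
    return "timeout"

print("greedy (RL-like):", run_probe_trial(committed=False))  # reaches B
print("committed:       ", run_probe_trial(committed=True))   # reaches A
```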
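The online destination prediction can likewise be sketched as standard Bayesian goal inference with a Boltzmann-rational action likelihood, updating P(g | a_1:t) ∝ P(g) · ∏_t P(a_t | s_t, g). This is a generic illustration of that family of ToM models; BETA and all function names are assumptions rather than details taken from the study.

```python
import math

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
BETA = 2.0  # assumed inverse temperature (observer's rationality parameter)

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def action_likelihood(pos, action, goal):
    # Boltzmann-rational observer: actions that reduce distance to the
    # hypothesized goal are exponentially more likely under that goal.
    def score(a):
        dx, dy = ACTIONS[a]
        nxt = (pos[0] + dx, pos[1] + dy)
        return math.exp(-BETA * (manhattan(nxt, goal) - manhattan(pos, goal)))
    return score(action) / sum(score(a) for a in ACTIONS)

def update_belief(belief, pos, action, goals):
    # One filtering step: P(g | a_1:t) ∝ P(g | a_1:t-1) * P(a_t | s_t, g)
    new = {name: belief[name] * action_likelihood(pos, action, g)
           for name, g in goals.items()}
    z = sum(new.values())
    return {name: p / z for name, p in new.items()}

# Usage: an agent at (5, 5) steps right twice; belief shifts toward B.
goals = {"A": (0, 5), "B": (10, 5)}
belief = {"A": 0.5, "B": 0.5}
pos = (5, 5)
for a in ["right", "right"]:
    belief = update_belief(belief, pos, a, goals)
    dx, dy = ACTIONS[a]
    pos = (pos[0] + dx, pos[1] + dy)
    print(pos, {g: round(p, 3) for g, p in belief.items()})
```

Under an observer of this kind, an agent that heads straight for one restaurant reveals its destination within a few steps; this is how an early preference by the RL agent, or the early ambiguity of human trajectories, would register in the posterior.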