Abstract
For human cooperation, jointly selecting one goal from multiple comparable goals and maintaining the team’s joint commitment to that goal pose a great challenge. By combining psychophysics and computational modeling, we demonstrate that visual perception can support spontaneous human joint commitment without any communication. We developed a real-time multi-player hunting task in which human hunters could team up with human or machine hunters to pursue prey in a 2D environment governed by Newtonian physics. Joint commitment is modeled through an “Imagined We” (IW) approach, wherein each agent uses Bayesian inference to infer the intention of “We”, an imagined supraindividual agent that controls all agents as parts of its body. This model is compared against a Reward Sharing (RS) model, which frames cooperation as reward sharing through multi-agent reinforcement learning (MARL). We found that both humans and IW hunters, but not RS hunters, could maintain high team performance by jointly committing to a single prey and coordinating to catch it, regardless of prey quantity or speed. Human observers also rated all hunters in both human and IW teams as contributing substantially to the catch, irrespective of their proximity to the prey, suggesting that their high-quality hunting resulted from sophisticated cooperation rather than individual strategies. IW hunters could cooperate not only with their own kind but also with humans, with human-IW teams mirroring the hunting performance and teaming experience of all-human teams. However, substituting human members with RS hunters reduced both performance and teaming experience. In conclusion, this study demonstrates that humans achieve cooperation through joint commitment, which enforces a single goal on the team, rather than merely motivating team members through reward sharing. By extending joint commitment theory to visually grounded cooperation, our research sheds light on how to build machines that cooperate with humans in an intuitive and trustworthy manner.