Abstract
Vision enables humans to communicate richly and flexibly. For example, pointing at the same object can mean either “go there” or “avoid that”, two opposite propositions, depending on the context. We construct a computational cognitive model that emulates the generative process of sending and receiving such ambiguous signals.
We argue that visually grounded physical-social commonsense is the key to resolving ambiguity in communication. Physical commonsense constrains the scope of plausible actions through the costs of interacting with the environment. Social commonsense treats observed behaviors as rational actions that maximize expected utility given an agent’s mind (e.g., beliefs, desires, and intentions). We augment the Theory-of-Mind (ToM) framework by treating signaling itself as a rational action. A signal’s meaning is then defined as the contents of the mind that rationally generated it, recovered by Bayesian inference.
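The inference described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper’s model: it assumes a Boltzmann-rational (softmax) signaler with rationality parameter `beta`, and the utilities, context names, and helpers (`softmax_policy`, `infer_intention`, `HEADING_AWAY`, `HEADING_TOWARD`) are hypothetical. It shows how the same pointing gesture can be read as “go to A” in one physical context and as “avoid A” in another.

```python
import math

def softmax_policy(utilities, beta):
    """Boltzmann-rational choice: P(a) is proportional to exp(beta * U(a))."""
    m = max(utilities.values())  # subtract the max for numerical stability
    weights = {a: math.exp(beta * (u - m)) for a, u in utilities.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

def infer_intention(signal, utility, prior, beta):
    """Bayesian inverse planning: P(mind | signal) is proportional to P(signal | mind) * P(mind)."""
    unnorm = {mind: softmax_policy(utility[mind], beta)[signal] * p
              for mind, p in prior.items()}
    z = sum(unnorm.values())
    return {mind: w / z for mind, w in unnorm.items()}

# Hypothetical utilities U(signal | intention) in two physical contexts.
# Listener walking away from object A: pointing at A helps only if A is the goal.
HEADING_AWAY = {
    "go_to_A": {"point_A": 1.0, "silent": -1.0},
    "avoid_A": {"point_A": -1.0, "silent": 0.0},
}
# Listener walking toward object A: pointing at A helps only if A must be avoided.
HEADING_TOWARD = {
    "go_to_A": {"point_A": -1.0, "silent": 0.0},
    "avoid_A": {"point_A": 1.0, "silent": -1.0},
}
PRIOR = {"go_to_A": 0.5, "avoid_A": 0.5}

if __name__ == "__main__":
    for name, ctx in [("heading away", HEADING_AWAY), ("heading toward", HEADING_TOWARD)]:
        print(name, infer_intention("point_A", ctx, PRIOR, beta=3.0))
```

With `beta=3.0`, the signal `point_A` favors `go_to_A` when the listener is heading away and `avoid_A` when the listener is heading toward A; lowering `beta` flattens both posteriors toward the prior.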
In multi-agent cooperation, the mind to interpret is a joint “we” mind, imagined separately by each agent. This “Imagined We” is constrained by the cooperative logic of being jointly committed to realizing a goal. Overloaded signals can therefore be resolved through the combined constraints of (a) cooperative logic, (b) utility maximization, and (c) rational ToM Bayesian inference.
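A receiver that imagines the same joint goal can then act by maximizing expected joint utility under its posterior over that goal. The sketch below is a hedged illustration, not the paper’s implementation: it assumes a shared utility equal to a common reward minus the receiver’s movement cost (standing in for physical commonsense), and the goal and action names, the numbers, and the helpers `expected_joint_utility` and `best_response` are hypothetical.

```python
def expected_joint_utility(action, posterior, joint_utility):
    """Average the shared ("we") utility of an action over beliefs about the joint goal."""
    return sum(p * joint_utility[goal][action] for goal, p in posterior.items())

def best_response(posterior, joint_utility):
    """The receiver picks the action that maximizes expected joint utility."""
    actions = list(next(iter(joint_utility.values())).keys())
    return max(actions, key=lambda a: expected_joint_utility(a, posterior, joint_utility))

# Hypothetical joint utilities: shared reward (10 if the action serves the joint goal,
# else 0) minus the receiver's movement cost (A costs 3 to reach, B costs 2).
JOINT_UTILITY = {
    "fetch_A": {"go_A": 10.0 - 3.0, "go_B": 0.0 - 2.0},
    "fetch_B": {"go_A": 0.0 - 3.0, "go_B": 10.0 - 2.0},
}

if __name__ == "__main__":
    print(best_response({"fetch_A": 0.8, "fetch_B": 0.2}, JOINT_UTILITY))
    print(best_response({"fetch_A": 0.5, "fetch_B": 0.5}, JOINT_UTILITY))
```

Under a confident posterior such as `{"fetch_A": 0.8, "fetch_B": 0.2}` the receiver goes to A; under a uniform posterior the lower movement cost tips the choice toward B. Because each agent runs the same computation on its own imagined “we”, a signaler could in principle use it to predict how a receiver will respond.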
Our model captures nuanced effects from two scenarios, proposed in previous work, in which humans cooperate under overloaded signaling. (1) When agents ask for help in the presence of multiple objects, even toddlers can disambiguate the referent of the request. (2) When agents communicate exclusively through tokens in an environment, humans flexibly infer from context whether a token means “go to” or “avoid”. Our model accurately reproduces both the observed distributions of signaling and those of signal interpretation, with a single free parameter: the degree of rationality. As a general framework that enables agents to communicate flexibly and act efficiently under ambiguity, Imagined We generates many novel, empirically testable predictions.
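A single rationality parameter could, for instance, be fit by maximum likelihood over observed choice counts. The sketch below is an assumption about the fitting procedure, not the paper’s reported method, and it runs on synthetic counts generated from a known `beta`, not on experimental data; `choice_probs`, `log_likelihood`, `fit_beta`, and the utilities are hypothetical.

```python
import math

def choice_probs(utilities, beta):
    """Softmax (Boltzmann) choice probabilities with rationality parameter beta."""
    m = max(utilities)  # subtract the max for numerical stability
    w = [math.exp(beta * (u - m)) for u in utilities]
    z = sum(w)
    return [x / z for x in w]

def log_likelihood(counts, utilities, beta):
    """Log-likelihood of observed choice counts under the softmax model."""
    return sum(c * math.log(p) for c, p in zip(counts, choice_probs(utilities, beta)))

def fit_beta(counts, utilities, grid):
    """Grid-search maximum-likelihood estimate of the single free parameter."""
    return max(grid, key=lambda b: log_likelihood(counts, utilities, b))

# Synthetic check (not experimental data): expected counts from a known beta.
UTILITIES = [1.0, 0.5, 0.0]             # hypothetical utilities of three signals
TRUE_BETA = 2.0
N = 1000
COUNTS = [round(N * p) for p in choice_probs(UTILITIES, TRUE_BETA)]
GRID = [i / 10 for i in range(1, 51)]   # candidate beta values 0.1 .. 5.0

if __name__ == "__main__":
    print(COUNTS, fit_beta(COUNTS, UTILITIES, GRID))
```

A coarse grid suffices here because, for fixed utilities, the softmax log-likelihood is concave in `beta`; the estimate should land close to `TRUE_BETA`.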