October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
Intuitive Visual Communication Through Physical-Social Commonsense
Author Affiliations & Notes
  • Stephanie Stacy
    Department of Statistics, University of California - Los Angeles
  • Qingyi Zhao
    Department of Computer Science, University of California - Los Angeles
  • Max Kleiman-Weiner
    Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
    Department of Psychology, Harvard University
  • Tao Gao
    Department of Statistics, University of California - Los Angeles
    Department of Communication, University of California - Los Angeles
  • Footnotes
    Acknowledgements: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1650604.
Journal of Vision October 2020, Vol.20, 1517. doi:https://doi.org/10.1167/jov.20.11.1517

      Stephanie Stacy, Qingyi Zhao, Max Kleiman-Weiner, Tao Gao; Intuitive Visual Communication Through Physical-Social Commonsense. Journal of Vision 2020;20(11):1517. https://doi.org/10.1167/jov.20.11.1517.


      © ARVO (1962-2015); The Authors (2016-present)


Vision enables humans to communicate richly and flexibly. For example, pointing at the same object can mean “go there” or “avoid that”, two opposite propositions, depending on the context. We construct a computational cognitive model that emulates the generative process of sending and receiving ambiguous signals. We argue that visually grounded physical-social commonsense is the key to resolving ambiguity in communication. Physical commonsense constrains the scope of actions through the costs of interacting with the environment. Social commonsense treats observed behaviors as rational actions that maximize expected utility given an agent’s mind (e.g., beliefs, desires, and intentions). We augment the Theory-of-Mind (ToM) framework by treating signaling itself as a rational action. A signal’s meaning is then defined as the contents of the mind that rationally generated it, and is recovered through Bayesian inference. In multi-agent cooperation, the mind to be interpreted is a joint “we” mind, imagined separately by each agent. This “Imagined We” is constrained by the cooperative logic of being jointly committed to realizing a goal. Resolving overloaded signals (signals with more than one possible meaning) becomes possible through constraints from (a) cooperative logic, (b) utility maximization, and (c) rational ToM Bayesian inference. Our model captures nuanced effects in two scenarios from previous work in which humans cooperate under overloaded signaling conditions. (1) When agents ask for help in the presence of multiple objects, even toddlers can disambiguate the referent of the request. (2) When agents communicate exclusively through tokens in an environment, humans flexibly infer from context whether a token means “go to” or “avoid”. Our model accurately reproduces both the observed distribution of signals and the observed distribution of signal interpretations, with a single free parameter: the degree of rationality.
As a general framework enabling agents to communicate flexibly and act efficiently under ambiguity, Imagined We generates many novel predictions that can be empirically tested.
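The abstract does not give the model's equations. As a hedged illustration only, the following minimal sketch shows the generic pattern it describes: a signaler that chooses a signal by a softmax over expected utility, controlled by a single rationality parameter `beta`, and a receiver that inverts that choice with Bayes' rule to recover the goal in the signaler's mind. The goals, signals, and utility values (`UTILITY`, `PRIOR`, `target_A`, `point_A`, `silent`) are hypothetical; this is not the authors' "Imagined We" implementation and it omits the joint-commitment structure they describe.

```python
import math

# Hypothetical toy utilities, NOT taken from the abstract: the signaler's mind
# holds one of two goals, and each signal's value stands for the joint reward
# the team expects from the resulting joint action minus the cost of signaling.
UTILITY = {
    "target_A": {"point_A": 9.0, "point_B": -1.0, "silent": 5.0},
    "target_B": {"point_A": -1.0, "point_B": 9.0, "silent": 5.0},
}
PRIOR = {"target_A": 0.5, "target_B": 0.5}


def softmax(values, beta):
    """Boltzmann-rational choice: P(i) is proportional to exp(beta * values[i])."""
    top = max(values)  # shift by the max for numerical stability (beta >= 0)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def signaler_policy(utility, beta):
    """P(signal | goal): a signaler that approximately maximizes utility."""
    policy = {}
    for goal, by_signal in utility.items():
        signals = list(by_signal)
        probs = softmax([by_signal[s] for s in signals], beta)
        policy[goal] = dict(zip(signals, probs))
    return policy


def interpret(signal, utility, prior, beta):
    """Receiver's posterior P(goal | signal), proportional to P(signal | goal) * P(goal)."""
    policy = signaler_policy(utility, beta)
    unnormalized = {g: policy[g][signal] * prior[g] for g in prior}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}


if __name__ == "__main__":
    # beta (degree of rationality) is the only free parameter in this sketch.
    for beta in (0.0, 0.5, 2.0):
        print(beta, interpret("point_A", UTILITY, PRIOR, beta))
```

With `beta = 0` the signaler chooses at random, so the receiver learns nothing and the posterior equals the prior; as `beta` grows, `point_A` is read more confidently as evidence for `target_A`, while the symmetric `silent` signal stays uninformative.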

