August 2010
Volume 10, Issue 7
Vision Sciences Society Annual Meeting Abstract  |   August 2010
Predicting object and scene descriptions with an information-theoretic model of pragmatics
Author Affiliations
  • Michael Frank
    Department of Brain and Cognitive Sciences, MIT
  • Avril Kenney
    Department of Brain and Cognitive Sciences, MIT
  • Noah Goodman
    Department of Brain and Cognitive Sciences, MIT
  • Joshua Tenenbaum
    Department of Brain and Cognitive Sciences, MIT
    Computer Science and Artificial Intelligence Lab, MIT
  • Antonio Torralba
    Computer Science and Artificial Intelligence Lab, MIT
  • Aude Oliva
    Department of Brain and Cognitive Sciences, MIT
Journal of Vision August 2010, Vol.10, 1241. doi:10.1167/10.7.1241
Abstract

A picture may be worth a thousand words, but its description will likely use far fewer. How do speakers choose which aspects of a complex image to describe? Grice's pragmatic maxims (e.g., “be relevant”, “be informative”) have served as an informal guide for understanding how speakers select which pieces of information to include in descriptions. We present a formalization of Grice's maxim of informativeness (“choose descriptions proportional to the number of bits they convey about the referent with respect to context”) and test its ability to capture human performance.

Experiment 1: Participants saw sets of four simple objects that varied on two dimensions (e.g., texture and shape) and were asked to give the relative probabilities of using two different adjectives (e.g., polka-dot vs. square) to describe a target object among the distractor objects. Participants' mean probabilities were highly correlated with the information-theoretic model's predictions for the relative informativeness of the two adjectives (r = .92, p < .0001).
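The informativeness computation for a display like the one in Experiment 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that an adjective applying to n of the N objects in the display conveys log2(N / n) bits about the target, and that choice probabilities are those bit values normalized over the adjectives that are true of the target. The function names and the four-object display are hypothetical.

```python
import math

def adjective_bits(adjective, context):
    """Bits conveyed by an adjective: log2(N / n), where n of the N objects match it.

    Assumption (not from the abstract): all objects are equally likely referents.
    """
    matches = sum(1 for obj in context if adjective in obj)
    return math.log2(len(context) / matches)

def choice_probabilities(target, context):
    """Normalize each true adjective's bits into a relative choice probability."""
    bits = {adj: adjective_bits(adj, context) for adj in target}
    total = sum(bits.values())
    if total == 0:
        # No adjective distinguishes the target; fall back to a uniform choice.
        return {adj: 1 / len(bits) for adj in bits}
    return {adj: b / total for adj, b in bits.items()}

# Hypothetical display: each object is a (texture, shape) pair of adjectives.
target = ("polka-dot", "square")
context = [
    target,
    ("polka-dot", "circle"),
    ("striped", "square"),
    ("striped", "square"),
]
probs = choice_probabilities(target, context)
print(probs)
```

In this toy display "polka-dot" applies to two of four objects (1 bit) and "square" to three of four (about 0.42 bits), so the sketch assigns "polka-dot" a relative probability of about 0.71.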

Experiment 2: Participants described street scenes from a database of hand-segmented and labeled images (LabelMe). In the context condition, participants were presented with a set of six scenes, one target and five distractors, and were asked to name five objects in the target scene so that another observer could pick that scene out of the set. In the no-context condition, another group of participants performed the same task without seeing the distractors. The information-theoretic model's predictions were strongly correlated with differences in object labeling between the context and no-context conditions (r = .67, p < .0001), suggesting that the model captures the effect of context on descriptor choice.
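The role of the distractor scenes in the context condition can be illustrated with a small sketch. The abstract does not say how the model was applied to labeled scenes, so this assumes, for illustration only, that an object label found in n of the N scenes conveys log2(N / n) bits about which scene is the target. The scene contents and function names below are hypothetical, not LabelMe data.

```python
import math

def label_bits(label, scenes):
    """Bits a label conveys about which scene is meant: log2(N / n).

    Assumption (not from the abstract): every scene in the set is equally likely.
    """
    containing = sum(1 for scene in scenes if label in scene)
    return math.log2(len(scenes) / containing)

def rank_labels(target, scenes):
    """Sort the target's labels from most to least informative (ties alphabetical)."""
    return sorted(target, key=lambda label: (-label_bits(label, scenes), label))

# Hypothetical annotations: one target scene and five distractors.
target = {"car", "building", "sky", "fountain", "tree"}
distractors = [
    {"car", "building", "sky", "road"},
    {"car", "building", "sky", "tree"},
    {"building", "sky", "road", "sign"},
    {"car", "sky", "tree", "road"},
    {"building", "sky", "car", "sign"},
]
scenes = [target] + distractors
ranking = rank_labels(target, scenes)
print(ranking)
```

Under these assumptions a label shared by every scene, such as "sky" here, conveys zero bits, while a label unique to the target conveys log2(6) bits. That gives one way to read why seeing the distractors could shift which objects speakers choose to name.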

Our results suggest that speakers' image descriptions conform to optimal pragmatic norms and that information theory can define norms for the linguistic compression of visual information. This constitutes a first step towards understanding how the visual world is captured and communicated by language users.

Frank, M., Kenney, A., Goodman, N., Tenenbaum, J., Torralba, A., & Oliva, A. (2010). Predicting object and scene descriptions with an information-theoretic model of pragmatics [Abstract]. Journal of Vision, 10(7):1241, 1241a, http://www.journalofvision.org/content/10/7/1241, doi:10.1167/10.7.1241.
Footnotes
 NSF DDRIG #0746251.