August 2010
Volume 10, Issue 7
Vision Sciences Society Annual Meeting Abstract  |   August 2010
What's behind the box? Measuring scene context effects with Shannon's guessing game on indoor scenes
Author Affiliations
  • Michelle Greene
    Brigham and Women's Hospital
    Harvard Medical School
  • Aude Oliva
    Massachusetts Institute of Technology
  • Jeremy Wolfe
    Brigham and Women's Hospital
    Harvard Medical School
  • Antonio Torralba
    Massachusetts Institute of Technology
Journal of Vision August 2010, Vol.10, 1259. doi:https://doi.org/10.1167/10.7.1259
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Natural scenes are lawful, predictable entities: objects do not float unsupported, spoons are more often found with forks than with printers, and it makes little sense to search for toilets in dining rooms. Although visual context has often been manipulated in object and scene recognition studies, it has not yet been formally measured. Information theory specifies how much information is required to encode objects in a scene, assuming no contextual knowledge. We can then measure, in bits per object, the information benefit provided by human observers' contextual knowledge. We used a database of 100 indoor scenes containing 352 unique objects, labeled using the LabelMe tool. If all objects were equally probable in a scene, log2(352) = 8.46 bits per object would be required. Taking object frequency into account (e.g., chairs are more common than basketballs) would reduce this number only to 7.22 bits per object. To measure the information required by humans to represent objects in scenes, we adapted the guessing game proposed by Shannon (1951). Between 5% and 80% of the objects in each scene were occluded by opaque bounding boxes. Observers guessed the identity of each occluded object until the object was correctly named. More than 60% of objects were correctly guessed on the first try, because context massively constrains the identity of a hidden object (what might you guess was hidden next to a plate on a dinner table?). Fully 93% of objects were correctly guessed within 10 tries. Overall, we found that observers could represent the database with just 1.86 bits per object when 5-10% of objects were masked. Just 2.00 bits per object were needed even when the majority of objects were masked. This technique can be used to measure the redundancy provided by aspects of context such as scene category, object density, and object consistency.
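The bit counts above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the guess data are invented toy values, and the abstract does not say which estimator was applied to the guessing data. The sketch uses the entropy of the guess-number distribution, which Shannon (1951) gives as an upper bound on the information per item.

```python
import math
from collections import Counter


def uniform_bits(n_labels):
    """Bits per object if all n_labels object labels are equally probable."""
    return math.log2(n_labels)


def entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def guess_upper_bound_bits(guess_numbers):
    """Entropy of the guess-number distribution (assumed estimator).

    guess_numbers[i] is the number of guesses an observer needed to name
    occluded object i (1 = correct on the first try).
    """
    return entropy_bits(Counter(guess_numbers).values())


# No-context baseline from the abstract: 352 unique object labels.
print(round(uniform_bits(352), 2))

# Hypothetical toy data: guesses needed for eight occluded objects.
toy_guesses = [1, 1, 1, 1, 1, 2, 2, 4]
print(guess_upper_bound_bits(toy_guesses))
```

The frequency-based figure of 7.22 bits would correspond to calling `entropy_bits` on the per-label object counts of the database, which are not given in the abstract and so are not reproduced here.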

Greene, M., Oliva, A., Wolfe, J., & Torralba, A. (2010). What's behind the box? Measuring scene context effects with Shannon's guessing game on indoor scenes [Abstract]. Journal of Vision, 10(7):1259, 1259a, http://www.journalofvision.org/content/10/7/1259, doi:10.1167/10.7.1259.