Abstract
The last few years have brought us both large-scale image databases and the ability to crowd-source human data collection, allowing us to measure contextual statistics in real-world scenes (Greene, 2013). How much contextual information is there, and how efficiently do people use it? We created a visual analog of a guessing game suggested by Claude Shannon (1951) to measure the information that scenes and objects share. In our game, 555 participants on Amazon's Mechanical Turk (AMT) viewed scenes in which a single object was covered by an opaque bounding box. Participants were instructed to guess the identity of the hidden object until they were correct. Because participants were paid per trial and each trial terminated upon correctly naming the object, they were incentivized to guess as efficiently as possible. Using information-theoretic measures, we found that, given scene context, an object can be encoded with fewer than 2 bits, a level of redundancy even greater than that of English text. To assess the information contributed by scene category alone, we ran a second experiment in which the image was replaced by the scene category name. Participants still guessed more efficiently than the entropy of the object database would predict, suggesting that the majority of contextual knowledge is carried by the category schema. Taken together, these results suggest not only that scene categories carry a great deal of information about objects, but also that the human mind encodes this information efficiently.
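The abstract does not specify how entropy was estimated from participants' guesses; the following is a minimal illustrative sketch, assuming Shannon's (1951) guessing-game bounds and using hypothetical guess-rank frequencies (not data from this study).

import math

def guessing_game_bounds(q):
    """Return (lower, upper) bounds on entropy in bits, following Shannon (1951).

    q[i] is the fraction of trials on which the correct item was named on
    guess i+1; q is assumed to sum to 1 and to be in rank order (non-increasing).
    """
    # Upper bound: entropy of the guess-rank distribution itself.
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    # Lower bound: sum over ranks i of i * (q_i - q_{i+1}) * log2(i), with q_{N+1} = 0.
    lower = 0.0
    for i in range(1, len(q) + 1):
        q_i = q[i - 1]
        q_next = q[i] if i < len(q) else 0.0
        lower += i * (q_i - q_next) * math.log2(i)
    return lower, upper

# Hypothetical example: 60% of objects named on the first guess, 25% on the
# second, 10% on the third, 5% on the fourth.
low, high = guessing_game_bounds([0.60, 0.25, 0.10, 0.05])
print(f"{low:.2f} <= H <= {high:.2f} bits per object")  # prints 0.94 <= H <= 1.49

Under these made-up frequencies, the bounded entropy falls below 2 bits per object, which is the kind of comparison against the raw database entropy that the abstract describes.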
Meeting abstract presented at VSS 2017