Li Fei-Fei, Christof Koch, Asha Iyer, Pietro Perona; What do we see when we glance at a scene? Journal of Vision 2004;4(8):863. doi: 10.1167/4.8.863.
© ARVO (1962-2015); The Authors (2016-present)
What exactly do we see when we glance at a natural scene? And does what we see change as the glance becomes longer? We asked naïve subjects to report what they saw in briefly presented photographs. Our subjects received no specific information as to the content of each stimulus, and were asked to report what they saw in free-form text. Thus, our paradigm differs from previous studies in which subjects were cued before a stimulus was presented and/or probed with multiple-choice questions. Our experiment consisted of two stages. First, a group of 22 native English-speaking subjects was shown 100 novel gray-scale photographs foveally. The photographs contained a broad sample of real-life scenarios, both indoor and outdoor. Each presentation time was chosen at random from a set of 7 possible times (from 27 ms to 500 ms). A perceptual mask followed each photograph immediately. After each presentation, subjects reported what they had just seen as completely as possible. In the second stage, another group of sophisticated individuals who were not aware of the goals of the experiment was instructed to score each of the descriptions produced by the subjects in the first stage. Individual scores were assigned to different categories of the content: sensory data (edges, colored patches, etc.), objects, relations between objects, description of the overall scene (gist), completeness, and the writer's confidence. We find that for most subjects, perception of objects, object relations, and overall scenes improves sharply for display times between 50 ms and 80 ms. For longer display times, the quality of their reports improves minimally. We find little evidence to support a temporal order in the report of objects vs. “gist”. The identity of faces of famous people was among the last pieces of information to be perceived, with display times typically above 100 ms. The presence of animals, people, and vehicles was reported early, often with display times below 50 ms.