Abstract
It remains unclear whether humans recognize scenes by using “diagnostic objects” or by using global attributes of the scene. Based on our work, we argue that not all scenes are created equal and that different amounts of information are needed for recognition at different category levels. We hypothesize that there is a time course to scene processing: global scene information may be accessed early and can be used for superordinate-level categorization, but recognition of “diagnostic objects” takes longer and is necessary for basic-level categorization.

EXPERIMENT ONE: Subjects were asked to divide grayscale images into multiple sub-groups based on global visual similarity, while ignoring the object and scene content of the images. The resulting sub-groups could be categorized at the superordinate level as natural or man-made scenes, but categorization beyond that level was not achieved.

EXPERIMENT TWO: Subjects were shown grayscale images of scenes, each followed by two basic-level category choices, and were asked to select the category that best described the image. When subjects were allowed to view the images for 150 ms, performance was nearly perfect. When the viewing time was reduced to 30 ms, subjects made more errors categorizing man-made scenes than natural scenes. The distribution of errors revealed a higher level of confusion among indoor scenes.

EXPERIMENT THREE: A simple texture-based classifier was tested and compared to human performance at 30 ms. The classifier made similar errors but was outperformed by human subjects overall. This result makes clear that subjects use more than just global texture to perform basic-level scene classification at 30 ms.

CONCLUSIONS: Global properties of a scene, such as texture and shape, may be instructive in the early, superordinate-level categorization of scenes. Basic-level categorization may require more time and more information, such as the processing of “diagnostic objects”.