Abstract
Research in visual scene understanding has been limited by a lack of large databases of real-world scenes. Databases used to study object recognition frequently contain hundreds of different object classes, but the largest available dataset of scene categories contains only 15 scene types. In this work, we present a semi-exhaustive database of 130,000 images organized into 900 scene categories, produced by cataloguing all of the place type or environment terms found in WordNet. We obtained human typicality ratings for all of the images in each category through an online rating task on Amazon's Mechanical Turk service, and used the ratings to identify prototypical exemplars of each scene type. We then used these prototype scenes as the basis for a naming task, from which we established the basic-level categorization of our 900 scene types. We also used the prototypes in a scene sorting task, and created the first semantic taxonomy of real-world scenes from a hierarchical clustering model of the sorting results. This taxonomy combines environments that have similar functions and separates environments that are semantically different. We find that man-made outdoor and indoor scene taxonomies are similar, both based on the social function of the scenes. Natural scenes, on the other hand, are primarily sorted according to surface features (snow vs. grass, water vs. rock). Because recognizing types of scenes or places poses different challenges from object classification -- scenes are continuous with each other, whereas objects are discrete -- large databases of real-world scenes and taxonomies of the semantic organization of scenes are critical for further research in scene understanding.
Funded by an NSF CAREER Award to A.O. (0546262) and an NSF CAREER Award to A.T. (0747120). K.A.E. is supported by an NSF Graduate Research Fellowship.