James Hays, Jianxiong Xiao, Krista Ehinger, Aude Oliva, Antonio Torralba; Scene categorization and detection: the power of global features. Journal of Vision 2010;10(7):1222. doi: https://doi.org/10.1167/10.7.1222.
© ARVO (1962-2015); The Authors (2016-present)
Scene recognition poses a different set of challenges from object recognition. Like objects, scenes are composed of parts, but whereas objects impose strong constraints on the spatial distribution of their parts, scene elements are governed by much weaker spatial constraints. Recently, a number of approaches have tackled scene classification using global image features instead of encoding the objects within the scene. An important question is: what performance can be achieved using global image features? In this work we select and design several state-of-the-art computer vision algorithms and evaluate them on two datasets. First, we use the 15 scene categories dataset, a standard computer vision benchmark for scene recognition. Using a mixture of features and 100 training examples per category, we achieve 88.1% classification accuracy (human performance is 95%; the best prior computational performance is 81.2%). For a more challenging and informative test, we use a new dataset containing 398 scene categories. This dataset is dramatically larger and more diverse than previous datasets and allows us to establish a new performance benchmark. Using a variety of global image features and 50 training examples per category, we achieve 34.4% classification accuracy (chance is only 0.2%). With such a large number of classes, we can examine the common confusions between scene categories, measure how similar the confused categories are to the target scenes, and reveal which classes are hardest to distinguish using global features. In addition, we introduce the concept of scene detection (detecting scenes embedded within larger scenes) in order to evaluate computational performance with a finer-grained, local scene representation.
Finding new global scene representations that significantly improve performance is important because it validates the usefulness of a parallel and complementary path for scene understanding, one that can provide context for object recognition.
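The evaluation quantities mentioned in the abstract (classification accuracy, chance level for a given number of categories, and the most common confusions between scene categories) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, accuracy here is the plain fraction of correctly labeled test images (some benchmarks instead report the mean of per-class rates), chance is assumed to be uniform random guessing, and the labels are made-up toy data rather than results from the study.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of test images whose predicted category equals the true one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def chance_level(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing (an assumed definition)."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


def top_confusions(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], k: int = 3
) -> List[Tuple[Tuple[Hashable, Hashable], int]]:
    """Most frequent (true, predicted) pairs among misclassified images."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Hypothetical toy labels, not data from the study.
y_true = ["kitchen", "kitchen", "kitchen", "bedroom", "living_room", "coast"]
y_pred = ["kitchen", "living_room", "living_room", "bedroom", "bedroom", "coast"]

print(accuracy(y_true, y_pred))             # 0.5
print(top_confusions(y_true, y_pred, k=1))  # [(('kitchen', 'living_room'), 2)]
print(chance_level(15))                     # chance for a 15-category benchmark
```

Counting (true, predicted) pairs directly, rather than building a full confusion matrix, keeps the sketch short; with hundreds of categories the same counts can be read off the off-diagonal entries of a confusion matrix.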