August 2010
Volume 10, Issue 7
Vision Sciences Society Annual Meeting Abstract | August 2010
Scene categorization and detection: the power of global features
Author Affiliations
  • James Hays
    Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
  • Jianxiong Xiao
    Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
  • Krista Ehinger
    Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
  • Aude Oliva
    Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
  • Antonio Torralba
    Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
Journal of Vision August 2010, Vol.10, 1222. doi:https://doi.org/10.1167/10.7.1222
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Scene recognition involves a different set of challenges from those posed by object recognition. Like objects, scenes are composed of parts, but whereas objects have strong constraints on the distribution of their parts, scene elements are governed by much weaker spatial constraints. Recently, a number of approaches have focused on the problem of scene classification using global features instead of encoding the objects within the scene. An important question is “what performance can be achieved using global image features?” In this work we select and design several state-of-the-art algorithms in computer vision and evaluate them using two datasets. First, we use the 15 scene categories dataset, a standard benchmark in computer vision for scene recognition tasks. Using a mixture of features and 100 training examples per category, we achieve 88.1% classification accuracy (human performance is 95%; the best prior computational performance is 81.2%). For a more challenging and informative test we use a new dataset containing 398 scene categories. This dataset is dramatically larger and more diverse than previous datasets and allows us to establish a new performance benchmark. Using a variety of global image features and 50 training examples per category, we achieve 34.4% classification accuracy (chance is only 0.2%). With such a large number of classes, we can examine the common confusions between scene categories, evaluate how similar the confused categories are to the target scenes, and reveal which classes are hardest to distinguish using global features. In addition, we introduce the concept of scene detection (detecting scenes embedded within larger scenes) in order to evaluate computational performance under a finer-grained local scene representation.
Finding new global scene representations that significantly improve performance is important as it validates the usefulness of a parallel and complementary path for scene understanding that can be used to provide context for object recognition.
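
The evaluation described above rests on three simple quantities: the chance level for a given number of categories, the classification accuracy, and the most frequent confusions between categories. The following is a minimal sketch of how these could be computed, not the authors' evaluation code. The function names and the toy labels are hypothetical and chosen only for illustration. Under the assumption of equally likely categories, chance for 398 categories is 1/398, roughly 0.25%, which the abstract reports as 0.2%.

```python
from collections import Counter


def chance_accuracy(num_categories):
    """Accuracy of uniform random guessing over equally likely categories."""
    return 1.0 / num_categories


def accuracy(true_labels, predicted_labels):
    """Fraction of test images whose predicted category matches the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def top_confusions(true_labels, predicted_labels, k=3):
    """Return the k most frequent (true, predicted) pairs among the errors."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common(k)


# Hypothetical toy labels for illustration only; not data from the study.
true_labels = ["forest", "forest", "beach", "beach", "coast", "coast", "street"]
predicted = ["forest", "park", "coast", "coast", "coast", "beach", "street"]

print(f"chance for 398 categories: {chance_accuracy(398):.2%}")
print(f"toy accuracy: {accuracy(true_labels, predicted):.2%}")
print(top_confusions(true_labels, predicted, k=2))
```

Counting (true, predicted) pairs only over the errors keeps the diagonal of the confusion matrix out of the ranking, so the result lists the category pairs that a classifier mixes up most often.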

Hays, J., Xiao, J., Ehinger, K., Oliva, A., & Torralba, A. (2010). Scene categorization and detection: the power of global features [Abstract]. Journal of Vision, 10(7):1222, 1222a, http://www.journalofvision.org/content/10/7/1222, doi:10.1167/10.7.1222.
Footnotes
Funded by NSF CAREER Award 0546262 to A.O. and NSF CAREER Award 0747120 to A.T.