Vision Sciences Society Annual Meeting Abstract  |   May 2008, Volume 8, Issue 6
Comparison of gist models in rapid scene categorization tasks
Author Affiliations
  • Christian Siagian
    Computer Science Department, University of Southern California
  • Laurent Itti
    Computer Science Department, University of Southern California
Journal of Vision May 2008, Vol.8, 734. doi:10.1167/8.6.734
Christian Siagian, Laurent Itti; Comparison of gist models in rapid scene categorization tasks. Journal of Vision 2008;8(6):734.

© ARVO (1962-2015); The Authors (2016-present)
The capacity of humans to perform a number of complex visual tasks, such as scene categorization and object detection, in as little as 100 ms has been attributed to their ability to rapidly extract the gist of a scene. Existing models of gist utilize various types of low-level features: color (Ulrich & Nourbakhsh 2001), Fourier component profiles (Oliva & Torralba 2001), textures (Renninger & Malik 2004), steerable wavelets (Torralba et al. 2003), and a combination of these (Siagian & Itti 2007). Some of the methods compute feature histograms from the whole image, while others encode coarse spatial information by using a predefined grid system. Here, we systematically compare gist models on categorization tasks of increasing difficulty, investigating how well these low-level features can describe complicated real-world scenes. Using three visually distinct outdoor test sites - a building complex (26,368 training images, 13,965 testing images), a park full of trees (66,291/26,397 images), and a spacious open-field area (82,747/34,711 images) - we first ask which scene belongs to which site. As a baseline for comparison with other models, we use the classification rate of our combination model (95% success). We then divide each site into nine distinct segments to test finer classification ability (baseline of 85% success). Finally, we divide the segments into smaller geographical regions, making scene classification even harder as the regions become more visually similar; this, in turn, forces the competing systems to look for detailed attributes to exploit. Our hypothesis is that each particular system (or, more importantly, the features it uses) will be able to distinguish some segments but not others.
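To illustrate the grid-based spatial encoding mentioned above - pooling low-level features over a predefined grid rather than the whole image - here is a minimal toy sketch in Python with NumPy. The grid size, the choice of a plain per-channel mean as the pooled feature, and all names are illustrative assumptions, not the features or parameters of any of the cited models.

```python
import numpy as np

def grid_gist(image, grid=(4, 4)):
    """Toy 'gist' descriptor: the per-channel mean over each cell of a
    coarse grid partition of the image. This is an illustrative sketch
    of grid-based spatial pooling, not any specific published model."""
    h, w, c = image.shape
    gy, gx = grid
    feats = []
    for i in range(gy):
        for j in range(gx):
            cell = image[i * h // gy:(i + 1) * h // gy,
                         j * w // gx:(j + 1) * w // gx]
            feats.append(cell.mean(axis=(0, 1)))  # one mean per channel
    # Concatenating cell summaries preserves rough spatial layout,
    # unlike a single whole-image histogram.
    return np.concatenate(feats)  # length = gy * gx * c

# Example: a random 64x64 RGB image yields a 4*4*3 = 48-dim gist vector.
rng = np.random.default_rng(0)
vec = grid_gist(rng.random((64, 64, 3)))
print(vec.shape)  # (48,)
```

The resulting fixed-length vector could then be fed to any standard classifier to predict the site or segment label, which is the kind of categorization task the comparison above evaluates.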

Siagian, C., & Itti, L. (2008). Comparison of gist models in rapid scene categorization tasks [Abstract]. Journal of Vision, 8(6):734, 734a, doi:10.1167/8.6.734.
The authors gratefully acknowledge the contribution of NSF, HFSP, NGA, and DARPA.
