Vision Sciences Society Annual Meeting Abstract | September 2015
Functions Provide a Fundamental Categorization Principle for Scenes
Author Affiliations
  • Michelle Greene
    Stanford University, Department of Computer Science
  • Christopher Baldassano
    Stanford University, Department of Computer Science
  • Andre Esteva
    Stanford University, Department of Electrical Engineering
  • Diane Beck
    University of Illinois Urbana-Champaign, Department of Psychology
  • Li Fei-Fei
    Stanford University, Department of Computer Science
Journal of Vision, September 2015, Vol. 15, Issue 12, 572. doi: https://doi.org/10.1167/15.12.572
Abstract

How do we know that a kitchen is a kitchen by looking? Here we start with the intuition that although scenes consist of visual features and objects, scene categories should reflect how we use category information. In our daily lives, rather than asking ourselves whether we are in (for example) a “beach” scene, we tend to use scene category information for actions such as navigation and search. Because we act within scenes, we test the hypothesis that scene categories arise from scene functions. We collected a large-scale scene category similarity matrix (5 million trials) by asking observers simply to decide whether two images were from the same or different categories. To serve as a noise ceiling (the maximum possible correlation) for comparison among computational models, we used bootstrapping to compute the correlation among observers (r=0.75). Using the actions from the American Time Use Survey, we normed these scenes by which actions could be performed in each scene (1.4 million trials). We found a strong relationship between category distance and functional distance (r=0.50, or 66% of the maximum possible correlation given the noise ceiling). Furthermore, the function-based model outperformed alternative models of object-based similarity (r=0.33), visual features from a convolutional neural network (r=0.39), lexical distance between category names (r=0.27), and models of simple visual features such as color histograms (r=0.09), Gabor wavelets (r=0.07), and the GIST descriptor (r=0.11). Using hierarchical linear regression, we found that functions captured 85.5% of the overall explained variance, with nearly half of the explained variance captured only by functions, implying that the predictive power of the alternative models was largely due to their shared variance with the function-based model. These results challenge the dominant school of thought that visual features and objects are sufficient for categorization, and provide immediately testable hypotheses about the functional organization of the human visual system.
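As a methods note, the analysis above is a representational-similarity-style comparison: a behavioral category-distance matrix is correlated with each model's distance matrix, and a bootstrapped between-observer correlation sets the noise ceiling. The Python sketch below illustrates that logic only; the category and action counts, the distance metrics, the split-half bootstrap scheme, and all data are placeholder assumptions, not the study's actual choices.

    # Illustrative sketch (not the authors' code): correlate a behavioral
    # scene-category distance matrix with a function-based distance matrix,
    # and estimate a split-half noise ceiling. All data are random stand-ins.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n_categories = 100  # placeholder; the study's category count differs
    n_actions = 200     # placeholder for the ATUS action set size

    # Behavioral distance: 1 - P("same category") for each category pair.
    same_prob = rng.random((n_categories, n_categories))
    behavior = 1.0 - (same_prob + same_prob.T) / 2.0

    # Function-based distance: dissimilarity between per-category profiles
    # of which actions can be performed in each scene.
    profiles = rng.random((n_categories, n_actions))
    diffs = profiles[:, None, :] - profiles[None, :, :]
    functional = np.sqrt((diffs ** 2).sum(axis=-1))  # Euclidean, as an example

    def upper(m):
        # Vectorize the upper triangle; distance matrices are symmetric.
        i, j = np.triu_indices_from(m, k=1)
        return m[i, j]

    # Model-behavior correlation (the abstract reports r = 0.50 for functions).
    r_model, _ = spearmanr(upper(functional), upper(behavior))

    # Noise ceiling: correlate group halves over many random observer splits
    # (the abstract reports a between-observer ceiling of r = 0.75).
    n_observers = 20  # placeholder
    per_obs = behavior[None] + rng.normal(0.0, 0.1, (n_observers,) + behavior.shape)
    splits = []
    for _ in range(1000):
        order = rng.permutation(n_observers)
        half_a = per_obs[order[: n_observers // 2]].mean(axis=0)
        half_b = per_obs[order[n_observers // 2 :]].mean(axis=0)
        r_half, _ = spearmanr(upper(half_a), upper(half_b))
        splits.append(r_half)
    ceiling = float(np.mean(splits))

    print(f"model r = {r_model:.2f}; ceiling = {ceiling:.2f}; "
          f"fraction of ceiling = {r_model / ceiling:.0%}")

Reporting the model correlation as a fraction of the ceiling, as in the abstract's 66% figure, makes models comparable even when the behavioral data themselves are noisy.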

Meeting abstract presented at VSS 2015
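The 85.5% variance figure above reflects a hierarchical-regression decomposition: comparing the full model's R² against reduced models isolates variance unique to functions from variance shared with the object, CNN, and lexical models. Below is a minimal sketch of that decomposition on synthetic data; the predictor names and the simulated dependency are illustrative assumptions only.

    # Illustrative sketch (not the authors' code): hierarchical-regression
    # variance partitioning on synthetic data. Predictors stand in for the
    # vectorized distance matrices of each model.
    import numpy as np
    from numpy.linalg import lstsq

    rng = np.random.default_rng(1)
    n_pairs = 5000  # placeholder number of category pairs

    functions = rng.random(n_pairs)  # function-based distances
    objects_ = rng.random(n_pairs)   # object-based distances
    cnn = rng.random(n_pairs)        # CNN-feature distances
    # Assumed dependency, for demonstration only:
    behavior = 0.6 * functions + 0.2 * cnn + rng.normal(0.0, 0.3, n_pairs)

    def r_squared(predictors, y):
        # Ordinary least squares R^2 with an intercept column.
        X = np.column_stack([np.ones(len(y))] + list(predictors))
        beta, *_ = lstsq(X, y, rcond=None)
        return 1.0 - (y - X @ beta).var() / y.var()

    r2_full = r_squared([functions, objects_, cnn], behavior)
    r2_without_functions = r_squared([objects_, cnn], behavior)
    r2_functions_only = r_squared([functions], behavior)

    # Variance unique to functions vs. shared with the other models.
    unique = r2_full - r2_without_functions
    shared = r2_functions_only - unique
    print(f"full R^2 = {r2_full:.3f}")
    print(f"unique to functions = {unique:.3f}")
    print(f"shared with other models = {shared:.3f}")
    print(f"functions' share of explained variance = {r2_functions_only / r2_full:.1%}")

If the unique term stays large while the shared term absorbs most of the other predictors' fit, the alternative models add little beyond functions, which is the pattern the abstract reports.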
