August 2014, Volume 14, Issue 10
Vision Sciences Society Annual Meeting Abstract
Towards a model for mid-level feature representation of scenes
Author Affiliations
  • Mariya Toneva
    Department of Computer Science, Yale University
  • Elissa Aminoff
    Center for the Neural Basis of Cognition, Carnegie Mellon University
  • Abhinav Gupta
    The Robotics Institute, School of Computer Science, Carnegie Mellon University
  • Michael Tarr
    Center for the Neural Basis of Cognition, Carnegie Mellon University
Journal of Vision August 2014, Vol. 14, 363.

Never Ending Image Learner (NEIL) is a semi-supervised learning algorithm that continuously pulls images from the web and learns relationships among them. NEIL has classified over 400,000 images into 917 scene categories using 84 dimensions, termed "attributes". These attributes roughly correspond to mid-level visual features whose differential combinations define a large scene space. As such, NEIL's small set of attributes offers a candidate model for the psychological and neural representation of scenes. To investigate this, we tested for significant similarities between the structure of scene space defined by NEIL and the structure of scene space defined by patterns of human BOLD responses as measured by fMRI. The specific scenes in our study were selected by reducing the number of attributes to the 39 that best accounted for variance in NEIL's scene-attribute co-classification scores. Fifty scene categories were then selected such that each category scored highly on a different set of at most 3 of the 39 attributes. From each scene category, we then selected the two images most representative of its high-scoring attributes, resulting in a total of 100 stimuli. Canonical correlation analysis (CCA) was used to test the relationship between measured BOLD patterns within the functionally-defined parahippocampal region and NEIL's representation of each stimulus as a vector of stimulus-attribute co-classification scores on the 39 attributes. CCA revealed significant similarity between the local structures of the fMRI data and the NEIL representations for all participants. In contrast, neither the entire set of 84 attributes nor 39 randomly-chosen attributes produced significant results using this CCA method.
Overall, our results indicate that subsets of the attributes learned by NEIL are effective in accounting for variation in the neural encoding of scenes; as such, they represent a first-pass compositional model of mid-level features for scene representation.
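The CCA comparison described above can be sketched with a minimal NumPy implementation. This is an illustrative sketch only: the matrix sizes (100 stimuli, 39 attributes, and a hypothetical 60-voxel BOLD pattern) follow the abstract where stated, but the 60-voxel count and the synthetic data are assumptions, not the study's data or analysis code.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the column spaces of X and Y.

    X: (n_samples, p) matrix, Y: (n_samples, q) matrix. Columns are
    mean-centered, orthonormalized via QR, and the singular values of
    the cross-product of the orthonormal bases are the correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding above 1

# Synthetic stand-ins (hypothetical sizes and values):
# 100 stimuli x 39 NEIL attribute scores, and a BOLD pattern over a
# hypothetical 60 voxels that shares linear structure with the attributes.
rng = np.random.default_rng(0)
attrs = rng.normal(size=(100, 39))          # stimulus-attribute scores
bold = attrs @ rng.normal(size=(39, 60))    # shared structure
bold += 0.5 * rng.normal(size=bold.shape)   # plus measurement noise
corrs = canonical_correlations(attrs, bold)
```

In practice, significance of the resulting correlations would be assessed with a permutation test (shuffling stimulus labels and recomputing), since with many voxels relative to stimuli the leading canonical correlations are inflated by chance.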

Meeting abstract presented at VSS 2014

