September 2015
Volume 15, Issue 12
Vision Sciences Society Annual Meeting Abstract  |   September 2015
Attention in Low Resolution: Learning Proto-Object Representations with a Deep Network
Author Affiliations
  • Chengyao Shen
    Graduate School for Integrative Science and Engineering, National University of Singapore; Department of Electrical and Computer Engineering, National University of Singapore
  • Xun Huang
    School of Computer Science and Engineering, Beihang University
  • Qi Zhao
    Department of Electrical and Computer Engineering, National University of Singapore
Journal of Vision September 2015, Vol.15, 898. doi:https://doi.org/10.1167/15.12.898
© ARVO (1962-2015); The Authors (2016-present)

Abstract

While previous research on eye fixation prediction has typically relied on integrating low-level features (e.g., color, edges) to form a saliency map, it has recently been found that the structural organization of these features into perceptual objects (proto-objects) can play a significant role, often a more important one than the low-level features themselves. In this work, we present a computational framework based on a deep network to demonstrate that proto-object representations can be learned naturally from low-resolution image patches taken from fixation regions. We advocate low-resolution inputs for three reasons: (1) Stimuli that trigger eye movements usually fall in parafoveal or peripheral regions of the retina, which have lower resolution than the fovea. (2) People can perceive or recognize objects well even at low resolution. (3) Fixations on lower-resolution images can predict fixations on higher-resolution images. In the proposed computational model, we extracted multi-scale image patches at fixation regions from eye fixation datasets, resized them to low resolution, and fed them into a two-layer neural network. With layer-wise unsupervised feature learning, we found that many proto-object-like features, responsive to different shapes of object blobs, were learned in the second layer. Visualizations also show that these features are selective to potential objects in the scene, and that their responses, when combined with learned weights, predict eye fixations on the images well.
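The patch-preparation step described above (multi-scale crops at fixation locations, resized to low resolution) might be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not state the patch scales, the target resolution, or the resizing method, so the function name `extract_low_res_patches`, the scales `(32, 64, 128)`, the 16×16 output size, and the block-averaging downsampler are all assumptions chosen for the example. It also assumes a grayscale image and scales that are integer multiples of the output size.

```python
import numpy as np

def extract_low_res_patches(image, fixations, scales=(32, 64, 128), out_size=16):
    """Crop square patches centred on each fixation at several scales and
    downsample each to out_size x out_size by block averaging.

    image     : 2-D grayscale array (assumption; the abstract does not say)
    fixations : iterable of (row, col) pixel coordinates
    scales    : patch side lengths in pixels (hypothetical values)
    out_size  : side length of the low-resolution patch (hypothetical value)
    """
    h, w = image.shape
    patches = []
    for row, col in fixations:
        for s in scales:
            if s % out_size != 0:
                raise ValueError("each scale must be a multiple of out_size")
            top, left = row - s // 2, col - s // 2
            # Skip crops that would extend beyond the image border.
            if top < 0 or left < 0 or top + s > h or left + s > w:
                continue
            patch = image[top:top + s, left:left + s].astype(np.float64)
            # Block averaging: split the patch into out_size x out_size
            # blocks of f x f pixels and take the mean of each block.
            f = s // out_size
            low = patch.reshape(out_size, f, out_size, f).mean(axis=(1, 3))
            patches.append(low.ravel())
    if not patches:
        return np.empty((0, out_size * out_size))
    # One row per patch, ready to be used as network input vectors.
    return np.stack(patches)
```

Each row of the returned array is one flattened low-resolution patch. Rows like these would then serve as inputs to the first layer of the network; the unsupervised learning rule itself is not specified in the abstract and is left out here.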

Meeting abstract presented at VSS 2015
