September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2019
Multiple-object Control Predicts Movements of Attention During Free Viewing
Author Affiliations & Notes
  • Yupei Chen
    Department of Psychology, Stony Brook University
  • Gregory Zelinsky
    Department of Psychology, Stony Brook University
    Department of Computer Science, Stony Brook University
Journal of Vision September 2019, Vol.19, 269d. doi:
      © ARVO (1962-2015); The Authors (2016-present)


People spend a significant amount of their time freely viewing the world in the absence of a task. The dominant class of models attempting to explain this free-viewing behavior computes saliency, a measure of local feature contrast in an image, to obtain a strictly bottom-up attention priority map. Our contention is that the directionality of attention control may be exactly opposite: free viewing may be guided by a top-down control process that we refer to as multiple-object search. Unlike standard search, in which there is typically only a single target, multiple-object search distributes the target goal over several objects, thereby diluting the contribution of any one object and creating a diffuse object-priority signal. To compute this signal we borrowed computer vision methods for localizing a trained object class in an image by backpropagating activity from a high layer of a deep network to lower layers closer to the pixel space. Several object-localization methods exist, but we chose STNet (Biparva & Tsotsos, 2017) because it is inspired by the brain’s attention mechanism. Using STNet we computed an object localization map for each of the 1000 ImageNet categories, which we averaged to create one top-down objectness map. We evaluated our method by predicting the free-viewing fixations in the MIT-ICCV dataset of 1003 scenes. For each scene, the location of maximum object-map activity was selected for fixation, followed by spatial inhibition and the iterative selection of the next most active location until six-fixation scanpaths were obtained. We also obtained scanpath predictions from several bottom-up saliency models. Using vector similarity for scanpath comparison, we found that predictions from objectness maps were as good as those from saliency maps, with the best predictions obtained by combining the two. This suggests that top-down attention control signals originating from learned object categories may influence even ostensibly task-free viewing behavior.
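The scanpath-generation step described above (pick the map maximum, apply spatial inhibition, repeat until six fixations) can be sketched as a short greedy loop. The sketch below is illustrative only, not the authors' implementation: the function name, the disc-shaped inhibition region, its radius, and the equal-weight averaging of the objectness and saliency maps are all assumptions made for the example.

```python
import numpy as np

def predict_scanpath(priority_map, n_fixations=6, inhibition_radius=2):
    """Greedy scanpath prediction: repeatedly fixate the maximum of a
    priority map, then spatially inhibit a disc around that location
    (inhibition of return) before selecting the next fixation.
    Illustrative sketch only; parameters are assumptions."""
    pmap = priority_map.astype(float).copy()
    h, w = pmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    scanpath = []
    for _ in range(n_fixations):
        # select the currently most active location
        y, x = np.unravel_index(np.argmax(pmap), pmap.shape)
        scanpath.append((int(y), int(x)))
        # suppress a disc around the chosen location so it is not re-selected
        mask = (ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2
        pmap[mask] = -np.inf
    return scanpath

# Toy example: combine a hypothetical objectness map and saliency map
# by simple averaging, then predict a two-fixation scanpath.
objectness = np.zeros((10, 10)); objectness[2, 3] = 1.0
saliency = np.zeros((10, 10)); saliency[7, 7] = 0.8
combined = 0.5 * objectness + 0.5 * saliency
print(predict_scanpath(combined, n_fixations=2))  # → [(2, 3), (7, 7)]
```

The first fixation lands on the stronger objectness peak; after that peak is inhibited, the second fixation moves to the saliency peak, mirroring how a combined priority map can trade off the two signals.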

