October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
How much time do you have? Introducing a multi-duration saliency model
Author Affiliations
  • Camilo Fosco
    Massachusetts Institute of Technology
  • Anelise Newman
    Massachusetts Institute of Technology
  • Patr Sukhum
    Harvard University
  • Yun Bin Zhang
    Harvard University
  • Aude Oliva
    Massachusetts Institute of Technology
  • Zoya Bylinskii
Journal of Vision October 2020, Vol.20, 1005. doi:https://doi.org/10.1167/jov.20.11.1005
Citation: Camilo Fosco, Anelise Newman, Patr Sukhum, Yun Bin Zhang, Aude Oliva, Zoya Bylinskii; How much time do you have? Introducing a multi-duration saliency model. Journal of Vision 2020;20(11):1005. https://doi.org/10.1167/jov.20.11.1005.

© ARVO (1962-2015); The Authors (2016-present)
What jumps out in a single glance at an image differs from what you might notice after closer inspection. Despite this, current computational models of visual saliency predict human gaze patterns at an arbitrary, fixed viewing duration (one image: one saliency map). This offers a limited view of the rich interactions between image content and gaze, and obscures the fact that different image content may be salient at different time points. In this paper we propose to capture gaze as a series of snapshots (one image: multiple saliency maps). Rather than aggregating individual scanpaths, we directly generate population-level saliency heatmaps for multiple viewing durations. Towards this goal, we turn to CodeCharts UI, a cost-effective interface for crowdsourcing gaze data without requiring an eye tracker. This interface provides precise control over timing, which allows us to gather attention patterns at different viewing durations. We collect the CodeCharts1K dataset, with attention data for 0.5, 3, and 5 seconds of free viewing on images from action, memorability, and out-of-context datasets. We find that gaze locations differ significantly across the three viewing durations but are consistent across participants within a duration, yielding multiple distinct heatmaps per image. Using insights from our analysis of human gaze data, we develop a temporally aware deep learning model of saliency that trains simultaneously on data from multiple viewing durations. Our computational model achieves competitive performance on the LSUN 2017 Saliency Prediction Challenge when tested at the same viewing duration used for collecting the ground-truth human data. Importantly, our model also simultaneously produces predictions at multiple viewing durations. We discuss how knowing what is salient over different viewing windows can inform image cropping, compression, and captioning applications.
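The "one image: multiple saliency maps" idea can be made concrete with a standard construction from the saliency literature: fixations pooled across participants at each viewing duration are accumulated on the image grid and blurred with a Gaussian to form a continuous, normalized heatmap per duration. The sketch below illustrates this under stated assumptions; the function name, the blur width (sigma), and the data layout are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def heatmaps_by_duration(gaze_points, image_shape, sigma=25.0):
    """Aggregate pooled gaze points into one saliency heatmap per duration.

    gaze_points: dict mapping viewing duration in seconds to a list of
                 (row, col) fixation coordinates pooled across participants.
    image_shape: (height, width) of the stimulus image.
    sigma:       Gaussian blur width in pixels (illustrative default).
    """
    h, w = image_shape
    heatmaps = {}
    for duration, points in gaze_points.items():
        counts = np.zeros((h, w), dtype=float)
        for r, c in points:
            counts[r, c] += 1.0            # accumulate fixation counts
        blurred = gaussian_filter(counts, sigma=sigma)  # Gaussian density
        if blurred.max() > 0:
            blurred = blurred / blurred.max()           # normalize to [0, 1]
        heatmaps[duration] = blurred
    return heatmaps

# Example with synthetic fixations for the three durations used in the paper.
maps = heatmaps_by_duration(
    {0.5: [(50, 50)], 3.0: [(50, 50), (100, 150)], 5.0: [(120, 60)]},
    image_shape=(200, 200),
)
```

Because each duration gets its own heatmap, downstream metrics (or a multi-duration model's training targets) can compare attention across the 0.5, 3, and 5 second viewing windows directly.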

