Journal of Vision, September 2017, Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract | August 2017
Predicting Scanpath Agreement during Scene Viewing using Deep Neural Networks
Author Affiliations
  • Zijun Wei
    Department of Computer Science, Stony Brook University
  • Hossein Adeli
    Department of Psychology, Stony Brook University
  • Minh Hoai
    Department of Computer Science, Stony Brook University
  • Gregory Zelinsky
    Department of Computer Science, Stony Brook University
    Department of Psychology, Stony Brook University
  • Dimitris Samaras
    Department of Computer Science, Stony Brook University
Journal of Vision August 2017, Vol. 17(10):749. doi: https://doi.org/10.1167/17.10.749
Abstract

Eye movements are a widely used measure of overt shifts of attention, but this measure is often limited by poor agreement in people's gaze, which can vary significantly in the context of free viewing. In this work we ask whether the level of scanpath agreement among participants during scene viewing, quantified using a modified version of the MultiMatch method (Dewhurst et al., 2012), can be predicted using a Deep Neural Network (DNN). Specifically, using image features extracted from the last convolutional layer of a DNN trained for object recognition, we found a linear weighting such that positive regressor weights indicated the presence of image features resulting in greater gaze agreement among viewers. Image regions corresponding to these features were then found by back-propagating the features to image space using the probabilistic Selective Tuning Attention model (Zhang et al., 2016, ECCV). Combining these regions from all positively weighted features yielded an activation map reflecting the image features important for predicting scanpath consistency among people freely viewing scenes. The model was trained on a randomly selected 80% of the MIT1003 dataset (Judd et al., 2009) and tested on the remaining 20%, repeated 10 times. We found that this linear regressor model was able to predict, for each image, the level of agreement in the viewers' scanpaths (r = 0.3, p < .01). Consistent with previous findings, in qualitative analyses we also found that text, faces, and bodies were especially important features for predicting gaze agreement. This work introduces a novel method for predicting scanpath agreement and for identifying the underlying image features important for creating agreement in collective viewing behavior. Future work will extend this approach to identify features of a target goal that are important for producing uniformly strong attentional guidance in visual search tasks.
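The abstract does not specify the network or the implementation details of the regression stage. As a rough illustrative sketch, not the authors' code, the Python below assumes a VGG-16 backbone standing in for the object-recognition DNN, global-average-pooled features from the last convolutional block, and precomputed per-image MultiMatch agreement scores; the helper names (conv_features, evaluate) are hypothetical.

```python
# Sketch of the feature-regression pipeline described in the abstract.
# Assumptions: VGG-16 backbone, global-average-pooled last-conv features,
# one precomputed MultiMatch agreement score per image.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit
from scipy.stats import pearsonr

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
backbone = vgg.features.to(device).eval()  # conv layers only

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def conv_features(image_path):
    """Global-average-pool the last conv layer into a 512-d vector."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    fmap = backbone(x)                     # (1, 512, 7, 7) for VGG-16
    return fmap.mean(dim=(2, 3)).squeeze(0).cpu().numpy()

def evaluate(image_paths, agreement_scores, n_repeats=10):
    """80/20 train/test split repeated n_repeats times, as in the abstract."""
    X = np.stack([conv_features(p) for p in image_paths])
    y = np.asarray(agreement_scores)       # per-image MultiMatch agreement
    splits = ShuffleSplit(n_splits=n_repeats, test_size=0.2, random_state=0)
    rs = []
    for train, test in splits.split(X):
        reg = LinearRegression().fit(X[train], y[train])
        r, _ = pearsonr(reg.predict(X[test]), y[test])
        rs.append(r)
    return np.mean(rs)                     # mean Pearson r across repeats
```

In the full method, features with positive regression weights would additionally be back-projected to image space with the probabilistic Selective Tuning model to localize the agreement-driving regions; that step is omitted from this sketch.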

Meeting abstract presented at VSS 2017
