Zijun Wei, Hossein Adeli, Minh Hoai, Gregory Zelinsky, Dimitris Samaras; Predicting Scanpath Agreement during Scene Viewing using Deep Neural Networks. Journal of Vision 2017;17(10):749. doi: 10.1167/17.10.749.
© ARVO (1962-2015); The Authors (2016-present)
Eye movements are a widely used measure of overt shifts of attention, but this measure is often limited by poor agreement among people's gaze patterns, which can vary substantially during free viewing. In this work, we ask whether the level of scanpath agreement among participants during scene viewing, quantified using a modified version of the MultiMatch method (Dewhurst et al., 2012), can be predicted using a Deep Neural Network (DNN). Specifically, using image features extracted from the last convolutional layer of a DNN trained for object recognition, we learned a linear regression from these features to scanpath agreement; positive regressor weights indicate image features associated with greater gaze agreement among viewers. Image regions corresponding to these features were then localized by back-propagating the features to image space using the probabilistic Selective Tuning Attention model (Zhang et al., 2016, ECCV). Combining these regions from all positively weighted features yielded an activation map reflecting the image features important for predicting scanpath consistency among people freely viewing scenes. The model was trained on a randomly selected 80% of the MIT1003 dataset (Judd et al., 2009) and tested on the remaining 20%; this procedure was repeated 10 times. This linear regression model predicted the level of agreement in the viewers' scanpaths for each image (r = 0.3, p < .01). Consistent with previous findings, qualitative analyses also showed that text, faces, and bodies were especially important features for predicting gaze agreement. This work introduces a novel method for predicting scanpath agreement and for identifying the underlying image features that create agreement in collective viewing behavior. Future work will extend this approach to identify features of a target goal that are important for producing uniformly strong attentional guidance in the context of visual search tasks.
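The regression and evaluation protocol summarized above can be sketched in a few lines. This is a minimal illustration under several assumptions, not the authors' implementation: each image is assumed to be represented by one pooled feature vector from the last convolutional layer; ridge regularization is swapped in for the unspecified linear regressor; the agreement scores are random synthetic stand-ins rather than MultiMatch values; and the per-feature back-projected maps (obtained in the abstract with the Selective Tuning model) are taken as given, with their combination assumed to be a weighted sum over positively weighted features. All function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (an assumed stand-in for the linear
    regressor); the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def repeated_split_eval(X, y, n_repeats=10, train_frac=0.8, alpha=1.0, seed=0):
    """Random 80/20 train/test splits repeated n_repeats times.
    Returns the test-set Pearson r and the learned weights for each split."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    rs, ws = [], []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        w, b = fit_ridge(X[train], y[train], alpha)
        rs.append(pearson_r(X[test] @ w + b, y[test]))
        ws.append(w)
    return np.array(rs), np.array(ws)


def combine_positive_maps(feature_maps, weights):
    """Hypothetical combination step: weighted sum of per-feature maps
    (shape d x H x W) over features with positive weight, rescaled to [0, 1]."""
    pos = weights > 0
    combined = np.tensordot(weights[pos], feature_maps[pos], axes=1)
    peak = combined.max()
    return combined / peak if peak > 0 else combined


# Synthetic stand-in data (NOT MIT1003): 1003 "images", each with a 64-d
# feature vector, and an agreement score that is linear in the features
# plus noise. Real features and MultiMatch scores would replace these.
rng = np.random.default_rng(42)
X = rng.normal(size=(1003, 64))
w_true = rng.normal(size=64)
y = X @ w_true + 0.5 * rng.normal(size=1003)

rs, ws = repeated_split_eval(X, y)
mean_w = ws.mean(axis=0)
print(f"test r: mean {rs.mean():.3f} over {len(rs)} splits")
print("positively weighted features:", np.flatnonzero(mean_w > 0))
```

Averaging the weights across the 10 splits before selecting positive features is one design choice among several; selecting features that are positive in every split would be a stricter alternative.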
Meeting abstract presented at VSS 2017