December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract  |   December 2022
Artificial neural networks predict human eye movement patterns as an emergent property of training for object classification
Author Affiliations
  • Gustavo Santiago-Reyes
    Massachusetts Institute of Technology
  • Thomas O'Connell
    Massachusetts Institute of Technology
  • Nancy Kanwisher
    Massachusetts Institute of Technology
Journal of Vision December 2022, Vol.22, 4194. doi:
      Gustavo Santiago-Reyes, Thomas O'Connell, Nancy Kanwisher; Artificial neural networks predict human eye movement patterns as an emergent property of training for object classification. Journal of Vision 2022;22(14):4194.

      © ARVO (1962-2015); The Authors (2016-present)

Eye movements spatially sample information from the environment to support visual behavior. Artificial neural networks (ANNs) optimized on eye movements capture >90% of the explainable variance in static eye movement patterns (MIT/Tübingen Saliency Benchmark), and ANN reconstructions from fMRI responses predict eye movements (O'Connell & Chun, 2018). While this suggests that ANN-like representations could support spatial attention in humans, it is unclear how such representations are learned. Here, we test the hypothesis that ANNs optimized for object recognition incidentally learn features predictive of human eye movement behavior as an emergent property, without any direct optimization on eye movement data. We completed a large-scale analysis of eye movement predictivity using 1156 layers from 60 ANNs trained on ImageNet to predict fixation patterns from O'Connell & Walther (2011) and CAT2000 (Borji & Itti, 2015). Across ANNs and datasets, emergent eye movement predictivity, computed as Normalized Scanpath Saliency (NSS), shows an initial peak in early-middle layers (average NSS = 0.32, SEM = 0.06), with the highest predictivity coming from the final convolutional layers (average NSS = 0.63, SEM = 0.06), suggesting that eye-movement-predictive representations develop spontaneously alongside object recognition. However, the peak eye movement predictivity overall (NSS = 1.52, final layer of resnet18-contrastive-multiview) falls below the best model on the Saliency Benchmark (DeepGaze II, NSS = 1.96), indicating that emergent eye movement predictivity captures only a portion of the explainable behavioral variance. Additionally, we find that eye movement predictivity across the model zoo is not meaningfully correlated with neural predictivity in monkey V4 and IT (r < 0.21), showing that emergent eye movement predictivity captures variance independent of existing neural benchmarks.
We propose emergent eye movement predictivity as a novel quantitative benchmark to compare ANNs to human behavior.
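For readers unfamiliar with the metric, NSS scores a predicted saliency map by z-scoring it and averaging the normalized values at human fixation locations, so chance performance is 0 and higher is better. The sketch below illustrates this standard definition; the function name and array conventions (a 2-D saliency map and a same-shaped binary fixation mask) are illustrative, not taken from the study's code.

```python
import numpy as np

def normalized_scanpath_saliency(saliency_map, fixation_mask):
    """Normalized Scanpath Saliency (NSS).

    Z-score the predicted saliency map (zero mean, unit standard
    deviation), then average the normalized saliency values at the
    pixels where human observers fixated.
    """
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return s[fixation_mask.astype(bool)].mean()
```

A map that assigns high saliency to fixated pixels yields a large positive NSS; a map unrelated to the fixations averages to roughly 0.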

