September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Comparing human eye-tracking heatmaps with DNN saliency maps for faces at different spatial frequencies
Author Affiliations & Notes
  • Michal Fux
    The Department of Brain and Cognitive Sciences, MIT
  • Joydeep Monshi
    AI, Machine Learning and Computer Vision, GE Research, Niskayuna, NY, USA
  • Hojin Jang
    The Department of Brain and Cognitive Sciences, MIT
  • Charlotte H Lahey
    Keene State College
  • Suayb S Arslan
    The Department of Brain and Cognitive Sciences, MIT
  • Walter V Dixon III
    AI, Machine Learning and Computer Vision, GE Research, Niskayuna, NY, USA
  • Matthew Groth
    The Department of Brain and Cognitive Sciences, MIT
  • Pawan Sinha
    The Department of Brain and Cognitive Sciences, MIT
  • Footnotes
    Acknowledgements  This research is supported by ODNI and IARPA. The views are those of the authors and should not be interpreted as representing the official policies of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
Journal of Vision September 2024, Vol.24, 1329. doi:https://doi.org/10.1167/jov.24.10.1329
Citation: Michal Fux, Joydeep Monshi, Hojin Jang, Charlotte H Lahey, Suayb S Arslan, Walter V Dixon III, Matthew Groth, Pawan Sinha; Comparing human eye-tracking heatmaps with DNN saliency maps for faces at different spatial frequencies. Journal of Vision 2024;24(10):1329. https://doi.org/10.1167/jov.24.10.1329.
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Deep neural network (DNN)-based face recognition (FR) models have improved greatly over the past decades, achieving, or even exceeding, human-level accuracy under certain viewing conditions, such as frontal face views. However, as we reported at last year's meeting (XXX et al., 2023), humans outperform DNNs under challenging viewing conditions (e.g., large distances, non-frontal views). To shed light on potential explanations for these differences in the FR accuracy of humans and DNNs, we turned to eye-tracking paradigms to discern potentially important zones of information uptake for human observers and compare them with DNN-derived saliency maps. Despite the conceptual similarity between human eye-tracking heatmaps and DNN saliency maps, the literature contains few strategic efforts to quantitatively compare the two and to translate human gaze and attention strategies into improved machine performance. We obtained gaze-contingent (GC) human eye-tracking heatmaps and DNN saliency maps for faces under three stimulus conditions: images low-pass filtered for low spatial frequencies, high-pass filtered for high spatial frequencies, and at full resolution. Human participants saw two sequentially presented faces and were asked to determine whether the individuals depicted were siblings (images from Vieira et al., 2014) or whether the two images showed the same person (Stirling face database). While human eye-tracking heatmaps were collected during each presentation of a face image (sibling/Stirling), DNN saliency maps were derived from differences in the similarity score between the machine-computed face embeddings of pairs of face images, using an efficient correlation-based explainable AI approach. We present a characterization and comparison of humans' and DNNs' use of spatial frequency information in faces, and propose a model-agnostic translation strategy for improved face recognition performance, utilizing an efficient training approach to bring DNN saliency maps into closer register with human eye-tracking heatmaps.
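
To make the three stimulus conditions concrete, below is a minimal sketch of one common way to produce them: a Gaussian low-pass filter and its residual. The cutoff sigma, the grayscale [0, 1] input, and the function name are illustrative assumptions; the abstract does not specify the filtering parameters used in the study.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def spatial_frequency_conditions(image, sigma=8.0):
        # image: 2-D grayscale array scaled to [0, 1].
        # sigma (in pixels) is an illustrative cutoff, not the study's value.
        low = gaussian_filter(image, sigma=sigma)   # low-SF: Gaussian low-pass
        high = image - low + image.mean()           # high-SF: residual, re-centered
        return {"low": low, "high": high, "full": image}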

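The abstract's efficient correlation-based explainable-AI method is not detailed here. As a stand-in that is similar in spirit, the following sketch derives a pairwise saliency map by measuring how occluding each image patch changes the embedding-similarity score for a face pair. The embed callable (any FR model mapping an image to an embedding vector), patch size, and stride are all hypothetical.

    import numpy as np

    def pairwise_saliency(img_a, img_b, embed, patch=16, stride=8):
        # embed: hypothetical callable mapping an image array to a 1-D
        # embedding vector; patch/stride are illustrative choices.
        def cos(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

        emb_b = embed(img_b)
        base = cos(embed(img_a), emb_b)       # similarity of the intact pair
        h, w = img_a.shape[:2]
        sal = np.zeros((h, w))
        cnt = np.zeros((h, w))
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                occluded = img_a.copy()
                occluded[y:y + patch, x:x + patch] = img_a.mean()  # gray out one patch
                drop = base - cos(embed(occluded), emb_b)          # similarity lost
                sal[y:y + patch, x:x + patch] += drop
                cnt[y:y + patch, x:x + patch] += 1
        return sal / np.maximum(cnt, 1)       # mean similarity drop per pixel

Once both maps exist on a common grid, one simple way to quantitatively compare them is the saliency literature's correlation coefficient (CC): normalize each map and take the mean product. This is one plausible metric; the abstract does not name the one actually used.

    def heatmap_agreement(human_map, dnn_map):
        # Pearson-style correlation between two same-shape maps.
        h = (human_map - human_map.mean()) / (human_map.std() + 1e-8)
        d = (dnn_map - dnn_map.mean()) / (dnn_map.std() + 1e-8)
        return float((h * d).mean())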