September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
Comparing Human and AI Attention in Visuomotor Tasks
Author Affiliations & Notes
  • Ruohan Zhang
    University of Texas at Austin
  • Sihang Guo
    University of Texas at Austin
  • Bo Liu
    University of Texas at Austin
  • Yifeng Zhu
    University of Texas at Austin
  • Dana Ballard
    University of Texas at Austin
  • Peter Stone
    University of Texas at Austin
  • Mary Hayhoe
    University of Texas at Austin
  • Footnotes
    Acknowledgements  NIH EY05729
Journal of Vision September 2021, Vol.21, 2056. doi:

      Ruohan Zhang, Sihang Guo, Bo Liu, Yifeng Zhu, Dana Ballard, Peter Stone, Mary Hayhoe; Comparing Human and AI Attention in Visuomotor Tasks. Journal of Vision 2021;21(9):2056.


      © ARVO (1962-2015); The Authors (2016-present)


Deep reinforcement learning (RL) is a powerful machine learning tool for training AIs to solve visuomotor tasks. Recently, these algorithms have achieved human-level performance in tasks such as video games. However, the trained models are often difficult to interpret, because they are represented as deep neural networks that map raw pixel inputs directly to decisions. It is therefore unclear whether AIs and humans solve these tasks in similar or different ways, and why AIs and humans perform relatively well or poorly on certain tasks. To understand human visuomotor behaviors in Atari video games, Zhang et al. (2020) collected a dataset of human eye-tracking and decision-making data. Meanwhile, Greydanus et al. (2018) proposed a method to interpret deep RL agents by visualizing the "attention" of RL agents in the form of saliency maps. Combining these two works allows us to shed light on the inner workings of RL agents by analyzing the pixels they attend to during task execution and comparing them with the pixels attended to by humans. We ask: 1) How similar are the visual features learned by RL agents and humans when performing the same tasks? 2) How do similarities and differences in these learned features correlate with RL agents' performance? We show how the attention of RL agents develops and becomes more human-like during learning, and how varying the parameters of the reward function affects the learned attention. Additionally, compared to humans, RL agents still make simple perceptual mistakes (e.g., failing to attend to important objects) and generalize poorly to unfamiliar situations. These insights have the potential to inform novel algorithms for closing the performance gap between RL agents and human experts. They also indicate the relative advantages and disadvantages of humans, compared to AIs, in performing these visuomotor tasks.
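The abstract gives no implementation details. As a rough illustration only, the sketch below computes perturbation-based saliency in the spirit of Greydanus et al. (2018): blur a Gaussian-weighted region of the input frame and measure how much the policy's output changes. It then compares the agent's saliency grid with a human gaze density map. Assumptions: a single grayscale frame (real Atari agents usually receive stacked frames), a hypothetical `policy` callable, illustrative values for the mask width, blur width, and stride, and two common saliency-comparison metrics (KL divergence and Pearson correlation) that the abstract does not say the authors used.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (numpy only)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" mode restores the original size.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, tmp)


def perturbation_saliency(policy, frame, stride=5, mask_sigma=5.0, blur_sigma=3.0):
    """Perturbation saliency in the style of Greydanus et al. (2018).

    policy: hypothetical callable mapping a 2-D frame to a 1-D array of outputs
    (e.g. action logits). Returns a coarse grid with one saliency value per
    perturbation centre, spaced `stride` pixels apart. Parameter defaults are
    illustrative assumptions.
    """
    h, w = frame.shape
    blurred = gaussian_blur(frame, blur_sigma)
    base = np.asarray(policy(frame), dtype=float)
    yy, xx = np.mgrid[0:h, 0:w]
    rows, cols = range(0, h, stride), range(0, w, stride)
    sal = np.zeros((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            # Gaussian mask centred at (i, j): blend the blurred frame in there.
            mask = np.exp(-((yy - i) ** 2 + (xx - j) ** 2) / (2 * mask_sigma**2))
            perturbed = frame * (1 - mask) + blurred * mask
            diff = base - np.asarray(policy(perturbed), dtype=float)
            # Saliency = half the squared change in the policy output.
            sal[a, b] = 0.5 * np.sum(diff**2)
    return sal


# Comparison metrics below are illustrative choices, not taken from the abstract.
def kl_divergence(p_map, q_map, eps=1e-10):
    """KL(p || q) after normalising both non-negative maps to sum to 1."""
    p = p_map / p_map.sum()
    q = q_map / q_map.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def pearson_cc(a, b):
    """Pearson correlation coefficient between two maps of the same shape."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


# Toy usage: a hypothetical "policy" that only reads one 5x5 patch of the frame.
frame = np.zeros((40, 40))
frame[8:13, 8:13] = 1.0
sal = perturbation_saliency(lambda f: np.array([f[8:13, 8:13].sum()]), frame)
yy, xx = np.mgrid[0:40, 0:40]
# Hypothetical human fixation density centred on the same patch.
gaze = np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / (2 * 4.0**2))
print(kl_divergence(gaze[::5, ::5], sal), pearson_cc(gaze[::5, ::5], sal))
```

Because the saliency is computed only at every `stride`-th pixel, the human map is sampled at the same stride before comparison; upsampling the saliency grid to full resolution would be an equally reasonable choice.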

