Vision Sciences Society Annual Meeting Abstract  |   September 2018
Modelling complex perception-action choices
Author Affiliations
  • Ruohan Zhang
    Department of Computer Science, The University of Texas at Austin
  • Jake Whritner
    Center for Perceptual Systems, The University of Texas at Austin
  • Zhuode Liu
    Department of Computer Science, The University of Texas at Austin
  • Luxin Zhang
    Department of Intelligence Science, Peking University
  • Karl Muller
    Center for Perceptual Systems, The University of Texas at Austin
  • Mary Hayhoe
    Center for Perceptual Systems, The University of Texas at Austin
  • Dana Ballard
    Department of Computer Science, The University of Texas at Austin
Journal of Vision September 2018, Vol.18, 533. doi:https://doi.org/10.1167/18.10.533
Abstract

In many contexts, such as reading and sandwich making, the function of gaze is readily interpretable. However, in other contexts (e.g., game playing), the underlying task structure has no ready interpretation. The development of Convolutional Neural Nets (CNNs) has led to breakthroughs in models of human pattern recognition. Here we show their potential for predicting the linkage between gaze and action choices as a way of revealing the underlying task structure in complex behavior. We collected records of actions and gaze with an EyeLink 1000 eye tracker while participants played eight Atari games in the Arcade Learning Environment. This dataset was used to train two deep CNNs, which we refer to as the Gaze Network and the Policy Network. The former is trained on human gaze data and learns to accurately predict observed gaze positions given the image. The Policy Network learns to predict actions (such as firing or evasion) given the image and gaze. Incorporating the output of the Gaze Network into the Policy Network significantly improves the accuracy of human action prediction. By incorporating an attention model that extracts the features most important for the task, the learning agent is better able to imitate the human demonstrator's behaviors. The visual attention model also enables the agent to learn better policies, resulting in higher game scores than previous networks, which were trained to predict behaviors demonstrated by human experts from image data alone, with no overt attention cues. Our results demonstrate that adding the gaze data greatly improves prediction accuracy (up to 16%). Consequently, the approach explored here has potential for modeling complex visually guided behavior and discovering the underlying task structure.
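To make the two-network idea concrete, the sketch below shows one way such a pipeline could be wired up. This is an illustrative reconstruction, not the authors' implementation: the layer sizes, the 84x84 grayscale Atari frames, the 18-action output, and the choice to feed the predicted gaze map to the policy as an extra input channel are all assumptions made for illustration (PyTorch).

```python
# A minimal sketch (not the authors' code) of a Gaze Network + Policy Network
# pipeline, assuming 84x84 grayscale Atari frames and an 18-action game.
import torch
import torch.nn as nn

class GazeNetwork(nn.Module):
    """Predicts a gaze heatmap (where a human would look) from a game frame."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Upsample back to the input resolution to produce a per-pixel heatmap.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
        )

    def forward(self, frame):                        # frame: (B, 1, 84, 84)
        heatmap = self.decoder(self.encoder(frame))  # (B, 1, 84, 84)
        # Softmax over spatial locations gives a probability map of gaze position.
        b = heatmap.size(0)
        return torch.softmax(heatmap.view(b, -1), dim=1).view_as(heatmap)

class PolicyNetwork(nn.Module):
    """Predicts the human's action given the frame plus the predicted gaze map."""
    def __init__(self, num_actions=18):
        super().__init__()
        # The gaze map is appended as a second input channel (one simple way
        # to let predicted attention modulate the policy's features).
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),             # logits over game actions
        )

    def forward(self, frame, gaze_map):
        return self.net(torch.cat([frame, gaze_map], dim=1))

# Usage sketch: imitation learning on (frame, gaze, action) triples.
gaze_net, policy_net = GazeNetwork(), PolicyNetwork()
frames = torch.rand(4, 1, 84, 84)                    # dummy batch of frames
actions = torch.randint(0, 18, (4,))                 # human action labels
logits = policy_net(frames, gaze_net(frames).detach())
loss = nn.functional.cross_entropy(logits, actions)  # behavioral-cloning loss
```

Concatenating the gaze map as an extra image channel is only one simple way for predicted attention to guide the policy; the attention mechanism actually used in the study may differ.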

Meeting abstract presented at VSS 2018
