September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
Prediction of visual attention in embodied real-world tasks
Author Affiliations & Notes
  • J. Alex Harston
    Imperial College London
  • Chaiyawan Auepanwiriyakul
    Imperial College London
  • Aldo Faisal
    Imperial College London
  • Footnotes
    Acknowledgements  EPSRC
Journal of Vision September 2021, Vol.21, 2741. doi:
      J. Alex Harston, Chaiyawan Auepanwiriyakul, Aldo Faisal; Prediction of visual attention in embodied real-world tasks. Journal of Vision 2021;21(9):2741.

      © ARVO (1962-2015); The Authors (2016-present)


Understanding how we dynamically distribute our visual attention in real-world environments remains a hard computational problem, owing to the complexity of the relationship between natural visual behaviour, task and motor actions. Shifting focus from classical static gaze behaviour experiments, where subjects perform reductionist, head-fixed screen-viewing tasks in front of monitors, to a more dynamic view of real-world tasks requires taking into account ecologically salient features such as motion cues, optic flow, and gaze object semantics. To address this, healthy right-handed subjects (n=9, aged 22-28) were given a common neurorehabilitative rubric and asked to perform a hierarchical, goal-directed task (cooking a predefined meal) whilst wearing synchronised wireless head-mounted eye-tracking glasses and a 66 degree-of-freedom full-body motion tracking suit. We developed three models of this gaze/body kinematics relationship in PyTorch: 1) a deep autoregressive model using prior gaze alone to predict future gaze direction, 2) a deep autoregressive model combining prior gaze with body kinematics as exogenous input to predict gaze, and 3) a linear regression model from body kinematics to gaze. Models were validated both in open loop, with known current gaze position, and in closed loop, using progressively more predicted values in a sliding window for multi-step-ahead prediction, with leave-one-section-out and leave-one-subject-out cross-validation for within- and cross-subject comparison. Predictive performance averaged 0.08 RMSE (open-loop) and 0.06 RMSE at ~2s (closed-loop). Through our model comparisons we found that incorporating body dynamics gave lower gaze prediction error over longer roll-ahead timescales of 3s than using prior gaze data alone. In real-world tasks we can use body kinematics to predict gaze on a second-by-second basis, without any visual context.
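The difference between open-loop and closed-loop evaluation described above can be illustrated with a small sketch. It is not the authors' PyTorch implementation: gaze is reduced to a single scalar signal, the one-step predictor `predict_next` is a hypothetical linear stand-in for a trained network, and the weights and toy signals are hand-picked assumptions chosen only so that the example runs with the standard library.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def predict_next(gaze_window, body_now, gaze_weights, body_weight):
    """Hypothetical one-step predictor: weighted sum of recent gaze samples
    (oldest first) plus a scaled exogenous body-kinematics sample."""
    ar_term = sum(w * g for w, g in zip(gaze_weights, gaze_window))
    return ar_term + body_weight * body_now

def open_loop(gaze, body, gaze_weights, body_weight):
    """Predict every sample from the true preceding gaze samples."""
    order = len(gaze_weights)
    return [predict_next(gaze[t - order:t], body[t], gaze_weights, body_weight)
            for t in range(order, len(gaze))]

def closed_loop(gaze, body, gaze_weights, body_weight, horizon):
    """Seed the window with true gaze, then feed each prediction back in,
    so the window holds progressively more predicted values."""
    order = len(gaze_weights)
    window = list(gaze[:order])
    predictions = []
    for t in range(order, order + horizon):
        nxt = predict_next(window, body[t], gaze_weights, body_weight)
        predictions.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one step
    return predictions

# Toy data (assumption): gaze depends on its own past and on a body signal.
body = [math.sin(0.3 * t) for t in range(50)]
gaze = [0.1, 0.2]
for t in range(2, 50):
    gaze.append(0.5 * gaze[t - 1] + 0.3 * gaze[t - 2] + 0.2 * body[t])

weights = [0.3, 0.5]  # oldest-first weights matching the generator above
horizon = 20
target = gaze[2:2 + horizon]

err_open = rmse(open_loop(gaze, body, weights, 0.2), gaze[2:])
err_closed_with_body = rmse(closed_loop(gaze, body, weights, 0.2, horizon), target)
err_closed_gaze_only = rmse(closed_loop(gaze, body, weights, 0.0, horizon), target)

print("open-loop RMSE:", err_open)
print("closed-loop RMSE with body input:", err_closed_with_body)
print("closed-loop RMSE, gaze only:", err_closed_gaze_only)
```

In this toy setting the predictor with the body term matches the generating process, so its closed-loop error stays near zero, while the gaze-only variant drifts because it never sees the exogenous signal. The numbers say nothing about the reported results; the sketch only shows how the two evaluation modes differ mechanically.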
Combining bottom-up, visually derived saliency with top-down, task-centric, kinematics-derived saliency may further improve prediction performance and allow us to uncover the mechanisms of their interaction.
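The leave-one-subject-out and leave-one-section-out cross-validation mentioned in the abstract can be sketched as a single grouping routine. The record layout (dictionaries with `subject` and `section` keys), the helper name `leave_one_group_out`, and the choice of three sections per subject are illustrative assumptions; the abstract does not say how the recordings were divided into sections.

```python
def leave_one_group_out(recordings, key):
    """Yield (held_out, train, test) folds, holding out one value of `key` at a time."""
    groups = sorted({r[key] for r in recordings})
    for held_out in groups:
        train = [r for r in recordings if r[key] != held_out]
        test = [r for r in recordings if r[key] == held_out]
        yield held_out, train, test

# Placeholder dataset (assumption): 9 subjects, 3 sections each, no real data.
recordings = [{"subject": s, "section": c}
              for s in range(1, 10) for c in range(1, 4)]

# Cross-subject comparison: each fold tests on one unseen subject.
subject_folds = list(leave_one_group_out(recordings, "subject"))

# Within-subject comparison: hold out one section of a single subject.
subject_one = [r for r in recordings if r["subject"] == 1]
section_folds = list(leave_one_group_out(subject_one, "section"))

for held_out, train, test in subject_folds:
    print("held-out subject", held_out, "train", len(train), "test", len(test))
```

Holding out a whole subject tests whether a model generalises to a person it has never seen, whereas holding out a section tests generalisation to unseen parts of the task for a person it has already seen.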

