Abstract
Understanding how we dynamically distribute our visual attention in real-world environments remains a hard computational problem, owing to the complex relationship between natural visual behaviour, task and motor actions. Shifting focus from classical static gaze experiments, in which subjects perform reductionist, head-fixed screen-viewing tasks, to dynamic real-world tasks requires taking into account ecologically salient features such as motion cues, optic flow and gaze-object semantics. To address this, healthy right-handed subjects (n=9, aged 22-28) were given a common neurorehabilitative rubric and asked to perform a hierarchical, goal-directed task (cooking a predefined meal) whilst wearing synchronised wireless head-mounted eye-tracking glasses and a 66-degree-of-freedom full-body motion-tracking suit. We developed three models of this gaze/body-kinematics relationship in PyTorch: 1) a deep autoregressive model using prior gaze alone to predict future gaze direction, 2) a deep autoregressive model combining prior gaze with body kinematics as exogenous input, and 3) a linear regression model from body kinematics to gaze. Models were validated both in open loop, with the current gaze position known, and in closed loop, feeding progressively more predicted values back into a sliding window for multi-step-ahead prediction, with leave-one-section-out and leave-one-subject-out cross-validation for in- and cross-subject comparison. Predictive performance averaged 0.08 RMSE in open loop and 0.06 RMSE at ~2 s in closed loop. Comparing the models, we found that incorporating body kinematics reduced gaze prediction error at longer roll-ahead timescales (3 s) compared with using prior gaze data alone. In real-world tasks we can therefore utilise body kinematics to predict gaze on a second-by-second basis, without any visual context. This combination of bottom-up, visually derived saliency and top-down, task-centric, kinematics-derived saliency may further improve prediction performance and allow us to uncover the mechanisms of their interaction.
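For concreteness, below is a minimal PyTorch sketch of the two autoregressive model families described above. The abstract does not specify the architecture; the GRU backbone, the 2-D gaze parameterisation (e.g. azimuth/elevation) and all layer sizes are assumptions for illustration, with `use_kinematics=False` corresponding to model 1 (gaze only) and `use_kinematics=True` to model 2 (gaze plus exogenous kinematics).

```python
import torch
import torch.nn as nn

class GazeARX(nn.Module):
    """Autoregressive gaze predictor with optional exogenous kinematics.

    Hypothetical sketch: the abstract names deep autoregressive models in
    PyTorch but not their layers, so a GRU backbone is assumed here.
    """

    def __init__(self, gaze_dim=2, kin_dim=66, hidden_dim=128, use_kinematics=True):
        super().__init__()
        self.use_kinematics = use_kinematics
        in_dim = gaze_dim + (kin_dim if use_kinematics else 0)
        self.rnn = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, gaze_dim)

    def forward(self, gaze, kinematics=None):
        # gaze: (batch, window, gaze_dim); kinematics: (batch, window, kin_dim)
        x = torch.cat([gaze, kinematics], dim=-1) if self.use_kinematics else gaze
        h, _ = self.rnn(x)
        return self.head(h[:, -1])  # one-step-ahead gaze direction
```

Model 3 (linear regression from body kinematics to gaze) would replace the recurrent stack with a single `nn.Linear` over the kinematic features.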
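Closed-loop validation can then be sketched as a rollout in which each prediction is fed back into the sliding window, so later steps rest on progressively more predicted, rather than observed, gaze samples. This is a sketch under assumptions: body kinematics are treated as known over the horizon, and `GazeARX` is the hypothetical model above.

```python
import torch

def closed_loop_rollout(model, gaze_window, kin_future, n_steps):
    """Multi-step-ahead prediction with a sliding window.

    gaze_window: (1, W, gaze_dim) observed gaze history
    kin_future:  (1, n_steps + W, kin_dim) body kinematics over the horizon
    """
    preds, window = [], gaze_window.clone()
    for t in range(n_steps):
        kin_window = kin_future[:, t : t + window.shape[1]]
        with torch.no_grad():
            next_gaze = model(window, kin_window)            # (1, gaze_dim)
        preds.append(next_gaze)
        # drop the oldest sample and append the new prediction
        window = torch.cat([window[:, 1:], next_gaze.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)                         # (1, n_steps, gaze_dim)
```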
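Finally, the evaluation protocol pairs an RMSE metric with the two partitioning schemes; a brief sketch, assuming subject identifiers index the recorded sessions (leave-one-section-out works identically over sections within a subject):

```python
import torch

def rmse(pred, target):
    """Root-mean-square error over all time steps and gaze dimensions."""
    return torch.sqrt(torch.mean((pred - target) ** 2)).item()

def leave_one_out_splits(ids):
    """Yield (train_ids, held_out) pairs for leave-one-subject-out
    (cross-subject) or leave-one-section-out (in-subject) validation."""
    for held_out in ids:
        yield [i for i in ids if i != held_out], held_out
```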