Abstract
The visuomotor system is critical to every interaction we make with our environment, yet much remains unknown, even at a descriptive level, about the natural dynamics and mechanisms of freely-moving visuomotor behaviour. Visual and motor behaviours are tightly coupled in space and time (Hayhoe & Ballard, 2005; Land, 2006), yet few studies of freely-moving gaze dynamics account for the dynamics of the body, thereby ignoring a fundamental component of the perception-action loop. To address this, we use whole-body motion capture suits (XSENS, 60 Hz, 65 DOF) in tandem with time-synced eye-tracking glasses (SMI, 120 Hz) on freely-moving subjects during urban mobility in public spaces, and whilst they perform high-dimensional natural tasks such as cooking, thereby capturing a wide range of motion and frequent object interaction. We obtained over 30 hours of egocentric video and eye-tracking data with synchronised whole-body motion across a range of natural tasks, allowing us to investigate the body-gaze relationship. By building deep-learned vector autoregressive models with exogenous input (DeepVARX), using body kinematics as the exogenous control signal, we show that gaze autoregression with exogenous body-kinematics input predicts future gaze better than gaze autoregression alone (past gaze dynamics predicting future gaze dynamics). Regression from body dynamics to gaze also yields higher prediction rates (0.867 AUC on the ROC curve) than gaze autoregression alone (a more advanced form of central fixation bias). We find that up to 60% of the variance in eye movements can be explained using non-linear body dynamics alone, in a context-naive manner, demonstrating the high predictive power that body kinematics hold even before the critical elements of task and environmental context are taken into account; ultimately, this is because body kinematics carry task information in an implicit form.
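The core modelling idea, predicting future gaze from past gaze samples plus exogenous body kinematics, can be sketched as a linear VARX baseline. This is a minimal illustration only: the abstract's DeepVARX replaces the linear map below with a deep network, and all function names, dimensions, and data here are hypothetical, not the authors' implementation.

```python
import numpy as np

def fit_varx(gaze, body, p=4):
    """Fit a linear VARX(p) model by ordinary least squares.

    gaze : (T, d_g) array of gaze samples (e.g. 2-D gaze direction)
    body : (T, d_b) array of body-kinematics samples (exogenous input)
    p    : autoregressive order (number of past gaze samples used)

    Returns the weight matrix W mapping [past gaze, body, bias] -> gaze[t].
    """
    T, _ = gaze.shape
    rows, targets = [], []
    for t in range(p, T):
        past_gaze = gaze[t - p:t].ravel()     # autoregressive features
        exog = body[t]                        # exogenous body features
        rows.append(np.concatenate([past_gaze, exog, [1.0]]))  # + bias term
        targets.append(gaze[t])
    X, Y = np.asarray(rows), np.asarray(targets)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_varx(W, gaze_hist, body_now):
    """One-step-ahead gaze prediction from the last p gaze samples
    and the current body-kinematics vector."""
    x = np.concatenate([gaze_hist.ravel(), body_now, [1.0]])
    return x @ W
```

Comparing this model's prediction error against the same fit with the exogenous block removed gives the kind of ablation the abstract describes: gaze autoregression with versus without body-kinematics input.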