Abstract
Natural gaze behaviour is highly context-driven, controlled by both bottom-up saliency (external visual stimuli) and top-down factors (task goals and affordances). Whilst quantitative descriptions are easily obtained in highly constrained tasks, their ecological validity is limited: gaze behaviour is far richer in natural environments with free locomotion and unconstrained head and eye movements. Although much work has been done on qualitative descriptions of these natural dynamics, quantitative analysis of the spatiotemporal characteristics of natural gaze behaviour has proven difficult, owing to the time-intensive nature of manual object identification and labelling. To address this issue we present the ‘Embodied Semantic Fovea’, a proof-of-concept system for real-time detection and semantic labelling of objects in a head-mounted eye-tracker’s field-of-view, complemented by SLAM-based reconstruction of the 3D scene (Li and Faisal, ECCV Workshops, 2018) and 3D gaze end-point estimation (Orlov and Faisal, ETRA 2018). Our system reconstructs the entire visual world from surface elements (surfels), allowing us to analyse not only how gaze moves across individual surfels, but also how surfels aggregate into instances of a given object class. This makes it possible to track eye movements that land on an object even as the wearer moves freely around it, capturing gaze from perspectives that are usually inaccessible (e.g. recognising that the wearer is engaging with the same tree, but from opposing directions), and enabling systematic measurement of inter-object affordances in real-world, freely-moving scenarios. By combining pixel-level object labelling and depth estimation from an egocentric scene camera with simultaneous pupil tracking, we superimpose gaze vectors onto 3D-reconstructed object surfaces, providing the first three-dimensional object-gaze reconstruction in freely-moving environments.
We thereby produce ‘object maps’ of objects’ locations relative to one another and quantify how gaze moves between them, allowing true interrogation of three-dimensional gaze behaviour in freely-moving environments.