Abstract
Very little is known about how visual information is used to guide foot placement during natural locomotion. Previous work on the role of vision in locomotion has largely relied on constrained environments such as treadmills, or on visual stimuli presented to head-fixed subjects. While these experimental controls are helpful for understanding such a complex behavior, there is also much to be learned from studying the behavior as it unfolds naturally. Using an apparatus that combines mobile eye tracking (Pupil Labs), IMU-based motion capture (Motion Shadow), and photogrammetric software (Meshroom), we collected a novel dataset comprising measurements of gaze direction, body position, and 3D environmental structure as subjects navigated across various outdoor terrains. The dataset is spatially and temporally aligned, so that gaze direction, body position, and environmental structure share a common coordinate frame, with the terrain represented as a triangle mesh or a point cloud. Applying Meshroom to the Pupil Labs scene-camera images allows correction of IMU drift by fixing the reconstructed scene relative to the head. The median distance between foothold estimates across 12 different walks and subjects is 3 cm, which is substantially more accurate than previous estimates that assumed a flat ground plane. Using this method, we examined the distribution of terrain smoothness at the locations where subjects fixated and placed their feet. Differences in smoothness statistics between foothold locations and comparable control locations are modest. However, a convolutional neural network (CNN) trained to distinguish actual from pseudo-randomly selected foothold locations can do so with 65% accuracy. This suggests that there are indeed visual features that differentiate suitable from unsuitable footholds, but that the differences are small. Since we found high replicability between the paths chosen on different walks and by different subjects, a stronger determinant of where subjects walk may be path feasibility.
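As one concrete illustration of how such aligned data can be used, the sketch below intersects gaze rays with a reconstructed terrain mesh to recover fixated terrain locations. This is a minimal example and not the authors' pipeline: the file names (`terrain_mesh.ply`, `eye_positions_world.npy`, `gaze_directions_world.npy`) and the use of the `trimesh` library are assumptions, and the code presumes that gaze and mesh data have already been expressed in a shared world coordinate frame, as described above.

```python
# Illustrative sketch (not the authors' code): locate gaze points on the
# reconstructed terrain by casting gaze rays against the Meshroom mesh.
# File names and array shapes are hypothetical; eye tracker and motion
# capture data are assumed to be aligned to the mesh's coordinate frame.
import numpy as np
import trimesh

terrain = trimesh.load("terrain_mesh.ply", force="mesh")  # hypothetical export

# Hypothetical per-fixation data: eye position and unit gaze direction,
# both expressed in the shared world coordinate frame (N x 3 arrays).
eye_positions = np.load("eye_positions_world.npy")
gaze_directions = np.load("gaze_directions_world.npy")

# Cast one ray per fixation; intersections may occur on multiple triangles.
locations, ray_idx, _ = terrain.ray.intersects_location(
    ray_origins=eye_positions,
    ray_directions=gaze_directions,
)

# Keep only the nearest intersection for each ray; rays that miss the
# terrain remain NaN.
fixation_points = np.full((len(eye_positions), 3), np.nan)
hit_distance = np.full(len(eye_positions), np.inf)
for point, i in zip(locations, ray_idx):
    d = np.linalg.norm(point - eye_positions[i])
    if d < hit_distance[i]:  # nearest hit wins
        hit_distance[i] = d
        fixation_points[i] = point
```

The same intersected terrain locations could then be compared against foothold positions from the motion capture, for instance to compute the local smoothness statistics discussed above.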