We found that the pure-pursuit and proportional controller models performed similarly, despite relying on different inputs (only the pure-pursuit controller made use of depth/eye height information). Distinguishing between the performance of these types of steering models may require going beyond the relatively simple steering tasks used in the present experiments. Future work will need to test the models further to see whether they can still steer by gaze in more challenging situations, such as when speeds vary, tracks have obstacles to avoid, or there are intersections or multiple paths to choose from. Steering models with additional parameters and/or sensory information may well be required to capture human behavior in these circumstances.
We observe that while the gaze-guided models produce substantial variation in the trajectories, that variation does not correlate well with the variation in the trajectories of the human drivers. In other words, while the variance in gaze produces variance in the model trajectories, the present models do not capture the variance in human behavior. There are several potential (not mutually exclusive) explanations for this mismatch, which we cannot presently tease apart but which could guide future modeling efforts: individual differences in gaze strategy, information in the full retinal input beyond the point of gaze, and memory of the track layout.
Capturing such additional sources of variation would be desirable for more sophisticated modeling of steering control mechanisms in humans.
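For illustration, the kind of comparison at stake can be sketched as correlating the across-trial variability profiles of model and human trajectories along the track. The sketch below is a minimal outline (assuming trajectories already resampled to matched points along the path; it is not the analysis pipeline used here):

```python
import numpy as np

def lateral_variability(trajectories):
    """Across-trial SD of lateral position at each matched point along the track.

    trajectories: array of shape (n_trials, n_points) holding lateral deviation
    from the track centerline, resampled to common points along the path.
    """
    return np.std(np.asarray(trajectories, dtype=float), axis=0)

def variability_correlation(human_trajectories, model_trajectories):
    """Correlate the human and model variability profiles along the track."""
    sd_human = lateral_variability(human_trajectories)
    sd_model = lateral_variability(model_trajectories)
    return float(np.corrcoef(sd_human, sd_model)[0, 1])
```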
While the performance of the models was largely successful, it should be noted that they did occasionally fail. On detailed inspection, many of these failures appear to be related to periods when the gaze input falls near the horizon, which (due to foreshortening in the optical projection) results in the steering points being projected extremely far into the distance (or even behind the driver if the gaze is above the horizon). Although every effort was taken to ensure well-calibrated gaze data, some of these observations could be the result of bias in the calibration of the eye tracker. It is also possible that the participants occasionally, and genuinely, looked into the distance rather than where they were going. The models made no allowance for such occasional decoupling of steering from gaze, whereas humans in reality do seem to be able to decouple the two.
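The geometric reason for these failures can be made concrete with a minimal sketch (a flat ground plane and fixed eye height are assumed; the names are illustrative, not those of our simulation code): the ground distance of the fixated point grows as the reciprocal of the tangent of the gaze angle below the horizon, so it diverges as gaze approaches the horizon and is undefined ahead of the driver once gaze is at or above it.

```python
import math

def ground_distance(eye_height_m, gaze_below_horizon_rad):
    """Distance along the ground to the fixated point, assuming a flat ground
    plane and gaze declination measured downward from the horizon.

    The distance diverges as the declination approaches zero; at or above the
    horizon there is no valid ground intersection in front of the driver.
    """
    if gaze_below_horizon_rad <= 0.0:
        return float("inf")  # gaze at or above the horizon: no point ahead
    return eye_height_m / math.tan(gaze_below_horizon_rad)

# For example, at a 1.2 m eye height a gaze only 0.5 degrees below the horizon
# already projects roughly 137 m ahead: ground_distance(1.2, math.radians(0.5))
```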
Some assumptions made in the modeling approaches presented here could be challenged. The present investigation was based on the following theoretical premises: if the oculomotor system maintains information about the current direction of gaze relative to the direction of locomotor heading, and possibly depth information about the current distance to the point of fixation (by whatever means: perceptual, cognitive, motor), and if this information were passed to steering control, then it could serve as an “efference copy” for steering by gaze. Even determining the visual angle of gaze relative to heading (horizontally, parallel to the plane of travel) may not be trivial, especially during natural locomotion (e.g., walking or running), where the eye, head, body, and vehicle may all point in different directions and there may be no convenient visual reference such as the horizon. Whether the model assumptions hold (that is, whether the oculomotor system in fact retains such gaze-direction-relative-to-heading or distance-to-point-of-fixation information, the latter equivalently given by the gaze angle below the horizontal plus eye height, and whether such information is passed to and used by locomotor control systems) remains beyond the scope of this article and should be investigated independently. We do not, of course, claim that even if these assumptions were justified, no other information would also be passed to steering control (i.e., that humans would steer by gaze only).
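To make the first of these premises concrete, the horizontal gaze-relative-to-heading angle could in principle be obtained by chaining the relevant reference frames. The sketch below (hypothetical signal names; yaw angles in the plane of travel only) illustrates why this estimate is straightforward in a fixed-seat vehicle but less trivial in natural locomotion, where every frame can rotate independently:

```python
def gaze_angle_relative_to_heading(eye_in_head_rad, head_in_body_rad,
                                   body_in_vehicle_rad, vehicle_in_heading_rad=0.0):
    """Horizontal direction of gaze relative to locomotor heading (radians).

    Each argument is a yaw angle in the plane of travel. In a vehicle with the
    torso fixed to the seat the last two terms are near zero; in walking or
    running, the eye, head, body, and path direction can all differ, so each
    term must be sensed or estimated.
    """
    return (eye_in_head_rad + head_in_body_rad
            + body_in_vehicle_rad + vehicle_in_heading_rad)
```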
The two steering-by-gaze control strategies that we investigated were the simplest effective ones (with the fewest parameters) that we found: using egocentric gaze direction for proportional control, and using egocentric direction and distance of the point of fixation for pure-pursuit control. The rationale was that the simpler the controller, the more the simulations would be a test of the underlying control signal. It is worth emphasizing that there is no suggestion that these models reflect the actual processing performed by the human brain (and they would most likely fail in more complex task settings), but we do hope they are at least somewhat cognitively plausible, in the sense that their control inputs are derived directly from where the human drivers look. In the future, more sophisticated steering control strategies (e.g., model predictive control) could be implemented with the aim of increasing fidelity as well as generalizability.
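As an illustration of how little machinery these two strategies require, the following is a minimal sketch of the two control laws (the gains, the vehicle model, and the function names are placeholders for exposition, not the fitted parameters or the code used in our simulations): proportional control maps the horizontal gaze angle to a steering command, while pure pursuit converts the egocentric direction and distance of the fixated point into the curvature of the circular arc passing through it.

```python
import math

def proportional_steering(gaze_angle_rad, gain):
    """Proportional control: steering command proportional to the horizontal
    angle of gaze relative to the current heading."""
    return gain * gaze_angle_rad

def pure_pursuit_curvature(gaze_angle_rad, gaze_distance_m):
    """Pure pursuit: curvature of the circular arc that passes through the
    fixated point, given its egocentric direction and distance."""
    return 2.0 * math.sin(gaze_angle_rad) / gaze_distance_m

def pure_pursuit_steering(gaze_angle_rad, gaze_distance_m, wheelbase_m):
    """Convert the pursuit curvature into a front-wheel angle for a simple
    kinematic bicycle model (one possible vehicle model, assumed here for
    illustration only)."""
    kappa = pure_pursuit_curvature(gaze_angle_rad, gaze_distance_m)
    return math.atan(wheelbase_m * kappa)
```

The contrast makes the informational difference explicit: the proportional law needs only the gaze angle, whereas pure pursuit additionally needs the distance to the fixated point (or, equivalently, the gaze angle below the horizon together with eye height).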
We tested two different controllers to lend some degree of robustness to the investigation and to determine whether steering-by-gaze control would be sensitive to the specifics of the control law or its parameterization. We found that both models were able to steer through the S-bends (Experiment 2) using parameters estimated from the slalom task (Experiment 1). Note, however, that we used fairly straightforward steering tasks, and that generalization was from the arguably more challenging slalom task to the easier (though longer) S-bend task. More complex environments would likely challenge these simple models and could therefore necessitate more sophisticated control models that take into consideration the fusion of signals from multiple information sources. And even if the models are able to steer via gaze, this of course does not mean that human drivers necessarily do so, only that the information required for steering (or some of it) is in principle there.
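In outline, this generalization test amounts to scoring S-bend trajectories produced with slalom-fitted parameters against the corresponding human trials, without refitting. A minimal sketch of such a score (assuming trajectories resampled to matched points along the track; not our exact analysis code):

```python
import numpy as np

def cross_task_error(model_lateral, human_lateral):
    """Root-mean-square difference in lateral position between a model run
    (driven by parameters fitted on the other task) and the corresponding
    human trial, both resampled to the same points along the track."""
    model_lateral = np.asarray(model_lateral, dtype=float)
    human_lateral = np.asarray(human_lateral, dtype=float)
    return float(np.sqrt(np.mean((model_lateral - human_lateral) ** 2)))
```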
Future models should look into multisensory integration in both gaze control and steering. Note that the possible additional sources of information should not necessarily be considered mutually exclusive; each might play a role to a different extent depending on the strength, quality, and variability of the others, making the “true” steering mechanisms difficult to disentangle when examining steering and gaze behavior. It might also be valuable to account for possible differences in timing between gaze and steering in a more sophisticated manner than simple smoothing.
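One straightforward refinement of the timing treatment, for example, would be to estimate a gaze-to-steering lag per participant and shift the gaze signal accordingly, rather than only smoothing it. A minimal sketch of such an estimate (assuming uniformly sampled, time-aligned gaze and steering signals; illustrative only):

```python
import numpy as np

def estimate_gaze_steering_lag(gaze_signal, steering_signal, dt_s, max_lag_s=1.0):
    """Estimate the delay (in seconds) at which the gaze signal best predicts
    the steering signal, via cross-correlation over a bounded lag window."""
    gaze = np.asarray(gaze_signal, dtype=float)
    steer = np.asarray(steering_signal, dtype=float)
    gaze = (gaze - gaze.mean()) / gaze.std()
    steer = (steer - steer.mean()) / steer.std()
    max_lag = int(max_lag_s / dt_s)
    lags = range(max_lag + 1)
    corrs = [np.corrcoef(gaze[:len(gaze) - lag], steer[lag:])[0, 1]
             for lag in lags]
    return lags[int(np.argmax(corrs))] * dt_s
```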
In addition to using the measured gaze data as inputs, we also used optically specified waypoint markers along the track (which were visible to the participants and visually designated the path they were meant to follow) as control inputs. This was done to see how well the models performed in principle when given the “best possible” path information, which is the usual type of input such controllers are designed for (i.e., optical input: steering-by-what-is-there as opposed to steering-by-gaze).
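In implementation terms, the only difference between the two input regimes is where the target point fed to the controller comes from. A schematic sketch of that substitution (hypothetical data structures and names; a real implementation would, among other things, select the nearest upcoming marker rather than simply the nearest one):

```python
import math

def egocentric_point(position_xy, heading_rad, target_xy):
    """Egocentric direction and distance of a world-frame point, given the
    driver's position and heading in the ground plane."""
    dx = target_xy[0] - position_xy[0]
    dy = target_xy[1] - position_xy[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - heading_rad
    angle = (angle + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return angle, distance

def controller_input(position_xy, heading_rad, mode, gaze_point=None, waypoints=None):
    """'gaze' mode feeds the fixated point to the controller (steering-by-gaze);
    'waypoint' mode feeds a track marker instead (steering-by-what-is-there)."""
    if mode == "gaze":
        target = gaze_point
    else:
        target = min(waypoints,
                     key=lambda wp: egocentric_point(position_xy, heading_rad, wp)[1])
    return egocentric_point(position_xy, heading_rad, target)
```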