Abstract
Driving requires us to represent visual information about our environment under time pressure, but how long do we need to examine the road ahead to detect, localize, and evade road hazards? To answer these questions, licensed drivers (n = 24) performed a series of three tasks using dash camera video of real road scenes across two spatial scale conditions (immersive [78° × 44°] vs. laptop-scale [26° × 14.7°]). The detection task asked participants to report whether they detected a road hazard (an event requiring an immediate response), and the evasion task asked them to choose whether they would steer left or right to evade it. Responses from each task were used to determine viewing duration thresholds, which were longer for the evasion task than for the detection task (370 vs. 220 ms, p = .006) but were not significantly affected by scale condition (p = .10). In the third task, a localization task, participants viewed a video (duration: 33–600 ms) and then clicked where they believed the hazard to be. Measuring localization error (distance from the annotated center of the hazard) as a function of viewing duration revealed above-chance localization performance at very brief video durations (67 ms). However, localization error continued to decrease with longer viewing durations, approaching an asymptote at durations similar to thresholds from the detection task (233 ms). Localization performance was also unaffected by stimulus scale. Together, these results suggest that drivers have an adequate but imperfect sense of hazard location at viewing durations that are sufficient only for detection, but that planning an evasive action requires a more precise spatial representation of the hazard, indicating that drivers refine their representation of the dynamic scene over time to better inform action.