Abstract
Autonomous driving agents deal with a complex skilled task for which humans train for a long period of time, relying heavily on their sensory, cognitive, situational intelligence, and motor skills developed over years of their life. Autonomous driving is focused on end-to-end learning of driving commands, however, perception and understanding of the environment remains the most critical challenge when evaluating situations. This is even more relevant in urban scenes, as they contain various distractions acting as visual noise that hinder the agent from understanding the situation correctly. Humans develop the skill of visual focus and identification of task-relevant objects from an early age. Information extracted from human gaze relevant to environment context can help the agent with this perception problem, injecting a wealth of information about human decision-making behaviour and helping agents focus on task-relevant features and ignore irrelevant information. We combine human gaze and features of task-relevant instances to enhance perception systems for autonomous driving. We use a virtual reality headset with built-in eye-trackers for participants (n=9) to use, providing us with human driving gaze data. Based on this, we build a human driving visual attention predictor. Our integrated object detector identifies relevant instances on the road while human visual attention prediction determines which objects are most relevant to the human driving policy. We present the results of using this architecture with imitation learning and reinforcement learning autonomous driving agents, and compare them with baseline end-to-end methods, showing improved performance, 28% in imitation and 11% in reinforcement learning, accelerated training and explainable behaviour with our approach. Our results highlight the potential of human in-the-loop approaches for autonomous systems which, as opposed to end-to-end approaches, allow us to make use of human skills in creating AI that closes the loop by augmenting humans in an efficient and explainable manner.