Article  |   February 2012
Head and eye gaze dynamics during visual attention shifts in complex environments
Anup Doshi, Mohan M. Trivedi
Journal of Vision February 2012, Vol.12, 9. doi:https://doi.org/10.1167/12.2.9
Abstract

The dynamics of overt visual attention shifts evoke certain patterns of responses in eye and head movements. In this work, we detail novel findings regarding the interaction of eye gaze and head pose under various attention-switching conditions in complex environments and safety critical tasks such as driving. In particular, we find that sudden, bottom-up visual cues in the periphery evoke a different pattern of eye–head movement latencies as opposed to those during top-down, task-oriented attention shifts. In laboratory vehicle simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case. This finding is validated in qualitative analysis of naturalistic real-world driving data. These results demonstrate that measurements of eye–head dynamics are useful for detecting driver distractions as well as for classifying human attentive states in time and safety critical tasks.

Introduction
Analysis and understanding of human behavior, particularly of head and eye gaze behavior, has been a subject of interest for many years (Dodge, 1921) in cognitive psychology and neurophysiology; yet a full understanding of the causes, dynamics, and control mechanisms of head and eye movements is still a subject of active research (Freedman, 2008). More recently, researchers in the computer vision and artificial intelligence community have increasingly sought to incorporate information gleaned from patterns of human behaviors into intelligent human–machine interfaces (HMIs; Pentland, 2007; Trivedi, Gandhi, & McCall, 2007). The confluence of these two research paradigms allows for a deeper understanding of fundamental human cognition and behavior, as well as how to interpret and modify that behavior, if necessary. 
Active perception of human operators by “intelligent environments” can allow assistance systems to improve performance and even help avoid dangerous circumstances. Every year, traffic accidents result in over one million fatalities worldwide (Peden et al., 2004) and around 40,000 in the U.S. alone. Of those, an estimated 26,000 are due to some form of driver inattention (Angell et al., 2006; S. E. Lee, Olsen, & Wierwille, 2004). Recent advances in active vision and machine intelligence have resulted in incorporation of camera-based driver analysis into driver assistance systems (Trivedi & Cheng, 2007; Trivedi et al., 2007), to predict future behaviors or inattention, and thereby counteract poor driving behavior. By detecting patterns of body language in critical situations, such as the time prior to a lane change, these systems are able to predict the context of the situation. 
In other interactive environments such as intelligent command-and-control centers or intelligent meeting rooms, systems monitoring the participants or operators could provide assistance based on the subjects' body language. This may help reduce distractions and help improve performance of whatever task is being performed. Landry, Sheridan, and Yufik (2001) discovered that certain patterns, or “gestalts,” of aircraft on a radar screen drew the attention of the air traffic controllers due to their location, though they were not relevant to the task. An air traffic control training manual from the FAA (Cardosi, 1999) states that “even in low-workload conditions, distractions can clobber short-term or working memory.” An assistance system could mitigate such dangerous situations by detecting the context of the attention shift and providing warnings when the attention shift is not task related. 
Those intelligent environments that must assist humans in time and safety critical situations would clearly benefit from knowledge of the human's cognitive state. Such information could allow the system to detect, for example, whether the driver is distracted or actively engaged in a relevant task. Information about salient properties of the environment (Itti & Koch, 2000) could help understand attention; however, such systems are not good at predicting the actual focus of attention (Ballard & Hayhoe, 2009), nor do they give insight into the cognitive state of the driver. 
Studies into the dynamics of head and eye gaze behavior tend to hint that attention is linked to the types of movements of head and eye. A number of studies have shown that when presented with a stimulus in the field of view, the head tends to lag behind the eye during the gaze shift (Bartz, 1966; Goossens & Opstal, 1997). Others have found that early head motions, with respect to the saccade, are associated with target predictability, location, propensity to move the head, and timing (Freedman, 2008; Fuller, 1992; Khan, Blohm, McPeek, & Lefèvre, 2009; Morasso, Sandini, Tagliasco, & Zaccaria, 1977; Ron, Berthoz, & Gur, 1993; Zangemeister & Stark, 1982). These studies occurred in very controlled, arguably unnatural environments; more recent studies into “natural” tasks involving grasping, tapping, or sitting in a lecture (Doshi & Trivedi, 2009a; Herst, Epelboim, & Steinman, 2001; Mennie, Hayhoe, & Sullivan, 2007; Pelz, Hayhoe, & Loeber, 2001) have suggested that task-oriented gaze shifts may be associated with early head motions. These studies suggest the hypothesis that the interactive dynamics of head and eye movements may encode information about the type of attention shift, and thus the cognitive state, of the human subject. 
These preliminary investigations led to two related questions in more complex environments such as a driving scenario: 
1. Is it possible to observe differences in human eye–head dynamics during different styles of attention shifts? 
2. Is it possible to extract and use those cues to help identify or learn information about the subject's cognitive state? 
In the following sections, we detail novel findings regarding the interaction of eye gaze and head pose dynamics under various attention-switching conditions in more complex environments and while engaged in safety critical tasks such as driving. In particular, we find that sudden, bottom-up visual cues in the periphery evoke a different pattern of eye–head yaw movement latencies as opposed to those during top-down, task-oriented attention shifts. In laboratory vehicle simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case. In contrast to the findings of Land (1992), we also observe the early head motion patterns during task-oriented behavior in real naturalistic driving scenarios. 
We ultimately aim to understand whether it is possible to detect that the human subject had prior knowledge of the scene or the task upon which they are focusing. That contextual knowledge may be useful to real-time interactive assistance systems. Specifically, it could provide semantic hints such as whether there is a distraction or unplanned stimulus in a scene, or what kinds of goals the human subjects are pursuing. This finding is validated in qualitative analysis of naturalistic real-world driving data. These results show that measurements of eye–head dynamics are useful for detecting driver distractions as well as for classifying human attentive states in time and safety critical environments. 
Related research
Human behavior analysis and prediction
A great amount of recent research has focused on the detection of human behaviors. Many of these efforts use patterns of human behavior to learn about the scene or predict future behaviors (Oliver, Rosario, & Pentland, 2000; Pentland & Liu, 1999; Ryoo & Aggarwal, 2007). In considering time and safety critical situations such as driving scenarios, we present an overview of research related to behavior analysis and prediction in vehicles. 
Driver behavior and intent prediction
Recent research has incorporated sensors looking inside the vehicle to observe driver behavior and infer intent (Trivedi & Cheng, 2007; Trivedi et al., 2007). Bayesian learning has been used to interpret various cues in order to predict maneuvers such as lane changes and intersection turns and to support brake assistance systems (Cheng & Trivedi, 2006; McCall & Trivedi, 2007; McCall, Wipf, Trivedi, & Rao, 2007). More recently, head motion has been shown to be a more useful cue than eye gaze for discerning lane change intentions (Doshi & Trivedi, 2009b). 
The assumption made in all of these systems has been that head motion or eye gaze is a proxy for visual attention. In other words, these systems measure head motion or gaze under the assumption that the driver is paying attention to whatever they are looking at, that is, that attention is associated with gaze. 
Gaze behavior and visual search
A gaze shift may or may not be associated with a particular goal. The broad question of “why people look where they look” is a subject of research for cognitive psychologists, neuroscientists, and computer vision researchers alike. 
A significant amount of research in psychology has examined whether such visual searches are guided by goals (such as the goal of changing lanes) or by external stimuli (Itti & Koch, 2000; Jovancevic, Sullivan, & Hayhoe, 2006; Navalpakkam & Itti, 2005; Peters & Itti, 2006; Rothkopf, Ballard, & Hayhoe, 2007). These stimuli may include visual distractions, which could pop up in a scene and thereby attract the attention of the observer. For the most part, any visual search is presumed to be guided by some combination of a goal- and stimulus-driven approach, depending on the situation (Yantis, 1998). 
Itti et al. (Itti & Koch, 2000; Navalpakkam & Itti, 2005; Peters & Itti, 2006) and others (Mahadevan & Vasconcelos, 2010; Zhang, Tong, Marks, Shan, & Cottrell, 2008) have made inroads in developing saliency maps that model and predict focus of attention on a visual scene. Initial models were based on “bottom-up” cues, such as edge and color features. However, it was found that “top-down” cues may be more influential; in other words, the context of the scene and the goals of the human subject are crucial to determining where they look. Jovancevic et al. (2006) determined that even in the presence of potentially dangerous distractions, in complex environments gaze is tightly coupled with the task. Several other works have similarly concluded that in natural environments, saliency does not account for gaze, but task and context determine gaze behavior (Henderson, Brockmole, Castelhano, & Mack, 2007; Land, 1992; Rothkopf et al., 2007; Zelinsky, Zhang, Yu, Chen, & Samaras, 2005). 
It is clear that in certain critical environments, distractions play an important role in attracting attention. Carmi and Itti (2006) show that dynamic visual cues play a causal role in attracting visual attention. In fact, perceptual decisions after a visual search are driven not only by visual information at the point of eye fixation but also by attended information in the visual periphery (Eckstein, Caspi, Beutter, & Pham, 2004). In certain cases, these stimuli may affect primary task performance; Landry et al. (2001) found that certain unrelated gestalt motion patterns on radar screens drew the attention of air traffic controllers away from the task at hand. In the driving context, there are many well-known cognitive and visual distractions that can draw the driver's attention (Angell et al., 2006; Doshi, Cheng, & Trivedi, 2009). Recarte and Nunes (2000) measured the number of glances to the mirror during a lane change, noting that visual distractions decrease the glance durations by 70–85%. This result is well aligned with more recent results indicating the limitations of drivers' multitasking abilities (Levy & Pashler, 2008; Levy, Pashler, & Boer, 2006). Moreover, some suggest that visual distractions may even increase the likelihood of “change blindness,” a phenomenon whereby a subject may look in a certain area and not see or comprehend the objects in front of them (Durlach, 2004; Y. C. Lee, Lee, & Boyle, 2007). In these cases, it would be useful to know whether a gaze shift is attributable more to the primary goal or context (e.g., of maintaining the vehicle's heading in the lane) or to a secondary and potentially irrelevant visual stimulus (e.g., a flashing billboard). 
Several studies have used eye gaze or head pose to detect the attention of the subject (Batista, 2005) or estimate user state or gestures (Asteriadis, Tzouveli, Karpouzis, & Kollias, 2009; Matsumoto, Ogasawara, & Zelinsky, 2000). In this study, we instead use the temporal relationship of eye gaze and head pose to determine the attentional state of the subject and proceed to use that information as contextual input to event detection and criticality assessment systems. In the following section, we describe the framework of eye gaze interactivity analysis, and in the later sections, we describe some supporting experiments. 
Attention shifts
Attention shifts can be of two kinds or some combination of the two (Yantis, 1998). Top-down attention shifts occur when the observer has a particular task in mind. This task may necessitate a visual search of a potentially predetermined location or a search of likely relevant locations. An example may be as simple as shifting one's attention from a television to a newspaper, after having turned the television off. There may also be learned tasks, such as the search for oncoming cars when crossing a road, as a driver in a vehicle or as a pedestrian at a crosswalk. 
Bottom-up attention shifts are caused by interesting stimuli in the environment. These stimuli may include distractions or salient regions of a scene. For example, flashing police lights on the highway may draw a driver's attention, or an audience member may be drawn to an instant chat message popping up on the screen during a technical presentation. 
Generally speaking, Pashler, Johnston, and Ruthruff (2001) observe that “whereas exogenous attention control is characterized as stimulus driven, endogenous control is typically characterized as cognitively driven.” Others have encoded such notions into computational models of attention (Mozer & Sitton, 1998). 
In many cases, an object in the scene may easily be a distraction in one instance and part of the primary task at hand at another time. For example, in a classroom, the person standing in front of the blackboard may be the teacher during a lesson, to whom the student should be paying attention. On the other hand, the person in front of the blackboard may just be someone walking by, who is distracting the student from the task at hand. By classifying the student's interactive behaviors, we may be able to set the context for the attention shift and understand the state of the subject, whether cognitively driven (top-down) or stimulus driven (bottom-up). 
Head and eye gaze during attention shifts
Zangemeister and Stark (1982) performed a controlled study of eye–head interactions and described various conditions giving rise to different styles of eye–head movements. In their paper, they found several styles of movements, depicted in Figure 1. Among the most pertinent of movements are those labeled “Type III,” which include early or anticipatory head movement with respect to the gaze shift. They theorized that this behavior is associated with a repetitive, predetermined, or premeditated attentional shift, as is the case for any goal-directed attentional shift. 
Figure 1
 
Examples of various interactions of head and eye movements, with type labels from Zangemeister and Stark (1982). Note that in certain cases eye gaze tends to move first, whereas in others the head tends to move first.
Morasso et al. (1977) examined control strategies in the eye–head system and observed that “The head response [to a visual target], which for random stimuli lags slightly behind the eyes, anticipates instead for periodical movements [of the target].” The implication is once again that for trained or predetermined gaze shifts, the head movement anticipates the gaze. 
Indeed, eye shifts can occur much faster than head movements, and thus, near-instantaneous gaze changes can be made with eye shifts alone. Shifts of larger amplitudes would necessitate a head movement to accommodate the limited field of view of the eyes. These studies imply that when a large visual attentional shift is about to occur and the observer has prior knowledge of the impending shift, there may be some amount of preparatory head motion (Pelz et al., 2001). 
Freedman (2008) and Fuller (1992) each included thorough reviews of eye–head coordination and verified the same results seen above. Several variables, including initial eye, head, and target positions, along with predictability, seem to affect the latencies of head movements (Ron et al., 1993). A few studies have touched on the possibility that attention and task can influence the dynamics of eye–head movements (Corneil & Munoz, 1999; Herst et al., 2001; Khan et al., 2009; Mennie, Hayhoe, Sullivan, & Walthew, 2003). These studies are well controlled in laboratory environments but limited in their generalizability to natural environments and more complicated tasks. 
In this study, we venture to examine directly the effects of goal- vs. stimulus-driven attentional shifts on eye–head coordination. Further, by classifying the type of shift, we are able to propose a novel model to determine the cognitive state of the subject, which may prove useful to assistive human–machine interfaces such as driver assistance systems. 
In the following sections, we show that by extracting the yaw dynamics of eye gaze and head pose, it may be possible to identify those gaze shifts that are associated with premeditated or task-oriented attentional shifts, driven by “endogenous” cues. In each case, we find that a majority of endogenous, task-related shifts occur with an anticipatory Type III gaze shift. Based on these results and the studies listed above, we might further hypothesize that a Type III gaze shift could imply a task-related shift, and Type I or II is more likely to occur in conjunction with stimulus-related gaze shifts, associated with “exogenous” cues. 
Methods
Participants
Ten volunteers, 9 males and 1 female, consented to participate in the experiment. The participants were mostly in their 20s, with two in their 30s, one over 40, and the female subject near the median age of 25. Every subject had a valid driver's license, with driving experience ranging from novice to decades of experience. 
Apparatus
The experiment was conducted in a driving simulator, with several additional features to facilitate the various conditions of the experiment. The simulator was configured as shown in Figure 2. A 52″ screen was placed 3 ft in front of the subject, with a Logitech steering wheel mounted at a comfortable position near the driver. An additional secondary screen was placed to the left of the main monitor, in the peripheral vision of the subject. 
Figure 2
 
Experimental setup of LISA-S test bed. The test bed includes a PC-based driving simulator, with graphics shown on a 52-inch monitor, audio, and a steering wheel controller. The PC also controls a secondary monitor, located at an angle of 55° with respect to the driver, in a similar location to where the side-view mirror would be. The test bed includes a head and eye gaze tracking system, as well as a vision-based upper body tracking system (not used in this experiment). All the data from the gaze and body trackers are recorded synchronously with steering data and other parameters from the driving simulator.
The main monitor was configured to show a PC-based interactive open source “racing” simulator, TORCS (TORCS, The Open Racing Car Simulator, 2010). The software was modified in several ways to make it a more appropriate driving simulator. The test track was chosen to be a two-lane highway/main road running through a city. Several turns on the driving track necessitated a significant slowdown in speed to maneuver through, keeping the driver actively engaged in the driving task. Additionally, the maximum speed of the “vehicle” was limited to appropriate highway speeds, in order to discourage excessive speeding. 
In this experiment, the track contained no other vehicles, to limit the complexity of interactions. The ego-vehicle and road parameters such as friction, vehicle weight, and others were fixed to approach real-world conditions. However, they were also constrained in order to facilitate an easy learning process, as some subjects had never used driving simulators before. The tire friction was increased to prevent sliding, and the limiting of maximum speed also allowed drivers to quickly adjust to the simulator. 
A stereo-camera-based non-intrusive commercial eye gaze tracker, faceLAB by Seeing Machines (Seeing Machines faceLAB Head and Eye Tracker, 2004), was set up just in front of and below the main monitor. It was appropriately placed not to obstruct the field of view of the driver. The system required calibration for each subject, which was performed prior to any data collection. Once calibrated, it output a real-time estimate (updated at 60 Hz) of gaze location (pitch and yaw) and head rotation (pitch, yaw, and roll). These quantities were calculated and transmitted concurrently in “Minimum Latency” mode to the PC running the driving simulator, along with a “validity” signal indicating confidence in the tracking (a lack of which would be caused by blinks and occlusions). The gaze and head data, along with all the driving parameters such as distance traveled and lateral position, were automatically timestamped and logged to disk every 10 ms, without any filtering or smoothing operations. 
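As a rough illustration of how such a synchronized log might be handled, the sketch below reads the 10-ms samples and masks entries the tracker flagged as invalid. The file layout and column names are assumptions for illustration only; the text specifies only which quantities were logged.

```python
# Sketch of loading the synchronized simulator/tracker log (10-ms sampling).
# Column names and CSV layout are illustrative assumptions.
import pandas as pd

def load_log(path):
    cols = ["t_ms", "gaze_yaw", "gaze_pitch",
            "head_yaw", "head_pitch", "head_roll",
            "validity", "distance", "lateral_pos"]
    df = pd.read_csv(path, names=cols)
    # Mask gaze/head samples the tracker flagged as invalid (blinks, occlusions);
    # consistent with the text, no further filtering or smoothing is applied here.
    invalid = df["validity"] == 0
    df.loc[invalid, ["gaze_yaw", "gaze_pitch", "head_yaw"]] = float("nan")
    return df
```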
The secondary monitor showed various text messages as described below, depending on the condition. This display was controlled synchronously by the same PC that ran the simulator. 
Design
Each subject was run in two conditions, endogenous and exogenous. In each condition, the subject had a primary task of maintaining their lane heading and, as part of a secondary task, was cued in a different way to check the secondary monitor to find out whether to change lanes. These cues are demonstrated in an illustrative example in Figure 3. 
Figure 3
 
Illustrative (staged) example of the experimental paradigm. In each cuing condition, we measure the differences in eye–head interactions during attention shifts to a secondary monitor, which the driver is required to check for instructions. In the “endogenous” condition, the driver is presented with a cue in the primary monitor and allowed to make a goal-oriented or preplanned attention shift. In the stimulus-oriented “exogenous” cuing condition, the secondary monitor displays a sudden change in color, drawing the driver's attention to the target.
The “endogenous” condition (Condition 1) of the experiment was intended to stimulate “goal-oriented” attention switching, such as the planned visual search of a driver checking mirrors prior to a lane change. Large overhead signs appeared at several constant, predefined locations around the track. The subject was instructed that, upon noticing one of these signs coming up in the distance, she should glance over at the secondary monitor. This monitor would be displaying a message, “Left Lane” or “Right Lane,” indicating in which lane the driver should be. The subject was told to glance over to the message and move to the corresponding lane by the time the overhead sign passed by. The duration of the cue appearance varied from 5 to 20 s. In this manner, the subject was allowed enough time to plan and initiate the attention switch by herself; we label this the “endogenous” cue condition (Condition 1). 
The second condition was designed to evoke an unplanned “stimulus-oriented” attention-switching response, as if the driver was suddenly distracted. In this condition, the driver was also told to maintain the lane as best as possible. The cue to change lanes would come from the secondary monitor, whose entire screen would change color suddenly, at a set of predefined times unknown to the subject. This color change occurred in concert with a potential change of the text message, once again to either “Left Lane” or “Right Lane.” Upon noticing the change, the driver was tasked with maneuvering to the appropriate lane as soon as was safe to do so. The subject was told that the colors were random, not correlated with the text, and would occur at random times. Thus, the subject's attention switch was hypothetically initiated by some external cue in the peripheral field of view; we label this as the “exogenous” cue condition (Condition 2). 
The “exogenous” cue would ideally have been a stimulus not associated with the task of driving. However, it would have been difficult to collect enough data where the subject freely decided to attend to a potentially irrelevant stimulus. By associating the stimulus-style cue with the secondary task of lane selection, it became possible to gather consistent data about how the subject would respond to the stimulus-oriented cues such as sudden flashes, motion, or other unplanned distractions. This could then be directly compared to driver behaviors under the “endogenous” condition described above. 
Each condition consisted of 10 min of driving, corresponding to 12 to 15 lane changes per condition. The order of the conditions was presented randomly, and to ensure comfort, participants were offered breaks between conditions. 
Procedure
Participants were told that the experiment pertained to naturalistic lane-keeping behavior. After calibrating each subject on the gaze tracker, the subject was given 5–10 min to acclimatize to the simulator. The subject was asked to report when she felt comfortable with the simulator, and her comfort level was then verified by asking her to keep the vehicle in a single lane for at least 60 s. 
For the remainder of the experiment, the subject was tasked primarily with maintaining her current lane to the best of her ability. This allowed the subject to be actively engaged in the driving process throughout the experiment. As part of a secondary task, the subject was instructed to respond to the cues in each condition, and if necessary, change lanes when safe to do so. This required a glance to the secondary screen, or “side mirror,” in order to decide which lane to move into. Though this was not entirely naturalistic, it was reminiscent of glancing at the mirror prior to lane changes to scan for obstacles, and participants had little difficulty following the instructions. 
Analysis
The automatically logged data sets from each experiment were then processed to analyze the dynamics of head and eye movements leading up to the attention shift. For each condition, the appearance time of each cue was identified, and the subsequent gaze shift was taken as an example of a shift of interest. The secondary monitor was fixed at an angle of approximately 55° from the subject, so only gaze shifts that resulted in glances of that magnitude were considered. 
In 34% of the cases across the entire experiment, the cue was not followed by a tracked gaze shift of sufficient magnitude. This was caused either by a lack of response from the subject or by a lack of tracking confidence from the head and eye tracker. Occasionally, the tracker would lose an accurate representation of the subject's head or eye movements, and this was reflected in the “validity” signal output by the tracker. Whenever there was no valid gaze shift of sufficient magnitude following the appearance of a cue, the example was discarded from further analysis. 
Fewer than 5% of the examples in the “exogenous” condition were discarded due to the unforeseen effects of high cognitive load. Occasionally, a color change would appear while the driver was actively engaged in a sharp turn, either causing the driver to lose control of the vehicle or to shift their glance slightly without paying much attention to the color change. Where these effects were observed, the examples were discarded, as in these cases drivers behaved inconsistently, most likely due to a task overload. Such effects could be the topic of further investigations, but examples were too few to discuss in detail in this study. 
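The pruning step can be sketched as follows; the exact magnitude criterion and search window are not stated beyond the roughly 55° target eccentricity, so the threshold and 3-s window below are illustrative assumptions.

```python
# Sketch of the pruning step: keep only cues followed by a valid, tracked gaze
# shift of roughly the target magnitude (toward the ~55-deg secondary monitor).
import numpy as np

def has_valid_shift(gaze_yaw, validity, cue_idx, window=300, min_mag_deg=45.0):
    """gaze_yaw: yaw samples (deg) at 10-ms intervals; cue_idx: cue onset index."""
    seg = np.asarray(gaze_yaw[cue_idx:cue_idx + window], dtype=float)
    val = np.asarray(validity[cue_idx:cue_idx + window])
    if not np.any(val):                     # tracker never confident -> discard
        return False
    seg = np.where(val, seg, np.nan)
    # Require a yaw excursion of sufficient magnitude within the window.
    return (np.nanmax(seg) - np.nanmin(seg)) >= min_mag_deg
```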
Each subject j had approximately 10 examples of each condition after the pruning step described above. For the remaining examples in both conditions, the time of the first gaze saccade to the secondary monitor was found in an iterative manner. This was done to avoid any false positives in an automatic saccade detection procedure, given the somewhat noisy nature of the gaze data. 
In the first step, the example was manually annotated as to the approximate time of the initiation of the gaze saccade, during the first gaze shift of approximately 55° in yaw (i.e., to the target monitor). Subsequently, the point of maximum yaw acceleration in a local 50-ms window $W$ around the annotated point $T^L$ was found. This was calculated using a 5-tap derivative-of-Gaussian filter to temporally smooth and find the second derivative of the gaze rotation signal. The maximum point of the second derivative was fixed as the location of the gaze saccade:

$$T^S_{ij} = \arg\max_{t \in W} \left( \ddot{e}_Y[t] \right), \qquad (1)$$

where $e_Y[t]$ represents the yaw position of the eye at time $t$. An example of this detection procedure can be seen in Figure 4; we found that the local point of maximum eye yaw acceleration was a consistent labeling method for the following analysis. 
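A minimal sketch of this two-step localization is given below, assuming yaw sampled at 100 Hz (10-ms steps). The Gaussian width is chosen so that the second-derivative kernel spans roughly 5 taps, matching the filter length named in the text, but the exact coefficients and sign convention are assumptions.

```python
# Sketch of Equation 1: within a window W around the manual annotation T^L,
# take the point of maximum eye-yaw acceleration, estimated with a short
# derivative-of-Gaussian filter.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def detect_saccade_onset(eye_yaw, t_label, fs=100, win_ms=50):
    """eye_yaw in deg at fs Hz; t_label is the annotated sample index T^L."""
    # Second derivative of the Gaussian-smoothed yaw signal (yaw acceleration);
    # sigma=1, truncate=2 gives a 5-sample kernel.
    accel = gaussian_filter1d(eye_yaw, sigma=1.0, order=2, truncate=2.0)
    half = int(round((win_ms / 1000.0) * fs / 2))
    lo, hi = max(t_label - half, 0), min(t_label + half + 1, len(eye_yaw))
    # Absolute value used so the result does not depend on shift direction
    # (an assumption; the text specifies the maximum of the second derivative).
    return lo + int(np.argmax(np.abs(accel[lo:hi])))   # T^S_ij
```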
Figure 4
 
Sample data showing the procedure for detection of the start of the eye saccade. A local area around a manually marked point is searched for the maximum eye yaw acceleration. This point is labeled as the start of the saccade, $T^S_{ij}$. The dynamics of the head yaw, i.e., position and motion, at this point $T^S_{ij}$ are extracted for further analysis.
Given the time of the gaze saccade, $T^S_{ij}$, two features based on head dynamics were calculated for each example $i$ of subject $j$. The first was simply the value of the head yaw at the point of the saccade:

$$P_{ij} = h_Y[T^S_{ij}], \qquad (2)$$

where $h_Y[t]$ represents the yaw position of the head at time $t$. The second feature was the average yaw velocity in the 100 ms prior to the saccade:

$$M_{ij} = \frac{1}{10} \sum_{t = T^S_{ij} - 100\,\mathrm{ms}}^{T^S_{ij}} \dot{h}_Y[t]. \qquad (3)$$

$P_{ij}$ and $M_{ij}$ thus represent the yaw position and motion features of each example $i$ of subject $j$. These features were chosen to capture whether the head was moving in a preparatory motion prior to the saccade and by how much. 
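The two features can be computed directly from the logged head yaw; the sketch below assumes the 10-ms sampling step described above and a finite-difference estimate of the yaw velocity.

```python
# Sketch of Equations 2 and 3: head yaw at the saccade onset, and mean head yaw
# velocity over the preceding 100 ms (ten 10-ms samples).
import numpy as np

def head_features(head_yaw, t_sacc, dt_ms=10):
    """head_yaw in deg, sampled every dt_ms; t_sacc is the saccade-onset index."""
    P = head_yaw[t_sacc]                                  # P_ij, Equation 2
    n = 100 // dt_ms                                      # samples in 100 ms
    vel = np.diff(head_yaw) * (1000.0 / dt_ms)            # yaw velocity, deg/s
    M = np.mean(vel[t_sacc - n:t_sacc])                   # M_ij, Equation 3
    return P, M
```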
Based on the features above, several additional statistics were calculated: 
$$\tilde{P}_j^1 = \mathrm{median}_i \left( P_{ij} \mid \mathrm{Condition}\ 1 \right), \qquad (4)$$

$$\tilde{P}_j^2 = \mathrm{median}_i \left( P_{ij} \mid \mathrm{Condition}\ 2 \right), \qquad (5)$$

$$\tilde{M}_j^1 = \mathrm{median}_i \left( M_{ij} \mid \mathrm{Condition}\ 1 \right), \qquad (6)$$

$$\tilde{M}_j^2 = \mathrm{median}_i \left( M_{ij} \mid \mathrm{Condition}\ 2 \right). \qquad (7)$$
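Assuming the per-example features are gathered in a table indexed by subject and condition, the subject-wise medians of Equations 4–7 reduce to a grouped median, as sketched below; the DataFrame layout is an assumption.

```python
# Sketch of Equations 4-7: per-subject, per-condition medians of P_ij and M_ij.
# Assumes a DataFrame with columns "subject", "condition" (1 or 2), "P", "M".
import pandas as pd

def subject_medians(df):
    med = df.groupby(["subject", "condition"])[["P", "M"]].median()
    # med.loc[(j, 1), "P"] is P~_j^1, med.loc[(j, 2), "M"] is M~_j^2, etc.
    return med.unstack("condition")
```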
Results and discussion
Each trial was characterized by one main independent variable: the condition, either endogenous or exogenous. Two dependent variables were measured, namely, the head yaw $P_{ij}$ and head motion $M_{ij}$ at the point of the gaze saccade. 
The order of presentation was also randomized, but there was no interaction between order and condition (p > 0.5), so we collapse across the order groups. 
A main effect of the condition is found both in yaw (mean($P_{ij}^1$) = 6.68°, mean($P_{ij}^2$) = 1.54°, F(1,9) = 9.44, p < 0.05) and in motion (mean($M_{ij}^1$) = 41.36°/s, mean($M_{ij}^2$) = 22.12°/s, F(1,9) = 11.23, p < 0.05). Figure 5 shows the distribution of all the data at the point of the saccade, clearly demonstrating that the yaw and motion both tend to be greater at the point of saccade in the endogenous condition. 
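The reported F(1,9) values are consistent with a repeated-measures comparison across the 10 subjects; the sketch below shows one plausible reconstruction of such a test, using the same assumed DataFrame layout as above. The authors do not state their exact ANOVA procedure, so this should be read as an illustration rather than their method.

```python
# Sketch of a within-subject main-effect test on the per-subject condition means.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def condition_main_effect(df, dv="P"):
    """df has columns 'subject', 'condition', and feature columns 'P' and 'M'."""
    # One value per subject and condition (mean over that subject's examples).
    per_subject = df.groupby(["subject", "condition"], as_index=False)[dv].mean()
    res = AnovaRM(per_subject, depvar=dv, subject="subject",
                  within=["condition"]).fit()
    return res.anova_table      # F value, degrees of freedom, p-value
```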
Figure 5
 
Overall distribution of head yaw position and head yaw motion at the time of the eye gaze saccade for each condition, including all examples of all subjects.
Figure 6 shows the average head yaw over all the examples, time-aligned to the saccade location. The head yaw in Condition 1, shown in red, demonstrates a clear increase in head motion as much as 500 ms prior to the saccade, with the greatest motion occurring around 200 ms prior. Head yaw in Condition 2 begins increasing in earnest only 50–100 ms prior to the saccade. 
Figure 6
 
Average head yaw prior to eye gaze saccade under each condition of the experiment, aligned to the position of the saccade. Dotted lines show the variance of the overall data. In Condition 1, a clear pattern of early head movement, beginning 0.5 s prior to the actual gaze shift, emerges. This early head movement is much less evident in Condition 2.
Prior studies (Freedman, 2008; Pelz et al., 2001; Zangemeister & Stark, 1982) have noticed early head motions up to 200 ms in advance of the saccade. The results of these experiments are thus in mild agreement, although it is apparent that in preparation for task-oriented gaze shifts in the driving situation, the head movement may begin as much as 500 ms in advance of the saccade. 
To discount the effect of each subject's outliers, the subject-wise medians of these data, $\tilde{P}_j^{[1,2]}$ and $\tilde{M}_j^{[1,2]}$, were also calculated for more detailed analysis. Given the limited data set size, the two pairs of statistics were then analyzed using the Wilcoxon signed-rank test, a non-parametric counterpart of the paired t test. This analysis showed significant differences in both cases: $\tilde{P}_j^1$ = 6.2° > $\tilde{P}_j^2$ = 1.1° (Z = 2.80, p = 0.0051 < 0.01) and $\tilde{M}_j^1$ = 45.3°/s > $\tilde{M}_j^2$ = 13.9°/s (Z = 2.50, p = 0.0125 < 0.02). Figure 7 shows the differences in the median statistics $\tilde{P}_j$ and $\tilde{M}_j$ computed above. 
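The paired, non-parametric comparison of the subject-wise medians can be sketched with SciPy's Wilcoxon signed-rank test; the column layout follows the grouped-median sketch above and is an assumption.

```python
# Sketch of the paired Wilcoxon signed-rank comparisons on subject-wise medians.
from scipy.stats import wilcoxon

def compare_medians(med):
    """med: DataFrame from subject_medians(), with columns like ('P', 1), ('P', 2)."""
    stat_P, p_P = wilcoxon(med[("P", 1)], med[("P", 2)])   # P~_j1 vs. P~_j2
    stat_M, p_M = wilcoxon(med[("M", 1)], med[("M", 2)])   # M~_j1 vs. M~_j2
    return {"P": (stat_P, p_P), "M": (stat_M, p_M)}
```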
Figure 7
 
Distribution of subject-wise median head yaw position and head yaw motions at the time of the eye saccade. Error bars represent standard error of the median.
The endogenous cuing condition, corresponding to a “task-oriented” attention shift, demonstrates a clear pattern of greater and earlier head motion just prior to the saccade. 
Figure 8 demonstrates the relative timings of the gaze saccade after the appearance of the cue. The exogenous cue tends to attract attention very quickly, and most gaze shifts are made within 500 ms of the cue. This is in clear contrast to the endogenous cue, which, as expected, results in the subject shifting gaze in a preplanned manner. This separation shows that the experimental conditions elicited the two different styles of gaze shifts appropriately. 
Figure 8
 
Distribution of saccade timings, after the onset of the first cue. The delay in Condition 1 is as expected as subjects take time to detect the cue and plan the saccade. Condition 2 follows the pattern of unplanned saccades, mostly occurring around 500 ms after cue onset.
In order to characterize the time course of the head movements with respect to the gaze shift, we can measure the timing of the first “significant” head motion. In Figure 9, the histograms of the first head motion (where head motion rises above a fixed, manually selected threshold of 17°/s) are shown for each condition. This is found by searching in both directions from the point of the gaze saccade to determine where the head motion first exceeds the threshold. The endogenous cuing condition can be observed to elicit a greater proportion of early head motions. 
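A sketch of this threshold search is given below. Interpreting "searching in both directions" as taking the threshold crossing nearest in time to the saccade is an assumption, as is the 10-ms sampling step.

```python
# Sketch of locating the first "significant" head motion relative to the saccade,
# using the fixed 17 deg/s velocity threshold named in the text.
import numpy as np

def first_head_motion_offset(head_yaw, t_sacc, dt_ms=10, thresh=17.0):
    vel = np.abs(np.diff(head_yaw)) * (1000.0 / dt_ms)     # |yaw velocity|, deg/s
    above = np.flatnonzero(vel > thresh)
    if above.size == 0:
        return None
    # Threshold crossing nearest in time to the saccade; negative offsets mean
    # the head began moving before the gaze saccade.
    nearest = above[np.argmin(np.abs(above - t_sacc))]
    return (nearest - t_sacc) * dt_ms                       # offset in ms
```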
Figure 9
 
Distribution of first significant head motions (over a fixed threshold) relative to the gaze saccade. The histograms represent the actual measurements, and the solid lines represent a fitted Gaussian. The endogenous, “goal-oriented” condition shows a marked difference, with a majority of head motions occurring prior to the saccade.
Kinematics of eye and head movements
Here, we measure a number of other variables that could have influenced the onset of early head motion due to the kinematics of the eye–head motion. 
Prior studies in controlled conditions have determined that the amplitude of the gaze shift had a significant impact on the amount of head movement prior to the saccade (Freedman, 2008), as larger movements tended to correlate with early head motions. Another critical variable was found to be the predictability of the target location, which also positively influenced the early movement of the head. However, in this study, we have fixed the target location, and the amplitude as determined by initial position is generally constant as well. To verify this, we compared the starting yaw positions in Figure 10 and found no reliable differences in starting position (EyeInitialYaw: F(1,9) = 2.84, p = 0.09; HeadInitialYaw: F(1,9) = 0.95, p = 0.33). The subject is always aware of the location of the target, in both the endogenous and exogenous cuing cases. In spite of the constant target and shift amplitude, we still find variations in the amount of early head movement, correlating with the type of cuing condition, whether driven endogenously by top-down motor commands or exogenously by bottom-up stimulus responses. 
Figure 10
 
Starting yaw position for eye and head motion under each condition. Error bars represent the standard error of the mean.
Bizzi, Kalil, and Morasso (1972) also report in controlled environments that predictive movement conditions include lower peak velocities and movement durations than triggered conditions. To analyze these effects, we measured the peak velocities and saccade durations for both eye and head movements in Figures 11 and 12. In contrast to the earlier studies, the peak eye velocities in the endogenous condition actually trend toward being significantly greater than in the exogenous condition (F(1,9) = 5.16, p = 0.02). In all other cases, there were no significant differences (PeakHeadVelocity: F(1,9) = 0.14, p = 0.71; EyeMovementDuration: F(1,9) = 1.24, p = 0.26; HeadMovementDuration: F(1,9) = 1.10, p = 0.30). We are thus not able to observe slower peak velocities or movement durations in the endogenous cuing case. 
Figure 11
 
Maximum yaw rates for eye and head motion under each condition. Error bars represent the standard error of the mean.
Figure 12
 
Duration of motion from initial movement until target. Note that, in the case of eye motion, this is a superset of the saccade duration. Error bars represent the standard error of the mean.
Finally, during predictive movements, it has been reported that the head contribution may be larger than during triggered movements (Freedman, 2008). If the head contribution were larger in predictive movements, we would observe a greater maximum in the head yaw in the endogenous case. As demonstrated in Figure 13, we find no reliable differences in the maximum yaw positions of either the head or the eye between the two conditions (EyeMaxYaw: F(1,9) = 2.12, p = 0.15; HeadMaxYaw: F(1,9) = 0.15, p = 0.70). 
Figure 13
 
Maximum yaw position for eye and head motion under each condition. Error bars represent the standard error of the mean.
Some of these results have been repeated in other controlled environments (Fuller, 1992; Khan et al., 2009; Zangemeister & Stark, 1982). In this more naturalistic study, we found almost no variations between the conditions. This implies that under more natural conditions, along with the more complex task of driving, variables such as initial gaze position and gaze shift duration seem uncorrelated with either style of gaze shift. This could further imply that under these conditions, such variables are irrelevant to the onset of early head motion, in contrast to the conclusions of earlier studies, such as those reviewed in Freedman (2008). However, more experiments should be done in these cases to verify these claims. 
Analysis of misclassifications
To analyze the misclassifications associated with each of the test statistics (either $P_{ij}$ or $M_{ij}$), we show confusion matrices in Tables 1 and 2. Correct classification rates reach 69% using the head yaw position and 66% using the head yaw motion, with fewer misclassifications associated with predictions of exogenous, stimulus-oriented behavior. 
Table 1
 
Confusion matrix for detecting endogenous, goal-oriented gaze shifts (G) versus exogenous, stimulus-oriented shifts (S) in the simulator experiment, using head yaw position criteria. Correct classification rate is 69%.
Number of examples:
                Actual G    Actual S
Predicted G         63          41
Predicted S         25          87
Table 2
 
Confusion matrix for detecting endogenous, goal-oriented gaze shifts (G) versus exogenous, stimulus-oriented shifts (S) in the simulator experiment, using head yaw motion criteria. Correct classification rate is 66%.
Number of examples:
                Actual G    Actual S
Predicted G         65          39
Predicted S         35          77
Given this information, it may be possible to build a classifier to determine in real time whether an individual example is more similar to Condition 1 or Condition 2. The statistical basis for such a classifier is provided by the ANOVA results in the previous section, giving credence to the possibility of automatically classifying the examples. Such a classifier could then be used directly to improve advanced human–machine interfaces in task-oriented environments; we leave the analysis of such a classifier to future work. 
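One minimal version of such a classifier is a single threshold on $P_{ij}$ or $M_{ij}$, evaluated with a confusion matrix as in Tables 1 and 2. The sketch below is not the authors' procedure; the threshold value and any evaluation split (e.g., leave-one-subject-out) are assumptions that would need to be chosen in practice.

```python
# Minimal sketch: threshold one head-dynamics feature to separate endogenous (G)
# from exogenous (S) gaze shifts, and summarize with a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

def threshold_classifier(features, y_true, thresh):
    """features: P_ij or M_ij per example; y_true: 1 = endogenous, 2 = exogenous."""
    pred = np.where(features > thresh, 1, 2)   # larger yaw/motion -> goal-oriented
    cm = confusion_matrix(y_true, pred, labels=[1, 2])
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy
```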
The results in these prior sections serve to validate the hypothesis that humans exhibit different behaviors when switching attention in different contexts. Of particular importance is the attention switch in time and safety critical situations such as driving, where distractions pose significant dangers. Knowledge of a planned attention shift could give hints as to whether the driver is actively engaged with a task. In the next section, we discuss an application of this knowledge of eye–head dynamics in real-world task-oriented situations, in order to design classifiers to detect drivers' intentions. 
Investigations into naturalistic driving
McCall et al. (2007) demonstrated the ability to detect a driver's intent to change lanes up to 3 s ahead of time, by analyzing driver head motion patterns. Doshi and Trivedi (2009b) extended this study and found that head motion was in fact an earlier predictor of lane change intentions than eye gaze. However, the reasons for this interesting finding were not clear. 
We propose that the visual search that occurs prior to lane changes, and potentially in other similar common driving maneuvers, is initiated by a top-down process in the driver's mind. The driver has a goal in mind and, thus, is trained to initiate a search in particular locations such as the mirrors and over the shoulders, for obstacles. Here, we present a deeper analysis into real driving data to support this hypothesis, that the visual search prior to lane changes is a Type III search. The ability to detect this type of behavior is crucial in being able to identify the context of the situation and then to assess its criticality or determine if objects around the vehicle are of interest. 
Naturalistic driving data collection
For this research, data were collected in a driving experiment with an intelligent vehicle test bed outfitted with a number of sensors detecting the environment, vehicle dynamics, and driver behavior. These data are drawn from the same data set used in the lane change intent work by McCall et al. (2007). A camera-based lane position detector and CAN-Bus interface provided most of the data related to the vehicle and surrounding environment. 
The main driver-focused sensor was a rectilinear color camera mounted above the center console facing toward the driver, providing 640 × 480 resolution video at 30 frames per second. To calculate head motion, optical flow vectors were compiled and averaged in several windows over the driver's face (detected with the Viola and Jones, 2001, face detector). This method was found to be stable and robust across different harsh driving conditions and various drivers. Other methods could be used for these purposes (Murphy-Chutorian, Doshi, & Trivedi, 2007; Murphy-Chutorian & Trivedi, 2009). Various automatic eye gaze detectors exist (e.g., Wu & Trivedi, 2007); however to ensure accuracy and reliability, eye gaze was labeled into one of 9 categories using a manual reduction technique similar to several recent NHTSA studies on workload and lane changes (Angell et al., 2006; S. E. Lee et al., 2004; details on this procedure can be found in Doshi & Trivedi, 2009b). While the technique did not capture the subtle variations in eye movements, it was useful to capture the timing of the eye saccades to the mirrors and side windows. 
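A rough sketch of this head-motion cue, using standard OpenCV components (Haar-cascade face detection and Farnebäck dense optical flow), is shown below. The authors' exact windowing scheme over the face and their flow parameters are not reproduced here; the cascade file and parameter values are common defaults used as assumptions.

```python
# Sketch of the head-motion cue from driver-facing video: detect the face with a
# Viola-Jones cascade, then average dense optical flow inside the face region.
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def head_motion(prev_gray, cur_gray):
    """prev_gray, cur_gray: consecutive grayscale frames (e.g., 640x480)."""
    faces = cascade.detectMultiScale(cur_gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Mean horizontal/vertical flow over the face region approximates head motion.
    return flow[y:y + h, x:x + w].mean(axis=(0, 1))
```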
The data set was collected from a naturalistic ethnographic driving experiment in which the subjects were not told that the objective was related to lane change situations. Eight drivers of varying age, sex, and experience drove for several hours each on a predetermined route. A total of 151 lane changes were identified in highway situations with minimal traffic; 753 negative samples were collected, corresponding to highway “lane-keeping” situations. 
Analysis
These examples were used, with the features described above, to train a classifier to predict 3 s in advance of a lane change whether the driver intended to change lanes. Another classifier was trained for 2 s ahead of the lane change. In similar studies, a sparse Bayesian counterpart of the SVM, namely the Relevance Vector Machine, was found to be a good classifier (see McCall et al., 2007; Tipping & Faul, 2003 for more details) and was thus used here as well. In a comparative study, Doshi and Trivedi (2009b) found that such a classifier based on head motion has significantly more predictive power than one based on eye gaze 3 s ahead of the lane change but not 2 s ahead of time. 
By looking at the outputs of each classifier, we can get a sense of the performance of eye gaze and head pose over time. Specifically, the RVM classifier outputs a class membership likelihood ranging from −1 (for negative examples) to 1 (for positive examples); thus, the more positive the value, the more confident the classifier is that a true intention is present. Figure 14 shows the processing framework for this scenario. Looking at the average over all positive examples of these “Intent Prediction Confidences,” in Table 3, we can tell that the eye-gaze-based classifier is hardly better than chance 3 s before the lane change but improves significantly in the 2-s case. On the other hand, the head-motion-based classifier works very well even in the 3-s case. 
Figure 14
 
Flowchart of proposed approach for evaluating naturalistic driving data during lane change-associated visual search.
Table 3
 
Average intent prediction confidences ($\overline{IPC}$) for each type of classifier, where a value of 0 represents chance.
Seconds before lane change:                              3 s          2 s
Eye gaze classifier ($\overline{IPC}_{eye}$)             0.0027       0.4691
Head pose classifier ($\overline{IPC}_{head}$)           0.4411       0.6639
ANOVA: $\overline{IPC}_{head} > \overline{IPC}_{eye}$    p < 0.01     p > 0.05
The results indicate that drivers engage in an earlier preparatory head motion, before shifting their gaze to check mirrors or blind spot. 
ANOVA significance tests comparing the populations of Intent Prediction Confidences ($IPC$) demonstrate quantitatively that the preparatory head motion prior to the eye gaze shift is a significant trend 3 s prior to the lane change: The head pose classifier is significantly more predictive than the eye gaze classifier (F(1,7) = 24.4, p < 0.01). We can conclude that the early head motions begin to indicate a driver's intentions prior to the actual gaze saccade. This implies that the detection of a goal-oriented gaze shift may be quite useful in future advanced driver assistance systems, by helping to determine the context of the drive and whether the driver is indeed paying attention to task-relevant objects. 
Concluding remarks
We have demonstrated how the dynamics of overt visual attention shifts evoke certain patterns of responses in eye and head movements, in particular the interaction of eye gaze and head pose dynamics under various attention-switching conditions. Sudden, bottom-up visual cues in the periphery evoke a different pattern of eye–head yaw dynamics as opposed to those during top-down, task-oriented attention shifts. In laboratory vehicle simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case. 
This finding is validated in qualitative analysis of naturalistic real-world driving data. In examining the time course of visual searches prior to lane changes, the same significant pattern of early head motions appeared, indicating a “task-oriented” attention shift. Though it would be dangerous to collect data regarding stimulus-oriented attention shifts in real driving, the simulator experiments presented here seem to demonstrate that sudden stimuli attract eye motions as early as, or earlier than, head motions. 
One important question arises regarding the nature of the stimulus presented in this experiment, specifically whether the “exogenous” cuing condition actually corresponds to a distraction-driven attention shift. Pashler et al. (2001) note that “abrupt onset” cues may not be inherently attractive in a bottom-up sense but are only attractive if those cues also happen to be useful for whatever top-down tasks the user may be engaged in. This “cognitive penetration” of bottom-up cuing could imply that the “exogenous” cuing case presented in these experiments may not correspond to a real bottom-up, reflexive “distraction.” However, it could be argued that in real driving scenarios, users are primed to detect abrupt onset stimuli because they may be relevant to the safety of the driving situation. In this context, any abrupt onset stimulus that is unrelated to the primary task at hand (safe driving) can be considered a “distraction,” and the results of this study show that the visible dynamics of attention shifts differ in this case. More work could be done to characterize the nature of distractions in complex environments, as they relate to top-down goals of the subject, and then to include different forms of distractions in these experiments. 
The results presented here indicate that measurement of eye–head dynamics may prove to be useful data for classifying human attentive states in time and safety critical environments. By detecting a “predictive” or early head movement prior to a gaze shift, an intelligent assistance system could improve its estimate of whether the subject is actively engaged in the task at hand. In potentially dangerous situations such as driving and flying, active perception of humans thus has a large role to play in improving comfort and saving lives. 
Acknowledgments
This research was supported by a National Science Foundation Grant and a Doctoral Dissertation Grant from the University of California Transportation Center (U.S. Department of Transportation Center of Excellence). The authors would like to thank Professor Hal Pashler and Professor Michael Mozer for their valuable comments and insights. The authors also wish to thank their colleagues from the CVRR Laboratory for their assistance, especially during experimental data collection. 
Commercial relationships: none. 
Corresponding author: Anup Doshi. 
Address: University of California, San Diego, 9500 Gilman Dr, MC 0434, La Jolla, CA 92093-0434, USA. 
References
Angell L. Auflick J. Austria P. A. Kochhar D. Tijerina L. Biever W. et al. (2006). Driver workload metrics task 2 final report (Report DOT HS 810635). Washington, DC: NHTSA, US Department of Transportation.
Asteriadis S. Tzouveli P. Karpouzis K. Kollias S. (2009). Estimation of behavioral user state based on eye gaze and head pose—Application in an e-learning environment. Multimedia Tools and Applications, 41, 469–493.
Ballard D. H. Hayhoe M. M. (2009). Modelling the role of task in the control of gaze. Visual Cognition, 17, 1185–1204.
Bartz A. E. (1966). Eye and head movement in peripheral vision: Nature of compensatory eye movements. Science, 152, 1644–1645.
Batista J. P. (2005). A real-time driver visual attention monitoring system. In Marques J. Pérez de la Blanca N. Pina P. (Eds.), Iberian Conference on Pattern Recognition and Image Analysis (vol. 3522, pp. 200–208). Heidelberg: Springer Berlin.
Bizzi E. Kalil R. E. Morasso P. (1972). Two modes of active eye–head coordination in monkeys. Brain Research, 40, 45–48. [CrossRef] [PubMed]
Cardosi K. M. (1999). Human factors for air traffic control specialists: A user's manual for your brain (Report DOT/FAA/AR-99/39). Washington, DC: Federal Aviation Administration, U.S. Department of Transportation.
Carmi R. Itti L. (2006). Visual causes versus correlates of attentional selection in dynamic scenes. Vision Research, 46, 4333–4345. Available from http://www.sciencedirect.com/science/article/B6T0W-4M4TNJ1-2/2/2a4f8e772f046d1d0eb9ea90e57eeaf1. [CrossRef] [PubMed]
Cheng S. Y. Trivedi M. M. (2006). Turn intent analysis using body-pose for intelligent driver assistance. IEEE Pervasive Computing, Special Issue on Intelligent Transportation Systems, 5, 28–37.
Corneil B. D. Munoz D. P. (1999). Human eye–head gaze shifts in a distractor task: II. Reduced threshold for initiation of early head movements. Journal of Neurophysiology, 86, 1406–1421.
Dodge R. (1921). The latent time of compensatory eye movements. Journal of Experimental Psychology, 4, 247–269. [CrossRef]
Doshi A. Cheng S. Y. Trivedi M. M. (2009). A novel, active heads-up display for driver assistance. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39, 85–93. [CrossRef]
Doshi A. Trivedi M. M. (2009a). Head and gaze dynamics in visual attention and context learning. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 77–84). Miami, FL, USA.
Doshi A. Trivedi M. M. (2009b). On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes. IEEE Transactions on Intelligent Transportation Systems, 10, 453–462. [CrossRef]
Durlach P. J. (2004). Change blindness and its implications for complex monitoring and control systems design and operator training. Human–Computer Interaction, 19, 423–451. [CrossRef]
Eckstein M. P. Caspi A. Beutter B. R. Pham B. T. (2004). The decoupling of attention and eye movements during multiple fixation search [Abstract]. Journal of Vision, 4(8):165, 165a, http://www.journalofvision.org/content/4/8/165, doi:10.1167/4.8.165. [CrossRef]
Freedman E. G. (2008). Coordination of the eyes and head during visual orienting. Experimental Brain Research, 190, 369–387. [CrossRef] [PubMed]
Fuller J. H. (1992). Comparison of head movement strategies among mammals. In Berthoz A. Graf W. Vidal P. P. (Eds.), Head–neck sensory motor system (pp. 101–114). New York: Oxford Press.
Goossens H. H. L. M. Opstal A. J. V. (1997). Human eye–head coordination in two dimensions under different sensorimotor conditions. Experimental Brain Research, 114, 542–560. [CrossRef] [PubMed]
Henderson J. M. Brockmole J. R. Castelhano M. S. Mack M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In Gompel, R. P. V. Fischer, M. H. Murray, W. S. Hill R. L. (Eds.), Eye movements (pp. 537–562). Oxford, UK: Elsevier. Available from http://www.sciencedirect.com/science/article/B87JD-4PFGDST-14/2/71b923d388d1e0a0ef1f6a6aeb98d6a8
Herst A. N. Epelboim J. Steinman R. M. (2001). Temporal coordination of the human head and eye during a natural sequential tapping task. Vision Research, 41, 3307–3319. [CrossRef] [PubMed]
Itti L. Koch C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506. [CrossRef] [PubMed]
Jovancevic J. Sullivan B. Hayhoe M. (2006). Control of attention and gaze in complex environments. Journal of Vision, 6(12):9, 1431–1450, http://www.journalofvision.org/content/6/12/9, doi:10.1167/6.12.9. [PubMed] [Article] [CrossRef]
Khan A. Z. Blohm G. McPeek R. M. Lefèvre P. (2009). Differential influence of attention on gaze and head movements. Journal of Neurophysiology, 101, 198–206. [CrossRef] [PubMed]
Land M. F. (1992). Predictable eye–head coordination during driving. Nature, 359, 318–320. [CrossRef] [PubMed]
Landry S. J. Sheridan T. B. Yufik Y. M. (2001). A methodology for studying cognitive groupings in a target-tracking task. IEEE Transactions on Intelligent Transportation Systems, 2, 92–100. [CrossRef]
Lee S. E. Olsen C. B. Wierwille W. W. (2004). A comprehensive examination of naturalistic lane changes (Report DOT HS 809702). Washington, DC: NHTSA, US Department of Transportation.
Lee Y. C. Lee J. D. Boyle L. N. (2007). Visual attention in driving: The effects of cognitive load and visual disruption. Human Factors, 49, 721–733. [CrossRef] [PubMed]
Levy J. Pashler H. (2008). Task prioritization in multitasking during driving: Opportunity to abort a concurrent task does not insulate braking responses from dual-task slowing. Applied Cognitive Psychology, 22, 507–525. [CrossRef]
Levy J. Pashler H. Boer E. (2006). Central interference in driving: Is there any stopping the psychological refractory period? Psychological Science, 17, 228–235. [CrossRef] [PubMed]
Mahadevan V. Vasconcelos N. (2010). Spatiotemporal saliency in dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 171–177. [CrossRef] [PubMed]
Matsumoto Y. Ogasawara T. Zelinsky A. (2000). Behavior recognition based on head pose and gaze direction measurement. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2000, 3, 2127–2132.
McCall J. C. Trivedi M. M. (2007). Driver behavior and situation aware brake assistance for intelligent vehicles. Proceedings of the IEEE, Special Issue on Advanced Automobile Technology, 95, 374–387.
McCall J. C. Wipf D. Trivedi M. M. Rao B. (2007). Lane change intent analysis using robust operators and sparse Bayesian learning. IEEE Transactions on Intelligent Transportation Systems, 8, 431–440. [CrossRef]
Mennie N. R. Hayhoe M. M. Sullivan B. T. (2007). Look-ahead fixations: Anticipatory eye movements in natural tasks. Experimental Brain Research, 179, 427–442. [CrossRef] [PubMed]
Mennie N. R. Hayhoe M. M. Sullivan B. T. Walthew C. (2003). Look ahead fixations and visuo-motor planning [Abstract]. Journal of Vision, 3(9):123, 123a, http://www.journalofvision.org/content/3/9/123, doi:10.1167/3.9.123. [CrossRef]
Morasso P. Sandini G. Tagliasco V. Zaccaria R. (1977). Control strategies in the eye–head coordination system. IEEE Transactions on Systems, Man and Cybernetics, 7, 639–651. [CrossRef]
Mozer M. C. Sitton M. (1998). Computational modeling of spatial attention. In Pashler H. (Ed.), Attention (chap. 3, pp. 341–394). East Sussex, UK: Psychology Press.
Murphy-Chutorian E. Doshi A. Trivedi M. M. (2007). Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation. IEEE Intelligent Transportation Systems Conference, 709–714.
Murphy-Chutorian E. Trivedi M. M. (2009). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 607–626. [CrossRef] [PubMed]
Navalpakkam V. Itti L. (2005). Modeling the influence of task on attention. Vision Research, 45, 205–231. [CrossRef] [PubMed]
Oliver N. Rosario B. Pentland A. (2000). A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 831–843. [CrossRef]
Pashler H. Johnston J. C. Ruthruff E. (2001). Attention and performance. Annual Review of Psychology, 52, 629–651. [CrossRef] [PubMed]
Peden M. Scurfield R. Sleet D. Mohan D. Hyder A. A. Jarawan E. et al. (Eds.) (2004). World report on road traffic injury prevention. Geneva, Switzerland: World Health Organization.
Pelz J. Hayhoe M. Loeber R. (2001). The coordination of eye, head, and hand movements in a natural task. Experimental Brain Research, 139, 266–277. [CrossRef] [PubMed]
Pentland A. (2007). Social signal processing [Exploratory DSP]. IEEE Signal Processing, 24, 108–111. [CrossRef]
Pentland A. Liu A. (1999). Modeling and prediction of human behavior. Neural Computation, 11, 229–242. [CrossRef] [PubMed]
Peters R. J. Itti L. (2006). Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention. IEEE Conference on Computer Vision and Pattern Recognition, 1–8.
Recarte M. A. Nunes L. M. (2000). Effects of verbal and spatial-imagery tasks on eye fixations while driving. Journal of Experimental Psychology: Applied, 6, 31–43. [CrossRef] [PubMed]
Ron S. Berthoz A. Gur S. (1993). Saccade vestibuloocular reflex cooperation and eye head uncoupling during orientation to flashed target. The Journal of Physiology, 464, 595–611. [CrossRef] [PubMed]
Rothkopf C. A. Ballard D. Hayhoe M. (2007). Task and context determine where you look. Journal of Vision, 7(14):16, 1–20, http://www.journalofvision.org/content/7/14/16, doi:10.1167/7.14.16. [PubMed] [Article] [CrossRef] [PubMed]
Ryoo M. S. Aggarwal J. K. (2007). Hierarchical recognition of human activities interacting with objects. IEEE Conference on Computer Vision and Pattern Recognition, 1–8.
Seeing Machines faceLAB Head and Eye Tracker (2004). Commercial product. Available from http://www.seeingmachines.com/product/facelab/.
Tipping M. E. Faul A. C. (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In Bishop C. M. Frey B. J. (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (pp. 1–13). Key West, FL.
TORCS, The Open Racing Car Simulator (2010). Open source software. Available from http://torcs.sourceforge.net/.
Trivedi M. M. Cheng S. Y. (2007). Holistic sensing and active displays for intelligent driver support systems. IEEE Computer, Special Issue on Human-Centered Computing, 40, 60–68.
Trivedi M. M. Gandhi T. McCall J. C. (2007). Looking in and looking out of a vehicle: Computer vision-based enhanced vehicle safety. IEEE Transactions on Intelligent Transportation Systems, 8, 108–120. [CrossRef]
Viola P. Jones M. J. (2001). Rapid object detection using a boosted cascade of simple features. IEEE Conference on Computer Vision and Pattern Recognition, 1, 511–518.
Wu J. Trivedi M. M. (2007). Simultaneous eye tracking and blink detection with interactive particle filters. EURASIP Journal on Advances in Signal Processing, 2008, 114.
Yantis S. (1998). Control of visual attention. In Pashler H. (Ed.), Attention (pp. 13–74). Hove, UK: Psychology Press.
Zangemeister W. H. Stark L. (1982). Types of gaze movement: Variable interactions of eye and head movements. Experimental Neurology, 77, 563–577. [CrossRef] [PubMed]
Zelinsky G. J. Zhang W. Yu B. Chen X. Samaras D. (2005). The role of top-down and bottom-up processes in guiding eye movements during visual search. In Neural Information Processing Systems Conference (vol. 18, pp. 1569–1576). Cambridge, MA: MIT Press.
Zhang L. Tong M. H. Marks T. K. Shan H. Cottrell G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7):32, 1–20, http://www.journalofvision.org/content/8/7/32, doi:10.1167/8.7.32. [PubMed] [Article] [CrossRef] [PubMed]
Figure 1
Examples of various interactions of head and eye movements, with type labels from Zangemeister and Stark (1982). Note that in certain cases eye gaze tends to move first, whereas in others the head tends to move first.
Figure 2
Experimental setup of LISA-S test bed. The test bed includes a PC-based driving simulator, with graphics shown on a 52-inch monitor, an audio system, and a steering wheel controller. The PC also controls a secondary monitor, located at an angle of 55° with respect to the driver, in a similar location to where the side-view mirror would be. The test bed includes a head and eye gaze tracking system, as well as a vision-based upper body tracking system (not used in this experiment). All the data from the gaze and body trackers are recorded synchronously with steering data and other parameters from the driving simulator.
Figure 3
Illustrative (staged) example of the experimental paradigm. In each cuing condition, we measure the differences in eye–head interactions during attention shifts to a secondary monitor, which the driver is required to check for instructions. In the "endogenous" condition, the driver is presented with a cue in the primary monitor and allowed to make a goal-oriented or preplanned attention shift. In the stimulus-oriented "exogenous" cuing condition, the secondary monitor displays a sudden change in color, drawing the driver's attention to the target.
Figure 4
Sample data showing the procedure for detection of the start of the eye saccade. A local area around a manually marked point is searched for the maximum eye yaw acceleration. This point is labeled as the start of the saccade, \(T_{ij}^{S}\). The dynamics of the head yaw, i.e., position and motion, at this point \(T_{ij}^{S}\) are extracted for further analysis.
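For readers interested in implementing this onset-detection step, a minimal sketch is given below. It assumes a uniformly sampled eye yaw signal; the sampling rate and window size are placeholders rather than the values used in the experiment.

```python
import numpy as np


def find_saccade_onset(eye_yaw_deg: np.ndarray, marked_idx: int,
                       fs_hz: float = 60.0, window_s: float = 0.25) -> int:
    """Index of maximum eye yaw acceleration within a local window around a
    manually marked point; fs_hz and window_s are placeholder parameters."""
    # Second difference approximates acceleration; accel[i] is centred on sample i + 1.
    accel = np.abs(np.diff(eye_yaw_deg, n=2)) * fs_hz ** 2
    half = int(window_s * fs_hz)
    lo = max(marked_idx - half, 1)
    hi = min(marked_idx + half, len(eye_yaw_deg) - 1)
    return lo + int(np.argmax(accel[lo - 1:hi - 1]))
```

The head yaw position and velocity at the returned index would then be extracted for further analysis, as described in the caption.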
Figure 5
Overall distribution of head yaw position and head yaw motion at the time of the eye gaze saccade for each condition, including all examples of all subjects.
Figure 6
Average head yaw prior to eye gaze saccade under each condition of the experiment, aligned to the position of the saccade. Dotted lines show the variance of the overall data. In Condition 1, a clear pattern of early head movement, beginning 0.5 s prior to the actual gaze shift, emerges. This early head movement is much less evident in Condition 2.
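A sketch of how such an aligned average could be computed from per-trial recordings is shown below; the pre-saccade window length, sampling rate, and the choice to align each trace to its yaw value at saccade onset are assumptions made for illustration.

```python
import numpy as np


def average_aligned_head_yaw(trials, saccade_indices, pre_s=1.0, fs_hz=60.0):
    """Mean and variance of head yaw over the window preceding each trial's
    saccade onset, after aligning traces to the yaw value at onset.
    Window length and sampling rate are placeholders."""
    n_pre = int(pre_s * fs_hz)
    segments = []
    for yaw, idx in zip(trials, saccade_indices):
        if idx >= n_pre:
            seg = np.asarray(yaw[idx - n_pre:idx], dtype=float)
            segments.append(seg - seg[-1])  # align to yaw at saccade onset
    stacked = np.vstack(segments)
    return stacked.mean(axis=0), stacked.var(axis=0)
```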
Figure 7
Distribution of subject-wise median head yaw position and head yaw motions at the time of the eye saccade. Error bars represent standard error of the median.
Figure 8
Distribution of saccade timings after the onset of the first cue. The delay in Condition 1 is expected, as subjects take time to detect the cue and plan the saccade. Condition 2 follows the pattern of unplanned saccades, mostly occurring around 500 ms after cue onset.
Figure 9
Distribution of first significant head motions (over a fixed threshold) relative to the gaze saccade. The histograms represent the actual measurements, and the solid lines represent a fitted Gaussian. The endogenous, "goal-oriented" condition shows a marked difference, with a majority of head motions occurring prior to the saccade.
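As a rough illustration of the "first significant head motion" measurement, the sketch below returns the time of the first head yaw rate sample exceeding a fixed threshold, expressed relative to the saccade onset; the threshold and sampling rate are assumptions, and negative latencies correspond to head motion preceding the eyes.

```python
from typing import Optional

import numpy as np


def first_head_motion_latency(head_yaw_deg: np.ndarray, saccade_idx: int,
                              fs_hz: float = 60.0,
                              rate_threshold_dps: float = 10.0) -> Optional[float]:
    """Time (s) of the first head yaw rate sample above a fixed threshold,
    relative to eye saccade onset; negative values mean the head moved first.
    Threshold and sampling rate are placeholders."""
    yaw_rate = np.abs(np.diff(head_yaw_deg)) * fs_hz
    over = np.flatnonzero(yaw_rate > rate_threshold_dps)
    if over.size == 0:
        return None
    return float(over[0] - saccade_idx) / fs_hz
```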
Figure 10
Starting yaw position for eye and head motion under each condition. Error bars represent the standard error of the mean.
Figure 11
Maximum yaw rates for eye and head motion under each condition. Error bars represent the standard error of the mean.
Figure 12
Duration of motion from initial movement until target. Note that, in the case of eye motion, this is a superset of the saccade duration. Error bars represent the standard error of the mean.
Figure 13
Maximum yaw position for eye and head motion under each condition. Error bars represent the standard error of the mean.
Figure 14
Flowchart of proposed approach for evaluating naturalistic driving data during lane change-associated visual search.
Table 1
Confusion matrix for detecting endogenous, goal-oriented gaze shifts (G) versus exogenous, stimulus-oriented shifts (S) in the simulator experiment, using head yaw position criteria. Correct classification rate is 69%.

Number of examples    Actual G    Actual S
Predicted G           63          41
Predicted S           25          87
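The reported rate follows directly from the matrix: the correct classification rate is the sum of the diagonal divided by the total number of examples, as the short check below illustrates (the same computation on the matrix in Table 2 gives roughly 66%).

```python
import numpy as np

# Confusion matrix from Table 1 (rows: predicted G/S, columns: actual G/S).
cm = np.array([[63, 41],
               [25, 87]])
accuracy = np.trace(cm) / cm.sum()
print(f"{accuracy:.1%}")  # 69.4%, matching the reported 69%
```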
Table 2
Confusion matrix for detecting endogenous, goal-oriented gaze shifts (G) versus exogenous, stimulus-oriented shifts (S) in the simulator experiment, using head yaw motion criteria. Correct classification rate is 66%.

Number of examples    Actual G    Actual S
Predicted G           65          39
Predicted S           35          77
Table 3
Average intent prediction confidences (\(\overline{IPC}\)) for each type of classifier, where a value of 0 represents chance.

                                                       Seconds before lane change
                                                       3 s         2 s
Eye gaze classifier (\(\overline{IPC}_{eye}\))              0.0027      0.4691
Head pose classifier (\(\overline{IPC}_{head}\))            0.4411      0.6639
ANOVA: \(\overline{IPC}_{head} > \overline{IPC}_{eye}\)         p < 0.01    p > 0.05