Open Access
Article  |   January 2016
It's in the eyes: Planning precise manual actions before execution
Author Affiliations & Notes
  • Address: Department of Computer Science, University of Tübingen, Tübingen, Germany. 
Journal of Vision January 2016, Vol. 16, 18. https://doi.org/10.1167/16.1.18
      Anna Belardinelli, Madeleine Y. Stepper, Martin V. Butz; It's in the eyes: Planning precise manual actions before execution. Journal of Vision 2016;16(1):18. https://doi.org/10.1167/16.1.18.



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

It is well known that our eyes typically fixate those objects in a scene with which interactions are about to unfold. During manual interactions, our eyes usually anticipate the next subgoal and thus serve top-down, goal-driven information extraction requirements, probably driven by a schema-based task representation. On the other hand, motor control research concerning object manipulations has extensively demonstrated how grasping choices are often influenced by deeper considerations about the final goal of manual interactions. Here we show that these deeper considerations are also reflected in early eye fixation behavior, significantly before the hand makes contact with the object. In this study, subjects were asked to either pretend to drink out of the presented object or to hand it over to the experimenter. The objects were presented upright or upside down, thus affording a thumb-up (prone) or a thumb-down (supine) grasp. Eye fixation data show a clear anticipatory preference for the region where the index finger is going to be placed. Indeed, fixations highly correlate with the final index finger position, thus subserving the planning of the actual manual action. Moreover, eye fixations reveal several orders of manual planning: Fixation distributions depend not only on the object orientation but also on the interaction task. These results suggest a fully embodied, bidirectional sensorimotor coupling of eye-hand coordination: The eyes help in planning and determining the actual manual object interaction, considering where to grasp the object in light of its type and orientation and of the manual task to be accomplished with it.

Introduction
Visual perception and motor control have mostly been studied separately, neglecting to a large degree the cognitive value of action production (Engel, Maye, Kurthen, & König, 2013; Rosenbaum, Chapman, Weigelt, Weiss, & van der Wel, 2012). Long considered a by-product of higher level cognition, action planning and execution, or behavior as a whole, have been relegated to the computational output of perceptual and cognitive processes, somewhat detached from sensory perception. Although various researchers have emphasized the need to study perception and action jointly when attempting to understand vision (Ballard, 1991; Clark, 1999; Gibson, 1986), only in recent years have interactions of action and perception moved into focus. 
Such interactions may be best appreciated in object manipulation tasks, which typically require highly refined sensorimotor control. Manipulation sequences in particular are critical for investigating how task control is instantiated and scheduled in the light of the current cognitive agenda (M. Hayhoe & Ballard, 2014). Indeed, each sensorimotor subphase in a task is typically composed of eye fixations and/or hand movement units, which can be delimited by contact events (Flanagan, Bowman, & Johansson, 2006). 
Whereas the role of the current task in overt attention has been recognized for a long time (Yarbus, 1967), only relatively recently has oculomotor behavior been investigated during the manipulation of real objects in real environments. In a block copying task, Ballard et al. (1992) showed that gaze precedes hand movements and that the block to be copied is fixated “just-in-time” to minimize memory load. In a simple block grasping and target-touching task, Johansson, Westling, Bäckström, and Flanagan (2001) showed that fixations within objects strictly landed on locations that were about to be grasped, to be touched, or to be avoided. Also while executing composite action sequences such as sandwich making (Hayhoe, Shrivastava, Mruczek, & Pelz, 2003), speed-stacking (Foerster, Carbone, Koesling, & Schneider, 2011), and other tasks (cf. Land & Tatler, 2009), higher level cognitive processes including planning, visual memory, and learning appear to be at play. In these studies, however, the focus was not on grasping per se or on the visual exploration of a particular object, but rather on the shifting of attention from one subtask to the next while performing object interactions. To explain this anticipatory, task-driven, top-down controlled oculomotor behavior, Land (2009) proposed that a schema system in the prefrontal cortex may set the task and plan the overall sequence of actions, thus prioritizing eye gaze and manual control in a goal-oriented manner. In consequence, the eyes provide information to the manual control system to reach the current (sub)goal effectively and reliably. 
Recent studies have considered even more intricate object interactions. Baldauf and Deubel (2010) have shown that selective, action-oriented attentional processes typically produce attentional landscapes with multiple, behavior-relevant peaks on the object of interest. Confirmed by differences in eye fixations shortly after object presentation during pure object observation versus during object manipulation planning and execution (Belardinelli, Herbort, & Butz, 2015; Brouwer, Franz, & Gegenfurtner, 2009), these attentional landscapes appear to reflect the active attentional effort to instantiate proper hand-motor programs and thus to enable effective motor control. Seeing that these planning considerations are reflected in the gaze before the actual manual interaction unfolds, the eyes appear to work towards affordance- and contact-point information extraction to enable the subsequent execution of goal-directed manipulation actions. However, it remains to be clarified which task and object properties, and which resulting planning considerations, can influence saccade behavior before the actual manual object interaction unfolds, and whether the manual control system itself—besides the schema system proposed by Land and Tatler (2009)—may also influence saccade behavior. 
Manual behavior during object interaction tasks has also been investigated intensively in motor science, however, without considering eye fixations. Object manipulations and tool usage in particular represent highly refined behavioral capabilities in humans and primates and may be considered one of the most goal-directed examples of action (as in Engel et al., 2013), strongly grounded in and relying on the control of sensorimotor interactions. When we want to execute a task with a particular object, it appears that we first find the target object and then extract the manipulation-relevant features for the selection of an appropriate grasp. However, this grasp selection does not appear to be purely habitual (Herbort & Butz, 2011); rather, it depends jointly on the current state of the object, the final goal, and the proprioceptive and haptic anticipation of its weight and 3D shape (Herbort & Butz, 2012; Herbort, Butz, & Kunde, 2014). 
A crucial finding regarding motor planning for object interaction was characterized as the end-state comfort (ESC) effect. It was first observed by Rosenbaum et al. (1990) and describes the fact that, when executing a grasp, we are willing to assume an awkward and uncomfortable hand posture (e.g., rotating the hand and grasping the object with the thumb down) if this brings us—after movement execution—into a more comfortable posture, which is possibly also more convenient for the subsequent task. The ESC effect is not limited to object rotations; grasp height is also influenced, depending on the final position at which objects are to be placed: The lower the target position, the higher the grasp, and vice versa (grasp height effect, Cohen & Rosenbaum, 2004). The fact that the goal posture or position plays a relevant role led to the hypothesis that the need for higher control, rather than for higher comfort, in the postgrasping phase might be the reason for the effect. This was demonstrated, for example, in a rotation task in which narrow objects (which are more difficult to control), rather than larger objects, were preferably grasped with a thumb-down grasp (Short & Cauraugh, 1999). Although seemingly involving deep planning mechanisms, cognitive modeling as well as studies with everyday objects show that our brain seems to integrate multiple biases, including ESC and habitual biases, when controlling a grasp (Herbort & Butz, 2011, 2012), rather than planning the whole interaction each time anew. Nonetheless, it was also shown that final manipulation goals can affect the grasping kinematics, comparing, for example, pouring with moving objects (Sartori, Straulino, & Castiello, 2011), or lifting, throwing, and placing objects (Armbrüster & Spijkers, 2006). 
Most ESC studies focused on the motor control aspect, often assessing what grasp (thumb-up vs. thumb-down) or grasp orientation (e.g., wrist orientation) was preferably selected under which condition. In the present study, we reasoned that since planning, decision making, and motor control of manual object interactions appear to be subserved by visual information, eye fixations should anticipate these manual object interaction considerations. Essentially, we expected to see the ESC in the eyes before the actual manual object manipulation is executed. Moreover, we expected that gaze control should not only be affected by the current task and goal, but also by the specific kinematic planning required to execute it. Because in (precision) grasping the eyes show a preference for the index finger location, which is usually the first to make contact with an object (Brouwer et al., 2009; Cavina-Pratesi & Hesse, 2013), we hypothesized that eye fixations should anticipate index finger placement, which is determined by considering the complete task. 
To investigate this hypothesis, we presented three everyday objects—a bottle, a cup, and a can—to the participants, either upright or upside down, with the task to either drink out of the object or hand it over to the experimenter. Whereas we expected that the eyes would always anticipate index finger placement, we expected different modulations of this anticipation depending on the task and on the type and orientation of the presented object. In particular, depending on the object's orientation, we expected to see the ESC reflected in eye fixations as well as, later on, in the actual manual grasp. Moreover, we expected further eye fixation and later manual grasp adjustments depending on the task, since one would typically grasp an object slightly lower when handing it over to another person than when intending to drink out of it. 
Methods
Participants
Twenty participants (10 female, 10 male) carried out the experiment. Their ages ranged between 20 and 29 years (M = 23.37, SD = 2.50 years). All participants were right-handed according to the Edinburgh Inventory questionnaire and had normal or corrected-to-normal vision. Persons who wore glasses or were left-handed did not participate in the experiment. Prior to the experiment, the participants were informed about the experimental procedure and gave informed consent according to the Declaration of Helsinki. All of them, except for one, were students of the Eberhard Karls University of Tübingen. All participants were compensated with study credits or money and were naïve as to the purpose of the experiment. 
Stimuli and design
For the purpose of this experiment, participants were asked to grasp and carry out two different tasks with three ordinary, real-world objects: a bottle, a cup, and a can, depicted in Figure 1A. Salient writing and logos, which could have captured attention, were removed. The objects were chosen such that all of them had an identifiable top with an opening and a bottom. During the experiment, all objects were presented in two different orientations: in their typical upright orientation or reversed, with the opening on the bottom. 
Figure 1
 
Stimulus material and experimental setup.
A within-subject 3 × 2 × 2 design was used. Three independent variables were manipulated: object (bottle, cup, or can), orientation (upright or upside down) and task (drink or hand). One block included one trial for each of the twelve experimental conditions, in randomized order within the block. Each participant performed three blocks, resulting in 36 trials. 
Apparatus
The measurements of interest were acquired by means of an eye tracker and a motion tracker. A mobile, head-mounted, binocular eye tracking system (ETG 2.0 by SMI) with proprietary recording software (iViewETG, Version 2.0) was used for the measurement of the eye movements, at a sampling frequency of 30 Hz and with a stated accuracy of 0.5° over all distances. An HD camera placed on the glasses frame between the eyes recorded the visual field of the participants with a resolution of 1280 × 960 px (60° × 46° field of view). The recording of the eye tracking data was controlled by a separate laptop in front of the experimenter, to the right of the participant (see Figure 1B). A three-point calibration was used for calibrating the eye tracker. Between trials, the participants fixated a fixation cross (2 × 2 cm, 2.29°). If the experimenter noted that the calibration had degraded (i.e., in the eye-tracker live view, the gaze was off the cross by more than 1.15°), calibration was repeated before continuing with the experiment. 
The experimental sequence and the motion tracker were controlled by a self-written program running in MATLAB (Version R2013a, 8.1, MathWorks Inc.) with Psychtoolbox 3 (Brainard, 1997). For recording motion data of the right hand, an electromagnetic motion tracker (3D Guidance trakSTAR, Ascension Technology Corp.) with a working frequency of 220 Hz was used (accuracy RMS 1.4 mm and 0.5°, effective tracking volume 46 × 56 × 60 cm with submillimeter resolution). Hand and finger movements were recorded via three sensors attached to the nails of the index finger and thumb and to the wrist of the participant. A fourth sensor was attached to the stand where the objects were presented to the participants, as a reference for the object position. The positions of the sensors were defined relative to the transmitter, which was fixed at the lower left corner of the table in front of the participant (see Figure 1B). The transmitter captured the movement of the sensors in its reference system: x axis parallel to the stand, oriented to the right of the participant; y axis toward the participant; and z axis perpendicular to the table plane, oriented downwards. 
Procedure
Participants were seated on a chair in front of a table, where the objects were presented on a stand about 30 cm high (see Figure 1B). This served the purpose of having the objects always approximately at eye level. The distance to the table was set such that the participant was able to grasp the objects without effort or the need to lean forward. The distance from the eyes to the objects was about 50 cm. Objects were always presented at a marked position at the center of the stand. Between trials, a cardboard sheet, 30 × 45 cm, was placed in front of the object to mask stimulus substitution. The fixation cross was positioned at the center of the cardboard sheet. Participants were instructed to keep their gaze on the cross before starting each task execution. On the table in front of the participant, a hand resting position was marked, so as to keep the starting position consistent across trials and participants. The experimenter stood to the right of the table for the whole experiment. 
Participants were asked to grasp the objects naturally with their right hand and to simulate drinking out of the object or to hand the object to the experimenter in a way that she could easily drink out of it. To accomplish both these tasks, it was necessary to first rotate the objects presented in reversed orientation. 
The experiment started with one training block. This block included, like each of the three experimental blocks, one trial for each of the 12 conditions. The training block was not recorded and was only used to make the participants familiar with the experiment. 
Participants started each trial fixating the fixation cross and with the right hand at the resting position. Depending on the condition sequence, one of the objects was placed upright or upside down on the stand. Afterwards, recording of the tracking devices was started, and the participant was verbally instructed with the task to be executed next. This was followed (after 4 s) by a tone signal, which was marked in the eye tracker and motion tracker logs and which prompted the experimenter to quickly remove the cardboard and the participant to perform the task. Execution of the drinking task was concluded when the object was placed back on the stand. Execution of the hand task was concluded with the handover of the object to the experimenter. After completion of the task, the participant was instructed to put her hand back to the resting position on the table. Each trial lasted about 15 s. 
Data processing and analysis
In almost all trials, participants performed the expected grips: They grasped upright objects with the thumb-up grasp and upside down objects with the thumb-down grasp. Some examples of the performed index finger trajectories are presented in Figure 3A. In twelve trials, four participants performed grips other than the expected one (they grasped the cup from above). In one trial, one participant first performed the wrong task and then corrected her execution. One trial could not be included in the analysis because of technical problems (corrupted video). In total, these were 14 out of 720 trials. Because these "problematic" trials amounted to less than 2% of all trials and did not skew the results, we decided not to exclude them from the analysis, thereby avoiding the exclusion of all trials of one condition for these participants. 
Data processing for the recorded eye movements was done by means of the evaluation software of the eye-tracking system (BeGaze, Version 3.4, SensoMotoric Instruments). The program extracted and projected fixations onto the corresponding video sequences. Fixations were detected via the dispersion algorithm (Salvucci & Goldberg, 2000) with a temporal threshold of 80 ms and a spatial dispersion threshold of 100 px (∼4.7°). 
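As an illustration, a minimal sketch of such a dispersion-based (I-DT) fixation detector could look as follows. The thresholds correspond to the values reported above; the input format (numpy arrays of sample times in seconds and gaze coordinates in pixels), the function name, and the exact window-growing logic are assumptions for illustration, not the proprietary BeGaze implementation.

    import numpy as np

    def detect_fixations(t, x, y, min_dur=0.080, max_disp=100.0):
        """Return a list of (t_start, t_end, x_mean, y_mean) fixations."""
        fixations = []
        i, n = 0, len(t)
        while i < n:
            # Grow an initial window spanning at least min_dur seconds.
            j = i
            while j < n and t[j] - t[i] < min_dur:
                j += 1
            if j >= n:
                break
            xs, ys = x[i:j + 1], y[i:j + 1]
            if xs.max() - xs.min() + ys.max() - ys.min() <= max_disp:
                # Extend the window while the dispersion stays below threshold.
                while j + 1 < n:
                    xs, ys = x[i:j + 2], y[i:j + 2]
                    if xs.max() - xs.min() + ys.max() - ys.min() > max_disp:
                        break
                    j += 1
                fixations.append((t[i], t[j], x[i:j + 1].mean(), y[i:j + 1].mean()))
                i = j + 1
            else:
                i += 1
        return fixations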
Since the camera perspective varied across trials and participants, reference pictures were used for further processing of the fixation data. Reference pictures were taken for each object in both orientations and for both tasks, for a total of twelve reference pictures. For each trial, relevant fixations visible in the corresponding video frames were mapped by a naive observer onto the reference picture of the respective condition. We considered as relevant those fixations that occurred before the hand made contact with the object. The first fixation mapped for every trial was the first one on the object after the cardboard no longer occluded the object (not even partially); it started on average 670 ms after tone onset. The last mapped fixation was the one at whose onset the hand was still moving, even if it was almost on the object. The observer was told to map fixations as precisely as possible onto the corresponding point of the object where they landed in the trial videos. In the very rare case that the hand occluded the fixation point before actually being on the object, the corresponding fixation was not mapped; in this case, the preceding fixation was used for the open bins in the timing analyses presented below. 
For qualitative visualization, heatmaps of each reference picture were created by means of the BeGaze evaluation software. The heatmaps are built by representing each fixation as a Gaussian with its height scaled by the fixation duration and a kernel width of 100 px (∼4.7°). The superposition of the Gaussians yields the accumulated time spent on each pixel, averaged across participants, hence highlighting the longest and most frequently fixated areas. The intensity is shown by a color spectrum from minimum to maximum (blue to red). 
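A minimal sketch of such a duration-weighted heatmap could look like the following. Interpreting the 100-px kernel width as the Gaussian standard deviation, using the scene camera resolution as the image size, and the variable names are illustrative assumptions rather than the BeGaze implementation.

    import numpy as np

    def fixation_heatmap(fixations, width=1280, height=960, sigma=100.0):
        """fixations: iterable of (x, y, duration_ms); returns a height-by-width map."""
        yy, xx = np.mgrid[0:height, 0:width]
        heat = np.zeros((height, width))
        for fx, fy, dur in fixations:
            # Each fixation adds a Gaussian centered on its location, scaled by its duration.
            heat += dur * np.exp(-((xx - fx) ** 2 + (yy - fy) ** 2) / (2 * sigma ** 2))
        return heat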
For the evaluation of fixation time, Areas of Interest (AOIs) were used. The total duration of all fixations within each AOI (for every subject in every condition) was considered. The AOIs were drawn manually on the reference pictures after the fixation mapping was completed. To test our hypotheses, we defined two pairs of AOIs in order to analyze separately the distribution of fixations along the horizontal (left and right AOIs) and the vertical dimension (top and bottom AOIs). The rectangular AOIs were defined on the reference pictures of each object, identically for both orientations and both tasks, dividing each object into two halves (see Figure 6). 
Repeated-measures ANOVAs with factors AOI, task, object, and orientation were performed on these measures, with posthoc Bonferroni corrected analyses. 
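As an illustration of this analysis step, a repeated-measures ANOVA of this form could be set up as in the following sketch, assuming a long-format table with one row per subject and condition cell. The column names, the file name, and the use of statsmodels' AnovaRM are assumptions for illustration; the software actually used for the statistical analyses is not specified here.

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Hypothetical long-format table: one row per subject x AOI x task x object x orientation.
    df = pd.read_csv("fixation_times_long.csv")

    result = AnovaRM(
        data=df,
        depvar="total_fixation_time",   # total fixation duration within the AOI (ms)
        subject="subject",
        within=["aoi", "task", "object", "orientation"],
    ).fit()
    print(result.anova_table)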
As to time measures, we considered the hand movement times, defined as the time between movement initiation (the point at which the velocity of the wrist marker exceeded a threshold of 2.5 cm/s) and grasp onset (the point at which the velocity of the index finger marker fell below a threshold of 5 cm/s). 
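A minimal sketch of this velocity-threshold segmentation, assuming wrist and index finger positions (in cm) sampled at 220 Hz, could look as follows. The thresholds are the values reported above, whereas the simple first-crossing logic, variable names, and input format are illustrative assumptions.

    import numpy as np

    def movement_time(t, wrist_xyz, index_xyz, v_start=2.5, v_stop=5.0):
        """Return (t_onset, t_grasp, duration); positions in cm, t in seconds."""
        def speed(pos):
            return np.linalg.norm(np.diff(pos, axis=0), axis=1) / np.diff(t)

        v_wrist, v_index = speed(wrist_xyz), speed(index_xyz)
        onset = np.argmax(v_wrist > v_start)                    # wrist first exceeds 2.5 cm/s
        moving = onset + np.argmax(v_index[onset:] > v_stop)    # finger speeds up ...
        grasp = moving + np.argmax(v_index[moving:] < v_stop)   # ... then falls below 5 cm/s
        return t[onset], t[grasp], t[grasp] - t[onset]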
To relate eye and hand data, we first considered the grasp height and the horizontal position of the index finger at the time of grasp onset. 
Furthermore, to reveal the temporal evolution of gaze locations relative to hand locations, we also analyzed the evolution of the fixation coordinates on the reference pictures and of the hand locations in the motion tracker reference frame. To do so, visual scanpaths were normalized on a 0 to 1 temporal axis starting with the first mapped fixation and ending at the time of grasp onset. The fixations during this time period of each subject in each trial were then binned into ten bins. A fixation was considered part of a bin if it overlapped with the time interval of the bin. Fixation coordinates in each bin were then averaged for each subject in each trial by weighting them proportionally to the amount of overlap of the fixation duration with the bin's time interval. When no data for a bin was available, the value of the preceding bin was taken (this was the case for just four subject-condition combinations). Similarly, if the last mapped fixation in a trial ended before grasp onset, the coordinates of the last mapped fixation were used to fill the missing bins. Hand movement data were similarly normalized (starting again with the first mapped fixation and ending at the point of grasp onset) and binned into ten bins. The alignment of the eye and hand events and the interval of normalization and binning are depicted in Figure 2. The mean lag between the first fixation start and hand motion onset was 253 ms (SD = 213 ms). The mean lag between the end of the last fixation and grasp onset was 123 ms (SD = 266 ms). 
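A minimal sketch of this overlap-weighted binning could look like the following, with the input format and variable names as illustrative assumptions: the interval from the first mapped fixation to grasp onset is split into ten equal bins, each fixation contributes to every bin it overlaps in proportion to the overlap, and empty bins inherit the value of the preceding bin.

    import numpy as np

    def bin_fixations(fixations, t_start, t_grasp, n_bins=10):
        """fixations: list of (t_on, t_off, x, y); returns an (n_bins, 2) coordinate array."""
        edges = np.linspace(t_start, t_grasp, n_bins + 1)
        coords = np.full((n_bins, 2), np.nan)
        for b in range(n_bins):
            lo, hi = edges[b], edges[b + 1]
            weights, points = [], []
            for t_on, t_off, x, y in fixations:
                overlap = min(t_off, hi) - max(t_on, lo)
                if overlap > 0:
                    weights.append(overlap)
                    points.append((x, y))
            if weights:
                coords[b] = np.average(points, axis=0, weights=weights)
            elif b > 0:
                coords[b] = coords[b - 1]   # fill empty bins from the preceding bin
        return coords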
Figure 2
 
Time lines for eye fixations and hand trajectories in a trial. The two data streams are aligned with respect to the tone onset event. On the eye tracking time line, start and end times are available for each mapped fixation. On the hand tracking time line, hand motion onset and grasp onset times are available as well as the locations of the index finger over the complete trial. To compare both trajectories across trials, eye and hand data were normalized and binned considering the time interval starting with the onset of the first mapped fixation and ending with grasp onset. Note that the hand motion onset can occur before or after the start of the first fixation (253 ms after the first fixation on average) and that the grasp onset can also occur before or after the end of the last mapped fixation (123 ms after the end of the last fixation on average).
Figure 3
 
(A) Example of the index finger trajectories of one subject grasping the bottle (one trial for each task/ orientation condition). These trajectories are limited to the interval between movement initiation and grasp onset. (B) Boxplot of movement times in each condition.
Note that in this way we can temporally relate the visual eye fixation scanpaths to the manual movement evolution in each trial—even if these have different durations—essentially considering the subperiod of the trial for which visual and hand trajectory data is available until the hand reaches the object. By filling visual bins for which no fixation was available with the data from past bins, the data for any visual bin comes from the past or present, but not from future fixations. In this way we assure that the data below indeed shows that the eyes anticipate future hand behavior (eye data comes always from past or present; hand data is always from the present). Since the trial normalization time periods differ (mean duration 1329 ± 347 ms), we can only report bin averages. Seeing that a bin has an average duration of 1329/10 ms, an anticipation of one bin roughly corresponds to 133 ms. 
Results
Motor behavior
Since the subjects were not instructed to execute the object manipulations as fast as possible, reaction times are not analyzed in further detail (and only revealed small differences across the different object manipulation tasks). On average, movement onset was registered 923 ± 162 ms after tone onset. To assess whether the task or the orientation could produce differences in movement execution, however, we examined movement times in more detail. We expected the upside down condition to elicit longer times because of the more uncomfortable and difficult-to-control grasp. 
We conducted a repeated measures, three-way ANOVA with task, orientation, and object as factors. This analysis yielded main effects of task, F(1, 19) = 12.02, p = 0.003; object, F(2, 38) = 31.59, p < 0.001; and orientation, F(1, 19) = 96.99, p < 0.001. In the drink task (M = 1094 ms), it took longer for the hand to reach the object than in the hand task (M = 1062 ms). The cup (M = 1134 ms) required significantly longer movement times than the bottle (M = 1046 ms, p < 0.001) and the can (M = 1052 ms, p < 0.001), as was the case for the fixation times. Moreover, upside down objects required a longer time (M = 1184 ms) to be reached than upright objects (M = 970 ms). A small interaction effect of task and orientation, F(1, 19) = 5.76, p = 0.027, as well as a three-way interaction, F(2, 38) = 4.29, p = 0.021, reached significance. 
We followed up this interaction with a separate, repeated measures, two-way ANOVA for each object. For the bottle, movements took longer in the inverted than in the upright condition (Mup = 950 ms; Mdown = 1142 ms), F(1, 19) = 4.29, p < 0.001. For the cup, movement times depended on the task (Mdrink = 1156 ms; Mhand = 1113 ms), F(1, 19) = 9.41, p = 0.006, and on the orientation (Mup = 1015 ms; Mdown = 1254 ms), F(1, 19) = 66.55.29, p < 0.001, and these factors also interacted, F(1, 19) = 9.37, p = 0.006. For the can, we similarly found a main effect of task (Mdrink = 1073 ms; Mhand = 1030 ms), F(1, 19) = 8.81, p = 0.008, and of orientation (Mup = 946 ms; Mdown = 1157 ms), F(1, 19) = 60.36, p < 0.001, but no interaction effect. 
AOI analysis
Total fixation time was evaluated with respect to the AOI pairs defined above. A repeated measures, four-way ANOVA with all factors (AOI, task, object, and orientation) was computed separately for the two AOI pairs. 
The first AOI pair (left/right) is meant to investigate whether the different grasps for upright and inverted objects are also visible in the early gaze behavior. The two halves indeed correspond to the sides of the object where the index finger would be placed in the different grasp types. In the second AOI tiling, each object is segmented into a top and a bottom part to investigate whether there is also an interaction effect of AOI with task and/or object on the height of fixations. 
The ANOVA with the left and right AOIs revealed that in the drink condition objects were overall fixated longer (M = 465 ms) than in the hand condition (M = 433 ms), F(1, 19) = 7.72, p = 0.012. For the factor object, F(2, 38) = 22.97, p < 0.001, Bonferroni posthoc tests revealed significant differences between bottle and cup, t(19) = −6.99, p < 0.001; bottle and can, t(19) = −3.18, p = 0.015; and cup and can, t(19) = 3.51, p = 0.007. The cup (M = 487 ms) was fixated longer than the bottle (M = 411 ms) and the can (M = 449 ms). Regarding orientation, the upside-down-presented objects were fixated longer (M = 468 ms) than the upright ones (M = 430 ms), F(1, 19) = 7.14, p = 0.015. Although significant, these effects are rather small, especially when considering the eye-tracker sampling rate. In the light of our hypotheses, more relevant effects were expected from interactions between the factors, and indeed the ANOVA revealed significant interactions between the factors AOI and orientation, F(1, 19) = 56.36, p < 0.001; object and orientation, F(2, 38) = 3.84, p = 0.030; and AOI, object, and orientation, F(2, 38) = 3.53, p = 0.041. All other interactions were not significant (p > 0.05). These interactions are depicted in Figure 4 and show that for every object the left AOI was fixated significantly longer in the upside down condition than in the upright one (where, at least for the bottle and the can, fixations lingered mostly centrally). 
Figure 4
 
AOI, task, and orientation interaction in total fixation time for every object for the AOIs left and right. Error bars denote the standard error of the mean. Asterisks indicate significant comparisons (p < 0.05) between upright and upside down presentations.
To further disentangle these effects, we conducted a three-way ANOVA for each object. For the bottle (see Figure 4, top panel), this analysis showed a main effect of orientation, F(1, 19) = 19.36, p < 0.001—the bottle was fixated longer when presented upside down (M = 452 ms) than upright (M = 372 ms)—and an interaction of AOI and orientation, F(1, 19) = 22.08, p < 0.001. For the cup and the can (see Figure 4, middle and bottom panels), the interaction of AOI and orientation was significant, Fcup(1, 19) = 19.00, pcup < 0.001; Fcan(1, 19) = 14.77, pcan = 0.001. The differences in fixation distribution can also be clearly seen in the heatmaps shown in Figure 6: Whereas in the upright orientation fixations concentrated on the upper-right part of the objects, in the upside down condition there was a second peak, denoting the shift to the bottom-left side of the objects. 
The ANOVA for the AOIs top and bottom additionally revealed that the objects were fixated significantly longer in the AOI top (M = 664 ms) than in the AOI bottom (M = 235 ms), F(1, 19) = 100.45, p < 0.001. The factors task, object, and orientation produced the same main effects as for the first AOI condition presented above. 
Additionally, the ANOVA revealed significant interactions between the factors AOI and object, F(2, 38) = 19.85, p < 0.001; AOI and orientation (objects fixated longer in the top AOI in the upright orientation), F(1, 19) = 24.72, p < 0.001; AOI, task, and orientation, F(1, 19) = 21.24, p < 0.001; object and orientation, F(2, 38) = 3.84, p = 0.030; AOI, object, and orientation, F(2, 38) = 19.35, p < 0.001; and AOI, task, object, and orientation, F(2, 38) = 6.18, p = 0.005. These interactions are illustrated in Figure 5.
Figure 5
 
AOI, task, and orientation interaction in total fixation time for every object for the AOIs top and bottom. Error bars denote the standard error of the mean. Asterisks indicate significant comparisons (p < 0.05) between upright and upside down presentations. Triangles indicate significant comparisons between tasks.
Figure 6
 
Object × orientation × task-respective heat maps encoding absolute gaze durations (in ms) averaged across subjects. Upper rows show the upright; lower rows, the upside down condition. The color code goes from blue (1 ms duration) to red (maximum gaze duration on each map).
Again, to disentangle these many interactions, we looked into a three-way ANOVA separately for every object. For the bottle (see Figure 5, top panel), we found a main effect of AOI, F(1, 19) = 8.08, p = 0.010 (Mtop = 501 ms; Mbottom = 322 ms), and of orientation (as for the other AOI pair above). Additionally, there was a significant interaction of AOI and task, F(1, 19) = 7.76, p = 0.012, and a three-way interaction of AOI, task, and orientation (reversing the pattern from drink to hand), F(1, 19) = 16.98, p = 0.001. For the can and the cup (see Figure 5, middle and bottom panels), we similarly found a main effect of AOI, Fcup(1, 19) = 92.27, pcup < 0.001 (Mtop = 819 ms; Mbottom = 157 ms); Fcan(1, 19) = 74.44, pcan < 0.001 (Mtop = 672 ms; Mbottom = 226 ms), and for both objects we again found the AOI and orientation interaction, Fcup(1, 19) = 19.01, pcup < 0.001; Fcan(1, 19) = 58.37, pcan < 0.001, and the AOI, task, and orientation interaction, Fcup(1, 19) = 6.58, pcup = 0.019; Fcan(1, 19) = 6.17, pcan = 0.023. 
Depending on the task, the bottle presents a somewhat different pattern, compared to the other two objects. In the hand task the trend is inverted compared to the cup and the can, with the bottom part gathering more attention in the upright than in the inverted condition. Posthoc tests (correcting the p level by four comparisons) show that in the upright condition the top part is fixated significantly longer in the drink task than in the hand task, t(19) = 5.72, p = 0.004, whereas the bottom part is looked at longer in the hand condition than in the drink condition, t(19) = −4.02, p = 0.004. In the upside down condition, the bottom part was looked at longer in the drink than in the hand condition, t(19) = 2.76, p = 0.048. 
Main effects for the AOI analysis are summarized in Table 1, along with the principal interaction of interest (AOI × orientation). 
Table 1
 
Main effects found in the four-way ANOVA with factors AOI (LR = left/right, TB = top/bottom), task, object, and orientation. Further, the interaction AOI × orientation is displayed.
Grasp position
To confirm the effects found in the AOI analysis, we further looked into the position of the index finger at the end of the movement time. Considering the horizontal location, a three-way ANOVA with factors task, object, and orientation revealed a main effect of object, F(1.17, 22.23) = 5.19, p = 0.028, and of orientation, F(1, 19) = 861.40, p < 0.001. In particular, every object was grasped with the index finger more to the right in the upright condition (M = 15.31 in) than in the upside down condition (M = 12.37 in). There was also an interaction of object and orientation, F(2, 38) = 675.99, p < 0.001. 
For the height, the three-way ANOVA with factors task, object, and orientation produced main effects of object, F(2, 38) = 675.99, p < 0.001, and orientation, F(2, 38) = 17.23, p = 0.001, as in the AOI analysis. The bottle, as was to be expected, was in general grasped higher than the can, t(19) = −29.20, p < 0.001, and the cup, t(19) = −34.24, p < 0.001. The can was grasped higher than the cup, too, t(19) = −8.72, p < 0.001. Interaction effects were found for task and orientation, F(1, 19) = 14.85, p = 0.001; object and orientation, F(1, 19) = 62.55, p < 0.001; and task, object, and orientation, F(2, 38) = 10.48, p < 0.001. This latter interaction is depicted in Figure 7.
Figure 7
 
Task and orientation interaction with respect to the index finger grasping height (in inches) for every object. Error bars denote standard errors. Y axis is shown reversed because the motion tracker Z axis was oriented downwards. Asterisks indicate significant comparisons (p < 0.05) between upright and upside down presentations. Triangles indicate significant comparisons between tasks.
We followed up the three-way interaction in the grasp height ANOVA with two-way ANOVAs for every object. The bottle presented no main effect of task or orientation but an interaction of the two factors, F(1, 19) = 18.21, p < 0.001. T tests show that the bottle was grasped significantly higher in the drink upright condition than in the hand upright condition, t(19) = −5.75, p < 0.001, and lower in the drink upside down condition than in the hand upside down condition, t(19) = 2.96, p = 0.016. This can also be seen in the trials depicted in Figure 3A. The cup, perhaps due to its small size, presented only an orientation main effect, F(1, 19) = 66.92, p < 0.001. Finally, the can presented a main effect of task, F(1, 19) = 8.53, p = 0.008; a main effect of orientation, F(1, 19) = 33.19, p < 0.001; and an interaction effect, F(1, 19) = 12.75, p = 0.002: It was grasped higher in the drink than in the hand condition and in the upright than in the inverted condition. T tests also show that the can was grasped higher in the drink upright condition than in the hand upright condition, t(19) = −4.51, p < 0.001. 
Correlation of eye and hand data
To analyze how the observed object orientation and task effects unfold over time, we looked into the binned evolution of the x and y coordinates of eye fixations and hand positions. Eye fixation and index finger binned trajectories, averaged across subjects, are plotted in Figures 8 and 9.
Figure 8
 
Gaze coordinates binned along the time axis starting from the beginning of the first fixation (t = 0) to the grasp onset (t = 1); left panels: drink task; right panels: hand task. Shaded areas represent the standard error. Dashed lines represent the objects' out-most edges.
Figure 9
 
Index finger coordinates binned along the time axis starting from the beginning of the first fixation (t = 0) to grasp onset (t = 1); left panels: drink task; right panels: hand task. Shaded areas represent the standard error. Dashed lines represent the objects' out-most edges.
Considering fixation locations along the horizontal axis, the eyes move towards the left side of the object when it is presented upside down, and this left-side bias grows over the time course of a trial. No obvious, systematic differences between the drink and the hand condition are observable, regardless of whether the object was presented upright or upside down. On the other hand, when considering the height of the eye fixations along the vertical axis, and particularly for the cup and the can, the eyes look lower in the upside down condition than in the upright condition, and this difference also tends to increase over time. For the bottle, however, this tendency is reversed, indicating that a central grasp on the bottle is generally preferred. More importantly, though, the contrast between the drink and the hand condition is clearly reflected: When handing over the bottle to the experimenter, the eyes move progressively lower when the bottle is presented upright and progressively higher when the bottle is presented upside down, compared to when the task is to pretend to drink out of the bottle. 
To ascertain to what extent the eyes anticipate the hand, and especially the index finger, we finally correlated the binned eye and hand data along the x and the y axis. When computing a Pearson product-moment correlation coefficient to assess the relationship between the last fixation (in pixels) and the index finger height (in inches) when the object is reached by the hand, on a trial-by-trial basis, a strong positive correlation is found (r = 0.54, n = 716, p < 0.001; regression slope β = 0.005). For the horizontal dimension, a weaker but still significant positive correlation was found (r = 0.36, n = 716, p < 0.001; regression slope β = 0.014). These correlations are shown in Figure 11 in the Appendix. Two trials (one without video and one without fixations prior to the grasp) were discarded. 
To analyze how the correlation changed over time, we furthermore correlated each of the 10 eye bins with each of the 10 hand bins, for both the horizontal and the vertical dimension. The correlation matrices, along with their significance levels, are shown in Figure 10. Higher correlation values are concentrated in the lower part of the matrices, indicating that the eyes indeed anticipate the hand, seeing that the eye fixations correlate with later hand positions from very early on. 
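As an illustration, such a bin-by-bin correlation matrix could be computed as in the following sketch, assuming two arrays of shape (n_trials, 10) holding, for example, the vertical eye and hand coordinates per bin. The array names, the function name, and the explicit Bonferroni correction for the 100 comparisons are assumptions for illustration.

    import numpy as np
    from scipy.stats import pearsonr

    def bin_correlation_matrix(eye_bins, hand_bins, n_comparisons=100):
        """eye_bins, hand_bins: (n_trials, 10) arrays; returns (r, p_corrected)."""
        n_bins = eye_bins.shape[1]
        r = np.zeros((n_bins, n_bins))
        p = np.ones((n_bins, n_bins))
        for hand_bin in range(n_bins):        # rows: hand bins
            for eye_bin in range(n_bins):     # columns: eye bins
                r[hand_bin, eye_bin], p_raw = pearsonr(eye_bins[:, eye_bin],
                                                       hand_bins[:, hand_bin])
                p[hand_bin, eye_bin] = min(1.0, p_raw * n_comparisons)
        return r, p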
Figure 10
 
Correlation values and corresponding p values between eye movement bins (columns, i.e., left-to-right) and hand movement bins (rows, i.e., top-to-bottom) for the horizontal (panels on the left) and vertical (panels on the right) coordinates. The p values were corrected by 100 comparisons; all p values above 0.1 are encoded in blue. Small blue boxes indicate selected bins, which show significant anticipations of hand locations by the eyes (see text for corresponding values).
For the horizontal dimension, the first highly significant correlation is reached in the third eye bin, anticipating the sixth and seventh hand bin (both with values r = 0.28, n = 238, p < 0.001 and regression slope β = 0.21 and β = 0.27, respectively). As a bin covers a time interval of 133 ms on average, the anticipation of three to four bins corresponds to a time interval of at least 400 ms. All later eye bins correlated even higher with the sixth and later hand bins, and all yielded p < 0.001 significance values. Note that it may be expected that the eyes actually move on to the subsequent task, possibly fixating the experimenter to hand over the object or elsewhere to prepare drinking. However, these fixations were not analyzed, and, as stated above, eye bins were filled with eye fixation data from earlier bins when no further eye fixations were mapped on the object. In this way the correlations of the last eye bins with respect to the last hand bins do not degrade. 
Due to the larger range of hand positions along the vertical axis (the hand is initially positioned 30 cm lower than the object base), eye fixations significantly correlate with the later positions of the index finger from the very first fixations on. The highest correlation was found between the seventh eye bin and the 10th hand bin (r = 0.66, n = 238, p < 0.001; regression slope β = 0.008), also indicating an anticipation of three bins in this case. To further assess the relationship between eye and hand with respect to anticipating the upcoming task, we conducted the same repeated-measures ANOVA as for the grasp height, now on the fixation height in the seventh eye bin. The three-way ANOVA showed a main effect of object: Fixations were higher (lower pixel values) for the bottle, M = 535, than for the can, M = 580, and the cup, M = 618, F(1.52, 28.85) = 26.14, p < 0.001. As to orientation, the upright objects were fixated higher, M = 535, than the upside down objects, M = 620, F(1, 19) = 14.44, p = 0.001, as in the grasp case. Interactions were found for task and orientation, F(1, 19) = 20.85, p < 0.001; object and orientation, F(2, 38) = 23.62, p < 0.001; and all three factors, F(2, 38) = 5.89, p = 0.006. Following up with two-way ANOVAs for every object, the bottle presented no main effect of task or orientation but an interaction of the two factors, F(1, 19) = 26.90, p < 0.001. T tests show that the bottle was gazed at higher in the drink upright condition than in the hand upright condition, t(19) = −4.43, p < 0.001, and lower in the drink upside down condition than in the hand upside down condition, t(19) = 2.66, p = 0.032. The cup, perhaps due to its small size, presented only an orientation main effect, F(1, 19) = 18.10, p < 0.001, being fixated higher in the upright than in the inverted condition. Finally, the can presented a main effect of orientation, F(1, 19) = 20.94, p < 0.001, and an interaction effect, F(1, 19) = 20.22, p < 0.001: The can was fixated higher in the upright than in the inverted condition. T tests also show that the can was fixated higher in the drink upright than in the hand upright condition, t(19) = −4.03, p = 0.002. All these effects are coherent with the effects obtained for the grasp height data. 
Discussion
The results confirm our hypotheses and offer further critical insights. Deeper task considerations, including the end-state comfort (ESC) effect, are clearly visible in the eyes, as evident in the heat maps, the AOI analyses, and the correlation analyses between eye and hand data. That the eyes target the contact points during a grasp is per se just a confirmation of other studies in the literature (Belardinelli et al., 2015; Brouwer et al., 2009; Cavina-Pratesi & Hesse, 2013; Johansson et al., 2001). Yet in these studies, different grasps were probed, either with one contact point hidden behind the object (Johansson et al., 2001), with a frontal precision grasp (Brouwer et al., 2009; Cavina-Pratesi & Hesse, 2013), or with a pantomimed grasp (Belardinelli et al., 2015). Importantly, in all these studies the object orientation was not systematically manipulated, and different grasps were not allowed. Only in the second experiment of Brouwer et al. (2009) was one of the geometric stimuli (the triangle) presented in different orientations (it could point either left or right), but subjects were instructed to always use the same grasp. Interestingly, in this case subjects looked more at the pointing vertex, independently of the finger used, confirming a higher need for control at that location. In our study, we specifically manipulated the object orientation to induce a different grasp, that is, the one more convenient for accomplishing the actual object interaction task. 
The preference for the index finger, and its position anticipating the ESC effect, was qualitatively demonstrated by the fixation distribution in the different conditions and by the evolution of the fixation coordinates until the actual grasp: Fixations lingered longer in the upper right part of the objects when they were presented in the upright orientation, whereas they moved toward the bottom left part when the same objects were presented upside down. Quantitatively, this was demonstrated by the strongly significant interaction of AOI and orientation. For all objects the left part of the object was fixated longer than the right part in the upside down condition. 
Considering the vertical dimension, again AOI and orientation strongly interacted for all objects, with the top part fixated longer in the upright condition. Moreover, in the upright condition the bottom part of the bottle was fixated longer in the hand task than in the drink task, and vice versa when the bottle was presented upside down. A tendency towards this bias was also observable for the two other objects, reaching significance in the object-specific, three-way interactions as well as for the can in the eye height analysis of the seventh bin. The same effects were also observed when analyzing the grasp position data. Thus, a social consideration played a role in the height of the grasp and of the eye fixations. The hand task implied that the experimenter would take the object from the hand of the subject to drink from it; hence, subjects likely fixated and grasped the bottle lower than in the drink task in order to leave enough space for the experimenter's hand. This is probably true to some extent also for the third object, but it is particularly evident in the case of the bottle since it is the tallest object in the set. The relation between eye and hand data in this case once more confirms how precisely the eyes predict motor intentions and that the two motor plans are generated by a common cognitive agenda. 
This social effect has already been demonstrated in different settings, albeit without recording eye data. Gonzalez, Studenka, Glazebrook, and Lyons (2011) have shown that end-state comfort can be extended to joint action scenarios and adapted to the social context. When requested to hand an object (presented in a comfortable or uncomfortable position) to a confederate, subjects considered both their own end-state comfort and the beginning-state comfort of the confederate, e.g., by handing a hammer with the handle toward the other person. This was not just an automatic behavior determined by social conventions, since this trade-off between personal ESC and the other's beginning-state comfort was produced only when the other's subsequent task was to use the object (as opposed to placing it down). Also, Ray and Welsh (2011) showed that subjects tend to co-represent the comfort state of a partner in a joint action task, in this case manipulating the side of the jug handle in addition to the object orientation. In our study, we thus replicate these social considerations during the execution of social interaction tasks but also extend these insights by showing that the eyes anticipate these social considerations significantly before the actual hand movement or grasp position reflects them. 
Regarding time measures, the cup produced longer movement times. This might be due to the fact that it was the smallest object to grasp and the only one made of ceramic, so that the online sensorimotor control was more involved and the grasp was more carefully executed. 
The correlation analysis shows that on average the eyes anticipate the hand grasping height by about three bins (399 ms). That the eyes lead the hand in object manipulation has already been shown by Hayhoe et al. (2003) and was also reported by Land and Tatler (2009), attesting to an anticipation of about 500 ms, although a large variance can be observed depending on subjects and tasks. In our case, we considered not single fixations or object manipulations but the whole visual exploration of a single object and the hand movement before the intended object manipulation. 
All these results demonstrate that different orders of planning are at play when interacting with objects, alone or in a social context. Rosenbaum et al. (2012) introduced the concept of orders of planning to denote any behavioral change in a task as a function of the next task or of several upcoming tasks. Thus, when we adjust our hand aperture to the size of an object and rotate the hand to bring the object into the habitual orientation for the following task, first and second order planning are at play, respectively. Most studies so far have focused on first and second order planning, with few exceptions considering different subsequent locations to be reached with an object in different configurations (Haggard, 1998; Rosenbaum et al., 1990). Our study, to the best of our knowledge, is the first to show that three orders of planning—including the plan to grasp, the plan to reach the ESC, and the plan to hand over or drink from the object—are reflected in eye fixation behavior and that all three orders are visible significantly before the final manual manipulation takes place. 
Conclusions
With the presented design, we could show that the ESC effect—according to which the hand's initial grasp posture is influenced by considerations about the final action goal and hand state—is reflected in eye fixation distributions prior to the execution of the actual manual interaction: The side and height of the index finger location were anticipated by the eyes. Moreover, our results show that eye fixations and the subsequent grasp are influenced not only by the orientation of the object but also by the final goal of the manipulation, that is, either drinking from the object or handing it over to another person. This shows that even joint action considerations are accounted for at the oculomotor level before the actual grasp is executed. 
The strongly correlated behaviors of eye gaze with subsequent index finger positions imply very fast interactions between oculomotor and manual planning areas in the brain. Land (2009) has suggested that a task-oriented “schema control” system instructs the visual, gaze, and motor control systems what to look for, where to look, and what to do, respectively. Whereas the visual system was assumed to interact bidirectionally with the gaze system, the motor system was believed to mainly receive information from the gaze and visual systems, but to send information only to the visual system and only during action monitoring. In contrast to this model, our results suggest that even more intricate, bidirectional interactions may take place between the gaze and the motor system, given the current schema control system state. ESC-oriented gaze anticipations about 400 ms before the hand suggest that the motor system may have actually sent the ESC-oriented request to the gaze system; the schema control system simply does not have the knowledge to issue such a request. 
Thus, we propose that the gaze and motor systems interact bidirectionally: The gaze system informs the motor system about particular motor-relevant locations, while the motor system sends additional location-specific information requests, such as "determine the exact edge location on the left side of the object to ensure grasp success." In our experiment, it appears that ESC considerations and even further motor considerations anticipating the subsequent task (drink or hand over) led the motor system to issue such requests to the gaze system in order to gain task- and situation-specific location information. In general, the data suggest that such requests may be codetermined by the current task, by the object's position and orientation, and by motor considerations aimed at ensuring a successful execution of the complete object manipulation action. 
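To make the proposed gaze-motor exchange more concrete, the toy sketch below casts the two systems as modules passing location requests and estimates back and forth before the reach begins. It is purely illustrative and not a model from this study or the literature; all class names, region labels, and the grip mapping are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class LocationRequest:
    region: str       # e.g., "left edge"
    purpose: str      # e.g., "index finger placement"

class GazeSystem:
    """Foveates a requested region and reports a location estimate back."""
    def fixate(self, request):
        # Placeholder lookup standing in for an actual fixation and visual estimate.
        estimates = {"left edge": (-5.0, 12.0), "right edge": (5.0, 12.0)}
        return estimates.get(request.region, (0.0, 0.0))

class MotorSystem:
    """Chooses an ESC-compatible grip and queries the gaze system about it."""
    def plan_grasp(self, task, orientation, gaze):
        # Illustrative mapping only: here the grip depends solely on object
        # orientation; in the study, the upcoming task (drink vs. hand over)
        # additionally modulated fixations and grasp height.
        grip = "thumb-down" if orientation == "upside down" else "thumb-up"
        region = "left edge" if grip == "thumb-down" else "right edge"
        contact = gaze.fixate(LocationRequest(region, "index finger placement"))
        return {"task": task, "grip": grip, "index_finger_target": contact}

# The location request is issued and answered before any reach is executed.
print(MotorSystem().plan_grasp("drink", "upside down", GazeSystem()))
```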
The presented results furthermore imply that it may be possible to infer the intended kind of object manipulation, and not only which object is intended to be manipulated, solely by tracking a person's eyes. Future work should focus on verifying or falsifying the hypotheses put forward here and should consider implementing anticipatory, supportive systems that may be controlled by natural eye gaze to a certain extent. 
Acknowledgments
During this study A. B. was supported by the Institutional Strategy of the University of Tübingen (Deutsche Forschungsgemeinschaft, ZUK 63). We also acknowledge additional support by the Deutsche Forschungsgemeinschaft via the Open Access Publishing Fund of the University of Tübingen. 
Commercial relationships: none. 
Corresponding author: Martin V. Butz. 
References
Armbrüster C., Spijkers W. (2006). Movement planning in prehension: Do intended actions influence the initial reach and grasp movement? Motor Control, 10, 311–329.
Baldauf D., Deubel H. (2010). Attentional landscapes in reaching and grasping. Vision Research, 50, 999–1013.
Ballard D. H. (1991). Animate vision. Artificial Intelligence, 48, 57–86.
Ballard D. H., Hayhoe M. M., Li F., Whitehead S. D., Frisby J. P., Taylor J. G., Fisher R. B. (1992). Hand-eye coordination during sequential tasks. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 337, 331–339.
Belardinelli A., Herbort O., Butz M. V. (2015). Goal-oriented gaze strategies afforded by object interaction. Vision Research, 106, 47–57.
Brainard D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Brouwer A.-M., Franz V. H., Gegenfurtner K. R. (2009). Differences in fixations between grasping and viewing objects. Journal of Vision, 9 (1): 18, 1–24, doi:10.1167/9.1.18.
Cavina-Pratesi C., Hesse C. (2013). Why do the eyes prefer the index finger? Simultaneous recording of eye and hand movements during precision grasping. Journal of Vision, 13 (5): 15, 1–15, doi:10.1167/13.5.15.
Clark A. (1999). An embodied cognitive science? Trends in Cognitive Sciences, 3, 345–351.
Cohen R. G., Rosenbaum D. A. (2004). Where grasps are made reveals how grasps are planned: Generation and recall of motor plans. Experimental Brain Research, 157, 486–495.
Engel A. K., Maye A., Kurthen M., König P. (2013). Where's the action? The pragmatic turn in cognitive science. Trends in Cognitive Sciences, 17, 202–209.
Flanagan J. R., Bowman M. C., Johansson R. S. (2006). Control strategies in object manipulation tasks. Current Opinion in Neurobiology, 16, 650–659.
Foerster R. M., Carbone E., Koesling H., Schneider W. X. (2011). Saccadic eye movements in a high-speed bimanual stacking task: Changes of attentional control during learning and automatization. Journal of Vision, 11 (7): 9, 1–16, doi:10.1167/11.7.9.
Gibson J. J. (1986). The ecological approach to visual perception. Mahwah, NJ: Lawrence Erlbaum Associates.
Gonzalez D. A., Studenka B. E., Glazebrook C. M., Lyons J. L. (2011). Extending end-state comfort effect: Do we consider the beginning state comfort of another? Acta Psychologica, 136, 347–353.
Haggard P. (1998). Planning of action sequences. Acta Psychologica, 99, 201–215.
Hayhoe M., Ballard D. (2014). Modeling task control of eye movements. Current Biology, 24 (13), R622–R628.
Hayhoe M. M., Shrivastava A., Mruczek R., Pelz J. B. (2003). Visual memory and motor planning in a natural task. Journal of Vision, 3 (1): 6, 49–63, doi:10.1167/3.1.6.
Herbort O., Butz M. V. (2011). Habitual and goal-directed factors in (everyday) object handling. Experimental Brain Research, 213, 371–382.
Herbort O., Butz M. V. (2012). The continuous end-state comfort effect: Weighted integration of multiple biases. Psychological Research, 76, 345–363.
Herbort O., Butz M. V., Kunde W. (2014). The contribution of cognitive, kinematic, and dynamic factors to anticipatory grasp selection. Experimental Brain Research, 232, 1677–1688.
Johansson R. S., Westling G., Bäckström A., Flanagan J. R. (2001). Eye-hand coordination in object manipulation. The Journal of Neuroscience, 21, 6917–6932.
Land M. F. (2009). Vision, eye movements, and natural behavior. Visual Neuroscience, 26, 51–62.
Land M. F., Tatler B. W. (2009). Looking and acting: Vision and eye movements in natural behaviour. Oxford, UK: Oxford University Press.
Ray M., Welsh T. N. (2011). Response selection during a joint action task. Journal of Motor Behavior, 43, 329–332.
Rosenbaum D. A., Chapman K. M., Weigelt M., Weiss D. J., van der Wel R. (2012). Cognition, action, and object manipulation. Psychological Bulletin, 138, 924–946.
Rosenbaum D. A., Marchak F., Barnes H. J., Vaughan J., Slotta J. D., Jorgensen M. J. (1990). Constraints for action selection: Overhand versus underhand grips. In Jeannerod M. (Ed.), Attention and performance XIII: Motor representation and control (pp. 321–342). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Salvucci D. D., Goldberg J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In Duchowski A. T. (Ed.), Proceedings of the 2000 symposium on eye tracking research & applications (pp. 71–78). New York: ACM.
Sartori L., Straulino E., Castiello U. (2011). How objects are grasped: The interplay between affordances and end-goals. PLoS ONE, 6 (9), e25203.
Short M. W., Cauraugh J. H. (1999). Precision hypothesis and the end-state comfort effect. Acta Psychologica, 100, 243–252.
Yarbus A. L. (1967). Eye movements and vision (1st ed.). New York: Plenum Press.
Footnotes
1  Please note that this is the mean value across AOIs, objects, and orientations; however, each object consists of two AOIs, and in every condition both AOIs are looked at to some extent.
Appendix
Figure 11
Correlation plots for the eye and hand x (left) and y (right) coordinates, on a trial basis. For the x coordinate the index finger position is constrained by the object edges (hence the two clusters).
Figure 1
Stimulus material and experimental setup.
Figure 2
Time lines for eye fixations and hand trajectories in a trial. The two data streams are aligned with respect to the tone onset event. On the eye tracking time line, start and end times are available for each mapped fixation. On the hand tracking time line, hand motion onset and grasp onset times are available as well as the locations of the index finger over the complete trial. To compare both trajectories across trials, eye and hand data were normalized and binned considering the time interval starting with the onset of the first mapped fixation and ending with grasp onset. Note that the hand motion onset can occur before or after the start of the first fixation (253 ms after the first fixation on average) and that the grasp onset can also occur before or after the end of the last mapped fixation (123 ms after the end of the last fixation on average).
Figure 3
(A) Example of the index finger trajectories of one subject grasping the bottle (one trial for each task/orientation condition). These trajectories are limited to the interval between movement initiation and grasp onset. (B) Boxplot of movement times in each condition.
Figure 4
AOI, task, and orientation interaction in total fixation time for every object for the AOIs left and right. Error bars denote the standard error of the mean. Asterisks indicate significant comparisons (p < 0.05) between upright and upside down presentations.
Figure 5
AOI, task, and orientation interaction in total fixation time for every object for the AOIs top and bottom. Error bars denote the standard error of the mean. Asterisks indicate significant comparisons (p < 0.05) between upright and upside down presentations. Triangles indicate significant comparisons between tasks.
Figure 6
Object × orientation × task-respective heat maps encoding absolute gaze durations (in ms) averaged across subjects. Upper rows show the upright condition; lower rows, the upside down condition. The color code goes from blue (1 ms duration) to red (maximum gaze duration on each map).
Figure 7
Task and orientation interaction with respect to the index finger grasping height (in inches) for every object. Error bars denote standard errors. The y axis is shown reversed because the motion tracker z axis was oriented downwards. Asterisks indicate significant comparisons (p < 0.05) between upright and upside down presentations. Triangles indicate significant comparisons between tasks.
Figure 8
Gaze coordinates binned along the time axis starting from the beginning of the first fixation (t = 0) to grasp onset (t = 1); left panels: drink task; right panels: hand task. Shaded areas represent the standard error. Dashed lines represent the objects' outermost edges.
Figure 9
Index finger coordinates binned along the time axis starting from the beginning of the first fixation (t = 0) to grasp onset (t = 1); left panels: drink task; right panels: hand task. Shaded areas represent the standard error. Dashed lines represent the objects' outermost edges.
Figure 10
Correlation values and corresponding p values between eye movement bins (columns, i.e., left-to-right) and hand movement bins (rows, i.e., top-to-bottom) for the horizontal (panels on the left) and vertical (panels on the right) coordinates. The p values were corrected for 100 comparisons; all p values above 0.1 are encoded in blue. Small blue boxes indicate selected bins, which show significant anticipations of hand locations by the eyes (see text for corresponding values).
Table 1
Main effects found in the four-way ANOVA with factors AOI (LR = left/right, TB = top/bottom), task, object, and orientation. Further, the interaction AOI × orientation is displayed.