In a real-world social interaction, participants utilized gaze cues from another individual, but only when the gaze cues provided information that was absent from the spoken instructions. When the information contained in the spoken language was sufficient to uniquely identify a target block, participants rarely sought out or followed gaze cues from the experimenter. However, when language cues were ambiguous, the presence or absence of gaze cues from the instructor did impact behavior in a number of ways: Participants frequently sought out gaze cues when available, and they made use of these cues to orient to the target block (gaze following) and to pick up the correct block. Thus our findings add support to the growing body of evidence that gaze-cue utilization is context dependent (Itier et al., 2007; Nummenmaa et al., 2009) and extend this evidence into the domain of a real-world interaction.
The mean percentage of looks to the instructor was significantly higher when the instructions were ambiguous and gaze cues were provided than in any other condition, indicating that these participants sought gaze cues more often than all other participants. The difference between this condition and the two unambiguous-instruction conditions is easily accounted for by the irrelevance of gaze cues to the task: The unambiguous language clearly described the location of the target block, so there was no need for any extra nonverbal information. The significant difference across gaze conditions for those given ambiguous instructions is particularly interesting. Those in the condition in which instructions were ambiguous but no gaze cues were provided very rarely looked to the instructor, despite not receiving sufficient verbal information to correctly perform the task. This finding is all the more surprising considering that the looks to the instructor in the condition in which ambiguous instructions were supported by gaze cues occurred before the gaze cue was given, a point in time at which the gaze and no-gaze conditions did not differ. These findings can be explained by participants very quickly learning the nonverbal informativeness of the instructor: When gaze cues were present, participants learned to preemptively look to the instructor; when they were not, participants learned that they could not receive useful nonverbal information from the instructor and therefore did not look toward him. While these differences must reflect sensitivity on the part of the participant to whether or not gaze cues were provided, we did not see any change in overt gaze-seeking behavior over the course of the experiment, suggesting that participants either learned very quickly whether gaze cues were provided or were able to detect the presence or absence of gaze cues without overtly orienting to the instructor.
The percentage of trials in which the target was the first block to be fixated after the onset of the first descriptor word was our indicator of gaze following. At the onset of the first descriptor word, the identity of the target block was ambiguous to all participants except those utilizing gaze cues, because the gaze cues were given at this point. We would therefore expect participants following gaze cues to be more likely than participants not following gaze cues to look at the target block before any other block. Our gaze-seeking indicator can only capture overt gaze seeking; this indicator of gaze following, however, takes the effects of both overt and peripheral gaze following into account. This is crucial, as there is evidence that gaze cues can be utilized without being directly fixated (Knoeferle & Kreysa, 2012). Within each gaze condition, there was no difference between ambiguous and unambiguous instructions in the percentage of trials in which the target block was the first block to be fixated after the onset of the first descriptor word. This is likely because, at the onset of the first descriptor word, both types of instruction are equally ambiguous. However, we found a significantly higher percentage of trials in which the target was the first block fixated after the onset of the first descriptor word when ambiguous instructions were supported by gaze cues than when they were not, suggesting that gaze cues were being followed when language was ambiguous. There was no significant difference between the gaze conditions for participants who were provided with unambiguous instructions. We argue that this is because gaze cues, even though present, were not utilized when language provided the information necessary to complete the task. Thus, as was the case for our indicator of overt gaze-seeking behavior, gaze following appears to occur only when the language of the instructions was imprecise and did not uniquely identify a target block.
In contrast to our gaze-seeking measure, we found that gaze following appeared to change over the course of the experiment in the condition in which gaze cues were present and language was ambiguous. There were more trials in which the target was the first block to be fixated after the onset of the first descriptor word in the final 20 trials than in the first 20 trials. This suggests that participants followed gaze cues more once they had encountered more evidence of their communicative value. It is interesting to note that the influence of gaze cues on fixating the target block seemed to manifest in the accuracy of selecting the target block after the onset of the first descriptor word, but not in fixating this block sooner: There was no difference in how quickly the target block was fixated after the onset of the first descriptor word in the presence or absence of gaze cues. As such, gaze utilization in this experiment appears restricted to selection accuracy rather than selection speed. It should be noted, however, that given the small cohort of participants in this experiment, potentially very small timing differences would have been hard to detect statistically. Taken together, our measures of gaze seeking and gaze following suggest that participants seek and utilize gaze cues when these provide information not present in language, but quickly learn not to seek gaze cues when they are not provided, and do not utilize gaze cues when the spoken instructions contain all of the task-relevant information.
The percentage of trials with correct pick-ups was unsurprisingly very high for both conditions with unambiguous instructions. This was due to the ease of the task when the instructions uniquely identified the target block. Since performance reached ceiling in both conditions, we cannot infer anything from the nonsignificant difference across the gaze conditions. However, we can infer a positive effect of the presence of gaze cues on task performance for participants given ambiguous instructions, as there was a significantly higher percentage of correct pick-ups when ambiguous instructions were supported by gaze cues than when they were not. Although the number of correct pick-ups was higher in the presence of gaze cues, the speed of performance was not significantly affected by gaze cues, suggesting (as for our measure of gaze following) that gaze utilization effects were manifest in terms of accuracy rather than speed of task performance. Our task performance results provide strong evidence that the participants were using the gaze cues to help them in the task.
In all our conditions gaze-seeking behavior was surprisingly rare, with participants only looking at the instructor on around 11% of trials when the spoken instructions unambiguously identified the target block. Even when language was unspecific and gaze cues were therefore highly informative, participants only sought gaze cues on around 59% of trials. This may seem surprisingly low given previous reports of the tendency of people to look at the eyes and faces of others (Birmingham et al., 2009). However, previous research using both real-world and video stimuli has shown that the extent to which we respond to the gaze allocation of others varies with the ecological validity of the paradigm. Specifically, Gullberg (2002) showed increased gaze following in a real-world interaction compared to viewing a videotaped speaker, and Freeth et al. (2013) found that in a real-world setting a speaker's eye contact increased fixation durations on the speaker's face, but this was not found when participants viewed video stimuli. An explanation for the results of the present study may be provided by work on how the potential for social interaction affects the way people look at others (Laidlaw et al., 2011) and the extent to which we follow the gaze of others (Gallup et al., 2012). The present study used a real-world situation in which there was potential for social interaction. This could have led participants to avoid looking at the instructor, in a similar way to that found by Laidlaw et al. (2011) and Gallup et al. (2012). There are no social consequences of looking at a photograph of a person, so participants in static-image experiments (Birmingham et al., 2009) would not show this aversion behavior. Whether or not this accounts for these findings, it is clear that the mere presence of another person is not sufficient to stimulate gaze-seeking behavior, even when that person is using gaze to indicate the location of behaviorally important objects.
Hanna and Brennan's (2007) highly naturalistic study found that participants used the gaze cues of instructors to help them understand instructions. In the present study, by controlling the presence of gaze cues and the specificity of instructions, we have found that rather than being a ubiquitous response to a social interaction, the tendency to engage in gaze seeking and gaze following appears to depend upon the informativeness of gaze cues relative to other information. When language provides the necessary information to locate a block, gaze cues are neither sought nor followed. This suggests that there is an interaction between language and gaze cueing in communication, with gaze becoming more important when spoken language is less effective at communicating a message or idea.
The above conclusion may initially seem to contradict the conclusions of Knoeferle and Kreysa (2012); however, there is a clear theoretical distinction between the two. Knoeferle and Kreysa (2012) found that using a less common (harder) syntactic structure led to a decrease in gaze utilization and concluded that this was due to the extra cognitive resources required to process the less common structure. The present study found that using more ambiguous (harder) instructions led to an increase in gaze utilization. The key difference between these two studies is that the utilization of gaze cues was never necessary to complete Knoeferle and Kreysa's task, whereas it was essential in the present study when ambiguous language was used. When the sentence was harder in Knoeferle and Kreysa's task, participants would avoid unnecessary supportive information (gaze cues) and focus on the sentence; in the present study, however, the harder instructions required the use of gaze cues for participants to successfully complete the task.
One interpretation of these data is that when the language of the instructions was precise, gaze cues did not provide task-crucial information; they were task relevant only when the instructions were ambiguous. It is clear from previous studies of fixation behavior in natural settings (e.g., Hayhoe, Shrivastava, Mruczek, & Pelz, 2003; Land, Mennie, & Rusted, 1999; see Tatler, Hayhoe, Land, & Ballard, 2011, for a discussion) that fixations are intimately linked to task-relevant sources of information in our surroundings. Very few fixations in natural tasks are directed to task-irrelevant objects in the environment. Thus the present data demonstrate that the restriction of visual selection to task-relevant information extends to social signals conveyed in the eyes of another individual.
The notion that gaze cues are sought and utilized to aid task performance only when they are highly informative is consistent with studies that have considered the extent to which visual cues are used in other domains. When moving virtual objects on a surface, the extent to which hand movements are planned and executed on the basis of visual and remembered information depends upon the relative availability of visual information (Brouwer & Knill, 2007). When walking toward a goal, the relative reliance upon optic flow and egocentric direction depends upon the relative availability of optic flow information (Warren, Kay, Zosh, Duchon, & Sahuc, 2001). It would appear that the extent to which we utilize gaze cues in social interactions may be similarly flexible and dependent upon their informativeness relative to other available cues.
In a similar way to gaze utilization, natural language production can be affected by the relative availability of other information. Brown-Schmidt and Tanenhaus (2008) found that when conversing during a collaborative task, two isolated participants would refer to task-relevant objects with ambiguous descriptive phrases, yet the listener would often have no difficulty understanding which object was being referred to. The authors suggest that these apparently ambiguous statements are produced because the conversational context restricts and aligns the referential domains of the speaker and listener such that, for the subset of objects within the shared referential domain, the language is not ambiguous. One interpretation of our findings is that a speaker's gaze can provide similar cues for restricting and aligning referential domains, thus allowing the listener to select the correct referent in spite of the ambiguous language of the instruction. We can therefore speculate that in natural conversation, not only does language affect the utilization of gaze cues, but the informativeness of a gaze cue may reciprocally affect the specificity of the language produced.
Our results provide new insights into how gaze cues are utilized in a social setting. In a reasonably ecologically valid social interaction in which a participant follows the instructions of another individual, both language and gaze cues are utilized by the participant to complete the task successfully. However, when language alone provides all of the information required for successful task performance, gaze cues are neither sought out nor utilized by the participant. It is therefore clearly not the case that gaze seeking and gaze following (when possible) are ubiquitous behaviors in the context of our paradigm. At least in this form of interaction, we appear to utilize the gaze cues of another individual only when they provide information not otherwise available to us.