The present study investigated whether direct gaze was detected faster than averted gaze in displays containing four full characters. We used eye tracking to extend the results of previous studies on the stare-in-the-crowd effect by examining eye movements in addition to response times. In a target localization task, participants' eye movements were tracked as they searched for a given target whose gaze was different from that of the three distractors. Our results indicated an asymmetrical bias to respond to direct gaze faster than to averted gaze, as reported previously (e.g., Conty et al., 2006; Doi & Ueda, 2007; Doi et al., 2009; Senju et al., 2005; von Grünau & Anston, 1995). However, this effect was strongly modulated by target position such that direct gaze was responded to faster than averted gaze only in the far peripheral visual fields. Eye movement analyses revealed that the faster responses to DG stemmed from a faster visual detection of DG compared to AG targets at left but not right peripheral positions. Furthermore, the timing of this visual detection, reflected in latencies of first fixation on target, mimicked the behavioral responses and strongly correlated with RTs. Finally, detection of direct gaze was influenced by distractors' gaze direction. We discuss these effects and their implications in more detail below, starting with general effects of hemifield and position.
Targets in the LVF were responded to marginally faster than targets in the RVF, replicating the results of Conty et al. (2006) and Doi et al. (2009). The fact that 41% of the very first fixations landed on Position 2, situated on the left side, indicated a trend for a leftward bias in initial spontaneous explorations of visual scenes (Ebersbach et al., 1996; Hättig, 1992). This initial bias may have helped the visual search as targets in the LVF were visually detected faster and with fewer fixations than in the RVF. The LVF was also associated with higher error rates than the RVF, as discussed in more detail below. However, for correct response trials, which were those kept for eye movement analyses, less visual exploration was seen in the LVF as indicated by fewer fixations and shorter viewing times than in the RVF. These results demonstrate a strong visual field asymmetry in this gaze direction search. An LVF advantage for gaze perception has been reported previously (Ricciardelli, Ro, & Driver, 2002) and seems related to the larger involvement of the right hemisphere at the neural level (Calder et al., 2007). This right hemisphere dominance is also seen for various face perceptual processes (Burt & Perrett, 1997) and seems to be a general characteristic of social information processing (Brancucci, Lucci, Mazzatenta, & Tommasi, 2009).
The response pattern was also strongly modulated by target position, and both eye movements and response times were much faster for central (Positions 2 and 3) than peripheral (Positions 1 and 4) target positions. The characters were presented side by side, as one might see in real-life situations, for example, when coming out of an elevator. This specific positioning seems to have influenced participants' general scanning and response pattern to targets. As reflected in the distribution of very first fixations, participants most often looked upward from the centered fixation cross, bringing their gaze to the central positions (Face 2 or 3 or in between) before looking elsewhere. In addition to being seen first, targets in these central positions required the least exploration time to be detected, as seen in the number of fixations and viewing time, explaining the faster responses for targets at these locations. Thus, the pattern of visual search was serial, demonstrating a lack of pop-out effect as previously suggested (e.g., Conty et al., 2006; von Grünau & Anston, 1995). These results, along with the strong correlation patterns between response times and latency of first fixation on target, are in line with recent evidence indicating that gaze direction processing requires focused attention (Burton, Bindemann, Langton, Schweinberger, & Jenkins, 2009). That is, target gaze was only discriminated when participants actually attended to and fixated on the target face.
The most important finding of the present study concerns the interaction between target gaze and position such that direct gaze yielded faster responses than averted gaze for peripheral targets only. This contrasts sharply with previous stare-in-the-crowd research, which did not report target position effects or interactions of target position and gaze direction. Stimuli in these studies consisted of arrays of pictures scattered randomly across the screen, preventing the analysis of specific target locations. In contrast, the characters' positioning in the present study allowed us to investigate more precisely where in the visual field direct gaze might be more efficiently detected. Contrary to our hypothesis and to previous research, the results suggest that the stare-in-the-crowd effect is not a general phenomenon but is heavily dependent on target position and likely on initial gaze position (which was not controlled in previous studies).
Eye movement analyses confirmed a gaze difference at peripheral positions and a lack thereof at central positions. However, the gaze differences at peripheral positions differed between visual fields. In the LVF (Position 1), the response asymmetry stemmed from a truly faster visual detection of DG than AG targets. This was seen in the time taken to first view the target, which was shorter for DG than AG targets, as well as in the number of fixations made before seeing the target for the first time, which tended to be smaller for the DG than the AG condition. Overall, fewer fixations were made before the actual response to DG than AG targets at this left peripheral position. In contrast, in the RVF (Position 4), participants surprisingly made more fixations before responding in the DG than in the AG condition, which resulted in longer viewing times to detect DG than AG targets. The number of fixations made to reach the target for the first time was also larger in the DG than in the AG condition, although the time before first fixation on target did not differ between gaze conditions. Yet, participants were faster to respond to DG targets at Position 4. Thus, we must assume that another mechanism, likely cognitive, overcame the longer viewing time for DG targets, allowing participants to respond to DG targets faster than to AG targets.
It could be argued that at central positions, the lack of stare-in-the-crowd effect was related to a ceiling effect such that the task was so easy it would wash out the experimental manipulation. In contrast, at peripheral sites, gaze targets would need further display exploration and thus more attentional resources to be detected. The increase in task difficulty at these positions would allow the emergence of the effect. The faster RTs and reduced exploration for central compared to peripheral positions support this interpretation. However, the generally long RTs recorded suggest that the task was not easy, even at central positions (1231 ms on average for RTs at Position 2, the fastest of all). The pattern of error rates does not support that interpretation either, given that error rates were lower in the right peripheral field than at central positions for the DG condition (and unchanged across positions for the AG condition). Most importantly, target faces in central positions were not within the foveal region of the starting point for each trial. As mentioned above, participants had to look up from the centered fixation first, bringing their gaze to the top of the monitor on Face 2 or 3 (or in between the two) before starting their search. Thus, gaze information was not directly in front of participants, who still needed to move their eyes to get to central positions. For all these reasons, the lack of the stare-in-the-crowd effect at central positions is unlikely to be related to the task being easier at these positions compared to peripheral positions.
One possible explanation for the faster detection of direct gaze at peripheral positions may be found in the framework of the putative subcortical face detection system that is thought to be fast and based on low spatial frequencies (LSFs; Johnson, 2005; Senju & Johnson, 2009). According to this theory, the LSF information is carried to the superior colliculus, pulvinar, and amygdala subcortical regions, offering a “quick and dirty” route of visual processing that is best suited to detecting stimuli in the periphery. For example, when faces are viewed in the periphery, LSF information of the face under naturalistic top-lighting conditions yields dark shadowed areas for the eye sockets, surrounded by the illuminated areas of the cheeks, nose, and forehead. This face detection subcortical route has also been proposed to be involved in eye contact detection (Senju & Johnson, 2009), possibly using contrast information between the circular dark iris and the white sclera. This “fast-track modulator” could be the Eye Direction Detector proposed by Baron-Cohen (1994) or the mutual attention detector proposed by Perrett and Emery (1994). Senju and Johnson (2009) proposed that the subcortical route projects onto cortical areas of the social network so as to modulate cortical activation as a function of task demands. Thus, direct gaze targets in the left peripheral visual field (Position 1) might have been detected faster than averted gaze targets as a result of this subcortical face-processing route that would then modulate oculomotor orienting, explaining the earlier latency of first fixation on target in the DG condition. Faces in central positions were also not within the foveal region, but eye movements were made upward from the central fixation cross (i.e., vertically). In contrast, eye movements to the periphery were made mostly laterally from those central positions. This would suggest that the subcortical route works for lateral periphery but not vertical periphery, and direct gaze can be discriminated at about 7.4° of eccentricity (Faces 1 and 4 were at 7.4° from Faces 2 and 3, respectively). The results also suggest a hemifield asymmetry for this subcortical route that does not seem to play the same role for targets in the RVF. The possible involvement of this indirect visual route remains speculative and does not completely fit with the idea that processing gaze requires focused attention (Burton et al., 2009). Thus, although the present data demonstrate a lack of pop-out effect and a serial search strategy to detect gaze, the mechanism behind the faster eye movements for direct than averted gaze in the LVF remains unclear and will have to be addressed by future studies.
Following previous stare-in-the-crowd studies, we also predicted that participants would be more accurate in the detection of direct than averted gaze. This was the case for the peripheral RVF (Position 4) but interestingly not for any other position. In fact, participants made more errors overall for the DG condition relative to the AG condition, especially in the peripheral LVF (Position 1). This finding goes against previous research that reported better responses for direct gaze, although again, without exploring possible modulations by target location. One explanation for these results may be related to participants' handedness. The great majority of participants (22 out of 24) were right-handed, and all participants used their right index and middle fingers to respond, respectively, to Positions 3 and 4, whereas they used their left middle and index fingers to respond, respectively, to Positions 1 and 2. Thus, participants' dominant hand may have facilitated responses for targets situated on the right while impairing responses for targets on the left. However, this possibility is ruled out by the fact that no position effects were found for responses to AG targets; the position effects of better accuracy at Position 4 with a concurrent lower accuracy at Position 1 were found only for DG targets and thus cannot be due to handedness. Another possible explanation for these results is a speed–accuracy trade-off, especially in the peripheral LVF. Participants looked at, and responded to, targets marginally faster in the LVF relative to the RVF but also made more errors in the LVF, especially at Position 1. However, the correlations of RTs with error rates argue against a speed–accuracy trade-off, which would predict more errors with faster responses (i.e., negative correlations). Instead, significant positive correlations of RTs and of latencies of first fixation on target with error rates were found at Position 1, while no significant correlations were found at the other positions.
In other words, the longer the response times and latencies of first fixation on target, the larger the error rates, at Position 1 only. This finding suggests that, rather than a speed–accuracy trade-off, errors could reflect hesitation at this position. This is all the more plausible as 85% of the errors at Position 1 were congruency errors, i.e., elicited by the gaze direction of distractors. When only the DG Congruent condition was used, a significant positive correlation was found for RTs at Position 1 but not at the other positions, supporting this interpretation. Participants may thus have been confused by the direction of gaze of distractors. In contrast, in the peripheral RVF (Position 4), DG targets yielded the lowest error rates overall, indicating a truly more accurate response to direct gaze. RTs at this position were also the longest, due to participants scanning other target positions first, as also supported by the linear decrease of error rates from Positions 1 to 4. The longer yet more accurate responses at Position 4 most likely resulted from the serial search mentioned previously: if the target was not detected in the first two, and often three, locations, then it had to be in the fourth one (participants knew each trial contained a target). This search process seems to have facilitated better accuracy at Position 4.
The analysis of congruency effects of averted gaze distractors on direct gaze targets revealed faster RTs for congruent than incongruent targets. That is, targets were responded to faster when their position was congruent with the direction of gaze of the distractors, suggesting an orientation of attention by distractors' gaze. This interpretation is strongly supported by the literature on gaze orienting, which suggests that we orient our attention in the direction signaled by gaze in an automatic manner (Driver et al., 1999; Friesen & Kingstone, 1998; Frischen et al., 2007; Langton & Bruce, 1999). However, this congruency effect was not the driving force behind the overall gaze difference found between direct and averted gaze targets at peripheral positions, as even incongruent DG targets yielded faster RTs than AG targets at these positions. Earlier latencies of first fixation on target were also found for congruent than for incongruent direct gaze targets, a result in agreement with recent studies showing that eye movements follow the direction signaled by gaze (Castelhano et al., 2007; Itier et al., 2007; Zwickel & Võ, 2010). However, this result was mostly driven by the peripheral LVF (Face 1) and central RVF (Face 3). Interestingly, congruent targets also yielded more errors than incongruent targets, specifically in Position 1. As noted above, this finding could be due to hesitation errors, whereby participants may have been confused, rather than helped, by the direction of gaze of distractors.
It is important to note some methodological differences between the current and previous studies. Unlike previous research (e.g., Senju et al., 2005; von Grünau & Anston, 1995), no feedback was given after each trial. Telling participants whether they were right or wrong may have, in those studies, influenced both their response speed and accuracy by influencing their search strategy. Moreover, only four agents were used. It would be interesting to see whether the present results hold for displays containing more characters. For instance, Conty et al. (2006) found larger differences between gaze conditions for RTs and error rates when using display sizes of 8 and 12 stimuli rather than 4. That is, direct gaze was detected more efficiently than averted gaze when many distractors were present, which could be attributed to the greater influence of distractor gaze congruency with increasing distractor number. We also used a different type of search task than previous studies examining the stare-in-the-crowd effect. In previous work, participants detected whether the target was present or absent, not its location as done here. Future studies should examine the effects of feedback, task demands, and distractor number on the stare-in-the-crowd effect and their eye movement correlates.
In summary, the present study showed that the faster response to direct than averted gaze was found in a localization task, in both visual fields, despite the use of full characters and their bodies, and was not due to (although influenced by) distractors' averted gaze. However, our RTs, error rates, and eye movement results demonstrate that this stare-in-the-crowd effect is dependent on target position and not systematically found. Other studies have also reported instances in which this effect is absent. For example, Conty et al. (2006) found a stare-in-the-crowd effect under deviated head orientations but not in frontal head orientations. Thus, the faster detection of direct over averted gaze is not a systematic phenomenon and is modulated by at least two factors: spatial position and social gaze context. Future studies will have to determine whether other factors can modulate the stare-in-the-crowd effect and the perception of gaze in more realistic social contexts.