Social interactions are part of everyday life and are essential for communication, empathy, and for clarifying intentions or needs to reach mutual understanding. Initial studies have confirmed a tendency to focus on other human beings in a social context, which has sparked extensive research in the field of social attention. Early studies frequently used isolated stimuli to compare the allocation of attention to social as opposed to nonsocial features (e.g., Driver et al., 1999; Friesen & Kingstone, 1998; Langton & Bruce, 1999; Pelphrey et al., 2002; Ricciardelli, Bricolo, Aglioti, & Chelazzi, 2002; Ro, Russell, & Lavie, 2001). One way to investigate this bias is to record eye movements during free viewing and to infer from them how attention is distributed across social and nonsocial features; attention indexed by gaze in this way is referred to as overt attention. However, some laboratory studies using simplified stimuli (e.g., iconic images or geometric shapes) or isolated stimuli (e.g., a face shown without the remaining body or contextual information) have yielded different results than studies using more naturalistic setups (Kingstone,
2009). When a face is presented in isolation, observers show a clear preference to fixate the eye region, but when a face is presented together with the associated body, the tendency to scan the eyes disappears (Kingstone, Smilek, Ristic, Friesen, & Eastwood, 2002). This difference in behavior may be due to the lack of contextual information and situational complexity in the former as compared to the latter condition. Naturalistic scenes are rich in detail and provide a meaningful context for objects, which influences the selection of eye movements, for example, through scene gist (e.g., Oliva & Torralba,
2006) or a multitude of competing objects (e.g., Einhäuser, Spain, & Perona,
2008; but see Borji, Sihite, & Itti,
2013). This contextual richness is often diminished in impoverished visual displays (see also Anderson, Ort, Kruijne, Meeter, & Donk, 2015). As a consequence, researchers have started to use naturalistic scenes to extend previous results to a more realistic and complex, yet still controlled, setting. Recent studies have additionally introduced human beings into these scenes and investigated the effects of their presence on gaze behavior (e.g., Birmingham, Bischof, & Kingstone,
2008a,
2008b; Cerf, Frady, & Koch,
2008; End & Gamer,
2017; Flechsenhar & Gamer,
2017; Fletcher-Watson, Findlay, Leekam, & Benson,
2008; Rösler, End, & Gamer,
2017; Rubo & Gamer,
2018; Xu, Jiang, Wang, Kankanhalli, & Zhao,
2014). These eye-tracking studies have shown that social features, such as the faces or bodies of other human beings, are preferentially attended under free-viewing conditions. However, naturalistic stimuli have mainly been used in experiments that address overt attention by means of eye gaze rather than in experiments that restrict eye movements in order to investigate covert shifts of attention. In addition to the distinction between covert and overt shifts of attention, the literature typically distinguishes between reflexive, externally driven shifts of attention and voluntary shifts of attention. Traditional covert paradigms, such as variations of the Posner cueing paradigm (Posner, 1980), involve the presentation of cues that engage either endogenous, top-down control or exogenous, bottom-up attentional orienting while eye movements are restricted. In the original experiment, two boxes are presented on the screen, and a probe (a checkerboard) appears in one of them. Before the onset of this probe, one of the boxes may be cued exogenously by a flash at the corresponding location, which reflexively draws attention to it. Alternatively, a central arrow cue may point to one of the boxes, triggering an endogenous shift of attention. The cue can either correctly indicate where the probe is going to appear (valid cue) or point to an incorrect location (invalid cue). Even when the cue does not reliably predict where the probe is going to appear, participants typically respond faster on trials in which the probe appears at the cued and, therefore, attended location (congruent trials) than on trials in which it appears at an uncued location (incongruent trials). This phenomenon is referred to as a congruency effect. Although the majority of these paradigms use directional cues, such as arrows (e.g., Tipples,
2002), some studies have introduced social endogenous cues, that is, centrally presented faces or eyes that indicate a spatial location through a change in gaze direction (Deaner & Platt,
2003; Driver et al.,
1999; Friesen & Kingstone,
1998; Kuhn & Kingstone,
2009; Ricciardelli et al.,
2002). Such gaze cues can induce automatic shifts of attention in the cued (i.e., gazed-at) direction, and these shifts seem to resist voluntary control.
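
To make the congruency effect described above concrete, it can be expressed as the difference between the mean reaction time on incongruent trials and the mean reaction time on congruent trials. The following is only a minimal sketch under stated assumptions: the function name `congruency_effect`, the trial format (a congruency flag paired with a reaction time in milliseconds), and the example values are hypothetical and were invented for illustration; they are not data or analysis code from any of the cited studies.

```python
from statistics import mean


def congruency_effect(trials):
    """Return mean RT on incongruent trials minus mean RT on congruent trials.

    `trials` is an iterable of (is_congruent, rt_ms) pairs. This format is an
    assumption made for this sketch, not one taken from the cited studies.
    A positive result means faster responses at the cued location.
    """
    congruent = [rt for is_congruent, rt in trials if is_congruent]
    incongruent = [rt for is_congruent, rt in trials if not is_congruent]
    if not congruent or not incongruent:
        raise ValueError("need at least one congruent and one incongruent trial")
    return mean(incongruent) - mean(congruent)


# Hypothetical reaction times in milliseconds, invented for illustration only.
example_trials = [
    (True, 310), (True, 295), (True, 325),     # probe at the cued location
    (False, 350), (False, 340), (False, 360),  # probe at the uncued location
]

effect_ms = congruency_effect(example_trials)
print(f"Congruency effect: {effect_ms} ms")
```

In practice, such a difference would typically be computed per participant and condition after excluding error trials and outliers, and then tested statistically; those steps are omitted from this sketch.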