A quick look at a face lets us glean a wealth of socially relevant information, such as identity, gender, mood, attractiveness, and even trustworthiness. Over the past half-century, there has been considerable interest in understanding the mechanisms by which humans achieve this feat (Bruce & Young, 2013). This interest has led to the discovery of many peculiar perceptual effects that arise in face judgments on stimuli manipulated in ways that are uncommon or nonexistent in nature. Notable examples are the face inversion effect (Yin, 1969), the Thatcher illusion (Thompson, 1980), the composite face effect (Young, Hellawell, & Hay, 1987), the parts versus wholes effect (Tanaka & Farah, 1993), and the face adaptation aftereffect (Webster, Kaping, Mizokami, & Duhamel, 2004). Although these effects are not ecologically valid, in the sense that they have no functional role in real-life face recognition, their study has been central to our theoretical understanding of how humans process faces. These visual tasks are therefore an important testbed for studying face recognition.
Recently, it was shown that across common face tasks such as person, gender, and emotion recognition, humans consistently land their first saccade at a point just below the eyes (Peterson & Eckstein, 2012). This location, referred to here as the preferred fixation location (PFL), varies moderately across individuals (Peterson & Eckstein, 2013), and these variations generalize to the real world (Peterson, Lin, Zaun, & Kanwisher, 2016). The PFL plays a functional role in these tasks: when observers fixate away from it, task performance declines. More recently, it was shown that the internal representations of faces in humans are tuned to the PFL in these tasks (
Tsank, 2019). Specifically, Tsank (2019) showed that individual differences in how face identification performance varies with fixation position are best captured by an ideal observer model whose internal templates are faces foveated at the individual's PFL. Further, the performance of convolutional neural network models trained on faces foveated at an individual's PFL is also tuned to that PFL. These observations are consistent with research showing that human face processing is optimized for the face diet that we encounter during our lives (Oruc, Shafai, Murthy, Lages, & Ton, 2019; Yang, Shafai, & Oruc, 2014). This is because, given peripheral information loss, consistently moving one's eyes to a preferred location on the face can bias internal representations to match the stimulus features accessible when fixating at the PFL. This body of research has largely focused on how gaze position relative to the PFL affects performance in common face tasks (identity, gender, and emotion identification); it remains unknown whether gaze position modulates the effect size in face tasks that use doctored stimuli to produce peculiar perceptual effects. As noted earlier, such research has the potential to elucidate the constraints and mechanisms by which the long-term oculomotor strategy of consistently moving one's eyes to a PFL on the face can shape face perception. Here, we consider how gaze position relative to the PFL modulates the composite face effect (CFE).
The discovery of the CFE by Young et al. (1987) is an important landmark in the evolution of the influential hypothesis that upright faces are processed as units (Piepers & Robbins, 2012). In the CFE, the top (or bottom) halves of two faces, although identical, are perceived as different when the bottom (or top) halves of the faces differ (Young et al., 1987). Over the years, the CFE has been studied extensively (Murphy, Gray, & Cook, 2017; Rossion, 2013).
Figure 1 shows the general stimulus design for demonstrating the CFE. Note that the reader may not experience the CFE equally strongly for both halves.
At its core, the CFE represents a surprising limitation of the visual system, which, despite its expertise in face processing, struggles to ignore irrelevant facial features, even though it more effectively disregards irrelevant features of non-face objects (Cassia, Picozzi, Kuefner, Bricolo, & Turati, 2009; Robbins & McKone, 2007). Typically, this effect is attributed to holistic face processing (i.e., a mode of computation that considers all the parts at once) (Richler, Palmeri, & Gauthier, 2012). In several other tasks, observers show a similar inflexibility in adjusting to unfamiliar manipulations of features or viewing conditions. These include the difficulty of recognizing inverted faces (Yin, 1969), diminished recognition of other-race faces (Cross, Cross, & Daly, 1971), diminished recognition under atypical illumination directions (Braje, Kersten, Tarr, & Troje, 1998), diminished recognition of faces that are too small or too large (Yang et al., 2014), and diminished recognition when fixating away from the preferred first fixation location (Peterson & Eckstein, 2012). Thus, the CFE can be interpreted as an inability to flexibly use learned internal representations of full faces on a task that demands judgments based on only part of the incoming face information. In this general framework, we examined the role of long-term, consistent eye movements to a PFL on the face in shaping the internal representations of faces, as measured by the CFE.
To this end, we implemented gaze-contingent versions of the classic CFE task in which we quasi-experimentally manipulated the PFL on the face and experimentally manipulated both the half of the face being judged (top vs. bottom) and the judgment fixation location (JFL) on the face. We first screened for observers whose PFL was either high on the face, close to the eyes (upper-lookers), or lower on the face, close to the nose tip (lower-lookers). We then measured the strength of the CFE for the top and bottom halves of faces in both groups when their JFL was either at their own group's mean PFL or at the other group's mean PFL.
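As a rough illustration of the design just described, the three two-level factors can be crossed to enumerate the cells in which the CFE is measured. This is only a sketch: the factor labels and the function name `design_cells` are hypothetical and are not taken from the study's analysis scripts.

```python
from itertools import product

# Hypothetical labels for the three two-level factors described in the text.
GROUPS = ("upper-looker", "lower-looker")      # quasi-experimental: screened by PFL
HALVES = ("top", "bottom")                     # half of the face being judged
JFLS = ("own-group PFL", "other-group PFL")    # judgment fixation location (JFL)


def design_cells():
    """Return every combination of looker group, judged half, and JFL."""
    return [
        {"group": group, "half": half, "jfl": jfl}
        for group, half, jfl in product(GROUPS, HALVES, JFLS)
    ]


if __name__ == "__main__":
    # Print one line per design cell (2 x 2 x 2 = 8 cells).
    for cell in design_cells():
        print(cell)
```

Crossing the factors this way makes explicit that each looker group is tested on both face halves at both fixation locations.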
Our experimental design allowed us to test several hypotheses on how the CFE is modulated by variations in the PFL and JFL across the face. They are listed below:
Hypothesis H1: Upper-lookers and lower-lookers would both show a stronger CFE for the top half than for the bottom half. This hypothesis is based on recent reports demonstrating the primacy of the eye region for face perception (Issa & DiCarlo, 2012; Rossion, 2013; Royer et al., 2018).
Hypothesis H2: The amount by which the top-half CFE is stronger than the bottom-half CFE depends on the PFL, such that upper-lookers would show a greater top-half CFE than lower-lookers. This hypothesis is based on the idea that upper- and lower-lookers have different preferred internal representations, with the upper-lookers relying more on the eye region than the lower-lookers.
Hypothesis H3: The CFE should be affected by the proximity of the JFL to the PFL, such that the CFE is stronger when the JFL is at the PFL and diminished when the JFL is far from the PFL. This hypothesis is based on the findings of Peterson and Eckstein (2012, 2013), showing that both upper- and lower-lookers perform best at face identification tasks when they fixate their PFL.
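One way to make the three hypotheses concrete is to write them as inequalities on CFE strength. The notation below is introduced here purely for illustration and does not appear in the original text; in particular, H2 is rendered as a difference of differences, which is one possible reading of the verbal statement (a simpler reading would compare the top-half CFE directly between groups).

```latex
% Notation assumed for illustration only (not from the original text):
% C_{g,h,j} is the strength of the CFE for looker group g \in \{U, L\}
% (upper-lookers, lower-lookers), judged half h \in \{\mathrm{top}, \mathrm{bot}\},
% and JFL j \in \{\mathrm{own}, \mathrm{other}\} (own-group vs. other-group mean PFL).
\begin{align*}
\text{H1:}\quad & C_{g,\mathrm{top},j} > C_{g,\mathrm{bot},j}
    \quad \text{for both } g \in \{U, L\}, \\
\text{H2:}\quad & C_{U,\mathrm{top},j} - C_{U,\mathrm{bot},j}
    \;>\; C_{L,\mathrm{top},j} - C_{L,\mathrm{bot},j}, \\
\text{H3:}\quad & C_{g,h,\mathrm{own}} > C_{g,h,\mathrm{other}}.
\end{align*}
```

Under this reading, H1 concerns the effect of the judged half, H2 the interaction between looker group and judged half, and H3 the effect of the JFL.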
The deidentified data, stimuli, and analysis scripts needed to reproduce the findings reported in this paper are publicly available via the Open Science Framework (OSF) at https://osf.io/vsdcj.