Individual differences in face salience and rapid face saccades

Humans saccade to faces in their periphery faster than to other types of objects. Previous research has highlighted the potential importance of the upper face region in this phenomenon, but it remains unclear whether this is driven by the eye region. Similarly, it remains unclear whether such rapid saccades are exclusive to faces or generalize to other semantically salient stimuli. Furthermore, it is unknown whether individuals differ in their face-specific saccadic reaction times and, if so, whether such differences could be linked to differences in face fixations during free viewing. To explore these open questions, we invited 77 participants to perform a saccadic choice task in which we contrasted faces as well as other salient objects, particularly isolated face features and text, with cars. Additionally, participants freely viewed 700 images of complex natural scenes in a separate session, which allowed us to determine the individual proportion of first fixations falling on faces. For the saccadic choice task, we found advantages for all categories of interest over cars. However, this effect was most pronounced for images of full faces. Full faces also elicited faster saccades compared with eyes, showing that isolated eye regions are not sufficient to elicit face-like responses. Additionally, we found consistent individual differences in saccadic reaction times toward faces that weakly correlated with face salience during free viewing. Our results suggest a link between semantic salience and rapid detection, but underscore the unique status of faces. Further research is needed to resolve the mechanisms underlying rapid face saccades.


Introduction
Humans rapidly detect whether a face is present in a scene or not (Crouzet, Kirchner, & Thorpe, 2010). This finding is based on the saccadic choice task (SCT), in which participants have to detect a predefined target among two stimuli from different categories as fast and accurately as possible. More precisely, participants fixate centrally until two images from two different categories (a predefined target and a distractor) appear to their left or right, respectively. Their task is to saccade as fast as possible to the target category as soon as the stimuli appear. Previous studies have shown that faces, when serving as the target category, elicit faster saccadic reaction times (SRTs) than other stimulus categories such as animals or inanimate objects. Humans can reliably detect the presence of a human face within a scene 100 ms after onset (Crouzet et al., 2010). This effect is robust against various stimulus modifications, such as varying eccentricities (Boucart et al., 2016), contrast/orientation inversion (Little, Jenkins, & Susilo, 2021), low-pass filtering (Guyader, Chauvin, Boucart, & Peyrin, 2017), or scrambling face stimuli (Kauffmann, Khazaz, Peyrin, & Guyader, 2021). Because scrambled faces differ in their structure from their intact versions, the programming of rapid face saccades does not seem to rely on holistic face processing and may be driven by particular facial features. The eye region in particular has been suggested to play a role in rapid face detection. Most saccades land near the eyes (Kauffmann et al., 2021) and upper face regions elicit faster SRTs than lower and even whole faces (Broda, Haddad, & de Haas, 2023). This finding is in line with recordings from face-selective cells in macaques showing that faces only containing an eye elicit rapid neural responses (Issa & DiCarlo, 2012). However, it remains unclear whether isolated eye regions are sufficient to elicit rapid saccades in the saccadic choice paradigm.
Citation: Broda, M. D., Borovska, P., & de Haas, B. (2024). Individual differences in face salience and rapid face saccades. Journal of Vision, 24(6):16, 1-11, https://doi.org/10.1167/jov.24.6.16.
Trying to predict human gaze behavior has been a focus of vision scientists for decades. Often, research has focused on low-level image properties such as color, orientation, or luminance contrast (Itti, Koch, & Niebur, 1998). However, for complex scenes, low-level models barely perform better than chance (Schütz, Braun, & Gegenfurtner, 2011). Faces, in contrast, reliably attract saccades when present in static or dynamic natural scenes even when observers are not instructed to specifically look at them (Broda & de Haas, 2022a; Broda & de Haas, 2022b; de Haas, Iakovidis, Schwarzkopf, & Gegenfurtner, 2019; Linka & de Haas, 2020; Rider, Coutrot, Pellicano, Dakin, & Mareschal, 2018; Rubo & Gamer, 2018). Generally, humans fixate on faces more often and longer compared with other object categories (de Haas et al., 2019; Linka & de Haas, 2020). Thus, including faces in salience models drastically improves their performance (Cerf, Harel, Einhäuser, & Koch, 2008), as is true for other salient semantic features such as text (Cerf, Paxon Frady, & Koch, 2009; Xu et al., 2014). Faces are salient in the sense that they attract fixations during free viewing. Faces also attract fixation in the SCT. This finding suggests that SRT and performance advantages in the SCT may depend on the same type of salience attracting fixations during free viewing. We previously used the SCT to explore SRTs toward inanimate objects associated with faces, such as face masks or glasses, but found no advantages when compared with cars (Broda et al., 2023). This finding could be the result of the insufficient salience of such objects, which are not known to attract fixations strongly during free viewing. Text, however, is a highly salient semantic category that captures humans' gaze during free viewing (de Haas et al., 2019; Linka, Sensoy, Karimpur, Schwarzer, & de Haas, 2023; Xu et al., 2014). Nevertheless, it is unknown whether text elements possess a salience advantage in the SCT or if this effect is exclusive to faces.
Recent research from our lab found that saccades targeting faces during free viewing are preceded by shorter fixation durations compared with saccades directed toward inanimate objects (Borovska & de Haas, 2023). This finding suggests the latency advantage for face-directed saccades generalizes between the SCT and natural free viewing, and that at least some mechanisms of face salience may be shared between the SCT and free viewing. The aforementioned individual differences in face salience during free viewing offer an opportunity to test this hypothesis. Individual differences in the proportion of first fixations landing on faces (as discussed elsewhere in this article) seem to be a particularly promising candidate for shared mechanisms with rapid face saccades in the SCT.
Our previous research addressed the question of whether the face advantage in the SCT generalizes to face halves and artificial face-related stimuli (Broda et al., 2023). Here, we investigate whether isolated face features, especially eyes, are sufficient to elicit rapid saccades, similar to whole faces; whether rapid saccades are exclusive to faces or generalize to text; and whether individual differences in early face fixations during free viewing are linked to potential differences for the face advantage in the SCT.

Participants
Seventy-seven healthy adult participants with normal or corrected-to-normal vision took part in the experiment (mean age, 25.23 ± 5.82 years; 53 females). All participants provided written informed consent and the study was approved by the institutional review board and in accord with the Declaration of Helsinki.
All participants completed both experiments, the SCT and the free viewing task, in separate sessions. All data from the SCT paradigm are new and original. Free viewing data collected during a separate session were also used in studies other than the present one (for at least some of our participants). Participants could choose between course credits or 10€/h for reimbursement.

Apparatus
Participants placed their head in a chin and forehead rest approximately 56 cm from the screen. The experiment was controlled using Psychtoolbox 3.0.16 (Kleiner et al., 2007) in MATLAB R2019a (MathWorks, Natick, MA) on a Windows 10 PC. Gaze data were acquired using an EyeLink 1000 Plus eye tracker (SR Research, Ottawa, Canada) at a frequency of 1 kHz.

Stimuli
We used 150 grayscale silhouette-cropped car images and 30 silhouette-cropped grayscale images from five categories of interest each: faces, text, noses, mouths, and eyes (Supplementary Figure S1). All images originated from the fLoc functional localizer package (Stigliani, Weiner, & Grill-Spector, 2015). Nose, mouth, and eye stimuli were cropped from the original faces and have been previously used in another experiment (Broda & de Haas, 2023). All stimuli were scaled to the same width of 6.9 degrees visual angle (dva), and placed on a mid-gray background at an eccentricity of 8 dva to the left or right of fixation along the horizontal meridian. We created unique image pairs, each containing an image of a car and an image of a category of interest (i.e., face, text, nose, mouth, or eyes). Images were paired such that the difference in aspect ratios (vertical to horizontal stimulus size) between the members of each pair was minimized.

Procedure
In each trial, an image from a category of interest was shown to the left or right of a central fixation point. The corresponding car image appeared on the opposite side. Participants were instructed to saccade as quickly and accurately as possible to the predefined target, which could either be a car or one category of interest, in separate blocks of 60 trials each. The target side was counterbalanced across trials. Each trial started with a central fixation (diameter of the fixation point: 0.1 dva). After a random interval of 0.8 to 1.6 seconds, the fixation point disappeared and was followed by a gap of 0.2 seconds. Then, a stimulus pair was shown for 0.4 seconds. The next trial started after an interval of 1 second. Participants completed sets of 10 trials that were separated by a self-paced drift check, which allowed for breaks without leaving the setup. Participants initiated the next set by fixating centrally and pressing the space bar. All categories of interest (i.e., all but cars) served once as target and once as distractor in separate blocks, always paired with cars. In total, participants completed 10 blocks of 6 sets (60 trials) in randomized order. We calibrated participants' gaze position before each block using a standard nine-point grid. This process allowed for breaks between blocks during which participants could leave the setup. The experiment lasted between 60 and 90 minutes in total.

Stimuli and pixel masks
We used the Object and Semantic Images and Eye-tracking set, which consists of 700 complex natural scenes (Xu et al., 2014). Additionally, we used the head pixel masks for this stimulus set published by Broda and de Haas (2022a) to determine participants' face salience. Of the 700 images, 454 contained at least 1 face. All scenes were presented at 34.3 × 25.7 dva.

Procedure
Participants freely viewed all 700 images for 3 seconds each. Images were presented in the same order for all participants. Participants completed 7 blocks, each containing 100 images. We calibrated participants' gaze position before each block using a standard nine-point grid. This process allowed for breaks between blocks during which participants could leave the setup. The start of each trial was self-paced and initiated by a central fixation and a button press. The complete experiment lasted between 60 and 90 minutes.

SCT
We included only saccades after the onset of the stimulus pair in each trial and analyzed the first saccade with an amplitude of greater than 1 dva. Trials were excluded if this saccade possessed an initial eccentricity of more than 2 dva, a latency of less than 50 ms, or a duration of more than 100 ms. Only correct responses were considered for SRTs. We excluded two participants with more than 50% invalid trials or incorrect responses. For the remaining participants, an average of 85% of trials were valid and, of those, an average of 73% were correct.
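The trial-level selection rules above can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB pipeline: the `Saccade` record, its field names, and the function `first_valid_saccade` are hypothetical, and the sketch assumes that latencies are expressed relative to stimulus onset and that the initial eccentricity is the distance of the saccade start point from central fixation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Saccade:
    onset_ms: float       # latency relative to stimulus onset (assumed convention)
    duration_ms: float
    amplitude_dva: float
    start_ecc_dva: float  # distance of the start point from central fixation


def first_valid_saccade(saccades: List[Saccade]) -> Optional[Saccade]:
    """Return the first post-onset saccade larger than 1 dva, or None if the
    trial has no such saccade or that saccade meets an exclusion criterion."""
    for s in sorted(saccades, key=lambda s: s.onset_ms):
        if s.onset_ms < 0 or s.amplitude_dva <= 1.0:
            continue  # before stimulus onset, or too small to be analyzed
        # Exclusion criteria are applied to the first qualifying saccade only
        if s.start_ecc_dva > 2.0 or s.onset_ms < 50.0 or s.duration_ms > 100.0:
            return None
        return s
    return None
```

A trial would then count as valid when this function returns a saccade, and as correct when that saccade is directed toward the target side.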

Performance
We computed the proportion of correct responses, separately for each participant and block. Performance advantages were always relative to the car category and calculated as the difference in performance when a given category of interest served as target compared with when it served as distractor.
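As a small worked illustration of this definition, the sketch below computes a performance advantage from two blocks of one participant. The function names are hypothetical, and each block is assumed to be a list of booleans marking whether the first valid saccade went to the target.

```python
def proportion_correct(trials):
    """trials: list of booleans (True = first saccade went to the target)."""
    return sum(trials) / len(trials)


def performance_advantage(target_block, distractor_block):
    """Performance when the category was the target minus performance when it
    was the distractor (i.e., when cars were the target)."""
    return proportion_correct(target_block) - proportion_correct(distractor_block)


# Hypothetical participant: 45/60 correct with faces as target,
# 30/60 correct with cars as target and faces as distractor.
advantage = performance_advantage([True] * 45 + [False] * 15,
                                  [True] * 30 + [False] * 30)
```

In this made-up example the advantage is 0.75 minus 0.50, that is, 0.25.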

SRT
We computed the mean SRT separately for the correct responses of each participant and block. SRT differences were calculated as the difference in SRT of correct responses when a given category of interest served as target compared with when it served as a distractor.
We defined outliers as those instances where individual performance or SRTs deviated by more than 3.5 standard deviations from the group's mean. These outliers were excluded from further analyses. We computed separate repeated-measures analyses of variance (ANOVAs) and subsequent post hoc t tests in MATLAB. The family-wise error rate was set to an α of 0.05. We applied the Holm-Bonferroni method to correct for multiple testing (p values are reported uncorrected but marked as significant only if they survived correction).
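The outlier rule and the Holm-Bonferroni step-down procedure can be sketched as below. This is an illustrative Python version, not the MATLAB code used in the study; the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption because the text does not specify it.

```python
import statistics


def exclude_outliers(values, n_sd=3.5):
    """Drop values deviating more than n_sd standard deviations from the group
    mean (sample SD is an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True if the uncorrected p value survives
    Holm-Bonferroni correction at family-wise error rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # step-down: all remaining (larger) p values fail as well
    return significant
```

This matches the reporting convention described above: the p values themselves stay uncorrected, and only the significance flag reflects the correction.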

Minimum SRTs
We aggregated the SRTs across participants, categorizing them based on response accuracy, separately for each target category. We binned them into 30 bins, each spanning 10 ms, ranging from 60 to 350 ms after stimulus onset. We conducted a binomial test in each bin to determine if the proportion of correct responses significantly exceeded chance level, p < 0.05. Minimum SRTs were identified as the earliest significant bin that was followed by at least four consecutive significant bins (cf. Broda et al., 2023; Kauffmann et al., 2021).
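A minimal sketch of this minimum SRT procedure is given below. Several details are assumptions not stated in the text: the values 60 to 350 ms are treated as the left edges of the 30 bins, the binomial test is one-sided against a chance level of 0.5, and empty bins are treated as not significant. The function names are hypothetical.

```python
import math


def binom_p_greater(k, n, p=0.5):
    """One-sided binomial test: probability of k or more successes in n trials."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def minimum_srt(correct_srts, error_srts, start=60, stop=350, width=10,
                alpha=0.05, n_following=4):
    """Return the left edge (ms) of the earliest significant bin that is followed
    by at least n_following consecutive significant bins, or None."""
    edges = list(range(start, stop + width, width))  # 60, 70, ..., 350 (30 bins)
    significant = []
    for left in edges:
        n_correct = sum(left <= t < left + width for t in correct_srts)
        n_error = sum(left <= t < left + width for t in error_srts)
        n = n_correct + n_error
        significant.append(n > 0 and binom_p_greater(n_correct, n) < alpha)
    for i in range(len(edges) - n_following):
        if all(significant[i:i + n_following + 1]):
            return edges[i]
    return None
```

The inputs would be the SRTs pooled across participants for one target category, split by whether the saccade was correct.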

Face SRT consistency
To test whether the SRT advantage for faces systematically varies across observers, we first applied a stricter exclusion criterion to only include participants with at least 50% valid and correct trials in the block in which whole faces served as the target as well as in the block in which faces served as the distractor category. This led to the exclusion of 17 participants and ensured we only included participants with a sufficient number of trials to allow robust estimates of the individual SRT advantage for whole faces, a crucial prerequisite for analyses pertaining to individual differences (de Haas, 2018). To test whether individuals consistently differed in their SRT differences toward faces, we computed split-half correlations across 1,000 random splits (test distribution). For the null distribution, we used the same splits, but shuffled the data across individuals. We compared the resulting median correlation from the test distribution against the null distribution. The p value was determined as the proportion of null distribution events above the median correlation. For explorative reasons, we additionally computed consistencies the same way, including a category-specific exclusion criterion (<50% valid/correct trials in the target or distractor block) for text, noses, mouths, and eyes.
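The split-half consistency test described above might be sketched as follows. This is one plausible reading, with several assumptions: the inputs are per-participant lists of correct-trial SRTs (at least two trials each) for the target and distractor blocks, each random split divides the trials of a block into two halves, the SRT difference is target minus distractor, and the null is built by shuffling one half across participants. All function names are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_means(values, rng):
    """Randomly split a participant's trials in two and return both half means."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return statistics.fmean(shuffled[:mid]), statistics.fmean(shuffled[mid:])


def split_half_consistency(target_srts, distractor_srts, n_splits=1000, seed=0):
    """Return (median split-half r, p) where p is the proportion of null
    correlations at or above the median test correlation."""
    rng = random.Random(seed)
    test_r, null_r = [], []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for tgt, dis in zip(target_srts, distractor_srts):
            t1, t2 = half_means(tgt, rng)
            d1, d2 = half_means(dis, rng)
            half_a.append(t1 - d1)  # SRT difference in the first half
            half_b.append(t2 - d2)  # SRT difference in the second half
        test_r.append(pearson_r(half_a, half_b))
        shuffled_b = half_b[:]
        rng.shuffle(shuffled_b)  # same split, pairing across individuals broken
        null_r.append(pearson_r(half_a, shuffled_b))
    median_r = statistics.median(test_r)
    p_value = sum(r >= median_r for r in null_r) / n_splits
    return median_r, p_value
```

With 1,000 splits the smallest nonzero p value this sketch can return is 0.001, so a result of zero would be reported as p < 0.001.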

Free viewing experiment
We only considered participants that were included in the face SRT consistency analysis (as discussed elsewhere in this article). Face fixations were labeled as such if they landed on a head feature in an image or were within a distance of 0.5 dva from the respective feature mask. All fixations earlier than 100 ms after image onset and shorter than 100 ms in duration were excluded from further analyses (cf. Broda & de Haas, 2022a; de Haas et al., 2019).
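The labeling rule can be illustrated with the sketch below. It is not the authors' implementation: the mask is assumed to be a 2D grid of 0/1 values in pixel coordinates, the pixels-per-degree factor `px_per_dva` is a hypothetical parameter that depends on the display setup, and the exclusion rule is read here as dropping a fixation if either criterion applies, which is an interpretation of the wording above.

```python
import math


def is_face_fixation(x, y, mask, px_per_dva, tol_dva=0.5):
    """True if the fixation at pixel (x, y) lies on a head pixel of the mask or
    within tol_dva of one. mask is a list of rows with 1 marking head pixels."""
    tol_px = tol_dva * px_per_dva
    reach = math.ceil(tol_px)
    cx, cy = round(x), round(y)
    # Only pixels inside the tolerance window around the fixation can qualify
    for row in range(max(0, cy - reach), min(len(mask), cy + reach + 1)):
        for col in range(max(0, cx - reach), min(len(mask[0]), cx + reach + 1)):
            if mask[row][col] and math.hypot(col - x, row - y) <= tol_px:
                return True
    return False


def keep_fixation(onset_ms, duration_ms, min_onset_ms=100, min_duration_ms=100):
    """Assumed reading: keep a fixation only if it starts at least 100 ms after
    image onset and lasts at least 100 ms."""
    return onset_ms >= min_onset_ms and duration_ms >= min_duration_ms
```

The proportion of first face fixations for one participant would then be the share of images whose first retained fixation is labeled as a face fixation.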

Proportion of first face fixations
We defined the proportion of first face fixations as the number of cases in which the first fixation after image onset landed on a face, divided by the number of all first fixations, separately for each individual. We computed split-half correlations across 1,000 random splits (test distribution). For the null distribution, we used the same splits, but shuffled the data across individuals. We compared the resulting median correlation from the test distribution against the null distribution. The p value was determined as the proportion of null distribution events above the median correlation. Finally, we correlated face SRT differences with individuals' proportion of first face fixations. All correlations used Pearson's r.

Minimum SRTs
We additionally explored minimum SRTs, determining the lowest SRTs from which the proportion of correct trials became significantly higher than expected by chance. The minimum reaction times for faces, text, noses, mouths, and eyes (vs. cars when the respective category served as distractor) were 120 ms (vs. 140 ms), 140 ms (vs. 130 ms), 150 ms (vs. 170 ms), 140 ms (vs. 190 ms), and 140 ms (vs. 170 ms), respectively (Figure 3).

Consistencies in SRT differences
To test whether participants consistently differed in their face SRT differences, we calculated the median consistency across 1,000 random split-half correlations. Individuals significantly differed in their face SRT differences, r = 0.49, p < 0.001 (Figure 4A). Exploratory analyses found similar consistencies for text, noses, mouths, and eyes (Supplementary Figure S4).
Next, we tested whether these individual differences in face SRT differences were correlated with the tendency to fixate faces first during free viewing of complex scenes. Individual differences in the proportion of first fixations landing on faces were highly consistent, r = 0.95, p < 0.001, replicating previous findings (Broda & de Haas, 2022a; Broda & de Haas, 2022b; de Haas et al., 2019; Guy et al., 2019; Linka et al., 2022; Linka & de Haas, 2020). Crucially, there was a weak but significant correlation between the individual SRT difference for faces and the proportion of first fixations landing on faces during free viewing, r = −0.29, p = 0.026 (Figure 4B). Participants who showed a higher SRT advantage for faces compared with cars tended to direct more first fixations toward faces during the free viewing task.

Discussion
Faces elicit faster and more accurate saccades than other types of objects (Crouzet et al., 2010), which may hinge on the eye region (Broda et al., 2023; Kauffmann et al., 2021). However, it remains unclear whether rapid saccades are face specific and whether individual differences in the tendency to immediately fixate faces during free viewing are related to reaction time advantages for faces in the SCT. To tackle these open questions, we contrasted stimuli of whole faces, isolated face regions (eyes, noses, and mouths), and text with images of cars using the SCT. Furthermore, we tested whether face SRT differences varied consistently between individuals and, if so, whether these differences were correlated with general differences in face detection during free viewing.
We found high absolute performances for faces and text as targets (both >83% correct), which were significantly higher than absolute performances for isolated face regions (all <74% correct). However, this effect diminished for performance advantages (i.e., the difference in performance when a category of interest served as a target vs. as a distractor, respectively), because all categories of interest apart from noses showed a significantly stronger performance compared with when they served as distractors and cars served as the target category. These performance advantages over cars may be explained by differences in salience. Faces (e.g., de Haas et al., 2019), face features (e.g., Broda & de Haas, 2022a), and text elements (e.g., Xu et al., 2014) have all been shown to attract human gaze when present in a scene. Note, however, that our design followed previous studies in using a singular comparison category (cf. Crouzet et al., 2010; Kauffmann et al., 2019, 2021; Little et al., 2021), leading to far more trials with cars than other types of stimuli. Theoretically, this may cause the observed intercept advantage for non-car stimuli. Crucially, we do not consider this a concern for our main analyses, namely, testing differences across non-car stimulus categories and across individual participants. Future studies may use more diverse control stimuli and balanced designs (as discussed elsewhere in this article).
Faces and text elicited saccades with similarly low latency. However, although we found an SRT advantage for text over cars, this advantage was significantly lower compared with the advantage for faces over cars, which suggests that faces as distractors are more potent in slowing car-directed saccades than text. This finding is in line with minimum SRTs that showed that participants reliably performed correct saccades earlier for faces than for text. The reaction time advantage for text over cars may point to a general role of semantic salience (i.e., beyond faces) in the rapid detection of objects. Nevertheless, this effect is significantly smaller than that for faces, in line with their unique status for gaze behavior and potential domain-specific mechanisms. The high salience of text elements in scenes (Cerf et al., 2009; de Haas et al., 2019; Xu et al., 2014) rests on the ability to read (Linka et al., 2023), which is in contrast with face salience, which is already present at birth (Fantz, 1963) and possibly even earlier (Reid et al., 2017). This finding suggests that the advantages we found for text over cars may be the result of learned salience and tied to literacy. Alternatively, the text advantage may be mediated by low-level properties (such as contrast levels or spatial frequency spectra) and independent of literacy and acquired semantic salience. Future experiments may juxtapose these hypotheses with the use of well-matched control stimuli and (if possible) by testing pre-literate children in the saccadic choice paradigm.
Noses elicited significantly longer reaction times than all other categories of interest. Similarly, noses were the only category to show no performance advantage and possessed the longest minimum SRTs. However, we found faster reaction times when compared with cars. When freely viewing faces, many observers target the nose region (Broda & de Haas, 2024; Peterson & Eckstein, 2013). The nose has been shown to convey information used to judge a human's age, personality traits, and emotional states (Broda & de Haas, 2023; Diego-Mas, Fuentes-Hurtado, Naranjo, & Alcañiz, 2020). However, information from eyes and mouths is more useful when judging these expressions (Broda & de Haas, 2023). Our findings are in line with these studies, showing that noses are a somewhat salient facial feature, but less so than eyes and mouths.
Whole faces were detected with greater accuracy than all face parts, showed a greater performance advantage over cars than noses, showed a higher reaction time advantage over cars than mouths and text, and had the lowest minimum SRT of all stimuli. The previously reported advantages for rapidly detecting the upper face region over other face parts (and even whole faces) (Broda et al., 2023; Kauffmann et al., 2021) did not replicate for the isolated eye region. The previously used upper face stimuli contained the forehead and nose, but we speculated that their advantage was mediated by the eye region alone (Broda et al., 2023). Our current results are not in line with this hypothesis. Instead, the advantage for our previous stimuli may have been mediated by (part of) the typical low-spatial frequency contrast pattern in faces (Dakin & Watt, 2009). The eye region is typically darker compared with the forehead and nose. Previous research has shown that greater contrast results in decreased SRTs (Ludwig, Gilchrist, & McSorley, 2004). Rapid saccades might rely on this contrast between the eye region and its context, which was missing in the current stimuli. Future research could investigate this phenomenon by varying the contrast between isolated eye regions and the background or between the eye region and its surrounding facial regions.
Consistent with previous studies, we placed all stimuli on the horizontal meridian. Previous results show that rapid saccades toward faces are unaffected by their position in the visual field (Kauffmann et al., 2021; Little et al., 2021). Nonetheless, when freely viewing faces, the eyes are typically located in the upper and the mouth in the lower visual field. Recognition performance decreases when isolated face features are presented at atypical compared with typical visual field locations (de Haas et al., 2016; de Haas et al., 2021; de Haas & Schwarzkopf, 2018). The upper face stimuli used by Broda et al. (2023) included face parts below the eyes, such as the nose or cheeks, so eyes still appeared in participants' upper visual fields before they executed a saccade, which is in contrast with the current study. Here, the eyes were aligned horizontally with participants' initial gaze position. It is possible that manipulating the visual field position of the isolated eye region would affect performance and SRTs, and that upper visual field positions could result in a stronger advantage for the isolated eye region.
We recently found that fixations preceding saccades toward faces in complex natural scenes have a shorter duration than fixations preceding saccades toward inanimate objects (Borovska & de Haas, 2023). This finding suggests that the mechanisms mediating lower SRTs in the saccadic choice paradigm may be at play during free viewing and potentially contribute to face salience in complex scenes. Here, we aimed to test a potential link between face salience in complex scenes and the reaction time advantage for faces in the saccadic choice paradigm using individual differences. We find interindividual differences for the SRT advantage for faces over cars with moderate internal consistency. This result resonates with previous findings regarding individual differences in different aspects of gaze, such as saccade dynamics (Andrews & Coppola, 1999; Bargary et al., 2017; Hancock, Gareze, Findlay, & Andrews, 2012; Rigas, Komogortsev, & Shadmehr, 2016) or semantic salience (Broda & de Haas, 2022a; Broda & de Haas, 2022b; Broda & de Haas, 2024; Guy et al., 2019; Linka et al., 2022; Linka & de Haas, 2020; Peterson et al., 2016; Peterson & Eckstein, 2013; Rubo & Gamer, 2018). We also replicated highly consistent differences in face salience for first fixations during free viewing (cf. Broda & de Haas, 2022a; de Haas et al., 2019). Crucially, we found a weak but significant correlation between interindividual differences in face salience and SRT differences, which could point toward shared mechanisms. This finding is remarkable, given the only moderate consistency of individual estimates of saccadic reaction advantages for faces over cars. Nevertheless, this finding should be considered with caution. We had to exclude more than 20% of our participants owing to a lack of valid data in blocks where faces served as targets or distractors (<30 valid trials). Future studies need to confirm this finding using fewer categories and more trials with whole faces as targets or distractors, which should lead to more precise and robust estimates of individual SRT advantages for faces. Still, this work provides the first evidence that at least some of the variance in SRT differences is shared with face salience during natural free viewing.
In general, the biological mechanisms behind rapid saccades toward faces are still unknown. We replicate reliable detection latencies as low as 120 ms. Macaque studies found reliable face-preferring neural responses in the posterior lateral face patch (Issa & DiCarlo, 2012) and the superior colliculus (Yu, Katz, Quaia, & Messinger, 2023), both with a latency of less than 100 ms. Whether cortical or subcortical structures are involved in triggering rapid saccades toward peripheral faces in humans (Crouzet et al., 2010; Nguyen et al., 2014) remains unclear.
The SCT often uses images of vehicles or buildings as contrast categories. However, we found that both faces (and face parts, except noses) and text elicited superior performance and faster reaction times compared with cars. Previous research has suggested the existence of potential push-pull mechanisms between certain semantic categories. Specifically, face and text fixations (de Haas et al., 2019) and face and body fixations (Broda & de Haas, 2022a; Broda & de Haas, 2022b) are anti-correlated. People who are more prone to fixating faces in natural scenes look less often at text elements or other body parts compared with people who tend to fixate faces less. Here, we have shown consistent individual differences in SRTs toward faces. It would be interesting to see whether similar effects exist in the SCT when contrasting categories that are anti-correlated in terms of individual semantic preferences during free viewing. Generally, it seems advisable to test a broader range of control stimuli in future experiments to avoid potential specific effects for cars or buildings as control categories.
To conclude, we replicated rapid face saccades using the SCT. The advantage of faces over cars generalized to some extent to isolated face regions and text. However, these advantages were lower when compared with whole faces. We further provide first evidence for consistent individual differences in the SRT advantage for faces, which were weakly but significantly correlated with differences in individual face salience during free viewing. Future research is necessary to test the robustness of this association and potential shared mechanisms between face salience during free viewing and rapid face detection in the saccadic choice paradigm.

Figure 1. Performance (A) and performance advantage (B) for each condition. Each dot shows one observer's mean performance (or performance advantage). Black horizontal lines indicate group mean values. Performance corresponds with the proportion of first saccades going to the target in the block in which the respective category served as the target. Performance advantage refers to the difference in the proportion of correct saccadic choices in the block in which the respective category served as target versus blocks in which it served as distractor (and cars served as targets).

Figure 2. Saccadic reaction time (SRT) (A) and SRT advantage (B) for each condition. Each dot shows one observer's mean SRT (or SRT advantage). Black horizontal lines indicate group mean values. SRTs correspond with the latency of the first saccade going to the target in correct trials of the block in which the respective category served as the target. SRT differences refer to the difference of SRTs of correct trials in the block in which the respective category served as target versus blocks in which it served as distractor (and cars served as targets).

Figure 3. Distribution of saccadic reaction times for correct (blue) and incorrect (dashed red lines) saccades for each condition. The 10-ms bin from which correct responses significantly exceeded incorrect ones is highlighted with a gray bar for each condition. The top row shows conditions in which the respective category of interest was target, and the bottom row the corresponding conditions in which it served as distractor (and cars were targets).