Regions of interest (ROIs) were primarily defined with respect to social content. Based on recent findings that fixation densities on head and body regions tend to be anticorrelated (e.g., Broda & de Haas, 2022), we decided to differentiate between these two social elements. Accordingly, human heads and bodies were manually drawn for each picture using the software GIMP 2.8.22 (https://www.gimp.org). Bodies were defined as excluding heads and including torso, arms, hands, legs, and feet.
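For the analyses described below, the hand-drawn head and body annotations need to be available as binary masks. The text does not specify how the drawings were exported, so the following sketch assumes one plausible workflow in which each ROI was filled in white on a black background and saved as a grayscale image of the same size as the scene; the file names and helper function are illustrative only.

```python
import numpy as np
from PIL import Image

def load_roi_mask(mask_path):
    """Load a hand-drawn ROI drawing (e.g., exported from GIMP) as a boolean mask.

    Assumes the ROI was filled in white on a black background and exported
    as a grayscale image with the same dimensions as the scene.
    """
    mask = np.array(Image.open(mask_path).convert("L"))
    return mask > 127  # True inside the drawn head or body region

# Example: combine head and body drawings into one "social" mask for a scene.
head_mask = load_roi_mask("scene01_head.png")
body_mask = load_roi_mask("scene01_body.png")
social_mask = head_mask | body_mask
```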
Furthermore, to allow for a comparison between social ROIs and areas of high physical saliency (cf. End & Gamer, 2017), we calculated saliency maps using the Graph-Based Visual Saliency (GBVS) model of Harel, Koch, and Perona (2007), which generally performs well in predicting fixation locations (Judd, Durand, & Torralba, 2012). Saliency values of pixels outside the social ROIs were classified as reflecting high physical saliency if the respective value exceeded the eighth decile of the saliency distribution. As End and Gamer (2017) already noted, this criterion was arbitrary, but the cut-off fulfilled its purpose of effectively distinguishing high- from low-saliency regions (see Figure 2 for example images, corresponding physical saliency maps, and resulting ROIs; further characteristics of these ROIs are given in Table S1 of the supplement).
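As a minimal sketch of this thresholding step, the snippet below assumes that the GBVS saliency map has already been computed (e.g., with the original toolbox of Harel et al., 2007) and is available as a 2D array. The text leaves open whether the decile refers to the full-scene saliency distribution or only to pixels outside the social ROIs; the sketch uses the full map.

```python
import numpy as np

def high_saliency_mask(saliency_map, social_mask, decile=80):
    """Boolean mask of pixels outside the social ROIs that exceed the cut-off.

    saliency_map : 2D array of GBVS saliency values (precomputed elsewhere).
    social_mask  : 2D boolean array, True inside the head/body ROIs.
    decile       : percentile used as cut-off; 80 corresponds to the eighth decile.
    """
    # Eighth decile of the scene's saliency distribution (taken here over the
    # full map; restricting it to non-social pixels would be the alternative).
    cutoff = np.percentile(saliency_map, decile)
    return (~social_mask) & (saliency_map > cutoff)
```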
Finally, these ROIs were used to determine the trial-wise average fixation or relevance density, respectively, for head and body ROIs as well as for regions of high physical saliency. To this end, the sum of fixation or relevance density values per ROI was divided by the sum of density values for the whole scene, and this proportion was then normalized by the area of the respective ROI to account for the fact that larger areas typically receive more attention than smaller ones.
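The sketch below illustrates this area-normalized density score for a single trial and ROI; the density map and the ROI mask are assumed to be arrays of matching dimensions, and the ROI area is expressed in pixels.

```python
import numpy as np

def area_normalized_density(density_map, roi_mask):
    """Proportion of density falling within the ROI, normalized by ROI area.

    density_map : 2D array of fixation or relevance density values for one trial.
    roi_mask    : 2D boolean array marking the ROI (head, body, or high saliency).
    """
    proportion = density_map[roi_mask].sum() / density_map.sum()
    area = roi_mask.sum()  # ROI area in pixels
    return proportion / area
```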
Moreover, to compare the current results to the analyses of End and Gamer (2017), we also calculated, for each of the first five individual fixations/relevance taps, the relative proportion landing on these ROIs and normalized these scores by dividing them by the mean area of the respective ROI across all scenes that contributed data to the respective relative frequency score.
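A sketch of this second score for one ROI type is given below; it assumes a table of fixations/relevance taps with a scene identifier, the ordinal position of the event, and an indicator of whether it fell inside the ROI, together with the ROI area per scene (all column names are illustrative).

```python
import pandas as pd

def normalized_first_five(events: pd.DataFrame, roi_areas: pd.Series) -> pd.Series:
    """Relative proportion of the first five fixations/taps on an ROI,
    normalized by the mean ROI area across the contributing scenes.

    events    : one row per fixation/tap with columns 'scene',
                'ordinal' (1 = first fixation/tap), and 'in_roi' (bool).
    roi_areas : ROI area per scene, indexed by scene identifier.
    """
    scores = {}
    for n in range(1, 6):                        # first five fixations/taps
        nth = events[events["ordinal"] == n]
        proportion = nth["in_roi"].mean()        # relative frequency on the ROI
        # Mean ROI area over the scenes represented in this frequency score.
        mean_area = roi_areas.loc[nth["scene"].unique()].mean()
        scores[n] = proportion / mean_area
    return pd.Series(scores)
```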