Abstract
Visual perception is influenced by the statistical properties of our environment. Notably, cortical regions with visual selectivity for faces and body parts exhibit a topographic relationship (Weiner & Grill-Spector, 2011) that mirrors the relative locations of heads and bodies. Here, we explore the hypothesis that detection sensitivity for faces and hands is biased according to their typical spatial locations in the visual field. We used the Google MediaPipe face and hand detectors to analyze over 200 hours of the Visual Experience Dataset, comprising more than 23.1 million frames of egocentric video. Our analysis revealed distinct spatial distributions: the median center of faces was located 13 degrees above central fixation, whereas hands typically appeared 23 degrees below it (Wilcoxon rank sum, p < 0.0001). We reasoned that these position statistics should translate into enhanced face detection in the upper visual field and enhanced hand detection in the lower field. To test this, we embedded faces and hands in 1/f noise at 50% transparency, with chroma values drawn from a range of skin tones, and presented targets in either the upper or lower visual field. In a challenging detection task, 22 participants judged whether a body part (face or hand) or 1/f noise alone was present during an 80 ms exposure followed by a dynamic pattern mask. Face detection results supported our hypothesis, with higher accuracy in the upper visual field (81%) than in the lower (75%, p < 0.05). Hand detection, however, did not differ significantly between the two fields (70% vs. 72%). These outcomes will be discussed in relation to the spatial selectivity of face (FFA and OFA) and body (EBA) areas and the potential for complex relationships among visual field biases, eye movement patterns, and peripheral visual recognition.
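The stimulus construction described above (targets alpha-blended into 1/f noise) can be sketched as follows. This is a minimal illustration, not the authors' actual stimulus code: the image size, random seed, normalization, and `blend_target` helper are assumptions for demonstration, and the skin-tone chroma manipulation is omitted.

```python
import numpy as np

def pink_noise_image(size=256, seed=0):
    """Generate a 1/f ("pink") noise image by shaping the amplitude
    spectrum of white noise so that amplitude falls off as 1/f."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((size, size))
    spectrum = np.fft.fft2(white)
    # Radial spatial frequency for every FFT coefficient
    fx = np.fft.fftfreq(size)
    fy = np.fft.fftfreq(size)
    f = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    f[0, 0] = 1.0  # avoid division by zero at the DC component
    noise = np.real(np.fft.ifft2(spectrum / f))
    # Normalize to [0, 1] for display
    noise -= noise.min()
    noise /= noise.max()
    return noise

def blend_target(noise, target, alpha=0.5):
    """Embed a target image in the noise at 50% transparency
    (a simple alpha blend; images assumed to share shape and range)."""
    return alpha * target + (1.0 - alpha) * noise
```

A face or hand image normalized to [0, 1] could then be passed to `blend_target` and the result placed in the upper or lower visual field on each trial.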