Abstract
The neural correlates of true, real-world vision remain almost entirely unknown. This gap is particularly problematic for face perception, where the passive viewing of static, unfamiliar, and isolated faces, briefly presented between blank screens, bears little resemblance to the richness of real-world interpersonal interactions. The real world features context, familiar faces appearing in relatively stable positions, and active sampling of information via eye movements. To address these gaps in knowledge, we simultaneously recorded electrocorticography (ECoG), eye tracking, and video of the scenes viewed by human subjects over hours of natural conversations with friends, family, and experimenters. The videos were annotated frame by frame using computer vision models to determine when subjects were fixating faces. Fixated faces were manually labeled for identity and emotional expression, and the face image at each fixation was extracted, enabling face reconstruction from neural activity and, inversely, reconstruction of neural activity from facial information. Neural selectivity for real-world facial identity and expression was distributed across occipital, temporal, parietal, and cingulate cortices, and showed both overlap with and critical spatiotemporal differences from responses to faces viewed in traditional paradigms. Fixation-locked neural activity supported accurate reconstruction of the faces of different individuals, as well as the face of a single individual displaying different emotional expressions. Conversely, reconstruction of fixation-locked neural activity from face stimuli was accurate for specific frequency bands and temporal periods of the neural response, suggesting a relationship between facial features and oscillatory mechanisms during real-world interactions.
These findings reveal how neurodynamics capture fine-grained details of facial information during real-world visual perception and demonstrate that combining invasive neural recordings with real-world behavior can yield a neurocomputational understanding of natural face perception.