Abstract
Human pose, defined as the spatial relationships between body parts, carries critical visual information about a person's underlying motion and action. A substantial body of previous work has identified cortical areas responsive to images of different body parts and their spatial relationships. However, these studies used a limited range of poses and relatively simple stimuli. Here we investigate high-resolution fMRI responses to a broad range of poses present in over 4,000 complex natural images of people from the Natural Scenes Dataset. To help analyze these data, we exploit detailed ground-truth annotations created by the computer vision community. Using these annotations, we built models that contrast view-dependent vs. view-independent and 2D vs. 3D parameterizations of body pose. We compared the similarity structure of patterns of cortical activity with the similarity structure computed from model activations, thereby identifying how cortical areas represent viewpoint and 2D/3D body pose. We found distributed patterns of cortical activity that captured the similarity structure of the natural pose space in lateral occipital-temporal cortex (LOTC), fusiform gyrus, and posterior parietal cortex, including the previously studied extrastriate body area (EBA) and fusiform body area (FBA). In particular, near the right superior temporal sulcus (STS) we found neural representations that exclusively encode intrinsic, view-independent 3D pose dissimilarity structure. Together, these results reveal a distributed cortical network that encodes both view-dependent and view-independent representations of pose.
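The comparison of cortical and model similarity structure described above can be illustrated with a minimal representational similarity sketch. Everything in it is an assumption made for illustration: the function names `rdm` and `rsa_score`, the choice of correlation and Euclidean distances, the Spearman rank correlation, and the synthetic data standing in for pose parameters and voxel responses. It is not the analysis pipeline used in this paper.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(features, metric="correlation"):
    """Condensed dissimilarity matrix: one distance per pair of images."""
    return pdist(np.asarray(features, dtype=float), metric=metric)


def rsa_score(brain_patterns, model_features, model_metric="euclidean"):
    """Rank-correlate the brain RDM with the model RDM (hypothetical helper)."""
    brain_rdm = rdm(brain_patterns, metric="correlation")
    model_rdm = rdm(model_features, metric=model_metric)
    rho, _ = spearmanr(brain_rdm, model_rdm)
    return float(rho)


# Synthetic stand-ins (assumed sizes, not the paper's data).
rng = np.random.default_rng(0)
n_images, n_joints, n_voxels = 40, 17, 200

# Hypothetical 3D pose model: x, y, z coordinates for each joint.
pose_3d = rng.normal(size=(n_images, n_joints * 3))

# A simulated region whose voxels are noisy linear mixtures of the pose
# parameters, and a control region that carries no pose information.
mixing = rng.normal(size=(n_joints * 3, n_voxels))
voxels_pose = pose_3d @ mixing + 0.1 * rng.normal(size=(n_images, n_voxels))
voxels_noise = rng.normal(size=(n_images, n_voxels))

score_pose = rsa_score(voxels_pose, pose_3d)
score_noise = rsa_score(voxels_noise, pose_3d)
print(f"pose-driven region: {score_pose:.2f}, control region: {score_noise:.2f}")
```

In this toy setting the simulated pose-driven region should score higher than the control region. Repeating the comparison with different pose parameterizations (for example 2D vs. 3D, or view-dependent vs. view-independent coordinates) is the kind of model contrast the abstract describes.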