Abstract
Humans effortlessly know how and where to move in their immediate environment, drawing on a wide range of navigational actions from walking and driving to climbing. Yet little is known about where and how such action affordances are computed in the brain. Some work implicates scene-selective cortex in navigational affordance representation, reflecting visual features computed in mid-level DNN layers (Bonner et al., 2017, 2018), while other work reports a lack of affordance representation in these regions (Groen et al., 2018). Here, we curated a novel set of real-world scenes affording distinct navigational actions in both indoor and outdoor environments, and collected rich behavioral annotations (N=152) for seven commonly used visual properties. The annotations indicate that navigational actions form a representational space distinct from that of objects or materials; even in combination, the other visual properties explain only around 20% of the variance in navigational action annotations. We then collected human fMRI responses (N=20) to a subset of 90 images while subjects performed three distinct tasks (action affordance recognition, object recognition, and fixation). Using representational similarity analysis, we confirm that scene-selective brain regions, especially the Parahippocampal Place Area and the Occipital Place Area, represent navigational action affordances. Furthermore, correlations with behavior in scene-selective regions are elevated during the action affordance and object recognition tasks relative to fixation, suggesting that these representations are task-dependent. Contrary to prior findings, however, DNNs trained for scene and object classification represent these action affordances poorly. Interestingly, language-supervised models such as Contrastive Language-Image Pre-training (CLIP) show enhanced predictions of both behavior and brain activity, suggesting that they better capture affordance representations. These findings strengthen the evidence for action affordance representation in scene-selective cortex and reveal its task dependency. While the underlying computations remain elusive, our work suggests that integrating semantic information into computational models of affordance perception is a promising direction.