Abstract
In daily life, we move with ease through our immediate environment: we choose different paths using a variety of navigational actions, such as walking, swimming, or climbing. How do we represent navigational affordances, and which visual features drive these representations? Prior work suggests that cardinal orientations in the lower visual field are diagnostic of navigational paths for walking through indoor scenes, and that Convolutional Neural Networks (CNNs) trained for scene recognition also encode these features, thereby representing navigational affordances (Bonner & Epstein, 2017, 2018). Here, we investigated which visual features predict a broader range of navigational actions in both indoor and outdoor environments. In an online experiment, human participants (N = 152) annotated navigable paths and associated actions (e.g., walking, biking, driving) in images of real-world scenes, along with several readily identifiable scene properties (e.g., objects, materials, spatial layout, scene category). Converging results suggest that navigational affordances are derived from a combination of different scene properties that are only partially captured by current computational models. First, principal component analysis reveals three major dimensions in the navigational action space. While some actions have obvious diagnostic scene properties, such as water for swimming or boating, there is no simple mapping from a single scene property to each dimension. Second, representational similarity analysis shows that the measured scene properties account for only a small part of navigational affordance representations (R2 < .21), with CNNs trained for scene recognition capturing a similar amount of variance (R2 = .21). Finally, explainable-AI feature visualizations show that CNN representations rely on objects and sharp contrasts rather than navigational features.
Together, these results show that navigational affordances have an inherent structure that has not yet been exploited, and suggest that computational models specifically trained on navigational tasks may be necessary to uncover the underlying representations.