Abstract
View-based and Cartesian representations provide rival accounts of visual navigation in humans. Here, we compare the ability of models based on each representation to describe human performance on a homing task at the scale of a room (i.e. 3–4 m square) in an immersive virtual reality environment. In interval one, participants were shown three very long coloured vertical poles from one viewing location, with some head movement permitted. The poles were easily distinguishable and designed to have a constant angular width irrespective of viewing distance. Participants were then transported (virtually) to another location in the scene and, in interval two, attempted to navigate back to the initial viewing point relative to the poles. The distributions of end-point errors on the ground plane differed significantly in shape and extent depending on the pole configuration and goal location. We compared the ability of two types of model to predict these variations in the error distributions: (i) view-based models, built on simple features such as the angles between poles as seen from the cyclopean point, ratios of these angles, or various disparity measures, and (ii) Cartesian models, based on a probabilistic 3D reconstruction of the scene geometry in an egocentric coordinate frame for each interval, coupled with a comparison of these reconstructions at the “goal” and “end” points. We estimated the parameters of each type of model using cross-validation and compared the models by the likelihood they assigned to the validation dataset. For our data, we find that view-based models capture important characteristics of the end-point distributions very well. By contrast, the most plausible 3D-based models have a lower likelihood than many of the possible view-based alternatives. Our evidence provides no support for the hypothesis that human navigation on this scale is based on a Cartesian model.
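The following is a minimal illustrative sketch, not the authors' code, of the kind of view-based feature comparison described above: the angles subtended between poles as seen from a location on the ground plane, and ratios of those angles, compared between a goal location and an end location. The pole positions, viewpoints, and mismatch measure are hypothetical examples chosen only to make the sketch runnable.

```python
import numpy as np

def pole_angles(viewpoint, poles):
    """Angles between successive poles as seen from `viewpoint` on the ground plane.

    `viewpoint` is an (x, y) position; `poles` is an (N, 2) array of pole
    positions. Because the poles are modelled as very long vertical lines,
    only their bearings from the viewpoint matter.
    """
    bearings = np.arctan2(poles[:, 1] - viewpoint[1], poles[:, 0] - viewpoint[0])
    diffs = np.diff(np.sort(bearings))                 # gaps between consecutive bearings
    return np.minimum(diffs, 2 * np.pi - diffs)        # wrap angles into [0, pi]

def view_feature_mismatch(goal, end, poles):
    """Sum of squared differences between simple view-based features at two locations."""
    a_goal, a_end = pole_angles(goal, poles), pole_angles(end, poles)
    r_goal, r_end = a_goal / a_goal.sum(), a_end / a_end.sum()   # angle ratios
    return np.sum((a_goal - a_end) ** 2) + np.sum((r_goal - r_end) ** 2)

# Hypothetical configuration: three poles, a goal location, and a trial end point.
poles = np.array([[0.0, 2.0], [1.5, 3.0], [-1.0, 3.5]])
goal = np.array([0.0, 0.0])
end = np.array([0.3, -0.2])
print(view_feature_mismatch(goal, end, poles))
```

A model of this family would predict that end points cluster where such a mismatch is small; the experiments compare the likelihood of such predictions against those of 3D-reconstruction-based models.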