Abstract
Are the visual representations that guide our online interactions with the world sparse and impoverished, or are they richly detailed, including the 3D shape of objects and their spatial and physical relationships to one another? We address this question using a naturalistic virtual reality environment in which participants (N=10) are asked to find an occluded target object (orange) on a tabletop as quickly as possible, either by pressing a button to indicate which occluder should be moved first or by reaching directly for that occluder. The occluders differ in width, orientation, 3D shape, and the presence of holes (which allow participants to see through parts of the occluder). As instructed, participants initiate their actions quickly, within 500 ms of stimulus onset. We find that decisions about which of two occluders to move first are guided by fairly accurate estimates of the area hidden behind each occluder, and that these estimates take into account 1) the 3D structure of the scene (not just the 2D pixel area of the occluders) and 2) the relative size of the hidden object. We also find that decisions are similarly fast and accurate whether participants explicitly report their choice or move the objects directly. Overall, these results suggest that an accurate 3D representation of both the visible and the occluded parts of a scene is rapidly available to guide rational search in naturalistic environments. Future work using this framework will investigate whether the information rapidly available during naturalistic viewing includes not only geometric but also physical properties of the scene.