Abstract
Visual search is a ubiquitous human behavior and a canonical example of selectively sampling sensory information to attain a goal. Previous research has studied optimality in visual search with artificial laboratory tasks (Najemnik and Geisler, 2005; Yang et al., 2016). To understand how people search in naturalistic environments, we conducted a study of visual search in virtual reality. Participants (N=21) viewed scenes generated with the Unity game engine through a head-mounted display equipped with an eye-tracker. On each of 300 trials, participants were shown a target object and teleported into a cluttered virtual room, where they searched for the item from a fixed viewpoint. They had 8 seconds to identify the target object among 60–100 distractors. Participants found the target on 76% of trials, with a median response time on successful trials of 2.89 s (IQR: 1.99–4.44 s). To understand which features drive people’s search, we annotated gaze samples with semantic scene information such as the identity, shape, color, and texture of the object at the center of gaze. Concretely, we used each object asset (3D mesh and texture) to compute low-dimensional shape and color representations of that object. We found that people’s gaze is primarily directed to task-relevant objects (i.e., targets or distractors), and that the distractors people look at are close in representational space to the target. Furthermore, this distance decreased over time, suggesting that representational similarity guides eye movements. We discuss these results in the context of a meta-level Markov Decision Process model (Callaway et al., 2018), which frames visual search as optimal information sampling under computational constraints.
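As an illustrative sketch of the representational-distance analysis described above (not the study's actual pipeline), one could represent each object by a concatenated low-dimensional shape and color embedding and measure the Euclidean distance from each fixated distractor to the target, ordered by fixation index within a trial. All names and the example data below are hypothetical.

# Hypothetical sketch: distance in shape+color representational space
# between fixated distractors and the search target.
import numpy as np

def representational_distance(fixated_features: np.ndarray,
                              target_features: np.ndarray) -> np.ndarray:
    """Euclidean distance from each fixated object's feature vector
    (rows of `fixated_features`) to the target's feature vector."""
    return np.linalg.norm(fixated_features - target_features, axis=1)

# Hypothetical example: 5 fixated distractors in a 6-D (shape + color) space,
# with later fixations drawn closer to the target to mimic the reported trend.
rng = np.random.default_rng(0)
target = rng.normal(size=6)
fixations = target + rng.normal(scale=np.linspace(2.0, 0.5, 5)[:, None],
                                size=(5, 6))

distances = representational_distance(fixations, target)
print(distances)  # expected to tend to decrease with fixation index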