Abstract
In everyday life, humans rely heavily on vision to identify objects and to guide the hands towards them to support goal-directed actions. Recent research has begun to question the ecological validity of using impoverished stimuli, such as two-dimensional (2-D) computerized images, as proxies for real-world solid objects to study perception and action. Real objects differ from 2-D pictures in many respects, including the availability of binocular depth cues and the fact that they are tangible, actionable solids. Here, we measured gaze patterns towards everyday kitchen and garage tools by human adults during an object categorization task and an object grasping task. The stimuli were presented to observers as real-world solids, 2-D computerized images, or three-dimensional (3-D) stereoscopic images. The 2-D and 3-D images were matched closely to their real-world counterparts for retinal size, viewpoint, and illumination, and event timing was computer-controlled on all trials. We used linear discriminant analysis (LDA) to determine whether eye movements to stimuli in each display format could be reliably discriminated based on the evolution of gaze position throughout the trials. Gaze patterns towards stimuli in the three display formats were highly discriminable from each other, both during the visuomotor grasping task and the visual categorization task, particularly in the first several hundred milliseconds of each trial. Gaze patterns towards 2-D and 3-D images of objects were more similar to each other than to those towards solid objects. Specifically, participants' early gaze tended to linger on the handles of real tools more than on those of their 2-D and 3-D image counterparts. These results illustrate that even very early behavioral responses towards objects depend critically on the format in which the stimuli are presented. Our findings underscore the importance of using ecologically valid stimuli to understand real-world vision and action.
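The LDA decoding approach described above can be sketched in a few lines. This is an illustrative mock-up only: the gaze traces below are synthetic, and the trial counts, sampling rate, feature construction (flattened x/y position time series per trial), and cross-validation scheme are assumptions, not the authors' actual pipeline.

```python
# Sketch: decoding display format (real / 2-D / 3-D) from the time course of
# gaze position with linear discriminant analysis. All data are simulated.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials_per_format = 60   # hypothetical number of trials per display format
n_samples = 50             # hypothetical gaze samples per trial

def simulate_trials(drift):
    """Simulate (x, y) gaze traces whose mean trajectory differs by format."""
    x = np.cumsum(rng.normal(drift, 1.0, (n_trials_per_format, n_samples)), axis=1)
    y = np.cumsum(rng.normal(0.0, 1.0, (n_trials_per_format, n_samples)), axis=1)
    # Flatten each trial into one feature vector: [x_1..x_T, y_1..y_T]
    return np.hstack([x, y])

# One block of trials per format, with a small format-specific gaze drift.
X = np.vstack([simulate_trials(d) for d in (0.0, 0.15, 0.3)])
labels = np.repeat(["real", "2D", "3D"], n_trials_per_format)

# Cross-validated decoding accuracy; chance level is 1/3 for three formats.
lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, labels, cv=5)
print(round(scores.mean(), 2))
```

Above-chance cross-validated accuracy in this scheme indicates that gaze trajectories carry format-specific information, which is the logic behind the discriminability result reported in the abstract.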