Abstract
Most past and present research in computer vision involves passively observed data. Humans, however, are active observers outside the lab; they explore, search, and select what to look at and how to look at it.
Here, we investigate active visual observation in a 3D world. To focus the study, we ask subjects to decide whether two 3D objects are the same or different, with no constraints on how they view those objects. Such unconstrained, active 3D observation appears under-studied: while many studies examine human performance on same-different judgments, they typically use line drawings presented in 2D, with no active observer involved. The ability to compare two objects is a core visual capability, one we use many times a day. It would also be essential for any robotic vision system intended to serve as a genuine assistant in home, manufacturing, or medical settings.
To explore the 3D 'same-different task', we designed a novel experimental environment and created a set of twelve 3D-printed objects of known complexity. The subject is allowed to move around freely in a 4m x 3m controlled environment, outfitted with an eye-gaze tracker and observed by head trackers. In this environment, two objects are presented at a time at fixed 3D locations but with varying 3D pose. We track precise 6D head motion and gaze, and record video of all actions, synchronized at microsecond resolution. Additionally, each subject is interviewed about how they approached the task.
Our results show that subjects employ at least six strategies for solving this task, not always independently. We found that the choice of strategy depends on three variables: object complexity, object orientation, and initial viewpoint. Furthermore, we show that performance improves over time as subjects refine their strategies throughout the study. Since no external feedback is given, an internal feedback mechanism must exist that drives this refinement.