Abstract
In natural settings, we encounter complex 3-D objects as dynamic sequences resulting from either ego motion or object motion. How does the visual system represent object appearance following these dynamic experiences? Specifically, does it extract privileged or canonical views of an object from a dynamic sequence? This fundamental question has implications not only for theories of object representation but also for the broader issue of how our continuous experience of the visual world may be encoded in memory. We presented naïve observers with short image sequences of novel objects that either depicted the object rigidly rotating through space or were scrambled so that they did not depict coherent motion. Following this brief exposure period, we measured response time (RT) and accuracy for old/new judgments of frames from the training sequence and of novel distracter images. We report two main results. First, performance following the observation of coherent object motion is far superior to performance following scrambled presentation. This advantage holds even when the same set of images is used following both coherent and scrambled sequences, indicating that target/distracter dissimilarity is not driving recognition judgments. Second, we find evidence that, even after very little experience, certain object views from coherent sequences are correctly identified faster than others. These views might serve as “keyframes” for efficiently encoding the full spatiotemporal input. We compare our RT data to explicit ratings obtained for each view and discuss various models by which keyframe position might be predicted from spatial and temporal factors. We conclude that locally canonical views are indeed determined from coherent dynamic experience and that explicit judgments of canonicity do not necessarily predict the results obtained from implicit measures.
National Defense Science and Engineering Graduate Fellowship