Abstract
The limited bandwidth of attention places tight constraints on visual processing of natural, complex scenes. Although a substantial body of research on change blindness suggests that visual representations of scenes are very sparse, Henderson & Hollingworth (2004) argue that elaborate representations of scenes are built up in long-term memory. If so, such representations might serve as a basis for attracting attention to changed regions of scenes. We examined this hypothesis in an immersive virtual environment in which 8 subjects were given time to become familiar with the environment while walking the same path 6 times. The distribution of gaze was examined during familiarization trials and also following changes in the environment, including the appearance, disappearance, movement, or replacement of an object. For subjects who had 3 familiarization trials in the environment before the changes occurred, the average fixation duration was 534 ms on control objects and 1083 ms on changed objects. Without familiarization, however, fixations on changed objects were no longer than those on control objects (645 ms and 729 ms on changed and control objects, respectively). Fixations were longest on new or moved objects and shortest on replaced or removed objects. These results are consistent with those of Brockmole et al. (2004), obtained with 2D images, and support the hypothesis that we learn the structure of natural scenes over time and that attention is attracted by deviations from the normal state. Thus, bottom-up attention is defined with respect to learnt representations of scenes.