Abstract
Visual representation of complex scenes requires the retention and integration of information from separate eye fixations on local objects. In the present study, a “follow the dot” method was used to investigate the memory systems supporting the accumulation of visual information from objects in scenes. Participants fixated a series of objects in 3-D rendered environments, following an abruptly appearing dot cue from object to object. During this sequence, a change detection test was introduced: A single target object was masked briefly, and the mask was removed to reveal either the original object or a different object from the same basic-level category. A verbal working memory load minimized verbal encoding. The delay between the examination of the target object and the test was manipulated: The target object was examined either 0, 1, 2, 4, or 10 objects before the test was initiated. In a final condition, the test was delayed until all scenes had been viewed (a delay of 376 objects, on average). Detection sensitivity was highest in the 0-back condition (testing the currently attended object), with A' = .91. Sensitivity declined to A' = .87 in the 1-back condition and A' = .84 in the 2-back condition but did not decline reliably further in the 4-back (A' = .81) and 10-back (A' = .83) conditions. Thus, the recency effect was limited to the two most recent objects, with no evidence of further forgetting when retention of 4 or 10 objects was required. Performance remained well above chance (A' = .75) on the test delayed until the end of the session.
These data allow the following conclusions: 1) Online scene representation is supported by VSTM retention of approximately 2 objects; 2) Online scene representation is also supported by visual LTM, and visual LTM capacity is not exhausted by the retention of many hundreds of objects; 3) Memory for an object attended immediately before a change is only slightly more reliable than memory for previously attended objects.
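The sensitivity values reported above are A' (A-prime), a nonparametric analogue of d' computed from hit and false-alarm rates in the change detection test. The abstract does not report the underlying hit and false-alarm rates, so the values below are illustrative only; the sketch implements the standard Pollack and Norman formula for A'.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity A' from hit and false-alarm rates.

    Standard formula (Pollack & Norman, 1964): A' ranges from .5
    (chance) to 1.0 (perfect discrimination). The rates here are
    illustrative; the study's actual rates are not in the abstract.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # no discrimination between changed and unchanged trials
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: apply the formula symmetrically
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

# Example (hypothetical rates): a hit rate of .90 with a false-alarm
# rate of .20 yields A' ≈ .91, comparable to the 0-back condition.
print(round(a_prime(0.90, 0.20), 2))
```

A' is often preferred over d' when per-condition trial counts are small, because it requires no assumption of equal-variance Gaussian signal and noise distributions.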