Visual processing of large-field scenes relies on an interplay between gaze shifts (gaze = eye-in-head + head-in-space) and the memorization of local visual stimuli in visual short-term memory (VSTM). VSTM is considered a subdivision of the visual part of working memory (Baddeley, 1978, 1992, 2003) and is characterized by a limited storage capacity (Phillips,
1974). The upper limit of this storage capacity was identified as three to four objects, independently of the number of feature dimensions (e.g., color, shape, or orientation) probed for each object (e.g., Irwin & Andrews,
1996; Luck & Vogel,
1997; Vogel, Woodman, & Luck,
2001). In addition, the total information capacity of VSTM is limited (Alvarez & Cavanagh,
2004; Xu,
2002). Alvarez and Cavanagh (
2004) suggested that storage is bounded by the total amount of information, so that the number of objects that can be stored depends on the complexity of those objects. Limits in the storage capacity of VSTM have been demonstrated impressively by research on change detection, which revealed the phenomenon of change blindness (Simons, 2000; Simons & Levin, 1997). In these experiments, two almost identical images are presented in alternation, separated in time by short blanks. Subjects are strikingly insensitive even to large changes between the two images. For the case of eye movements, Ballard, Hayhoe, and Pelz (
1995) and Hayhoe, Bensinger, and Ballard (
1998) investigated VSTM usage in a block-copying task. While copying a presented pattern of colored blocks from a model area to the workspace, participants made only minimal use of VSTM. The authors described this as a “just-in-time” processing strategy, in which observers acquire the specific information they need just at the point at which it is required in the task. This makes sense because the information is easily accessible to the sensors, that is, the eyes. This “just-in-time” processing strategy has also been observed in everyday activities (e.g., Hayhoe, Ballard, Triesch, & Shinoda,
2002; Land, Mennie, & Rusted,
1999). Ballard et al. and Hayhoe et al. showed that this strategy cannot be attributed to hard capacity limitations. When the distance between the model and the workspace area was increased to 70°, forcing subjects to perform larger eye and head movements, the frequency of the “memoryless” pattern decreased gradually and eye movements occurred about half as frequently (Ballard et al.,
1995). The trade-off between memory load and saccadic behavior thus seems to be controlled by the minimization of combined “costs.” On the memory side, such costs could be associated with the amount of stored information, whereas on the eye and head movement side, costs may arise from duration, energy consumption, or an increased need for corrective saccades at larger saccade amplitudes. A general discussion of costs in terms of time requirements has been given by Gray, Sims, Fu, and Schoelles (
2006). To investigate the trade-off between the usage of VSTM and the execution of eye movements, Inamdar and Pomplun (
2003) used a comparative visual search paradigm developed by Pomplun et al. (
2001). In their study, two otherwise identical columns of simple geometrical objects (each column containing 20 objects) had to be compared to detect the single difference (target). To vary the costs of eye movements, three distances between the columns were used (15°, 30°, and 45°). For the larger hemifield separations, fewer inter-hemifield saccades were found. Furthermore, the processing time, that is, the time spent within a hemifield before jumping to the other side, was prolonged for larger hemifield separations. To further increase the time needed for inter-hemifield saccades, the attended hemifield was masked when the saccade started and was unmasked only after a delay of 0, 500, or 1000 ms. Delayed unmasking enhanced the original effects, that is, it decreased the number of gaze shifts (i.e., inter-hemifield saccades) between the two hemifields and increased the duration of the processing intervals. Inamdar and Pomplun argued that VSTM usage can be flexibly adapted to optimize task performance as long as the creation of internal representations does not take longer than roughly one second. These findings, together with the experiments by Ballard et al., provide strong evidence for a trade-off between VSTM use and eye movements in visual tasks.

In natural, large-field environments, head movements interact with eye movements to carry out gaze shifts. Within the functional range of eye movements (up to ±22°; Stahl,
1999), head movements are unlikely, but their frequency increases as the gaze saccade amplitude grows (Freedman & Sparks,
2000; Hanes & McCollum,
2006; Land,
2004). Because head movements are more costly than eye movements in terms of both time and energy consumption, the trade-off between VSTM use and gaze shifts should shift toward increased memory use for tasks involving large gaze saccades. Indeed, Ballard et al. presented initial evidence that subjects rely more on memorization in a block-copying task when gaze changes require large head movements.
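
The cost-minimization account outlined above can be made concrete with a toy calculation. The following Python sketch is purely illustrative and is not taken from any of the cited studies: the function names (`total_cost`, `best_chunk`), the quadratic memory-load term, and all parameter values are assumptions chosen only to show how a rising gaze-shift cost could favor memorizing more items per visit, up to an assumed VSTM capacity of four objects.

```python
import math

VSTM_CAPACITY = 4  # assumed upper bound, following the three-to-four-object limit


def total_cost(chunk, n_items, shift_cost, encode_cost=0.3, load_penalty=0.15):
    """Hypothetical total cost (arbitrary units) of comparing n_items
    when `chunk` items are memorized per visit to one side."""
    visits = math.ceil(n_items / chunk)
    # Assumption: each visit needs one gaze shift to the other side and one back.
    gaze_cost = 2 * shift_cost
    # Assumption: memory cost grows faster than linearly with the load.
    memory_cost = encode_cost * chunk + load_penalty * chunk ** 2
    return visits * (gaze_cost + memory_cost)


def best_chunk(n_items, shift_cost):
    """Return the chunk size (1..VSTM_CAPACITY) with the lowest total cost."""
    return min(range(1, VSTM_CAPACITY + 1),
               key=lambda k: total_cost(k, n_items, shift_cost))


if __name__ == "__main__":
    # Cheap, intermediate, and expensive gaze shifts (arbitrary units).
    for shift_cost in (0.05, 0.3, 1.0):
        print(shift_cost, best_chunk(20, shift_cost))
```

With these made-up parameters and 20 items, the preferred chunk size rises from one item for cheap gaze shifts to two items and then four items as gaze shifts become more expensive. This mirrors, qualitatively only, the shift toward memory use described for larger separations and for gaze shifts that involve head movements.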