Prior studies using static scene stimuli have shown that memory for the four object properties listed above survives the saccade (Henderson & Hollingworth,
2003; Hollingworth & Henderson,
2002; Irwin & Zelinsky,
2002; Melcher,
2006; Melcher & Kowler,
2001; Tatler et al.,
2003,
2005). In natural behavior, detailed object information is also represented during action, although only the task-specific information required at the present moment is extracted (Ballard et al.,
1992; Ballard, Hayhoe, & Pelz,
1995; Droll, Hayhoe, Triesch, & Sullivan,
2005; Triesch, Ballard, Hayhoe, & Sullivan,
2003). Thus, the nature of action goals seems to be an important factor in determining what information is represented during natural behavior. How is object information encoded and integrated across viewpoint changes in moving images, and are all types of object property represented equally well? Movie sequences appear to be processed without effort, yet the fact that people are rather poor at detecting changes that occur during an editorial cut (e.g. Levin & Simons,
1997) suggests that extracting and integrating object information across viewpoint changes may be difficult to achieve in moving images. In particular, representing position information in movies may be more difficult than processing other types of information, such as color or identity. In static scene viewing, object position is coded in relation to the locations of other objects within a larger spatial representation, which provides a contextual frame of reference (Hollingworth,
2007). In dynamic scenes, spatial information may be encoded in much the same way, with object positions coded with respect to a larger external frame of reference (such as the screen on which the movie is shown) or with respect to other objects in the scene. However, in dynamic scenes, coding object position within a representation of the scene requires constant updating with respect to a changing external frame of reference. This need for constant updating contrasts with the coding of color or identity, because these object properties are independent of background or external factors. Moreover, a number of studies have suggested that position information may be represented in a qualitatively different manner from other object information. Position information may be extracted before other sources of information in order to construct an overall spatial layout of the scene (Aginsky & Tarr,
2000; Rensink,
2000; Tatler et al.,
2003). Similarly, the way position information accumulates over fixations has been found to be distinct from other object properties. In static scenes, memory for position appears to accumulate over successive fixations (Melcher,
2006; Tatler et al.,
2005), whereas the accumulation of color and identity information appears less consistent, with some studies observing no accumulation (Tatler et al.,
2005) and others showing an opposite pattern (Hollingworth & Henderson,
2002; Melcher,
2006). In moving images we may therefore expect to find differences in the sub-structure of visual representations for different object properties, in particular between properties that can be coded independently of external factors (e.g. color, identity) and properties, such as position, that require some form of external frame of reference.