An early hypothesis proposed that highly detailed point-by-point representations, produced during individual fixations, could be superimposed or fused to form an image-like representation containing information from multiple fixations (e.g., Breitmeyer, Kropfl, & Julesz, 1982; Jonides, Irwin, & Yantis, 1982; Wolf, Hauske, & Lupp, 1978, 1980). Despite the intuitive appeal of “spatiotopic fusion” (Irwin, 1992a) or the “integrative visual buffer” account of transsaccadic integration (McConkie & Rayner, 1976), substantial empirical evidence has shown it to be incorrect (e.g., Bridgeman & Mayer, 1983; Irwin, Brown, & Sun, 1988; Irwin, Yantis, & Jonides, 1983; Irwin, Zacks, & Brown, 1990; O'Regan & Lévy-Schoen, 1983; Rayner & Pollatsek, 1983). According to an alternative account, visual detail is lost across a saccade; instead, integration operates at a relatively abstract level, with visual form represented in terms of structural or relational aspects of the stimulus and its components (e.g., Carlson-Radvansky & Irwin, 1995; McConkie & Zola, 1979; Pollatsek, Rayner, & Collins, 1984; Rayner, McConkie, & Zola, 1980). Consistent with this view, maintenance of spatial position is relatively poor (e.g., Bridgeman, Hendry, & Stark, 1975; Bridgeman & Stark, 1979; Li & Matin, 1990a, 1990b; Mack, 1970; Pollatsek, Rayner, & Henderson, 1990; Stark, Kong, Schwartz, Hendry, & Bridgeman, 1976; Verfaillie, 1997; Verfaillie & De Graef, 2000; Verfaillie, De Troy, & Van Rensbergen, 1994; Wallach & Lewis, 1966), whereas relational information is integrated quite accurately (Carlson-Radvansky, 1999; Germeys, De Graef, Panis, Van Eccelpoel, & Verfaillie, 2004; Verfaillie, 1997).