The temporal dynamics of visual processing have led to the idea that visual perception may be described as a “discrete” process (
Freeman, 2006;
Herzog, Drissi-Daoudi, & Doerig, 2020;
VanRullen, 2016;
VanRullen & Koch, 2003). Evidence for this idea comes, for example, from the wagon wheel illusion – a wheel in movies or on TV seemingly rotating backwards – which is thought to occur when the “sampling rate” of visual processing and the temporal frequency of the stimulus are misaligned (
VanRullen, Reddy, & Koch, 2005). However, studies have shown that visual temporal integration windows vary in size, ranging from a few tens of milliseconds, which is much briefer than a fixation, to several seconds for event perception, encompassing multiple unique fixations (for a review, see
Pöppel, 1997;
Pöppel, 2009). Notably, the saccade rate is slower than the 7 to 10 Hz rates most often proposed for discrete perception. Moreover, studies of trans-saccadic transformational apparent motion, for example, show that a single perceived event can begin before, and continue after, a saccade (
Fracasso, Caramazza, & Melcher, 2010). Specifically, when an object changes shape and position, with the first frame shown prior to saccade onset and the second frame (with the changed shape and position) shown after the saccade, a smooth and continuous motion event is perceived in external space rather than a series of discrete snapshots (
Fracasso et al., 2010). If temporal integration windows can be longer or shorter than a fixation duration, this raises the question of why saccades are not made more often. Making saccades 10 times per second, for example, would seem to increase the potential intake of visual information. One possibility is that time is required not only to process the information at the fovea but also to form a prediction about the future saccade target, as found in the extrafoveal preview effect. The saccade rate, then, would be limited by the temporal dynamics of object perception (and word perception in reading), which operate at around 3 to 5 cycles per second (
Drewes, Zhu, Wutz, & Melcher, 2015;
Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016). Visual perception would then reflect a hierarchy of different temporal scales, ranging from tens of milliseconds to seconds, with the fixation duration reflecting a complete “sample” of visual processing (
Barczak et al., 2019) in retinotopic coordinates, in order to then provide information about the foveated target and the future saccade target, supporting stable perception across saccades beyond retinotopic coordinates (
Melcher & Morrone, 2015). Previous studies of the time frame of object categorization (
Drewes et al., 2015;
Lamme & Roelfsema, 2000;
Thorpe, Fize, & Marlot, 1996), as well as studies on the time point at which visual input transforms from retinotopically defined to non-retinotopic and consciously available percepts (
Fabius et al., 2020), tend to converge on a time scale of around 150 to 200 ms for the duration of the initial, retinotopic integration window. Interestingly, as the preview paradigm shows, this temporal scale resembles the time period in which pre- and post-saccadic visual input is integrated (
Huber-Huber et al., 2019) and the period during which several types of visual predictions based on preceding events take effect (
Johnston, Robinson, Kokkinakis, Ridgeway, Simpson, Johnson, Kaufman, & Young, 2017). Thus, further investigation of the timing of the extrafoveal preview effect may clarify how the temporal dynamics of vision and of the oculomotor system interact to determine the timing of gaze movements in natural tasks.