The spatially uniform mislocalization of stimuli flashed around the onset of fast eye-movements (perisaccadic shift) has previously been explained by an inaccurate internal representation of current eye position. However, this hypothesis does not account for the observation that continuously presented stimuli are correctly localized during saccades. Here we show that the two findings are not mutually exclusive. The novelty of our approach lies in our interpretation of the extraretinal signal which, in contrast to other models, is **not** considered an (erroneous) estimate of current eye-position. Based on the reafference principle, our model assumes that the extraretinal signal is optimal in that it accurately predicts the neural representation of the retinal position of a continuously present stimulus. Perisaccadic shift arises as a consequence of maintaining stable perisaccadic position estimates for continuously present stimuli under the physiologically plausible assumption of temporal low-pass filtering in the afferent visual pathway. Consequently, our model reconciles the reafference principle with the finding of perisaccadic shift.

*exR*). The *exR* is assumed to account for changes in afferent flow that are caused by the movement of the eyes (reafference principle; e.g., von Holst & Mittelstaedt, 1950).

$E_f(t)$ corresponds to the observed localization error of a stimulus flashed at time *t* after saccade onset. Further, $h(t)$ corresponds to the direction of gaze at the time of the flash and $\mathit{exR}(t)$ denotes the extraretinal signal at the time of the flash. In this context it is assumed that, in order to guarantee perceptual stability, *exR* needs to be equal to eye-position: $\mathit{exR}(t) = h(t)$. Hence, it is assumed that $\mathit{exR}(t)$ is the visual system's estimated direction of gaze. The observed pattern of mislocalization can be predicted under the assumption that $\mathit{exR}(t)$ is a damped version of the actual eye-position (see Honda, 1989, Figure 4, or Pola, 2004, Figure 1). Hence, this model is often referred to as the damped eye-position model.

*exR* in a way that solves Equation 1. One tentative explanation is that a damped version of eye-position is the visual system's best guess of actual eye-position under the assumption that it is not able to produce signals that change in time as rapidly as eye position. However, it seems unlikely that the same system which controls eye-position by sending out highly time-variant neuronal signals to a sluggish eye-plant should not be able to produce signals that describe these very same changes in eye position.

*τ*. The duration and weight of this epoch are described by $\xi_0$, which represents the persistence of sensory preprocessing in afferent neurons. $\xi_0$ is normalized such that $\int \xi_0(\tau)\,d\tau = 1$.

$R_{f,t}(\tau)$ corresponds to the time-resolved retinal signal of a stimulus flashed at time *t* after saccade onset (see Equation 5 for details). Pola has argued convincingly that $R_{f,t}(\tau)$ is constant and corresponds to the inverse of direction of gaze at the time of the flash: $R_{f,t}(\tau) = -h(t)$. For example, a flash at 0 deg in craniocentric coordinates, presented while gaze is directed 5 deg to the right, will drive neurons with receptive fields 5 deg to the left of the fovea. We can rewrite $R_{f,t}(\tau)$ as $R_f(t)$ and remove it from the integral. For more details on the use of functions, their arguments and subscripts, refer to 1. Thus, we can reformulate Equation 2:

$\xi_0$ equal to a Dirac delta function (see 1 for details). For a wide range of choices of $\xi_0$ we can find an entire family of functions *exR* which solve the equation. In other words, Equation 2′ describes an infinite number of models of perisaccadic shift, including the damped eye-position model. In order to come up with a unique solution, Equation 2′ needs to be restricted. One way to do so is to measure the persistence of the afferent neurons in question and hence determine $\xi_0$ explicitly. However, it is not obvious which neurons at what level of the visual hierarchy should be considered. Alternatively, $\xi_0$ may be estimated from psychophysical data, as has been done by Pola (2004). Following this approach, it is possible to identify a family of models of perisaccadic shift, all of which use physiologically plausible temporal dynamics.

*exR* from a setting that is independent of the one used to measure perisaccadic shift. In particular, we start with the following assumption: visual processing is based on and optimized for continuously present stimuli. Thus, assuming that the reafference principle is valid during saccades, we choose *exR* such that continuously present stimuli are **not** mislocalized. This restriction is described by Equation 3:

$E_c$ corresponds to the localization error of a continuously present stimulus at time *τ* after saccade onset. Whenever Equation 3 holds, continuously present stimuli are not mislocalized perisaccadically and the visual world appears stable during saccades. It is essential to point out the difference from previous models, which choose *exR* in order to predict the previously measured perisaccadic shift. Such an approach will usually lead to a violation of the reafference principle, i.e., these models will not be able to explain perisaccadic stability for continuously present stimuli. Here we invert the problem in order to determine the properties of the sensory preprocessing, i.e., $\xi_0$, which are necessary to produce perisaccadic shift. In contrast to other models, our approach does **not** necessarily guarantee a solution.

$\xi_0$ which reconcile the reafference principle with perisaccadic shift, i.e., simultaneously solve Equations 2′ and 3. Hence, perceptual stability for continuously present stimuli (Equation 3) on the one hand and perisaccadic shift (Equation 2′) on the other hand are not mutually exclusive. Further, we show that the preprocessing parameters $\xi_0$ which simultaneously solve Equations 2′ and 3 have physiologically plausible low-pass filter properties. In contrast, we show that for a wide range of un-physiological choices of $\xi_0$, our model does **not** predict perisaccadic shift of realistic amplitude. In particular, we can rule out the case that $\xi_0$ equals a Dirac delta function, which underlies the damped eye-position model. In summary, we argue that the visual system chooses *exR* in order to guarantee perisaccadic stability, and in doing so causes perisaccadic shift under the assumption of physiologically plausible temporal low-pass filtering in the afferent visual pathway. In addition, our analysis reduces the dimensionality of the family of potential models of perisaccadic shift described by Equation 2′.

*R*). *R* is converted to a craniocentric position estimate *W* by subtracting the extraretinal signal *exR*.

$N(x, \tau)$ that corresponded to the linear convolution of the retinal stimulus *S* with a spatio-temporal receptive field *ξ*. The activity of the neurons is described by Equation 4 (see Figure 1D):

*x* corresponds to the receptive field position in retinal coordinates and *τ* indicates time relative to saccade onset. The input *S* corresponds to the retinal projection of the visual scene coded as zeros and ones, depending on the presence of a stimulus. Note that the temporal kernel $\xi_0$ in Equation 2′ is identical to the temporal aspect of the spatio-temporal kernel *ξ* in Equation 4, i.e., $\xi_0(\tau) = \xi(0, \tau)$. The spatial receptive field was modeled as a Gaussian with a standard deviation of 0.15°, normalized to a maximum amplitude of 1 (see Figure 1B, upper panel). The temporal impulse response was described as a gamma distribution with various scale and shape parameters (see Figure 1B, lower panel). A gamma distribution can be described as the convolution of *n* exponential distributions. The shape parameter of the gamma distribution corresponds to the number *n*; the scale parameter corresponds to the time constant *λ* of the underlying exponential distributions. Hence, the entire temporal receptive field is determined by the shape parameter *n* and the scale parameter *λ*. Unless mentioned otherwise, *n* was set to 5. The resulting kernel is a low-pass filter of order *n* and cutoff wavelength $2\pi\lambda$. The extent of the kernel in time can be quantified by its standard deviation, which is given as $\lambda\sqrt{n}$.

$R(\tau)$ was derived as the center of gravity of the neuronal activity at the time *τ* in question (see Figure 1D):

$R(\tau)$ is only defined when an object is present in the scene. Hence, the same restrictions apply to $W(\tau)$. Finally, the time-resolved craniocentric position estimate was converted into a global position estimate

$g(\tau)$ were calculated as the maximum of the neuronal activity at time *τ*. In addition, the weights were normalized to a sum of one: $g(\tau) = \max[N(x, \tau)] / \int \max[N(x, \tau)]\,d\tau$. For stimuli flashed at time *t* after saccade onset, the weights *g* are defined by the temporal impulse response function: $g(\tau) = \xi_0(\tau - t)$.

$\mathit{exR}(\tau) - R_{f,t}(\tau)$ for $W(\tau)$ and $g(\tau)$ for $\xi_0(\tau - t)$ we see that for flashed stimuli, Equation 7 corresponds to Equation 2.

$R_c$ is determined, and for all subsequent runs *exR* is set to $R_c$, regardless of whether flashed or continuously present stimuli are presented. In a real system that performs saccades of variable direction and amplitude, such a simple mapping is not feasible. Instead, *exR* has to be trained with a number of different saccade vectors.

*ξ* with increasingly slower temporal dynamics, i.e., longer time constants *λ* (see Methods). Prior to saccade onset the eye is fixating 3° to the right of the stimulus, which consequently drives neurons with receptive fields 3° to the left of the fovea. At time 0 the eye starts to move to the left and finally reaches a position 3° to the left of the stimulus, which consequently drives neurons with receptive fields 3° to the right of the fovea. Between 25 and 50 ms after saccade onset, the neuronal representation of the stimulus begins to reflect this change in eye-position as the center of gravity of the neuronal activity, i.e., $R_c$, moves from −3° to +3° (green lines in Figure 2, see also Supplementary movie). Note that for slower temporal dynamics, described by increases in the scale parameter *λ* (panels A through D), $R_c$ begins to move later and at lower speeds. Accordingly, each of the four models will use different extraretinal signals which, according to Equation 8, are set to $R_c$.

$W_c(\tau)$ (blue line in Figure 2) accurately remains at 0° during the entire perisaccadic time period. Hence, continuously present stimuli are not mislocalized, regardless of the receptive field properties defined by *ξ*. In the following we will explore the predictions of the models for flashed stimuli.
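The continuous-stimulus case can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a separable kernel (Gaussian in space with a standard deviation of 0.15°, gamma in time with *n* = 5 and *λ* = 10.6 ms), a 50 ms raised-cosine saccade profile, and a grid resolution of 0.05° and 1 ms. The saccade profile, the grid, and all variable names are assumptions made for this sketch.

```python
import numpy as np
from math import factorial

dt, dx = 1.0, 0.05                                   # ms, deg (assumed grid)
tau = np.arange(-300.0, 400.0, dt)                   # time relative to saccade onset
x = np.round(np.arange(-6.0, 6.0 + dx / 2, dx), 2)   # receptive-field positions

# Eye position h(tau): leftward saccade from +3 to -3 deg.
# The 50 ms raised-cosine profile is an assumption of this sketch.
h = 3.0 * np.cos(np.pi * np.clip(tau / 50.0, 0.0, 1.0))

# Retinal projection S(x, tau) of a stimulus at 0 deg (craniocentric),
# coded as zeros and ones; its retinal position is -h(tau).
S = np.zeros((x.size, tau.size))
S[np.argmin(np.abs(x[:, None] + h[None, :]), axis=0), np.arange(tau.size)] = 1.0

# Separable kernel (assumed): Gaussian in space, gamma in time.
space = np.exp(-0.5 * (np.arange(-1.0, 1.0 + dx / 2, dx) / 0.15) ** 2)
n, lam = 5, 10.6
u = np.arange(0.0, 300.0, dt)
xi0 = u ** (n - 1) * np.exp(-u / lam) / (factorial(n - 1) * lam ** n)
xi0 /= xi0.sum() * dt                                # integral of xi0 equals 1

def causal_filter(row):
    # Pad with the initial value so the filter starts in steady state.
    padded = np.concatenate([np.full(xi0.size - 1, row[0]), row])
    return np.convolve(padded, xi0, mode="valid") * dt

N = np.apply_along_axis(np.convolve, 0, S, space, "same")  # spatial blur
N = np.apply_along_axis(causal_filter, 1, N)               # temporal low-pass

R_c = (x[:, None] * N).sum(axis=0) / N.sum(axis=0)   # center of gravity
exR = R_c.copy()                                     # exR is set to R_c
W_c = R_c - exR                                      # craniocentric estimate
```

In this sketch `R_c` lags the retinal projection and drifts from −3° to +3°, while `W_c` is zero at every time step by construction, which is the stability property described above.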

*ξ* and *exR* we can derive the mislocalization of flashed stimuli by solving Equation 2′. Alternatively, we can run the model multiple times and simulate flashes at various times relative to saccade onset. Figures 3A–3D show the responses of the model from Figure 2B to stimuli flashed at four different times relative to saccade onset. Stimuli flashed before saccade onset or in the first half of the saccade are mislocalized in the direction of the saccade (Figures 3A and 3B). Stimuli flashed in the second half of the saccade and briefly after saccade completion are mislocalized in the direction opposite to the saccade (Figure 3D). Figure 3E depicts a summary of mislocalization as a function of flash onset relative to saccade onset. It verifies that the model indeed predicts the biphasic pattern of mislocalization that is typically referred to as perisaccadic shift. It is important to note that the mislocalization starts well before saccade onset and that the maximal amplitude is observed for stimuli flashed around saccade onset. At the same time it is important to note again that the model does not predict mislocalization for continuously present stimuli. Figure 4 shows the predicted perisaccadic shift for the four choices of *λ* which were used in Figures 2A through 2D. Note that it is only in the conditions with slower temporal dynamics, i.e., conditions with longer time constants, that the predicted mislocalization error reaches an amplitude that is comparable to experimental findings (see below for a quantitative analysis).
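The flash simulation can be sketched in a few lines. This is an illustrative sketch rather than the published code: it uses the relations stated in the text ($R_f(t) = -h(t)$, *W* = *R* − *exR*, weights $g(\tau) = \xi_0(\tau - t)$) and assumes that *exR* equals the causally low-pass-filtered retinal position, which is what a center-of-gravity readout would give for a symmetric spatial kernel. The saccade profile (50 ms raised cosine) and the function names `gamma_kernel` and `perisaccadic_shift` are hypothetical.

```python
import numpy as np
from math import factorial

def gamma_kernel(n, lam, dt=1.0, length=300.0):
    """Temporal impulse response xi0: gamma density, shape n, scale lam (ms)."""
    u = np.arange(0.0, length, dt)
    k = u ** (n - 1) * np.exp(-u / lam) / (factorial(n - 1) * lam ** n)
    return k / (k.sum() * dt)                    # integral of xi0 equals 1

def perisaccadic_shift(flash_times, n=5, lam=10.6, dt=1.0):
    """Localization error E_f(t) in deg; negative values point leftward."""
    tau = np.arange(-400.0, 700.0, dt)
    # Leftward 6 deg saccade; the 50 ms raised-cosine profile is an assumption.
    h = 3.0 * np.cos(np.pi * np.clip(tau / 50.0, 0.0, 1.0))
    xi0 = gamma_kernel(n, lam, dt)
    # exR = R_c, approximated as the causally filtered retinal position -h
    # (assumption: center-of-gravity readout, symmetric spatial kernel).
    padded = np.concatenate([np.full(xi0.size - 1, -h[0]), -h])
    exR = np.convolve(padded, xi0, mode="valid") * dt
    errors = []
    for t in flash_times:
        i = int(np.argmin(np.abs(tau - t)))      # sample at which the flash occurs
        W = -h[i] - exR[i:i + xi0.size]          # W = R_f(t) - exR(tau)
        errors.append(np.sum(xi0 * W) * dt)      # weights g(tau) = xi0(tau - t)
    return np.array(errors)

flash_times = np.arange(-150.0, 251.0, 10.0)
E_slow = perisaccadic_shift(flash_times, lam=10.6)   # slower temporal dynamics
E_fast = perisaccadic_shift(flash_times, lam=2.0)    # faster temporal dynamics
```

Under these assumptions the error is negative (in the direction of the leftward saccade) for flashes shortly before and around saccade onset, positive for flashes late in the saccade, and smaller in amplitude for the faster kernel.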

$N(x, \tau)$ in a number of different ways. In the simulations above, we assumed that *R* corresponds to the center of gravity (CG) of the neuronal activity (see Methods). Alternatively, *R* may, for example, be defined as the retinal location with the strongest activity (MAX) or be derived by a maximum likelihood method (ML). Several recent studies have provided neuronal mechanisms for ML calculations (Deneve, Latham, & Pouget, 2001) as well as evidence in favor of the brain using ML-like methods (e.g., Knill & Pouget, 2004). Hence, it is important to test how our model is affected by using the MAX or the ML method instead of the CG method to estimate the retinal signal.
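The difference between the CG and MAX readouts can be illustrated on a toy activity profile. The sketch below is a hedged example, not the authors' code: the two-bump activity pattern, its weights (0.6 and 0.4), and the function names are assumptions chosen to mimic a moment during the saccade when activity at the old retinal position is decaying while activity at the new position builds up. The ML method is not implemented here.

```python
import numpy as np

def center_of_gravity(x, activity):
    """CG readout: activity-weighted mean of receptive-field positions."""
    return float(np.sum(x * activity) / np.sum(activity))

def max_readout(x, activity):
    """MAX readout: position of the most active neuron."""
    return float(x[np.argmax(activity)])

def bump(x, center, sd=0.15):
    """Gaussian activity profile with peak amplitude 1."""
    return np.exp(-0.5 * ((x - center) / sd) ** 2)

x = np.linspace(-6.0, 6.0, 241)                      # retinal positions in deg
# Hypothetical mid-saccade snapshot: old position (-3 deg) still dominant.
activity = 0.6 * bump(x, -3.0) + 0.4 * bump(x, 3.0)

R_cg = center_of_gravity(x, activity)    # falls between the two bumps
R_max = max_readout(x, activity)         # stays on the stronger bump
```

In this toy case the CG estimate moves gradually as the relative weights change, whereas the MAX estimate jumps once the second bump becomes stronger. This is one way to see why the two methods can lead to extraretinal signals with different time courses.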

*n*, as well as the scale parameter *λ* (see Figures 6A and 6D for results of the CG and MAX method, respectively). Closer investigation revealed that the mislocalization amplitude could be described reasonably well as a function of the standard deviation of the kernel, which is given by $\lambda\sqrt{n}$.

*n* of 5, we can find a good model for the CG method by setting *λ* equal to 10.6 ms, as the standard deviation of the temporal receptive field, i.e., a gamma distribution with shape parameter 5 and scale parameter 10.6 ms, will be close to 24 ms: $\sqrt{5} \times 10.6\ \text{ms} \approx 23.7\ \text{ms}$.

*λ* to 15.7 ms we will find a good model for the MAX method. Despite having different temporal receptive fields, and consequently different extraretinal signals, these models predict almost identical mislocalization profiles (three examples each are plotted in Figures 6C and 6F).
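The relation between the kernel parameters and the kernel's temporal extent can be checked numerically. The sketch below samples a gamma density with integer shape *n* and scale *λ* and computes its standard deviation, which for a gamma distribution equals $\lambda\sqrt{n}$. The sampling step, the integration window, and the function names are assumptions of this example.

```python
import numpy as np
from math import factorial

def gamma_kernel(n, lam, dt=0.1, t_max=600.0):
    """Sampled gamma density (integer shape n, scale lam in ms), unit integral."""
    tau = np.arange(0.0, t_max, dt)
    xi0 = tau ** (n - 1) * np.exp(-tau / lam) / (factorial(n - 1) * lam ** n)
    return tau, xi0 / (xi0.sum() * dt)

def kernel_sd(tau, xi0, dt=0.1):
    """Standard deviation of the kernel, treated as a density over time."""
    mean = np.sum(tau * xi0) * dt
    return float(np.sqrt(np.sum((tau - mean) ** 2 * xi0) * dt))

sd_cg = kernel_sd(*gamma_kernel(5, 10.6))    # kernel chosen for the CG method
sd_max = kernel_sd(*gamma_kernel(5, 15.7))   # kernel chosen for the MAX method
```

With *n* = 5, the two scale parameters give standard deviations close to 24 ms and 35 ms, respectively.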

_{f,c}(*t*) are identical and equal to $E_f(t)$. Hence, the relative error between the flashed and the continuously present stimulus is identical to the absolute error of the flashed stimulus. Further, it is clear from Equation D3 that the relative mislocalization does not depend on the execution of an eye-movement: as the extraretinal signal does not figure into Equation D3, the same pattern of relative mislocalization will be observed whether the continuously present stimulus is moved across the retina by a saccade or by stimulus motion. Consequently, our model makes identical predictions for relative position judgments during real and simulated saccades.

*u* and *v*, respectively. In analogy to Equations D1, D2 and D3, we define the Equations D1′, D2′ and D3′. Within the framework of our model, these formulas provide a mathematical description of the relative localization for two stimuli flashed at times *u* and *v*, respectively:

_{f,f}^{R}(*u*, *v*) = $R_f(u) - R_f(v)$. In particular, if both stimuli are flashed before saccade onset, the two stimuli will not be mislocalized relative to each other. This is not in keeping with experimental findings. If we use the fact that $W_f(\tau) = R_f(\tau) - \mathit{exR}(\tau)$ we see that Equation D2′ yields the same results as Equation D3′. In contrast, Equation D1′ describes the relative mislocalization as the difference in the global craniocentric position estimates. Hence, in line with experimental findings, two stimuli flashed before saccade onset will be mislocalized relative to each other.

**against** the reafference principle. We argue that this interpretation was based on disregarding slow temporal dynamics in afferent visual neurons, which, in turn, led to the faulty assumption that the extraretinal signal should represent eye position. Recent modeling work has acknowledged temporal low-pass filtering (Pola, 2004) and considerably changed our interpretation of perisaccadic shift. However, the implications of this low-pass filtering for the retinal signal of continuously present stimuli had not been acknowledged so far. Our model provides this link and hence a stringent implementation of the reafference principle. As required by the reafference principle, our model views the extraretinal signal not as an erroneous estimate of eye-position, but rather as an accurate estimate of the retinal signal of a continuously present stimulus. By inverting this model we can deduce the preprocessing parameters that predict perisaccadic shift in the framework of the reafference principle. Our results show that for physiologically plausible preprocessing parameters, the reafference model does indeed predict perisaccadic shift. Hence, the reafference principle provides a very simple and elegant account of both perisaccadic shift **and** perisaccadic stability.

**why** perisaccadic shift occurs. While previous models could accurately model perisaccadic shift, they did not convincingly answer the question why the extraretinal signal would happen to be chosen in a way that is necessary to predict perisaccadic shift. In our framework, perisaccadic shift follows from two simple principles: perisaccadic visual stability and physiologically plausible temporal dynamics in the afferent visual pathway. The former principle can easily be verified by introspection: the world does appear stable during saccades; the latter has been documented extensively in electrophysiological and psychophysical studies.

**not** mutually exclusive. Hence, to the best of our knowledge, it provides the first account of erroneous perisaccadic **relative** position judgments between flashed and continuously present stimuli (Cai et al., 1997; Teichert, Klingenhoefer, Wachtler, & Bremmer, 2008).

**only** be observed with slow temporal dynamics. Our simulations suggest that the standard deviation of the temporal kernels needs to be on the order of 24 and 35 ms for the CG and MAX method, respectively. Thus, we can definitely rule out the damped eye-position model, which relies on the assumption that the temporal kernel is a Dirac delta function.
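The Dirac case can be made explicit with a short derivation. It is a sketch under the definitions used in this manuscript (*W* = *R* − *exR*, *exR* set to $R_c$, $R_f(t) = -h(t)$, and weights $g(\tau) = \xi_0(\tau - t)$), with the additional assumption that an unfiltered retinal signal of a continuously present stimulus at 0° follows the retinal projection instantaneously, i.e., $R_c(\tau) = -h(\tau)$. Reading the global position estimate as the *g*-weighted temporal average of *W*, and setting $\xi_0 = \delta$:

$$
E_f(t) = \int \delta(\tau - t)\,\bigl[R_f(t) - \mathit{exR}(\tau)\bigr]\,d\tau
       = R_f(t) - R_c(t)
       = -h(t) + h(t)
       = 0 .
$$

Under these assumptions a kernel without temporal extent yields no mislocalization of flashed stimuli once *exR* is chosen to stabilize continuously present stimuli, which is consistent with the requirement for slow temporal dynamics stated above.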

**only** such slow temporal kernels may produce perisaccadic shift in the context of our model. This converging evidence should considerably strengthen the case in favor of the slow temporal kernels. Second, the restrictions imposed by Pola on the set of solutions of Equation 2′ are orthogonal to the restrictions we can impose. While he identifies a single temporal receptive field with the associated family of extraretinal signals, we identify a family of temporal receptive fields, each with its unique extraretinal signal derived via Equation 8. These restrictions may be combined to yield a unique model of perisaccadic shift, i.e., we can use the temporal receptive field identified by Pola in our model. Using the CG and the MAX method, respectively, we identify two unique models of perisaccadic shift. For the CG method the predicted mislocalization amplitude exceeds the one observed by Honda (1989); for the MAX method it matches it closely (see Figures 6C and 6E, black dotted line). As there is considerable variability in reported mislocalization amplitude, we do not consider this a definitive argument against the CG method. Further, the estimation of the temporal receptive field properties from flicker fusion data is certainly subject to variability. Hence, we regard both of these models as feasible candidates. Note that the two models have very different extraretinal signals. The CG model features a slow *exR*, while the MAX/ML model features a very fast one.

*exR* that starts moving before saccade onset. This provides further evidence against the common assumption that the mislocalization of stimuli flashed before saccade onset can only be explained by an anticipatory extraretinal signal. This argument certainly does not deny the existence of anticipatory signals that may help to prepare visual areas for upcoming saccades. It merely argues against the involvement of such signals in perisaccadic shift.

$W(t)$, is independent of response amplitude. In addition, the weights $g(t)$ are independent of response amplitude because of the normalization step that sets the integral of $g(t)$ equal to one.

_{c} corresponds to the craniocentric position of the continuous stimulus and Δ is given by our model as calculated by Equations D1′/2′/3′. In the context of our model, _{c} does not correspond to the true location of the continuously present stimulus. Consequently, Equation 9 does not predict the empirically observed mislocalization. However, if we assume that the visual system has an independent way to correctly estimate _{c}, Equation 9 would predict the observed mislocalization during simulated saccades. We will not speculate in detail about the mechanisms that may give rise to an accurate estimate of _{c}. However, we assume that it would involve neurons with inherently faster temporal dynamics, i.e., the magnocellular pathway, in combination with cross-validation by other sensory-motor systems.

_{c}. Further, it needs to be mentioned that one recent study (Ostendorf et al., 2006) reported compression of space around the onset of simulated saccades. Our model does not predict such behavior (see next paragraph).

*only if* visual stimuli are present in the scene. In the dark, the remapping of receptive fields which is thought to link the pre- and postsaccadic neuronal representation in retinocentric visual areas seems pointless. Under this assumption we would predict no perisaccadic compression during real saccades in the absence of visual references.

*c*/*f*: qualifies a function as pertaining either to conditions with a **f**lashed or a **c**ontinuously present stimulus.

*τ*: denotes time within a trial relative to saccade onset. For example, $R_c(\tau)$ denotes the retinal signal of a continuously present stimulus as a function of time *τ* after saccade onset.

*t*: specifies a condition in which the flash was present at time *t* relative to saccade onset. For example, $E_f(t)$ denotes the craniocentric localization error of a stimulus flashed at time *t* relative to saccade onset. Note the subtle but important difference between the two arguments *t* and *τ*: $\mathit{exR}(\tau)$ describes the extraretinal signal as a function of time from saccade onset. In contrast, $\mathit{exR}(t)$ denotes the value of the extraretinal signal at the time the flash was presented.

*x*: indexes a particular neuron. As the neurons are arranged retinotopically, *x* indicates a particular retinocentric position.

*n*: **shape parameter** of the gamma distribution.

*λ*: **scale parameter** of the gamma distribution. Corresponds to the time constant of the underlying exponential distribution.

$h(\tau)$: **direction of gaze** as a function of time from saccade onset.

$S(x, \tau)$: **retinal projection** of the visual scene as a function of one-dimensional space *x* and time *τ* after saccade onset.

$\xi(x, \tau)$: **(spatio-)temporal receptive field**. A kernel which describes the neuronal processing of the retinal stimulus. $\xi(0, \tau)$ describes the one-dimensional temporal receptive field (spatial position *x* is held constant at zero). To simplify the notation we use $\xi_0(\tau)$ or simply $\xi_0$ to refer to the same expression, i.e., the temporal receptive field.

$N(x, \tau)$: **neuronal activity** in the retinocentric visual area as a function of position *x* and time *τ* after saccade onset. In Equation 4, $N(x, \tau)$ is described as the convolution of *S* with *ξ*.

$R_{c/f,t}(\tau)$: **retinal signal**, defined in Equation 5 as the retinocentric position of a stimulus as estimated from the neuronal activity. For example, $R_{f,t}(\tau)$ denotes the retinal signal of a stimulus flashed at time *t* after saccade onset as a function of time *τ* after saccade onset. As $R_{f,t}(\tau)$ does not vary as a function of *τ*, we will at times rewrite the same expression as $R_f(t)$. Note that $R_f(t)$ is the inverse of direction of gaze at the time of the flash: $R_f(t) = -h(t)$.

$\mathit{exR}(\tau)$: **extraretinal signal**. In the context of the damped eye-position model, *exR* is interpreted as the visual system's (erroneous) estimate of eye-position. In the current manuscript *exR* is defined as the retinal signal of a continuously present stimulus (see Equation 8). Hence *exR* is not an (erroneous) estimate of eye-position, but an accurate estimate of $R_c$. *exR* can be thought of as the output of the forward model (e.g., Kalveram, 1993) which explicitly predicts the reafference as a function of the efference and the neuronal preprocessing (reafference principle: von Holst & Mittelstaedt, 1950). If $\xi_0$ is a Dirac impulse, our definition and the definition used in the damped eye-position model are identical (except for the sign).

$W_{c/f,t}(\tau)$: **instantaneous craniocentric position**. It is derived by subtracting *exR* from *R* (see Equation 6). Additional subscripts may specify the instantaneous craniocentric position of the continuous or the flashed stimulus. If the flashed stimulus is specified, a second subscript *t* may indicate the time of the flash relative to saccade onset.

_{f/c}(*t*): **global craniocentric position**. For clarity, the second subscript indicating the time *t* of the flash after saccade onset is now the explicit argument. _{f}(*t*) represents the craniocentric position estimate of a stimulus flashed at time *t* after saccade onset. Notice the difference from the instantaneous craniocentric position, which is expressed as a function of *τ*, i.e., time after saccade onset.

_{f}(*t*): **global craniocentric position error**. Because the actual stimulus position was always 0° in craniocentric coordinates, it is identical to the global craniocentric position: _{f}(*t*) = _{f}(*t*).