The question of how the visual motion generated during eye movements can be ‘canceled’ to prevent an apparent displacement of the external world has a long history. The most popular theories (R. W. Sperry, 1950; E. von Holst & H. Mittelstaedt, 1950) lack specifics concerning the neural mechanisms involved and their loci. Here we demonstrate that a form of vector subtraction can be implemented in a biologically plausible way using cosine distributions of activity from visual motion sensors and from an extraretinal source such as a pursuit signal. We show that the net result of applying an ‘efference copy/corollary discharge signal’ in the form of a cosine distribution is a motion signal equivalent to that produced by vector subtraction. This vector operation provides a means of ‘canceling’ the effect of eye movements. It enables the extraretinally generated image motion to be correctly removed from the combined retinal–extraretinal motion, even in cases where the two motions do not share the same direction. In contrast to the established theories (efference copy and corollary discharge), our new model makes specific testable predictions concerning the location (the MT–MST/VIP areas) and nature of the eye-rotation cancellation stage (neural-based vector subtraction).

The **T** vector (blue) is the retinal image motion that would be present at a particular image location if the observer is translating through the world while not making an eye movement. The **R** vector (red) represents the image motion produced by a pursuit eye movement to the right and slightly upward while no translation occurs. The net retinal motion that occurs when translation and pursuit occur at the same time is given by the vector sum (**T** + **R**) shown as a black arrow. The visual system experiences the **T** + **R** retinal image motion but must recover **T** in order to correctly estimate the body's self-motion parameters and to correctly recover the relative depth of points in the world. It has access to the value of **R** (through an extraretinal signal), but how is **T** obtained from **T** + **R**? Standard vector algebra tells us that one simply subtracts **R** from **T** + **R** to find **T**, but how does this vector subtraction occur in the brain?

Suppose we have a vector **B** = −**R**, where **R** is the image motion generated by the eye rotation (Figure 3a). In order to subtract off the eye movement vector (**R**) from another vector (**A**), we need to add −**R** (= **B**) to it. The standard technique for adding two vectors **A** and **B** using vector algebra is shown in Figure 3b. If vector **A** has speed and direction given by (*V*_{A}, *θ*_{A}) and **B** = (*V*_{B}, *θ*_{B}), then their sum (**C**) is found by projecting **A** and **B** onto the *X* and *Y* axes and using the following equations to find the magnitude and angle of **C**:

$$C_x = V_A \cos\theta_A + V_B \cos\theta_B, \qquad C_y = V_A \sin\theta_A + V_B \sin\theta_B,$$

$$V_C = \sqrt{C_x^2 + C_y^2}, \qquad \theta_C = \tan^{-1}(C_y / C_x).$$
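The component-wise addition described above is easy to verify numerically. The sketch below is our own illustration (the function name `vector_sum` is not from the paper); it adds two (speed, direction) vectors via their *X* and *Y* projections:

```python
import math

def vector_sum(va, theta_a, vb, theta_b):
    """Add two vectors given as (speed, direction-in-degrees) pairs
    by projecting onto the X and Y axes."""
    cx = va * math.cos(math.radians(theta_a)) + vb * math.cos(math.radians(theta_b))
    cy = va * math.sin(math.radians(theta_a)) + vb * math.sin(math.radians(theta_b))
    return math.hypot(cx, cy), math.degrees(math.atan2(cy, cx))

# Example: A = (2, 0 deg), B = (2, 90 deg) -> C = (2*sqrt(2), 45 deg)
vc, theta_c = vector_sum(2.0, 0.0, 2.0, 90.0)
```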

Instead of projecting the **A** and **B** vectors onto just two orthogonal axes (*X* and *Y*) as was done in Figure 3b, each vector is projected onto a series of axes (left-hand side of Figure 4). The axes cover the full 360° range of directions and we will sample this range in 30° steps. Let the angle of each axis be given by *ϕ*. The projection of **A** onto this set of axes produces a distribution given by

$$D_A(\phi_i) = V_A \cos(\theta_A - \phi_i),$$

for *ϕ*_{i} = 0° to 360° in 30° steps. For the **B** vector, the distribution is given by

$$D_B(\phi_i) = V_B \cos(\theta_B - \phi_i).$$

The sum of these two distributions is itself a cosine distribution whose amplitude and phase correspond to the magnitude and direction of the vector sum of **A** and **B**. The proof is given below.

Adding the two distributions and expanding each cosine term (e.g., $V_A \cos\theta_A \cos\phi_i + V_A \sin\theta_A \sin\phi_i$), then collecting the $\cos\phi_i$ and $\sin\phi_i$ terms, gives

$$D_A(\phi_i) + D_B(\phi_i) = C_x \cos\phi_i + C_y \sin\phi_i = V_C \cos(\theta_C - \phi_i),$$

where $C_x$, $C_y$, $V_C$, and $\theta_C$ are as defined above. When the axis is aligned with the vector sum direction (*θ*_{C} − *ϕ*_{i} = 0°), i.e., *θ*_{C} = *ϕ*_{i}, we have the maximum value *V*_{C}. This shows that the peak amplitude of the summed distribution equals the magnitude of the vector sum (**A** + **B**), and that this peak amplitude occurs at *ϕ*_{i} = *θ*_{C} (the vector sum direction).
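This identity can be checked numerically on the sampled axes. The following is an illustrative sketch of the proof, not code from the paper (vector values and the helper name `proj` are arbitrary choices of ours):

```python
import math

def proj(v, theta, phi):
    """Projection of a vector (speed v, direction theta) onto axis phi (degrees)."""
    return v * math.cos(math.radians(theta - phi))

va, ta = 3.0, 20.0        # vector A (speed, direction)
vb, tb = 1.5, 110.0       # vector B

# Vector sum C via X/Y components
cx = va * math.cos(math.radians(ta)) + vb * math.cos(math.radians(tb))
cy = va * math.sin(math.radians(ta)) + vb * math.sin(math.radians(tb))
vc, tc = math.hypot(cx, cy), math.degrees(math.atan2(cy, cx))

# On every sampled axis, the summed projections equal V_C * cos(theta_C - phi)
for phi in range(0, 360, 30):
    assert abs(proj(va, ta, phi) + proj(vb, tb, phi) - proj(vc, tc, phi)) < 1e-9
```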

By projecting **A** and **B** onto a set of axes *ϕ*_{i} and summing the **A** and **B** projection values, a new cosine distribution is formed with an amplitude and phase (*V*_{C}, *θ*_{C}) corresponding to the speed and direction of the vector sum of **A** and **B**. This means that if we had a cosine distribution corresponding to the sum of the translation vector and an eye rotation vector (**T** + **R**), we could add a cosine distribution based on −**R** and we would end up with a distribution for **T**. This would solve the problem shown in Figure 3a. The above proof shows that the sum of the two distributions, *D*(*T* + *R*) + *D*(−*R*), will have an amplitude *V*_{T} and direction *θ*_{T}.
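The subtraction step itself can be sketched in a few lines. The helper names (`distribution`, `peak`) and the first-harmonic decoding of the summed distribution are our own illustrative choices; the mechanism described in the text only requires that the sampled distributions be added:

```python
import math

def distribution(v, theta, phis):
    """Cosine distribution of a vector (speed v, direction theta) over axes phis."""
    return [v * math.cos(math.radians(theta - p)) for p in phis]

def peak(dist, phis):
    """Recover (amplitude, direction) from a sampled cosine distribution via its
    discrete first harmonic (exact for equally spaced axes)."""
    n = len(phis)
    a = sum(d * math.cos(math.radians(p)) for d, p in zip(dist, phis)) * 2.0 / n
    b = sum(d * math.sin(math.radians(p)) for d, p in zip(dist, phis)) * 2.0 / n
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0

phis = list(range(0, 360, 30))          # 12 direction axes, 30 deg apart
vT, thT = 2.0, 200.0                    # translation-induced motion (speed, direction)
vR, thR = 1.0, 10.0                     # pursuit-induced motion

# The retinal input is the distribution for T+R (projections add linearly)
d_TR = [t + r for t, r in zip(distribution(vT, thT, phis),
                              distribution(vR, thR, phis))]
# Extraretinal signal: the distribution for -R (same speed, opposite direction)
d_negR = distribution(vR, thR + 180.0, phis)

# Adding the two distributions recovers the distribution for T alone
v, th = peak([x + y for x, y in zip(d_TR, d_negR)], phis)
```

Here `v` and `th` come out equal to the speed and direction of **T**, even though **T** and **R** point in different directions.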

How could the brain use these cosine distributions to remove the effect of **R** from the **T** + **R** retinal image motion? The first requirement is that the image motion at a particular retinal location (*x, y*) is represented in the form of a cosine distribution of activity similar to that shown in Figure 5a. We assume an array of velocity sensors at each location (*x, y*), tuned to a range of directions (0° to 360° in 30° steps). We will further assume that this array outputs a cosine distribution of activity with amplitude *V*_{T+R} and phase *θ*_{T+R}. Instead of the *Y* axis on the cosine distribution plots representing the cosine projection value as in Figure 4, the *Y* axes in Figure 5a depict the activity from a particular velocity sensor tuned to direction *ϕ*_{i}. For actual neurons, the neural activity cannot be negative as illustrated in the cosine curves, but the positive and negative values could be coded using two ‘opponent’ sets of velocity sensors, in a similar manner to the ‘on’ and ‘off’ systems proposed for neurons in the earlier stages of the visual system (Hubel & Wiesel, 1962; Kuffler, 1953). If at each location (*x, y*) there exists a velocity sensor *v*_{1} tuned to direction *ϕ* and another (*v*_{2}) tuned to *ϕ* + 180°, then one ‘channel’ could code for *v*_{1} − *v*_{2} and another for *v*_{2} − *v*_{1}. Prior to the addition of the activity from the two channels, the two outputs could be half-wave rectified, and the inverted polarity of the *v*_{2} − *v*_{1} channel output could be corrected by use of an inhibitory interneuron. This ‘on’ and ‘off’ system would enable the negative parts of the cosine distributions to be represented as neural activity.
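A minimal sketch of this opponent ‘on’/‘off’ coding idea (the function names are ours; real neurons would implement the rectification and sign inversion with excitatory and inhibitory circuitry):

```python
def opponent_code(signal):
    """Split a signed activity value into two non-negative channel outputs:
    the half-wave rectified v1-v2 ('on') and v2-v1 ('off') channels."""
    return max(signal, 0.0), max(-signal, 0.0)

def decode(on, off):
    """Recombine the channels; the inhibitory interneuron inverts 'off'."""
    return on - off

# Negative lobes of a cosine distribution become activity in the 'off' channel
for s in (-1.5, 0.0, 0.75):
    on, off = opponent_code(s)
    assert on >= 0.0 and off >= 0.0 and decode(on, off) == s
```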

The extraretinal signal is added in the form of a cosine distribution of activity based on the negative of the eye rotation vector (−**R**) (Figure 5b). The addition of these two distributions results in a new distribution corresponding to what would have been produced by the **T** vector alone (Figure 5c). We have effectively removed the effect of the eye rotation from the combined **T** + **R** motion. It is equivalent to performing the vector operation **T** + **R** + (−**R**), but we believe this is a more physiologically plausible operation than the trigonometric approach shown in Figure 3b. Our method simply requires that certain fixed levels of activity (both excitatory and inhibitory) are added to the outputs of an array of motion analyzers (velocity encoders) at each retinal location (*x*_{i}, *y*_{i}) in response to (or in anticipation of) a particular eye movement. We are suggesting that the ‘efference copy’ or ‘corollary discharge’ signal used in the CD/EC theories (Sperry, 1950; von Holst & Mittelstaedt, 1950) consists of a cosine distribution of activity and that this distribution is added to the activity being generated by a directional set of velocity sensors.

Initially, the distribution corresponds to the combined retinal motion (**T** + **R**). As the eyes rotate, a cosine distribution of activity proportional to the eye velocity (−**R**) is added to the **T** + **R** distribution and the new distribution acquires an orientation with its main axis tilted to the left. The new distribution's orientation and amplitude correspond to the image motion that would have occurred if the eye movement had not taken place (**T**). Notice that the activity in each direction is simply being increased or decreased (the lines are growing and shrinking), yet the overall distribution changes direction. These ‘local’ changes in activity enable the equivalent of vector subtraction to take place at this image location. The mechanism works for a range of image motion velocities and has more power than the basic ‘cancellation’ mechanism described in the original CD/EC theories; it was never clear how those systems dealt with velocity flow fields containing multiple speeds and directions.

In earlier heading models (e.g., Perrone, 1992), the *activity* from MT-like motion sensors in the radial direction out from the FOE was summed. Since the MT sensors used in these earlier models are speed tuned and do not code velocity directly, the activity at a particular image location is not in proportion to the image velocity. Here we are assuming that a velocity signal proportional to **T** + **R** is available at (*x, y*), and we are suggesting that the component of this velocity signal along the radial direction is summed by the heading detector unit. This component signal is readily available from the cosine distribution of activity generated by our velocity detector array (see Figure 5a): it is present in the unit coding for the radial direction.
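The radial component read out by the heading detector is simply the local cosine distribution evaluated at the radial direction, as in this small illustrative helper (ours, not from the paper):

```python
import math

def radial_component(v, theta, phi_radial):
    """Component of image velocity (speed v, direction theta, in degrees)
    along the radial direction phi_radial out from the FOE.

    This equals the activity of the unit in the local cosine distribution
    that is tuned to direction phi_radial."""
    return v * math.cos(math.radians(theta - phi_radial))

# A vector aligned with the radial axis contributes its full speed;
# one orthogonal to it contributes nothing.
```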

Consider three image locations (*x*_{1}, *y*_{1}), (*x*_{2}, *y*_{2}), and (*x*_{3}, *y*_{3}), and denote the radial directions out from the FOE location (see gray lines) as *ϕ*_{1}, *ϕ*_{2}, and *ϕ*_{3}. The blue vectors (**T**) represent the image velocity generated by an observer moving in the direction of the FOE (assumed to be 0° azimuth and 0° elevation in the figure). The red vectors (**R**) are the image velocity vectors produced by a pursuit eye movement to the right, and **T** + **R** represents the combined retinal image motion that occurs when the forward translation and pursuit occur at the same time. Let the angles of the **T** + **R** vectors at each position be *θ*_{1}, *θ*_{2}, and *θ*_{3} and their lengths be *v*_{1}, *v*_{2}, and *v*_{3}. If no eye movement compensation is in place, the heading detector in the figure sums the components of the **T** + **R** vectors in the radial directions. The component for position 1 is indicated by the distance from (*x*_{1}, *y*_{1}) to the point at which the dashed black line meets the **T** vector direction.

A heading detector tuned to a different heading direction (*α*_{i}, *β*_{i}) will have different values for *ϕ*_{1}, *ϕ*_{2}, and *ϕ*_{3}, and so the radial components will be different as well. The blue curve in Figure 7b shows the total activity summed across the three blue vectors for a number of such heading detectors, tuned to a range of different azimuth directions (−40° to +40°). This is for the case when no pursuit occurs, and the peak of the curve occurs in the heading detector tuned to 0° azimuth. The correct heading direction is indicated by the heading detector in the array with the largest output. The black curve is for the case in which a pursuit eye movement occurs during the translation of the observer and represents the total activity from the black vectors. Notice that the peak in the array of detectors now incorrectly signals that the heading is 15° to the right. Summation of the three **T** + **R** components results in an incorrect heading estimate because the components of **T** + **R** along the *ϕ*_{1}, *ϕ*_{2}, and *ϕ*_{3} directions are not the same as those for **T**. The pursuit rotation has added an additional component to each of the **T** vectors.

The pursuit-generated component can be removed by representing the extraretinal signal as −**R** and projecting this vector onto the radial direction axes (see dashed lines in Figure 8a). This is equivalent to what was carried out in the vector addition demonstration above (Figure 4), but only three projection axes are being considered here (*ϕ*_{1}, *ϕ*_{2}, and *ϕ*_{3}). If vector **R** has speed and direction (*V*_{R}, *θ*_{R}), the corrected (C) total heading activity for the detector shown in Figure 8a is now given by

$$\mathrm{Total}_C = \sum_{i=1}^{3} \left[ v_i \cos(\theta_i - \phi_i) - V_R \cos(\theta_R - \phi_i) \right]. \tag{11}$$

The subtracted term removes the **R** component along the radial direction and ensures that the correct velocity component (equal to that of **T**) is now summed by the heading detector. This particular heading detector now responds maximally and signals the correct heading (see Figure 8b).
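This correction can be sketched numerically in an Equation 11-style sum. The vector values and helper names below are hypothetical; the key point is that projection is linear, so subtracting the projected **R** leaves exactly the projected **T**:

```python
import math

def cosc(v, th, phi):
    """Cosine (radial) component of a vector (v, th) along axis phi (degrees)."""
    return v * math.cos(math.radians(th - phi))

def vec_add(v1, t1, v2, t2):
    """Vector sum of two (speed, direction-in-degrees) vectors."""
    x = v1 * math.cos(math.radians(t1)) + v2 * math.cos(math.radians(t2))
    y = v1 * math.sin(math.radians(t1)) + v2 * math.sin(math.radians(t2))
    return math.hypot(x, y), math.degrees(math.atan2(y, x))

# Hypothetical radial axes and T vectors (each T points radially outward)
phis = [180.0, 0.0, 45.0]
Ts = [(1.0, 180.0), (1.2, 0.0), (0.8, 45.0)]
vR, thR = 0.5, 0.0                                 # pursuit to the right

uncorrected = corrected = 0.0
for (vT, thT), phi in zip(Ts, phis):
    vi, thi = vec_add(vT, thT, vR, thR)            # retinal input is T+R
    uncorrected += cosc(vi, thi, phi)
    corrected += cosc(vi, thi, phi) - cosc(vR, thR, phi)   # Equation 11 style

target = sum(cosc(vT, thT, phi) for (vT, thT), phi in zip(Ts, phis))
# corrected matches target (sum of pure T components); uncorrected does not
```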

The correction terms (the −**R** components) at the end of Equation 11 are all known once the pursuit velocity is known. This is because, for a given heading direction (i.e., a particular heading detector location) and for a particular image location (*x*_{i}, *y*_{i}), the value of *ϕ*_{i} can be found by calculating the direction of the image location relative to the detector location. Since *V*_{R} and *θ*_{R} are known, it is straightforward to calculate the size of the projected −**R** component (cosine component) for a particular pursuit velocity. Therefore a fixed amount of total **E** (efference/corollary signal) activity can be added to the uncorrected total activity within the heading detector, and Equation 11 shows that the same total output is produced. This operation is depicted as the addition of a neural signal *E*(*α*, *β*, *V*_{R}) to the output of the heading detector tuned to heading direction (*α*, *β*) in Figure 8c. The size of *E* is a function of the heading tuning of the detector (*α*, *β*) and the pursuit velocity (*V*_{R}) only. It does not depend on the size of **T**.
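The independence of *E* from **T** can be illustrated numerically. In this sketch (our own, with hypothetical radial axes), a single *E* value computed from the pursuit velocity alone corrects the detector total for any scaling of the **T** vectors:

```python
import math

def cosc(v, th, phi):
    """Cosine component of a vector (v, th) along axis phi (degrees)."""
    return v * math.cos(math.radians(th - phi))

# Hypothetical radial axes of one heading detector; pursuit velocity
phis = [180.0, 0.0, 45.0]
vR, thR = 0.5, 0.0

# E depends only on the detector's axes and the pursuit velocity, not on T
E = -sum(cosc(vR, thR, phi) for phi in phis)

for scale in (0.5, 1.0, 2.0):
    Ts = [(scale * 1.0, 180.0), (scale * 1.2, 0.0), (scale * 0.8, 45.0)]
    # Projections are linear, so the uncorrected (T+R) total is the
    # T total plus the R total
    uncorrected = sum(cosc(vT, thT, phi) + cosc(vR, thR, phi)
                      for (vT, thT), phi in zip(Ts, phis))
    target = sum(cosc(vT, thT, phi) for (vT, thT), phi in zip(Ts, phis))
    assert abs(uncorrected + E - target) < 1e-9
```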

Figure 8d shows the distribution of *E* activity applied to the whole range of heading detectors (red curve). For the heading detector tuned to (0°, 0°), only a small amount of positive efference/corollary signal needs to be added to the total activity from the three **T** + **R** vectors (see vertical dashed line). For other heading detectors, the amount of inhibitory or excitatory *E* activity is higher. When this distribution of activity is added to the ‘uncorrected’ **T** + **R** distribution (black curve), the result is a distribution corresponding to that produced by the **T** vectors alone (blue curve), and the correct heading is once again signaled by the heading detector with the largest output (0°, 0°). In this case, however, the correction has been applied after the integration stage and no ‘local’ vector subtraction has occurred (cf. Figures 8a and 8b). Notice that the distributions depicted in Figure 8d have a negative component. This requires that the heading detectors include an ‘opponent’ stage in their design in order to signal this negative neural activity (see discussion of ‘on’ and ‘off’ systems above).

As noted above, individual MT neurons are speed tuned and do not directly provide the velocity signal proportional to **T** + **R** that is required by the new mechanism. Such a velocity signal would be available at a stage after the signals from a number of MT neurons are combined, such as area MST. This is also consistent with the predominance of pursuit signals in MST compared to MT (Erickson & Thier, 1991; Komatsu & Wurtz, 1988; Newsome et al., 1988).

So far we have assumed that the amplitude of the extraretinal distribution [*D*(*R*)] is exactly equal to the speed of image motion generated by the eye movement (i.e., the gain = 1). This would produce perfect subtraction of the **R** vector. However, it is well established that, under a wide range of conditions, the cancellation of eye-rotation-induced motion is imperfect (e.g., Freeman & Banks, 1998; Freeman et al., 2000; Mack & Herman, 1973; Turano & Heidenreich, 1996; Wertheim, 1987) and that the gain of the extraretinal signal can be variable (Haarmeier et al., 2001). Effects such as the Filehne illusion (Filehne, 1922; Mack & Herman, 1973) and the Aubert–Fleischl phenomenon (Dichgans, Wist, Diener, & Brandt, 1975) would arise in our model from a less than optimum value being used for the amplitude parameter of the *D*(*R*) distribution. This raises the question of how the visual system would ‘calibrate’ a mechanism such as the one we are proposing. The motor signal used to drive the eye needs to be correctly combined with the visual motion signals for the system to work. The CD/EC cosine distribution must be applied with the correct amplitude, but it is not obvious how to initially scale the motor signals so that they match the visual signals; the two signal generators do not share a common coordinate system. For the post-integration scheme depicted in Figures 8c and 8d, the amplitude of the efference/corollary distribution (red curve) is dependent upon the number of vectors in the flow field. This is unknown to the pursuit system, and so the amplitude needs to be modulated somehow by the strength of the visual motion signals present. One possibility that has been suggested is to use feedback from visually generated image motion (Haarmeier et al., 2001). However, it is not a trivial matter deciding how the visual signals should be combined with the extraretinal signals. An insight into the complexity of the problem can be found in data demonstrating non-linear interactions between extraretinal signals and retinal flow (Crowell & Andersen, 2001; van den Berg et al., 2001). One option we are exploring is to detect and measure the visual rotation components using full-field ‘rotation detectors’ (e.g., Perrone, 1992) and to use this visual signal to control the amplitude of the *D*(*R*) cosine in our model. We are currently testing the impact that such a feedback loop would have on our vector subtraction system.