**Motion along curved paths (curvilinear self-motion) introduces a rotation component to the radial expanding patterns of visual motion generated in the eyes of moving animals with forward-facing eyes. The resultant image motion (vector flow field) is no longer purely radial, and it is difficult to infer the heading direction from such combined translation-plus-rotation flow fields. The eye need not rotate relative to the head or body during curvilinear self-motion, and so there is an absence of efference signals directing and indicating the rotation. Yet the eye's rotation relative to the world needs to be measured accurately and its effect removed from the combined translation–rotation image motion in order for successful navigation to occur. I demonstrate that to be able to account for human heading-estimation performance, the precision of the eye-in-world rotation velocity signal needs to be at least 0.2°/s. I show that an accurate estimate of the eye's curvilinear motion path through the world can be achieved by combining relatively imprecise vestibular estimates of the rotation rate and direction with visual image-motion velocities distributed across the retina. Combined visual–vestibular signals produce greater accuracy than each on its own. The model can account for a wide range of existing human heading- and curvilinear-estimation psychophysical data.**

*focus of expansion*, FOE) corresponding to the heading direction (Gibson, 1950). These global patterns of image motion (called the *flow field*) can be represented using velocity vectors, as depicted in Figure 1a.

*V*_{i}|, *θ*_{i}) located at positions (*x*_{i}, *y*_{i}). If the heading tuning of MSTd unit #1 coincides with the position of the FOE (*x*_{H}, *y*_{H}), which indicates the actual heading direction (Az_{H}, El_{H}), then the velocity vectors will be closely aligned with the radial directions *α*_{i} joining (*x*_{i}, *y*_{i}) and (*x*_{H}, *y*_{H}). For an MSTd unit tuned to another heading direction that does not coincide with the actual heading (MSTd unit #2), the radial directions do not line up as well with the direction of the velocity vectors and there will often be a difference *β*_{i} between *θ*_{i} and *α*_{i}, and so the projection of the velocity vector onto the incorrect radial direction (the dot product) will be less than for the case in which *β*_{i} = 0°. The sum of the projections (Σ *V* cos *β*_{i}) will be maximum when the actual heading lines up with the heading tuning of the detector (Perrone, 1992). This is why these models are referred to as template models: They are looking for the radial pattern in the model set of MSTd units that matches the one generated while the eye is translating in the direction (Az_{H}, El_{H}).
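
The template idea described above can be illustrated with a minimal Python sketch. It is not the published model: the function names, the hypothetical point positions, and the use of a speed that simply grows with distance from the FOE (real speeds also depend on depth) are assumptions made only for illustration. Each candidate heading sums the projections *V* cos *β*_{i} of the flow vectors onto its own radial directions, and the candidate with the largest sum is taken as the heading.

```python
import math

def template_response(vectors, foe):
    """Sum of projections of flow vectors onto radial directions from a candidate FOE.

    vectors: list of (x, y, speed, theta_deg) tuples (image position, speed, direction).
    foe: (x_h, y_h) candidate focus-of-expansion position in the same image units.
    """
    total = 0.0
    for x, y, speed, theta_deg in vectors:
        alpha = math.atan2(y - foe[1], x - foe[0])  # radial direction alpha_i
        beta = math.radians(theta_deg) - alpha      # beta_i = theta_i - alpha_i
        total += speed * math.cos(beta)             # projection V cos(beta_i)
    return total

def best_heading(vectors, candidates):
    """Return the candidate FOE whose template gives the largest summed projection."""
    return max(candidates, key=lambda foe: template_response(vectors, foe))

def radial_flow(points, foe, gain=0.1):
    """Hypothetical pure-translation flow: vectors point away from the FOE."""
    flow = []
    for x, y in points:
        dx, dy = x - foe[0], y - foe[1]
        flow.append((x, y, gain * math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))))
    return flow

# Hypothetical image points (degrees) and a true heading of (-10, 0).
points = [(-20, -15), (-5, 12), (8, -9), (15, 18), (25, -3), (-12, 4)]
flow = radial_flow(points, foe=(-10, 0))
candidates = [(az, 0) for az in range(-30, 31, 10)]
print(best_heading(flow, candidates))  # (-10, 0)
```

Because cos *β*_{i} can never exceed 1, the summed projection is largest only for the template in which every *β*_{i} is zero, which is the property the template models exploit.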

_{H} = −10°, elevation El_{H} = 0°). The most active unit is the one tuned to (−10°, 0°), and so the model is able to extract a heading direction from 2-D patterns of image motion generated during pure translation of the observer. These template models have been shown to be robust to the effect of input noise (perturbation of velocity-vector directions) and to account for a wide range of MSTd neuron properties (Perrone & Stone, 1994; Perrone & Stone, 1998).

*V*_{T} in the flow field is perturbed by the image-motion vector *V*_{R} created by the rotation alone, such that the resultant motion is given by the vector sum of the two (*V*_{T+R}). An array of MSTd-like heading templates (Figure 2b) incorrectly signals that the heading direction is at (20°, 0°). The addition of flow vectors caused by rotation is what makes the heading-estimation problem so hard (Regan & Beverley, 1982).

*V*_{T+R} vector at each image location depending on the depth separation of the points. The remaining flow information used to determine the heading (and depth) represents only a fraction of the original flow field.

*V*_{T+R} vector to determine the rotation. The Perrone (1992) scheme used a projection of the vectors onto candidate template rotation directions, and except for very distant points, this was only a small part of the full vector. The Beintema and van den Berg (1998) templates only measure the rotation for directions orthogonal to the radial-translation vectors. The dynamic-perspective-cue models (H. R. Kim et al., 2015; Sunkara et al., 2016) rely on small deviations from planar motion that become significant only at the edges of the flow field and when the field of view is relatively large. The visual motion in the central region of the flow field lacks the parallax information used for determining the rotation.

_{T} flow field and heading

*V*_{T+R} − *V*_{R}) to recover the pure translation vector *V*_{T} by adding a known efference signal to each of the MSTd heading templates (Perrone & Krauzlis, 2008).

*V*_{R}) depends only on the rotation rate and direction. Therefore, for a particular location in the image, the projection of the *V*_{R} vector onto the radial direction specified by a particular heading template can be calculated. The rotation vector could also be subtracted directly from the *V*_{T+R} vector at that location, but psychophysical evidence suggests that such local vector subtraction does not occur in the human visual system (Beintema & van den Berg, 2001). In order to allow for this constraint, we showed that local vector subtraction is equivalent to a particular amount of activity being subtracted from the individual heading templates (Perrone & Krauzlis, 2008). Figure 3 shows this process being applied to the template activity generated by the flow field shown in Figure 2a.

*V*_{T} flow field for depth estimation (Perrone & Krauzlis, 2008).

*R*_{w}) and so the visual-image motion on the retina is perturbed in the same way as when just the eye or head rotates (Figure 2a), but the brain has not sent any signals indicating that the eye should move relative to the head or the head relative to the body. Curvilinear self-motion generates image rotation, but there is no efference signal available to cancel it. The fact that we can solve the rotation problem while driving indicates that proprioceptive signals from our lower limbs are not essential for this process.

*V*_{T+R} − *V*_{R})? This is an empirical question that can be answered using models of heading estimation such as already described (Perrone & Krauzlis, 2008), and I address it later.

*V*_{T+R})_{i}, how precise must the estimate of *V*_{R} be in order that (*V*_{T+R} − *V*_{R})_{i} results in sufficiently precise *V*_{Ti} values across the image to provide heading estimates (Az_{H}, El_{H}) with a precision that matches human psychophysical data? The threshold *σ*_{H} for heading estimation with and without added rotation has been established as being in the region of 1.0°–1.5°, and this band is also often cited as the precision required for safe navigation (Cutting, 1986). Studies that have examined the threshold for heading during curvilinear self-motion are less common, but the indication is that it is also in the 1.0°–1.5° range for moderate rotation rates (Banks et al., 1996; Stone & Perrone, 1997; W. H. Warren, 2003; W. H. Warren et al., 1991). I will use the study by W. H. Warren et al. (1991) as a representative example of these studies, because it used a range of stimulus conditions and explicitly reported on the heading thresholds. That study found heading thresholds in the region of 1.0°, but that depended on the type of simulated 3-D environment and the rate of rotation. Higher rates of rotation generated higher thresholds (∼1.5°), as did ground planes.

*σ*_{r}). It has been reported that the rotation threshold is a function of the frequency of the test stimulus, but the 1°/s value is close to the estimate for an intermediate test frequency (0.2 Hz) that was obtained from a model fit to a range of test frequency data (Grabherr et al., 2008).

*σ*_{r}) to the rotation-compensation stage (*V*_{T+R} − *V*_{R}), we can estimate the variability (standard deviation) in the heading estimates and compare this to human performance. This can be done separately for the *V*_{R} vector magnitude (rate) and for the direction. This estimation assumes that a mechanism similar to that shown in Figures 1 and 2 underlies human heading perception, with some type of vector-subtraction process involved (Perrone & Krauzlis, 2008). Given that the technique outlined in Figure 2 uses all the available motion information and noise-free vectors, it could be considered an ideal-observer model with derived *σ*_{H} values representing optimum heading performance. Note that estimates of *V*_{R} can also lack accuracy and can be biased to higher or lower rates. This would predict biases in heading estimates; these will be discussed separately later when the full model is tested.

*V*_{R}|, *ϕ*_{R}). From each vector (|*V*_{T+R}|, *θ*_{T+R}) in the test flow field is subtracted the equivalent of a compensation vector (|*V*_{R}| + *N*_{r}, *ϕ*_{R} + *N*_{d}), using the rules of vector subtraction. *N*_{r} and *N*_{d} are noise components for rate and direction, respectively, drawn from normal distributions with means (*μ*_{r}, *μ*_{d}) = (0°/s, 0°) and standard deviations (*σ*_{r}, *σ*_{d}). The vector subtraction occurs at the level of the heading detectors (see Perrone & Krauzlis, 2008), and the noise is assumed to arise from a single rotation-detecting unit. The same level of noise is therefore assigned to each heading detector. It is assumed that this is the noise that is applied during the epoch of the flow field (1/15 s in the model simulations). The noise occurring at the local vector level and over many frames was simulated by running multiple trials (30) and generating a different flow field for each heading tested. Values of (0, 0.1, 0.18, 0.5, 1.0)°/s were used for *σ*_{r}. The *σ*_{d} values tested were (0°, 6°, 15°, 20°, 25°). When the *σ*_{d} values were tested, *σ*_{r} was set to 0.18°/s.
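
The noisy compensation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the results: the function name and the (speed, direction) tuple layout are hypothetical, and the subtraction is performed directly on the vectors rather than at the level of the heading detectors. A single noise sample is drawn per call, so the same perturbed rotation vector is removed from every flow vector, in keeping with the single rotation-detecting unit assumed above.

```python
import math
import random

def subtract_noisy_rotation(flow, rate, phi_deg, sigma_r, sigma_d, rng):
    """Subtract one noisy rotation vector from every (speed, direction_deg) flow vector.

    rate, phi_deg: true rotation-induced image speed |V_R| and direction phi_R.
    sigma_r, sigma_d: standard deviations of the rate and direction noise.
    rng: a random.Random instance, so that trials are reproducible.
    """
    noisy_rate = rate + rng.gauss(0.0, sigma_r)                  # |V_R| + N_r
    noisy_phi = math.radians(phi_deg + rng.gauss(0.0, sigma_d))  # phi_R + N_d
    rx = noisy_rate * math.cos(noisy_phi)
    ry = noisy_rate * math.sin(noisy_phi)
    compensated = []
    for speed, theta_deg in flow:
        theta = math.radians(theta_deg)
        vx = speed * math.cos(theta) - rx   # component-wise vector subtraction
        vy = speed * math.sin(theta) - ry
        compensated.append((math.hypot(vx, vy), math.degrees(math.atan2(vy, vx)) % 360.0))
    return compensated

# One hypothetical vector: translation (2, 90 deg) plus rotation (1, 0 deg).
combined = [(math.hypot(1.0, 2.0), math.degrees(math.atan2(2.0, 1.0)))]
clean = subtract_noisy_rotation(combined, 1.0, 0.0, 0.0, 0.0, random.Random(0))
noisy = subtract_noisy_rotation(combined, 1.0, 0.0, 0.18, 6.0, random.Random(0))
print(clean, noisy)
```

With both standard deviations set to zero the original translation vector is recovered; with nonzero values the recovered vector is perturbed, and that perturbation is what propagates into the heading estimates.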

*c*_{i} = −3° to 3° in 0.1° steps), the proportion of times that the model's heading estimate was greater than *c*_{i} (hits) was plotted against the proportion of times the model's estimates for the 0° test input were greater than *c*_{i} (false alarms). This procedure produces a receiver-operating-characteristic curve (Gegenfurtner et al., 2003; Green & Swets, 1974) for each of the test heading directions. The area under each curve provides a proportion-correct value equivalent to that found from a two-alternative forced-choice psychophysical procedure (Green & Swets, 1974). An example of the derived proportion-correct values from the simulation is shown in Figure 5a, fitted using a cumulative Gaussian function (solid curves). The standard deviation of this function was divided by

*σ*_{r} and *σ*_{d} values listed previously are plotted in Figures 5b and 5c, with the error bars representing the standard deviation derived from 12 runs of the threshold-estimation procedure. For rate sensitivity (Figure 5b), it is apparent that the rotation precision (1.0°/s) that can be derived from a purely vestibular source (MacNeilage, Turner, & Angelaki, 2010) is insufficient to account for the human heading-threshold performance found by W. H. Warren et al. (1991; 1.0°, dashed line). It would result in heading discrimination thresholds of around 7° at the test rotation rate used in their experiment (1.36°/s). The rotation-rate discrimination performance needs to be around 0.2°/s to explain the human data (see arrow in Figure 5b).
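
The receiver-operating-characteristic step used to obtain the proportion-correct values can be sketched as below. This is an illustrative Python sketch only; the function name, the trapezoidal integration, and the toy inputs are assumptions, and the subsequent cumulative-Gaussian fit is not shown. For each criterion, the hit rate (estimates for a test heading exceeding the criterion) is paired with the false-alarm rate (estimates for the 0° heading exceeding it), and the area under the resulting curve is returned.

```python
def roc_area(signal_estimates, noise_estimates, criteria):
    """Area under the ROC curve traced by sweeping a criterion over both sets of estimates."""
    points = []
    for c in criteria:
        hit = sum(e > c for e in signal_estimates) / len(signal_estimates)
        false_alarm = sum(e > c for e in noise_estimates) / len(noise_estimates)
        points.append((false_alarm, hit))
    points.extend([(0.0, 0.0), (1.0, 1.0)])  # anchor both ends of the curve
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return area

# Criteria from -3 deg to 3 deg in 0.1 deg steps, as in the text.
criteria = [-3.0 + 0.1 * k for k in range(61)]
separable = roc_area([2.0, 2.0, 2.0], [0.0, 0.0, 0.0], criteria)
identical = roc_area([0.0, 1.0], [0.0, 1.0], criteria)
print(separable, identical)
```

Perfectly separable estimates give an area of 1 and indistinguishable ones give 0.5, which are the ceiling and chance levels of the equivalent two-alternative forced-choice task.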

*ϕ* of *V*_{R} (Figure 5c), the precision of the estimates is a little less demanding; for the low rotation rate used in the test, at least, a modest *σ*_{d} value of 8° to 10° (arrow in Figure 5c) still produces heading-discrimination values less than or equal to the 1.0° human-data line. Currently there are no human data indicating what the vestibular direction-sensitivity *σ*_{d} equivalent value is for the *σ*_{r} = 1.0°/s found by MacNeilage, Turner, and Angelaki (2010).

*σ*_{r} ≤ 0.2°/s). The direction component of the rotation is less demanding but still requires a precision of around 10° if heading performance is not to be overly compromised (Figure 5c).

*x*, *y*) during a combined forward translation of the eye (creating image motion *V*_{T}) and a body rotation about the vertical and horizontal world axes. Let the yaw and pitch rates equal (**ω**_{Y}, **ω**_{P}), such that the body rotation generates image motion *V*_{R}. Roll rotation around the axis coinciding with the heading direction (**ω**_{Roll}) is not considered in the model because it creates a rotation vector orthogonal to the pure translation vector *V*_{T}, and hence does not change the length of this vector. It can therefore be ignored in the equations to follow.

*V*_{R} vectors all have the same magnitude and direction. My technique for deriving the rotation is designed to be tested with video sequences derived from standard video cameras and to be used for computer-vision systems with regular cameras. I therefore assume a planar projection from the world onto the image plane. For wide-angle lenses this creates a pincushion distortion when a rotation vector field is projected onto the plane, and the vectors are then no longer all identical. The value of *V*_{R} varies slightly depending on which part of the image the motion is occurring in. However, the amount of distortion is known, given knowledge of the optics (e.g., the field of view of the camera) as well as the vector location, and can be compensated for prior to the extraction of *V*_{R}. For the field-of-view values tested in this article the distortion is very small and has minimal impact on the estimates of rotation. The vectors in the flow field are assumed to be derived from video image sequences using some sort of flow-extraction algorithm (e.g., Perrone, 2012). If the model is to be used for wide field-of-view tests (>90°), it is recommended that a pincushion compensation algorithm be applied to the input image sequence (e.g., Bouguet, 2015). For extremely wide-angle inputs (>120°), some sort of hemispheric sensor system could be considered. In order to demonstrate the optimum performance of the model in the experiments reported later, I have applied a distortion-compensation algorithm to the flow field, and all of the *V*_{R} vectors are close to being identical.

**ω**_{Y}, **ω**_{P}) creates image motion at (*x*, *y*) given by the vector *V*_{T+R}. The magnitude of the vector *V*_{T} is unknown because it depends on the (unknown) distance of the point in the world generating the image motion at (*x*, *y*). We do know that the direction of *V*_{T} is pointed outward from the FOE location (Gibson, 1950) and is aligned with the radial direction out from a retinal position (*x*_{H}, *y*_{H}) corresponding to the unknown heading direction (Az_{H}, El_{H}). Figure 6b depicts the problem that must be solved: Given *V*_{T+R}, is it possible to find *V*_{R} (and hence *V*_{T})? For the compensation mechanism, we require just *V*_{R}, and not the rotation relative to the world; the removal of the rotation is all done in a retinal coordinate system (Perrone & Krauzlis, 2008). If we can find *V*_{R}, it is possible to recover (**ω**_{Y}, **ω**_{P}), but it requires a transformation from retinal coordinates to world coordinates (Koenderink & van Doorn, 1975; Longuet-Higgins & Prazdny, 1980).

*V*_{T} is dependent on unknown variables such as the depth of the points in the world, but we can derive it from *V*_{T+R} once we have estimated *V*_{R}, because they are linked by the rules of vector addition. In the treatment to follow, all angles are measured relative to arbitrary horizontal and vertical axes on the image plane, with 0° corresponding to the rightward horizontal direction.

*V*_{T+R} is equal to *θ* and its magnitude is |*V*_{T+R}|. Let *α* be the angle between the unknown FOE position (*x*_{H}, *y*_{H}) and the location of the *V*_{T+R} vector (*x*_{i}, *y*_{i}). Let the angle of the (unknown) *V*_{R} vector be *ϕ*. Note that this is defined in retinal coordinates. Using trigonometry, it can be shown that the magnitude of *V*_{R} is related to *V*_{T+R}, *α*, and *ϕ* via the following equation:
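
One way to arrive at a relation of this kind is to resolve the vector sum *V*_{T+R} = *V*_{T} + *V*_{R} along and perpendicular to the candidate rotation direction *ϕ*. The derivation below is a sketch under the angle definitions just given; the unknown translation magnitude |*V*_{T}| is introduced only as an intermediate quantity and is eliminated in the last step.

```latex
\begin{aligned}
|V_{T+R}|\cos(\theta-\phi) &= |V_R| + |V_T|\cos(\alpha-\phi),\\
|V_{T+R}|\sin(\theta-\phi) &= |V_T|\sin(\alpha-\phi),\\
\Rightarrow\quad |V_R| &= |V_{T+R}|\cos(\theta-\phi)
  - \frac{|V_{T+R}|\sin(\theta-\phi)}{\tan(\alpha-\phi)}.
\end{aligned}
```

The result has a cosine projection term minus a correction built from the sine component and a tangent of (*α* − *ϕ*), and it is undefined when *α* − *ϕ* = 0.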

*V*_{R} vector is primarily determined by the cosine component of the *V*_{T+R} vector projected onto the axis defined by the direction of the *V*_{R} vector (*AB* in Figure 6b). The projected vector length is given by |*V*_{T+R}|cos(*θ* − *ϕ*), and *V*_{R} can be found from this by subtracting a correction factor that is a function of |*V*_{T+R}|sin(*θ* − *ϕ*) and a tangent term dependent on the heading and rotation directions (Equation 1). For very distant points, the magnitude of the *V*_{T} vector is very small and *V*_{T+R} ≈ *V*_{R}. Therefore, the rotation could be determined approximately by summing just the |*V*_{T+R}|cos(*θ* − *ϕ*) values across many vectors and a (constrained) set of potential *ϕ*_{i} values (e.g., 0°–330° in 30° steps). *V*_{R} could then be determined by finding the *ϕ*_{i} value with the maximum activity. This is the scheme I originally suggested for purely visual estimation of rotation (Perrone, 1992). For some scenes with a dominance of distant points, it provides reasonably accurate results (see figure 9 in Perrone, 1992). However, for many situations the rotation rate is overestimated, and large errors for *V*_{R} and *ϕ* can occur. For this reason, the full and exact version of the rotation equation with the *α* and *ϕ* values has been adopted.

*ϕ* and *α*. The first, *ϕ*, is constrained to lie between 0° and 360° and, as will be shown later, can be subsampled relatively sparsely (30° steps) without a great loss of precision in the rotation-direction estimates. The second unknown, *α*, is directly related to the vector location (*x*_{i}, *y*_{i}) and the (unknown) heading direction (Az_{H}, El_{H}):

*x*_{i}, *y*_{i}) is the image location of the vector and FOE_{x} and FOE_{y} are the *x*- and *y*-coordinates of the unknown FOE that results from movement along the unknown heading direction (Az_{H}, El_{H}).
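
A radial direction of this kind can be computed with a four-quadrant arctangent, as in the short Python sketch below. The function name and the counterclockwise-positive convention are assumptions made for illustration; only the convention that 0° is the rightward horizontal direction comes from the text.

```python
import math

def radial_angle_deg(x_i, y_i, foe_x, foe_y):
    """Direction (degrees in [0, 360)) of the line from the FOE out to (x_i, y_i).

    0 deg is the rightward horizontal direction; angles are assumed to increase
    counterclockwise.
    """
    return math.degrees(math.atan2(y_i - foe_y, x_i - foe_x)) % 360.0

# Hypothetical vector location and candidate FOE, in image units.
print(radial_angle_deg(5.0, 5.0, 0.0, 0.0))
```

Using `atan2` rather than a plain arctangent of the ratio keeps the angle in the correct quadrant for vectors on either side of the FOE.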

*V*_{R} is undefined for particular values of *α* and *ϕ* such that *α* − *ϕ* = 0. We can examine the way *V*_{R} changes across different values of *ϕ* in order to gain an intuition as to how the new curvilinear detection algorithm works by considering the simple situation depicted in Figure 7a. I will begin with some very strong simplifying assumptions. Only four points are assumed to be visible in the visual field (origins of vectors in Figure 7a), and the forward motion is set so that it is directed toward the center of the field (Az_{H}, El_{H}) = (0°, 0°). This latter assumption is in place solely to illustrate one particular property of the new mechanism, and it will be removed later. At the same time that it is moving forward, the body is assumed to be rotating to the right at 4°/s about a vertical axis through the center of rotation of the head/body (**ω**_{Y} = 4°/s, **ω**_{P} = 0°/s); this generates the image motion indicated by the red vectors.

*V*_{R}| derived from Equation 2 for each of the four different vectors in Figure 7a and for a range of candidate *ϕ* values (*x*-axis). The value of *α* for each vector is assumed to be known exactly in this plot, and *ϕ* has been sampled finely in 1° steps so that the curves are smooth. Also, combinations of *α* and *ϕ* that result in undefined values of |*V*_{R}| (see earlier) have been avoided by not considering |*V*_{R}| values greater than 12°/s. All of the curves cross (intersect) at the actual curvilinear rotation rate (4°/s) and the angle 180° from the actual plane of rotation (0°). This 180° flip is because the algorithm is detecting the image-motion direction, which is opposite to the actual direction of rotation.

*V*_{R}| and *ϕ* into a set of candidate rates (e.g., 0°/s to 16°/s in 1°/s steps) and directions (e.g., 30° steps), it is possible to see that the most commonly occurring value of |*V*_{R}| from Equation 2 occurs at the correct rate and direction (Figure 7c). The technique relies on the fact that the correct |*V*_{R}| value and direction occur the most often. Because image motion that results from rotation is independent of the depth of the points in the world (Koenderink & van Doorn, 1975; Longuet-Higgins & Prazdny, 1980), it is constant across the image (assuming the wide-angle edge-distortion effects have been corrected), as can be seen by the red vectors in Figure 7a. Each image location has the same 2-D-motion rotation component.
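
The binning argument can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than the published algorithm: the relation inside `rotation_rate` is the one that follows from vector addition (|*V*_{R}| = |*V*_{T+R}|cos(*θ* − *ϕ*) − |*V*_{T+R}|sin(*θ* − *ϕ*)/tan(*α* − *ϕ*)), the radial directions *α* are taken as known, and the four vectors, their translation speeds, and all function names are hypothetical.

```python
import math
from collections import Counter

def rotation_rate(speed, theta_deg, alpha_deg, phi_deg):
    """|V_R| implied by one flow vector for a candidate alpha and phi; None if undefined."""
    s = math.sin(math.radians(alpha_deg - phi_deg))
    if abs(s) < 1e-6:  # alpha - phi = 0 (or 180 deg): the relation is undefined
        return None
    c = math.cos(math.radians(alpha_deg - phi_deg))
    d = math.radians(theta_deg - phi_deg)
    return speed * math.cos(d) - speed * math.sin(d) * c / s

def vote(flow, phis=range(0, 360, 30), max_rate=16):
    """Count how often each (rate bin, phi) pair is produced across the flow vectors."""
    votes = Counter()
    for speed, theta_deg, alpha_deg in flow:
        for phi in phis:
            r = rotation_rate(speed, theta_deg, alpha_deg, phi)
            if r is not None and 0 <= round(r) <= max_rate:
                votes[(round(r), phi)] += 1
    return votes

def combine(t, alpha_deg, rate, phi_deg):
    """Vector sum of a translation vector (t, alpha) and a rotation vector (rate, phi)."""
    a, p = math.radians(alpha_deg), math.radians(phi_deg)
    vx = t * math.cos(a) + rate * math.cos(p)
    vy = t * math.sin(a) + rate * math.sin(p)
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))

# Four hypothetical vectors with different (depth-dependent) translation speeds,
# all sharing the same rotation component: 4 deg/s of image motion toward 180 deg.
flow = [combine(t, a, 4.0, 180.0) + (a,)
        for t, a in [(2.0, 30), (3.5, 100), (1.0, 200), (5.0, 310)]]
votes = vote(flow)
print(votes.most_common(1)[0])  # ((4, 180), 4)
```

Only the bin at the true rate and direction collects a vote from every vector, because the rotation component is the one quantity that is common to all image locations.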

*V*_{R}| and *ϕ* value. The tuning would come about via a particular (sine) weighting (based on Equation 2) being applied to the connection between the motion sensors at particular (*x*, *y*) image locations and the curvilinear detector unit. Neurons tuned to |*V*_{R}| and *ϕ* would generate the most activity when the actual curvilinear rotation matched |*V*_{R}| and *ϕ*.

*α* in Equation 2 were known). This is obviously a big assumption and an unrealistic one, because we are attempting to find the rotation so that we can derive the heading (see Figure 3), and so the algorithm becomes circular. This assumption can be removed by testing a number of candidate heading directions in the same way that heading estimation has been carried out (see Figures 1 and 2). We have previously shown (Perrone, 1992; Perrone & Stone, 1994; Perrone & Stone, 1998) that heading can be sampled with a sparse array of candidate directions. For the simulations presented in the remainder of the article I will use a set of azimuth and elevation values that range from −80° to 80° in 10° steps (a 17 × 17 regular array), but other configurations are possible (see Perrone & Stone, 1994).

_{H} = −60° to 60° in 30° steps, El_{H} = 0°) to prevent the plot becoming too dense with lines. If we base the values of *α* in Equation 2 on this smaller set of (Az_{H}, El_{H}) values (using Equation 3), we obtain the *ϕ*-versus-|*V*_{R}| plot shown in Figure 8.

*V*_{R}| values is much more complex. This is for the case of a subset of the full 17 × 17 array of possible (Az_{H}, El_{H}) values, but even with this small number, the *ϕ*-versus-|*V*_{R}| plot has many more overlapping curves. Applying the same binning technique that was used in Figure 7 reveals that the peak activity occurs at the incorrect (|*V*_{R}|, *ϕ*) location of (3°/s, 150°). This is an extreme case, and it will be shown later that a larger number of vectors will usually result in the correct peak being identified, but I am using it to illustrate how vision alone (at least for a small number of vectors) is not adequate for curvilinear detection based on the algorithm specified in Equation 2.

*V*_{R}|, *ϕ*) derived from the curvilinear detection mechanism just outlined (Equation 2).

*ϕ*. The semicircular canals, possibly in conjunction with the otoliths (Angelaki & Cullen, 2008), could signal that rotation is occurring in a plane somewhere in the region *ϕ* ± *γ* (Figure 9a). If, for example, the actual plane of rotation = 0° (i.e., *ϕ* = 180°) and *γ* = 30°, then the solution space could be constrained to the region 180° ± 30° (see vertical dashed lines in Figure 9a). This constraint greatly reduces the possible curvilinear solutions and eliminates many of the incorrect regions of overlap in the (|*V*_{R}|, *ϕ*) space (compare Figures 8a and 9a).

*V*_{R}|, *ϕ*) solution space.

*α* values used. There is psychophysical evidence that a vestibular signal indicating forward translation can influence the perception of curvilinear paths (Bertin & Berthoz, 2004). However, psychophysical measurements of heading-discrimination ability from purely vestibular sources indicate a range of estimation thresholds from medium values in the region of 6°–9° (MacNeilage, Banks, DeAngelis, & Angelaki, 2010; Telford et al., 1995) through to very high thresholds of 30° (Nooij et al., 2016). If vestibular heading were to be included as a constraint, it could not be a very narrow one based on human data. Conflicting evidence from monkey studies indicates quite precise heading-discrimination thresholds (vision: 1.2°; vestibular: 1.1°; both: 0.4°), with a strong influence of a vestibular heading signal (Fetsch et al., 2009; Fetsch, DeAngelis, & Angelaki, 2010). I have found the addition of an *α* constraint to have minimal impact on the performance of the curvilinear model over the relatively narrow range of heading directions currently tested. It may become more useful when heading is allowed to span a wider range. For now, I have omitted this possible source of vestibular information, but I will revisit the option once more human psychophysical data are available.

*V*_{R}|, *ϕ*) units.

*V*_{R}| is 5°/s to the left around a vertical axis. The image-motion direction *ϕ* we are trying to determine is therefore 180°. Given the observer's line-of-sight direction, the (unknown) heading is (Az_{H}, El_{H}) = (10°, −10°), shown as the red square in Figure 10a.

*ϕ* values used 1° steps to provide continuity to the curves in the plots, but values between 0° and 330° in 30° steps will now be used. The plane of rotation is actually constrained to the range 0° to 150° if both positive and negative values of *V*_{R} are considered, but for ease of plotting, *V*_{R} is restricted to positive values and a wider *ϕ* range has been adopted. Much of this range for the plane of rotation is probably out of the bounds encountered during normal human locomotion (except perhaps for the directions close to the cardinals). The rate *R* is sampled in 1°/s steps from 0°/s to 16°/s, but this is arbitrary and could be sampled more or less finely and over a greater range if future data become available as to the limits of human curvilinear path detection.

*α* in Equation 2 is sampled using an array of candidate heading directions spanning −80° to 80° in 10° steps for both azimuth and elevation. There is no specific justification for these parameter selections other than the fact that the model performs well with these settings and they simplify the coding of the model simulations. Heading space could be sampled using nonlinear schemes that extend over a greater range (see Perrone & Stone, 1994). The model performance turns out to be robust to the choice of *α* sampling values.

*V*_{R}|, *ϕ*) location by the peak of the distribution, so that the values range from 0 to 1.0. The input vector field produces a distribution with a peak close to the true value, but it is quite broad (Figure 10b). In order to refine the *V*_{R} estimate, a vestibular signal is added to the distribution in the form of a 2-D Gaussian with amplitude = 1.0, mean (*μ*_{R}, *μ*_{ϕ}) = (|*V*_{R}|, *ϕ*), and standard deviations *σ*_{r} and *σ*_{d} for the *R* and *ϕ* directions, respectively. For the Figure 10 simulation, *σ*_{r} was set to 1.0°/s based on the data from MacNeilage, Turner, and Angelaki (2010), and *σ*_{d} was set to 30°. Note that the *ϕ* direction for the visual signals is in a retinal coordinate system. This is what the vector operation in the rotation-compensation mechanism requires. However, the signal from the vestibular system is assumed to be relative to the world (defined by gravity). If the head is tipped sideways, for example, while moving around a curved path, then the visual and vestibular *ϕ* values need not line up. I am assuming that the appropriate transformations have occurred (Angelaki & Cullen, 2008) and that the added vestibular rotation-direction signal is aligned with the correct *ϕ* direction.
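
The visual–vestibular combination step can be sketched in Python as follows. The function name, the nested-list layout of the distribution, and the wrapping of the direction difference into ±180° are assumptions made for illustration; the unit amplitude and the example values *σ*_{r} = 1.0°/s and *σ*_{d} = 30° are taken from the text.

```python
import math

def add_vestibular(visual, rates, phis, mu_r, mu_phi, sigma_r, sigma_d):
    """Add a unit-amplitude 2-D Gaussian centred on (mu_r, mu_phi) to a visual distribution.

    visual[i][j] is the normalised visual activity for rate rates[i] and direction phis[j].
    """
    combined = []
    for i, r in enumerate(rates):
        row = []
        for j, phi in enumerate(phis):
            d_phi = (phi - mu_phi + 180.0) % 360.0 - 180.0  # wrapped direction difference
            g = math.exp(-0.5 * ((r - mu_r) / sigma_r) ** 2
                         - 0.5 * (d_phi / sigma_d) ** 2)
            row.append(visual[i][j] + g)
        combined.append(row)
    return combined

rates = list(range(17))           # 0 to 16 deg/s in 1 deg/s steps
phis = list(range(0, 360, 30))    # 0 to 330 deg in 30 deg steps
visual = [[0.0] * len(phis) for _ in rates]  # placeholder visual distribution
combined = add_vestibular(visual, rates, phis, 5.0, 180.0, 1.0, 30.0)
print(combined[5][6])  # 1.0
```

With an all-zero visual input, as used here, the output is just the vestibular Gaussian; in practice the visual distribution would already carry its own broad peak and the Gaussian would sharpen it.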

*ϕ* axis is assumed to be very broad (*σ*_{d} = 180°), because with zero rotation, direction is not defined. The values of *σ*_{d} for *R* = 1°/s–3°/s were similarly set to 100°, 90°, and 45°, respectively, in the model to reflect a lack of direction precision at low rotation rates. These values were not found to be critical in the simulations reported in this article.

*ϕ* angles of 150°, 180°, and 210°. The distributions of *R*_{est} values are not symmetrical, and so it is possible that an estimate for *R* based on the centroid could be biased upward if too much of the long tail is included. A more accurate estimate would result if some of the activity were excluded from the centroid estimate via some sort of thresholding that prevents activity below a certain level being passed to the centroid-estimation stage. On the other hand, the elimination of too much activity (by using a very high threshold value) removes information that could help determine the correct *R* or *ϕ* value. I have opted for a threshold value of 0.55 of the peak across the whole distribution (see dashed line in Figure 11a) because it produces the most accurate rotation estimates for the stimulus conditions tested. This value has been fixed and used in all of the simulations described in this article, except for one where human psychophysical data were matched (see Figure 16 later). Therefore, for each candidate rotation direction *ϕ*_{j}, a weighted vector average is calculated to provide an estimate of the rotation rate *R*_{est}:

*nr* is the number of *R* values with nonzero VVA values after the threshold has been applied and *R* = 0°/s to 16°/s in 1°/s steps. For the test shown in Figure 10, the *R*_{est} value comes out as 5.0°/s for the 180° direction, which is a perfect match to the true 5°/s value. A small number of other *ϕ* directions also generate nonzero *R*_{est} outputs, and the final output is assumed to be the maximum across the 12 different *ϕ*_{j} angles.
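
The thresholded centroid can be illustrated with the following Python sketch. The exact weighting used in the model's equation is not reproduced here; an activity-weighted mean over the rates that survive the threshold is assumed, and the function name and the toy activity profile are hypothetical.

```python
def rate_estimate(activity, rates, peak, threshold=0.55):
    """Centroid of the rates whose activity is at or above threshold * peak.

    activity: activity values along the rate axis for one candidate direction.
    peak: peak activity across the whole distribution (all rates and directions).
    """
    kept = [(a, r) for a, r in zip(activity, rates) if a >= threshold * peak]
    total = sum(a for a, _ in kept)
    if total == 0:
        return 0.0  # nothing survives the threshold for this direction
    return sum(a * r for a, r in kept) / total

rates = list(range(17))  # 0 to 16 deg/s in 1 deg/s steps
# Hypothetical asymmetric profile: peak at 5 deg/s with a longer tail toward higher rates.
activity = [0.0, 0.0, 0.0, 0.2, 0.8, 1.0, 0.8, 0.5, 0.3] + [0.0] * 8
with_threshold = rate_estimate(activity, rates, peak=1.0)
without_threshold = rate_estimate(activity, rates, peak=1.0, threshold=0.0)
print(with_threshold, without_threshold)
```

In this toy profile the thresholded centroid sits at the peak (5°/s), whereas including the whole tail pulls the estimate upward, which is the bias the threshold is intended to limit.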

*ϕ*_{est}, the nonzero *R*_{est} values for all directions *ϕ*_{j} are represented as vectors (blue vectors in Figure 11b). For each angle *ϕ*_{j} (*j* = 0° to 330° in 30° steps), the individual vectors represented by magnitude *R*_{est}(*ϕ*_{j}) and direction *ϕ*_{j} are summed using vector summation, and the angle of the resultant vector is used as an estimate of the rotation direction *ϕ*_{est}. This scheme makes use of interpolation between the 30° sampling used for *ϕ*_{j} and tends to be more accurate than a mechanism that uses just the maximum *R*_{est} direction. For the test shown in Figure 10a, the rotation direction was found to be 179.0° (red vector in Figure 11b), which is very close to the true input direction (180°).
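
The vector-summation step can be sketched in a few lines of Python. The function name and the dictionary layout (candidate direction in degrees mapped to its *R*_{est} value) are assumptions made for illustration, and the input values are hypothetical.

```python
import math

def direction_estimate(rate_estimates):
    """Angle (degrees in [0, 360)) of the vector sum of the (R_est, phi_j) vectors.

    rate_estimates: dict mapping each candidate direction phi_j (degrees) to R_est(phi_j).
    """
    sx = sum(r * math.cos(math.radians(phi)) for phi, r in rate_estimates.items())
    sy = sum(r * math.sin(math.radians(phi)) for phi, r in rate_estimates.items())
    return math.degrees(math.atan2(sy, sx)) % 360.0

symmetric = direction_estimate({150: 2.0, 180: 5.0, 210: 2.0})
skewed = direction_estimate({150: 1.0, 180: 5.0, 210: 2.0})
print(symmetric, skewed)
```

When the flanking directions carry unequal activity the resultant angle falls between the 30° samples, which is how the scheme interpolates beyond the sampling resolution.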

*R*_{est}, *ϕ*_{est}) unless one has knowledge of the observer's forward speed *T*, since the radius = *T*/*R*. Similarly, obtaining the rotation relative to the world (**ω**_{Y}, **ω**_{P}) requires a transformation from the retinal-coordinate-frame-based *ϕ*_{est} value to a world coordinate frame, and this most likely requires a vestibular signal from the otoliths (Angelaki & Cullen, 2008). The retina-based *ϕ*_{est} output is sufficient for the purposes of rotation compensation, however, because the operation occurs in a retinal coordinate system (Perrone & Krauzlis, 2008).

*SD*s) for each of the different rotation rates were 0.13 (0.01), 0.08 (0.01), 0.08 (0.01), and 0.16 (0.01). For the direction estimates (Figure 12b), threshold values from the model were all below the 9° limit required for accurate heading estimation (Figure 5c) except for the very lowest rotation rate tested (0.75°/s). The model is therefore able to generate a sufficiently precise signal to account for the 1° heading threshold figure required for safe navigation.

_{H}, El_{H}) = (0°, 0°), and the rate and direction of the curvilinear rotation were limited to a few values to mimic the conditions used in the human psychophysical experiments. The field of view of the input flow fields was also quite narrow. In order to test the model more extensively, a wider range of input parameters was chosen.

*R* [0°/s, 10°/s]; test rotation angle *ϕ* [0°, 360°]; test azimuth heading Az_{H} [−20°, 20°]; test elevation heading El_{H} [−20°, 20°]. The field of view of the input flow fields was 60° × 60°, and the 3-D virtual world was made up of random dots occupying a range from 2 to 30 m so that around 120 dots were present in the field. The simulated observer's forward speed was 1.5 m/s. Tests were carried out with only the visual input (in which case the activity of the vestibular distribution in the model was set to 0) and for the situation where both vision and vestibular signals were available.

*R*_{test} − *R*_{est}). The root-mean-square error (RMS) was 0.91°/s in this case, and this will be the metric used to summarize the rotation-speed estimation performance of the model in the simulations that follow. For assessing direction error, a metric was developed based on that used in circular statistics to avoid wraparound problems: error_{d} = 1 − cos(*ϕ*_{test} − *ϕ*_{est}). This ranges from 0 (no error) to 2.0 (180° error). Figure 13c shows the direction error using this metric and plotted in the form of a scatterplot, with the mean shown in red. Figure 13d is a histogram showing the distribution of the errors across the 100 test trials. For the vision-only condition the mean error was 0.12, which corresponds to an angle error of 28.4°.
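
The two error metrics described above can be written compactly as follows. This is a sketch under the definitions given in the text; the function names are hypothetical, and the conversion back to an angle simply inverts error_{d} = 1 − cos(Δ*ϕ*).

```python
import math

def rate_rms_error(r_test, r_est):
    """Root-mean-square of (R_test - R_est) across trials, in deg/s."""
    diffs = [t - e for t, e in zip(r_test, r_est)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def direction_error(phi_test_deg, phi_est_deg):
    """error_d = 1 - cos(phi_test - phi_est); 0 for no error, 2 for a 180 deg error."""
    return 1.0 - math.cos(math.radians(phi_test_deg - phi_est_deg))

def error_to_angle_deg(error_d):
    """Angle (deg) whose direction error equals error_d (inverse of the metric)."""
    return math.degrees(math.acos(1.0 - error_d))

# The metric is insensitive to wraparound: 350 deg vs 10 deg is a 20 deg error.
print(direction_error(350.0, 10.0), direction_error(10.0, 30.0))

# Converting the reported vision-only mean error of 0.12 back to an angle.
print(f"{error_to_angle_deg(0.12):.1f} deg")
```

Because the cosine is even and periodic, the metric treats an estimate of 350° for a true direction of 10° the same as any other 20° miss, which is the wraparound property the text relies on.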

*SD*s) for the vision and vision-plus-vestibular conditions were 0.94 (0.06) and 0.31 (0.02), respectively. For direction, the values were 0.13 (0.03) and 0.06 (0.02).

*n* = 4, 8, 16, 32, 64, and 128) as well as the configuration of the simulated 3-D world (cloud, single vertical plane, or ground plane). In addition, the free parameters in the model were modified to demonstrate how sensitive it is to a particular choice of parameter values and to the effect of pincushion distortion introduced via planar-projection surfaces.

*SD*s) for the three cases (cloud, single plane, ground plane) were 0.27 (0.02), 0.38 (0.03), 0.26 (0.02) for rate and 0.046 (0.024), 0.059 (0.025), 0.05 (0.024) for direction.

*F*(2, 33) = 157.6, *p* < 0.001, but not for direction, *F*(2, 33) = 0.81, *p* > 0.05. Post hoc tests showed that the single plane was worse than both the cloud and the ground plane for rate, but the cloud and ground plane did not differ significantly from each other. The poorer performance with a single vertical plane mirrors what has been found in a number of human psychophysical heading studies (W. H. Warren, 2003). Even in the worst-case wall condition, the performance is still very good, though, and such low levels of error would support accurate heading estimation. The simulation shows that the model is relatively robust to the type of environment in which the curvilinear rotation is occurring and does not break down for cases such as the single vertical plane, which can cause problems for rotation-extraction techniques based on vector differencing (Rieger & Lawton, 1985; Royden, 1997).

*σ*_{R}; the spread in the angle direction, *σ*_{d}; and the threshold used for estimating the centroid, *R*_{thresh}. For all of the tests, these were set to 1.0°/s, 30°, and 55% of maximum, respectively. The value of 1.0°/s for *σ*_{R} is based on human psychophysical data from MacNeilage et al. (2010). The other values were set to optimize performance of the model. Each free parameter was decreased and increased by a fixed proportion to measure its impact on the model output. The test from Experiment 4 was repeated, and the percentage change in the RMS value (as a result of the parameter size variation) was measured.

_{rd}. For *σ*_{R} and *σ*_{d} the value was scaled by 0.5 and 1.5 (i.e., to 50% and 150% of its original value) and the percentage change in performance was assessed using the mean of the 50% RMS_{rd} change and the 150% RMS_{rd} change. For *σ*_{R} the change was only 8.8%, despite the large change in this free parameter's value. For *σ*_{d} it was 11.8%, also modest given the range of *σ*_{d}. Given that the threshold parameter *R*_{thresh} cannot exceed 1.0, the extended range tested was 75% and 125% (relative to 0.55), and this resulted in a mean change of just 8.5%. The model is therefore very tolerant to shifts in its free parameters, and the model performance values reported in this article are not contingent on the exact values chosen.

*R*_{est} and *ϕ*_{est} values with no pincushion compensation were compared to those with compensation, and a repeated-measures *t* test was carried out to see if the performance without compensation was significantly worse than with compensation. Performance was assessed via an error measure (|*R* − *R*_{est}| and |*ϕ* − *ϕ*_{est}|) for both *R* and *ϕ* separately, because the pincushion distortion introduces different types of perturbation to the vectors. A range of field-of-view values was tested: 60° × 60°, 90° × 90°, 120° × 120°, and 130° × 130°.

*R* at the center of the field. However, the high threshold *R*_{thresh} used in the model centroid mechanism means that these larger, but minority, values are excluded from the *R*_{est} centroid calculation and have minimal impact on the model outputs. Similarly, the perturbation of the vector angles caused by the pincushion distortion tends to be symmetrical, and the vector-summation mechanism used to derive *ϕ*_{est} (Figure 11b) is barely affected.

*R*_{est} and *ϕ*_{est} values from the activity distribution (Figure 11), then care should be taken to allow for the pincushion distortion at wide field-of-view angles. For example, when the value of *R*_{thresh} was dropped from 0.55 to 0.25, statistically significant drops in performance were noted when the field of view reached 90° × 90°. For *R*_{est}, *t*(99) = −5.39, *p* < 0.001. The mean *R*_{est} error was still very small (0.27°/s), but the result does indicate that if maximum model performance is the goal, then the pincushion distortion introduced through the use of a planar projection surface needs to be taken into consideration if alternative model parameters are adopted.

*R* and direction of rotation *ϕ* that the simulated observer is experiencing. Therefore, the model can be used to simulate the results from a number of early experiments on heading estimation where rotation was added to the translation of the observer.

*ϕ* = 180°. Banks et al. also used different proportions of simulated and actual rotation rates, but here I report on the data for cases in which the simulated rotation proportion was 1.0 (i.e., there were no actual eye movements made during the trial).

*R*_{est}, *ϕ*_{est}). The flow field was also passed through the heading template model (Figure 1) with templates tuned to −80° to 80° in 5° steps for both azimuth and elevation. The activity from the estimated rotation was subtracted from the template activity distribution (Perrone & Krauzlis, 2008).

*R*_{est} values for the *ϕ* = 180° direction from a typical trial in which the actual rotation rate was 7.5°/s. The histogram values on the left of the plot represent the model's vestibular-signal distribution (a Gaussian with *σ*_{r} = 1.0°/s). If the human observers in the Banks et al. (1996) study had not taken into account any vestibular information, their visual estimate would be around 7.1°/s for this trial (yellow arrow in Figure 16a). This is a slight underestimate, and if this degree of rotation compensation were applied to the heading model stage, the heading estimate would be close to 0° azimuth. The model simulation data (means and standard deviations as error bars) for the vision-only condition are shown as the yellow curve in Figure 16b.

*SD*s) for the 5°, 10°, 15°, and 20° headings were 2.9° (0.52°), 8.1° (0.64°), 13.5° (1.5°), and 19.7° (2.7°), respectively. The errors were therefore 2.1° (0.52°), 1.9° (0.64°), 1.5° (1.5°), and 0.3° (2.6°), with a mean of 1.5°—which is close to the “few degrees” reported for the size of the center-screen bias (W. H. Warren, 2003).
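
The error values quoted above follow from subtracting each mean estimate from its heading. A short check of that arithmetic, using only the numbers reported in the text (the variable names are illustrative):

```python
# Headings tested and the mean heading estimates reported in the text (deg).
true_headings = [5.0, 10.0, 15.0, 20.0]
mean_estimates = [2.9, 8.1, 13.5, 19.7]

# Error = heading minus estimate (a bias toward the center of the screen).
errors = [round(t - e, 1) for t, e in zip(true_headings, mean_estimates)]
mean_error = sum(errors) / len(errors)

print(errors)
print(f"mean error = {mean_error:.2f} deg")
```

The unrounded mean of the four errors is 1.45°, which the text reports as 1.5°.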

*R*_{est} were 0.44°/s, 0.5°/s, 0.77°/s, and 1.1°/s. This rotation signal mainly originates from the vestibular distribution in the model. If the system used a maximum or winner-take-all rule for establishing the rotation rate, then the estimate for *R*_{est} would be 0°/s in all heading cases and no bias would have occurred. If, as I suggest, the rotation estimate is derived from a centroid mechanism, then the spread of the vestibular distribution (*σ*_{r} = 1.0°/s) means that the centroid is shifted to nonzero values. The size of the shift depends on the spread of the vestibular distribution and the threshold level used to truncate the values used in the centroid-estimation mechanism (0.55 × peak used in model). A more systematic testing of the heading center-screen bias could be used to estimate what values for *σ*_{r} and the threshold are appropriate for human observers. The model also predicts that the bias should increase when displays with a larger field of view are used to test heading, because the inclusion of faster moving vectors in the periphery of the display leads to higher values of *R*_{est}.
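
The difference between a winner-take-all readout and a thresholded centroid can be illustrated with a simplified sketch that considers the vestibular distribution alone. Several things here are assumptions rather than details taken from the model: the rate axis is taken to be non-negative and sampled in 0.1°/s steps, values below the threshold are simply discarded, visual activity is ignored, and the function names are hypothetical.

```python
import math

def thresholded_centroid(rates, activity, thresh_frac=0.55):
    """Centroid of the activity values at or above thresh_frac * peak."""
    peak = max(activity)
    kept = [(r, a) for r, a in zip(rates, activity) if a >= thresh_frac * peak]
    total = sum(a for _, a in kept)
    return sum(r * a for r, a in kept) / total

def winner_take_all(rates, activity):
    """Rate at which the activity is largest."""
    return max(zip(activity, rates))[1]

# ASSUMPTION: rotation rates sampled from 0 to 10 deg/s in 0.1 deg/s steps.
rates = [i * 0.1 for i in range(101)]

# Vestibular signal for zero actual rotation: Gaussian centered on 0 deg/s
# with sigma_r = 1.0 deg/s (the spread quoted in the text).
sigma_r = 1.0
vestibular = [math.exp(-(r ** 2) / (2.0 * sigma_r ** 2)) for r in rates]

r_wta = winner_take_all(rates, vestibular)
r_centroid = thresholded_centroid(rates, vestibular, thresh_frac=0.55)
print(f"winner-take-all: {r_wta:.2f} deg/s, centroid: {r_centroid:.2f} deg/s")
```

Under these assumptions the maximum sits at 0°/s while the centroid is pulled to a positive rate, and raising *σ*_{r} or lowering the threshold widens the retained region and increases the shift.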

*ϕ* value set to 0° or 180°. We do not currently know if heading performance will be the same if the plane of rotation is not 0° or 180°, or what the vestibular threshold is for detecting a trajectory that is slightly different from 0°. I used a value for the spread (*σ*_{d}) of the vestibular distribution (Figure 10c) equal to 30° because this produced the best performance in the model. In the model I assumed that the visual and vestibular distributions align along the rotation-direction axis *ϕ* (Figure 10b and 10c), but how far can the vestibular estimate of *ϕ* be from the correct value and the visual estimate before the heading errors get too large?

*ϕ* angle in Equation 2, and I arbitrarily sampled it using a range of angles from 0° to 330° in 30° steps. Can humans judge their trajectory along a curved path when the plane of the trajectory is inclined at, say, 30° to the horizontal? For terrestrial animals, *ϕ* is most commonly experienced at 0° and 180°. Humans experience *ϕ* values around 90° and 270° when moving from flat ground to an upward or downward slope. Values of *ϕ* away from the cardinal directions are experienced when driving around an upward- or downward-sloping bend. Presumably we can recover the rotation correctly, and heading estimation is not compromised in these cases, but the experimental data addressing this are currently lacking. Tree-dwelling primates that swing from branches experience a much greater range of curvilinear paths (and a greater *ϕ* range), as do pilots; is their precision for detecting changes in *ϕ* greater than that of humans who experience curvilinear paths only in the vertical or horizontal planes?

*ϕ*_{est} value that is output from the curvilinear model is in a retinal coordinate frame, and it is not clear how this could ultimately be converted to the angle of the curved path in the world, that is, the yaw and pitch rates of the eye–head–body relative to the world (**ω**_{Y}, **ω**_{P}). The transformation of vestibular signals from head-centric to world coordinates requires the incorporation of signals from the otoliths to disambiguate alternative solutions as to the rotation direction relative to gravity (Angelaki & Cullen, 2008). Therefore, one would expect the ability to determine (**ω**_{Y}, **ω**_{P}) from *ϕ*_{est} to depend on whether or not the observer is moving forward during the rotation. This test, along with the determination of the true value for *σ*_{d} and measuring how accurate the vestibular *ϕ* estimate needs to be, is a relatively simple experiment for anybody with a moving-base simulator.

*V*_{T+R} and *θ* in Equation 2 need to be free of any other rotation components for the model to work. Other than that, there are many options for how the curvilinear detectors (Figure 10) could connect to the motion sensors extracting image velocity from the retinal flow.

*Annual Review of Neuroscience*, 31, 125–150, https://doi.org/10.1146/annurev.neuro.31.060407.125555.

*Vision Research*, 36 (3), 431–443.

*Vision Research*, 38 (14), 2155–2179.

*Vision Research*, 41, 2375–2391.

*Experimental Brain Research*, 154 (1), 11–21, https://doi.org/10.1007/s00221-003-1524-3.

*Perception*, 34 (4), 453–475, https://doi.org/10.1068/p5292.

*Vision Research*, 40 (21), 2951–2971.

*Camera Calibration Toolbox for Matlab*. Retrieved from http://www.vision.caltech.edu/bouguetj/calib_doc/

*Science*, 273 (5281), 1544–1547.

*Annual Review of Neuroscience*, 31, 389–410, https://doi.org/10.1146/annurev.neuro.29.051605.112953.

*The Journal of Neuroscience*, 12 (12), 4745–4765.

*Nature Neuroscience*, 1 (1), 59–63.

*Journal of Vision*, 10 (11): 23, 1–13, https://doi.org/10.1167/10.11.23.

*The Journal of Neuroscience*, 31 (32), 11617–11627, https://doi.org/10.1523/JNEUROSCI.1266-11.2011.

*Journal of Vision*, 12 (3): 12, 1–21, https://doi.org/10.1167/12.3.12.

*Cell Reports*, 15 (5), 1013–1023, https://doi.org/10.1016/j.celrep.2016.03.089.

*Journal of the Association for Research in Otolaryngology*, 15 (1), 87–102, https://doi.org/10.1007/s10162-013-0423-y.

*Nature Neuroscience*, 1, 732–737.

*Perception with an eye for motion*. Cambridge, MA: MIT Press.

*PLoS One*, 8 (2), e56862, https://doi.org/10.1371/journal.pone.0056862.

*Journal of Neurophysiology*, 65 (6), 1329–1345.

*Nature*, 415 (6870), 429–433, https://doi.org/10.1038/415429a.

*European Journal of Neuroscience*, 31 (10), 1721–1729, https://doi.org/10.1111/j.1460-9568.2010.07207.x.

*The Journal of Neuroscience*, 29 (49), 15601–15612, https://doi.org/10.1523/JNEUROSCI.2574-09.2009.

*Current Biology*, 20 (8), 757–762, https://doi.org/10.1016/j.cub.2010.02.059.

*Journal of Vision*, 3 (11): 19, 865–876, https://doi.org/10.1167/3.11.19.

*Science*, 233 (4771), 1416–1419.

*The perception of the visual world*. Boston, MA: Houghton Mifflin.

*The vestibular system: A sixth sense* (Vol. 1). New York, NY: Oxford University Press.

*Experimental Brain Research*, 186 (4), 677–681, https://doi.org/10.1007/s00221-008-1350-8.

*Signal detection theory and psychophysics*. Huntington, NY: R. E. Krieger.

*The Journal of Neuroscience*, 26 (1), 73–85, https://doi.org/10.1523/JNEUROSCI.2356-05.2006.

*Journal of Applied Physiology*, 16 (2), 215–220.

*Human visual orientation*. Chichester, UK: John Wiley & Sons.

*Experimental Brain Research*, 136 (1), 1–18.

*Experimental Brain Research*, 117 (3), 419–427.

*American Journal of Psychology*, 86 (2), 311–324.

*Nature Neuroscience*, 18 (1), 129–137, https://doi.org/10.1038/nn.3889.

*Ecological Psychology*, 11 (3), 233–248, https://doi.org/10.1207/s15326969eco1103_3.

*Optica Acta*, 22 (9), 773–791.

*Vision Research*, 27 (6), 993–1015.

*Vision Research*, 35 (3), 389–412.

*Neural Networks*, 11, 397–414.

*Trends in Cognitive Sciences*, 3 (9), 329–336.

*Perception: Essays in honor of J.J. Gibson* (pp. 250–267). Ithaca, NY: Cornell University Press.

*Scandinavian Journal of Psychology*, 18 (3), 224–230.

*Journal of Vision*, 11 (1): 22, 1–15, https://doi.org/10.1167/11.1.22.

*Journal of Experimental Psychology*, 91 (2), 245–261.

*Proceedings of the Royal Society of London B*, B208, 385–387.

*The Journal of Neuroscience*, 30 (27), 9084–9094, https://doi.org/10.1523/JNEUROSCI.1304-10.2010.

*Journal of Neurophysiology*, 104 (2), 765–773, https://doi.org/10.1152/jn.01067.2009.

*Experimental Brain Research*, 204 (1), 11–20, https://doi.org/10.1007/s00221-010-2288-1.

*Perception*, 3, 63–80.

*Journal of Vision*, 10 (11): 20, 1–15, https://doi.org/10.1167/10.11.20.

*Experimental Brain Research*, 234 (8), 2323–2337, https://doi.org/10.1007/s00221-016-4638-0.

*Perception*, 34 (10), 1263–1273, https://doi.org/10.1068/p5232.

*Journal of Cognitive Neuroscience*, 13 (1), 102–120.

*Optical Society of America Technical Digest Series*, 22, 47.

*Journal of the Optical Society of America*, 9, 177–194.

*Journal of Vision*, 12 (8): 1, 1–31, https://doi.org/10.1167/12.8.1.

*Journal of Vision*, 8 (14): 24, 1–14, https://doi.org/10.1167/8.14.24.

*Vision Research*, 34, 2917–2938.

*The Journal of Neuroscience*, 18, 5958–5975.

*Nature Reviews Neuroscience*, 3 (9), 741–747, https://doi.org/10.1038/nrn914.

*Science*, 215 (8), 194–196.

*Journal of the Optical Society of America*, 73 (3), 339–344.

*Journal of the Optical Society of America A: Optics and Image Science*, 2 (2), 354–360.

*Multisensory Research*, 29 (4–5), 279–317, https://doi.org/10.1163/22134808-00002510.

*Journal of the Optical Society of America A*, 14 (9), 2128–2143, https://doi.org/10.1364/JOSAA.14.002128.

*The Journal of Neuroscience*, 6 (1), 145–157.

*Perception*, 37 (3), 408–418, https://doi.org/10.1068/p5873.

*Journal of Comparative Physiology and Psychology*, 43, 482–489.

*Vision Research*, 37 (5), 573–590.

*Proceedings of the National Academy of Sciences, USA*, 113 (18), 5077–5082, https://doi.org/10.1073/pnas.1604818113.

*The Journal of Neuroscience*, 27 (36), 9742–9756, https://doi.org/10.1523/JNEUROSCI.0817-07.2007.

*Journal of Neurophysiology*, 62 (3), 642–656.

*The Journal of Neuroscience*, 6 (1), 134–144.

*Experimental Brain Research*, 104 (3), 502–510.

*Naturwissenschaften*, 37, 464–476.

*Nature Neuroscience*, 3 (7), 647–648, https://doi.org/10.1038/76602.

*Journal of Experimental Psychology: Human Perception and Performance*, 2 (3), 448–456.

*The visual neurosciences* (Vol. 2, pp. 1247–1259). Cambridge, MA: Bradford.

*Journal of Experimental Psychology: Human Perception and Performance*, 17 (1), 28–43.