Humans accurately judge their direction of heading when translating in a rigid environment, unless independently moving objects (IMOs) cross the observer's focus of expansion (FoE). Studies show that an IMO on a laterally moving path that maintains a fixed distance with respect to the observer (non-approaching; C. S. Royden & E. C. Hildreth, 1996) biases human heading estimates differently from an IMO on a lateral path that gets closer to the observer (approaching; W. H. Warren & J. A. Saunders, 1995). C. S. Royden (2002) argued that differential motion operators in primate brain area MT explained both data sets, concluding that differential motion was critical to human heading estimation. However, neurophysiological studies show that motion pooling cells, but not differential motion cells, in MT project to heading-sensitive cells in MST (V. K. Berezovskii & R. T. Born, 2000). It is difficult to reconcile differential motion heading models with these neurophysiological data. We generate motion sequences that mimic those viewed by human subjects. Model MT pools over V1; units in model MST perform distance-weighted template matching and compete in a recurrent heading representation layer. Our model produces heading biases of the same direction and magnitude as humans through a peak shift in model MSTd without using differential motion operators, maintaining consistency with known primate neurophysiology.

Two classes of models dominate the computational literature on heading estimation: *motion pooling models* and *differential motion models*. The focus of these heading models is typically to explain how the FoE can be used as an indicator of heading in the general case when rotations may be present in the visual field, as is often the case with animals that can move their eyes. Motion pooling models estimate the observer's heading by integrating motion signals over large portions of the visual field. The assumption of such models is that either the effects of rotation in the flow field on heading estimates are unimportant or that rotation has been previously removed from the optic flow field, for example, by using vestibular oculomotor signals (Pack, Grossberg, & Mingolla, 2001). Differential motion models exploit the separability of the translation and rotation components of optic flow, as demonstrated by Longuet-Higgins and Prazdny's (1980) retinal flow equations, to remove the effects of rotation from the motion field:

  ẋ = (x t_{z} − f t_{x})/Z + ω_{x} xy/f − ω_{y}(f + x²/f) + ω_{z} y,
  ẏ = (y t_{z} − f t_{y})/Z + ω_{x}(f + y²/f) − ω_{y} xy/f − ω_{z} x.  (1)

Here, (ẋ, ẏ) is the image velocity of a point in world coordinates (*X*, *Y*, *Z*) projected onto a 2D “retinal plane” with coordinates (*x*, *y*), assuming a planar camera model (Longuet-Higgins & Prazdny, 1980). Each point in world coordinates has a depth *Z*, *f* denotes the camera focal length, (t_{x}, t_{y}, t_{z}) is the translational velocity of the observer, and (ω_{x}, ω_{y}, ω_{z}) is the rotational velocity. Because the rotational term does not depend on the depth *Z*, but the translational term does, subtracting two flow vectors sampled at nearby points with different depths yields the difference in translational components, which is proportional to the size of the depth discontinuity. Supposing one samples and performs vector subtraction at several locations, one can analyze the difference vectors to recover the FoE due to observer translation. Differential motion models perform vector subtraction at depth discontinuities, resulting in difference vectors that have the same angle as the original vectors, provided that the scene is rigid and the only motion is due to the observer translation (Perrone & Krauzlis, 2008). All that remains is to triangulate the “difference vectors” to obtain the estimated heading. While the angle of the difference vectors remains the same, the sign may be switched; hence, one must have some knowledge about the direction of observer translation to differentiate the FoE from the focus of contraction (FoC). The differential motion approach critically relies on the separability of translational and rotational components in Equation 1 and on the existence of significant depth discontinuities (not gradients) in the environment. For a comprehensive review of methods used to analyze Equation 1 and their performance, see Raudies and Neumann (accepted for publication).
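The cancellation at a depth edge can be illustrated numerically. The sketch below uses the standard Longuet-Higgins and Prazdny flow equations; the particular translation, rotation, and depth values are arbitrary illustrative choices, not parameters from the studies discussed here.

```python
import numpy as np

def flow(x, y, Z, t, w, f=1.0):
    """Retinal flow at image point (x, y) for a scene point at depth Z,
    observer translation t = (tx, ty, tz) and rotation w = (wx, wy, wz),
    in the standard Longuet-Higgins and Prazdny planar-camera form."""
    tx, ty, tz = t
    wx, wy, wz = w
    xd = (x * tz - f * tx) / Z + wx * x * y / f - wy * (f + x**2 / f) + wz * y
    yd = (y * tz - f * ty) / Z + wx * (f + y**2 / f) - wy * x * y / f - wz * x
    return np.array([xd, yd])

# Two scene points seen at the same image location but at different depths,
# as at a depth discontinuity. The rotational flow, which is shared and
# depth-independent, cancels in the difference; what remains is radial
# about the FoE.
t, w = (0.2, 0.0, 1.0), (0.01, -0.02, 0.005)   # FoE at (f*tx/tz, 0) = (0.2, 0)
x, y = 0.7, 0.4
d = flow(x, y, 2.0, t, w) - flow(x, y, 8.0, t, w)
radial = np.array([x, y]) - np.array([0.2, 0.0])
cos = d @ radial / (np.linalg.norm(d) * np.linalg.norm(radial))
print(round(float(cos), 6))  # 1.0: the difference vector points along
                             # the line from the FoE through (x, y)
```

Repeating the subtraction at a second image location gives a second radial line, and the two lines intersect at the FoE, which is the triangulation step described above.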

Primate area MT contains two broad classes of motion-sensitive cells: differential motion cells in MT^{−} and additive motion cells in MT^{+} (Berezovskii & Born, 2000). These cells primarily differ in the functional characteristics of their receptive fields. As depicted in Figure 2, differential motion cells in MT^{−} possess surrounds or side lobes of suppression (Born & Bradley, 2005; Xiao, Marcar, Raiguel, & Orban, 1997), while additive cells have no such antagonism. MT cells are selective to a specific range of speeds, stimulus sizes, and directions of motion. The antagonistic zones of differential motion cells possess the same velocity sensitivity as the rest of the receptive field but suppress the response of the neuron; hence, these cells act like spatial differentiators in the motion domain.

Royden (2002) proposed a heading model built from operators modeled after cells in MT^{−} with on-center/off-surround direction-of-motion antagonism. The model of Royden demonstrates human-like heading biases due to approaching and non-approaching objects, using differential motion operators inspired by cells in MT^{−}. Numerous authors now consider differential motion the best explanation of human heading perception (Duijnhouwer, Beintema, van den Berg, & Wezel, 2006; Royden, 2004; Royden & Conti, 2003; Warren, 1998).

An alternative class of models relies on motion pooling in MT^{+} and template matching in a competitive network in MSTd to replicate the human heading biases. ViSTARS is a model of primate visual processing describing the retina–V1–MT–MST motion processing pathway. It demonstrates how MT/MST interactions can process video input for the purposes of obstacle detection, goal approach, and the estimation of heading (Browning, Grossberg, & Mingolla, 2009a, 2009b). ViSTARS is a dynamical model that explains a range of data, including the human bias demonstrated under simulated eye rotation conditions (Royden et al., 1994), and exhibits robustness to noise, but its complexity obscures the necessary conditions to explain the neural mechanisms underlying the human heading bias data (Royden & Hildreth, 1996; Warren & Kurtz, 1992; Warren & Saunders, 1995). ViSTARS unifies a number of prior models that were developed in a variety of contexts for the purposes of human navigation. For example, it integrates the FORMOTION models, which describe how V1, MT, and MST can perform motion integration and segmentation to solve the aperture problem and explain a number of visual displays with planar motion, such as the barber pole and chopsticks illusions (Baloch & Grossberg, 1997; Grossberg, Mingolla, & Viswanathan, 2001). ViSTARS also integrates the models of Chey, Grossberg, and Mingolla (1997, 1998), which investigate how speed perception and discrimination are affected by contrast, duration, dot density, and spatial frequency. A related model by Pack et al. (2001) shows how areas MT^{+}, MT^{−}, MSTv, and MSTd can interact to produce a gaze counterflow circuit that stabilizes targets during smooth pursuit eye movements. The effects that eye movements have on heading perception are also explained by the precursor to ViSTARS, the STARS model (Elder, Grossberg, & Mingolla, 2009), which uses gain fields to compensate for the effects of eye rotations.

Warren and Saunders (1995) characterized each display by its *path angle*, reflecting the angular difference between the object and observer FoE. Positive values indicate that the object FoE is positioned closer to the center of the screen, and the observer FoE is further to the periphery. Warren and Saunders studied path angle settings of −6°, 0°, and 6°. Figure 3 shows sample snapshots during object motion as a function of path angle when the object begins on the right side of the display. Subjects made left–right heading judgments following each trial, relative to a “probe” location indicated by a 1° vertical line. The authors employed a two-alternative forced choice (2AFC) experimental paradigm, and many different observer heading and object FoE cases were tested with each path angle. Sample observer and object FoE configurations are shown in Figure 3 (Warren & Saunders, 1995). During a trial, dots appeared in their initial locations for 1 s to communicate the beginning of the trial to the subject. Dot motion occurred for 1.5 s, and dots lingered in their final positions until the subject responded (Warren & Saunders, 1995).

In this article, we present a model of primate MT^{+} and MSTd. We use this model to unify the psychophysical data on approaching and non-approaching IMOs. As shown in Figure 5, our model pools motion over V1 motion representations in model MT^{+} and performs template matching in a competitive network in model MSTd, maintaining consistency with the primate neurophysiological data while demonstrating human-like heading biases. Our model explains the psychophysical data through an emergent *peak shift* in model MSTd (Figure 6).

Consider a monitor that is *p* pixels and *w* cm wide, viewed at a distance of *d* cm. The studies report experimental conditions and results with respect to degrees of visual angle. We convert *α* degrees to *P* pixels and vice versa using

  P = (p/w) · 2d · tan(α/2).  (3)

We assume the screen resolution^{1} that was standard with this type of Apple computer at that time. The viewing distance of the subjects is 30 cm. Using Equation 3, we find that the 30° × 30° viewing window and 10° × 10° IMO are 529 × 423 pixels and 173 × 137 pixels, respectively.^{2} Royden and Hildreth generated optic flow stimuli using random dots refreshed on the monitor at 25 frames/s; each stimulus had a duration of 0.8 s, for a total of 20 frames per trial. Royden and Hildreth used dot densities of 0.56 dot/deg^{2} and 0.8 dot/deg^{2} for the background and object, respectively, so we generate our backgrounds and objects with 500 dots and 80 dots. Adopting the convention used in the study, the center of the viewing window represents the origin of the image plane. As such, negative and positive positions reflect those to the left and right of the center, respectively. By Equation 3, the simulated horizontal observer headings of 4°, 5°, 6°, and 7° become 69 px, 86 px, 104 px, and 121 px, respectively. In the leftward motion conditions, the IMO moved from −1.4°, 0.6°, 4.7°, 8.7°, 10.7°, and 12.7° to −7.88°, −5.88°, −1.78°, 2.22°, 4.22°, and 6.22°, respectively. In the rightward motion conditions, the IMO moved from −9.9°, −5.9°, −1.9°, 0.2°, 2.2°, and 6.3° to −3.42°, 0.58°, 4.58°, 6.68°, 8.68°, and 12.78°, respectively. We designate conditions in which the object moves left and right with an “L” and “R,” respectively, and append ascending numbers to reflect the relative starting position of the object. For example, in condition L1, the object began further to the left than in L3. The initial and final positions of the IMO replicate those used in Royden and Hildreth. While in motion, the object moves at a constant velocity to the end point, as described in Royden and Hildreth.

In the approaching conditions modeled after Warren and Saunders (1995), the path angle *δ* took values of −6°, 0°, and 6° (i.e., the angle between the object FoE and that of the observer). If objects in either set of psychophysics experimental conditions grew or moved beyond the viewing window, we clipped the object at the viewing boundaries.

We set the rotational velocity and the vertical translation *t*_{y} to zero, since the observer only translates in depth and there is no rotational optic flow in the displays reported in the psychophysical studies. Hence, Equation 1 reduces to

  ẋ_{l} = (x t_{z} − f t_{x})/Z_{l},  ẏ_{l} = y t_{z}/Z_{l},  (4)

where, at each pixel (*x*, *y*), *t*_{z} signifies the depth component of the translational velocity of the observer, *t*_{x} indicates the horizontal component of the observer translation, *Z* is the distance from the observer to the point in space represented by the dot, and *l* specifies the *l*th frame in the motion sequence. Although Equation 4 only considers first-order information (i.e., the velocity vector field over time), the velocity field representation of optic flow can yield the same heading estimates in humans as fields containing higher order information for dot displays (Warren et al., 1991). While we and many models assume velocity field representations of optic flow, higher order flow may afford the observer additional information for navigation. We use the notation *I*_{l}(*x*, *y*) to represent the vector-valued optic flow field (ẋ, ẏ) at frame *l* and spatial location (*x*, *y*). In the non-approaching condition, we generate optic flow fields using Equation 4 with *t*_{z} = 200 cm/s, the translational speed of the observer toward the background dot planes located at 400 cm and 1000 cm (Royden & Hildreth, 1996). Each object point moved at a constant speed of 8.1 deg/s, as described by Royden and Hildreth (1996). In the approaching IMO case, we reproduce the 5-s time to contact between the observer and fronto-parallel dot plane by setting *Z* = 1000 cm and *t*_{z} = 200 cm/s. Where the object exists on the display, we set *t*_{z} = 300 cm/s to recreate the 3.33-s time to contact between the observer and the object. We consider the opaque object experimental condition of Warren and Saunders (1995), as it yielded the most pronounced heading biases. To generate the opaque object, we replace background points in the object region with those corresponding to the object. Figures 3 and 4 exhibit sample V1 representations used in the simulations.
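A minimal sketch of this flow generation over a pixel grid, with illustrative parameters rather than the full stimulus protocol:

```python
import numpy as np

def translational_flow(width, height, t_x, t_z, Z, f=30.0):
    """Flow field of Equation 4: pure observer translation (no rotation,
    t_y = 0) toward a fronto-parallel plane at depth Z. The units and
    focal length here are illustrative stand-ins."""
    xs = np.arange(width) - width // 2    # display-centered coordinates
    ys = np.arange(height) - height // 2
    x, y = np.meshgrid(xs, ys)
    u = (x * t_z - f * t_x) / Z   # horizontal flow component
    v = (y * t_z) / Z             # vertical flow component
    return np.stack([u, v], axis=-1).astype(float)

# With t_x = 0 the FoE sits at the display center: flow vanishes there
# and points radially outward elsewhere (expansion).
F = translational_flow(9, 9, t_x=0.0, t_z=200.0, Z=400.0)
print(F[4, 4], F[4, 8])  # zero flow at the FoE; rightward flow to its right
```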

We generate a heading template *T*_{i}(*x*, *y*) by substituting pixel locations (*x*, *y*) into Equation 4 and normalizing each vector to unit length, with the FoE at each horizontal position *i* = *t*_{x}. Figure 7 shows sample templates used in the simulations.
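Because normalization removes the depth and speed scaling in Equation 4, template construction reduces to normalizing a radial expansion field. A minimal sketch, with the FoE's vertical position fixed at zero as in the horizontal-heading templates described above:

```python
import numpy as np

def heading_template(width, height, i):
    """Template T_i: the flow of Equation 4 with the FoE at horizontal
    position i, each vector normalized to unit length so that depth
    and observer speed scale out."""
    xs = np.arange(width) - width // 2
    ys = np.arange(height) - height // 2
    x, y = np.meshgrid(xs, ys)
    u, v = (x - i).astype(float), y.astype(float)
    n = np.hypot(u, v)
    n[n == 0] = 1.0               # leave the zero vector at the FoE itself
    return np.stack([u / n, v / n], axis=-1)

T0 = heading_template(9, 9, 0)
print(T0[4, 8])  # unit rightward flow to the right of the FoE
```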

Motion is pooled over model V1 in model MT^{+}, where cells have receptive fields that integrate over short-range motion of particular velocities. We define the pooled MT motion *D*_{l}(*x*, *y*) according to

  D_{l}(x, y) = ∑_{‖(u, v)‖ ≤ r_{MT}} G_{MT}(u, v) I_{l}(x − u, y − v),  (5)

where *G*_{MT} is a 2D discrete multivariate Gaussian kernel with mean *μ*_{MT} and covariance matrix Σ_{MT}, normalized such that all points in the kernel's support sum to unity, and *r*_{MT} defines the kernel radius. We model MT^{+} cells with circular receptive fields; hence, we set *μ*_{MT} = 0 and choose Σ_{MT} such that *σ*_{x} = *σ*_{y} = *σ*_{MT} and the covariance *ρ* between *x* and *y* is zero. We employ a single parameter set to conservatively simulate MT^{+} cell receptive field properties found in neurophysiological studies (Born & Bradley, 2005; Churchland et al., 2005). Nelissen et al. (2006) found strong fMRI responses in area MT, compared to baseline conditions, to kinetic gratings with spatial periods of 0.125 deg/cycle and to random textured patterns 3–28° in diameter, with responses increasing with size. We used *σ*_{MT} = 0.05° and *r*_{MT} = 3° to fit these findings. Figure 8 shows sample frames from the motion sequences after pooling. All convolutions in MT^{+} are zero-padded and performed component-wise.
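The pooling step can be sketched with SciPy's truncated Gaussian filter standing in for the normalized kernel G_MT; the sigma and radius values below are illustrative, not the model's degree-scaled parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pool_mt(flow, sigma, radius):
    """Pool each flow component with a truncated, normalized Gaussian
    (a sketch of Equation 5). mode="constant" gives the zero-padded,
    component-wise convolution described in the text."""
    return np.stack(
        [gaussian_filter(flow[..., c], sigma=sigma,
                         truncate=radius / sigma, mode="constant")
         for c in range(2)],
        axis=-1,
    )

# A unit impulse of motion spreads out but conserves total motion,
# because the kernel sums to one.
flow = np.zeros((11, 11, 2))
flow[5, 5, 0] = 1.0
pooled = pool_mt(flow, sigma=1.0, radius=3.0)
print(round(pooled[..., 0].sum(), 6))  # 1.0
```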

Model MSTd computes a match between the pooled motion *D*_{l}(*x*, *y*) and all templates *T*_{i}(*x*, *y*). That is, for a given *D*_{l}(*x*, *y*), we match against *T*_{i}(*x*, *y*) for all horizontal headings *i*. We obtain a scalar value *p*_{i}^{l} for each frame *l* at the horizontal heading *i*, representing the cosine similarity (i.e., inner product) between distance-weighted vectors in the motion frame and those in the template, defined by

  p_{i}^{l} = ∑_{x,y} W_{i}(x, y) [T_{i}(x, y) · D_{l}(x, y)] / ∣∣D_{l}(x, y)∣∣_{2},  (6)

where *W*_{i}(*x*, *y*) represents a distance-dependent weighting from horizontal heading *i*. We use inverse 2D Euclidean distance, scaled by a parameter *λ* to adjust the spatial extent of the templates. We selected *λ* = 300 for a broad spatial tuning. The inner product performs component-wise multiplication between vectors in template *T*_{i}(*x*, *y*) and MT^{+} output *D*_{l}(*x*, *y*) for every spatial location and frame, normalized by the *L*^{2} (Euclidean) norm (denoted ∣∣*D*_{l}(*x*, *y*)∣∣_{2}). The vector components {*p*_{i}^{l}} are then smoothed across heading positions according to

  P_{i}^{l} = ∑_{j = −r_{MST}}^{r_{MST}} G_{MST}(j) p_{i−j}^{l},  (7)

where *G*_{MST} is a normalized 1D Gaussian kernel. We set the radius *r*_{MST} to 12°, *σ*_{MST} = 2°, and the mean of the MST kernel *G*_{MST} to *μ*_{MST} = 0.
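The matching and smoothing steps can be sketched as follows. The inverse-distance weight used here is an illustrative assumption, since only the qualitative form of W_i is described above, and the grid sizes are toy values.

```python
import numpy as np

def radial_template(H, W, i):
    """Unit-length expansion field with the FoE at horizontal position i."""
    x, y = np.meshgrid(np.arange(W) - W // 2, np.arange(H) - H // 2)
    u, v = (x - i).astype(float), y.astype(float)
    n = np.hypot(u, v)
    n[n == 0] = 1.0
    return np.stack([u / n, v / n], axis=-1)

def match_scores(D, headings, lam=300.0):
    """Distance-weighted cosine match between pooled motion D (H, W, 2)
    and the template for each candidate heading (Equation 6 sketch).
    The weight w_i below is one plausible inverse-distance form."""
    H, W = D.shape[:2]
    x, y = np.meshgrid(np.arange(W) - W // 2, np.arange(H) - H // 2)
    norm = np.linalg.norm(D)
    scores = []
    for i in headings:
        T = radial_template(H, W, i)
        w_i = lam / (lam + np.hypot(x - i, y))   # assumed weighting form
        scores.append((w_i * (T * D).sum(axis=-1)).sum() / norm)
    return np.array(scores)

def smooth_headings(p, sigma, radius):
    """1D Gaussian smoothing across heading positions (Equation 7 sketch)."""
    j = np.arange(-radius, radius + 1)
    g = np.exp(-j**2 / (2.0 * sigma**2))
    return np.convolve(p, g / g.sum(), mode="same")

headings = list(range(-4, 5))
D = radial_template(15, 15, 2)            # "observed" flow with FoE at +2
p = match_scores(D, headings)
P = smooth_headings(p, sigma=2.0, radius=3)
print(headings[int(np.argmax(p))])        # 2: the matching template wins
```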

The smoothed match distribution *P*_{i}^{l} then provides the input to a *recurrent competitive field*:

  dx_{i}/dt = −A x_{i} + (B − x_{i})[f(x_{i}) + I_{i}] − x_{i} ∑_{k≠i} f(x_{k}),  (8)

where *A* specifies the passive decay rate, *B* defines the saturation upper bound, *f*(*w*) describes the signal function, and *I*_{i} defines the external input to unit *x*_{i}. The *signal function f*(*w*) dynamically specifies the nature of the feedback a cell receives relative to its current activity. We solve Equation 8 at equilibrium (i.e., dx_{i}/dt = 0) and use a faster-than-linear signal function *f*(*w*) = *w*^{2} to form a choice, or winner-take-all, network (Grossberg, 1973). We obtain recurrent MSTd units, *M*_{i}^{l}, after substituting in the smoothed pattern match distribution, *P*_{i}^{l}, and setting *A* = 1, *B* = 1:

  M_{i}^{l} = [g(M_{i}^{l})² + P_{i}^{l}] / [1 + g(M_{i}^{l})² + P_{i}^{l} + ∑_{k≠i} g(M_{k}^{l})²].  (9)
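The competitive dynamics can be illustrated by integrating Equation 8 directly; the input values and step sizes below are arbitrary illustrative choices.

```python
import numpy as np

def rcf_equilibrium(I, A=1.0, B=1.0, dt=0.01, steps=5000):
    """Euler-integrate the shunting recurrent competitive field of
    Equation 8, with the faster-than-linear signal f(w) = w**2 and a
    sustained input I, until the network settles."""
    x = np.zeros_like(I)
    for _ in range(steps):
        f = x**2
        dx = -A * x + (B - x) * (f + I) - x * (f.sum() - f)
        x = x + dt * dx
    return x

# Squared feedback contrast-enhances the largest input, while the
# x_i-multiplied inhibition keeps the competition divisive (Equation 9).
I = np.array([0.20, 0.24, 0.21, 0.00])
x = rcf_equilibrium(I)
print(int(np.argmax(x)))  # 1: the unit with the largest input wins
```

Activities remain bounded in [0, *B*] because the excitatory term shuts off as x_{i} approaches *B*, which is the saturation property noted above.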

Because the inhibitory sum ∑_{k≠i} *f*(*x*_{k}) in Equation 8 is multiplied by the activity *x*_{i}, the inhibitory effect cell *k* has on cell *i* is, at equilibrium, divisive rather than subtractive, as shown in Equation 9. The function *g*(*x*_{i}) is defined as a linear accumulation of network activity between stimulus frames (Equation 10), with *c* = 0.3 chosen to temporally weight network activity due to new visual information higher than that of recent history. Following MSTd competition, we determine the judged heading direction by selecting the template that has the most activation in the final frame of the motion sequence. The judged heading maximizes MSTd activity at the final frame *n*:

  ĥ = arg max_{i} M_{i}^{n}.  (11)

We obtain *r* = 0.94 and *r* = 0.71 using Pearson's correlation when comparing our performance against the human data for the leftward and rightward IMO conditions, respectively. Similarly to the psychophysics results, the model produces the largest error when the IMO occludes the observer heading. When the IMO does not, or only barely, covers the observer heading during the trial, the bias is small.

We also examined the model biases for approaching IMOs as a function of path angle *δ*. Biases averaged across subjects and across observer and object FoE, reported by Warren and Saunders (1995), are also drawn; these values are approximated based on Figure 4 from Warren and Saunders. The model fits Warren and Saunders' data well, with *r* = 0.99, *r* = 0.98, and *r* = 0.86 in the opaque, transparent, and black object conditions, respectively. These results demonstrate that approaching IMOs result in biases in the direction of the object FoE (Warren & Saunders, 1995).

We have presented a model of MT^{+} and MSTd that explains the heading biases produced in the psychophysical studies of Royden and Hildreth (1996) and Warren and Saunders (1995). As found in humans by Royden and Hildreth, when non-approaching objects cover the FoE of a translating observer, the model produces biases in the direction of object motion. By contrast, when the object approaches the translating observer and covers the observer FoE, the model generates biases in the direction of the object FoE. Our model unifies the results of both studies while remaining consistent with known neurophysiology. The model also indicates that the primate visual system can determine the direction of heading by pooling motion in MT^{+} and competition in MSTd, without needing units sensitive to differential motion. As depicted in Figure 9, our model explains the two sets of data using a *peak shift* in the MSTd unit distributions. In the non-approaching case, the peak shift occurs because, when the IMO occludes the observer's FoE, the motion around the FoE is inconsistent with a heading in that direction. The MSTd population distribution thus has a trough around the position of the IMO, which in turn causes the peak to split into a bimodal distribution with a maximum peak on one side of the IMO. In the approaching case, the peak shift occurs because the MSTd distributions corresponding to the observer's and object's FoE are close enough that they merge and produce a peak between the FoE positions. Royden (2002) notes that her model based on MT^{−} differential motion cells replicates the human heading results without actively removing the IMO, as would other models, such as that of Hildreth (1992). Our model also replicates the human data without removing the IMO to compute heading, and because it does not require differential operators, we claim that it is more consistent with the neurophysiological data.

The model's fit to the approaching IMO data (*r* = 0.99 opaque IMO, *r* = 0.98 transparent IMO, *r* = 0.86 black IMO) was not surprising, since the model of Warren and Saunders (1995) pools motion and also explains these data. Since the black object had no dot motion defined within its boundary, we obtain a relatively flat heading bias curve as a function of path angle, mimicking the decrease in bias with increasing path angle seen in the human data. The bias curve in the black object condition is not perfectly flat because of intertrial variation and positional effects attributed to the object always beginning ±6° from the center of the display. We discovered that the speed of the object relative to that of observer translation and the amount of motion pooling in MT^{+} altered the model performance. For a given path angle, increasing the speed of the approaching object tended to globally shift the biases produced for all tested observer and object FoE pairs. Adjusting the amount of MT^{+} pooling (*r*_{MT}, Σ_{MT}; Equation 5) had large effects and influenced each observer and object FoE pair based on the path angle and context; pooling locally disperses motion direction contributions, consequently increasing neighboring template-match scores. Similarly, as Σ_{MT} adjusts the model MT^{+} cell spatial integration extent, this parameter may largely shift the MSTd cell template-match scores. By virtue of the constraint on the visual displays that the observer heading be on the same side as the approaching object, the motion pooling and template-match distributions in MSTd were usually unimodal, due to smoothing that merges proximal match activity. Network accumulation in MSTd (*c*; Equation 10) also impacted heading biases by influencing the temporal sensitivity of match scores. Our selection of *c* allowed the network to integrate information over time but not disregard the recent past. Less smoothing in MSTd (*σ*_{MST}) also increased the network sensitivity to peak shifts and to other changes in the match scores between frames. Furthermore, we observed an expected symmetry between trials conducted on the right and left sides of the screen. That is, if we reflected each frame of an approaching moving object sequence about the center of the screen, we obtained the same biases. This is not true of the non-approaching object, due to the lack of positional symmetry in the design of the study (Royden & Hildreth, 1996). Interestingly, biases remained insensitive to a variety of dot densities, echoing the findings of Warren and Hannon (1990) that dot density did not impact percent correct performance in their 2AFC paradigm.

Alternative parameter settings improved the fit to the data of Royden and Hildreth (1996) (e.g., *r* = 0.86); however, with these parameters, the fit to Warren and Saunders' (1995) data was reduced. Parameters in the present study were chosen to match the psychophysical and neurophysiological data with the minimal number of parameters and to utilize a single set of values across all our simulations. This reduces the model complexity and allows for greater understanding of the computations taking place. Ongoing research is investigating how multiple sets of receptive fields may interact within MT and MST and how best to parameterize them within the model. Although the results of Royden and Hildreth indicate a human heading bias in the direction of object motion when the observer's FoE is occluded, the population vector of monkey MSTd cell responses may only reflect heading error if the object motion greatly deviates from that of the surrounding optic flow produced by the observer translation (Georgopoulos, Schwartz, & Kettner, 1986; Logan & Duffy, 2006). In other words, MSTd cells in monkey may only yield a biased representation in a subset of the non-approaching IMO cases tested in this article. Our model can account for these differences with a change in parameters.

Although inhibition in our model arises from the term −*x*_{i} ∑_{k≠i} *f*(*x*_{k}) of Equation 8, which has a minus sign, the effect is neither global nor local subtraction (Royden, 2004). Our model uses divisive rather than subtractive normalization, as can most readily be seen in Equation 9: because the inhibitory sum in Equation 8 is shunted (multiplied) by *x*_{i}, its equilibrium effect differs from subtraction (Grossberg, 1973; Heeger, 1992; Levine & Grossberg, 1976). Future work will clarify the contexts within which MSTd cells respond to local object or global motion and how this may influence navigation.

The model of MT^{+}/MSTd that we present in this article demonstrates that the human heading biases estimated for approaching and non-approaching IMOs can be explained using motion pooling and template matching in a competitive network while remaining consistent with known neurophysiology. Differential motion processing is not necessary to explain these data in the presence of IMOs.