**We implement a neural model for the estimation of the focus of radial motion (FRM) at different retinal locations and assess the model by comparing its performance with the precision with which human observers can estimate the FRM in naturalistic motion stimuli. The model describes the deep hierarchy of the first stages of the dorsal visual pathway and is space variant, since it takes into account the retino-cortical transformation of the primate visual system through log-polar mapping. The log-polar transform of the retinal image is the input to the cortical motion-estimation stage, where optic flow is computed by a three-layer neural population. The sensitivity to complex motion patterns that has been found in area MST is modeled through a population of adaptive templates. The first-order description of cortical optic flow is derived from the responses of the adaptive templates. Information about self-motion (e.g., direction of heading) is estimated by combining the first-order descriptors computed in the cortical domain. The model's performance at FRM estimation as a function of retinal eccentricity closely matches data from human observers. By employing equivalent-noise analysis we show that the loss in FRM accuracy for both model and human observers is attributable to a decrease in the efficiency with which motion information is pooled with increasing retinal eccentricity. This decrease in sampling efficiency is in turn attributable to receptive-field sizes that grow with retinal eccentricity, driven by the lossy log-polar mapping that projects the retinal image onto primary visual areas. We further show that the model can estimate the direction of heading in real-world scenes, thus validating its potential application to neuromimetic robotic architectures. More broadly, we provide a framework in which to model complex motion integration across the visual field in real-world scenes.**

The log-polar transformation from the retinal plane of coordinates (*x*, *y*) to the cortical domain of coordinates (*ξ*, *η*) is described by the following equations:

*ξ* = log_*a*(*ρ*/*ρ*_0), *η* = *qθ*,

where *a* parameterizes the nonlinearity of the mapping, *q* is related to the angular resolution, *ρ*_0 is the radius of the central blind spot, and (*ρ*, *θ*) = (√(*x*² + *y*²), atan2(*y*, *x*)) are the polar coordinates derived from the Cartesian ones. All points with *ρ* < *ρ*_0 are ignored (hence the central blind spot), thus *ρ*_0 must be small with respect to the size of the image.

Given a Cartesian image of *N*_c × *N*_r pixels and defined *ρ*_max = 0.5 min(*N*_c, *N*_r), we obtain an *R* × *S* (rings × sectors) discrete cortical image of coordinates (*u*, *v*) by taking

*u* = ⌊log_*a*(*ρ*/*ρ*_0)⌋, *v* = ⌊*qθ*⌋,

where ⌊·⌋ denotes the integer part, *q* = *S*/(2*π*), and *a* = exp(ln(*ρ*_max/*ρ*_0)/*R*). Moreover, we can define the compression ratio (CR) of the cortical image with respect to the Cartesian one as the ratio between the Cartesian and the cortical pixel counts. The red circular curve in Figure 2a, of radius *S*/(2*π*), represents the locus where the size of log-polar pixels is equal to the size of Cartesian pixels. In particular, in the area inside the red circular curve (the fovea) a single Cartesian pixel contributes to many log-polar pixels (oversampling), whereas outside this region multiple Cartesian pixels contribute to a single log-polar pixel (undersampling). This property is highlighted by the log-polar pixel bordered in violet in Figure 2a. The retinal area (i.e., the log-polar pixel) that refers to a given cortical pixel defines the cortical pixel's receptive field (RF). To avoid spatial aliasing due to the undersampling, we employ overlapping RFs (see RF shape, later, for details). An example of transformation from the Cartesian domain to the cortical domain and back to the retinal one is shown in Figure 2, for the standard image Lena and for a frame of the dead-leaves stimuli used in the experiments described later. It is worth noting that the cortical image (Figure 2e) shows the effects of the log-polar mapping: In particular, the zoomed cortical image (Figure 2c) shows that the eye, which is in fovea in Figure 2d, is overrepresented (bottom left of Figure 2c), whereas the hat feathers, which are in periphery in Figure 2d, are underrepresented (middle right of Figure 2c)—i.e., few neural units code the visual information of the periphery. Looking at the backward-mapped image (Figure 2f), the eye has full resolution, since it is in fovea, whereas the hat feathers have low resolution due to the neural underrepresentation.

The choice of *R* and *S* is the one that optimizes the log-polar pixel aspect ratio *γ*, making it as close as possible to 1 (see Motion estimation, later, for details). It can be shown (Solari, Chessa, & Sabatini, 2012; Traver & Pla, 2008) that for a given *R*, the optimal rule is *S* = 2*π*/(*a* − 1).

We can relate the maximum RF size (*W*_max) and the parameters of the mapping as follows (Solari et al., 2012):
From Equation 4 we can also compute the fraction of the modeled cortex devoted to the fovea (*χ*). This can be derived from Equation 4 by setting the RF size to 1, inverting the equation to find the corresponding *u* (see Equation 2), and dividing by the overall size of the modeled cortex, *R*:

Each cortical pixel *C*_i is computed as a Gaussian-weighted sum of the Cartesian pixels *P*_j in the *i*th RF: *C*_i = ∑_j *w*_ij *P*_j, where the weights *w*_ij are the values of a normalized Gaussian centered on the *i*th RF. A similar approach is used to compute the inverse log-polar mapping that produces the retinal image, where the space-variant effect of the log-polar mapping is observable (see Figure 2c through f). However, in this article we never employ the inverse log-polar mapping (other than for graphical purposes), since all processing is performed directly in the cortical domain.
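A minimal numerical sketch of the forward mapping follows, assuming a square image and replacing the overlapping Gaussian RFs described above with a simple nearest-bin box average (an illustrative simplification, not the paper's implementation):

```python
import numpy as np

def to_cortical(img, rho_0, R, S):
    """Map a square Cartesian image onto an R x S cortical image.

    Each Cartesian pixel with rho_0 <= rho < rho_max is assigned to the
    cortical bin (u, v) = (floor(log_a(rho / rho_0)), floor(q * theta));
    each bin averages its contributions (box average instead of the
    Gaussian-weighted, overlapping RFs used in the text)."""
    N = img.shape[0]
    rho_max = 0.5 * N
    a = np.exp(np.log(rho_max / rho_0) / R)
    q = S / (2 * np.pi)
    ys, xs = np.indices(img.shape)
    x = xs - N / 2.0
    y = ys - N / 2.0
    rho = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    valid = (rho >= rho_0) & (rho < rho_max)   # drop blind spot and corners
    u = np.clip(np.floor(np.log(rho[valid] / rho_0) / np.log(a)).astype(int), 0, R - 1)
    v = np.clip(np.floor(q * theta[valid]).astype(int), 0, S - 1)
    cortical = np.zeros((R, S))
    counts = np.zeros((R, S))
    np.add.at(cortical, (u, v), img[valid])    # accumulate contributions per bin
    np.add.at(counts, (u, v), 1)
    return cortical / np.maximum(counts, 1)    # empty (oversampled) bins stay 0

cort = to_cortical(np.ones((128, 128)), rho_0=2.0, R=40, S=64)
```

Note that foveal bins may receive no Cartesian pixel at all (oversampling), which is exactly why the text employs overlapping Gaussian RFs rather than this box average.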

- The aspect ratio *γ* of the log-polar pixel has to be close to 1. This allows the extraction of image features in the cortical domain by applying the same local operators (e.g., filtering) employed in the Cartesian domain.
- The spatial support of the local operators has to be small in the cortical domain.
- The mapping of a vector field of image features has to be expressed in terms of a general-coordinate transformation.

In the following, we focus on the constraints on the aspect ratio (*γ* = 1) and on the spatial support of the filters in order to obtain undistorted RFs, since the first stage of the proposed feed-forward architecture is based on spatiotemporal filtering. Thus, the optic flow can be computed from a sequence of cortical images by using the V1-MT feed-forward architecture originally designed in the Cartesian domain.

V1 simple cells are characterized by the orientation *θ* of their contrast sensitivity in the spatial domain and by their preferred velocity *v*^c in the direction orthogonal to their contrast orientation, often referred to as component speed. Their RFs act on the spatiotemporal volume (*x*, *y*, *t*). In order to achieve low computational complexity, the spatiotemporal filters *g*(*x*, *y*, *θ*, *f*_s, *f*_t) are decomposed into separable filters in space *h*(*x*, *y*, *θ*, *f*_s) and time *p*(*t*, *f*_t). The spatial component of the RF is described by Gabor filters, and the temporal component by an exponential decay function, where *f*_s and *f*_t are the spatial and temporal peak frequencies, related to the preferred velocity by *v*^c = *f*_t/*f*_s, and *σ* and *τ* define the spatial and temporal scales, respectively.

The filter parameters were set as follows: *f*_s = 0.25 c/pixel, *f*_t = [0, 0.10, 0.15, 0.23] c/frame (population of simple cells tuned to different preferred velocities), *σ* = 2.27 pixels, and *τ* = 2.5 frames. The filters' spatial orientations were chosen to be *θ* = *iπ*/8, *i* = 0, 1, …, 7.
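With these parameter values, the separable filter bank can be sketched as below. Only the even-phase spatial Gabor is shown, the dependence of the temporal filter on *f*_t is omitted, and the filter support sizes (11 × 11 pixels, 7 frames) are illustrative assumptions:

```python
import numpy as np

def spatial_gabor(size, theta, f_s, sigma):
    """Even-phase spatial Gabor RF: isotropic Gaussian envelope times a
    cosine grating with orientation theta and peak frequency f_s (c/pixel)."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xt = x * np.cos(theta) + y * np.sin(theta)   # axis orthogonal to the bars
    env = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return env * np.cos(2.0 * np.pi * f_s * xt)

def temporal_decay(n_frames, tau):
    """Causal exponential-decay temporal RF, normalized to unit sum."""
    t = np.arange(n_frames)
    p = np.exp(-t / tau)
    return p / p.sum()

# Values from the text: f_s = 0.25 c/pixel, sigma = 2.27 pixels,
# tau = 2.5 frames, eight orientations theta = i*pi/8, i = 0..7.
bank = [spatial_gabor(11, i * np.pi / 8, 0.25, 2.27) for i in range(8)]
p_t = temporal_decay(7, 2.5)
```

Convolving an image sequence with each spatial filter and then filtering the result along time yields the separable spatiotemporal responses described above.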

A straightforward approach would be to transform the spatial RFs into the cortical domain, obtaining mapped filters *g*(*x*(*ξ*, *η*), *y*(*ξ*, *η*), *θ*, *f*_s, *f*_t). As a consequence of the nonlinearity of the log-polar mapping, the mapped filters are distorted; Solari et al. (2012) have shown that under specific conditions such distortions can be kept to a minimum. This happens when the spatial support of the RFs is sufficiently small and the aspect ratio *γ* of the log-polar pixel is equal to 1 (see Figure 3, top). Under these assumptions, it is possible to work directly in the cortical domain by considering spatiotemporal filters sampled in log-polar coordinates, *g*(*ξ*, *η*, *θ*, *f*_s, *f*_t)—see Figure 3 (bottom).

The responses of the V1 cells are denoted *E*^V1(*ξ*, *η*, *t*, *θ*, *v*^c). A key property of V1 cells is their tuning to the spatial orientation and velocity of a stimulus, which arises from spatiotemporal-frequency selectivity for motion in a direction perpendicular to the contrast of the underlying pattern (Adelson & Movshon, 1982). The MT stage is organized as follows:

- The output of the V1 afferent cells is spatially pooled through a Gaussian kernel.
- The previous output is pooled by MT linear weights, which give rise to the MT tuning to speed direction *d*; these weights are defined as cos(*d* − *θ*), where *d* ∈ [0, 2*π*].
- The output of the MT orientation pooling is then fed into an exponential function, which describes the static nonlinearity.
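The three MT stages listed above can be sketched as follows. The pooling width `sigma_pool`, the kernel truncation, and the array shapes are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable Gaussian smoothing with a small truncated kernel."""
    r = np.arange(-3, 4)
    k = np.exp(-r**2 / (2.0 * sigma**2))
    k /= k.sum()
    tmp = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, tmp, k, mode='same')

def mt_response(E_v1, directions, thetas, sigma_pool=1.0):
    """Sketch of the three MT stages for one preferred speed.

    E_v1: (n_theta, H, W) V1 responses; returns (n_dir, H, W)."""
    thetas = np.asarray(thetas)
    # 1) spatial pooling through a Gaussian kernel
    pooled = np.stack([gaussian_smooth(e, sigma_pool) for e in E_v1])
    # 2) orientation pooling with cos(d - theta) linear weights
    mt = [np.tensordot(np.cos(d - thetas), pooled, axes=1) for d in directions]
    # 3) exponential static nonlinearity
    return np.exp(np.stack(mt))
```

The cosine weighting means each MT unit sums V1 channels whose preferred orientations are close to its preferred direction *d* and subtracts the opponent ones.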

The responses of MT cells tuned to component speed *v*^c and to direction of speed *d* are denoted *E*^MT(*ξ*, *η*, *t*, *d*, *v*^c). To estimate the cortical optic flow, we decode the MT population over the component speeds *v*^c and the speed directions *d*: It is possible to first decode the MT responses *E*^MT(*ξ*, *η*, *t*, *d*, *v*^c) along each speed direction *d* to compute the speed, then to apply the intersection of constraints on such estimated velocities.

A radial expansion whose FRM coincides with the center of the visual field maps onto a purely translational cortical flow along the *ξ* axis in the cortical domain, but if the FRM is shifted with respect to the center of the visual field (i.e., we have an expansion with a constant translation), the corresponding cortical flow has nonlinear components, as shown in Figure 4. Moreover, the cortical flow generated by a shift of the FRM toward the right is notably different from the cortical flow corresponding to a shift of the FRM toward the left.

To obtain a compact description of the cortical optic flow, we consider its first-order approximation **v** = **v̄** + **T̄**[*ξ*, *η*]^T, where **T̄** is the tensor composed of the partial derivatives of the cortical motion field. By describing the tensor through its dyadic components, we obtain

*v*_ξ = *c̃*_1 + *c̃*_2 *ξ* + *c̃*_3 *η*,
*v*_η = *c̃*_4 + *c̃*_5 *ξ* + *c̃*_6 *η*,

where **α**^ξ: (*ξ*, *η*) ↦ (1, 0) and **α**^η: (*ξ*, *η*) ↦ (0, 1) are pure translations, the *c̃*_i are constants, and *v*_ξ and *v*_η are the components of the cortical optic flow. The parameter vector [*c̃*_1, *c̃*_2, …, *c̃*_6] describes a specific configuration of cortical optic flow in a local patch.
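The six coefficients of this first-order description can be recovered from a local patch of cortical flow by least squares. The estimator below is an illustrative choice (the affine form follows the Taylor expansion in the Appendix; the fitting procedure is not specified in this passage):

```python
import numpy as np

def fit_affine(xi, eta, v_xi, v_eta):
    """Least-squares fit of the six affine coefficients [c1..c6] such that
    v_xi ~ c1 + c2*xi + c3*eta and v_eta ~ c4 + c5*xi + c6*eta
    over the sample points of a local patch."""
    A = np.column_stack([np.ones_like(xi), xi, eta])
    c123, *_ = np.linalg.lstsq(A, v_xi, rcond=None)
    c456, *_ = np.linalg.lstsq(A, v_eta, rcond=None)
    return np.concatenate([c123, c456])

# Synthetic patch with known coefficients (hypothetical values):
rng = np.random.default_rng(0)
xi = rng.uniform(0, 10, 200)
eta = rng.uniform(0, 10, 200)
c_true = np.array([0.5, 0.1, -0.2, 1.0, 0.0, 0.3])
v_xi = c_true[0] + c_true[1] * xi + c_true[2] * eta
v_eta = c_true[3] + c_true[4] * xi + c_true[5] * eta
c_est = fit_affine(xi, eta, v_xi, v_eta)
```

On noiseless affine flow the fit is exact; on real cortical flow the residuals measure how well a patch is described by its first-order approximation.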

In the Cartesian domain, a purely radial flow centered on a point requires that the constant affine coefficients *c*_1 and *c*_4 (see Appendix, Equation 17) are null. We can reformulate this rule in the cortical domain by using the relationships described in Equation 20, yielding Equation 12, where *T*_1 denotes a threshold value. It is worth noting that the right side of Equation 12 is not a constant—i.e., it decreases as a function of *ξ*.

Moreover, an expanding or contracting flow is characterized by a nonnull divergence—i.e., |*c*_2 + *c*_6| > 0. Using the relationships described in Equation 20, this rule can be reformulated with affine cortical coefficients, where *T*_2 (*T*_3) denotes a threshold value^1 and *a* > 1.

An implementation of the log-polar mapping is available as an OpenCV package,^2 and the MATLAB code of the motion-estimation module is available on ModelDB.^3

The motion direction of each ellipse was computed as

atan2(*y*_Dot − *y*_FRM, *x*_Dot − *x*_FRM) + Motion_Dir,

where atan2(*y*, *x*) is the four-quadrant inverse tangent function, (*x*_Dot, *y*_Dot) and (*x*_FRM, *y*_FRM) are the (*x*, *y*) coordinates of the ellipse and FRM, respectively, and Motion_Dir is 0 rad for expanding motion and *π* rad for contracting motion.

The new (*x*, *y*) position of each ellipse was then computed by displacing it along its motion direction by dot_Dist × dot_Speed pixels per frame, where dot_Dist is the distance of the ellipse from the FRM and dot_Speed is the speed of an ellipse 1 pixel away from the FRM, which was set to 0.1 pixel/frame. These computations generated expanding or contracting motion with a realistic speed gradient (cf. Movie 1). Elements that fell outside the circular stimulus region or whose lifetime exceeded five movie frames were randomly repositioned within the stimulus and assigned an age of zero.
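The per-frame update just described can be sketched as follows; the direction and displacement expressions are reconstructed from the definitions above, and the sample positions are hypothetical:

```python
import numpy as np

def update_dots(xy, frm, motion_dir, dot_speed=0.1):
    """One frame of radial dot motion with a linear speed gradient.

    xy: (n, 2) dot positions; frm: (2,) focus of radial motion;
    motion_dir: 0.0 for expansion, pi for contraction. dot_speed is the
    speed (pixels/frame) of a dot 1 pixel away from the FRM."""
    dx = xy[:, 0] - frm[0]
    dy = xy[:, 1] - frm[1]
    theta = np.arctan2(dy, dx) + motion_dir   # per-dot motion direction
    dist = np.hypot(dx, dy)                   # distance from the FRM
    step = dist * dot_speed                   # speed grows linearly with distance
    return xy + np.column_stack([step * np.cos(theta), step * np.sin(theta)])

pts = np.array([[11.0, 10.0], [10.0, 14.0]])
out = update_dots(pts, frm=np.array([10.0, 10.0]), motion_dir=0.0)
```

With expansion (`motion_dir=0.0`), a dot 1 pixel to the right of the FRM moves 0.1 pixel farther right, while a dot 4 pixels above it moves 0.4 pixel farther up, reproducing the speed gradient.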

Equivalent-noise (EN) analysis models the discrimination threshold as *σ*_FRM² = (*σ*_int² + *σ*_ext²)/*n* (Equation 16), where *σ*_FRM is the FRM discrimination threshold, *σ*_int is the internal noise of the system, *σ*_ext is the external noise contained in the stimulus, and *n* is the sampling efficiency, which relates to how well the system is able to integrate the information contained within the stimulus.
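A numerical sketch of the EN model follows. The quadratic form is reconstructed from the definitions in the text, and the linear regression on squared values is an illustrative stand-in for the nonlinear least-squares fit used in the experiments (exact only for noiseless thresholds):

```python
import numpy as np

def en_threshold(sigma_ext, sigma_int, n):
    """Equivalent-noise model (reconstructed from the definitions in the
    text): sigma_FRM^2 = (sigma_int^2 + sigma_ext^2) / n."""
    return np.sqrt((sigma_int**2 + np.asarray(sigma_ext)**2) / n)

def fit_en(sigma_ext, sigma_frm):
    """Recover (sigma_int, n) by linear regression on squared values:
    sigma_frm^2 is linear in sigma_ext^2, with slope 1/n and intercept
    sigma_int^2 / n."""
    slope, intercept = np.polyfit(np.asarray(sigma_ext)**2,
                                  np.asarray(sigma_frm)**2, 1)
    n = 1.0 / slope
    return np.sqrt(intercept * n), n

ext = np.array([0.25, 0.5, 1.0, 2.0, 4.0])      # external-noise levels
thr = en_threshold(ext, sigma_int=1.0, n=0.2)   # noiseless synthetic thresholds
sigma_int_hat, n_hat = fit_en(ext, thr)
```

At low external noise the threshold is dominated by *σ*_int; at high external noise it grows as *σ*_ext/√*n*, which is what makes the two parameters separable.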

Observers had a mean (±*SD*) age of 34 ± 9 years. All observers had normal or corrected-to-normal vision in the test eye. All subjects provided written informed consent.

To add external noise, the (*x*, *y*) positions of the FRM of each element were selected from normal distributions with means equal to (*x*_FRM, *y*_FRM) and standard deviation equal to *σ*_ext—i.e., the external noise.

EN analysis involves measuring FRM discrimination thresholds (*σ*_FRM) at varying amounts of external noise *σ*_ext; the observed thresholds are then fitted to Equation 16, thus obtaining estimates of the internal noise *σ*_int and sampling efficiency *n*. FRM discrimination thresholds at each tested eccentricity were thus measured at five fixed levels of external noise: 0.25°, 0.5°, 1°, 2°, and 4°. The thresholds were measured via 15 randomly interleaved staircases (Wetherill & Levitt, 1965). The raw data from a minimum of 50 trials from each staircase were combined and fitted with a cumulative normal function by weighted least-squares regression (in which the data are weighted by their binomial standard deviation). FRM discrimination thresholds were estimated from the 80%-correct point of the psychometric function. For each tested eccentricity, these thresholds were fitted via nonlinear least-squares regression to the EN function presented in Equation 16.

For internal noise, a two-way ANOVA revealed a significant main effect of observer type, *F*(1, 33) = 61.12, *p* = 10^−8, no significant main effect of eccentricity, *F*(2, 33) = 1.34, *p* = 0.28, and no significant interaction between observer type and eccentricity, *F*(2, 33) = 0.39, *p* = 0.68. As can be seen in Figures 10b and 11b, there is a trend for internal noise in both human and model observers to increase with eccentricity, but the increase is not statistically significant. The model has significantly less overall internal noise than human observers.

For sampling efficiency, the ANOVA revealed a significant main effect of observer type, *F*(1, 33) = 20.45, *p* = 10^−4, a significant main effect of eccentricity, *F*(2, 33) = 17.94, *p* = 10^−5, and a significant interaction between observer type and eccentricity, *F*(2, 33) = 4.39, *p* = 0.02. Sampling efficiency in both human (Figure 10c) and model (Figure 11c) observers decreases with eccentricity. However, the model has significantly lower overall sampling efficiency than human observers. Furthermore, sampling efficiency decreases at a faster rate in the periphery of the model than in the periphery of human observers.

There was a significant linear relationship between the fraction of the model's cortex devoted to the fovea (*χ*) and the model's estimated sampling efficiency, *F*(1, 5) = 27.6, *p* = 0.003, *R*² = 0.816. As can be seen in Figure 12, the greater the portion of the model's cortex dedicated to the fovea, the worse the sampling efficiency. This may be due to the overrepresentation of the fovea in the model's cortex—i.e., using too many cortical processing units to represent the fovea leaves too few cortical units for the periphery. The decrease in sampling efficiency in the periphery of both model and human observers is thus plausibly attributable to RF-size changes across the visual field, which are in turn driven by the lossy log-polar mapping that projects the retinal image onto primary visual areas.

A shift of the FRM (e.g., toward the right) yields nonnull translation components (i.e., *c̃*_1 and *c̃*_4) and minor deformation components (i.e., *c̃*_2, *c̃*_3, *c̃*_5, and *c̃*_6) that make the cortical flow rotate, as happens for a diverging flow with shifted FRM (see Figure 4).

Dead-leaves stimuli share the 1/*f* spatial-frequency spectrum and contrast range of natural images, and are textured with occlusions and edges at a variety of orientations. Hence, the dead leaves are better suited to test the performance of the visual system under natural viewing conditions, since they better approximate the natural stimulus range in which the visual system operates.

^{2}The package is available on OpenCV version 2.4.X (opencv.org/downloads.html). Once the package is downloaded, the source code is available in the [contrib] folder.

*Journal of the Optical Society of America A*, 2, 284–299.
*Nature*, 300(5892), 523–525.
*Journal of the Optical Society of America*, 46(8), 634–639.
*Journal of Computational Neuroscience*, 10(3), 255–280.
*Encyclopedia of Sensors*, 10, 1–16.
*Journal of the Optical Society of America A*, 23(7), 1598–1607.
*Computer Vision and Image Understanding*, 69(2), 170–184.
*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 19(10), 1080–1089.
*Advances in Applied Probability*, 38(1), 31–46.
*Spatial Vision*, 10, 433–436.
*Cognitive Psychology*, 59(4), 320–356.
*IEEE Transactions on Communications*, 31, 532–540.
*Advanced mathematics for applied and pure sciences*. Boca Raton, FL: CRC Press.
*International Conference on Computer Vision Systems* (pp. 41–50).
*Computer Vision and Image Understanding*, 117(6), 603–619.
*Journal of Neurophysiology*, 65, 1329–1345.
*The Journal of Neuroscience*, 15(7), 5192–5208.
*Nature Neuroscience*, 14(9), 1195–1201.
*Behavior Research Methods*, 1–24. doi:10.3758/s13428-016-0762-9
*Current Opinion in Neurobiology*, 14(2), 203–211.
*The Journal of Neuroscience*, 14(1), 54–67.
*Cerebral Cortex*, 9(8), 878–895.
*The Journal of Neuroscience*, 32(7), 2299–2313.
*Neuron*, 66(4), 596–609.
*Visual Neuroscience*, 9(2), 181–197.
*The American Journal of Psychology*, 86(2), 311–324.
*Vision Research*, 26(1), 161–179.
*Neural Computation*, 5(3), 374–391.
*International Journal of Computer Vision*, 41(1–2), 35–59.
*Vision Research*, 24(1), 25–32.
*Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems* (pp. 50–56).
*Proceedings of the National Academy of Sciences, USA*, 89(7), 2595–2599.
*Physiological Reviews*, 88(1), 59–89.
*2009 9th IEEE-RAS International Conference on Humanoid Robots* (pp. 223–229).
*Vision: Coding and efficiency* (pp. 3–24). Cambridge, UK: Cambridge University Press.
*Spatial Vision*, 10, 437–442.
*Vision Research*, 34(21), 2917–2938.
*Neural Computation*, 10(2), 373–401.
*Advances in Neural Information Processing Systems* (pp. 846–854). NIPS.
*Biological Cybernetics*, 25, 181–194.
*Vision Research*, 38(5), 743–761.
*Signal Processing: Image Communication*, 39, 342–354. doi:10.1016/j.image.2015.04.006
*Pattern Recognition Letters*, 33(1), 41–51.
*Computer Vision and Image Understanding*, 125, 37–54.
*Journal of Neurophysiology*, 62(3), 626–641.
*Image and Vision Computing*, 26(10), 1354–1370.
*Biological Cybernetics*, 65(5), 311–320.
*Nature*, 336, 162–163.
*Journal of Experimental Psychology: Human Perception and Performance*, 14(4), 646–660.
*Perception*, 24(1), 315–332.
*Journal of the Optical Society of America A*, 2(2), 322–341.
*British Journal of Mathematical and Statistical Psychology*, 18(1), 1–10. doi:10.1111/j.2044-8317.1965.tb00689.x
*Journal of Vision*, 16(2):1, 1–17. doi:10.1167/16.2.1
*Vision Research*, 23(10), 983–989.
*Journal of Vision*, 13(10):2, 1–17. doi:10.1167/13.10.2
*European Journal of Neuroscience*, 9(5), 956–964.
*Journal of Neurophysiology*, 111(11), 2332–2342.

To derive the first-order description of the cortical optic flow (*v*_ξ, *v*_η), we compute the Taylor expansion of Equation 19 at (*ξ*_0, *η*_0).

The terms *c̃*_1 and *c̃*_4 are the values of the cortical flow (Equation 19) computed at (*ξ*_0, *η*_0), and the terms *c̃*_2, *c̃*_3, *c̃*_5, and *c̃*_6 are the partial derivatives:

Equation 20 relates the affine coefficients [*c̃*_1, *c̃*_2, …, *c̃*_6] computed on the cortical optic flow to the affine coefficients of the corresponding Cartesian optic flow [*c*_1, *c*_2, …, *c*_6] (Solari et al., 2014). We can solve the resulting system of equations, thus obtaining