**Optokinetic nystagmus (OKN) is an involuntary eye movement responsible for stabilizing retinal images in the presence of relative motion between an observer and the environment. Fully understanding the development of OKN requires a neurally plausible computational model that accounts for the neural development and the behavior. To date, work in this area has been limited. We propose a neurally plausible framework for the joint development of disparity and motion tuning in the visual cortex and of optokinetic and vergence eye-movement behavior. To our knowledge, this framework is the first developmental model to describe the emergence of OKN in a behaving organism. Unlike past models, which were based on scalar models of overall activity in different neural areas, our framework models the development of the detailed connectivity both from the retinal input to the visual cortex and from the visual cortex to the motor neurons. This framework accounts for the importance of the development of normal vergence control and binocular vision in achieving normal monocular OKN behaviors. Because the model includes behavior, we can simulate the same perturbations as past experiments, such as artificially induced strabismus. The proposed model agrees both qualitatively and quantitatively with a number of findings from the literature on both binocular vision and the optokinetic reflex. Finally, our model makes quantitative predictions about OKN behavior using the same methods used to characterize OKN in the experimental literature.**

$V_{TN}$ and $V_{NT}$, the slow phase velocities during TN and NT motion:

$G$. Larger values of $G$ create stronger biases against connections to the motor neurons from cortical neurons with significant input from the ipsilateral eye. The detailed description of the parameter is in the Appendix. Figure 8 shows that as the bias increases, the mOKN asymmetry increases. Tychsen (2007) tested mOKN asymmetry in primates with esotropia with visual stimuli moving at 30°/s and found the NBI value to be around 0.35, matching our results with $G = 0.012$. Thus, we fixed $G = 0.012$ in the simulations reported here, including the results in Figure 7. The ASI computed at the end of learning is 0.05 in the normal condition and 0.50 in the strabismic condition.

$G = 0.012$ in Figure 8). Our model predicts that even with this contralateral bias, mOKN can be symmetric after strabismus if the proportion of near and far disparities observed during development is balanced. For the model without contralateral bias, the NBI after strabismic development is around 0.3 when the two eyes usually observe uncorrelated motion.

$\ell_2$ norm of the weights from each cortical area to the contralateral NOT to be 1.3 times greater than the $\ell_2$ norm of the ipsilateral weights. Note that we cannot make a quantitative comparison between the weights estimated by Kiorpes et al. and the weights in our model: Their weights are scalars representing the strength of connections between entire areas, whereas in our model the weights are matrices representing the detailed connectivity between individual units in sensory and motor areas. In our model, the final motor command depends not only upon the size but also upon the pattern of these weights. In the simplified scalar model of Kiorpes et al., the two effects are confounded.

*Journal of Experimental Child Psychology*, 23 (1), 133–150.

*Developmental neurobiology of vision* (pp. 277–287). New York: Plenum Press.

*Sensory communication* (pp. 217–234). Cambridge, MA: MIT Press.

*Automatica*, 45 (11), 2471–2482.

*The development of perception* (pp. 245–278). New York: Academic Press.

*Eye movements: Cognition and visual perception* (pp. 53–64). Hillsdale, NJ: Lawrence Erlbaum Associates.

*Neuropsychologia*, 41 (13), 1769–1784.

*The Journal of Neuroscience*, 17 (1), 296–307.

*The Journal of Physiology*, 270 (2), 321–344.

*Annals of the New York Academy of Sciences*, 656 (1), 277–296.

*Documenta Ophthalmologica Proceedings Series*, 45, 9–18.

*Perception*, 30, 36.

*Vision Research*, 39 (19), 3223–3239.

*Nature*, 419 (6903), 157–162.

*Journal of Comparative Neurology*, 201 (4), 519–539.

*Journal of Experimental Psychology: General*, 143 (5), 1923–1938, doi:10.1037/a0037021.

*Progress in oculomotor research* (pp. 443–454). New York: Elsevier.

*Functional basis of ocular motility disorders* (pp. 303–310). Oxford, UK: Pergamon Press.

*Spatially oriented behavior* (pp. 135–153). New York: Springer.

*Progress in Brain Research*, 64, 75–84.

*Progress in Brain Research*, 80, 173–182.

*The Journal of Neuroscience*, 16 (5), 1791–1807.

*Network: Computation in Neural Systems*, 11 (3), 191–210.

*Journal of Neurophysiology*, 28 (6), 1041–1059.

*The Journal of Neuroscience*, 16 (20), 6537–6553.

*Strabismus*, 21 (1), 37–49.

*Frontiers in Neurorobotics*, 7 (20), 1–10.

*Experimental Brain Research*, 49 (1), 125–130.

*IEEE Transactions on Signal Processing*, 41 (12), 3397–3415.

*Annals of the New York Academy of Sciences*, 1164 (1), 430–439.

*Vision Research*, 22 (3), 341–346.

*Vision Research*, 37 (23), 3311–3325.

*Experimental Brain Research*, 35 (2), 229–248.

*Vision Research*, 37 (13), 1747–1754.

*Behavioural Brain Research*, 46 (1), 31–42.

*Neural Computation*, 8 (5), 1021–1040.

*Robotics and Autonomous Systems*, 71, 3–12.

*Early visual development: Basic and clinical research* (pp. 364–390). New York: Oxford University Press.

*Clinical strabismus management* (pp. 117–138). Philadelphia: Saunders.

*Transactions of the American Ophthalmological Society*, 105, 564–593.

*Autonomous learning of smooth pursuit and vergence through active efficient coding*. Paper presented at the IEEE International Conference on Development and Learning and Epigenetic Robotics, Palazzo Ducale, Genoa, Italy.

*Experimental Brain Research*, 224 (2), 179–187.

*Vision Research*, 25 (10), 1431–1438.

*Vision Research*, 26 (6), 847–855.

*Intrinsically motivated learning of visual motion perception and smooth pursuit*. Paper presented at the IEEE International Conference on Robotics and Automation, Hong Kong.

*A unified model of the joint development of disparity selectivity and vergence control*. Paper presented at the IEEE International Conference on Development and Learning and Epigenetic Robotics, San Diego, CA.

$t$ and $t-1$ are concatenated into 400-dimensional input vectors, denoted by $\mathbf{x}(h,k,n,t)$, where $h \in \{L,R\}$ indexes the hemisphere, $k \in \{F,P\}$ indexes the region (fovea or periphery), $n \in \{1,\ldots,50\}$ indexes the patch, and $t$ indexes time. The input vectors are normalized to have zero mean and unit variance.
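The concatenation and normalization described above can be sketched as follows. This is a minimal illustration, assuming that each 400-dimensional vector is assembled from four 100-pixel (10 × 10) patches, one per eye at times $t$ and $t-1$. The patch size, the ordering of the parts, and the helper name `make_input_vector` are assumptions made for this example and are not stated in the text.

```python
import numpy as np

def make_input_vector(left_t, right_t, left_prev, right_prev, eps=1e-8):
    """Concatenate four image patches and normalize the result.

    The four arguments are hypothetical 10x10 patches (left/right eye at
    times t and t-1). The output has zero mean and (near) unit variance.
    """
    parts = [np.asarray(p, dtype=float).ravel()
             for p in (left_t, right_t, left_prev, right_prev)]
    x = np.concatenate(parts)          # 4 * 100 = 400 dimensions
    x = x - x.mean()                   # zero mean
    return x / (x.std() + eps)         # unit variance (eps avoids 0/0)

# Example with random patches standing in for retinal input.
rng = np.random.default_rng(0)
patches = [rng.random((10, 10)) for _ in range(4)]
x = make_input_vector(*patches)
print(x.shape)  # (400,)
```

The small constant `eps` only guards against a constant (zero-variance) patch and does not otherwise affect the result.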

$h \in \{L,R\}$, $t$ indexes time, $i \in \{1,\ldots,11\}$ indexes the neuron, $\varepsilon(t)$ is the retinal slip at time $t$, $\hat{\varepsilon}_i(h)$ is the preferred slip, and the parameters $A$ and $\sigma$ determine the height and width of the tuning curve. Since sensory neurons in the left (right) NOT are tuned to leftward (rightward) motion, we set the preferred slips of the sensory neurons to be equally spaced from −40°/s to 0°/s for the left NOT and from 0°/s to 40°/s for the right NOT. We set $A = 0.1$ and $\sigma = 4$°/s. We concatenate the responses from the sensory neurons into a single vector.
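To make the tuning-curve parameters concrete, the sketch below computes the responses of the 11 sensory neurons in one NOT. The Gaussian shape is an assumption: the text only states that $A$ and $\sigma$ set the height and width of the curve. The function name `not_sensory_response` is likewise hypothetical.

```python
import numpy as np

A = 0.1       # tuning-curve height (value from the text)
SIGMA = 4.0   # tuning-curve width in deg/s (value from the text)

# Preferred slips: 11 values equally spaced over each hemisphere's range.
PREFERRED_SLIP = {
    "L": np.linspace(-40.0, 0.0, 11),   # left NOT, leftward motion
    "R": np.linspace(0.0, 40.0, 11),    # right NOT, rightward motion
}

def not_sensory_response(slip, hemisphere):
    """Responses of the 11 sensory neurons in one NOT to a retinal slip.

    ASSUMPTION: a Gaussian tuning curve A * exp(-(slip - pref)^2 / (2 sigma^2)).
    """
    pref = PREFERRED_SLIP[hemisphere]
    return A * np.exp(-((slip - pref) ** 2) / (2.0 * SIGMA ** 2))

# A leftward slip of 20 deg/s drives the left-NOT neuron tuned to -20 deg/s most.
r_left = not_sensory_response(-20.0, "L")
```

With 11 neurons per hemisphere, the preferred slips are 4°/s apart, which equals $\sigma$, so neighboring tuning curves overlap under this assumed shape.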

$\mathbf{x}(h,k,n,t)$ is approximated as the sparse weighted sum of unit-norm basis vectors taken from an overcomplete dictionary $\boldsymbol{\phi}_i(h,k,t)$, where $i \in \{1,\ldots,600\}$. The two hemispheres ($h$) and foveal or peripheral regions ($k$) have different dictionaries, which evolve over time. The approximation is given by

$$\mathbf{x}(h,k,n,t) \approx \sum_{i=1}^{600} \alpha_i(h,k,n,t)\,\boldsymbol{\phi}_i(h,k,t).$$

$\alpha_i(h,k,n,t)$ using matching pursuit. In the second step, we assume that the coefficients are constant, and update the basis vectors using gradient descent to minimize the total normalized squared reconstruction error over all patches:
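The two-step procedure (sparse coding by matching pursuit, then a gradient step on the basis vectors) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the number of active coefficients (`n_active`), the learning rate (`lr`), the normalization of the error by $\|\mathbf{x}\|^2$, and the renormalization of the basis vectors to unit norm after each step are all assumptions.

```python
import numpy as np

def matching_pursuit(x, D, n_active=10):
    """Greedy sparse code of x over the unit-norm columns of D."""
    residual = np.asarray(x, dtype=float).copy()
    alpha = np.zeros(D.shape[1])
    for _ in range(n_active):
        proj = D.T @ residual                 # correlation with each basis vector
        i = int(np.argmax(np.abs(proj)))      # best-matching basis vector
        alpha[i] += proj[i]
        residual -= proj[i] * D[:, i]         # remove its contribution
    return alpha, residual

def dictionary_step(X, D, lr=0.01, n_active=10):
    """One coding pass over all patches plus one gradient step on D.

    X has shape (dim, n_patches). Returns the updated dictionary and the
    mean normalized squared reconstruction error before the update.
    """
    grad = np.zeros_like(D)
    errors = []
    for x in X.T:
        alpha, res = matching_pursuit(x, D, n_active)
        norm2 = x @ x
        errors.append((res @ res) / norm2)
        # Gradient of ||x - D alpha||^2 / ||x||^2 with alpha held constant.
        grad += -2.0 * np.outer(res, alpha) / norm2
    D = D - lr * grad / X.shape[1]
    D = D / np.linalg.norm(D, axis=0, keepdims=True)   # keep unit-norm columns
    return D, float(np.mean(errors))

# Example with random data: 50 patches, 400 dimensions, 600 basis vectors.
rng = np.random.default_rng(0)
D = rng.standard_normal((400, 600))
D /= np.linalg.norm(D, axis=0, keepdims=True)
X = rng.standard_normal((400, 50))
alpha0, res0 = matching_pursuit(X[:, 0], D, n_active=10)
D_new, err = dictionary_step(X, D)
```

Because each matching-pursuit iteration removes the projection onto a unit-norm vector, the residual norm never increases, so the normalized error stays between 0 and 1.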

$\alpha_i(h,k,n,t)$ are analogous to the activation of the simple cells responding to the visual information at time $t$ from patch $n$ from scale $k$ of hemiretina $h$. We model the output of complex cells by pooling the squared coefficients for each basis vector over the set of all patches, where $N = 50$ is the number of patches. We concatenate these model outputs into a feature vector.
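The pooling step can be illustrated with a short sketch. It assumes that pooling means averaging the squared coefficients over the $N = 50$ patches; the text does not state whether the pooled value is a sum or a mean, and the two differ only by the constant factor $N$. The function name `complex_cell_responses` is hypothetical.

```python
import numpy as np

def complex_cell_responses(alpha):
    """Pool squared simple-cell coefficients over patches.

    alpha has shape (n_patches, n_basis). Returns a feature vector of
    length n_basis. ASSUMPTION: pooling is the mean over patches.
    """
    alpha = np.asarray(alpha, dtype=float)
    return np.mean(alpha ** 2, axis=0)

# Example: 50 patches, 600 basis vectors, one basis vector active everywhere.
alpha = np.zeros((50, 600))
alpha[:, 3] = 2.0
y = complex_cell_responses(alpha)
print(y[3])  # 4.0
```

Squaring makes the response independent of the sign of the coefficient, which is the property usually associated with complex cells.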

$\mathbf{z}_{\mathrm{OKN}} \in \mathbb{R}^{11}$: where $\mathbf{W}_{\mathrm{OKN,C}} \in \mathbb{R}^{11 \times 600}$ are weight matrices determining the connections from the subcortical and cortical sensory neurons to motor neurons. The first index $h$ of $\mathbf{W}_{\mathrm{OKN,C}}$ represents the hemisphere where the motor neurons are located, and the second index $\eta$ represents the hemisphere where the cortical sensory neurons are located. The subcortical pathway has only ipsilateral connections, but the cortical pathway has both ipsilateral and contralateral connections.

$\ell_1$ norm.

$\mathbf{I}$ is an identity matrix and the parameter $\mu = 10$ controls the synaptic strength. Thus, retinal slips excite corresponding eye rotations. This ensures that the subcortical pathway functions from the start to stabilize the retinal input when viewing objects moving in the TN direction. During testing, the subcortical connections are removed by setting $\mu = 0$.

$\boldsymbol{\Gamma}(h,\eta,k,t) = \mathrm{diag}(\gamma_1(h,\eta,k,t), \ldots, \gamma_{600}(h,\eta,k,t))$ is a diagonal matrix of weight-decay parameters and $\kappa$ is a positive learning-rate parameter. After each update, each row of the weight matrix is normalized so that the weights entering each motor neuron sum to 1. Weights are initialized to small random values drawn from independent uniform distributions before normalization.
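The initialization and row normalization can be sketched as follows. The update rule itself is not reproduced here. The sketch assumes non-negative weights, so that dividing each row by its sum is well defined, and an upper bound of 0.01 for the uniform distribution. Both are assumptions, as are the function names.

```python
import numpy as np

def normalize_rows(W):
    """Scale each row so the weights entering each motor neuron sum to 1.

    ASSUMPTION: weights are non-negative, so every row sum is positive.
    """
    return W / W.sum(axis=1, keepdims=True)

def init_weights(n_motor=11, n_sensory=600, scale=0.01, seed=0):
    """Small uniform random weights, normalized row by row.

    The upper bound `scale` is a hypothetical value chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, scale, size=(n_motor, n_sensory))
    return normalize_rows(W)

W = init_weights()
```

In a training loop, `normalize_rows` would be called again after every weight update, matching the description above.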

The larger $\gamma_i(h,\eta,k,t)$, the stronger the penalty on connections from the model complex cell $i$ in cortical hemisphere $\eta$ to the motor neurons in the NOT in hemisphere $h$. The value of the weight-decay parameter depends upon $\mathrm{OD}_i(\eta,k,t)$, the OD index of complex cell $i$ from hemisphere $\eta$ and region $k$, according to the equation plotted in Figure 13. The parameter $G$ controls the extent to which the connections are penalized. The parameter $a$ controls the slope of the transition at the threshold $+b$ ($-b$) for the left (right) NOT. We chose $a = 10$ and $b = 0.5$.
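One function with the properties described above (maximum penalty $G$, slope $a$, threshold $\pm b$) is a logistic curve. The sketch below is a hedged illustration only. It assumes a logistic form and assumes that the OD index lies in $[-1, 1]$ with positive values indicating left-eye dominance. Neither assumption is confirmed by the text, and the function name `weight_decay` is hypothetical.

```python
import numpy as np

def weight_decay(od, hemisphere, G=0.012, a=10.0, b=0.5):
    """Weight-decay parameter as a function of the ocular dominance index.

    ASSUMPTIONS: logistic transition; od in [-1, 1] with od > 0 meaning
    left-eye dominance. Connections dominated by the eye ipsilateral to
    the NOT in `hemisphere` are penalized more strongly.
    """
    od = np.asarray(od, dtype=float)
    if hemisphere == "L":
        z = a * (od - b)       # penalty rises above the threshold +b
    elif hemisphere == "R":
        z = -a * (od + b)      # penalty rises below the threshold -b
    else:
        raise ValueError("hemisphere must be 'L' or 'R'")
    return G / (1.0 + np.exp(-z))

od_values = np.linspace(-1.0, 1.0, 5)
gamma_left = weight_decay(od_values, "L")
gamma_right = weight_decay(od_values, "R")
```

Under these assumptions the penalty equals $G/2$ exactly at the threshold and approaches $G$ for strongly ipsilateral-dominated cells, so setting $G = 0$ removes the contralateral bias.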

If $\boldsymbol{\phi}_{i,L}(h,k,t)$ and $\boldsymbol{\phi}_{i,R}(h,k,t)$ denote the parts of the basis vector $\boldsymbol{\phi}_i(h,k,t)$ corresponding to the left- and right-eye inputs, then

$\mathbf{z}_{\mathrm{VG}} \in \mathbb{R}^{11}$: where $\mathbf{W}_{\mathrm{VG}} \in \mathbb{R}^{11 \times 600}$ are weight matrices determining the connection from the cortical neurons to the VG motor neurons.

$u_{\mathrm{VG}}(t)$ are obtained by choosing the preferred change in the VG angle corresponding to one of the motor neurons, which is sampled from the probability distribution obtained by applying a softmax function to the vector of motor-neuron responses, where $T$ is a positive temperature parameter controlling the greediness of the softmax function. Given the VG command, the VG angle is updated according to
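The softmax action selection can be sketched as below. The sketch assumes that the temperature divides the motor-neuron responses before exponentiation, that the 11 preferred changes in VG angle are equally spaced between −1° and 1°, and that the VG angle is updated by adding the command. All three are assumptions made for illustration, as are the function names.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; lower T gives greedier selection."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                    # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def sample_vergence_command(z_vg, preferred_changes, T=1.0, rng=None):
    """Sample one motor neuron and return its preferred change in VG angle."""
    rng = np.random.default_rng() if rng is None else rng
    p = softmax(z_vg, T)
    idx = rng.choice(len(p), p=p)      # sample a neuron index from p
    return preferred_changes[idx], p

# Hypothetical preferred changes in VG angle (degrees), one per motor neuron.
preferred_changes = np.linspace(-1.0, 1.0, 11)
z_vg = np.random.default_rng(0).random(11)    # stand-in motor responses
u_vg, p = sample_vergence_command(z_vg, preferred_changes, T=0.1,
                                  rng=np.random.default_rng(1))
vergence_angle = 5.0 + u_vg                   # ASSUMED additive update
```

As $T$ approaches 0 the distribution concentrates on the most active motor neuron, and as $T$ grows it approaches a uniform choice among the 11 commands.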

$\sum_{h,k,n} e(h,k,n,t)$, where $e(h,k,n,t)$ is given in (5).