Visual motion estimation is a canonical neural computation. In *Drosophila*, recent advances have identified anatomic and functional circuitry underlying direction-selective computations. Models with varying levels of abstraction have been proposed to explain specific experimental results but have rarely been compared across experiments. Here we use the wealth of available anatomical and physiological data to construct a minimal, biophysically inspired synaptic model for *Drosophila*’s first-order direction-selective T4 cells. We show how this model relates mathematically to classical models of motion detection, including the Hassenstein-Reichardt correlator model. We used numerical simulation to test how well this synaptic model could reproduce measurements of T4 cells across many datasets and stimulus modalities. These comparisons include responses to sinusoid gratings, to apparent motion stimuli, to stochastic stimuli, and to natural scenes. Without fine-tuning, this model sufficed to reproduce many, but not all, response properties of T4 cells. Since this model is flexible and based on straightforward biophysical properties, it provides an extensible framework for developing a mechanistic understanding of T4 neural response properties. Moreover, it can be used to assess the sufficiency of simple biophysical mechanisms to describe features of the direction-selective computation and identify where our understanding must be improved.

*Drosophila*’s visual circuits suggest that we may move toward more mechanistic, biophysical descriptions of this computation. Here, we follow previous work (Gruntman, Romani, & Reiser, 2018; Torre & Poggio, 1978) to propose a simple, biophysically plausible synaptic model for direction selectivity in *Drosophila*’s ON-edge-sensitive motion pathway. We compare its predictions to measurements made by several research groups in response to many stimuli, giving us a tool for understanding which features are sufficient to describe different response properties.

*Drosophila* optic lobe are the T4 and T5 cells, which are sensitive to moving ON-edges (consisting of contrast increments) and OFF-edges (consisting of contrast decrements), respectively (Clark, Bursztyn, Horowitz, Schnitzer, & Clandinin, 2011; Joesch, Schnell, Raghu, Reiff, & Borst, 2010; Maisak, Haag, Ammer, Serbe, Meier, Leonhardt, Schilling, Bahl, Rubin, Nern, 2013). Electron microscopy and genetic silencing have identified the primary inputs to T4 and T5 cells (Serbe, Meier, Leonhardt, & Borst, 2016; Shinomiya, Huang, Lu, Parag, Xu, Aniceto, Ansari, Cheatham, Lauchie, & Neace, 2019; Strother, Wu, Wong, Nern, Rogers, Le, Rubin, Reiser, 2017; Takemura, Nern, Chklovskii, Scheffer, Rubin, & Meinertzhagen, 2017). These studies suggest that T4 cells receive input from three distinct colinear spatial locations, with the neurons Mi1 and Tm3 both relaying information about the central point, and the neurons Mi9 and Mi4 acting as relays for the two flanking points (Takemura et al., 2017) (Figure 1A). The neuron T5 appears to have a similar spatial structure, with different input neurons (Shinomiya et al., 2019). Both cell types also receive spatially-localized inputs from other neurons, whose functions remain less well understood (Shinomiya et al., 2019; Takemura et al., 2017).

*Drosophila* optic lobe. We simplified this structure to consider three inputs to a T4 cell: a delayed ND-offset OFF inhibitory input representing Mi9, a centered ON excitatory input representing Mi1 and Tm3, and a delayed PD-offset ON inhibitory input representing Mi4 (or CT1 or both) (Figure 1A) (Strother et al., 2017; Takemura et al., 2017).

Each input is characterized by an \(L_1\)-normalized Gaussian spatial acceptance function \(h\) and an \(L_2\)-normalized lowpass temporal filter \(f\). To represent the nondelayed central input to the motion detector, we replace the temporal filter \(f\) by its normalized distributional derivative \(\dot f = 2\,\tau^{-3/2}\,(\tau - t)\,\Theta(t)\,e^{-t/\tau}\), where \(\Theta(x)\) is the Heaviside step function. Using these filters, we define the filtered contrast signal \(s\) at each point in spacetime as the spatiotemporal convolution of the input contrast \(c(t, x)\) with the functions \(f\) and \(h\) over the appropriate domain. Because taking the temporal derivative of the filtered contrast signal is equivalent to filtering with the derivative of the temporal filter, we will use the notation \(\dot s\) for the high-pass-filtered signal throughout. For convenient handling of spatial boundary conditions, we numerically simulate the full 360 degrees of visual space, which is a periodic interval.
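The filtering stage can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' implementation: the lowpass form \(f(t) = 2\,\tau^{-3/2}\,t\,\Theta(t)\,e^{-t/\tau}\) is inferred here as the \(L_2\)-normalized filter whose normalized derivative matches the stated \(\dot f\), and the function names, the sampling grid, and any values chosen for `tau` and `sigma_deg` are hypothetical placeholders rather than the parameters of Table 1.

```python
import numpy as np

def lowpass(t, tau):
    """Assumed L2-normalized lowpass filter: 2 tau^(-3/2) t Theta(t) exp(-t/tau)."""
    return 2.0 * tau ** -1.5 * t * np.exp(-t / tau) * (t >= 0)

def highpass(t, tau):
    """Normalized derivative given in the text: 2 tau^(-3/2) (tau - t) Theta(t) exp(-t/tau)."""
    return 2.0 * tau ** -1.5 * (tau - t) * np.exp(-t / tau) * (t >= 0)

def gaussian_periodic(x_deg, sigma_deg):
    """Gaussian on the periodic 360-degree interval, normalized so its samples sum to 1."""
    wrapped = (x_deg + 180.0) % 360.0 - 180.0  # signed distance from 0 degrees
    h = np.exp(-0.5 * (wrapped / sigma_deg) ** 2)
    return h / h.sum()

def filtered_contrast(c, f_kernel, h_kernel, dt):
    """Filter c (shape: time x space) circularly in space and causally in time."""
    # Circular convolution over space via the FFT (periodic visual space).
    c_hat = np.fft.fft(c, axis=1) * np.fft.fft(h_kernel)[None, :]
    blurred = np.real(np.fft.ifft(c_hat, axis=1))
    # Causal convolution over time, truncated to the stimulus duration.
    n_t = c.shape[0]
    causal = lambda column: np.convolve(column, f_kernel)[:n_t] * dt
    return np.apply_along_axis(causal, 0, blurred)
```

Passing `highpass` samples instead of `lowpass` samples as `f_kernel` yields the high-pass-filtered signal \(\dot s\) used for the central input. The spatial kernel is centered on index 0 so that the FFT-based circular convolution introduces no spatial shift.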

\(R(x) := \max\{0, x\}\) is the ramp function, and \(g_{\mathrm{inh}}\) and \(g_{\mathrm{exc}}\) are parameters scaling the effects of each input on the postsynaptic conductances (Figure 1A–B). Thus we represent the conductances as linear-nonlinear (LN) transformations of the input contrast (Dayan & Abbott, 2001).

We define the membrane potential \(V_m\) of the postsynaptic cell such that the reversal potential for leak currents is 0 mV. The cell's membrane voltage dynamics then follow Torre and Poggio (1978), with \(c_m\) the membrane capacitance, \(g_{\mathrm{leak}}\) the leak conductance, and \(E_{\mathrm{inh}}\) and \(E_{\mathrm{exc}}\) the reversal potentials for inhibitory and excitatory currents, respectively. We follow previous work to neglect capacitive currents in T4 cells (Gruntman et al., 2018), and we solve for the pseudo-steady-state membrane voltage (Gruntman et al., 2018; Torre & Poggio, 1978).

We model the response \(C\) as a positively rectifying half-quadratic function \(R^2(x) := (R(x))^2\) of the pseudo-steady-state membrane voltage.

We set \(E_{\mathrm{exc}} = 60\) mV and \(E_{\mathrm{inh}} = -30\) mV, which are plausible based on electrophysiological experiments (Gruntman et al., 2018). The model membrane potential can be written such that only the ratios of \(g_1\), \(g_2\), and \(g_3\) to \(g_{\mathrm{leak}}\), rather than their absolute magnitudes, are relevant. We therefore express the postsynaptic conductances as nondimensional quantities in units of \(g_{\mathrm{leak}}\), leaving \(g_{\mathrm{exc}}/g_{\mathrm{leak}}\) and \(g_{\mathrm{inh}}/g_{\mathrm{leak}}\) as the model's two free parameters. The procedure used to select the values of these parameters is described in detail in Appendix B. As shown previously (Badwan et al., 2019), there exists a broad region of parameter space for which this model displays strong directional responses to sinusoid gratings with a temporal frequency of 1 Hz and a spatial wavelength of 45 degrees. We note that our choice of filter normalization, which differs from that in the previous use of this model (Badwan et al., 2019), affects the parameter values chosen, as it scales \(g_1\), \(g_2\), and \(g_3\) relative to \(g_{\mathrm{leak}}\). Table 1 summarizes the model parameter values used in all simulations.
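As an illustration of how the pieces of the synaptic model fit together, the sketch below computes a pseudo-steady-state voltage and a half-quadratic response from the three filtered inputs. It rests on assumptions that should be checked against the model equations: the OFF inhibitory input is taken as \(g_1 = g_{\mathrm{inh}} R(-s_1)\), the ON excitatory input as \(g_2 = g_{\mathrm{exc}} R(\dot s_2)\), the ON inhibitory input as \(g_3 = g_{\mathrm{inh}} R(s_3)\), and the voltage as the conductance-weighted average of reversal potentials with the leak reversal at 0 mV. Function names are hypothetical, and conductances are expressed in units of \(g_{\mathrm{leak}}\).

```python
import numpy as np

E_EXC = 60.0   # mV, excitatory reversal potential stated in the text
E_INH = -30.0  # mV, inhibitory reversal potential stated in the text

def ramp(x):
    """Ramp function R(x) = max(0, x)."""
    return np.maximum(0.0, x)

def steady_state_voltage(s1, s2_dot, s3, g_exc, g_inh):
    """Assumed pseudo-steady-state voltage; g_exc and g_inh are in units of g_leak."""
    g1 = g_inh * ramp(-s1)      # delayed OFF inhibitory input (assumed form)
    g2 = g_exc * ramp(s2_dot)   # nondelayed ON excitatory input (assumed form)
    g3 = g_inh * ramp(s3)       # delayed ON inhibitory input (assumed form)
    # Leak reversal is 0 mV, so the leak contributes only to the denominator.
    return (g2 * E_EXC + (g1 + g3) * E_INH) / (1.0 + g1 + g2 + g3)

def response(v):
    """Positively rectifying half-quadratic output R(v)^2."""
    return ramp(v) ** 2
```

Because the conductances enter only through their ratios to the leak, rescaling all absolute conductances together leaves this voltage unchanged, which is why two nondimensional parameters suffice in this sketch.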

Defining \(s_1(t) := s(t, x - \Delta)\), \(s_2(t) := s(t, x)\), and \(s_3(t) := s(t, x + \Delta)\), and defining the non-negative constants \(\alpha := |g_{\mathrm{inh}} E_{\mathrm{inh}} / g_{\mathrm{leak}}|\) and \(\gamma := |g_{\mathrm{exc}} E_{\mathrm{exc}} / g_{\mathrm{leak}}|\), we have, to lowest order in the inputs,

*Drosophila* respond direction-selectively to correlations higher than second-order (Clark et al., 2014; Leonhardt et al., 2016). This cannot be explained by models that compute pairwise correlations in the stimulus, such as the HRC and motion energy model. The sensitivity to higher-order correlations has been assessed using 3-point glider stimuli, which contain precise third-order correlations (Hu & Victor, 2010) (Figure 4C). The net responses of T4 cells to these stimuli have previously been inferred from behavioral measurements in *Drosophila* with the synaptic outputs of T5 cells silenced, using gliders updated at 24 Hz (Leonhardt et al., 2016). We used in vivo 2-photon calcium imaging to directly measure the responses of T4 cells to 3-point gliders updated at 5 Hz and found that the signs of the net responses were consistent with those measured in behavior with T5 cells silenced (Figure 4C, see Methods and Appendix A for details).

*Drosophila* T4 cells. This model reproduces the direction-opponency, temporal-frequency-tuning, orientation-tuning, and phi/reverse-phi selectivity measured in T4 cells (Figures 2–4). When applied to a naturalistic velocity estimation task, it produces decorrelated signals similar to those measured in T4 and T5 neurons (Figure 5). However, it fails to reproduce the PD enhancement and fast-timescale tuning observed in T4 cells (Figures 3–4). Moreover, although it is sensitive to triplet correlations in its input, it fails to reproduce responses on the same timescales as observed in the data (Figure 4). In short, this simple synaptic model is sufficient to reproduce several distinct properties of T4 cells but cannot account for several observations.

*Journal of the Optical Society of America A*, 2, 284–299.

*Vision Research*, 10, 1411–1430.

*Current Biology*, 27, 929–944.

*Neuron*, 36, 909–919.

*Nature Neuroscience*, 22, 1318–1326.

*The Journal of Physiology*, 178, 477.

*A comprehensive course in analysis*. Providence, Rhode Island: American Mathematical Society.

*Nature*, 512, 427–430.

*PLoS Computational Biology*, 14, e1006240.

*Trends in Neurosciences*, 12, 297–306.

*Proceedings of the National Academy of Sciences of the United States of America*, 102, 6172–6176.

*Nature Neuroscience*, 18, 1067–1076.

*PLoS Computational Biology*, 5, e1000555.

*eLife*, 8, e47579.

*eLife*, 5, e21053.

*Network: Computation in Neural Systems*, 12, 199–213.

*PLoS Computational Biology*, 9, e1003289.

*Neuron*, 70, 1165–1177.

*Current Biology*, 26, R1062–R1072.

*Nature Neuroscience*, 17, 296–303.

*Neuron*, 100, 1460–1473.

*Journal of Neuroscience Methods*, 323, 48–55.

*Theoretical neuroscience*. Cambridge, MA: MIT Press.

*eLife*, 8, e46409.

*Search methodologies* (pp. 403–449). Berlin, Germany: Springer.

*Current Biology*, P209–2221.E8.

*Journal of the Optical Society of America A*, 18, 241–252.

*Journal of the American Statistical Association*, 82, 171–185.

*Neuron*, 70, 1155–1164.

*Nature*, 418, 845–852.

*Vision Research*, 23, 1265–1279.

*Neuron*, 88, 390–402.

*eLife*, e09123.

*Proceedings of the National Academy of Sciences of the United States of America*, 108, 12909–12914.

*Cell Reports*, 18, 1356–1365.

*Neuron*, 78, 1075–1089.

*Annual Review of Psychology*, 59, 167–192.

*Nature Neuroscience*, 21, 250–257.

*eLife*, 5.

*Zeitschrift für Naturforschung*, 11, 513–524.

*Nonparametric statistical methods*, Vol. 751. Hoboken, NJ: Wiley.

*Journal of Comparative Physiology A*, 154, 707–718.

*Journal of Vision*, 10, 9.1–16.

*The Journal of Physiology*, 594, 883–894.

*Science*, 262, 1901–1904.

*Nature*, 468, 300–304.

*Neuron*, 81, 616–628.

*Neuron*, 59, 322–335.

*Nature*, 509, 331–336.

*The Journal of Neuroscience*, 21, 287–299.

*Perception*, 36, 1–16.

*Single neuron computation* (pp. 315–345). Philadelphia: Elsevier.

*Zeitschrift für vergleichende Physiologie*, 44, 656–684.

*The Journal of Neuroscience*, 36, 8078–8092.

*Nature Neuroscience*, 19, 706–715.

*Nature*, 500, 212–216.

*Nature Communications*, 10, 4979.

*AI Memo (Massachusetts Institute of Technology)*.

*Current Biology*, 222–236.e6.

*Current Biology*, 29, 1545–1550.e1542.

*Current Biology*, 24, 385–392.

*Neural Computation*, 15, 735–759.

*eLife*, 8, e49373.

*Spatial Vision*, 10, 437–442.

*Journal de Physique*, 4, 1755–1775.

*The Journal of Neuroscience*, 26, 2941–2950.

*The Journal of Neuroscience*, 14, 7357–7366.

*The Journal of Neuroscience*, 21, 9445–9454.

*Current Biology*, 28, 3748–3762.e3748.

*Neuron*, 92, 227–239.

*Neuron*, 66, 15.

*Neuron*, 89, 829–841.

*eLife*, 8, e40025.

*Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology*, 189, 189–202.

*Journal of Vision*, 8, 32–32.

*Neuron*, 94, 168–182.e110.

*Nature*, 500, 175–181.

*eLife*, 6, e24394.

*Proceedings of the Royal Society B*, 202, 409–416.

*Neuron*, 99, 680–688.e4.

*Annual Review of Vision Science*, 4, 143–163.

*Cell*, 166, 245–257.

*The Journal of General Physiology*, 127, 495–510.

For drifting sinusoid gratings, \(c_0\) is the input contrast, ω is the temporal frequency in units of radians per second, κ is the spatial frequency in units of radians per degree, and the negative sign is taken for rightward-drifting gratings. To assess whether our model is temporal-frequency-tuned, we computed the fraction of the total variance in a spatiotemporal frequency sweep of its responses accounted for by a separable approximation resulting from its singular value decomposition (Creamer et al., 2018). Counterphase gratings were constructed using uniformly sampled phase offsets \(\phi_1\) and \(\phi_2\), over which we average in all analyses. Gratings containing preferred- and orthogonal-direction motion were constructed as

\(y = 0\), and that the Gaussian spatial filter is symmetric in \(x\) and \(y\). Static gratings were formed by setting ω = 0. We note that our convention for the orientation of a static grating differs from that of the original manuscript (Fisher et al., 2015); we define the orientation as the angle between the normal to the apparent edge and the preferred direction rather than the angular position of the edge itself. Therefore, in our convention the preferred orientations and directions align.
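The separability analysis mentioned above can be illustrated with a short sketch. It assumes that the "fraction of the total variance accounted for by a separable approximation" is the squared leading singular value divided by the sum of all squared singular values of the response matrix, with rows indexing temporal frequency and columns indexing spatial frequency; whether the responses were centered or otherwise normalized beforehand is not specified here, and the function name is hypothetical.

```python
import numpy as np

def separable_variance_fraction(responses):
    """Fraction of squared response magnitude captured by the best rank-1 approximation."""
    singular_values = np.linalg.svd(np.asarray(responses, dtype=float), compute_uv=False)
    power = singular_values ** 2
    return power[0] / power.sum()

# Toy sweeps on log-spaced frequency axes (illustrative values only).
log_tf = np.log(2.0 ** np.arange(8))   # temporal frequencies
log_sf = np.log(2.0 ** np.arange(8))   # spatial frequencies

# Temporal-frequency-tuned toy response: a product of a function of TF and a function of SF.
tf_tuned = np.outer(np.exp(-(log_tf - 1.0) ** 2), np.exp(-(log_sf - 2.0) ** 2))

# Velocity-tuned toy response: depends only on the ratio TF/SF.
velocity_tuned = np.exp(-(log_tf[:, None] - log_sf[None, :]) ** 2)

print(separable_variance_fraction(tf_tuned))
print(separable_variance_fraction(velocity_tuned))
```

Under this measure, a response that factorizes into a temporal-frequency profile times a spatial-frequency profile gives a fraction close to 1, while a response that depends only on velocity gives a markedly smaller value.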

\(B(t, x)\) is an uncorrelated binary stimulus composed of 5-degree black or white bars, and addition (respectively, subtraction) generates positive (respectively, negative) correlations. The stimulus was updated at a fixed rate, and the temporal offset \(\delta t\) was taken to be one cycle, with its sign determining whether the stimulus was oriented in the preferred or null direction. The spatial offset \(\delta x\) was fixed to be 1 bar width. As shown by Salazar-Gatzimas et al. (2016), the autocorrelation function of this stimulus, with spacetime discretized by the bar width and sampling rate, can be written in terms of the Kronecker delta \(\delta_{i,j}\).
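A minimal sketch of such a correlated binary stimulus follows. It assumes the construction \(c(t, x) = \tfrac{1}{2}\,[B(t, x) \pm B(t - \delta t, x - \delta x)]\) with \(B \in \{-1, +1\}\); the factor of one half, the function names, and the periodic wrap-around in both time and space are illustrative choices, not details confirmed by the text.

```python
import numpy as np

def correlated_binary_stimulus(n_t, n_x, sign=+1, dt_steps=1, dx_bars=1, seed=0):
    """Sum (sign=+1) or difference (sign=-1) of a binary pattern and its spacetime-shifted copy."""
    rng = np.random.default_rng(seed)
    b = rng.choice([-1.0, 1.0], size=(n_t, n_x))  # uncorrelated binary bars
    # np.roll wraps around in time and space (a simplification for this sketch).
    shifted = np.roll(b, shift=(dt_steps, dx_bars), axis=(0, 1))  # B(t - dt, x - dx)
    return 0.5 * (b + sign * shifted)

def empirical_correlation(c, dt_steps, dx_bars):
    """Average of c(t, x) * c(t + dt, x + dx) over the whole stimulus."""
    ahead = np.roll(c, shift=(-dt_steps, -dx_bars), axis=(0, 1))
    return float(np.mean(c * ahead))
```

With this normalization the stimulus takes values in \(\{-1, 0, +1\}\), and the expected correlation at the chosen spacetime offset is \(\pm 1/4\), with no net correlation at the mirror-image offset.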

\(I_{\mathrm{blur}}\), and then used a Gaussian kernel with a standard deviation of 20° to estimate locally-averaged images \(I_{\mathrm{mean}}\). The contrast signal was then defined as (Chen et al., 2019)

^{6} elements.

\(g_{\mathrm{exc}}/g_{\mathrm{leak}}\) and \(g_{\mathrm{inh}}/g_{\mathrm{leak}}\). We evaluated the model solely based on its ability to produce direction-opponent average responses to 1 Hz, 45-degree sinusoid gratings similar to those measured in T4 cells (Badwan et al., 2019). To do so, we considered the direction selectivity index and analogous indices of direction-opponency and orthogonal direction enhancement, defined as

\(C\) of the full model at each point in spacetime is given in terms of the filtered contrast signal \(s\) as

\(D(t, x) \le 1\). Because \(D(t, x) \le 1\), \(C(t, x) \le N(t, x)\).

\(D\) is the result of applying a convex function (\(x^{-2}\) for \(x > 0\)) to a non-negative linear combination of LN models with convex nonlinearities. Therefore it cannot generate direction-opponent (DO) average responses to sinusoid gratings. The proof of this proposition is a minor extension of our previous results on LNLN models with continuously-differentiable convex nonlinearities and non-negative secondary linear filters (Badwan et al., 2019). We define the soft ramp function \(R_\beta(x)\) for all positive β. As β → ∞, \(R_\beta(x) \to R(x)\) pointwise. By continuity, defining \(D_\beta(t, x)\) using \(R_\beta\), we have \(0 \le D_\beta(t, x) \to D(t, x) \le 1\) as β → ∞. We denote the nonlinear functional corresponding to the spacetime average of \(D_\beta(t, x)\) for some input stimulus \(f\) as \(D_\beta[f]\). As we have the integrable constant dominating function 1, by the Lebesgue dominated convergence theorem, we have \(0 \le D_\beta[f] \to D[f] \le 1\) as β → ∞ (Barry, 2015). By the result of Badwan et al. (2019), we know that \(D_\beta[\mathrm{PD} + \mathrm{ND}] \ge D_\beta[\mathrm{PD}]\) and \(D_\beta[\mathrm{PD} + \mathrm{ND}] \ge D_\beta[\mathrm{ND}]\), where \(D_\beta[\mathrm{PD}]\), \(D_\beta[\mathrm{ND}]\), and \(D_\beta[\mathrm{PD} + \mathrm{ND}]\) are the average responses to PD, ND, and PD+ND sinusoid gratings, respectively. As these inequalities hold for all positive β, taking β → ∞ gives \(D[\mathrm{PD} + \mathrm{ND}] \ge D[\mathrm{PD}]\) and \(D[\mathrm{PD} + \mathrm{ND}] \ge D[\mathrm{ND}]\). Therefore the denominator LNLN cascade cannot generate DO average responses to sinusoid gratings.
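The limiting step in this argument can be checked numerically for one concrete smooth approximation to the ramp. The exact soft ramp used in the proof is not recoverable from the text; the sketch below assumes, purely for illustration, the softplus form \(R_\beta(x) = \beta^{-1} \log(1 + e^{\beta x})\), which is smooth, convex, and bounded below by \(R(x)\). The function names are hypothetical.

```python
import numpy as np

def ramp(x):
    """Ramp function R(x) = max(0, x)."""
    return np.maximum(0.0, x)

def soft_ramp(x, beta):
    """Assumed softplus form log(1 + exp(beta * x)) / beta, computed stably."""
    return np.logaddexp(0.0, beta * x) / beta

x = np.linspace(-5.0, 5.0, 1001)
for beta in (1.0, 10.0, 100.0):
    gap = np.max(np.abs(soft_ramp(x, beta) - ramp(x)))
    # For softplus the largest gap occurs at x = 0 and equals log(2) / beta.
    print(beta, gap, np.log(2.0) / beta)
```

For this choice the gap to the ramp shrinks as \(\log(2)/\beta\) uniformly in \(x\), so the pointwise convergence \(R_\beta(x) \to R(x)\) required by the proof holds.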