Many nocturnal insects depend on vision for daily life and have evolved different strategies to improve their visual capabilities in dim light. Neural summation of visual signals is one strategy to improve visual performance, and this is likely to be especially important for insects with apposition compound eyes. Here we develop a model to determine the optimum spatiotemporal sampling of natural scenes at gradually decreasing light levels. Image anisotropy has a strong influence on the receptive field properties predicted to be optimal at low light intensities. Spatial summation between visual channels is predicted to extend more strongly in the direction with higher correlations between the input signals. Increased spatiotemporal summation increases signal-to-noise ratio at low frequencies but sacrifices signal-to-noise ratio at higher frequencies. These results, while obtained from a model of the insect visual system, are likely to apply to visual systems in general.

*f*_{s} as 1/*f*_{s}^{2} (Dong & Atick, 1995; Field, 1987; Simoncelli & Olshausen, 2001; van der Schaaf & van Hateren, 1996). Due to correlation within natural images, neighboring visual channels share some information, and thus a proportion of the signal generated in one channel can be predicted from the signals generated in neighboring channels (Srinivasan et al., 1982; Tsukamoto, Smith, & Sterling, 1990; van Hateren, 1992c). In bright light this “predictive coding” leads to visual sampling involving lateral inhibition between adjacent channels (band-pass filtering), whereas in dim light sampling involves spatial summation of signals from groups of neighboring channels (low-pass filtering). As shown theoretically, a strategy of neural summation of light in space and time can enable an eye to improve visual reliability in dim light (Tsukamoto et al., 1990; van Hateren, 1992c; Warrant, 1999). Neural summation will, however, cause a decrease in spatial and temporal resolution (Warrant, 1999).

*Megalopta genalis,* possess larger corneal facet lenses and wider rhabdoms than those of closely related diurnal bees, and thus enjoy an increased optical sensitivity (Greiner, Ribi, & Warrant, 2004). Further improvements in visual sensitivity are likely to come from enhanced visual gain in the photoreceptors (Barlow, Bolanowski, & Brachman, 1977; Barlow et al., 1987; Frederiksen, Wcislo, & Warrant, 2008) and from a strategy of visual summation in space and time (Theobald, Greiner, Wcislo, & Warrant, 2006; Warrant et al., 2004). Morphological, electrophysiological, and theoretical evidence (Warrant, 2008b) suggests that widely arborizing second-order cells in the lamina (the LMC cells) might mediate a spatial summation of visual signals generated in the retina (Greiner, Ribi, & Warrant, 2005; Greiner, Ribi, Wcislo, & Warrant, 2004).

*Megalopta*—began to evolve a nocturnal lifestyle, the optical and neural structures of the eye were altered by natural selection (Frederiksen & Warrant, 2008). As the diurnal ancestor conquered dimmer and dimmer niches, its eyes evolved greater sensitivity: widening rhabdoms increased the visual fields of the photoreceptors, and the dendritic trees of the LMC cells became more extensive, possibly in order to mediate spatial summation. How should these optical and neural changes evolve in order to continuously provide optimal visual performance during the evolution of a nocturnal lifestyle?

*Megalopta,* which is dominated by vertical tree trunks. How might such natural scenes affect the sampling strategy that “evolves”?

*I* by superimposing photon noise (Figure 2C). Different image velocities were simulated by shifting static grayscale images by known numbers of pixels per unit time (Figure 2B). Here we consider only image motion in a horizontal direction and denote the horizontal image velocity by *α* (measured in units of visual channel widths per 10 ms, see below).

*ρ*_{v} and Δ*ρ*_{h}, respectively, and with a temporal integration time of Δ*t* ms. The array of visual channels corresponds to the array of image pixels, and thus Δ*ρ*_{v} and Δ*ρ*_{h} are measured in units of pixel widths (or units of visual channel widths).

*I* (at any given image velocity *α*) the parameters Δ*ρ*_{v}, Δ*ρ*_{h}, and Δ*t* are allowed to “evolve” in an iterative fashion to provide an optimal sampling of the input image. By “optimal” we mean the sampling that results in a filtered output image that is most like the unfiltered initial (noiseless) image. Thus, using this rationale we will determine how the spatiotemporal properties of visual sampling “evolve” to optimize vision at dimmer and dimmer light levels.

*M,* usually set to 128 or 256, are used. An image can be considered as a two-dimensional array of pixel intensities *g*_{k,l} with spatial coordinates *k* and *l* (*k, l* = 0, ±1, …), which correspond to the vertical and horizontal directions in the image, respectively. In Figure 3 two images with different isotropy, and their normalized power spectra, are shown (for the calculation of the one-dimensional power spectrum, see Appendix C). The power of the isotropic scene is equally distributed in all directions (Figure 3A). In the case of the anisotropic scene there is more power in the horizontal than in the vertical direction for most spatial frequencies *f*_{s} (Figure 3B).

*g*_{k,l}, resulting in a sequence of images *g*_{k,l,t}^{(α)}, *t* = 0, 1, 2, … (Figure 2B; see Appendix A). The region of interest is given by *k, l* = 0, …, *M* − 1. The time step between two subsequent image frames *g*_{k,l,t}^{(α)} and *g*_{k,l,t+1}^{(α)} was chosen to be 10 ms. The parameter *α* describes the velocity of horizontal motion and is measured in units of receptor widths per 10 ms. For *α* > 0 the images *g*_{k,l,t}^{(α)} contain motion blur. These images represent the noiseless images at the level of the photoreceptors in the retina at time steps *t* = 0, 1, 2, ….

*g*_{k,l,t}^{(α)} (Figure 2B), and the result is a sequence of noisy images *g*_{k,l,t}^{(α,I)}, *t* = 0, 1, 2, … (Figure 2C) with a mean intensity of *I* photons per visual channel per second (cf. Supplementary Figure 4). Photon arrival is a random event that follows a Poisson distribution. Assuming a mean intensity of *I* photons per visual channel per second, we can generate the noisy image sequence *g*_{k,l,t}^{(α,I)} by scaling the noiseless image so that its mean value is unity, multiplying it by *I*/100 (one image frame corresponds to 10 ms), and drawing Poisson distributed random numbers at each pixel location according to its intensity value:

*g*_{k,l,t}^{(α,I)} = Poiss((*I*/100) · ĝ_{k,l,t}^{(α)}),  (1)

where ĝ_{k,l,t}^{(α)} represents the normalized image sequence *g*_{k,l,t}^{(α)} with mean intensity equal to 1 (see Appendix A). Poiss(*λ*) returns a Poisson distributed random number with mean value *λ*.
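The normalization, scaling, and Poisson draw described above can be sketched in a few lines of NumPy (a sketch under our own assumptions about array layout; the seeded generator, frame count, and test pattern are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_photon_noise(frames, intensity):
    """Turn a noiseless sequence g^(alpha) into Poisson photon counts.

    frames    : array of shape (T, M, M)
    intensity : I, mean photons per visual channel per second
    One frame corresponds to 10 ms, hence the factor I/100.
    """
    normalized = frames / frames.mean()        # mean intensity scaled to 1
    expected = normalized * intensity / 100.0  # expected photons per frame
    return rng.poisson(expected)               # independent draw per pixel

# usage: 42 frames of a static 128 x 128 test pattern at I = 100
frames = np.tile(rng.random((128, 128)), (42, 1, 1))
noisy = add_photon_noise(frames, intensity=100.0)
```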

*changes* in the incoming signal that are encoded and transferred to later stages of processing in the visual system. Visual channels in the surround predict the output of visual channels in the center, and only the difference between the signals generated by the center and surround is sent to later stages of processing. The mechanism of lateral inhibition was first described between neighboring ommatidia in the horseshoe crab *Limulus polyphemus* (Hartline, Wagner, & Ratliff, 1956). Center–surround antagonism (or lateral inhibition) is generally associated with redundancy reduction in the early visual system (van Hateren, 1992a; but see also Barlow, 2001). Natural images contain “redundant” information due to their intrinsic autocorrelation, and the limited dynamic ranges of photoreceptors and neurons underline the importance of eliminating predictable signal components. Lateral inhibition leads to band-pass filtering that is characterized by the enhancement of spatial edges and temporal transients (van Hateren, 1992b). However, at low signal-to-noise ratios (dim light conditions) the band-pass characteristic of receptive fields is predicted to change to a low-pass characteristic. With decreasing signal-to-noise ratio the size of the excitatory center widens while the contribution of the inhibitory surround diminishes and eventually disappears altogether (Barlow, Fitzhugh, & Kuffler, 1957; Batra & Barlow, 1982; Renninger & Barlow, 1979; van Hateren, 1992c). Since we are interested in the transition from bright to dim light, we account only for the more dominant changes in the excitatory center and model the spatial receptive field by a Gaussian function with parameters *σ*_{v} and *σ*_{h} (see also Hemilä, Lerber, & Donner, 1998):

*G*(*u*, *v*) = exp(−*u*^{2}/(2*σ*_{v}^{2}) − *v*^{2}/(2*σ*_{h}^{2})),  (2)

with its maximum at *u* = *v* = 0. The spatial half-widths Δ*ρ*_{v} and Δ*ρ*_{h} can be calculated by simply multiplying *σ*_{v} and *σ*_{h} in Equation 2 by a factor of 2.35.
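A discretized version of this Gaussian center and the half-width relation Δρ = 2.35σ can be written as follows (a minimal sketch; the kernel size and σ values are illustrative):

```python
import numpy as np

def gaussian_rf(sigma_v, sigma_h, size):
    """Discretized excitatory-center receptive field (cf. Equation 2),
    normalized so that the weights sum to unity."""
    offsets = np.arange(size) - size // 2
    u, v = np.meshgrid(offsets, offsets, indexing="ij")  # vertical, horizontal
    G = np.exp(-u**2 / (2 * sigma_v**2) - v**2 / (2 * sigma_h**2))
    return G / G.sum()

def half_width(sigma):
    """Full width at half maximum of a Gaussian: 2*sqrt(2 ln 2)*sigma ~ 2.35*sigma."""
    return 2 * np.sqrt(2 * np.log(2)) * sigma

# an anisotropic field summing more strongly in the vertical direction
G = gaussian_rf(sigma_v=2.0, sigma_h=1.0, size=15)
```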

*t* and *τ*_{p} (Howard, 1981; Howard, Dubs, & Payne, 1984; Payne & Howard, 1981). Δ*t* and *τ*_{p} are measured in milliseconds and denote the half-width and the time-to-peak value of the impulse response function, respectively:

*V*(*t*) = exp(−ln^{2}(*t*/*τ*_{p})/(2*σ*^{2})),  (3)

Δ*t* = 2*τ*_{p} sinh(*σ*√(2 ln 2)).  (4)

Δ*t* and *τ*_{p} are not modeled independently; rather, the parameter *σ* is set to a constant value. This means that an increase of *τ*_{p} comes with an increase in Δ*t*, and vice versa (cf. Equation 4). Howard (1981) reported a *σ* of 0.31 in the dark- and light-adapted photoreceptors of the locust. The dark-adapted photoreceptor in the worker honeybee has a *σ* of 0.28, whereas in *M. genalis* *σ* = 0.32 (Warrant et al., 2004). In all simulations we use a value of *σ* = 0.32.
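The coupling between Δ*t* and *τ*_{p} can be checked numerically; the sketch below assumes the log-normal form given above with σ = 0.32 and an illustrative time-to-peak of 20 ms:

```python
import numpy as np

SIGMA = 0.32  # constant width parameter used in all simulations

def impulse_response(tau_p, t):
    """Log-normal impulse response (cf. Equation 3), peaking at t = tau_p ms."""
    t = np.asarray(t, dtype=float)
    V = np.zeros_like(t)
    pos = t > 0                      # V(t) = 0 for t <= 0
    V[pos] = np.exp(-np.log(t[pos] / tau_p) ** 2 / (2 * SIGMA ** 2))
    return V

def half_width_dt(tau_p, sigma=SIGMA):
    """Half-width Delta t = 2 tau_p sinh(sigma sqrt(2 ln 2)) (cf. Equation 4)."""
    return 2 * tau_p * np.sinh(sigma * np.sqrt(2 * np.log(2)))

# numerical check of the Delta t / tau_p coupling for tau_p = 20 ms
t = np.linspace(0.001, 100.0, 200001)
V = impulse_response(20.0, t)
above = t[V >= 0.5]
numeric_dt = above[-1] - above[0]
```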

*N* filtered images, where *V* = *V*_{t} and *G* = *G*_{k,l} are discretized and normalized versions of Equations 3 and 2, respectively (see Appendix B). *V*_{t} is shifted such that its peak value is at *t*_{0} = ⌊*T*/3⌋ (⌊·⌋ denotes the floor function). This means that the images *h*_{k,l,n}, *n* = 0, …, *N* − 1 in Equation 5 represent the spatiotemporally filtered versions of the frames *g*_{k,l,t_{0}}^{(α,I)}, …, *g*_{k,l,t_{0}+N−1}^{(α,I)}. (*G* ∗ *g*) denotes convolution of an image *g* = *g*_{k,l,t} with a Gaussian kernel *G* = *G*_{k,l} and yields a spatially filtered image sequence (see Appendix B). For all calculations the spatial receptive field *G*_{k,l} is truncated where its weighting coefficients fall below 1% of the on-axis amplitude.
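One plausible reading of Equation 5 (the equation image itself is not reproduced here, so the index convention below is our assumption) is a spatial convolution of every frame with *G* followed by a temporal weighting with *V*:

```python
import numpy as np

def spatiotemporal_filter(frames, G, V, t0):
    """Sketch of Equation 5: convolve each frame with the spatial kernel G
    (cyclically, via the FFT), then weight frames with the temporal
    response V whose peak sits at index t0.

    frames : (T, M, M) image sequence
    G      : (M, M) kernel, weights summing to 1, centred in the array
    V      : (T,) discretized, normalized temporal response
    """
    T = frames.shape[0]
    Gf = np.fft.fft2(np.fft.ifftshift(G))   # move kernel peak to (0, 0)
    spatial = np.real(np.fft.ifft2(np.fft.fft2(frames) * Gf))
    out = []
    for n in range(T - t0):
        # frame t contributes to output n with weight V[t0 + n - t]
        w = np.array([V[t0 + n - t] if 0 <= t0 + n - t < T else 0.0
                      for t in range(T)])
        out.append(np.tensordot(w, spatial, axes=(0, 0)))
    return np.array(out)

# sanity check: delta kernel and delta V reproduce the input frames
rng = np.random.default_rng(0)
frames = rng.random((6, 8, 8))
G = np.zeros((8, 8)); G[4, 4] = 1.0
V = np.zeros(6); V[2] = 1.0
h = spatiotemporal_filter(frames, G, V, t0=2)
```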

*t*_{0} = ⌊*T*/3⌋ (cf. Equation 5), and *N*_{p} = (*M* − 2*m*)^{2}*N*. For the calculation of the *MSE* value (Equation 6) we use *N* = 10 subsequent image frames. To exclude artifacts due to padding, *m* has to be chosen according to the size of the receptive field (1% threshold for summation coefficients).
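The cropped mean squared error of Equation 6 can be sketched as follows (frame stacks and margin are illustrative; alignment with the noiseless frames at *t*_{0}, …, *t*_{0} + *N* − 1 is assumed to be done by the caller):

```python
import numpy as np

def mse(filtered, noiseless, m):
    """Mean squared difference between filtered frames and the normalized
    noiseless frames (cf. Equation 6), with an m-pixel margin cropped on
    every side to exclude padding artifacts."""
    N, M, _ = filtered.shape
    inner = (slice(None), slice(m, M - m), slice(m, M - m))
    diff = filtered[inner] - noiseless[inner]
    n_p = (M - 2 * m) ** 2 * N      # number of compared pixels, N_p
    return float((diff ** 2).sum() / n_p)

# illustrative stacks: N = 10 frames of size 16 x 16, margin m = 2
a = np.ones((10, 16, 16))
```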

*I* and image velocity *α*, smaller values of the *MSE* indicate a better “quality” of the filtered image, that is, a better match to the original noiseless image. The goal is to find the “optimal” spatiotemporal parameters Δ*ρ*_{v}, Δ*ρ*_{h}, and Δ*t* that minimize the difference between the filtered and noiseless image sequences for a given combination of *I* and *α*:

*MSE* values of 5 noisy instances of the same image sequence filtered with the same receptive field. To find a local minimum of Equation 7 we apply the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, which is widely used for nonlinear optimization problems and belongs to the class of quasi-Newton methods (see, e.g., Press, Teukolsky, Vetterling, & Flannery, 2007). The optimal parameter values obtained by the BFGS method depend, in general, on the initial parameter values.
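With SciPy, the BFGS step looks as follows; since the actual objective (the averaged MSE of Equations 6 and 7) requires the full image pipeline, a smooth stand-in objective with a known minimum is used here purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def surrogate_mse(p):
    """Stand-in for the averaged MSE as a function of
    (delta_rho_v, delta_rho_h, delta_t); illustrative only."""
    drho_v, drho_h, dt = p
    return (drho_v - 3.0) ** 2 + 2 * (drho_h - 1.5) ** 2 + 0.5 * (dt - 20.0) ** 2

# warm start from the single-channel, single-frame sampling
x0 = np.array([0.75, 0.75, 4.6])
res = minimize(surrogate_mse, x0, method="BFGS")
```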

*α*, all simulations begin at bright light intensities with a sampling that includes only a single visual channel integrating signals over a single image frame (i.e., Δ*ρ*_{v} = Δ*ρ*_{h} = 0.75 receptor widths, Δ*t* = 4.6 ms). Dimmer and dimmer light levels are subsequently simulated, and the spatiotemporal properties found to be optimal at a given light intensity are then used as initial values for the next, dimmer light intensity. Thus, the spatiotemporal receptive fields “evolve” from brighter to lower light levels. The word “evolution” is used only for convenience and is not intended to suggest “biological evolution,” since the model cannot account for all biological constraints.
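The bright-to-dim sweep can be sketched as a warm-started loop; `optimize_rf` stands for the per-level BFGS minimization of the MSE and is mocked here with a toy rule in which summation grows as light dims, so the numbers are illustrative, not the paper's results:

```python
import numpy as np

def optimize_rf(params0, log_I):
    """Mock of the per-light-level optimization (illustrative only)."""
    drho_v, drho_h, dt = params0
    growth = max(0.0, 2.0 - log_I) * 0.1   # toy rule: more summation when dim
    return (drho_v + growth, drho_h + growth, dt + 2 * growth)

params = (0.75, 0.75, 4.6)          # single channel, single frame
history = {}
for log_I in np.arange(4.5, -0.6, -1.0):
    # the optimum at this level seeds the next, dimmer level
    params = optimize_rf(params, float(log_I))
    history[round(float(log_I), 1)] = params
```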

*α* = 1.5 visual channel widths per 10 ms). The spatial “evolution” of the optimal receptive field is shown in Figures 4B and 4E for gradually decreasing light intensities (from left to right). In bright light conditions spatial sampling includes only a single visual channel, and for the isotropic scene spatial summation involves more and more neighboring visual channels as light intensity falls. In the case of an anisotropic scene the spatial receptive field “evolves” an anisotropy, with summation more pronounced in the vertical direction. This anisotropy can only be observed at lower light intensities, when the sampling begins to sum over multiple visual channels. At log *I* = 1.5 the isotropic receptive field includes 27 channels with at least 1% contribution (compared to the visual channel at the center of the receptive field). At a comparable light level (log *I* = 0.8) the anisotropic receptive field includes only 23 channels. Interestingly, decreasing the light level by one further log unit inverts the situation: 61 channels contribute to the spatial summation in the isotropic case (log *I* = 0.5; at least 1% contribution), while 87 visual channels contribute in the anisotropic case (log *I* = −0.2; see also Supplementary Figure 3). Single image frames of the optimally spatiotemporally filtered image sequences are shown in Figures 4C and 4F, respectively. For any given image velocity a decrease in light intensity gradually increases spatial summation and temporal integration (Figure 5, *α* = 1.5; see also Supplementary Figures 1A and 2A).

*α* = 1.5 there is slightly more summation in the horizontal direction (i.e., in the direction of image motion). At lower light levels the ratio between vertical and horizontal pooling is less than 1 (Δ*ρ*_{v}/Δ*ρ*_{h} = 0.8, log *I* = −0.5). This is due to image blur, which is introduced at the level of the visual channels, and horizontal summation increases monotonically as image velocity *α* increases. For *α* = 1.5, temporal integration increases with decreasing light intensity in a similar fashion to spatial pooling. In this particular case this indicates a balanced summation in both space and time.

*ρ*_{v}/Δ*ρ*_{h} = 7.8, log *I* = −1.2). The onset of temporal integration occurs for *α* ≥ 1.5 only at light intensities lower than log *I* = 0.8 (Figure 5B; Supplementary Figure 2A).

*I,* however, strongly depends on the image velocity *α*. At very low image velocities the receptive field almost exclusively sums temporally (Figure 6, anisotropic image). Although this behavior can be observed for both the isotropic and anisotropic scenes, temporal integration is less extensive for the latter (cf. Supplementary Figures 1 and 2). For the anisotropic scene an increase in image velocity *α* leads to an asymptotic increase of summation in the vertical direction (Figure 6A), whereas summation in the horizontal direction increases continuously (Figure 6B). This is due to motion blur, which increases as *α* increases. Furthermore, increasing the image velocity at low light intensities leads to a strong reduction of temporal integration (Figure 6C).

_{opt} = SNR_{opt}(*f*_{s}) for the optimally filtered images drops over the entire range of spatial frequencies *f*_{s} as light intensity falls (Figure 7A; see Appendix D for the calculation of signal and noise). The model thus indicates that vision inevitably deteriorates at lower and lower light levels, even with the optimal pooling strategy as we define it here. So what is the advantage of spatial and temporal summation? To answer this question we compared the optimally filtered scenes with the corresponding unfiltered scene at each light level. As can be seen in Figure 7B, the ratio between SNR_{opt} and SNR_{unfilt} increases at lower spatial frequencies but drops at higher frequencies. Spatiotemporal summation increases signal power at lower frequencies and blurs finer image details (at higher frequencies), thus reducing high-frequency signal power. The influence of spatiotemporal summation on noise power, on the other hand, is to decrease it over the entire frequency range (not shown). At higher frequencies, however, signal power is suppressed more than noise power. As a result, signal-to-noise ratio is improved at low frequencies but reduced at higher frequencies. Thus, the effect of spatiotemporal summation at dimmer light levels is to increase the reliability of coarser spatial details while sacrificing finer spatial details (Figure 7B). The same behavior can be observed in the temporal frequency domain (not shown): at dimmer light levels the reliability of slower details is increased, while faster details are sacrificed.

_{opt}/SNR_{unfilt} drops below 1 indicates the frequency at which signal-to-noise ratio is sacrificed at higher frequencies in order to improve the signal-to-noise ratio at lower frequencies. At log *I* = 3.5 the cut-off frequency is *f*_{s} = 56 cycles/image, but it decreases to 18 cycles/image as light intensity falls to log *I* = −0.5.

*ρ*_{v}/Δ*ρ*_{h} is always greater than 1. At moderate light intensities and for high image velocities, however, horizontal summation dominates vertical summation due to motion blur. An increased summation in the horizontal direction, that is, in the direction of image motion, is also present for the isotropic scene. Not surprisingly, for the isotropic image sequence the ratio Δ*ρ*_{v}/Δ*ρ*_{h} for optimum spatiotemporal receptive fields is less than or equal to 1 for all light levels *I* and image velocities *α* (cf. Supplementary Figure 1A).

*Megalopta*—might represent the neural substrate mediating orientation selectivity or anisotropic spatial summation, respectively.

*ρ*_{v} = Δ*ρ*_{h}. For example, for the anisotropic image sequence with *α* = 1.5 (at log *I* = −1.2), the optimization scheme in Equations 6 and 7, with the constraint Δ*ρ*_{v} = Δ*ρ*_{h}, yields a spatial half-width Δ*ρ* of 6.2 channel widths (which includes 193 visual channels); this value lies between the independently optimized half-widths Δ*ρ*_{v} = 22.5 and Δ*ρ*_{h} = 2.9 channel widths (with 337 visual channels in total). The isotropically constrained receptive field model (Δ*ρ*_{v} = Δ*ρ*_{h}) results in too much summation in the horizontal direction and too little summation in the vertical direction, which together degrade the summed output signal. Furthermore, in the anisotropic case the integration time Δ*t* increases from 19 to 25 ms in order to counterbalance the reduced spatial summation (193 instead of 337 channels). The longer Δ*t,* however, cannot fully compensate for the predictive loss, as indicated by an increase of the *MSE* value for the constrained model (data not shown). Together this shows that the shape of the receptive field should match the anisotropy of the moving stimulus. Within the framework of the model presented here, and the optimization scheme we used in this study (see Theory and methods section), this is indeed possible: the receptive field model allows the optimum combination of spatial and temporal parameters to “evolve” for each combination of *I* and *α*.

*I* and the image velocity *α*. A decrease in light intensity increases spatial and temporal summation (Figure 5, Supplementary Figures 1A and 2A). Image motion, however, has different effects on the spatial and temporal parameters (Figure 6). Increasing image motion decreases Δ*t,* and to counteract the loss in temporal summation the eye has to sum more extensively in space. Therefore, a shortening of the temporal integration time comes with an increase in summation in space, and vice versa: lengthening Δ*t* decreases Δ*ρ*_{v,h} (Supplementary Figures 1B and 2B). Thus, the relative balance between spatial and temporal summation is likely to be strongly influenced by ecological and behavioral constraints: slow animals are likely to invest more heavily in temporal summation than faster ones (Warrant, 1999).

*Megalopta genalis,* for which visual navigation plays an important role in homing (Warrant et al., 2004). The integration time and acceptance angle of dark-adapted photoreceptors are both large in *Megalopta*: 32 ms and 5.6°, respectively (compared to 18 ms and 2.6° in the diurnal honeybee *Apis mellifera*; Warrant et al., 2004). This coarser spatial and temporal resolution, however, cannot on its own explain *Megalopta*'s behavior at night. Since the dark-adapted integration time is not exceptionally slow, spatial summation in the lamina is likely to further improve *Megalopta*'s visual sensitivity. This hypothesis is supported by morphological evidence from the lamina of *M. genalis* (Greiner et al., 2005; Greiner, Ribi, Wcislo et al., 2004; Warrant et al., 2004) and by theoretical predictions (Theobald et al., 2006; Warrant, 1999). Our model confirms that spatial summation should be more extensive for greater image motion, as would be experienced by *Megalopta* during rapid forward flight (cf. Figure 6). To be “optimal,” such an extension of spatial summation should account for differences in image anisotropy (see above). Vertical structures such as tree trunks can change the image statistics sufficiently to depart markedly from isotropy (cf. Figure 3; but see also van der Schaaf & van Hateren, 1996). Although the visual environment of *Megalopta*'s habitat has not been investigated systematically, the average image statistics of rainforest scenes are not likely to be isotropic. A strategy of anisotropic neural summation might therefore benefit *Megalopta* during its nightly navigation flights in the rainforests of Central and South America. Indeed, the dendritic fields of its L4 lamina monopolar cells are anisotropic in shape, being strongly elongated in the vertical direction (Greiner et al., 2005). However, whether the L4 cells are involved in anisotropic spatial summation is yet to be determined.

*Megalopta* can find its nest entrance when as few as 4.7 photons per second (log *I* = 0.67) are absorbed on average by a single green photoreceptor (Warrant et al., 2004). At this and somewhat higher light levels *Megalopta* also flies through the rainforest, navigating between the nest and foraging sites. The dark-adapted integration time of *Megalopta* is 32 ms (see above). For the isotropic image (log *I* = 0.5) our simulations predict this integration time to be optimal for an image velocity *α* ∼ 0.7 receptor widths per 10 ms (cf. Supplementary Figure 1B). In the anisotropic case (log *I* = 0.8) we found this integration time to be optimal for *α* ∼ 0.2–0.3 receptor widths per 10 ms, that is, for much slower image velocities than for the isotropic image (cf. Supplementary Figure 2B). In the latter case, an increase of *α* above 0.3 receptor widths per 10 ms would result in additional motion blur for the given integration time of 32 ms. In other words, the dynamics of the temporal receptive field would be too slow for *α* > 0.3, resulting in an impairment of spatial and temporal resolution (Batra & Barlow, 1990; Juusola & French, 1997; Srinivasan & Bernard, 1975).

*MSE* value has its minimum. Compared to a receptive field that does not sum in space and/or time, the optimum receptive field improves the signal-to-noise ratio of its output signals at low frequencies. This improvement, however, comes only by sacrificing the signal-to-noise ratio at higher frequencies (Figure 7B). This indicates a trade-off between visual reliability and spatiotemporal resolution (Batra & Barlow, 1990; Land, 1997). The trade-off can be influenced by shifting the cut-off frequency (see Results section) toward higher or lower frequencies, that is, by reducing or increasing the amount of spatiotemporal summation, respectively (Figure 8). At this point it is interesting to consider the influence of the anisotropic receptive field model on the vertical and horizontal signal-to-noise ratios. For the optimum spatiotemporal receptive field, the cut-off frequency in the vertical direction is lower than in the horizontal direction. Inspection of the power spectral density of the anisotropic image (Figure 3B) shows that there is more horizontal than vertical power in the image. The optimum receptive field tries to preserve as much power as possible in both directions. Since image details are richer in the horizontal direction, a higher cut-off frequency results for the horizontal than for the vertical direction. For example, at log *I* = −0.2 and for *α* = 1.5 channel widths per 10 ms the vertical cut-off is at 13 cycles/image, in contrast to the horizontal cut-off at 34 cycles/image (Figure 8B). This clearly shows the advantage of an optimally adapted summation in the respective directions.

*I* and for different image velocities *α*. Optimal parameters are obtained as described in *Theory and methods,* that is, for each simulation *α* was constant and *I* was gradually decreased. A. Spatial and temporal parameters plotted as a function of (logarithmic) light intensity *I* for image velocities *α* = 0.5, 1.5, 2.5, and 3.5 visual channel widths per 10 ms. B. Spatial and temporal parameters plotted as a function of image velocity *α* (log *I* = 4.5, 3.5, …, −0.5).

*I* and for different image velocities *α*. Optimal parameters are obtained as described in *Theory and methods,* that is, for each simulation *α* was constant and *I* was gradually decreased. A. Spatial and temporal parameters plotted as a function of (logarithmic) light intensity *I* for image velocities *α* = 0.5, 1.5, 2.5, and 3.5 visual channel widths per 10 ms. B. Spatial and temporal parameters plotted as a function of image velocity *α* (log *I* = 3.8, 2.8, …, −1.2).

*α* is high. This strong increase at very low light intensities is mostly due to vertical summation and counteracts the effects of shortening the integration time Δ*t* at high values of *α* (see *Results*).

*I* = −1.2 for the anisotropic image, in contrast to log *I* = −0.5 for the isotropic image (as indicated by the intersections with the dashed line).

*g*_{k,l}, we first define a function of the real-valued variables *u* and *v* in the following way:

*a* = *αt* defines the left “edge” of a visual channel at time *t* = 0, …, *T* − 1, and *b* = *αt* + *α* + 1 = *α*(*t* + 1) + 1 defines the right “edge” after the movement during time step *t* + 1.

*w* = *w*(*u*) is defined with *u*_{1} = min{*αt* + 1, *αt* + *α*}, *u*_{2} = max{*αt* + 1, *αt* + *α*}, and *w*_{max} = 1/max{1, *α*}. It follows that the frames of the moving image sequence are obtained for *k, l* = 0, …, *M* − 1 and *t* = 0, …, *T* − 1.

*M* is the size of each image in the sequence *g*_{k,l,t}. If not otherwise mentioned, we used *M* = 128 and *T* = 32 + *N* = 42 (cf. Equation 6) in our simulations.
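A toy version of this moving-sequence construction, with the exact sub-pixel weighting *w* replaced by a simple average over integer shifts (an approximation for illustration, not the weighting defined above), could look like:

```python
import numpy as np

def moving_sequence(image, alpha, T):
    """Approximate horizontally moving, motion-blurred frames: each 10-ms
    frame averages integer-shifted copies of the static image spanning
    roughly the interval [alpha * t, alpha * t + alpha + 1]."""
    M = image.shape[0]
    frames = np.empty((T, M, M))
    span = max(1, int(np.ceil(alpha)) + 1)   # blur extent in pixels
    for t in range(T):
        start = int(round(alpha * t))
        shifted = [np.roll(image, -(start + s), axis=1) for s in range(span)]
        frames[t] = np.mean(shifted, axis=0)
    return frames

img = np.random.default_rng(2).random((64, 64))
seq = moving_sequence(img, alpha=1.5, T=8)
```

Cyclic wrap-around via `np.roll` stands in for the treatment of the image border, which the exact construction handles with the region of interest.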

*G*(*u, v*) (Equation 2) is given by *G*_{k,l} = *G*(*k*, *l*) for *k, l* = −*M*/2 + 1, …, *M*/2. The sum of all entries in *G*_{k,l} is normalized to unity. For the temporal response function (i.e., Equation 3) we proceed in a similar way:

*t* = 0, …, *T* − 1. In this case we shift the peak value of *V*_{t} to the *t*_{0}th time step (*t*_{0} = ⌊*T*/3⌋). In addition to Equation 3, we assume *V*(*t*) = 0 for *t* ≤ 0.

*g* = *g*_{k,l,t} with a Gaussian kernel *G* = *G*_{k,l} in Equation 5 is calculated as follows. We zero-pad the images *g*_{k,l,t}, *t* = 0, 1, 2, … (i.e., *g*_{k,l,t} := 0 for *k, l* < 0 and *k, l* ≥ *M*) and choose *M* to be a power of 2. This allows a computationally efficient calculation of Equation B3 by means of the fast Fourier transform (see, e.g., Press et al., 2007).
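Zero-padding before the FFT makes the cyclic FFT product equivalent to the desired linear convolution; a minimal sketch (the kernel is assumed to be centred at index M/2):

```python
import numpy as np

def fft_convolve_padded(g, G):
    """Linear convolution of an M x M image with kernel G via zero-padding
    to 2M x 2M before the FFT (cf. Equation B3); the cyclic wrap-around
    then falls entirely into the padding."""
    M = g.shape[0]
    P = 2 * M                                  # padded size, a power of 2
    gp = np.zeros((P, P)); gp[:M, :M] = g
    Gp = np.zeros((P, P)); Gp[:M, :M] = G
    full = np.real(np.fft.ifft2(np.fft.fft2(gp) * np.fft.fft2(Gp)))
    c = M // 2                                 # kernel centre offset
    return full[c:c + M, c:c + M]

# sanity check: a delta kernel at the centre reproduces the image
g = np.random.default_rng(3).random((8, 8))
G = np.zeros((8, 8)); G[4, 4] = 1.0
out = fft_convolve_padded(g, G)
```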

*g*_{k,l}, *k, l* = 0, …, *M* − 1, of size *M,* let ĝ_{m,n} be its Fourier transform. The Fourier transform ĝ_{m,n} is a periodic function, of which we consider the range −*M*/2 ≤ *m, n* < *M*/2 (*M* is a power of 2). This means that the zero-frequency component is at *m* = *n* = 0. The two-dimensional power spectrum of the image *g*_{k,l} is then given by ∣ĝ_{m,n}∣^{2} (Figure C1B). To avoid padding artifacts, a Hann window of size *M* × *M* is applied before the calculation of the Fourier transform (Press et al., 2007).

*m* and *n* to a single spatial frequency *f*_{s}, values of the power spectrum are averaged on circles of radius *f*_{s} = 0, …, *M*/2 − 1 around the zero frequency at *m* = *n* = 0 (cf. the light blue circle in Figure C1B; [·] returns the nearest integer):
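The Hann-windowed, radially averaged power spectrum can be sketched as follows (a sketch; the nearest-integer binning mirrors the [·] operation above, and the sinusoid used to exercise it is illustrative):

```python
import numpy as np

def radial_power_spectrum(image):
    """1-D power spectrum: Hann-window the image, take |FFT|^2, and average
    over circles of integer radius f_s around the zero frequency."""
    M = image.shape[0]
    hann = np.hanning(M)
    windowed = image * np.outer(hann, hann)      # M x M Hann window
    spec = np.abs(np.fft.fftshift(np.fft.fft2(windowed))) ** 2
    m = np.arange(M) - M // 2
    radius = np.rint(np.hypot(*np.meshgrid(m, m, indexing="ij"))).astype(int)
    return np.array([spec[radius == f].mean() for f in range(M // 2)])

# a horizontal sinusoid at 8 cycles/image should peak at f_s = 8
img = np.tile(np.sin(2 * np.pi * 8 * np.arange(64) / 64), (64, 1))
P = radial_power_spectrum(img)
```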

*f*_{s}) as the power spectrum of the average of *N* filtered images, where ĥ_{m,n} is the Fourier transform of the averaged image and *h*_{k,l,n}^{(i)} denotes the *i*th instance of a filtered image sequence (cf. Equation 5). In the same way we calculate the noise spectrum (*f*_{s}), that is, as the power spectrum of the standard deviation:

*N* = 10 samples for the calculation of the signal-to-noise ratio SNR(*f*_{s}), defined as the ratio of the signal power spectrum to the noise power spectrum at each spatial frequency *f*_{s}.
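Putting this together: with N filtered instances of the same sequence, the signal is the radial power spectrum of their pixelwise mean and the noise that of their pixelwise standard deviation (a sketch; the Hann window is omitted for brevity, and the test pattern is illustrative):

```python
import numpy as np

def power_1d(image):
    """Radially averaged power spectrum (as in Appendix C, no window)."""
    M = image.shape[0]
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    m = np.arange(M) - M // 2
    r = np.rint(np.hypot(*np.meshgrid(m, m, indexing="ij"))).astype(int)
    return np.array([spec[r == f].mean() for f in range(M // 2)])

def snr(instances):
    """SNR(f_s): power spectrum of the mean image divided by the power
    spectrum of the standard-deviation image."""
    return power_1d(instances.mean(axis=0)) / power_1d(instances.std(axis=0))

# N = 10 noisy instances of one underlying 32 x 32 pattern
rng = np.random.default_rng(4)
base = rng.random((32, 32))
inst = base[None] + 0.1 * rng.standard_normal((10, 32, 32))
ratio = snr(inst)
```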