*Adaptation* is a phenomenological umbrella term under which a variety of temporal contextual effects are grouped. Previous models have shown that some aspects of visual adaptation reflect optimal processing of dynamic visual inputs, suggesting that adaptation should be tuned to the properties of natural visual inputs. However, the link between natural dynamic inputs and adaptation is poorly understood. Here, we extend a previously developed Bayesian modeling framework for spatial contextual effects to the temporal domain. The model learns temporal statistical regularities of natural movies and links these statistics to adaptation in primary visual cortex via divisive normalization, a ubiquitous neural computation. In particular, the model divisively normalizes the present visual input by the past visual inputs only to the degree that these are inferred to be statistically dependent. We show that this flexible form of normalization reproduces classical findings on how brief adaptation affects neuronal selectivity. Furthermore, prior knowledge acquired by the Bayesian model from natural movies can be modified by prolonged exposure to novel visual stimuli. We show that this updating can explain classical results on contrast adaptation. We also simulate the recent finding that adaptation maintains population homeostasis, namely, a balanced level of activity across a population of neurons with different orientation preferences. Consistent with previous disparate observations, our work further clarifies the influence of stimulus-specific and neuronal-specific normalization signals in adaptation.

*adaptation* (Carandini et al., 2005; Clifford et al., 2007; Kohn, 2007; Solomon & Kohn, 2014; Webster, 2011), have been a topic of fascination at least since the time of Aristotle, as revealed by perceptual aftereffects (Clifford & Rhodes, 2005; Schwartz, Hsu, & Dayan, 2007). Adaptation has been observed in many areas of the brain, such as the visual (Kohn, 2007), auditory (Pérez-González & Malmierca, 2014), olfactory (Kurahashi & Menini, 1997; Wilson, 2009), and somatosensory (Maravall, Petersen, Fairhall, Arabzadeh, & Diamond, 2007) regions. In the natural environment, sensory signals are always embedded in a temporal context, and correct inferences about the perceptual identity and behavioral relevance of the signals depend heavily on such context. This context-dependent inference has led to the proposal that adaptation is a hallmark of systems optimized to the temporal structure of the natural environment (e.g., Barlow & Földiák, 1989; Dayan, Sahani, & Deback, 2002; Fairhall, Lewen, Bialek, & de Ruyter Van Steveninck, 2001; Lochmann, Ernst, & Deneve, 2012; Schwartz et al., 2007; Wainwright, Schwartz, & Simoncelli, 2002; see also Attneave, 1954; Barlow, 1961).

*stimulus specificity*; Solomon & Kohn, 2014), and its strength often depends on neuronal selectivity (Benucci et al., 2013). Moreover, adaptation operates at a range of timescales, from milliseconds to seconds and minutes, or even longer (Dragoi et al., 2000; Kohn, 2007; Patterson et al., 2013; Solomon & Kohn, 2014). A more comprehensive treatment of experimental adaptation effects in V1 is discussed in the Introduction to V1 Adaptation Experimental Literature section.

$\theta$, along with two possible phases (0° and 90°, to form a quadrature pair) and nine temporal positions (eight past and one present temporal position). Using movies with frame rates of 30 frames per second (fps), we chose an adaptation period of eight frames, which corresponds to roughly 270 ms, because this is within the range of standard short-term adaptation experimental paradigms (Kohn, 2007). Below, we further discuss model extensions for adaptation over longer timescales.
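The filter stage just described can be sketched numerically. This is a minimal sketch, assuming Gabor-shaped RFs with arbitrary size and frequency parameters (the paper's exact filter profiles are not specified here): each of nine frames is projected onto four orientations, yielding eight past output vectors and one present output vector.

```python
import numpy as np

def gabor(size, theta_deg, phase_deg, freq=0.25):
    """Oriented Gabor filter; parameters are illustrative, not the paper's."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    t = np.deg2rad(theta_deg)
    xr = x * np.cos(t) + y * np.sin(t)
    envelope = np.exp(-(x**2 + y**2) / (2 * (0.4 * size) ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr + np.deg2rad(phase_deg))

# Bank: 4 orientations x 2 phases (a quadrature pair), applied at 9 temporal
# positions (8 past frames + 1 present frame; ~270 ms at 30 fps).
orientations = [0, 45, 90, 135]
bank = {(o, p): gabor(15, o, p) for o in orientations for p in (0, 90)}

rng = np.random.default_rng(0)
movie = rng.normal(size=(9, 15, 15))       # toy 9-frame movie patch
outputs = np.array([[np.sum(frame * bank[(o, 0)]) for o in orientations]
                    for frame in movie])   # shape (9, 4)
x_p, x_t = outputs[:8], outputs[8]         # past (8 frames) and present outputs
```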

$x_t$ and the past RF outputs by $x_p$. We considered different choices for $x_t$ and $x_p$, as detailed below for the different versions of the model (see the sections titled “Binary MGSM and Flexible Divisive Normalization” and “Multiorientation MGSM Model”).

$x_t$ and $x_p$, respectively, such that they are statistically dependent, by the multiplication of two random variables: (a) a Gaussian variable that represents structure local to each RF, which we label $g_t$ and $g_p$, for the present and past RFs, respectively, and (b) a positive scalar random variable, $v$, which is shared between multiple RFs (and essentially captures the dependence among them). The nonlinear dependencies between RF outputs in the past and present are introduced via the multiplication of the local Gaussian variable, $g$, with the shared mixer variable, $v$. Therefore, we model the dependent RF outputs of the present time frame and the past time frames as $x_t = v\,g_t$ and $x_p = v\,g_p$ (Equation 1), where $g_t$ and $g_p$ are vectors representing the local Gaussian variable in the present and past, respectively, and are the same size as the filter outputs to which they correspond in Equation 1, that is, $x_t$ and $x_p$. To connect the GSM to temporal contextual effects, we assume that the Gaussian components associated with the present frame, $g_t$, relate to the firing rates of V1 neurons. Estimating this Gaussian component amounts to a form of divisive normalization, by essentially inverting the multiplicative model above (thus amounting to division).
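The multiplicative construction can be checked with a quick simulation. The distributional choices below (a Rayleigh mixer, unit-variance Gaussians) are our own illustrative assumptions: the simulated present and past outputs come out uncorrelated, yet their magnitudes are positively correlated, because both inherit the shared mixer $v$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

g_t = rng.normal(size=n)              # local Gaussian component, present
g_p = rng.normal(size=n)              # local Gaussian component, past
v = rng.rayleigh(scale=1.0, size=n)   # positive scalar mixer, shared across time

x_t = v * g_t                         # multiplicative GSM construction
x_p = v * g_p

corr_raw = np.corrcoef(x_t, x_p)[0, 1]                   # ~0: no linear dependence
corr_amp = np.corrcoef(np.abs(x_t), np.abs(x_p))[0, 1]   # >0: shared-mixer dependence
```

Dividing out an estimate of $v$ (the normalization step) removes exactly this magnitude dependence, which is the efficient-coding reading of divisive normalization.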

$\theta$ for the preferred orientation of the corresponding RF), $g_{t,\theta}$, with the firing of a neuron whose preferred orientation reflects the corresponding RF output, $x_{t,\theta}$, in V1. We focus on the Gaussian component because it represents structure local to an RF at a single point in time. In contrast, the mixer variable, $v$, represents the link between RF responses across time. As the GSM is a generative model, by inverting the model and using Bayes's rule, we can estimate the value of the local Gaussian components, $g_t$. Given a set of stimuli, we collect the set of RF outputs in the past and present times, denoted by the vector $(x_t, x_p)$, and then calculate the expected value of the local Gaussian component for the present. Because the generative model is multiplicative (as in Equation 1), the inverse operation of computing the local Gaussian amounts to a form of divisive normalization. It has been shown (Schwartz & Simoncelli, 2001) that such an operation also relates to efficient coding, as it reduces the statistical dependencies that, in the GSM, are due to this common mixer variable, $v$. The resulting divisive normalization equation is given by (see Coen-Cagli et al., 2009; Schwartz et al., 2006, for derivations):

where $\Sigma_{tp}$ is a measure of the linear dependencies between elements of the past and present Gaussians, $g_p$ and $g_t$, and intuitively may be seen as representing how strongly the features (of orientation in the past and the present) are statistically dependent in natural scenes. Equation 3 thus amounts to a weighted sum of quadratic and bilinear combinations of the RF responses that contribute to the divisive normalization signal. If the covariance matrix is the identity matrix, then this reduces simply to a sum of squares, similar to the original formulation of divisive normalization (Heeger, 1992).
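The structure of the normalization signal can be made concrete. The sketch below computes only the quadratic form $x^{\top}\Sigma^{-1}x$ over concatenated past and present RF outputs; the paper's full estimator has additional factors from the GSM derivation, so this illustrates the weighted quadratic-plus-bilinear sum, not the exact formula.

```python
import numpy as np

def normalization_signal(x, Sigma):
    """Quadratic + bilinear combination of RF outputs: x^T Sigma^{-1} x.
    Off-diagonal covariance entries contribute the bilinear (cross) terms."""
    return float(x @ np.linalg.solve(Sigma, x))

x = np.array([1.0, -2.0, 0.5])   # toy concatenated past + present RF outputs

# Identity covariance: reduces to a plain sum of squares (Heeger-style).
assert np.isclose(normalization_signal(x, np.eye(3)), np.sum(x**2))

# Correlated past/present entries add bilinear cross terms to the signal.
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
sig = normalization_signal(x, Sigma)
```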

*binary mixture of Gaussian scale mixtures* (MGSM), which is an extension to the temporal domain of techniques previously applied to spatial context (Coen-Cagli et al., 2009). The binary model will capture neuron-specific adaptation, as described below.

$v_p$ and $v_t$), rather than a shared mixer:

*binary* in that, for any given input, it is a linear mixture of two GSM models: one in which the past and present are deemed independent (as in Equation 4), and therefore the present is not divisively normalized by the past, and one in which the past and present are deemed dependent (as in Equation 1 and in the standard GSM model), and therefore the present is normalized by the past.

$\theta$, amounts to a weighted sum of two conditions, one in which the past and present are independent (denoted $\xi_1$) and one in which they are dependent (denoted $\xi_2$). In the independent condition, the RF response in the present, $x_{t,\theta}$, is not normalized by the past. In the dependent condition, the present is normalized by a set of past responses with an RF orientation matching the preferred orientation of the neuron $\theta$, $x_{p,\theta}$. In addition, similar to Coen-Cagli et al. (2009), we include in the normalization pool of both the independent and dependent conditions the multiple orientations in the present. This is motivated by the strong dependence typically observed between overlapping RFs with different orientations (Schwartz & Simoncelli, 2001), and it also guarantees local contrast normalization of the model neural responses (Heeger, 1992).
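The binary model's flexible normalization can be caricatured in a few lines. As simplifying assumptions, the two hypotheses are scored with zero-mean Gaussian likelihoods rather than the full GSM likelihoods, and a generic $x/\sqrt{\sigma^{2}+\mathrm{pool}}$ form stands in for the MGSM-derived estimate; the covariance values and $\sigma$ are illustrative.

```python
import numpy as np

def log_gauss(x, Sigma):
    """Zero-mean multivariate Gaussian log density."""
    k = len(x)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(Sigma, x))

def flexible_normalization(x_t, x_p, Sigma_dep, prior_dep=0.5, sigma=1.0):
    """Mix a past-normalized and a present-only estimate, weighted by the
    inferred probability that past and present are statistically dependent."""
    x = np.concatenate([x_t, x_p])
    Sigma_ind = np.diag(np.diag(Sigma_dep))          # independent: no cross-covariance
    log_dep = np.log(prior_dep) + log_gauss(x, Sigma_dep)
    log_ind = np.log(1 - prior_dep) + log_gauss(x, Sigma_ind)
    w_dep = 1.0 / (1.0 + np.exp(log_ind - log_dep))  # posterior p(dependent | input)
    est_ind = x_t[0] / np.sqrt(sigma**2 + x_t @ x_t)   # pool: present only
    est_dep = x_t[0] / np.sqrt(sigma**2 + x @ x)       # pool: present + past
    return w_dep * est_dep + (1.0 - w_dep) * est_ind, w_dep

# Dependent-hypothesis covariance: matched orientations in past and present
# are positively correlated (values are illustrative, not learned).
Sigma_dep = np.eye(4)
Sigma_dep[0, 2] = Sigma_dep[2, 0] = 0.6
Sigma_dep[1, 3] = Sigma_dep[3, 1] = 0.6

x_t = np.array([2.0, 0.3])                 # present RF outputs (two orientations)
resp_match, w_match = flexible_normalization(x_t, np.array([2.0, 0.3]), Sigma_dep)
resp_other, w_other = flexible_normalization(x_t, np.array([0.1, 0.1]), Sigma_dep)
```

With these toy numbers, a past input that matches the present is inferred dependent with high probability and therefore suppresses the response more than a weak, mismatched past input does.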

*multiorientation MGSM*, analogous to previous work in the spatial domain (Coen-Cagli et al., 2012). As its name implies, there is no longer just a binary choice of normalization by the past (on or off); rather, there are multiple past orientation pool conditions.

$\phi$ = 0°, 45°, 90°, and 135°. In the first condition, denoted $\xi_1$, the present and past are independent, similar to the binary model, and the present, $x_{t,\theta}$, is not normalized by the past. However, unlike the binary model, which has only one dependent condition, there are now multiple conditions in which past and present may be deemed dependent, each pertaining to a different past RF orientation. These are denoted $\xi_{2,\phi}$, with $\phi$ again corresponding to each of the possible past orientations. Thus, there are now four dependent pools. In each of these conditions, the RF response, $x_{t,\theta}$, is normalized by the corresponding orientation in the past, $x_{p,\phi}$. In addition, in all conditions, the present is normalized by the other orientations in the present, as in the binary model.

Similarly, the expected value for a model neuron with a horizontal RF is a proportional mix of the horizontal RF response normalized only by the present and not the past, normalized both by the present and the vertical past, normalized both by the present and the horizontal past, and so on. Therefore, the normalization pool for a vertical RF and a horizontal RF is shared. By sharing normalization pools, the effective tuning of the normalization signal is determined exclusively by the orientation of the stimulus shown in the present and its match to the past stimuli, and not by the tuning of the neuron's RF. Hence, we term this a *stimulus-specific normalization model* (Benucci et al., 2013; Solomon & Kohn, 2014) and apply it to replicate tuning curve suppression and repulsion (see the section titled “Tuning Curve Adaptation Reflects Sensitivity to Inputs' Statistical Similarity”). Further mathematical details for the multiorientation MGSM are provided in Appendix B.
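The condition weighting in the multiorientation model can be sketched as a posterior over one independent and four dependent pools. The log-likelihoods below are a toy orientation-similarity score, not the learned MGSM likelihoods; the point is that all neurons share one set of weights, which peak at the past pool matching the present stimulus.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Conditions: xi_1 (independent) plus xi_{2,phi} for each past orientation phi.
# Toy log-likelihoods: dependence with a past orientation is scored higher the
# better it matches the present stimulus orientation.
phis = np.array([0, 45, 90, 135])
present_ori = 45.0
match = np.cos(2 * np.deg2rad(phis - present_ori))    # similarity in [-1, 1]
log_like = np.concatenate([[0.0], 2.0 * match])       # first entry: xi_1
weights = softmax(log_like)                           # posterior over all conditions
best_phi = phis[np.argmax(weights[1:])]               # most probable dependent pool
```

Because every neuron's response mixes the same condition weights, the effective normalization is tied to the stimulus (here peaking at the 45° past pool), not to the neuron's preferred orientation.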

$\xi_{\mathrm{prior}}$; see Appendices A and B) and the covariances $\Sigma$ (see Equation 3) that parameterize the likelihood function. Parameters were optimized on an ensemble of natural movies culled from YouTube, because of the lack of a database of standardized natural movies. The stimulus ensemble consisted of 20,000 temporal sequences, each nine frames long, extracted from 100 frames of 14 natural movies with varying temporal and spatial properties, each normalized to the same range of luminance values (copies of the clips are available from the corresponding author). To find the optimal parameters, we maximized the likelihood through a generalized expectation-maximization algorithm (the approach and equations are described in Coen-Cagli et al., 2012). Briefly, in the expectation step (E-step), we estimate the posterior probabilities for $\xi$ given the current parameter values. In the maximization step (M-step), we search for the parameter values that maximize the so-called complete-data log-likelihood, namely, the expectation of $\log p(x_t, x_p, \xi)$ under the estimated posterior over $\xi$. We divided the M-step into multiple steps, one for the dependent and one for each of the independent covariance matrices, and iterated repeatedly between the complete E-step and each partial M-step.
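The generalized EM alternation can be illustrated on synthetic data. This stand-in uses plain Gaussian mixture components (the paper's likelihoods are scale mixtures), with the independent condition constrained to a diagonal covariance and partial M-steps as described; rows of the data summarize each sequence as a toy (present, past) response pair.

```python
import numpy as np

def log_gauss(X, Sigma):
    """Row-wise zero-mean multivariate Gaussian log density."""
    k = X.shape[1]
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ni,ij,nj->n', X, np.linalg.inv(Sigma), X)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)
# Synthetic (present, past) pairs with genuine temporal correlation:
X = rng.multivariate_normal(np.zeros(2), [[1.0, 0.6], [0.6, 1.0]], size=2000)

prior = np.array([0.5, 0.5])                  # p(xi_1), p(xi_2)
Sigmas = [np.eye(2), np.eye(2) + 0.1]         # independent / dependent components
for _ in range(10):
    # E-step: posterior responsibility of each condition for every sequence.
    logp = np.stack([np.log(prior[c]) + log_gauss(X, Sigmas[c]) for c in range(2)])
    logp -= logp.max(axis=0)
    r = np.exp(logp)
    r /= r.sum(axis=0)
    # Partial M-steps: update each covariance in turn, then the prior.
    for c in range(2):
        w = r[c] / r[c].sum()
        Sigmas[c] = (X * w[:, None]).T @ X
        if c == 0:                            # xi_1: cross terms forced to zero
            Sigmas[c] = np.diag(np.diag(Sigmas[c]))
    prior = r.sum(axis=1) / r.sum()
```

On correlated synthetic data, the dependent (full-covariance) component absorbs most of the prior mass, mirroring how the learned prior reflects how often past and present are inferred dependent in natural movies.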

$\omega$, to quantitatively match the overall suppressive strength of recorded neuronal responses, which varies widely across experiments. The free parameter, $\omega$, scales the normalized response in all the dependent conditions relative to the independent condition. $\omega$ is the ratio of the learned model's response normalized by the relative suppression in the data set being replicated. Note that this additional parameter does not affect in any way the qualitative behavior of the models; it only sets the overall suppression strength.

the $n$th movie sequence, which contains movie frames $[n-8, n-7, \ldots, n]$ and the corresponding RF outputs, is given by the posterior computed for the $(n-1)$th movie sequence. The procedure for updating was the same for each of the MGSM models (binary and multiorientation), so we describe this generically for one model. Practically, for a given movie consisting of many frames, the model was initially presented with the first set of nine frames, $[S_1, S_2, \ldots, S_9]$, where $S_1$ through $S_8$ represent the past and $S_9$ represents the present. Using the learned prior, the model determined the posterior probability for that set of nine stimuli. The model was then presented with Frames 2 through 10 as a new set of stimuli, $[S_2, S_3, \ldots, S_{10}]$, where $S_2$ through $S_9$ represent the past and $S_{10}$ represents the present. The calculated posterior from the previous step for Frames 1 to 9 now became the new prior, and the model once again calculated the posterior probability for Frames 2 through 10. This process was repeated as each new set of stimuli was presented, up to the last nine frames. Thus, the prior was updated recursively as each new frame of the movie was presented. A recursive Bayesian estimator has similarly been used to model multiple timescales in the retina (Wark, Fairhall, & Rieke, 2009). This long-term model is now able to track the changing visual environment on the order of seconds (Figure 2).
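The recursion can be sketched with scalar log-likelihood ratios standing in for the MGSM likelihoods of each nine-frame window (a real sequence would be scored by the model itself); the clipping step is our own addition, to keep the estimator responsive after prolonged exposure.

```python
import numpy as np

def update(prior_dep, loglik_dep, loglik_ind):
    """Posterior probability of 'dependent' for one nine-frame window."""
    log_odds = (np.log(prior_dep) - np.log(1.0 - prior_dep)
                + loglik_dep - loglik_ind)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Toy per-window log-likelihood ratios: positive favors the dependent model.
llr = np.concatenate([np.full(30, 1.0), np.full(30, -1.0)])   # regime change

p_dep = 0.5                                 # prior learned from natural movies
trace = []
for ratio in llr:                           # slide the 9-frame window by one frame
    p_dep = update(p_dep, ratio, 0.0)       # posterior becomes the next prior
    p_dep = float(np.clip(p_dep, 0.01, 0.99))   # keep the recursion adaptive
    trace.append(p_dep)
```

The trace rises toward certainty during the first regime and tracks the switch a few frames after the statistics change, on the timescale of seconds for 30-fps input.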

*past vertical* filters are dependent. The learned variance of the present vertical filter is similar to that of the past vertical filters, whereas the other present orientations have lower variance. This learning reflects a form of similarity metric between past and present orientations, whereby the variance is high when the past vertical orientation matches the present vertical orientation. Similarly, a complementary pattern was learned for the model component in which present filters and *past horizontal* filters are dependent (Figure 3B). This time, the variance is high when the past horizontal orientation matches the present horizontal orientation. A similar trend occurs for the covariance (off-diagonal) elements of the matrix. Overall, for given experimental adaptation and test stimuli, this results in the highest probability of dependency when the past and present stimulus orientations are matched. This in turn enables the stimulus-specific orientation selectivity in the divisive normalization.

*repulsion* (Figure 4D; experimental data plotted from Müller et al., 1999; see also repulsion in other data across a range of timescales and adapting orientations on the flank in Dragoi et al., 2000, 2002; Felsen et al., 2002; Patterson et al., 2013; Wissig & Kohn, 2012). Note that the model can predict tuning curves only for adapter orientations matched to one of the four filter orientations detailed in the Methods; hence the difference in adapter orientation between the plotted experimental data of Müller et al. (1999) and the modeling predictions (14° in Figure 4D and 45° in Figure 4E). The repulsion could be explained qualitatively by our model (Figure 4E), because the dependence probability (and therefore the degree to which normalization was engaged) was determined by the match between test and adapter. To understand the model behavior, first recall that the model uses four normalization pools, each with a different orientation tuning. The adapter most strongly drives the normalization pool with matching preferred orientation (0° in Figure 4B; 45° in Figure 4E), regardless of the test stimulus. However, the normalization signal computed by such pools is used to normalize the neural response only to the degree that test and adapter are inferred dependent. Because the probability of dependence is highest when the adapting and test stimuli are matched in orientation (Figure 4F), adaptation is stronger when test stimuli are closer to the adapter. Our modeling framework thus provides a normative explanation from movie statistics for stimulus-specific adaptation (Solomon & Kohn, 2014).

*equalization* (Benucci et al., 2013; Figure 7G). Stated differently, on one hand, the biased orientation was presented more frequently than the other orientations, but on the other hand, the response of neurons preferring the biased orientation was suppressed more strongly than that of neurons preferring other orientations. These two effects counterbalanced each other, such that the average response over the entire stimulus ensemble did not differ between neurons with different orientation preferences.
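A toy calculation shows the counterbalancing. Assuming cosine tuning curves and a gain that divides each neuron by its own time-averaged drive (a cartoon of neuron-specific adaptation, not the paper's estimator), a biased stimulus ensemble yields equal mean responses across orientation preferences.

```python
import numpy as np

# Toy population: neurons tuned to 4 orientations, with cosine tuning bumps.
oris = np.array([0.0, 45.0, 90.0, 135.0])
def tuning(stim_ori):
    return 0.5 + 0.5 * np.cos(2 * np.deg2rad(oris - stim_ori))

# Biased ensemble: the 0-deg "adapter" is shown 3x more often than the others.
stim_counts = {0.0: 3, 45.0: 1, 90.0: 1, 135.0: 1}
drive = sum(c * tuning(s) for s, c in stim_counts.items()) / sum(stim_counts.values())

# Neuron-specific adaptation: each neuron's gain shrinks in proportion to its
# own time-averaged drive, so frequently driven neurons are suppressed more.
gain = 1.0 / drive
mean_resp = gain * drive          # homeostatic: equal across preferences
```

Neurons preferring the adapter are driven hardest (`drive[0]` is largest) and suppressed hardest (`gain[0]` is smallest), so the ensemble-averaged responses equalize.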

*multiorientation model*, amounted to stimulus-specific adaptation. The model learned a common normalization signal for all oriented RFs, regardless of their orientation preference, which effectively depended only on the similarity between the adapting and test stimuli. This model mimicked the previous approach we used to capture spatial context neurophysiology data (Coen-Cagli et al., 2012) and reproduced classical short-term effects such as response suppression and tuning curve repulsion (Figure 4), as well as contrast adaptation (Figure 5). However, the stimulus-specific model could not capture response equalization (Benucci et al., 2013). We therefore implemented a binary model with independent normalization pools, which amounted to neuronal-specific normalization, and showed that this could capture equalization (Figure 7). However, response suppression was purely a measure of how well the adapter stimulus drove the RF of the neuron, and as a result, this model was unable to replicate tuning curve repulsion (Supplementary Figure S2).

*Science*, 275, 220–224.
*Reviews in the Neurosciences*, 10, 181–200.
*Journal of Physiology*, 347, 713–739.
*Visual Neuroscience*, 7, 531–546.
*Journal of the Royal Statistical Society, Series B*, 36, 99–102.
*Psychological Review*, 61, 183–193.
*Proceedings of the National Academy of Sciences, USA*, 109, 5898–5903.
In *Sensory communication* (Vol. 1, pp. 217–234). Cambridge, MA: MIT Press.
*Behavioral and Brain Sciences*, 24, 602–607.
In *The computing neuron* (pp. 54–72). Reading, MA: Addison-Wesley.
*Nature Neuroscience*, 16, 724–729.
*Science*, 331, 83–87.
*Visual Neuroscience*, 6, 239–255.
*Vision Research*, 43, 1895–1906.
*Journal of Neuroscience*, 25, 10577–10597.
*Science*, 276, 949–952.
*Nature Reviews Neuroscience*, 13, 51–62.
*Journal of Neuroscience*, 22, 10053–10065.
*Neuropharmacology*, 37, 501–511.
*Journal of Neuroscience*, 18, 4785–4799.
*Current Biology*, 22, 622–626.
*Neuron*, 34, 437–446.
*Fitting the mind to the world: Adaptation and after-effects in high-level vision* (Advances in Visual Cognition). Oxford, UK: Oxford University Press.
*Vision Research*, 47, 3125–3131.
*Proceedings. Biological Sciences / The Royal Society*, 267, 1705–1710.
In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), *NIPS* (pp. 369–377). Cambridge, MA: MIT Press.
*PLoS Computational Biology*, 8, e1002405.
*Journal of Computational Neuroscience*, 32, 387–402.
*Journal of Neurophysiology*, 95, 271–283.
*Theoretical neuroscience* (Vol. 806). Cambridge, MA: MIT Press.
In *Advances in neural information processing systems 15* (pp. 237–244). Cambridge, MA: MIT Press.
*Journal of Neuroscience*, 31, 15016–15025.
*Network: Computation in Neural Systems*, 6, 345–358.
*Nature Neuroscience*, 5, 883–891.
*Neuron*, 28, 287–298.
*Journal of Neurophysiology*, 96, 826–833.
*Nature*, 412, 787–792.
*Neuron*, 36, 945–954.
*Experimental Brain Research*, 106, 145–155.
*Journal of Neurophysiology*, 70, 2024–2034.
*Vision Research*, 31, 223–236.
*Vision Research*, 27, 1041–1043.
*Trends in Cognitive Sciences*, 10, 14–23.
*Visual Neuroscience*, 9, 181–197.
*Proceedings of the National Academy of Sciences, USA*, 93, 623–627.
*Natural image statistics* (Vol. 39). London: Springer.
Physiological mechanisms of adaptation in the visual system. In C. W. G. Clifford & G. Rhodes (Eds.), *Fitting the mind to the world: Adaptation and after-effects in high-level vision* (pp. 17–46). Oxford, UK: Oxford University Press.
*PLoS ONE*, 8, e64294.
*Neurocomputing*, 52–54, 117–123.
*Neuroscience*, 310, 198–205.
*Perception as Bayesian inference*. New York: Cambridge University Press.
*Journal of Neurophysiology*, 97, 3155–3164.
*Trends in Neurosciences*, 29, 250–256.
*Nature*, 385, 725–729.
*Frontiers in Neural Circuits*, 7, 154.
*Proceedings of the National Academy of Sciences, USA*, 96, 10530–10535.
*Journal of Neuroscience*, 32, 4179–4195.
*Science*, 182, 1036–1038.
*PLoS Biology*, 5, e19.
*Nature*, 278, 850–852.
*Science*, 285, 1405–1408.
*Journal of Neuroscience*, 21, 6978–6990.
*Journal of Neuroscience*, 11, 369–380.
*Nature*, 298, 266–268.
*Journal of Neurophysiology*, 54, 651–667.
*American Scientist*, 88, 238–245.
In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), *NIPS* (pp. 786–792). Cambridge, MA: MIT Press.
*Journal of Neurophysiology*, 111, 1203–1213.
*Journal of Neuroscience*, 33, 532–543.
*Frontiers in Integrative Neuroscience*, 8, 19.
*International Journal of Computer Vision*, 40, 49–70.
*Nature Neuroscience*, 2, 79–87.
*Journal of Neuroscience*, 20, 4286–4299.
*Visual Neuroscience*, 2, 609–620.
*Nature Reviews Neuroscience*, 8, 522–535.
*Neural Computation*, 18, 2680–2718.
*Nature Neuroscience*, 4, 819–825.
*Vision Research*, 29, 747–755.
*Neural Computation*, 21, 3271–3304.
*Nature*, 439, 936–942.
*Nature*, 378, 492–496.
*IEEE Transactions on Information Theory*, 38, 587–607.
*Annual Review of Neuroscience*, 24, 1193–1216.
*PLoS Computational Biology*, 9, e1002889.
*Current Biology*, 24, R1012–R1022.
*Neural Computation*, 24, 60–103.
*PLoS ONE*, 5, e12436.
In Y. Weiss, B. Schölkopf, & J. C. Platt (Eds.), *NIPS* (pp. 1289–1296). Cambridge, MA: MIT Press.
*Journal of Neurophysiology*, 89, 2086–2100.
*Journal of Neuroscience*, 35, 4973–4982.
In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), *NIPS* (pp. 1545–1552). Cambridge, MA: MIT Press.
*Vision Research*, 48, 1456–1470.
*Journal of Neurophysiology*, 40, 1051–1065.
In *Probabilistic models of the brain: Perception and neural function* (pp. 203–222). Cambridge, MA: MIT Press.
In S. A. Solla, T. K. Leen, & K. R. Müller (Eds.), *NIPS* (pp. 855–861). Cambridge, MA: MIT Press.
*Neuron*, 61, 750–761.
*Journal of Neuroscience*, 25, 11666–11675.
*Center for Biological and Computational Learning Paper*, 158, 1–42.
*Neurobiology of Learning and Memory*, 92, 199–205.
*Journal of Neurophysiology*, 107, 3370–3384.

$\theta$ is given by

$x_{t,\theta}$. The normalization pool includes the four possible RF orientations in the present, namely, $x_t = (x_{t,0}, x_{t,45}, x_{t,90}, x_{t,135})$, and only the preferred orientation $\theta$ in the past, namely, $x_{p,\theta} = (x_{p_1,\theta}, x_{p_2,\theta}, \ldots, x_{p_8,\theta})$, where $p_1, p_2, \ldots, p_8$ denote the eight past frames. This binary model implements neuron-specific normalization, as the tuning of the normalization signal is determined exclusively by the tuning of the neuron's RF.
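Concretely, the binary model's pool for a neuron preferring orientation $\theta$ can be assembled as follows; the divisive form and the constant $\sigma$ are illustrative placeholders for the MGSM-derived estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
x_t = rng.normal(size=4)           # present RF outputs at 0, 45, 90, 135 deg
x_p = rng.normal(size=(8, 4))      # past RF outputs: frames p_1..p_8 x 4 orientations

theta = 2                          # index of the preferred orientation (90 deg)
# Neuron-specific pool: all four present orientations, plus only the preferred
# orientation theta across the eight past frames.
pool = np.concatenate([x_t, x_p[:, theta]])
sigma = 1.0                        # illustrative normalization constant
response = x_t[theta] / np.sqrt(sigma**2 + pool @ pool)
```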

$\xi_1$ and $\xi_2$, for the independent and dependent cases, respectively. The first factor of each term on the right-hand side of Equation 5 is the (posterior) probability of the input movie being independent or dependent. These factors weight the Gaussian estimate for each of the cases and are obtained by applying Bayes's rule, that is, combining the prior probability that any given movie sequence is dependent or independent (denoted $\xi_{\mathrm{prior}}$) with the corresponding likelihoods. In the second term on the right-hand side of Equation 5, the estimate corresponds to the case in which the movie frames are dependent, $\xi_2$; because the present and past are connected by the shared mixer, $v$, the Gaussian estimate is dependent on the present and the past RF outputs, $x_t, x_p$, as in Equation 2. Conversely, in the first term on the right-hand side of Equation 5, the estimate corresponds to the case in which the movie frames are independent, $\xi_1$; because there is no connection between the present and the past, the Gaussian estimate of the present is dependent only on the RF outputs in the present, $x_t$ (see Coen-Cagli et al., 2009; Schwartz et al., 2006, for derivations):

The $\lambda$ terms in Equation 2 and Equation 7 normalize the RF output in the present, $x_{t,\theta}$, for the dependent and independent cases, respectively.

_{θ}*ξ*

_{1}for the independent GSM and value

*ξ*

_{2,}

*for the GSM, where the dependent past group has orientation*

_{ϕ}*ϕ*. Therefore, in the multiorientation MGSM, the estimate of the Gaussian component in the present time for the neuron with preferred orientation

*θ*is where

*p*(

*ξ*

_{1}|

*x*

_{t}_{,}

*x*

_{p}_{,0}, . . . ,

*x*

_{p}_{,135}) +

*p*(

*ξ*

_{2,}

*|*

_{ϕ}*x*

_{t},

*x*

_{p,0}, . . . ,

*x*

_{p,135}) = 1. On the right-hand side of Equation 10, the Gaussian estimate in the first term is identical to Equation 7; the remaining terms have the same form as Equation 2, except each involves a different group of past RFs and their covariance matrix: where