Understanding spike-triggered covariance using Wiener theory for receptive field identification
Author Affiliations
  • Roman A. Sandler
    Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA
    [email protected]
  • Vasilis Z. Marmarelis
    Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA
    [email protected]
Journal of Vision July 2015, Vol.15, 16. doi:https://doi.org/10.1167/15.9.16
Abstract

Receptive field identification is a vital problem in sensory neurophysiology and vision. Much research has been done in identifying the receptive fields of nonlinear neurons whose firing rate is determined by the nonlinear interactions of a small number of linear filters. Although more advanced methods have been proposed, spike-triggered covariance (STC) remains the most widely used method in such situations due to its simplicity and intuitiveness. Although the connection between STC and Wiener/Volterra kernels has often been mentioned in the literature, this relationship has never been explicitly derived. Here we derive this relationship and show that the STC matrix is actually a modified version of the second-order Wiener kernel, one that incorporates the input autocorrelation and mixes first- and second-order dynamics. It is then shown how, with little modification of the STC method, the Wiener kernels may be obtained and, from them, the principal dynamic modes, a set of compact and efficient linear filters that essentially combine the spike-triggered average and STC matrix and generalize to systems with both continuous and point-process outputs. Finally, using Wiener theory, we show how these obtained filters may be corrected when they are estimated from correlated inputs. Our correction technique is shown to be superior to those commonly used in the literature for both correlated Gaussian images and natural images.

Introduction
A central goal of neuroscience is to answer the question, "What does the underlying neuron or neural population compute?" Because, in most neural systems, the output of the system is exclusively considered to be its spike rate and timing, the above question can be rephrased as "What makes this neuron or neural population fire?" This question, oftentimes referred to as system identification or receptive field identification, has been a major area of research, particularly in sensory neuroscience, in which one begins with a strong a priori notion of the stimuli or stimulus modality to which the cell is sensitive. Early attempts to characterize neural systems parameterized their responses to one or two stimulus parameters and then described the cell's response through various forms of tuning curves. Although such tuning-curve approaches are useful for gaining a first-glance approximation of the system, they fail to take into account the full complexity of neural responses and prove woefully inadequate when the input stimuli occupy a high-dimensional space, such as the spatiotemporal receptive fields of complex cells in the visual cortex.
The advent of random inputs, such as Gaussian white noise (GWN), to characterize systems in the 1960s and 1970s represented a major advance in the system identification problem (P. Z. Marmarelis & Marmarelis, 1978). Such inputs facilitated the development of nonparametric models whereby the system response to arbitrary stimuli could be predicted with only very minor prior assumptions about the nature of the underlying system. In particular, the Volterra/Wiener framework achieved success in modeling many nonlinear physiological systems, including mapping the nonlinear receptive fields of auditory (Aertsen & Johannesma, 1981; Eggermont, 1993) and visual (Citron & Marmarelis, 1987; Emerson, Citron, Vaughn, & Klein, 1987) neurons. However, perhaps due to the difficulty of interpreting the higher-order Volterra kernels and the need to use many kernels to characterize certain nonlinear systems, the Volterra/Wiener approach never gained the popularity of linear approaches that approximate the underlying nonlinear system with a single linear filter. In sensory neuroscience, this usually took the form of the spike-triggered average (STA; De Boer & Kuyper, 1968). Later, the STA approach was expanded by adding a static nonlinearity that maps the STA output to the neural response in the so-called linear-nonlinear-Poisson (LNP) model, which was used successfully to characterize the dynamics of retinal ganglion cells (Chichilnisky, 2001).
However, the limitations of STA-based approaches soon became apparent when it was realized that the responses of certain neurons are a nonlinear function of multiple linear filters. For example, in the popular Adelson-Bergen energy model, the output is determined by the sum of the squares of the outputs of two Gabor filters (Adelson & Bergen, 1985; see Figure 3B); thus, the STA of such a system is zero because it responds equally to positive and negative contrasts. In other, less severe cases, the system may be approximated by the STA, but multiple filters would still be required to completely characterize it. The most popular method to obtain these multiple filters is the spike-triggered covariance (STC) method, which builds on the STA framework by taking the covariance rather than the mean of all spike-triggering stimuli. Although other, more complex methods have been used to obtain these filters (Paninski, 2003; Pillow & Park, 2011; Rapela, Mendel, & Grzywacz, 2006; Saleem, Krapp, & Schultz, 2008; Sharpee, Rust, & Bialek, 2004), none have been as popular as the relatively straightforward and simple STC method, which has been used in areas as diverse as modeling Hodgkin-Huxley dynamics (Agüera y Arcas, Fairhall, & Bialek, 2001), vision (Fairhall et al., 2006; Rust, Schwartz, Movshon, & Simoncelli, 2005; Touryan, Felsen, & Dan, 2005), audition (Slee, Higgs, Fairhall, & Spain, 2005), insect movement (Fox, Fairhall, & Daniel, 2010; de Ruyter van Steveninck & Bialek, 1988), and olfaction (Kim, Lazar, & Slutskiy, 2011). An excellent review of both STA and STC for characterizing receptive fields appears in Schwartz, Pillow, Rust, and Simoncelli (2006).
Parallel to the development and popularization of STC in sensory neuroscience, the concept of principal dynamic modes (PDMs) was developed in the area of dynamic systems identification as a system-specific linear filter set to efficiently describe the linear dynamics of nonlinear systems (V. Z. Marmarelis, 1997, 2004; V. Z. Marmarelis & Orme, 1993). The PDMs are estimated from a second-order Volterra model and provide a compact representation of the model, which is much more amenable to physiological interpretation. The PDMs have been used to successfully characterize renal autoregulation (V. Z. Marmarelis, Chon, Holstein-Rathlou, & Marsh, 1999), cerebral hemodynamics (V. Z. Marmarelis, Shin, & Zhang, 2012), heart-rate variability (Zhong, Wang, Ju, Jan, & Chon, 2004), spider mechanoreceptor dynamics (V. Z. Marmarelis, Juusola, & French, 1999), the Schaffer collateral pathway in the hippocampus (V. Z. Marmarelis et al., 2013; R. Sandler et al., 2013; R. A. Sandler et al., 2014), and most recently, V1 receptive fields (Fournier et al., 2014). As recently pointed out (Fournier et al., 2014), the PDMs and STC are functionally equivalent as they both aim to identify an efficient linear filter bank for the system under study. The main functional difference is that PDMs, derived from the Volterra framework, can be used for systems with both continuous and point-process outputs, and STC filters, being an extension of the STA approach, are restricted to systems with point-process outputs. 
Although the connection between STC and the second-order Wiener kernel is often mentioned in the STC literature (Samengo & Gollisch, 2013; Schwartz et al., 2006), this connection has never been explicitly derived. In this paper, we derive this connection and show that the STC matrix is actually a modified version of the second-order Wiener kernel, which incorporates the input autocorrelation and mixes first- and second-order dynamics. Then, we will show how the STC method can be corrected to obtain the second-order Wiener kernel and PDMs while still retaining its elegance and simplicity. The obtained Wiener kernel, unlike the STC matrix, which is simply an intermediate step, is highly interpretable and provides important information about the system nonlinearity, which may not be apparent from the STC filter bank and associated nonlinearity. To provide guidance for experimentalists, several properties of the PDMs are examined, including their predictive power, robustness to noise, overfitting, and “data-hungriness,” or how much data they need in order to converge. Finally, we show how the obtained PDMs may be corrected when they are estimated using correlated inputs. This is a pertinent issue in sensory neuroscience in which complex correlated inputs, such as natural images, are commonly used to characterize systems. It is shown that the correction procedure advocated here is superior to many of those commonly used in the literature for both correlated Gaussian inputs (such as blurred images) and truly natural images obtained from an existing database. 
Methods
Wiener-Volterra kernels
The Volterra functional expansion is a method of identifying arbitrary nonlinear finite-memory systems, akin to the way a Taylor series can be used to approximate arbitrary functions. Thus, in any input–output system, the output y[t] can be expressed in terms of the input, x[t], and a series of increasing-order nonlinear filters, known as the Volterra kernels, k_q[τ1, …, τq], as

$$y[t] = k_0 + \sum_{\tau} k_1[\tau]\,x[t-\tau] + \sum_{\tau_1}\sum_{\tau_2} k_2[\tau_1,\tau_2]\,x[t-\tau_1]\,x[t-\tau_2] + \cdots \tag{1}$$

Although, in this form, the Volterra series describes a dynamic system in which the input is temporal in nature (i.e., x[t]), the Volterra framework can just as easily be applied to systems in which the input is spatial or spatiotemporal. Thus, the spatial version of Equation 1 for the first two terms is

$$y[t] = k_0 + \sum_{n} k_1[n]\,x[n,t] + \sum_{n_1}\sum_{n_2} k_2[n_1,n_2]\,x[n_1,t]\,x[n_2,t] \tag{2}$$

where, in the context of receptive field identification, x[n, t] is the nth pixel of the input image at time t. For the remainder of this section, we will assume a spatial system even though any intuitions may just as easily be applied to temporal systems.
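To make Equation 1 concrete, the following minimal NumPy sketch (ours, not from the original study; the kernels, memory, and threshold are hypothetical choices) evaluates a second-order truncation of the temporal Volterra series and thresholds the result to produce a binary spike train, mirroring the simulation scheme of Figure 1a.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 20, 5000                              # memory (bins) and record length
k0 = 0.0
k1 = 0.1 * rng.standard_normal(M)            # hypothetical first-order kernel
k2 = 0.01 * rng.standard_normal((M, M))
k2 = (k2 + k2.T) / 2                         # second-order kernels are symmetric

x = rng.standard_normal(T)                   # GWN input

def volterra2(x, k0, k1, k2):
    """Output of Equation 1 truncated at second order."""
    M, T = len(k1), len(x)
    y = np.full(T, k0, dtype=float)
    for t in range(M - 1, T):
        xlag = x[t - M + 1:t + 1][::-1]      # [x[t], x[t-1], ..., x[t-M+1]]
        y[t] += k1 @ xlag + xlag @ (k2 @ xlag)
    return y

y_cont = volterra2(x, k0, k1, k2)
y = (y_cont > np.percentile(y_cont, 90)).astype(float)  # threshold to binary output
```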
The first-order Volterra kernel, k1[n], can be seen as weighing the contribution of a single pixel of the input to the output. Thus, the first-order Volterra functional describes a linear system, and the first-order Volterra kernel is equivalent to the well-known impulse response function in engineering and the STA in neuroscience (see Equation 6). The second-order Volterra kernel, the first nonlinear kernel, describes second-order nonlinear interactions between two input pixels. Each value of the kernel (depicted as a surface), for instance, k2[n1, n2] at lags n1 and n2, represents the weight (i.e., relative importance) of the product of pixel values x[n1, t]x[n2, t] for spatial systems, or of time-lagged values x[t − n1]x[t − n2] for temporal systems, in reconstructing the system output, y[t]. Thus, a large positive value of k2[n1, n2] implies strong mutual facilitation of the input-lagged values x[t − n1] and x[t − n2] in the way they affect y[t] (V. Z. Marmarelis, 2004). In vision research, second-order dynamics are commonly used to describe center-surround inhibition and phase insensitivity, and in temporal neural systems, they are commonly used to describe paired-pulse facilitation and depression (Krausz, 1975; Song, Marmarelis, & Berger, 2009). Likewise, the qth-order kernel describes the nonlinear interactions between q input pixels.
Norbert Wiener was the first to propose a method of estimating the kernels by orthogonalizing the Volterra functionals with respect to GWN inputs, giving rise to the well-known Wiener series (Wiener, 1966). However, the Wiener series was not popularized until the computationally feasible cross-correlation technique was introduced as a method of estimating the Wiener kernels (Lee & Schetzen, 1965; P. Z. Marmarelis & Marmarelis, 1978). Thus, for GWN inputs, the qth-order Wiener kernel was shown to be simply a scaled version of the qth-order cross-correlation,

$$h_q[\tau_1, \ldots, \tau_q] = \frac{1}{q!\,\sigma^{2q}}\,E\{y_0[t]\,x[t-\tau_1]\cdots x[t-\tau_q]\} \tag{3}$$

where y0[t] is the demeaned output, hq is the qth-order estimated Wiener kernel, and σ2 is the power (variance) of the GWN input. In particular, the first two Wiener kernels are

$$h_1[\tau] = \frac{1}{\sigma^2}\,E\{y_0[t]\,x[t-\tau]\} \tag{4a}$$

$$h_2[\tau_1,\tau_2] = \frac{1}{2\sigma^4}\,E\{y_0[t]\,x[t-\tau_1]\,x[t-\tau_2]\} \tag{4b}$$
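Equations 4a and 4b translate directly into cross-correlation estimators. The sketch below is ours, under the same assumptions (GWN input, demeaned output); the lagged helper, which builds the T × M matrix of lagged stimulus vectors, is a hypothetical convenience reused in later sketches.

```python
import numpy as np

def lagged(x, M):
    """T x M matrix whose row t is [x[t], x[t-1], ..., x[t-M+1]] (zeros before t=0)."""
    T = len(x)
    X = np.zeros((T, M))
    for m in range(M):
        X[m:, m] = x[:T - m]
    return X

def wiener_kernels(x, y, M):
    """Lee-Schetzen cross-correlation estimates of h1 and h2 (Equations 4a-b)."""
    sigma2 = x.var()
    y0 = y - y.mean()                                    # demeaned output
    X = lagged(x, M)
    h1 = (y0 @ X) / (len(x) * sigma2)                    # Eq. 4a
    h2 = np.einsum('t,ti,tj->ij', y0, X, X) / (len(x) * 2 * sigma2**2)  # Eq. 4b
    return h1, h2
```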
Although these equations apply only when using GWN inputs, with slight modifications, they can be used to obtain Wiener kernels estimated from a wide class of inputs, such as correlated Gaussian inputs (see section on correcting for correlated inputs), Poisson processes, and uniform inputs (see Discussion) (P. Z. Marmarelis & Marmarelis, 1978; V. Z. Marmarelis & Berger, 2005). Asymptotically, or in the limit of infinite data, the estimated Wiener kernels are the optimal kernels to minimize mean squared error. It should be noted that outside of certain specific cases, such as when the underlying system only has second-order dynamics, the Wiener kernels are not generally equivalent to the Volterra kernels, and equations exist to interchange between the two (V. Z. Marmarelis, 2004). Although many physiological systems do have higher-order dynamics and, thus, theoretically, the Volterra and Wiener kernels are distinct, in practice, due to computational constraints, usually only up to the second-order Wiener model is estimated. In this situation, the obtained Wiener kernels are the best approximation to the Volterra kernels, and thus, this distinction will be ignored for the remainder of this study. Volterra modeling has a long history in neuroscience research in diverse areas, such as the retina (Citron & Marmarelis, 1987; P. Z. Marmarelis & Naka, 1973), the auditory cortex (Eggermont, 1993), the hippocampus (Song et al., 2007), and motor control (Jing, Simpson, Allen, & Newland, 2012). 
STC and Wiener kernels
STC has been a popular method in sensory neuroscience to characterize nonlinear systems. It aims to identify an optimal subspace of linear filters, which are then nonlinearly combined to determine the output firing rate. The fundamental idea behind STC and its precursor, STA, is that spike-triggering stimuli, x_sp, have different statistical moments than mean stimuli and that these moments can be used to characterize the system. Thus, the STA is defined as the mean (first-order moment) of the spike-triggering stimuli:

$$\text{STA} = \frac{1}{N}\sum_{t_{sp}} \vec{x}[t_{sp}] \tag{5}$$

where N is the total number of spikes. Equivalently, the STA can also be shown to be a scaled version of the first-order cross-correlation, Φxy[n], and, under a wide class of inputs, the first-order Wiener kernel, h1 (Equation 4a; Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1999). The critical step is to note that the sum over the spike-triggered stimuli, Σ_tsp x[tsp], is equivalent to the sum over all stimuli weighted by the output, Σ_t x[t] y[t]. This is true because when an input stimulus does not elicit a spike, y[t] will be zero, and thus that stimulus will not influence the total sum. The derivation is as follows:

$$\text{STA} = \frac{1}{N}\sum_{t} \vec{x}[t]\,y[t] = \frac{1}{N}\sum_{t} \vec{x}[t]\,\big(y_0[t]+m_y\big) = \frac{T}{N}\,\Phi_{xy} = \frac{T\sigma^2}{N}\,h_1 \tag{6}$$

where m_y is the mean of y[t] (the term involving m_y vanishes for demeaned stimuli).
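The interchange of sums in Equation 6 is easy to verify numerically. In this sketch (ours, reusing the hypothetical lagged-stimulus matrix from above), the spike-triggered mean and the output-weighted sum agree exactly for a binary spike train.

```python
import numpy as np

def sta_two_ways(X, y):
    """STA as the mean spike-triggering stimulus (Equation 5) and, equivalently,
    as an output-weighted sum over *all* stimuli (the key step of Equation 6).
    X is a (T, M) matrix of lagged stimuli; y is a binary spike train."""
    N = y.sum()
    sta_spike = X[y == 1].mean(axis=0)       # mean over spike-triggering stimuli
    sta_weighted = (y @ X) / N               # terms with y[t] = 0 drop out of the sum
    assert np.allclose(sta_spike, sta_weighted)
    return sta_spike
```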
Likewise, the STC matrix, C_sp, is defined as the covariance (second-order moment) of the spike-triggering stimuli (Schwartz et al., 2006), i.e.,

$$C_{sp} = \frac{1}{N-1}\sum_{t_{sp}} \big(\vec{x}[t_{sp}] - \text{STA}\big)\big(\vec{x}[t_{sp}] - \text{STA}\big)^T \tag{7}$$

It should be noted that although this is the most commonly used variant of the STC matrix, other variants exist, such as removing the projection of the STA from x_sp (Petersen et al., 2008; Rust et al., 2005). The STC linear filters are then defined as the significant eigenvectors of C_sp.
Although it is commonly asserted in the literature that the STC matrix, C_sp, is related to the second-order Wiener kernel, h2, the relationship, to our knowledge, has never been explicitly defined (although see Bialek & de Ruyter van Steveninck, 2005; Park, Archer, Priebe, & Pillow, 2013). Here, we derive this relationship and show that for GWN inputs, C_sp is actually a modified version of h2 that intertwines first- and second-order dynamics and incorporates the input autocorrelation. We begin the derivation by noting that the covariance of a vector x with mean μ can be written as E{x x^T} − μμ^T. Thus (using 1/N in place of 1/(N − 1), which is asymptotically equivalent),

$$C_{sp} = \frac{1}{N}\sum_{t_{sp}} \vec{x}[t_{sp}]\,\vec{x}[t_{sp}]^T - \text{STA}\cdot\text{STA}^T$$

As in the STA derivation, the sum over spike-triggered stimuli is interchanged with that over all stimuli:

$$C_{sp} = \frac{1}{N}\sum_{t} \vec{x}[t]\,\vec{x}[t]^T\,y[t] - \text{STA}\cdot\text{STA}^T$$

Now, y[t] is separated into y0[t] + m_y, giving

$$C_{sp} = \frac{1}{N}\sum_{t} \vec{x}[t]\,\vec{x}[t]^T\,y_0[t] + \frac{m_y}{N}\sum_{t} \vec{x}[t]\,\vec{x}[t]^T - \text{STA}\cdot\text{STA}^T$$

Finally, by noting that (1/T)Σ_t x[t] x[t]^T = Φx and using Equations 6 and 4b to substitute in the Wiener kernels, we get

$$C_{sp} = \underbrace{\frac{2\sigma^4 T}{N}\,h_2}_{\text{scaled Wiener kernel}} \;+\; \underbrace{\frac{m_y T}{N}\,\Phi_x}_{M_\Phi} \;-\; \underbrace{\left(\frac{T\sigma^2}{N}\right)^{2} h_1 h_1^T}_{M_1} \tag{8}$$

where Φx is the input autocorrelation matrix. Here, it can be seen that the STC matrix C_sp is composed of a scaled version of the second-order Wiener kernel and two modifying terms. The first modification, MΦ, incorporates the input autocorrelation, Φx. In the case of uncorrelated inputs, such as GWN, this modification adds a delta function down the diagonal of C_sp and thus has little effect on the acquired STC filters (eigenvectors). However, when using correlated stimuli, such as natural images, this modification can significantly alter the results, as will be shown in the section on correlated inputs (although here the cross-correlation estimate for h2 in Equation 4b will also be biased; see the section on correcting for correlated inputs). This modification as a source of bias has been acknowledged previously in the context of information theory and addressed by removing the "prior" covariance matrix, or the input autocorrelation matrix, Φx (Agüera y Arcas, Fairhall, & Bialek, 2003). The second modification, M1, incorporates first-order dynamics through the term STA·STA^T ∝ h1 h1^T. Thus, the STC matrix mixes first- and second-order dynamics into a single kernel. Despite these modifications, the linear filters for GWN obtained from both the Wiener kernel and the STC matrix, when the STA is included, span the same subspace and thus are consistent and should have roughly equal predictive power. However, these modifications remove the interpretability of the second-order Wiener kernel from the STC matrix because each element of the STC matrix can no longer be viewed as weighing pairwise input correlations. Also, it can easily be shown that if x[t] is not demeaned, then there will be an additional modification to the Wiener kernels, which causes a significant bias; carrying a nonzero input mean, m_x, through the same derivation yields cross terms of the form

$$M_{m_x} = \sigma^2\big(\vec{m}_x h_1^T + h_1 \vec{m}_x^T\big)$$

Note that this bias actually changes the shape of the STC matrix and the obtained linear filters, highlighting the absolute necessity of demeaning all stimuli when using STC or Wiener approaches. This is distinct from the STA, for which not demeaning the input will only change the baseline of the obtained STA filter but not its actual shape.
The good news is that the Wiener kernels may be obtained by using the second moment of x_sp rather than its covariance. This will eliminate the M1 modification, leaving only the MΦ modification, which can easily be removed by subtracting the input autocorrelation. Thus, the unmodified Wiener kernel, h2, can be obtained by

$$h_2 = \frac{N}{2\sigma^4 T}\left(\hat{C}_{sp} - \frac{m_y T}{N}\,\Phi_x\right) \tag{9}$$

where Ĉ_sp refers to the second-moment matrix of the spike-triggering stimuli, (1/N)Σ_tsp x[tsp] x[tsp]^T. The question of whether to use the covariance or second moment of x_sp is already present in the literature (Samengo & Gollisch, 2013; Schwartz et al., 2006). In fact, another group (Cantrell, Cang, Troy, & Liu, 2010) previously showed that using the second moment of x_sp (which they referred to as "noncentered STC") could provide good results when used in the context of high-throughput cell classification but is inferior to standard STC for spike prediction. We hope that this work conclusively shows that although, when used in conjunction with the STA, both the STC and Wiener kernel will yield the same linear subspace, the Wiener kernel adds much interpretability for no theoretical or computational cost.
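In code, the modification amounts to a one-line change: take the second moment of the spike-triggering stimuli instead of their covariance and then subtract the scaled input autocorrelation, as in Equation 9. The sketch below is ours, using the same (T, M) lagged-stimulus matrix as above.

```python
import numpy as np

def h2_from_stc(X, y, sigma2):
    """Unmodified second-order Wiener kernel from the second moment of the
    spike-triggering stimuli (Equation 9)."""
    T, N = len(y), y.sum()
    m_y = y.mean()                           # for a binary spike train, m_y = N / T
    Xs = X[y == 1]
    C2 = Xs.T @ Xs / N                       # second-moment matrix (no STA removal)
    Phi_x = X.T @ X / T                      # input autocorrelation matrix
    return (N / (2 * sigma2**2 * T)) * (C2 - (m_y * T / N) * Phi_x)
```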
PDMs
In the above section, it was shown that the STC matrix is a biased version of the second-order Wiener kernel, h2. This finding, however, does not mean that the linear filters obtained from the STC matrix are biased. To begin the search for an efficient linear filter bank, we note that any arbitrary set of Wiener/Volterra functions can be represented as a Wiener-Bose cascade model composed of a set of linear filter banks followed by a multivariate static nonlinearity (see Figure 1; V.Z. Marmarelis, 2004; Westwick & Kearney, 2003). However, unlike the true Wiener kernels, which have a canonical form from which any deviation will lead to suboptimal predictive power, any Wiener system can be expressed by an infinite number of Wiener-Bose cascades so long as the linear filter banks span the same subspace. Thus, a desirable linear filter bank will be (a) highly predictive or essentially span the same subspace as the equivalent Wiener-Bose cascade, (b) compact or contain as few filters as necessary, and (c) interpretable or facilitate scientific discovery. Our group has attempted to find such a desirable filter bank over the last two decades in the context of dynamic system identification through the use of PDMs (V. Z. Marmarelis, 2004; V. Z. Marmarelis & Orme, 1993; V. Z. Marmarelis et al., 2014). The PDMs are a system-specific and input-independent set of linear filters that aim to compactly describe the dynamics of nonlinear systems and, thus, are functionally equivalent to the STC filters. 
Figure 1
 
(A) Input–output data records were generated by feeding GWN (x[t]) through a second-order Volterra system defined by a first- and second-order kernel (K1 and K2) and thresholding the continuous output to generate the binary output, y[t]. (B) Equivalent Wiener-Bose cascade model of the system shown in (A) composed of three linearly independent filters followed by static quadratic nonlinearities.
To understand the theoretical underpinnings of the PDM approach, one must first realize that Equation 1 can be expressed in the following quadratic matrix equation:

$$y[t] = \vec{x}_a[t]^T\,Q\,\vec{x}_a[t] \tag{10}$$

where x_a[t] is the augmented vector of the stimuli at time t:

$$\vec{x}_a[t] = \big[1,\; x[t],\; x[t-1],\; \ldots,\; x[t-M+1]\big]^T$$

and Q is the quadratic matrix defined by

$$Q = \begin{bmatrix} h_0 & \tfrac{1}{2}h_1^T \\[2pt] \tfrac{1}{2}h_1 & h_2 \end{bmatrix} \tag{11}$$

where h0 = m_y is the 0th-order Wiener kernel. Now, as in standard STC analysis, the Q matrix can be factored via eigendecomposition to obtain

$$Q = E\,\Lambda\,E^T$$

where E is the orthonormal matrix of the eigenvectors of Q and Λ is the diagonal matrix of corresponding eigenvalues. Finally, the L significant eigenvectors are selected according to some criterion, and those eigenvectors, from the second element onward, can be considered the PDMs, p_i. The obtained PDM model can then be expressed as

$$y[t] = \sum_{i=1}^{L} \lambda_i \left(c_i + \vec{p}_i^{\,T}\vec{x}[t]\right)^2 \tag{12}$$

where here x[t] is the nonaugmented input stimulus vector, and c_i is the first element of the ith eigenvector. Note that the effect of the c_i constant is to offset the parabolic static nonlinearity (see the section on associated nonlinear functions).
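A compact sketch of Equations 10 through 12 follows (ours; h0, h1, and h2 are estimated Wiener kernels, and the choice of L is addressed by the significance test discussed below).

```python
import numpy as np

def pdms_from_kernels(h0, h1, h2, L):
    """Augmented quadratic matrix Q (Equation 11), eigendecomposed to give the
    L PDMs, their offsets c_i, and eigenvalues (Equation 12)."""
    M = len(h1)
    Q = np.zeros((M + 1, M + 1))
    Q[0, 0] = h0
    Q[0, 1:] = Q[1:, 0] = h1 / 2             # split h1 symmetrically across Q
    Q[1:, 1:] = h2
    lam, E = np.linalg.eigh(Q)
    keep = np.argsort(np.abs(lam))[::-1][:L] # rank eigenvalues by magnitude
    lam, E = lam[keep], E[:, keep]
    return lam, E[0, :], E[1:, :]            # eigenvalues, offsets c, PDMs

def pdm_predict(X, lam, c, pdms):
    """PDM model output (Equation 12): y[t] = sum_i lam_i * (c_i + p_i . x[t])^2.
    X is the (T, M) matrix of (nonaugmented) stimulus vectors."""
    v = X @ pdms + c                         # each column: one filter output + offset
    return (v ** 2) @ lam
```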
The significance of Equation 12 is that it establishes a strong theoretical foundation for the PDM method by showing that the PDMs are actually a compact representation of the estimated second-order Wiener system, which actually becomes equivalent to that system as L is increased. Having established these theoretical underpinnings, other approaches that claim to estimate unbiased linear filters may be considered. Most notably, Samengo and Gollisch (2013) have demonstrated that if one defines the STC filters as the STA and the significant eigenvectors of the STC matrix, these STC filters will span the same linear subspace whether or not the STA is subtracted prior to calculating the STC matrix. Their proof demonstrates that even though modification M1 makes the STC matrix a biased Wiener kernel, the set of linear filters obtained from the STC matrix, provided the STA is included, is itself unbiased. 
Nonetheless, although both the PDMs derived from Q and those from the traditional STC approach are unbiased, they cannot be described as compact because there is no guarantee they will be linearly independent; i.e., one of the filters may be defined as a linear combination of the others and thus be superfluous. For the PDMs derived from Q, this occurs because although the eigenvectors of Q are, by definition, linearly independent, the PDMs, which are those eigenvectors from the second element onward, may be linearly dependent. In the traditional STC approach, this occurs because the STA filter may span the same subspace as the significant eigenvectors of the STC matrix. This issue has been brought up previously in the STC literature, and several methods have been proposed on how to efficiently combine the STA and STC eigenvectors (Kaardal, Fitzgerald, Berry, & Sharpee, 2013; Pillow & Simoncelli, 2006). Here, we suggest a method whereby a compact, linearly independent set of filters may be obtained directly from the first two Wiener kernels (or, equivalently, the STA and STC matrix) without having to go through the intermediate step of first obtaining the significant eigenvectors of h2 (or the STC matrix). This is done by taking the significant singular vectors of the rectangular matrix [h1, h2] through singular value decomposition (SVD) (V. Z. Marmarelis et al., 2014).
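The SVD construction is only a few lines. In this sketch (ours), the leading left singular vectors of the M × (M + 1) matrix [h1, h2] serve directly as a linearly independent filter set.

```python
import numpy as np

def pdms_via_svd(h1, h2, L):
    """Compact, linearly independent PDMs as the significant left singular
    vectors of the rectangular matrix [h1, h2]."""
    R = np.column_stack([h1, h2])            # shape (M, M + 1)
    U, s, _ = np.linalg.svd(R)
    return U[:, :L], s                       # PDMs and singular values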
Several methods have been proposed in the STC literature for assessing the significance of the STC filters. Here, we follow the nested bootstrap approach advocated in Schwartz et al. (2006), whereby confidence bounds are generated for the STC matrix eigenvalues by shuffling the input several times with respect to the output and estimating the distribution of the largest and smallest eigenvalues. If the null hypothesis is rejected, and the largest (or smallest) eigenvalue is found to be significant, it is projected out, and the test is repeated for the next largest or smallest eigenvalue. It was found that all nested tests after the first did not change any significance assessments, so all figures show confidence bounds for the largest and smallest eigenvalues, and any eigenvalues outside these bounds are significant. The same approach was used here with the h2 eigenvalues and the [h1, h2] singular values, with T = 40 bootstrap repetitions for each hypothesis test.
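A sketch of the shuffle test is given below (ours; the permutation destroys any input–output relationship, so the extreme eigenvalues of the shuffled STC matrices bound what chance alone can produce). Eigenvalues of the actual STC matrix falling outside the returned bounds would be deemed significant.

```python
import numpy as np

def eigenvalue_bounds(X, y, n_boot=40, alpha=0.05, seed=0):
    """Bootstrap confidence bounds for the largest and smallest null eigenvalues
    of the STC matrix, obtained by shuffling the output against the input."""
    rng = np.random.default_rng(seed)
    lo, hi = [], []
    for _ in range(n_boot):
        y_sh = rng.permutation(y)
        C = np.cov(X[y_sh == 1], rowvar=False)   # STC matrix under the null
        w = np.linalg.eigvalsh(C)
        lo.append(w[0])
        hi.append(w[-1])
    return np.quantile(lo, alpha / 2), np.quantile(hi, 1 - alpha / 2)
```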
Finally, it is important to note that here the PDMs may be obtained from the estimated Wiener kernels because for second-order systems, the Wiener kernels are essentially equivalent to the Volterra kernels. However, in systems with higher-order nonlinearities, the estimated truncated second-order Wiener system will be dependent on the specific input used to probe the system, for example, GWN inputs of different power levels will generate different Wiener kernels. In these situations, the PDMs, which are defined to be input-independent, cannot be estimated from these estimated Wiener kernels because they contain input dependencies but only from the Volterra kernels (see Discussion) (P. Z. Marmarelis & Marmarelis, 1978). 
Correcting for correlated inputs
Many system identification techniques, including Wiener kernels and STC, were originally developed for uncorrelated ("white") inputs. Since then, however, many methods have been proposed to "correct" systems estimated with these techniques when correlated inputs were used. This is important for two reasons. First, there has been a large trend in sensory neuroscience over the last several years to use naturally occurring stimuli, both because the system in question was evolutionarily designed for such stimuli and because these stimuli tend to elicit more output spikes, thus facilitating system estimation (Schwartz & Simoncelli, 2001; Touryan et al., 2005). Also, in many situations, the experimenter is unable to stimulate the system and thus must rely on spontaneous activity, which generally has correlations (V. Z. Marmarelis et al., 2013; Song et al., 2007).
Theunissen et al. (2001) was the first to show that the biased STA can be corrected by multiplying it by the inverse of the input autocorrelation matrix, Φx:

$$\text{STA} = \Phi_x^{-1}\,\widetilde{\text{STA}} \tag{13}$$

Touryan et al. (2005) proposed that the STC method could be corrected for correlated inputs by first "prewhitening" these inputs and then using the standard STC method. Later, Samengo and Gollisch (2013) showed that this is equivalent to multiplying the biased STC matrix by the inverse of the input autocorrelation matrix:

$$C_{sp}^{\,corr} = \Phi_x^{-1}\,C_{sp} \tag{14}$$
Now that we have explicitly established the relationship between the STC matrix and the second-order Wiener kernel, we may take advantage of existing results in the literature on how to correct Wiener kernels for correlated inputs. Namely, Koh and Powers (1985) showed that Wiener kernels estimated from correlated Gaussian processes may be corrected as shown below:

$$h_1 = \Phi_x^{-1}\,\tilde{h}_1, \qquad h_2 = \Phi_x^{-1}\,\tilde{h}_2\,\Phi_x^{-1} \tag{15}$$

where h̃1 and h̃2 are the biased cross-correlation estimates of Equation 4. Note that if we substitute Equation 9 into Equation 15, we obtain the proper correction (ignoring differences in scale) for the STC matrix:

$$h_2 \propto \Phi_x^{-1}\left(\hat{C}_{sp} - \frac{m_y T}{N}\,\Phi_x\right)\Phi_x^{-1} \tag{16}$$

where, here, Ĉ_sp refers to the second-moment matrix of the spike-triggering stimuli (see Equation 9). Although the Koh-Powers correction was derived specifically for correlated Gaussian inputs, we show, in the section on correlated inputs, that it is superior to Equation 14 for both correlated Gaussian inputs and much more complicated inputs, such as natural images. However, it should be noted that taking the inverse of Φx amplifies noise in the input. Thus, the corrected kernel, h2, which is obtained by twice multiplying the biased kernel by the inverse autocorrelation, may be extremely noisy. In order to compensate for this, one should use regularization techniques when taking the inverse (Touryan et al., 2005). In some cases, it may be more effective to only multiply once by the inverse autocorrelation, in a manner similar to that of Equation 14, in order to avoid high levels of noise in the corrected kernel (R. A. Sandler et al., 2015).
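A sketch of Equations 15 and 16 with a ridge-regularized inverse follows (ours; the regularization constant eps is a hypothetical choice that should be tuned, e.g., by cross-validation on the testing set).

```python
import numpy as np

def koh_powers_correct(h1_b, h2_b, Phi_x, eps=1e-3):
    """Koh-Powers correction for correlated Gaussian inputs (Equation 15),
    using a ridge-regularized inverse of the input autocorrelation to limit
    the noise amplified by the double multiplication in Equation 16."""
    M = Phi_x.shape[0]
    ridge = eps * np.trace(Phi_x) / M * np.eye(M)   # scale ridge to mean eigenvalue
    Phi_inv = np.linalg.inv(Phi_x + ridge)
    return Phi_inv @ h1_b, Phi_inv @ h2_b @ Phi_inv
```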
Associated nonlinear functions
As mentioned previously, any nonlinear Volterra-type system can be represented as a Wiener-Bose cascade consisting of a set of linear filters (here the PDMs or STC filters) followed by a multivariate static nonlinearity, F[v1(t)…vL(t)], where {v1(t)…vL(t)} are the filter outputs (V. Z. Marmarelis, 2004). In the STC literature, the nonlinearity following these filters has traditionally been estimated using a histogram approach based on Bayes' rule:

$$F[v_1 \ldots v_L] = P(\text{spike} \mid v_1 \ldots v_L) = \frac{P(v_1 \ldots v_L \mid \text{spike})\,P(\text{spike})}{P(v_1 \ldots v_L)} \tag{17}$$

where the above quotient is estimated by dividing each filter output into B bins and calculating the proportion of times a spike was elicited when the filter outputs fell into a specific set of bins (V. Z. Marmarelis et al., 2013; Schwartz et al., 2006). An advantage of this approach is that because the obtained nonlinearity is not constrained to be a polynomial, it can describe many nonlinearities that would otherwise be difficult to characterize with a Volterra approach, such as a sigmoidal nonlinearity (Bartesaghi, Gessi, & Migliore, 1995). However, a major issue with this approach is that the dimension of the nonlinearity is equal to the number of filters. For example, even with three filters, estimating this nonlinearity becomes unfeasible due to the large number of parameters (B³), and even if there were enough data to estimate it, it would be very difficult to visualize and interpret. Some researchers have attempted to solve this issue by using a priori assumptions about the underlying physiological system to simplify the high-dimensional multivariate nonlinearity into simpler forms (Bialek & de Ruyter van Steveninck, 2005; Rust et al., 2005). Alternatively, many researchers have attempted to get around this "curse of dimensionality" by assuming that the filter outputs are independent of each other or, equivalently, that there are no nonlinear interactions between the L filters. In such a situation, the output of each filter would be transformed independently by a one-dimensional static nonlinearity, which our group has previously dubbed the associated nonlinear function (ANF) (V. Z. Marmarelis, 2004). Thus, with this "independence assumption," one would only need to estimate L one-dimensional nonlinearities (ANFs), each with B parameters, rather than one L-dimensional nonlinearity with B^L parameters.
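Under the independence assumption, each ANF reduces to a one-dimensional histogram estimate. A sketch (ours) appears below, using equal-occupancy bins so that every bin receives roughly the same amount of data.

```python
import numpy as np

def estimate_anf(v, y, B=20):
    """Histogram (Bayes-rule) estimate of one ANF: P(spike | filter output v),
    evaluated over B equal-occupancy bins of v."""
    edges = np.quantile(v, np.linspace(0, 1, B + 1))
    idx = np.clip(np.searchsorted(edges, v, side='right') - 1, 0, B - 1)
    anf = np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                    for b in range(B)])
    centers = (edges[:-1] + edges[1:]) / 2   # bin centers for plotting the ANF
    return centers, anf
```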
The Wiener-Volterra framework of nonlinear systems analysis can provide such guidelines. As shown in Equation 12, in a second-order system, the output is the weighted sum of the square of the PDM outputs. Thus, the PDMs all contribute additively to the output, y[t], and the “independence assumption” is satisfied. If, however, the underlying system has higher-order nonlinearities and the PDMs are acquired from only the first- and second-order kernels (as currently done), then these PDMs may or may not have nonlinear interactions or “cross-terms,” which also contribute to the output, thus violating the “independence assumption.” Our group has developed methods for statistically testing whether such cross-terms exist, and, if they do, how one may characterize the system without resorting to a full multivariate nonlinearity (V. Z. Marmarelis, 2004; V. Z. Marmarelis et al., 2013). In this work, however, all of the prethreshold dynamics of the simulated systems are second order, and thus these methods will not be used. It should be noted that even though the threshold trigger in spiking systems technically makes the nonlinearity “infinite order,” it has been shown that the number of Wiener kernels needed to completely characterize such a system is at most the order of nonlinearity of the prethreshold dynamics (V. Z. Marmarelis, Citron, & Vivo, 1986). 
Simulations and model assessment
The efficacy of the STC and PDM methods was compared using synthetic spatial and temporal systems in which ground truth was available. In order to study the differences between the two methods more generally, a Monte Carlo approach was used with several trials. In each trial, a physiologically realistic second-order system, such as that shown in Figure 1a, was obtained by making the first- and second-order kernels equal to the randomly weighted sum of L = 3 Laguerre basis functions (and, for the second-order case, the outer products of these basis functions) (Ogura, 1985). Unless otherwise noted, the first- and second-order kernels were normalized so that their respective output powers would be equal. The output was generated by feeding a random Gaussian input through this system and then thresholding the output. The first two thirds of the data were used for model estimation (training set), and the remaining third was used for model assessment (testing set).
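For reference, the discrete Laguerre basis can be generated with the standard recursion; the sketch below is ours (alpha, L, and M are hypothetical values), with the random second-order system built on the basis as described above.

```python
import numpy as np

def laguerre_basis(alpha=0.5, L=3, M=40):
    """Discrete Laguerre functions b_j[m], j = 0..L-1, via the standard
    recursion; alpha in (0, 1) sets the exponential decay rate."""
    b = np.zeros((L, M))
    b[0] = np.sqrt(1 - alpha) * alpha ** (np.arange(M) / 2)   # order-0 function
    for j in range(1, L):
        b[j, 0] = np.sqrt(alpha) * b[j - 1, 0]
        for m in range(1, M):
            b[j, m] = (np.sqrt(alpha) * b[j, m - 1]
                       + np.sqrt(alpha) * b[j - 1, m]
                       - b[j - 1, m - 1])
    return b

# Hypothetical randomly weighted second-order system, as in the text:
rng = np.random.default_rng(1)
b = laguerre_basis()
w1, W2 = rng.standard_normal(3), rng.standard_normal((3, 3))
k1 = w1 @ b                                  # random first-order kernel
k2 = b.T @ ((W2 + W2.T) / 2) @ b             # symmetric random second-order kernel
```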
The PDM/STC filters were also estimated using correlated inputs in order to test the correction methods described in the section on correcting for correlated inputs. Two forms of correlated inputs were used: correlated Gaussian processes (with only second-order statistical correlations) and natural images. Correlated Gaussian inputs were obtained by feeding GWN though a lowpass filter. Natural images were obtained from the data set used in Touryan et al. (2005) (see Dan, Felsen, & Touryan, 2009). 
Two metrics were used to evaluate the goodness of fit of the estimated models by comparing the estimated continuous output, ŷ[t], with the true binary output, y[t]. Receiver operating characteristic (ROC) curves, which evaluate the performance of a binary classifier (Caruana & Niculescu-Mizil, 2004; Zanos et al., 2008), plot the true positive rate, TPR(T), against the false positive rate, FPR(T), over the putative range of threshold values, T, such that

$$TPR(T) = \frac{\#\{t : \hat{y}[t]\ge T,\; y[t]=1\}}{\#\{t : y[t]=1\}}, \qquad FPR(T) = \frac{\#\{t : \hat{y}[t]\ge T,\; y[t]=0\}}{\#\{t : y[t]=0\}}$$

The area under the curve (AUC) of ROC plots is used as a performance metric of the model and has been shown to be equivalent to the Mann-Whitney two-sample statistic (Hanley & McNeil, 1982). The AUC, which is obtained by numerically integrating the ROC curve, ranges from 0 to 1, with 0.5 indicating a random predictor and higher values indicating better model performance. The second metric used was the Pearson correlation coefficient, ρ, between the estimated prethreshold output, ŷ[t], and the true output, y[t]. Unless otherwise mentioned, all model assessment metrics were evaluated on the testing set.
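A sketch of the ROC/AUC computation (ours), sweeping the threshold over the unique values of the continuous prediction:

```python
import numpy as np

def roc_auc(y_hat, y):
    """ROC curve of the continuous prediction y_hat against the binary output
    y, with AUC by trapezoidal integration."""
    # Descending thresholds so FPR increases along the curve; the leading +inf
    # threshold anchors the curve at (0, 0).
    thresholds = np.concatenate(([np.inf], np.unique(y_hat)[::-1]))
    tpr = np.array([(y_hat >= T)[y == 1].mean() for T in thresholds])
    fpr = np.array([(y_hat >= T)[y == 0].mean() for T in thresholds])
    return fpr, tpr, np.trapz(tpr, fpr)
```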
Results
Illustrative cases
In this section, we present three illustrative systems, one temporal and two spatial, in which the STC and PDM methods were compared in terms of their ability to recover the underlying ground truth linear filter bank and their ability to predict the output spike train. In the first case, a GWN input was fed through the second-order Volterra system in Figure 1a and then thresholded to generate the output spike train. As explained in the section on associated nonlinear functions, every second-order nonlinear system can be decomposed into a Wiener-Bose cascade consisting of a linear filter bank followed by independent quadratic nonlinearities. An equivalent cascade model of the previously mentioned system appears in Figure 1b. Note that although both representations of Figure 1 describe the same system, each showcases different aspects of it. For example, the large positive central hump and negative corners of the second-order kernel in Figure 1a show pairwise facilitation for nearby stimuli and pairwise inhibition for stimuli that are more distally separated. Alternatively, the three linear filters in Figure 1b show three distinct modes, each with its own frequency profile, which contribute quadratically to the output. Ideally, a researcher would use both representations synergistically to obtain a deeper understanding of the nonlinear dynamics than would be possible from either representation in isolation.
In order to provide intuition on the differences between the second-order Wiener kernel, h2, and the STC matrix, C_sp, four methods were used to identify the underlying system from the input and output data records. First, the traditional STC approach was used as defined in Equation 7. The obtained STC matrix is shown in Figure 2ai. As noted previously in Equation 8, this matrix has two modifications from the canonical Wiener kernel, h2 (shown in Figure 1): the addition of the input autocorrelation, MΦ, which can be seen as the "delta function" down the matrix diagonal, and the intermixing of first- and second-order dynamics (M1). These modifications prohibit any straightforward interpretation or prediction from the STC matrix and relegate it to being an intermediate step for obtaining the STC filters, which are predictive and interpretable.
Figure 2
 
(A–D) Results of four estimation methods to characterize system in Figure 1. Note (ii) shows gray confidence bounds for largest and smallest eigenvalues (see section on PDMs). (E) ROC plots of prediction results using only second-order kernels (Equation 2) before they were decomposed into linear filters and static nonlinearities (Equation 12). (F) ROC plots showing prediction results using estimated linear filters and static nonlinearities. Notice that ROC results for STC and STC-MΦ filters are practically identical because the MΦ modification has minimal effect on the obtained filters when using uncorrelated inputs. In both (E) and (F), the light blue line (TPR = FPR) indicates a model with no predictive power. (G) Projection of spike-triggering stimuli (blue) and non-spike-triggering stimuli (red) onto the first two PDMs (shown in Diii). Note that separation is imperfect even without noise. The third PDM is needed to achieve optimal results. WIEN = Wiener kernel. PDM = principal dynamic mode. All column axes are equivalent.
Figure 3
 
(A) Schematic representation of Adelson-Bergen energy model. (B) Recovered STA from energy model contains no information. (C) STC eigenvalues and obtained filters. Note only two filters contain information. (D) Wiener kernel eigenvalues and recovered filters.
The eigenvalue distribution of the STC matrix, along with the associated bootstrap confidence bounds for the largest and smallest eigenvalue (shown in gray) can be seen in Figure 2aii. As can be seen, there are clearly three significant eigenvalues (the confidence bounds were also recomputed for the second highest eigenvalue). The set of significant filters, which is defined as the significant STC eigenvectors along with the STA, and their respective associated nonlinear functions are shown in Figure 2aiii and 2aiv. 
In Figure 2b, the STC matrix is shown with the input autocorrelation subtracted, as was done in Agüera y Arcas et al. (2003); thus, only the M1 modification was present. As can be seen, when the input is uncorrelated ("white"), this modification asymptotically has no effect on the obtained STC filters; when dealing with finite data records, however, it may induce negative artifacts. In Figure 2c, the M1 modification was removed, resulting in a different set of STC matrix eigenvectors. However, as shown by Samengo and Gollisch (2013), when the STA is included, these eigenvectors should span the same linear subspace as the linear filters obtained from the traditional STC matrix. Finally, in Figure 2d, both modifications were removed to obtain the estimated canonical Wiener kernel, ĥ2 (to be compared with the true Wiener kernel, h2, in Figure 1a). This is the only kernel whose elements can be interpreted as the weights assigned to pairwise interactions of the inputs. The PDMs, obtained by the SVD method, are shown in Figure 2dii. The method recovered a compact and efficient set of three linearly independent filters. Notice that although the PDMs and STC filters should theoretically span the same subspace, the obtained PDMs much more closely match the linear filters used to generate the output (Figure 1b). The projection of spike-triggering stimuli (blue dots) and non-spike-triggering stimuli (red dots) onto the first two PDMs can further be seen in Figure 2g. As can be seen, the first two PDMs are able to achieve a large degree of separability between the stimuli, although inclusion of the third PDM is needed to achieve near perfect separability.
The predictive power of both the kernel models, evaluated using Equation 1, and the Wiener-Bose models (Equation 12) is shown in the ROC plots of Figure 2e and 2f for all four permutations of the STC method. As expected, the estimated Wiener kernels outperformed all other kernel models (Figure 2e); this is because, asymptotically, the Wiener kernels can be shown to be optimal. More interestingly, the PDMs outperformed the STC filters even though they theoretically should span the same subspace (Figure 2f). This occurs because the STA and STC matrix eigenvectors are linearly dependent, and thus, the ANF cannot be calculated independently for each filter. Note that this occurs even though all eigenvectors were deemed significant via the bootstrapping method. In order to avoid this issue, one could use cross-validation rather than the bootstrapping approach to determine which filters to include. Using this method, the third STC matrix filter was shown to offer no additional predictive power and was deemed superfluous. Removing it slightly raised the STC filter set's predictive power; however, it was still slightly lower than that of the PDMs (AUC = .985 vs. AUC = .991). The three PDMs actually had slightly better predictive power than the corresponding Wiener kernel model from which they were derived (AUC = .991 vs. AUC = .984), showing that the first three PDMs contain all the information of the second-order Volterra model.
The next illustrative example examines the well-known Adelson-Bergen energy model (Adelson & Bergen, 1985), in which the input is fed through two linear (Gabor) filters, g1 and g2, whose outputs are then squared and summed:

$$y[t] = \left(\sum_{n} g_1[n]\,x[n,t]\right)^2 + \left(\sum_{n} g_2[n]\,x[n,t]\right)^2$$

The schematic of this model is shown in Figure 3a. This model is meant to represent the receptive field of an ideal V1 complex cell, whose response incorporates phase invariance. In this model, there are only second-order (quadratic) dynamics. Thus, the STA, or first-order Wiener kernel, shown in Figure 3b, is zero. This means that the M1 modification present in the STC matrix is irrelevant, leaving only the input-autocorrelation modification, MΦ. Note that here, unlike in the previous one-dimensional (temporal) example, visually examining the second-order kernel is meaningless because the 2-D (spatial) input was arbitrarily rearranged into a 1-D vector. However, interpretation of the second-order Wiener kernel may still be done by considering only specific parts of it, such as the diagonal (Citron & Marmarelis, 1987). Figure 3c and 3d shows the eigenvalue distributions, linear filters, and nonlinearities of the STC and PDM methods. As can be seen, both methods recover the ground truth filters very accurately. Furthermore, as can be presumed from the similarity of the obtained filters, both methods had roughly equal predictive power. This shows that in the case of uncorrelated inputs, the MΦ modification has an asymptotically negligible effect on the recovered filters. As we shall see later, however, this is not the case for correlated inputs, when the input autocorrelation is more complex than a delta function down the kernel diagonal.
In the final illustrative example, a sample V1 cell is constructed with three filters: two filters with the same orientation, but out of phase, and a third filter whose orientation is 180° opposed to the first two (Figure 4a). The recovered STA (Figure 4b) was a jumbled version of both orientations and thus was difficult to interpret. Once again, both the STC (Figure 4c) and PDM (Figure 4d) methods were used to characterize the system from the generated input and output data series. The obtained PDMs had slightly better predictive power than the STC filter set (AUCPDM = .985 vs. AUCSTC = .98). Furthermore, the PDMs were much closer to the original rotation of the ground truth filters, thus facilitating interpretability (see Kaardal et al., 2013, and Pillow & Simoncelli, 2006, which explore this vital issue more deeply). 
Figure 4
 
(A) Schematic of sample nonlinear Gabor model containing three underlying linear filters. (B–D) Same as in Figure 3. Note STC was only able to detect two of the three ground truth linear filters. ANF = associated nonlinear function.
General trends
The previous three illustrative examples were meant to provide intuition concerning how the linear filter banks estimated by the PDM and STC methods compare with the underlying ground truth system and how well these methods are able to separate spike-triggering from non-spike-triggering stimuli (i.e., their predictive power). In this section, Monte Carlo simulations are used to explore more generally how these two methods compare in terms of their predictive power, overfitting, robustness to noise, and convergence or "data-hungriness." In all these cases, the recovered models were assessed based on their predictive power rather than on how closely they recovered the ground truth system. This is because neither method promises that its recovered filters will be equivalent to the ground truth filters, only that they will span the same linear filter space. This property is implicitly measured by the recovered systems' predictive power because it is reasonable to assume that as the recovered filter space moves away from the true filter space, its predictive power will decline.
The predictive power of the two models was evaluated on T = 30 randomly generated systems with GWN inputs (see Methods, section on simulations and model assessment). As noted previously, all synthetic systems could be represented as a second-order Volterra system or, alternatively, as a Wiener-Bose cascade consisting of three independent linear filters. In Figure 5a, the predictive power was measured as a function of the number of included filters. For the STC method, the first included filter was the STA. As can be seen, the PDMs modestly outperformed the STC filter set for any number of included filters. Notably, the first PDM outperformed the STA filter, showing that even in a Wiener-Bose cascade model with a single linear filter, the STA is suboptimal. Furthermore, as suggested by the example in Figure 2, the predictive power of the STC filter set decreased with the inclusion of the third significant STC filter (fourth filter total) due to the linear dependence between the STA and STC filters. This occurs even though the fourth filter was deemed significant by the bootstrap method, demonstrating that cross-validation is essential for filter selection. Figure 5b compares the predictive power of the PDM and STC filter sets (with three filters). As can be seen, the PDMs outperformed the STC filters in almost every trial (ρPDM = .843 vs. ρSTC = .815, p < 0.001).
Figure 5
 
Predictive power of traditional STC versus PDM method. (A) Predictive power as a function of included filters. For STC, the first included filter was the STA. Notice that including the fourth significant STC filter actually slightly reduced predictive power. Error bars show SEM. (B) Predictive power for T = 30 trials. All predictive power results are evaluated on testing set data.
In experimental physiological system identification, the experimenter requires criteria to know how much data is needed to estimate a valid model. Usually, this choice is constrained by the inherent nonstationarities of physiological systems. Model convergence or “data hungriness” describes the amount of data needed to successfully model the system. Overfitting describes how well the model estimated on the given data set will generalize to other data sets acquired from the same system. To assess these properties, N = 30 trials were conducted with randomly generated synthetic systems as described previously. For each synthetic system, the STC and PDM estimation methods were applied to randomly generated training/in-sample data of various lengths from 200 to 15,000 bins. Then the predictive performance of the estimated filters was evaluated on testing/out-of-sample data. 
The predictive power on the training and testing sets (solid and dashed lines, respectively) is shown in Figure 6a for the ρ metric. The convergence of the model is described by the shape of the testing set curve in Figure 6a, top, and, in particular, by how quickly this curve approaches a steady-state value. In order to facilitate comparisons between the two methods, their predictive power was plotted as a percentage of its maximal value in Figure 6a, bottom. As can be seen, in noise-free conditions, it takes both methods only about 150 spikes to converge to more than 90% of their maximal value. The singular value distribution of the PDM method for a sample system is shown in Figure 6b. As shown by the red line, with only 200 output spikes, the significant singular values can be distinguished from the remainder.
Figure 6
 
(A) Convergence and overfitting analysis. Top shows the predictive power of both training (solid lines) and testing (dashed lines) sets for data of various lengths ranging from 20 to 3,000 spikes (200 to 15,000 bins). Bottom indicates the performance on the testing set, normalized by its maximal (steady-state) performance. All points in (A) and (B) were averaged over N = 30 trials. (B) Eigenvalue distribution for a sample system as the number of output spikes is increased. (C) Model robustness measured by predictive power as input noise is increased. % Input noise = (input noise power/total input power) × 100. Same layout as (A). (D) Recovered PDMs of the sample system in Figure 1 when the input was composed of 60% noise. Error bars in (A) and (C) show SD.
In experimental settings, noise is an unavoidable reality, which may arise in a variety of ways, including imperfect instrumentation and/or from unobserved inputs. Thus, any input–output model must be robust, i.e., able to accurately estimate the underlying system even in the presence of noise. Here, the robustness of the PDM and STC methods was assessed by seeing how their predictive power declines as increasing amounts of uncorrelated Gaussian noise was added to the input time series. The results are shown in Figure 6c. Notice that training set prediction results (solid line) rapidly decline with increasing noise because noise was added only to the training input; however, the models estimated from these noisy input–output time series are still good as can be seen by their prediction results on the testing set (dashed line), in which no input noise was added. Both methods proved to be very robust, and their predictive power declined by less than 10% even when 60% of the input power was caused by noise. Figure 6d shows the PDMs obtained from the system in Figure 1 when the input was composed of 60% noise. As can be seen, the obtained PDMs still faithfully represent the ground truth system. 
Correlated inputs
In all of the previous simulations, uncorrelated Gaussian inputs were used to stimulate the model. As discussed in the Methods section on correcting for correlated inputs, in many situations the experimenter either does not have the luxury of designing the inputs or chooses to use correlated inputs, such as natural images. Such correlated inputs will bias the obtained PDM/STC filters, necessitating procedures to correct the estimated filters. In this section, we present two illustrative examples, one temporal and one spatial, in which the system was stimulated with correlated inputs, and show how the correction procedures discussed in that section were used to correct the estimated system. 
In Figure 7a, a second-order Volterra system was stimulated with a correlated Gaussian process and thresholded to obtain a binary output. Figure 7b shows the kernels obtained through three methods: (a) the biased Wiener kernels obtained through the standard procedure without correction (Equation 4b), (b) the correction approach originally used in Touryan et al. (2005) and later developed in Samengo and Gollisch (2013) (Equation 14), and (c) the approach originally derived by Koh and Powers (1985) and advocated here (Equation 16, bottom row). These filter sets will henceforth be referred to as filter set I, filter set II, and filter set III, respectively. It should be noted that in filter set II the input autocorrelation modification, MΦ, is not removed, which is a serious source of bias when using correlated inputs. Figures 7b through 7d show the kernels, eigenvalue/singular value distributions, and obtained filters for all three methods. Note that the corrected Wiener kernel in Figure 7di obtained near-perfect predictive power (as seen in Figure 7e), thus validating the correction formula of Equation 16. Note also that all methods showed only two significant eigenvalues even though there are three ground truth filters, presumably because the third filter is inhibitory, so the application of a relatively high binary threshold may have obstructed its recovery (V. Z. Marmarelis et al., 1986). As mentioned previously, the quality of the obtained filter sets must be assessed by their predictive power, which quantifies how closely the obtained filter set spans the true filter space. As can be seen in the ROC plot in Figure 7f, filter set III has a significantly higher AUC value than the other two methods. The superiority of filter set III can also be seen by comparing it with the ground truth filters in Figure 7a: its two filters match the first two filters of the system almost exactly, whereas the two filters of filter sets I and II are strongly biased versions of those ground truth filters. It should also be noted that recent work by Aljadeff, Segev, Berry, and Sharpee (2013) has proposed alternative methods to recover optimal linear subspaces when using highly correlated Gaussian inputs, which may further improve the results here and possibly recover the third linear filter. 
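A minimal sketch of the whitening idea behind filter set III (Koh & Powers, 1985): the input autocorrelation matrix Φ is removed from the biased kernel estimates by matrix inversion. The ridge stabilizer and the omission of the exact scale factors of Equation 16 are our simplifications, not the paper's formula:

```python
import numpy as np

def correct_kernels(k1_biased, k2_biased, Phi, ridge=1e-6):
    """Undo input correlations in biased Wiener kernel estimates.
    Phi is the M x M input autocorrelation matrix; the ridge term
    guards against ill-conditioning (our addition, not Equation 16)."""
    M = Phi.shape[0]
    Phi_reg = Phi + ridge * (np.trace(Phi) / M) * np.eye(M)
    Phi_inv = np.linalg.inv(Phi_reg)
    k1 = Phi_inv @ k1_biased              # deconvolve the first-order kernel
    k2 = Phi_inv @ k2_biased @ Phi_inv    # deconvolve both lags of k2
    return k1, k2
```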
Figure 7
 
(A) A correlated Gaussian input was put through the sample system and thresholded as in Figure 1. (B–D) Results of three different correction techniques for the I/O data of (A) (see text). (E and F) Predictive power of obtained kernels (E) and linear filters (F). (G) Projection of the data onto the first two filters of filter set III almost perfectly separates the data.
In the second illustrative example, the Adelson-Bergen energy model of Figure 3a was stimulated with correlated (blurred) Gaussian images, obtained using a 2-D Gaussian filter of 10-pixel width and 1-pixel variance (Figure 8a), and with natural images (Figure 8b). The recovered filters are shown in Figure 8c and 8d. Once again, no method was able to perfectly recover the original filters. However, using predictive power as a measure of how close the filter spaces are to each other, it can be seen that, for the blurred images, filter set III almost perfectly spans the ground truth filter space, as it achieved near-perfect predictive power (AUC = .999). Thus, even though the filters of set III do not appear visually similar to the ground truth filters, they span the same space. 
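A sketch of how such correlated stimuli can be generated; scipy's gaussian_filter is one of several equivalent choices, and imposing the 10-pixel support via the truncate argument is an assumption on our part:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blurred_gaussian_image(shape=(16, 16), sigma=1.0, support=10, rng=None):
    """White Gaussian pixels smoothed by a 2-D Gaussian kernel
    (variance sigma**2 = 1 pixel here), rescaled to unit power."""
    rng = rng or np.random.default_rng(0)
    img = gaussian_filter(rng.standard_normal(shape), sigma=sigma,
                          truncate=support / (2.0 * sigma))  # ~10-pixel kernel width
    return img / img.std()
```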
Figure 8
 
Results of three different recovery methods applied to correlated Gaussian images (A) and natural images (B). (i) Sample input. (ii–iv) Obtained filters of the three different methods (see text). (v and vi) Predictive power of kernels and linear filter models, respectively. ROC = receiver operating characteristic. PDM = principal dynamic mode.
In the case of natural images, filter set III has a much lower predictive power than it did with blurred images; however, it still performs significantly better than the other two methods. This is to be expected because the correction in Equation 9 was derived exclusively for correlated Gaussian inputs such as those in Figure 8a. In fact, the use of STC with non-Gaussian inputs, the resulting biases, and possible solutions have been discussed previously in the literature (Fitzgerald, Rowekamp, Sincich, & Sharpee, 2011; Pillow & Park, 2011; Rajan & Bialek, 2013). What we have demonstrated is that, at least in the example presented here, the method still works well even with highly non-Gaussian inputs, such as natural images. Even better results were obtained with uniform and ternary inputs put through a lowpass filter (results not shown). We should note, however, that strong regularization was needed to obtain good results with natural images. In certain very noisy systems, the methods used to obtain filter sets I and II may be more appropriate. Once again, the practitioner is advised to use the predictive power of the obtained filters to validate whichever method they choose. 
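Since validation by predictive power is the recurring recommendation, a reference implementation of the AUC metric may be useful; this is the rank-based form equivalent to the Mann-Whitney U statistic (cf. Hanley & McNeil, 1982, in the References). Ties are left untreated, which is adequate for continuous model outputs:

```python
import numpy as np

def auc(scores, y):
    """ROC area via the Mann-Whitney U statistic: the probability that a
    randomly chosen spike bin outranks a randomly chosen non-spike bin.
    Assumes a binary output y and continuous scores."""
    pos, neg = scores[y > 0], scores[y == 0]
    both = np.concatenate([pos, neg])
    ranks = np.empty(len(both))
    ranks[np.argsort(both)] = np.arange(1, len(both) + 1)  # 1-based ranks
    u = ranks[:len(pos)].sum() - len(pos) * (len(pos) + 1) / 2
    return u / (len(pos) * len(neg))
```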
Discussion
STC is currently the standard technique in sensory neuroscience for characterizing nonlinear systems whose output is a nonlinear function of the outputs of multiple linear filters/receptive fields. In this work, the STC method is examined through the Volterra/Wiener framework of characterizing nonlinear systems by explicitly deriving the relationship between the STC matrix and the first two Wiener kernels (Equation 8). Our main finding is that the STC matrix is actually a modified version of the second-order Wiener kernel, which incorporates the input autocorrelation (MΦ) and mixes first- and second-order dynamics (M1). Furthermore, it is shown how one may obtain the Wiener kernels while preserving the simplicity and elegance of the STC method and with no additional computational expense, i.e., for free (see Equation 9). 
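To make the bookkeeping this derivation implies concrete, the following schematic sketch converts the STC matrix into a second-order-Wiener-kernel-like object for white Gaussian input of power P by restoring the STA outer product (the mixed first-order term M1) and removing the input autocorrelation term (MΦ, which is P·I for white input). The scale factors written here are our schematic placeholders; the exact normalization is given by Equations 8 and 9 in the text:

```python
import numpy as np

def stc_to_wiener(X, y, P):
    """X: [T, M] matrix of stimulus vectors; y: binary spike output;
    P: input power. Returns the STA, the STC matrix, and k1/k2-like
    objects obtained by undoing the M1 and M_Phi modifications
    (scaling is schematic, not Equation 9 verbatim)."""
    spk = X[y > 0]                       # spike-triggering stimuli
    sta = spk.mean(axis=0)               # spike-triggered average
    stc = np.cov(spk, rowvar=False)      # mean-subtracted STC matrix
    rate = y.mean()                      # mean firing probability per bin
    k1_like = (rate / P) * sta           # first-order kernel ~ scaled STA
    # restore the STA outer product (M1) and subtract P*I (M_Phi):
    k2_like = (rate / (2 * P**2)) * (stc + np.outer(sta, sta)
                                     - P * np.eye(X.shape[1]))
    return sta, stc, k1_like, k2_like
```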
Parallel to the popularization of the STC method in sensory neuroscience, the PDMs were developed in nonlinear dynamical systems identification as an efficient filter set for describing the linear dynamics of nonlinear systems (V. Z. Marmarelis, 2004). Functionally, the PDMs are equivalent to the STC filters. However, perhaps because the PDMs have largely (although not exclusively) been used to characterize continuous dynamical systems, such as cerebral hemodynamics, in which the concept of a “receptive field” is not applicable, they have largely escaped the attention of sensory neurophysiologists. The PDMs are obtained from a second-order Volterra system and answer an open question in the STC literature: how to combine the STA and STC to obtain the optimal set of linear filters for the system. 
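The PDM construction itself is compact. A minimal sketch, following the augmented-matrix form in V. Z. Marmarelis (2004), with k0, k1, and k2 denoting the zeroth-, first-, and second-order kernels; selection of the significant eigenvalues (e.g., against the confidence bounds of Figure 2) is left to the user:

```python
import numpy as np

def principal_dynamic_modes(k0, k1, k2):
    """Eigendecompose Q = [[k2, k1/2], [k1.T/2, k0]]; the first M entries
    of the leading eigenvectors, renormalized, are the PDMs."""
    M = len(k1)
    Q = np.zeros((M + 1, M + 1))
    Q[:M, :M] = k2
    Q[:M, M] = Q[M, :M] = k1 / 2.0
    Q[M, M] = k0
    eigvals, eigvecs = np.linalg.eigh(Q)
    order = np.argsort(-np.abs(eigvals))     # rank by |eigenvalue|
    pdms = eigvecs[:M, order]                # drop the augmentation row
    pdms /= np.linalg.norm(pdms, axis=0)     # unit-norm filters
    return eigvals[order], pdms
```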
Although the STC matrix can be considered a “biased” Wiener kernel in the sense that it is suboptimal for use in Equation 2 (see Figure 2), the STC filter set, which is derived from the STC matrix and is composed of the STA and the significant STC eigenvectors, is nonetheless unbiased and asymptotically spans the same subspace as the filters derived from the Wiener kernels, i.e., the PDMs. Given the correspondence between these two filter sets, one may ask what benefit is gained by using the Wiener kernels/PDMs over the STC matrix/filter set, or at least by establishing the relationship between the two. We suggest the following: First, unlike the STC matrix, which is only an intermediate step toward the STC filter set, the Wiener kernel is highly interpretable and may reveal information about the system not apparent from simply examining the linear subspace. Second, the Wiener/PDM method, unlike STC, allows a single unified framework to be applied to systems with both continuous and binary outputs (or, for the neuronal case, to both subthreshold and suprathreshold dynamics; see Fournier et al., 2014). Third, one may apply the vast body of theory developed for Wiener/Volterra kernels, particularly their convergence properties, to STC (see below). Finally, the PDM extraction method presented here recovers a linearly independent filter set in a single step, rather than first obtaining the significant STC filters and then combining them with the STA. Furthermore, the PDMs are shown to have slightly better predictive power than the STC filters and, in many cases, may be more interpretable (see Figure 4 and Cantrell et al., 2010). 
By explicitly establishing the relationship between the STC matrix and the second-order Wiener kernel, one gains “for free” all the prior theoretical work done on Wiener theory. For example, it was long believed that the STC method is valid only for Gaussian inputs (Paninski, 2003). Very recently, it was shown that STC is actually consistent for arbitrary spherical inputs (Samengo & Gollisch, 2013). This result is corroborated by noting that the first two Wiener kernels are very similar for a wide range of inputs, including uniform and discrete-level inputs (such as binary and ternary); the only differences would be along the second-order kernel diagonal, which involves fourth-order moments of the input used (P. Z. Marmarelis & Marmarelis, 1978). However, it can be shown that if systems with higher-order nonlinearities are estimated with a “truncated” second-order model, the projection of the higher-order nonlinearities onto the first two kernels will differ based on the statistical properties of the input used. Thus, even though the kernels estimated from Equation 4 are the best possible kernels for a wide range of inputs, they will differ depending on the specific input used, based on how this truncation bias expresses itself (P. Z. Marmarelis & Marmarelis, 1978). The same comments apply to any linear filter bank estimated from the Wiener kernels because it is simply a compact representation of the latter (see Equation 12). Furthermore, Wiener kernels have been derived for Poisson inputs, which are not symmetrical (V. Z. Marmarelis & Berger, 2005; R. A. Sandler et al., 2015). These Poisson Wiener kernels are particularly important when one wishes to characterize neurons whose only identifiable inputs are the spiking histories of their input neurons rather than external stimuli, such as images or sound (Berger et al., 2012; Song et al., 2007; Zanos et al., 2008). Perhaps the most important gain from Wiener theory is knowing how to correct Wiener kernels estimated from correlated Gaussian inputs (Equation 15). This correction scheme was used here and found to be superior to commonly used previous methods for both correlated Gaussian inputs and natural images (Figures 7 and 8), although it should be noted that for natural images with higher-order correlations, recently proposed methods may further improve STC results (Aljadeff et al., 2013). 
In this work, in order to emphasize the relationship between the PDMs and the STC filters, the PDMs were derived from Wiener kernels estimated using the cross-correlation technique (CCT; Lee & Schetzen, 1965). However, several improvements have been made to the PDM method, which make the PDMs used in practice far superior to those presented here. First, the CCT is largely obsolete as a method of estimating the Wiener/Volterra kernels (Korenberg, Bruder, & McIlroy, 1988). Rather, current practice in our group is to estimate the Volterra kernels by first projecting the input onto a set of efficient basis functions, such as the Laguerre basis set (V. Z. Marmarelis, 1993) or B-splines (Song et al., 2013), and then estimating the kernel coefficients through least squares estimation or generalized linear models (V. Z. Marmarelis, 2004; Song et al., 2007; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005). The kernels so obtained are much more robust than the corresponding CCT kernels and can be estimated using much less data. Most importantly for sensory neurophysiologists, these kernels can be estimated from highly correlated inputs, such as natural images, without having to apply any further correction schemes (V. Z. Marmarelis, 1994). Furthermore, the static nonlinearity can be estimated much more efficiently by expanding it on a polynomial basis (V. Z. Marmarelis, 2004); we have found that a cubic (third-order) polynomial is sufficient to model the nonlinearity of most physiological systems. Another advantage of these estimation techniques is that they enable one to explicitly include interneuronal and spike-history effects, such as bursting and the afterhyperpolarization, in the model (Eikenberry & Marmarelis, 2013; Song et al., 2007). Most neurons are influenced by these effects, and thus any STC filters obtained without considering them will be biased, as shown in Pillow et al. (2008). In the future, we hope to apply these techniques to the receptive field identification problem. 
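As an illustration of the Laguerre approach described above, the following sketch builds the discrete-time Laguerre basis by its standard recursion and fits second-order Volterra coefficients by ordinary least squares. The parameter choices (alpha, L, M) are illustrative, and for point-process outputs the final least squares step would be replaced by a GLM fit:

```python
import numpy as np

def laguerre_basis(alpha, L, M):
    """Discrete-time Laguerre functions b_j[m] (j = 0..L-1, m = 0..M-1);
    0 < alpha < 1 controls the decay rate of the basis."""
    B = np.zeros((L, M))
    B[0, 0] = np.sqrt(1.0 - alpha)
    for m in range(1, M):                       # b_0[m] = sqrt(alpha) * b_0[m-1]
        B[0, m] = np.sqrt(alpha) * B[0, m - 1]
    for j in range(1, L):
        B[j, 0] = np.sqrt(alpha) * B[j - 1, 0]
        for m in range(1, M):                   # standard Laguerre recursion
            B[j, m] = (np.sqrt(alpha) * B[j, m - 1]
                       + np.sqrt(alpha) * B[j - 1, m] - B[j - 1, m - 1])
    return B

def fit_volterra_laguerre(x, y, alpha=0.5, L=5, M=50):
    """Project the input on the basis, then least-squares-fit constant,
    linear, and (upper-triangular) quadratic expansion coefficients."""
    B = laguerre_basis(alpha, L, M)
    T = len(x)
    V = np.stack([np.convolve(x, B[j])[:T] for j in range(L)], axis=1)
    r, c = np.triu_indices(L)
    D = np.hstack([np.ones((T, 1)), V, V[:, r] * V[:, c]])
    coeffs, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coeffs, B
```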
Acknowledgments
We wish to thank Arvind Iyer for his helpful discussions and the anonymous reviewers, whose thorough comments were unusually helpful in improving this manuscript. This work was supported by NIH grant P41-EB001978 to the Biomedical Simulations Resource at the University of Southern California. 
Commercial relationships: none. 
Corresponding author: Roman A. Sandler. 
Address: Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA. 
References
Adelson E. H., Bergen J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2 (2), 284–299.
Aertsen A. M. H. J., Johannesma P. I. M. (1981). The spectro-temporal receptive field. Biological Cybernetics, 42 (2), 133–143.
Agüera y Arcas B., Fairhall A. L., Bialek W. (2001). What can a single neuron compute? In Advances in neural information processing systems 13: Proceedings of the 2000 conference (Vol. 13, p. 75). Cambridge, MA: MIT Press.
Agüera y Arcas B., Fairhall A. L., Bialek W. (2003). Computation in a single neuron: Hodgkin and Huxley revisited. Neural Computation, 15 (8), 1715–1749.
Aljadeff J., Segev R., Berry M. J., II, Sharpee T. O. (2013). Spike triggered covariance in strongly correlated Gaussian stimuli. PLoS Computational Biology, 9 (9), e1003206.
Bartesaghi R., Gessi T., Migliore M. (1995). Input-output relations in the entorhinal-hippocampal-entorhinal loop: Entorhinal cortex and dentate gyrus. Hippocampus, 5 (5), 440–451.
Berger T. W., Song D., Chan R. H. M., Marmarelis V. Z., La-Coss J., Wills J., Granacki J. J. (2012). A hippocampal cognitive prosthesis: Multi-input, multi-output nonlinear modeling and VLSI implementation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20 (2), 198–211.
Bialek W., de Ruyter van Steveninck R. R. (2005). Features and dimensions: Motion estimation in fly vision. arXiv preprint q-bio/0505003.
Cantrell D. R., Cang J., Troy J. B., Liu X. (2010). Non-centered spike-triggered covariance analysis reveals neurotrophin-3 as a developmental regulator of receptive field properties of on-off retinal ganglion cells. PLoS Computational Biology, 6 (10), e1000967.
Caruana R., Niculescu-Mizil A. (2004). Data mining in metric space: An empirical analysis of supervised learning performance criteria. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 69–78). ACM.
Chichilnisky E. J. (2001). A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12 (2), 199–213.
Citron M. C., Marmarelis V. Z. (1987). Applications of minimum-order Wiener modeling to retinal ganglion cell spatiotemporal dynamics. Biological Cybernetics, 57 (4–5), 241–247.
Dan Y., Felsen G., Touryan J. (2009). Extracellular recording from cells in cat primary visual cortex. crcns.org, doi:10.6080/K0RN35SV.
De Boer E., Kuyper P. (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering, 15 (3), 169–179.
de Ruyter van Steveninck R. R., Bialek W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proceedings of the Royal Society of London. Series B. Biological Sciences, 234 (1277), 379–414.
Eggermont J. J. (1993). Wiener and Volterra analyses applied to the auditory system. Hearing Research, 66 (2), 177–201.
Eikenberry S. E., Marmarelis V. Z. (2013). A nonlinear autoregressive Volterra model of the Hodgkin–Huxley equations. Journal of Computational Neuroscience, 34 (1), 163–183.
Emerson R. C., Citron M. C., Vaughn W. J., Klein S. A. (1987). Nonlinear directionally selective subunits in complex cells of cat striate cortex. Journal of Neurophysiology, 58 (1), 33–65.
Fairhall A. L., Burlingame C. A., Narasimhan R., Harris R. A., Puchalla J. L., Berry M. J., II. (2006). Selectivity for multiple stimulus features in retinal ganglion cells. Journal of Neurophysiology, 96 (5), 2724–2738.
Fitzgerald J. D., Rowekamp R. J., Sincich L. C., Sharpee T. O. (2011). Second order dimensionality reduction using minimum and maximum mutual information models. PLoS Computational Biology, 7 (10), e1002249.
Fournier J., Monier C., Levy M., Marre O., Sári K., Kisvárday Z. F., Frégnac Y. (2014). Hidden complexity of synaptic receptive fields in cat v1. The Journal of Neuroscience, 34 (16), 5515–5528.
Fox J. L., Fairhall A. L., Daniel T. L. (2010). Encoding properties of haltere neurons enable motion feature detection in a biological gyroscope. Proceedings of the National Academy of Sciences, USA, 107 (8), 3840–3845.
Hanley J. A., McNeil B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143 (1), 29–36.
Jing X., Simpson D. M., Allen R., Newland P. L. (2012). Understanding neuronal systems in movement control using Wiener/Volterra kernels: A dominant feature analysis. Journal of Neuroscience Methods, 203 (1), 220–232.
Kaardal J., Fitzgerald J. D., Berry M. J., Sharpee T. O. (2013). Identifying functional bases for multidimensional neural computations. Neural Computation, 25 (7), 1870–1890.
Kim A. J., Lazar A. A., Slutskiy Y. B. (2011). System identification of drosophila olfactory sensory neurons. Journal of Computational Neuroscience, 30 (1), 143–161.
Koh T., Powers E. J. (1985). Second-order Volterra filtering and its application to nonlinear system identification. IEEE Transactions on Acoustics, Speech and Signal Processing, 33 (6), 1445–1455.
Korenberg M. J., Bruder S. B., McIlroy P. J. (1988). Exact orthogonal kernel estimation from finite data records: Extending Wiener's identification of nonlinear systems. Annals of Biomedical Engineering, 16 (2), 201–214.
Krausz H. I. (1975). Identification of nonlinear systems using random impulse train inputs. Biological Cybernetics, 19 (4), 217–230.
Lee Y. W., Schetzen M. (1965). Measurement of the Wiener kernels of a non-linear system by cross-correlation. International Journal of Control, 2 (3), 237–254.
Marmarelis P. Z., Marmarelis V. Z. (1978). Analysis of physiological systems: The white-noise approach. New York: Plenum Press.
Marmarelis P. Z., Naka K. I. (1973). Nonlinear analysis and synthesis of receptive-field responses in the catfish retina. I. Horizontal cell leads to ganglion cell chain. Journal of Neurophysiology, 36 (4), 605–618.
Marmarelis V. Z. (1993). Identification of nonlinear biological systems using Laguerre expansions of kernels. Annals of Biomedical Engineering, 21 (6), 573–589.
Marmarelis V. Z. (1994). On kernel estimation using non-Gaussian and/or non-white input data. In V. Z. Marmarelis (Ed.), Advanced methods of physiological system modeling (pp. 229–242). New York: Plenum Press.
Marmarelis V. Z. (1997). Modeling methodology for nonlinear physiological systems. Annals of Biomedical Engineering, 25 (2), 239–251.
Marmarelis V. Z. (2004). Nonlinear dynamic modeling of physiological systems. Hoboken, NJ: Wiley-Interscience.
Marmarelis V. Z., Berger T. W. (2005). General methodology for nonlinear modeling of neural systems with Poisson point-process inputs. Mathematical Biosciences, 196 (1), 1–13.
Marmarelis V. Z., Chon K. H., Holstein-Rathlou N. H., Marsh D. J. (1999). Nonlinear analysis of renal autoregulation in rats using principal dynamic modes. Annals of Biomedical Engineering, 27 (1), 23–31.
Marmarelis V. Z., Citron M. C., Vivo C. P. (1986). Minimum-order Wiener modelling of spike-output systems. Biological Cybernetics, 54 (2), 115–123.
Marmarelis V. Z., Juusola M., French A. S. (1999). Principal dynamic mode analysis of nonlinear transduction in a spider mechanoreceptor. Annals of Biomedical Engineering, 27 (3), 391–402.
Marmarelis V. Z., Orme M. E. (1993). Modeling of neural systems by use of neuronal modes. IEEE Transactions on Biomedical Engineering, 40 (11), 1149–1158.
Marmarelis V. Z., Shin D. C., Song D., Hampson R. E., Deadwyler S. A., Berger T. W. (2013). Nonlinear modeling of dynamic interactions within neuronal ensembles using principal dynamic modes. Journal of Computational Neuroscience, 34 (1), 73–87.
Marmarelis V. Z., Shin D. C., Song D., Hampson R. E., Deadwyler S. A., Berger T. W. (2014). On parsing the neural code in the prefrontal cortex of primates using principal dynamic modes. Journal of Computational Neuroscience, 36 (3), 321–337.
Marmarelis V. Z., Shin D. C., Zhang R. (2012). Linear and nonlinear modeling of cerebral flow autoregulation using principal dynamic modes. The Open Biomedical Engineering Journal, 6, 42.
Ogura H. (1985). Estimation of Wiener kernels of a nonlinear system and a fast algorithm using digital Laguerre filters. In 15th NIBB Conference, Okazaki, Japan (pp. 14–62).
Paninski L. (2003). Convergence properties of three spike-triggered analysis techniques. Network: Computation in Neural Systems, 14 (3), 437–464.
Park I. M., Archer E. W., Priebe N., Pillow J. W. (2013). Spectral methods for neural characterization using generalized quadratic models. In Advances in neural information processing systems (pp. 2454–2462). NIPS Proceedings.
Petersen R. S., Brambilla M., Bale M. R., Alenda A., Panzeri S., Montemurro M. A., Maravall M. (2008). Diverse and temporally precise kinetic feature selectivity in the VPm thalamic nucleus. Neuron, 60 (5), 890–903.
Pillow J. W., Park I. M. (2011). Bayesian spike-triggered covariance analysis. In Advances in neural information processing systems, (pp. 1692–1700). NIPS Proceedings.
Pillow J. W., Shlens J., Paninski L., Sher A., Litke A. M., Chichilnisky E. J., Simoncelli E. P. (2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454 (7207), 995–999.
Pillow J. W., Simoncelli E. P. (2006). Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis. Journal of Vision, 6 (4): 9, 414–428, doi:10.1167/6.4.9.
Rajan K., Bialek W. (2013). Maximally informative “stimulus energies” in the analysis of neural responses to natural signals. PLoS ONE, 8 (11), e71959.
Rapela J., Mendel J. M., Grzywacz N. M. (2006). Estimating nonlinear receptive fields from natural images. Journal of Vision, 6 (4): 11, 441–474, doi:10.1167/6.4.11.
Rieke F., Warland D., de Ruyter van Steveninck R. R., Bialek W. (1999). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Rust N. C., Schwartz O., Movshon J. A., Simoncelli E. P. (2005). Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46 (6), 945–956.
Saleem A. B., Krapp H. G., Schultz S. R. (2008). Receptive field characterization by spike-triggered independent component analysis. Journal of Vision, 8 (13): 2, 1–16, doi:10.1167/8.13.2.
Samengo I., Gollisch T. (2013). Spike-triggered covariance: Geometric proof, symmetry properties, and extension beyond Gaussian stimuli. Journal of Computational Neuroscience, 34 (1), 137–161.
Sandler R., Shin D. C., Song D., Hampson R. E., Deadwyler S. A., Berger T. W., Marmarelis V. Z. (2013). Closed-loop modeling of the hippocampus and design of neurostimulation patterns for suppressing seizures. In 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER) (pp. 1143–1146). IEEE.
Sandler R. A., Song D., Hampson R. E., Deadwyler S. A., Berger T. W., Marmarelis V. Z. (2014). Model-based assessment of an in-vivo predictive relationship from CA1 to CA3 in the rodent hippocampus. Journal of Computational Neuroscience, 240, 1–15.
Sandler R. A., Deadwyler S. A., Hampson R. E., Song D., Berger T. W., Marmarelis V. Z. (2015). System identification of point-process neural systems using probability based Volterra kernels. Journal of Neuroscience Methods, 240, 179–192.
Schwartz O., Pillow J. W., Rust N. C., Simoncelli E. P. (2006). Spike-triggered neural characterization. Journal of Vision, 6 (4): 13, 484–507, doi:10.1167/6.4.13.
Schwartz O., Simoncelli E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4 (8), 819–825.
Sharpee T., Rust N. C., Bialek W. (2004). Analyzing neural responses to natural signals: Maximally informative dimensions. Neural Computation, 16 (2), 223–250.
Slee S. J., Higgs M. H., Fairhall A. L., Spain W. J. (2005). Two-dimensional time coding in the auditory brainstem. The Journal of Neuroscience, 25 (43), 9978–9988.
Song D., Chan R. H. M., Marmarelis V. Z., Hampson R. E., Deadwyler S. A., Berger T. W. (2007). Nonlinear dynamic modeling of spike train transformations for hippocampal-cortical prostheses. IEEE Transactions on Biomedical Engineering, 54 (6), 1053–1066.
Song D., Marmarelis V. Z., Berger T. W. (2009). Parametric and non-parametric modeling of short-term synaptic plasticity. Part I: Computational study. Journal of Computational Neuroscience, 26 (1), 1–19.
Song D., Wang H., Tu C. Y., Marmarelis V. Z., Hampson R. E., Deadwyler S. A., Berger T. W. (2013). Identification of sparse neural functional connectivity using penalized likelihood estimation and basis functions. Journal of Computational Neuroscience, 35 (3), 335–357.
Theunissen F. E., David S. V., Singh N. C., Hsu A., Vinje W. E., Gallant J. L. (2001). Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network: Computation in Neural Systems, 12 (3), 289–316.
Touryan J., Felsen G., Dan Y. (2005). Spatial structure of complex cell receptive fields measured with natural images. Neuron, 45 (5), 781–791.
Truccolo W., Eden U. T., Fellows M. R., Donoghue J. P., Brown E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93 (2), 1074–1089.
Westwick D. T., Kearney R. E. (2003). Models of nonlinear systems. In Identification of nonlinear physiological systems (pp. 57–101). Hoboken, NJ: John Wiley & Sons.
Wiener N. (1966). Nonlinear problems in random theory. Cambridge, MA: The MIT Press.
Zanos T. P., Courellis S. H., Berger T. W., Hampson R. E., Deadwyler S. A., Marmarelis V. Z. (2008). Nonlinear modeling of causal interrelationships in neuronal ensembles. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 16 (4), 336–352.
Zhong Y., Wang H., Ju K. H., Jan K.-M., Chon K. H. (2004). Nonlinear analysis of the separate contributions of autonomic nervous systems to heart rate variability using principal dynamic modes. IEEE Transactions on Biomedical Engineering, 51 (2), 255–262.