Research Article | January 2002
Full identification of a linear-nonlinear system via cross-correlation analysis
Duane Q. Nykamp, Dario L. Ringach
Journal of Vision January 2002, Vol. 2, 1. doi:https://doi.org/10.1167/2.1.1
© ARVO (1962-2015); The Authors (2016-present)
Abstract

A statistical model used extensively in vision research consists of a cascade of a linear operator followed by a static (memoryless) nonlinearity. Common applications include the measurement of simple-cell receptive fields in primary visual cortex and the modeling of human performance in various psychophysical tasks. It is well known that the front-end linear filter of the model can readily be recovered, up to a multiplicative constant, using reverse-correlation techniques. However, a full identification of the model also requires an estimation of the output nonlinearity. Here, we show that for a large class of static nonlinearities, one can obtain analytical expressions for the estimates. The technique works with both Gaussian and binary noise stimuli. The applicability of the method in physiology and psychophysics is demonstrated. Finally, the proposed technique is shown to converge much faster than the currently used linear-reconstruction method.

Introduction
A linear-nonlinear model consists of a cascade of a linear operator acting on the input in series with a static nonlinearity. In practice, the input (or stimulus) is a finite-dimensional vector, x, and the output (or response), r, can be either 0 or 1. In this case, the model can be written simply as  
(1)   P{r = 1 | x} = g(wᵀx)
where g(·) is a nonlinear function and the (column) vector w represents the linear filtering operation. This equation can be considered a one-term projection pursuit regression model (Friedman & Stuetzle, 1981). Therefore, the optimization techniques developed for projection pursuit regression can be used to fit w and g(·) to the data. In the laboratory, however, the researcher has full control of the stimulus; therefore, as shown below, more efficient methods for estimating the parameters of the model can be developed. 
Cross-correlation methods are commonly used to estimate the front-end linear filtering (the vector w) in this model (Marmarelis & Marmarelis, 1978; Jones & Palmer, 1987; Emerson, Korenberg, & Citron, 1989; DeAngelis, Ohzawa, & Freeman, 1993a; DeAngelis, Ohzawa, & Freeman, 1993b; Reid, Victor, & Shapley, 1997; Ringach, Sapiro, & Shapley, 1997; Anzai, Ohzawa, & Freeman, 1999). A simple geometrical proof of the technique can be found in Chichilnisky (2001). In addition to estimating w, a full characterization of the model also requires that we estimate the shape of the nonlinearity. 
The importance of estimating g(·) has been somewhat neglected in the past [however, see Anzai et al. (1999) and Chichilnisky (2001)]. This is surprising because it is well known that the tuning properties of a neuron (such as its orientation or spatial frequency bandwidth) do not depend solely on its linear kernel; they can also be influenced by a static nonlinearity in the spike generation mechanism, represented by g(·). Specifically, it has been suggested that both direction and orientation tuning can be sharpened significantly by thresholding or by an accelerating nonlinearity (Reid, Soodak, & Shapley, 1987; DeAngelis et al., 1993a, 1993b; Anzai et al., 1999; Carandini & Ferster, 2000). Thus, estimating g(·) is clearly important when comparing the tuning properties of the model to those of the data. 
Similarly, cross-correlation methods are being used in visual psychophysics applications to recover the “classification images” used by subjects in two-alternative forced-choice (2AFC) tasks (Ahumada & Beard, 1998; Ringach, 1998; Gold, Murray, Bennett, & Sekuler, 2000). These methods show that performance is correlated, to some extent, with some physical attribute of the image. However, in order to quantitatively evaluate the fraction of human performance explained by the model, one needs to identify the static nonlinearity as well. 
One strategy to identify the nonlinearity is to perform a two-step analysis (Anzai et al., 1999; Chichilnisky, 2001). First, measure the linear kernel using reverse correlation; then generate a scatter-plot of the linear prediction (wᵀx) against the actual response r and smooth it to obtain an estimate of the static nonlinearity. Here we show that for a large class of nonlinearities it is possible to write down a closed-form solution for the estimates of the linear kernel and the static nonlinearity in a single step. The parameters of the nonlinearity are found by matching the input-output moments of the model to the data. We refer to this technique as the moment method. The calculation avoids computing the linear prediction altogether (which is a very time-consuming calculation) and can be updated recursively as new data arrive. Furthermore, our results apply to both Gaussian and binary noise inputs. 
Model
Let each component of the input be an independent normal (Gaussian) random variable with standard deviation σ. Alternatively, let each component be a binary random variable that takes the values σ or −σ with equal probability. As shown in Appendix A, if the linear kernel is normalized so that ‖w‖ = 1, the average output is  
(2)   r̄ = E{r} = (1/(σ√(2π))) ∫ g(y) e^(−y²/(2σ²)) dy
 
The correlation between the input and the output is proportional to the kernel w,  
(3)   E{xr} = Cw
where E{·} denotes expected value and the constant of proportionality is given by (Bussgang, 1952)  
(4)   C = (σ/√(2π)) ∫ g′(y) e^(−y²/(2σ²)) dy
 
Thus, the linear filter w can be recovered from E{xr} because w = E{xr}/‖E{xr}‖, which is the usual reverse correlation result. 
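The reverse-correlation step itself is straightforward to sketch numerically. Below is a minimal illustration (not from the original paper; the dimensions and the clipped half-rectifier are chosen only for the example), recovering the unit-norm kernel from the spike-triggered average:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, sigma = 20, 200_000, 1.0

# ground-truth unit-norm kernel w
w = rng.standard_normal(N)
w /= np.linalg.norm(w)

def g(y):
    # example static nonlinearity: scaled half-rectifier, clipped to [0, 1]
    return np.clip(0.3 * (y - 0.5), 0.0, 1.0)

X = sigma * rng.standard_normal((T, N))   # Gaussian white-noise stimuli
p = g(X @ w)                              # response probability per trial
r = rng.random(T) < p                     # binary responses (0 or 1)

xr = X[r].sum(axis=0) / T                 # sample estimate of E{xr}
w_hat = xr / np.linalg.norm(xr)           # kernel estimate, up to scale
```

Normalizing the spike-triggered sum recovers w only up to the multiplicative constant C, which is exactly the degeneracy that makes a separate estimate of the nonlinearity necessary.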
Through Equations 2 and 4, the measurement of the mean output, r̄, and the proportionality constant, C, impose constraints on the nonlinearity g(·). The proposed moment method uses these measurements, which correspond to the first moment of the output (r̄) and the second input-output moment (the cross-correlation), to specify the nonlinearity. For example, if one considers a class of nonlinearities specified by only two parameters, Equations 2 and 4 can be used to determine g(·). If, however, one considers a class of nonlinearities with more than two parameters, additional conditions must be provided to determine g(·). We demonstrate the procedure with some classes of nonlinearities used in both electrophysiology and psychophysics: half-rectification with threshold, a power law, an error function, and the Naka-Rushton equation. 
Half-Rectifier with Threshold Nonlinearity
A linear-nonlinear system with a threshold half-rectifier nonlinearity has been one of the classical models of simple cells (Movshon, Thompson, & Tolhurst, 1978; Tolhurst & Heeger, 1997; Carandini & Ferster, 2000). The nonlinearity is described by g(y) = A[y − y0]+, where [x]+ is a half-rectifier such that [x]+ = x if x > 0 and is zero otherwise. In this case (see Appendix A), it can be shown that  
(5)   r̄ = A[(σ/√(2π)) e^(−y0²/(2σ²)) − (y0/2) erfc(y0/(σ√2))]
 
(6)   C = (Aσ²/2) erfc(y0/(σ√2))
where erfc(x) = 1 − erf(x) is the complementary error function and erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt is the error function. The system represented by Equations 5 and 6 has two measured variables, r̄ and C, and two unknowns, A and y0. Thus, given the measurements, one can solve the system numerically to obtain estimates of A and y0. 
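Numerically, the pair of equations reduces to a single scalar root-finding problem, because the ratio r̄/C does not depend on A. A sketch (assuming the forms of Equations 5 and 6 above, with simulated measurements standing in for data):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erfc

sigma = 1.0

def mean_rate(A, y0):   # Equation 5
    return A * (sigma / np.sqrt(2 * np.pi) * np.exp(-y0**2 / (2 * sigma**2))
                - 0.5 * y0 * erfc(y0 / (sigma * np.sqrt(2))))

def corr_mag(A, y0):    # Equation 6
    return 0.5 * A * sigma**2 * erfc(y0 / (sigma * np.sqrt(2)))

# simulated "measurements" generated from known parameters
A_true, y0_true = 2.0, 0.7
rbar, C = mean_rate(A_true, y0_true), corr_mag(A_true, y0_true)

# rbar/C is independent of A, so solve a scalar equation for y0 first
f = lambda y0: mean_rate(1.0, y0) / corr_mag(1.0, y0) - rbar / C
y0_hat = brentq(f, -5.0, 5.0)
A_hat = C / corr_mag(1.0, y0_hat)
```

The amplitude A then follows by substituting the recovered threshold back into either equation.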
Power Law Nonlinearity
This is an important example because power laws are used to model the output nonlinearity in simple cortical cells (Emerson et al., 1989; Albrecht & Geisler, 1991; Heeger, 1992; Anzai et al., 1999). Furthermore, they have been shown to fit the data slightly better than half-rectification (Tolhurst & Heeger, 1997). If we assume g(y) = Ay^β for y > 0 (and zero otherwise), then, as shown in Appendix A, we obtain  
(7)   r̄ = (Aσ^β 2^((β−1)/2)/√(2π)) Γ((β+1)/2)
 
(8)   C = (Aβσ^(β+1) 2^((β−2)/2)/√(2π)) Γ(β/2)
where the gamma function is given by Γ(z) = ∫₀^∞ t^(z−1) e^(−t) dt. Once again, the system represented by Equations 7 and 8 has two measured variables, r̄ and C, and two unknowns, A and β. Thus, given the measurements, one can solve the system numerically to obtain estimates of A and β. 
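The same ratio trick applies here: C/r̄ depends only on β, leaving a scalar equation. A sketch assuming the forms of Equations 7 and 8 above:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

sigma = 1.0

def moments(A, beta):
    # Equations 7 and 8
    rbar = A * sigma**beta * 2**((beta - 1) / 2) \
        * gamma((beta + 1) / 2) / np.sqrt(2 * np.pi)
    C = A * beta * sigma**(beta + 1) * 2**((beta - 2) / 2) \
        * gamma(beta / 2) / np.sqrt(2 * np.pi)
    return rbar, C

rbar, C = moments(0.01, 2.0)   # "measurements" from known parameters

# C/rbar is independent of A, so solve for beta first
ratio = lambda b: moments(1.0, b)[1] / moments(1.0, b)[0]
beta_hat = brentq(lambda b: ratio(b) - C / rbar, 0.1, 10.0)
A_hat = rbar / moments(1.0, beta_hat)[0]
```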
Error Function Nonlinearity
Another important case is when the nonlinearity can be represented by an error function:  
(9)   g(y) = (rmax/2)[1 + erf((y − y0)/(ɛ√2))]
 
This class of nonlinearities is specified by three parameters: rmax, y0 and ɛ. In Appendix A we derive the following specific forms of Equations 2 and 4 for the error function nonlinearity:  
(10)   r̄ = (rmax/2) erfc(y0/(σ̃√2))
 
(11)   C = (rmax σ²/(σ̃√(2π))) e^(−y0²/(2σ̃²))
where σ̃ = √(σ² + ɛ²). Because these equations provide only two constraints and the nonlinearity has three parameters, additional information is required to identify the system. One approach is to obtain additional constraints by estimating higher moments between the input and output of the system, which is discussed below. Another possibility is to begin by estimating rmax. For example, rmax can be taken as the reciprocal of the minimum inter-spike interval in the data record, which is the method used in this paper. Alternatively, the maximum output rate may be determined by the response to optimal stimuli in other experiments. In the case of 2AFC psychophysical experiments, rmax can be defined as 1. Thus, if we know rmax and measure r̄ and C, we can solve for y0 and ɛ analytically. 
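A sketch of the analytic inversion (assuming the forms of Equations 10 and 11 above, with σ̃² = σ² + ɛ²): Equation 10 gives y0/σ̃ directly via the inverse complementary error function, and Equation 11 then yields σ̃ itself.

```python
import numpy as np
from scipy.special import erfc, erfcinv

sigma, rmax = 1.0, 1.0

def moments(y0, eps):
    # Equations 10 and 11, with s2 = sigma^2 + eps^2
    s2 = sigma**2 + eps**2
    rbar = 0.5 * rmax * erfc(y0 / np.sqrt(2 * s2))
    C = rmax * sigma**2 * np.exp(-y0**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return rbar, C

rbar, C = moments(0.5, 1.0)   # "measurements" from known parameters

# invert Equation 10: q = y0 / sqrt(sigma^2 + eps^2)
q = np.sqrt(2.0) * erfcinv(2 * rbar / rmax)
# invert Equation 11 for s = sqrt(sigma^2 + eps^2)
s = rmax * sigma**2 * np.exp(-q**2 / 2) / (np.sqrt(2 * np.pi) * C)
y0_hat = q * s
eps_hat = np.sqrt(s**2 - sigma**2)
```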
Naka-Rushton Nonlinearity
A final example of a sigmoidal function commonly used in models is the Naka-Rushton (NR) equation,  
(12)   g(y) = rmax y^n/(c^n + y^n) for y > 0 (and zero otherwise)
with parameters rmax, c, and n. Unfortunately, the integrals in Equations 2 and 4 do not have nice analytic expressions in this case, but we can still solve the system of integral equations numerically,  
(13)   r̄ = (rmax/(σ√(2π))) ∫₀^∞ (y^n/(c^n + y^n)) e^(−y²/(2σ²)) dy
 
(14)   C = (rmax σ/√(2π)) ∫₀^∞ (n c^n y^(n−1)/(c^n + y^n)²) e^(−y²/(2σ²)) dy
for c and n provided that rmax is estimated from some additional information as above. 
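A sketch of this numerical solution (assuming the rectified Naka-Rushton form above and that rmax is known), using quadrature for Equations 13 and 14 and a standard root finder:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

sigma, rmax = 1.0, 1.0

def gauss(y):
    return np.exp(-y**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def moments(c, n):
    # Equations 13 and 14; only y > 0 contributes since g(y) = 0 for y <= 0
    rbar = quad(lambda y: rmax * y**n / (c**n + y**n) * gauss(y),
                0, np.inf)[0]
    # C is sigma^2 times the Gaussian average of g'(y)
    C = sigma**2 * quad(lambda y: rmax * n * c**n * y**(n - 1)
                        / (c**n + y**n)**2 * gauss(y), 0, np.inf)[0]
    return rbar, C

rbar, C = moments(1.0, 2.0)   # "measurements" from known parameters

f = lambda p: np.subtract(moments(p[0], p[1]), (rbar, C))
c_hat, n_hat = fsolve(f, x0=[0.8, 1.8])
```

As with any root finder, a reasonable starting guess helps; in practice one could initialize from a coarse grid search over (c, n).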
Results
To demonstrate the ability of the moment method to reconstruct the nonlinearity, we tested the model with two simulated experiments. First, we modeled the response of a human subject in a 2AFC psychophysical task. Second, we modeled the receptive field of a simple-cell in primary visual cortex. In each case, we calculated the mean response rate and correlation magnitude, using the method described in detail in Appendix B. To evaluate the performance of the method, the parameters of the simulated nonlinear system were compared with these estimated values. We also compared the performance of the proposed technique with variants of the linear reconstruction method. Anzai et al (1999) used a straightforward implementation of the linear reconstruction (or LR) method, whereas Chichilnisky (2001) implemented a cross-validated version of the technique (or CLR). Details about the implementations of these algorithms are described in Appendix C. To anticipate the results, we find that the moment method clearly outperforms the LR and CLR techniques. 
Psychophysical Task
We simulated the study of Ahumada & Beard (1998), who measured classification images in a Vernier acuity task. In our simulations, we assume a spatial kernel (the vector w) similar to those obtained in the actual measurements (Figure 1a) and an error function nonlinearity with rmax = 1, y0 = 0.5, ɛ = 1.0. In each trial of the experiment, one of two possible stimuli, x0 or x1, is presented in the presence of additive white Gaussian noise with σ = 1. The subject is asked to identify the stimulus. We code the subject’s response as r = 0 for the x0 choice and r = 1 for the x1 choice. The probability that the subject will select x1 is modeled as g(wᵀx), where x is the noisy stimulus. If we pool all the trials in which only x1 was presented, we obtain a response probability of g(wᵀx1 + wᵀn) (where n represents the noise in the stimulus). Similarly, for the trials in which only x0 was presented, we obtain g(wᵀx0 + wᵀn). The “classification images” obtained by cross-correlation between response and stimulus in these two cases should be the same and proportional to w. Similarly, the nonlinearities should be identical up to a translation. In fact, these constraints could be used as a test for the validity of the model in a 2AFC task. 
Figure 1
 
The kernel used for the psychophysical task. The kernel used in the simulation (a). Red regions indicate areas where increases in luminance would result in an increase in the probability of reporting the x1 stimulus. Blue regions represent areas where increases in luminance would result in a decrease of the probability of reporting the x1 stimulus. b–c. The estimate of the kernel after 10,000 (b) and 100,000 (c) trials.
We ran simulations that contained between 300 and 100,000 trials per stimulus. We selected one of the stimuli and calculated the linear kernel (or “classification image”) (Figure 1b and 1c) and the estimated error function nonlinearity (Figure 2). The nonlinearity calculations show that the proposed method converges faster than the LR/CLR methods (the result from the CLR method is not shown, but it was no better than the LR method). Our technique gives a reasonable approximation to the nonlinearity after only 500 trials, whereas the LR method yields a gross underestimate (Figure 2a). After 2,400 trials, the moment method has determined the nonlinearity almost perfectly, but the LR method still underestimates it (Figure 2b). After 75,000 trials, both methods have converged to the correct solution (Figure 2c). 
Figure 2
 
Results from the psychophysical simulations. a. Estimate of the nonlinearity after 500 trials. The solid green line is the estimated error function nonlinearity using the moment method, and the dashed blue line is the error function nonlinearity actually used in the simulations. Red diamonds show the nonlinearity calculated from the LR method. Error bars are one standard error and reflect error only from the second half of the reconstruction. Parameters obtained with error function estimate: rmax = 1, y0 = 0.29, and ɛ = 0.67. b. Estimates of the nonlinearity after 2400 trials. Estimated parameters: rmax = 1, y0 = 0.43, and ɛ = 0.97. c. Estimates of the nonlinearity after 75,000 trials. The estimated nonlinearity nearly coincides with the simulated nonlinearity and is barely visible. Estimated parameters: rmax = 1, y0 = 0.502, and ɛ = 1.02. d. Rate of convergence of the moment method to the simulated parameters. The relative errors are averages over 60 simulations. Relative errors are less than 10% after 2500 trials.
To study the convergence of the algorithm, we simulated 60 runs and estimated the average relative error of the parameters for increasing lengths of the data record, ranging from 300 to 100,000 trials. A summary of the average relative error in y0 and ɛ vs. length of the data record is shown in Figure 2d. After 2500 trials, the error is down to 10 percent, and it converges toward zero as the number of trials is increased further. 
Simple Cell
The second simulation was that of a simple cell in visual cortex. To mimic a cell’s response, we used a linear kernel obtained from cross-correlation analysis of actual experimental data. The kernel was discretized into 45 time slices with ΔT = 2 ms and a 40 × 40 grid in space (see Figure 3a). In each 2-ms interval, the probability of a spike was given by Equation 1, and the nonlinearity was a power law with parameters A = 0.01 and β = 2. The input was binary noise taking the values 1 and −1, giving a standard deviation of σ = 1. 
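A reduced sketch of this type of simulation (a 50-dimensional kernel rather than the full 40 × 40 × 45 one, and assuming the power law acts on the rectified linear drive) illustrates how the two measured moments are accumulated from the spike train:

```python
import numpy as np

rng = np.random.default_rng(1)
A, beta = 0.01, 2.0
N, nbins = 50, 100_000                    # kernel size and number of 2-ms bins

w = rng.standard_normal(N)
w /= np.linalg.norm(w)

X = rng.choice([-1.0, 1.0], size=(nbins, N))   # binary noise, sigma = 1
drive = X @ w                                  # linear drive w^T x
# power-law nonlinearity on the rectified drive, clipped to a valid probability
p = np.clip(A * np.clip(drive, 0.0, None)**beta, 0.0, 1.0)
spikes = rng.random(nbins) < p

rbar_hat = spikes.mean()                               # first moment
C_hat = np.linalg.norm(X[spikes].sum(axis=0)) / nbins  # correlation magnitude
```

With β = 2 and σ = 1 the expected mean rate is roughly Aσ²/2 = 0.005 per bin, and the accumulated estimates of r̄ and C can then be fed to the moment equations for the power law.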
Figure 3
 
Estimates of a time slice of the linear kernel. a. Time slice of the simulated kernel at 60 ms. b–c. Estimates of the time slice at 60 ms calculated after 30 simulated minutes (b) and 8 simulated hours (c) of data collection.
We simulated the response of the simple cell to noise stimuli with experimental time ranging from 1 minute to 8 hours. For each case, we estimated the kernel (Figure 3b and 3c) and the power law nonlinearity (Figure 4). These simulations show the strength of the moment method with large kernels, such as the 40 × 40 × 45 kernel simulated here. Unlike the LR/CLR methods, our technique closely approximates the simulated nonlinearity after only 15 minutes, or less than 5000 spikes (Figure 4a). Even the CLR method required 8 simulated hours (more than 140,000 spikes) to converge. The bias in the LR method is large and is still substantial in the 8-hour simulation. 
Figure 4
 
Nonlinearity calculations from the simple-cell simulation. a. Estimated nonlinearity after 15 minutes of simulated time. The solid green line is the power-law estimate of the nonlinearity and the dashed blue line is the nonlinearity used in the simulation. The error bars of the LR (red) and CLR (black) represent one standard error and were calculated from the second step of the reconstruction. Parameters from the estimated nonlinearity: A = 0.00959 and β = 2.08. b. Estimated nonlinearity after 1 hour of simulated time. Estimated parameters: A = 0.00985 and β = 2.02. c. Estimated nonlinearity after 8 hours of simulated time. Estimated parameters: A = 0.00995 and β = 2.002. d. Rate of convergence of the moment method to the simulated parameters. Relative errors are less than 10% after 10 minutes of simulated time. In panels a–c, the y axis is in units of spikes per ms.
The relatively poor performance of the CLR method can be explained by the noise in the estimate of a large kernel. If the estimate of w were extremely noisy, the linear prediction y = wᵀx would be mostly noise, and the CLR method would give a flat line at the mean rate. Figure 5b illustrates that the CLR results lie between the mean rate and the true nonlinearity after 15 minutes of data collection, as would be expected from a noisy estimate of y. As the simulation length increases, the results approach the correct nonlinearity. A more detailed discussion of the noise in the CLR method, as well as of the bias in the LR method, can be found in Appendix C. 
Figure 5
 
a. Error function (red line), power law (green line), and Naka-Rushton (cyan line) nonlinearities fit to one simulated hour of data generated by a power law function (dashed blue line). The y axis is in units of spikes per ms. b. Consequences of a noisy estimate of w on the shape of the nonlinearity obtained by the CLR method. The dotted line represents the mean output rate. As expected, the estimate lies between this line and the simulated nonlinearity. Data are from 15 minutes of simulated time.
Figure 4d shows the rate of convergence of the estimated nonlinearity as a function of simulated time between 1 minute and 8 hours. For each data point of 1 hour or less, we computed 12 simulations and calculated the average relative error in the nonlinearity parameters. For longer trials, the relative error estimates were based on four simulations of 2 hours, two simulations of 4 hours, and one simulation of 8 hours. After only 10 minutes of simulation (around 3000 spikes), the estimated parameters of the power law are within 10 percent of the actual parameters. 
To obtain the above results, we assumed we knew the nonlinearity was a power law. In practice, however, the investigator does not necessarily know the shape of the nonlinearity in advance. A natural question is what would happen if we were to fit other nonlinearities to data generated by a power law function. Figure 5a shows results of the moment method if we assumed a power law, error function, or Naka-Rushton nonlinearity. The different estimates roughly agree for small values of y but diverge as y increases. Thus, confidence in any extrapolation to large y values requires precise knowledge of the shape of the nonlinearity. Because the LR and CLR methods do not make any assumptions about the shape of the nonlinearity, they could be used to determine the proper class of nonlinearities to use in the moment method. 
The moment method is a parametric technique that assumes a particular class of nonlinearities. The LR and CLR techniques, on the other hand, are nonparametric algorithms. A natural question is: would the performance of the two approaches be similar if we added information about the class of relevant nonlinearities to the LR/CLR methods? This could be done by fitting a parametric nonlinear function to the scatter-plot of the linear prediction versus the actual response in the LR/CLR algorithms. In the simulation of the psychophysics experiment (Figure 2), we fit an error function, and for the simulation of the simple-cell receptive field (Figure 4), we fit a power law. Figure 6 shows the maximum relative error of the parameters obtained from these fits as well as the parameters from the moment method. In all cases, the moment method is dramatically superior to the LR/CLR methods. The better performance of the LR method relative to the CLR method in Figure 6a is understandable because removing bias can sometimes decrease accuracy. The erratic behavior of the LR results in Figure 6b is due to its deviation from a power-law shape (as seen in Figure 4a and b). 
Figure 6
 
Rates of convergence of the methods to estimate the nonlinearity parameters. Relative errors in the moment method (green line), LR method (red line), and CLR method (black line) are plotted versus simulation length. a. Relative error of fitted error function parameters in the psychophysical task simulation. The maximum of the relative error in y0 and the relative error in ɛ is shown. Relative errors are less than 10% after 2500 trials for the moment method and after 40,000 to 50,000 trials for the LR and CLR methods. Results are from the same simulations used for Figure 2d. b. Relative error of fitted power law parameters in the simple cell simulation. The maximum of the relative error in A and the relative error in β is shown. Relative errors are less than 10% after 10 minutes of simulated time for the moment method and after 8 hours for the CLR method. The LR results were not fit well by a power law after less than 2 hours of simulated time; thus, the relative error measurements for those simulations have little meaning. Even after 8 hours, the relative error for the LR method was around 30%. Results are from the same simulations used for Figure 4d.
Discussion
The moment method is a fast and accurate method to identify a linear-nonlinear system. In this method, we estimate the linear filter using cross-correlation, and identify the output nonlinearity by matching the mean response and cross-correlation amplitude to the data. The method is fast because it does not require the calculation of a linear prediction to estimate the static nonlinearity. Simulations demonstrate that the rate of convergence of the moment technique is faster than the LR and CLR methods. The advantage of the moment method is particularly evident when the kernels are large, as in the simulation of the simple cell (Figure 4). Here, the moment method obtains a reasonable estimate of the nonlinearity after 15 minutes, whereas the LR/CLR estimates are inaccurate for this amount of data. Another benefit of the technique is that estimates can be updated recursively as new data arrives. This provides a means to stop an experiment when a sufficient signal-to-noise ratio is achieved. We note that both the moment method and the LR/CLR methods are dependent on the assumption of a linear-nonlinear system as described by Equation 1. Thus, these methods will give accurate results only to the extent that a system is well represented by this lumped phenomenological model. 
As mentioned above, one advantage of the LR/CLR methods is that no assumptions are made about the shape of the nonlinearity. This observation suggests using the LR/CLR methods initially when studying a new system. These experiments will be time consuming, as large amounts of data are required for accurate estimates of g(·). However, once a family of parametric functions is found to fit the measured nonlinearities, the more efficient moment method can be used to conduct further experiments. 
Several nonlinearities of interest are parameterized by more than two variables. In these cases, additional conditions must be provided to fully determine g(·). One possibility is to obtain independent information about the remaining parameters (as done with rmax above). In the present study, we calculate two moments to estimate a two-parameter nonlinearity. We are currently investigating the possibility of calculating higher order moments to constrain the additional parameters. 
Acknowledgments
We thank E. J. Chichilnisky and J. D. Victor for helpful comments and criticisms on an earlier version of this paper. This work was supported by a NSF Mathematical Sciences Postdoctoral Research Fellowship (D.Q.N.) and by NSF-IBN-9720305 and NIH-EY-12816 (D.L.R.). 
Appendix
Appendix A. Derivation of Equations for Nonlinearity
In this appendix, we derive the equations for the mean response r̄ and correlation E{xr} for the linear-nonlinear system in response to a discrete approximation of Gaussian white noise. In this derivation, we ignore the distinction between space and time and assume that the probability of a response is given by  
(A1)   P{r = 1 | x} = g(wᵀx)
 
(Equation 1), where each component of x is a normal random variable with mean zero and standard deviation σ. Let N be the dimension of w and x; thus, x has the probability density function  
(A2)   p(x) = (2πσ²)^(−N/2) e^(−‖x‖²/(2σ²))
 
We derive Equation 2 for the mean response r̄:  
(A3)   r̄ = ∫ g(wᵀx) p(x) dx = ∫ g(u1) p(u) du = (1/(σ√(2π))) ∫ g(y) e^(−y²/(2σ²)) dy
 
In the derivation, we let u = Ox, where O is any orthogonal matrix whose first row is wᵀ. (Recall that w is a unit vector.) Then wᵀx is the first component of u, which we replaced by y. 
To derive an expression for the input-output correlation  
(A4)   E{xr} = ∫ x g(wᵀx) p(x) dx
we first look at  
(A5)   B = E{ur} = ∫ u g(u1) p(u) du
Bj = 0 for j ≠ 1 because Bj is the integral of an odd function in this case. The first component of B is  
(A6)   B1 = (1/(σ√(2π))) ∫ y g(y) e^(−y²/(2σ²)) dy = (σ/√(2π)) ∫ g′(y) e^(−y²/(2σ²)) dy
where we integrated by parts to obtain the last expression. Thus,  
(A7)   B = C e1
where e1 is the first unit vector. The input-output correlation is then  
(A8)   E{xr} = OᵀB = C Oᵀe1 = Cw
where  
(A9)   C = (σ/√(2π)) ∫ g′(y) e^(−y²/(2σ²)) dy
 
This result is known as Bussgang’s Theorem (Bussgang, 1952). 
Half-rectifier with threshold nonlinearity
If g(y) = A[y − y0]+, then  
(A10)   C = (Aσ²/2) erfc(y0/(σ√2))
 
which is Equation 6. Also,  
(A11)   r̄ = A[(σ/√(2π)) e^(−y0²/(2σ²)) − (y0/2) erfc(y0/(σ√2))]
which is Equation 5. 
Power law nonlinearity
If g(y) = Ay^β for y > 0 (and zero otherwise), then  
(A12)   r̄ = (A/(σ√(2π))) ∫₀^∞ y^β e^(−y²/(2σ²)) dy = (Aσ^β 2^((β−1)/2)/√(2π)) Γ((β+1)/2)
which is Equation 7. Also,  
(A13)   C = (Aβσ/√(2π)) ∫₀^∞ y^(β−1) e^(−y²/(2σ²)) dy = (Aβσ^(β+1) 2^((β−2)/2)/√(2π)) Γ(β/2)
which is Equation 8. 
Error function nonlinearity
Let the nonlinearity be  
(A14)   g(y) = (rmax/2)[1 + erf((y − y0)/(ɛ√2))]
where erf(x) is the error function erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt. The derivative of g(y) is g′(y) = (rmax/(ɛ√(2π))) e^(−(y − y0)²/(2ɛ²)). Then, the mean response is  
(A15)   r̄ = (1/(σ√(2π))) ∫ g(y) e^(−y²/(2σ²)) dy = (rmax/2) erfc(y0/(σ̃√2))
where σ̃ = √(σ² + ɛ²). We integrated by parts in the second step. To obtain the last step, we computed a change of variables with one of the variables parallel to the line u = (y + y0)/(σ√2) and completed the square in the resulting integrand. We have thus obtained Equation 10. 
The magnitude of the input-output correlation is obtained by a completion of the square:  
(A16)   C = (σ/√(2π)) ∫ g′(y) e^(−y²/(2σ²)) dy = (rmax σ²/(σ̃√(2π))) e^(−y0²/(2σ̃²))
 
Non-Gaussian input
One challenge in using the reverse-correlation technique is getting enough data for a reasonable signal-to-noise ratio. Typically, one must run long experiments in order to obtain good results. This problem is exacerbated with physiological experiments because cells do not respond well to a white noise stimulus. Thus, one would like to develop ways to increase the output rate. 
If g′(y) ≥ 0, the output rate increases with the standard deviation σ of the input [to see this, integrate Equation 2 by parts]. Therefore, one would like to maximize the standard deviation of the input. However, physical devices, like computer monitors, have limited dynamic ranges. The discrete approximation to Gaussian white noise as described above will have a relatively small σ given constraints on the magnitude of the stimulus. We thus extended the theory to more general forms of noise stimuli that increase the stimulus standard deviation. 
Let M denote the maximum stimulus magnitude that can be produced by an input device, i.e., each component of the stimulus must lie in the interval [−M, M]. Given this constraint, a preferable input would be one where each component of the stimulus is a binary random variable taking the values M and −M with equal probability. This input has a standard deviation of M. 
Generalizations of the central limit theorem permit the use of our reconstruction method even with this input. Because ‖w‖ = 1, the sum wTx = Σj wjxj converges in distribution to a normal random variable with standard deviation M as the number of components increases (for example, see the discussion on page 700 of Port, 1994). This convergence is valid as long as the sum is not dominated by a single component of w. Because the argument of the nonlinearity g(x) would be approximately a normal random variable with σ = M, the output rate should closely match the white noise case. 
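This central-limit argument can be sketched in a few lines. The dimension and the value of M below are arbitrary illustrative values.

```python
import numpy as np

# Sketch of the central-limit argument above: with a unit-norm filter w
# whose weight is spread over many components, the projection wTx of
# binary +/-M noise is approximately normal with mean 0 and standard
# deviation M. The dimension and M are illustrative values.
rng = np.random.default_rng(2)

n, M = 256, 0.5
w = rng.standard_normal(n)
w /= np.linalg.norm(w)                       # ||w|| = 1, no dominant component

x = rng.choice([-M, M], size=(100_000, n))   # binary noise at the device limit
y = x @ w                                    # argument of the nonlinearity

print(round(float(y.mean()), 3), round(float(y.std()), 3))  # near 0 and M
```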
Furthermore, to compute E{xr} one really computes an average of the input conditioned on the presence of an output event [Equation B1]. The central limit theorem states that this average converges in distribution to a normal random variable. The results for non-Gaussian input, such as binary input, should thus closely approximate the above results. Numerical simulations showed virtually the same results for both types of input. 
Appendix B. Computation of Correlation Magnitude
To compute the magnitude of the input-output correlation, we begin with the average stimulus conditioned on an output event:  
(B1)
 
This average stimulus is simply the input-output correlation divided by the mean response rate:  
(B2)
We used Bayes&rsquo; Theorem to complete the third step and Equations 1 and A2 in the fourth step. Here p(x | r = 1) is the probability density function of x conditioned on r = 1. The mean output rate is simply E{r} = P{r = 1}. The correlation magnitude is thus  
(B3)
 
There is a subtlety regarding how one should calculate ‖p‖ to reduce the bias in the estimate of C. If p̂ is a noisy estimate of p, then ‖p̂‖ overestimates ‖p‖. Thus, a naïve calculation of C will overestimate the input-output correlation. 
Assume p̂ is an unbiased estimator of p so that E{p̂} = p. Let pj be the jth component of p. If δpj is the standard deviation of p̂j, then  
(B4)
and  
(B5)
Thus, an unbiased estimate of ‖p‖2 is  
(B6)
 
To calculate δp, we divide the results into multiple trials and calculate an approximation to p for each trial. We let p̂ be the average of these values and δp be the sample standard deviation of that mean. We then use Equations B6 and B3 to calculate the input-output correlation. 
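The effect of this bias correction can be sketched with simulated data. The vector p, noise level, and trial counts below are made-up illustrative values, not taken from the paper's simulations.

```python
import numpy as np

# Sketch of the bias correction above: for an unbiased but noisy estimate
# p_hat of p, E{||p_hat||^2} = ||p||^2 + sum_j (delta p_j)^2, so
# subtracting the squared standard errors yields an unbiased estimate of
# ||p||^2. All numerical values here are illustrative.
rng = np.random.default_rng(3)

p = np.array([0.3, -0.2, 0.1, 0.05])
true_sq = p @ p
noise_sd, n_trials, reps = 0.2, 50, 2000

naive, corrected = [], []
for _ in range(reps):
    # one noisy estimate of p per trial, averaged over the trials
    trials = p + noise_sd * rng.standard_normal((n_trials, p.size))
    p_hat = trials.mean(axis=0)
    dp = trials.std(axis=0, ddof=1) / np.sqrt(n_trials)  # standard error of the mean
    naive.append(p_hat @ p_hat)              # biased upward by ||dp||^2 on average
    corrected.append(p_hat @ p_hat - dp @ dp)

print(round(true_sq, 4), round(float(np.mean(naive)), 4),
      round(float(np.mean(corrected)), 4))
```

Averaged over many repetitions, the naïve estimate of ‖p‖2 exceeds the true value by roughly the summed variances, while the corrected estimate centers on the true value.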
The bias with the naïve estimate of the input-output correlation increases with the size of the kernel. The magnitude ‖p‖ is fixed for a given nonlinearity, but ‖δp‖ increases with the number of components. Thus, the bias in the naïve estimate of C was much larger with the 40 × 40 × 45 kernel of the simple cell than with the 32 × 32 kernel of the psychophysical task. 
Calculations with only small amounts of data can be so noisy that the estimate of ‖p‖2 from Equation B6 is negative. In that case, ‖p‖ is the square root of a negative number, and we cannot compute an estimate of the nonlinearity. Thus, in Figure 2d we could not include the results for a few simulations with 500 or fewer trials. Similarly, we could not include one simulation that was a minute long in Figure 4d. 
Appendix C. Linear Prediction Method
As a benchmark for evaluating the moment method, we also calculated the static nonlinearity with a linear reconstruction (LR) method similar to that employed by Anzai et al. (1999) and Chichilnisky (2001). This two-step method involves first calculating the linear kernel w via reverse correlation and then calculating the average response as a function of the linear prediction y = wTx. In the latter calculation, we discretized y into intervals, using narrower intervals around zero and wider intervals away from zero, because y has a normal distribution centered on zero. For each interval, we calculated the mean and standard error of the response if the interval contained three or more data points. 
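The second, binning step of the LR method can be sketched as follows. The simulated filter, sigmoidal nonlinearity, and bin edges are illustrative stand-ins, not the paper's simulation.

```python
import numpy as np

# Minimal sketch of the second step of the LR method: bin the linear
# predictions y = wTx and average the observed binary responses within
# each bin. The simulated system and bin edges are illustrative.
rng = np.random.default_rng(4)

n = 16
w = rng.standard_normal(n)
w /= np.linalg.norm(w)

x = rng.standard_normal((50_000, n))
y = x @ w
p_true = 1.0 / (1.0 + np.exp(-4.0 * (y - 0.5)))   # a smooth sigmoidal nonlinearity
r = (rng.random(y.size) < p_true).astype(float)   # binary 0/1 responses

# narrower bins near zero, wider away from it, since y ~ N(0, 1)
edges = [-np.inf, -2, -1, -0.5, -0.25, 0, 0.25, 0.5, 1, 2, np.inf]
estimate = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (y >= lo) & (y < hi)
    if mask.sum() >= 3:                           # keep bins with 3+ points
        estimate.append(r[mask].mean())
        print(f"[{lo:6}, {hi:6}): mean response {estimate[-1]:.3f}")
```

The binned averages trace out the nonlinearity as a function of the linear prediction, rising from near 0 at strongly negative y to near 1 at strongly positive y.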
We present results from both the LR method and a cross-validated implementation of the linear-prediction method (CLR). In the LR method, we used all the data for both parts of the calculation. In the CLR method, we divided the data into N parts, where N ≈ 200. We calculated N − 1 versions of the linear kernel, each of which used all the data except two consecutive parts. Thus, each version of the kernel was based on approximately 99% of the data. We then used all the data to calculate the average response as a function of the linear prediction. The key to the CLR method was to switch the kernels used for the linear prediction y = wTx so that the same data were never used simultaneously for both the linear kernel estimate w and the corresponding stimulus x. Thus, the CLR method removes the bias in y = wTx created by covariance between the stimulus x and the noisy estimate of w. 
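A schematic version of this cross-validation scheme is sketched below. The simulated system is illustrative, and N here is far smaller than the paper's N ≈ 200.

```python
import numpy as np

# Schematic sketch of the CLR cross-validation: the prediction for each
# stimulus uses a kernel estimated without that stimulus's block of data,
# so noise in the kernel estimate is uncorrelated with the stimulus it is
# applied to. The system and N are illustrative.
rng = np.random.default_rng(5)

n, trials, N = 16, 20_000, 10
w_true = rng.standard_normal(n)
w_true /= np.linalg.norm(w_true)

x = rng.standard_normal((trials, n))
r = (x @ w_true > 0.5).astype(float)          # a thresholded binary response

part = np.arange(trials) * N // trials        # assign each trial to one of N parts
y_clr = np.empty(trials)
for j in range(N):
    # estimate the kernel without parts j and j+1 (two consecutive parts)
    held_out = (part == j) | (part == (j + 1) % N)
    w_j = (x[~held_out] * r[~held_out, None]).mean(axis=0)   # reverse correlation
    w_j /= np.linalg.norm(w_j)
    sel = part == j
    y_clr[sel] = x[sel] @ w_j                 # predict only on held-out data

# sanity check: cross-validated predictions still separate the responses
print(round(float(y_clr[r == 1].mean() - y_clr[r == 0].mean()), 2))
```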
The bias in y = wTx is similar to the bias in ‖p‖2 = pTp discussed above. As with ‖p‖2, the bias in wTx increases with the size of the kernel. Thus, for small kernels, the CLR method may not give better results than the LR method, because removing bias can sometimes increase the error. With large kernels, on the other hand, the bias may greatly skew the results of the LR method; in this case, the CLR method is significantly more accurate. For the small 32 × 32 kernel of the psychophysical simulations, the bias in the LR method produced an underestimate of the nonlinearity (Figure 2a, 2b), but the bias was small. As shown in Figure 6a, the LR method outperformed the CLR method in this case. However, the large 40 × 40 × 45 kernel of the simple-cell simulations produced significant bias in the LR method even after 8 hours (Figure 4). In this case, the CLR method was necessary to give accurate results. 
Although the CLR method removes the bias caused by covariance between the stimulus and the kernel estimate, it is still sensitive to noise in the estimate of w, as discussed in connection with Figure 5b. When w is high dimensional, small noise in each component of w can accumulate, making the unbiased estimate of y = wTx very noisy, as observed in the simple-cell simulations. 
References
Ahumada, A. J. Jr. Beard, B. L. (1998). Response classification images in vernier acuity. Investigative Ophthalmology and Visual Science, 39, S1109.
Albrecht, D. G. Geisler, W. S. (1991). Motion selectivity and the contrast-response function of simple cells in the visual cortex. Visual Neuroscience, 7, 531–546.
Anzai, A. Ohzawa, I. Freeman, R. D. (1999). Neural mechanisms for processing binocular information. I. Simple cells. Journal of Neurophysiology, 82, 891–908.
Bussgang, J. J. (1952). Crosscorrelation functions of amplitude-distorted Gaussian signals. Technical Report 216. Cambridge, MA: Massachusetts Institute of Technology, Research Laboratory of Electronics.
Carandini, M. Ferster, D. (2000). Membrane potential and firing rate in cat primary visual cortex. Journal of Neuroscience, 20, 470–484.
Chichilnisky, E. J. (2001). A simple white noise analysis of neural light responses. Network, 12, 199–213.
DeAngelis, G. C. Ohzawa, I. Freeman, R. D. (1993). Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. Journal of Neurophysiology, 69, 1091–1117.
DeAngelis, G. C. Ohzawa, I. Freeman, R. D. (1993). Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. II. Linearity of temporal and spatial summation. Journal of Neurophysiology, 69, 1118–1135.
Emerson, R. C. Korenberg, M. J. Citron, M. C. (1989). Identification of intensive nonlinearities in cascade models of visual cortex and its relation to cell classification. In V. Z. Marmarelis (Ed.), Advanced Methods of Physiological System Modeling (pp. 97–111). New York: Plenum.
Friedman, J. H. Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823.
Gold, J. M. Murray, R. F. Bennett, P. J. Sekuler, A. B. (2000). Deriving behavioural receptive fields for visually completed contours. Current Biology, 10, 663–666.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181–197.
Jones, J. P. Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1233–1258.
Marmarelis, P. N. Marmarelis, V. Z. (1978). Analysis of physiological systems: The white noise approach. New York: Plenum Press.
Movshon, J. A. Thompson, I. D. Tolhurst, D. J. (1978). Spatial summation in the receptive fields of simple cells in the cat's striate cortex. Journal of Physiology (London), 283, 53–77.
Port, S. C. (1994). Theoretical Probability for Applications. New York: John Wiley & Sons.
Reid, R. C. Soodak, R. E. Shapley, R. M. (1987). Linear mechanisms of directional selectivity in simple cells of cat striate cortex. Proceedings of the National Academy of Sciences of the United States of America, 84, 8740–8744.
Reid, R. C. Victor, J. D. Shapley, R. M. (1997). The use of m-sequences in the analysis of visual neurons: Linear receptive field properties. Visual Neuroscience, 14, 1015–1027.
Ringach, D. L. Sapiro, G. Shapley, R. (1997). A subspace reverse-correlation technique for the study of visual neurons. Vision Research, 37, 2455–2464.
Ringach, D. L. (1998). Tuning of orientation detectors in human vision. Vision Research, 38, 963–972.
Tolhurst, D. J. Heeger, D. J. (1997). Comparison of contrast-normalization and threshold models of the responses of simple cells in cat striate cortex. Visual Neuroscience, 14, 293–309.
Figure 1
 
The kernel used for the psychophysical task. a. The kernel used in the simulation. Red regions indicate areas where increases in luminance would result in an increase in the probability of reporting the x1 stimulus. Blue regions indicate areas where increases in luminance would result in a decrease in that probability. b–c. The estimate of the kernel after 10,000 (b) and 100,000 (c) trials.
Figure 2
 
Results from the psychophysical simulations. a. Estimate of the nonlinearity after 500 trials. The solid green line is the estimated error function nonlinearity using the moment method, and the dashed blue line is the error function nonlinearity actually used in the simulations. Red diamonds show the nonlinearity calculated from the LR method. Error bars are one standard error and reflect error only from the second half of the reconstruction. Parameters obtained with error function estimate: rmax = 1, y0 = 0.29, and ɛ = 0.67. b. Estimates of the nonlinearity after 2400 trials. Estimated parameters: rmax = 1, y0 = 0.43, and ɛ = 0.97. c. Estimates of the nonlinearity after 75,000 trials. The estimated nonlinearity nearly coincides with the simulated nonlinearity and is barely visible. Estimated parameters: rmax = 1, y0 = 0.502, and ɛ = 1.02. d. Rate of convergence of the moment method to the simulated parameters. The relative errors are averages over 60 simulations. Relative errors are less than 10% after 2500 trials.
Figure 3
 
Estimates of a time slice of the linear kernel. a. Time slice of the simulated kernel at 60 ms. b–c. Estimates of the time slice at 60 ms calculated after 30 simulated minutes (b) and 8 simulated hours (c) of data collection.
Figure 4
 
Nonlinearity calculations from the simple-cell simulation. a. Estimated nonlinearity after 15 minutes of simulated time. The solid green line is the power-law estimate of the nonlinearity and the dashed blue line is the nonlinearity used in the simulation. The error bars of the LR (red) and CLR (black) represent one standard error and were calculated from the second step of the reconstruction. Parameters from the estimated nonlinearity: A = 0.00959 and β = 2.08. b. Estimated nonlinearity after 1 hour of simulated time. Estimated parameters: A = 0.00985 and β = 2.02. c. Estimated nonlinearity after 8 hours of simulated time. Estimated parameters: A = 0.00995 and β = 2.002. d. Rate of convergence of the moment method to the simulated parameters. Relative errors are less than 10% after 10 minutes of simulated time. In panels a–c, the y axis is in units of spikes per ms.
Figure 5
 
a. Estimated error function (red line), power law (green line), and Naka-Rushton (cyan line) nonlinearities to one simulated hour of data generated by a power law function (dashed blue line). The y axis is in units of spikes per ms. b. Consequences of a noisy estimate of w on the shape of the nonlinearity obtained by the CLR method. The dotted line represents the mean output rate. As expected, the estimate lies between this line and the simulated nonlinearity. Data are from 15 minutes of simulated time.
Figure 6
 
Rates of convergence of the methods to estimate the nonlinearity parameters. Relative errors in the moment method (green line), LR method (red line), and CLR method (black line) are plotted versus simulation length. a. Relative error of fitted error function parameters in the psychophysical task simulation. The maximum of the relative error in y0 and the relative error in ɛ is shown. Relative errors are less than 10% after 2500 trials for the moment method and after 40,000 to 50,000 trials for the LR and CLR methods. Results are from the same simulations used for Figure 2d. b. Relative error of fitted power law parameters in the simple cell simulation. The maximum of the relative error in A and the relative error in β is shown. Relative errors are less than 10% after 10 minutes of simulated time for the moment method and after 8 hours for the CLR method. The LR results were not fit well by a power law after less than 2 hours of simulated time; thus, the relative error measurements for those simulations have little meaning. Even after 8 hours, the relative error for the LR method was around 30%. Results are from the same simulations used for Figure 4d.