Reverse correlation techniques have been extensively used in physiology (Marmarelis & Marmarelis, 1978; Sakai, Naka, & Korenberg, 1988), allowing characterization of both linear and nonlinear aspects of neuronal processing (e.g., Emerson, Bergen, & Adelson, 1992; Emerson & Citron, 1992). Over the past decades, Ahumada and colleagues developed a psychophysical reverse correlation technique, termed noise image classification (NIC), for deriving the linear properties of sensory filters, first in audition (Ahumada, 1967; Ahumada, Marken, & Sandusky, 1975) and then in vision (Ahumada, 1996, 2002; Beard & Ahumada, 1998). This work explores ways of characterizing nonlinear aspects of psychophysical filters. One approach is an extension of the NIC technique (ExtNIC), whereby second-order (rather than just first-order) statistics of the classified noise are used to derive sensory kernels. It is shown that, under some conditions, this procedure yields a good estimate of second-order kernels. A second, different approach is also considered. This method uses functional minimization (fMin) to generate kernels that best simulate psychophysical responses for a given set of stimuli. Advantages and disadvantages of the two approaches are discussed. A mathematical appendix establishes the following: (1) nonlinearities affect the linear estimate (particularly target-present averages) obtained from the NIC method, providing a rationale for some related observations made by Ahumada (1967); (2) for a linear filter followed by a static nonlinearity (LN system), the ExtNIC estimate of the second-order nonlinear kernel is correctly null, provided the criterion is unbiased; (3) for a biased criterion, such an estimate may contain predictable modulations related to the linear filter; and (4) under certain assumptions and conditions, ExtNIC does return a correct estimate of the second-order nonlinear kernel.

*I*, such as a visual stimulus, into a binary number:

*o* = *thr*_{c}(*F*(*I*)),   (1)

where *o* is 0 for “no” and 1 for “yes,” *I* is an input function defined over an *n*-dimensional space (e.g., *x*, *y*, *t*), *F* is a functional mapping *I* into a real number, and *thr*_{c}(*x*) is a function that returns 0 when *x* < *c*, and 1 when *x* ≥ *c*. This work considers only yes/no tasks (a brief explanation of how the considerations made here can be extended to two-alternative forced-choice [2AFC] tasks is given in the next section). To simplify notation, *I* is defined over one dimension only (*x*).

**L**_{n} are the system’s kernels, and *V*_{n} are outcomes of filtering stimulus *I* with these kernels.^{1} Volterra kernels are intrinsically symmetric. Volterra showed that this representation applies to nonlinear systems that are time-invariant, with finite memory, and analytic (Volterra, 1959).

**L**_{2}, **L**_{3}, …, **L**_{n}) to characterize it. This work will focus only on second-order kernels, but can be easily extended to higher orders.

*L*_{2} dictates how pairs of input pulses interact in the system, that is, whether pairs of inputs need to covary positively or negatively (or not at all) in order to drive a positive response from the system. The natural extension of the NIC technique to deriving this nonlinear kernel is to compute, instead of the first-order statistics associated with the classified noise, its second-order statistics (e.g., Marmarelis & Marmarelis, 1978), in terms of second-order moments or covariance matrices. In Section 2 and in the “Appendix,” it is shown that this method does return a good estimate of psychophysical second-order kernels.
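To make the second-order procedure concrete, the sketch below (illustrative Python, not code from this study; the function name and the sign convention combining “yes” and “no” classes are my assumptions, chosen to mirror the first-order rule) computes class-conditional covariance matrices of the classified noise and combines them into a candidate second-order estimate:

```python
import numpy as np

def second_order_classification(noise, s, o):
    """Sketch of an ExtNIC-style second-order estimate.

    noise : (trials, Dim) array of noise images
    s, o  : (trials,) binary arrays -- stimulus (0 noise-only, 1 target+noise)
            and response (0 "no", 1 "yes")
    Returns a (Dim, Dim) matrix combining class-conditional covariances.
    The +/- combination across classes mirrors the first-order rule and
    may differ in detail from the paper's Equation 5.
    """
    dim = noise.shape[1]
    est = np.zeros((dim, dim))
    for s_val in (0, 1):
        for o_val in (0, 1):
            cls = noise[(s == s_val) & (o == o_val)]   # one stimulus-response class
            sign = 1.0 if o_val == 1 else -1.0         # "yes" classes add, "no" subtract
            est += sign * np.cov(cls, rowvar=False)
    return est
```

Comparing the individual `np.cov` terms against the baseline noise covariance, class by class, corresponds to inspecting the cov^{[s,o]} matrices individually.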

*I*_{i} (where *i* refers to the *i*^{th} trial), the difference between the experimentally determined output sequence *o*_{i}^{exp} and one (*o*_{i}) computed using an equation similar to (1). The minimization is first carried out for a linear system, determining the linear kernel *W*_{1} that accounts for most of the output. The system in Equation 1 is then upgraded to second order, and the best second-order kernel *W*_{2} is similarly determined. In general (i.e., when input is arbitrary), every time the system is upgraded to a higher order, one has to correct for the fit deriving from lower orders (this is explained in Section 3).

*I*_{i}^{[s,o]} is the *i*^{th} stimulus image associated with stimulus *s* (where *s* = 0 is noise-only, *s* = 1 is target+noise) and response *o*.

**L**_{1}^{est} of the system’s linear kernel *L*_{1} is (Ahumada, 1996):

*L*_{1}^{est} = (*L*_{1}^{[0,1]} + *L*_{1}^{[1,1]}) − (*L*_{1}^{[0,0]} + *L*_{1}^{[1,0]}),   (3)

where *L*_{1}^{[s,o]} = *E*(*n*_{i}^{[s,o]}) (*E*(*x*) being the expectation of *x* across trials). The “Appendix” provides a formal derivation of how this estimate (target-present averages in particular) is affected by a second-order kernel **L**_{2}.
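For illustration, the first-order estimate can be computed directly from the four class means. In this sketch (illustrative Python; the unweighted “yes”-minus-“no” combination is assumed and may differ in detail from Equation 3), `noise` holds one noise image per row:

```python
import numpy as np

def linear_classification_image(noise, s, o):
    """First-order NIC estimate (Ahumada-style): average the noise images
    within each of the four stimulus-response classes, then combine the
    "yes" class means minus the "no" class means."""
    mean = {(sv, ov): noise[(s == sv) & (o == ov)].mean(axis=0)
            for sv in (0, 1) for ov in (0, 1)}
    return (mean[(0, 1)] + mean[(1, 1)]) - (mean[(0, 0)] + mean[(1, 0)])
```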

*L*_{2}^{est} is correctly null.

**L**_{1} can create spurious modulations in **L**_{2}^{est} when the criterion is biased, and these modulations are of the type **L**_{1}(ν)·**L**_{1}(ξ) (see “Appendix”). Covariance partly compensates for these effects by subtracting a similar term (this relates to the difference between solid squares and dashed line in the top right panel of Figure 3b; see Section 4).

*L*_{2}^{[s,o]} images for the different stimulus-response classes is, therefore, informative as to whether the final modulations observed in *L*_{2}^{est} are possibly due to LN filtering or not. For example, a positive modulation (with respect to baseline noise variance) along the diagonal in an individual response class (in cov^{[s,o]}) cannot be due to LN filtering, and would require further investigation.

*L*_{2}^{est} that are not localized at the level of **L**_{1}(ν)·**L**_{1}(ξ), as those too cannot be due to repercussions from LN filtering. In general, it is important that the experimenter examines the *L*_{2}^{[s,o]}’s and *L*_{1}^{[s,o]}’s carefully and decides whether, in the specific context of the experiment being carried out, these are informative.

*c*. Noise image intensities were uniformly distributed around 0 (spanning −1 to 1), the size of images was 7, and the target was the vector [0 0 0 1 0 0 0] (added on half trials). A total of 5,000 trials were run.
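A simulation of this kind can be sketched as follows (Python rather than the Matlab used in the study; the simulated observer is reduced here to a purely linear kernel matched to the target, an assumption made only to keep the example short, whereas the systems simulated in this work also contained a second-order kernel):

```python
import numpy as np

def simulate_yes_no(n_trials=5000, c=0.0, seed=1):
    """Simulate the yes/no experiment described above: uniform noise in
    [-1, 1], 7-sample images, target [0 0 0 1 0 0 0] added on half the
    trials. The observer is an assumed linear model: response "yes" when
    the cross-correlation of the stimulus with L1 exceeds criterion c."""
    rng = np.random.default_rng(seed)
    target = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float)
    L1 = target.copy()                       # assumed linear kernel
    s = rng.integers(0, 2, n_trials)         # 0 noise-only, 1 target+noise
    noise = rng.uniform(-1, 1, (n_trials, 7))
    stimuli = noise + s[:, None] * target
    r = stimuli @ L1                         # F(I): cross-correlation with L1
    o = (r >= c).astype(int)                 # thr_c
    return noise, s, o
```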

*I*_{i} is the *i*^{th} input image, and *o*_{i}^{exp} the *i*^{th} response from the observer (0 for “no,” 1 for “yes”). The fMin method works by computing the best linear kernel first, using a Volterra representation of the system that extends only up to the linear order:

*o*_{i} = *thr*_{c}(Σ_{j} *W*_{1}(j)·*I*_{i}(j)).   (6)

*min*_{NoTrialsPerBlock} is the best minimization obtained by allowing criterion *c* in Equation 6 to vary every *NoTrialsPerBlock* number of trials, and *D*(*x*_{i}, *y*_{i}) for *i* ranging from 1 to *n* (and *x* and *y* binary) is Σ_{i=1}^{n} (*x*_{i} ≠ *y*_{i}), where (*x*_{i} ≠ *y*_{i}) returns 1 if *x*_{i} ≠ *y*_{i} and 0 if *x*_{i} = *y*_{i} (the choices of criterion variation and distance measure *D* offered here are not the only possible ones; e.g., one may allow *c* to vary according to a prespecified probability distribution without optimizing its value every *NoTrialsPerBlock* number of trials, and *D* could be chosen to be the sum of square differences).

*o*_{i} matches *o*_{i}^{exp} will be affected by the choice of *c* in Equation 6. It seems reasonable that *c* should be allowed to vary, as is the case in the psychophysics (over a large number of trials, it is expected that observers’ criteria fluctuate a bit). How often this should happen will depend on the specifics of the experiment.

*c* should not be allowed to vary too often: in the extreme case of *NoTrialsPerBlock* = 1, minimizing Equation 6 becomes meaningless, as on each trial *i* there will always be a value for *c* that returns *o*_{i} = *o*_{i}^{exp}. A reasonable value for *NoTrialsPerBlock* could be, for example, 200. This means that the minimization in Equation 7 would be carried out computing the threshold step in Equation 6 for an optimal choice of *c* every 200 trials.
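For a fixed candidate kernel, the blockwise part of the minimization can be sketched directly: given the unthresholded decision variables *r*_{i} and the observed binary responses, the optimal *c* within each block can be found by scanning the sorted *r* values (the function name and the scanning scheme are illustrative, not the implementation used in the study):

```python
import numpy as np

def blockwise_distance(r, o_exp, block=200):
    """Sketch of min_{NoTrialsPerBlock}: for each block of `block` trials,
    pick the criterion c minimizing the number of mismatches between
    thr_c(r_i) and the observed responses o_i^exp, then sum the per-block
    minima (D is the Hamming distance)."""
    total = 0
    for start in range(0, len(r), block):
        rb, ob = r[start:start + block], o_exp[start:start + block]
        # candidate criteria: below the smallest r (all "yes"), then just
        # above each r value (progressively more "no" responses)
        candidates = np.concatenate(([rb.min() - 1], np.sort(rb) + 1e-12))
        total += min(np.sum((rb >= c).astype(int) != ob) for c in candidates)
    return total
```

An outer search over the kernel weights would then minimize this blockwise distance.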

*W*_{1} has already been computed; what needs to be determined are an adjusting linear kernel *W*_{1}^{adj}, and a second-order kernel *W*_{2} (Victor, 1992): these are obtained adopting the same minimization procedure as before. The best approximating second-order kernel *W*_{2} is the second-order kernel *L*_{2} that minimizes the difference between *o*_{i}^{exp} and *o*_{i}:

**L**_{2}, one also has to minimize for *L*_{1}^{adj}, obtaining both *W*_{2} and *W*_{1}^{adj}. The best approximating first-order kernel, however, is still the one computed before, *W*_{1}, not the correcting one (*W*_{1}^{adj}) introduced here (see below for an explanation).

*y* = *f*(*x*) within a specified interval [*a*, *b*] using a polynomial expansion of *x*, *f**(*x*) = Σ_{n} *a*_{n}·*x*^{n}; say *f*(*x*) = *x*^{2}, and the interval [0,1]. If we approximate this with a zeroth-order expansion *f** = *a*_{0}, the best *a*_{0} (in the least mean square difference sense) is *w*_{0} = 1/3. At the first order, *f**(*x*) = *a*_{0} + *a*_{1}·*x*, we have that the best approximation is for *a*_{0} = −1/6, and *a*_{1} = 1. We write this as *f**(*x*) = *w*_{0} + *w*_{0,1}^{adj} + *w*_{1}·*x*, with *w*_{0,1}^{adj} = −1/2 being the first-order-related correction for the zeroth-order term (so that *w*_{0} + *w*_{0,1}^{adj} = −1/6), and *w*_{1} = 1. In other words, we need to adjust our best estimate for the zeroth-order term as we introduce a linear term. At the second order, we can achieve a perfect fit using *f**(*x*) = *x*^{2}, so we need to cancel out both zeroth- and first-order terms: *f**(*x*) = *w*_{0} + *w*_{0,1}^{adj} + *w*_{0,2}^{adj} + (*w*_{1} + *w*_{1,2}^{adj})·*x* + *w*_{2}·*x*^{2}, with *w*_{0,2}^{adj} = 1/6, *w*_{1,2}^{adj} = −*w*_{1}, and *w*_{2} = 1. From here, the reader can see that *w*_{n} is the best approximating coefficient of order *n* for the *n*^{th}-order approximation. This is also what the best approximating kernels *W*_{n}’s are. The reader is referred to Victor, 1992, where this method is presented as an abstract formulation of the Wiener approach to nonlinear system representation.
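The numbers in this example are easy to check numerically. The helper below (mine, not from Victor, 1992) solves the least-squares normal equations for approximating *f*(*x*) = *x*^{2} on [0, 1] with polynomials of increasing order, recovering 1/3 at order zero, (−1/6, 1) at order one, and the exact fit at order two:

```python
import numpy as np

def best_poly_coeffs(order):
    """Least-squares coefficients (a_0 .. a_order) approximating
    f(x) = x**2 on [0, 1], via the normal equations with exact monomial
    moments: int_0^1 x^(i+j) dx = 1/(i+j+1)."""
    n = order + 1
    M = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    b = np.array([1.0 / (i + 3) for i in range(n)])  # int_0^1 x^i * x^2 dx
    return np.linalg.solve(M, b)
```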

*a*, *b*] is symmetric (*a* = −*b*), then *w*_{1,2}^{adj} = 0; that is, there is no linear correction term when upgrading to second order, as first- and second-order terms are orthogonal (i.e., ∫_{a}^{b} *x*·*x*^{2} d*x* = 0). In the examples considered in this work, input noise is indeed symmetric, so *W*_{1}^{adj} = 0 as in Figure 4. However, the formulation has been kept general to include cases in which the input is not constrained (e.g., one can think of applications in which the input consists of a limited set of images from medical reports).

*NoTrialsPerBlock* = 200). A psychophysical experiment was simulated as in the previous section, for a system of known Volterra kernels (shown in a and d). b shows *W*_{1} for this system, and e plots *W*_{2} (minimizations were carried out using standard Matlab routines); the thin trace in b shows *W*_{1}^{adj}. Both kernel estimates are quite good (as in the previous section, correlations for the two kernels [shown in c and f] are high). It must be pointed out that the validity of this procedure is independent of the particular structure that was selected for the nonlinear system used in these simulations (i.e., the system does not have to be literally implementing a Volterra expansion; this was done here only for illustrative purposes).

*x* axis). Criterion bias is in units of half the difference between the mean response to signal+noise and that to noise alone; 0 is for a criterion that is halfway between the mean responses. Values greater than 1 and smaller than −1 refer to very conservative and very lax criteria, respectively. Again, the two approaches overall perform very similarly. The main difference is that for a very nonlinear system, fMin recovers the linear filter slightly better than ExtNIC at unbiased criterion, but performs more poorly for biased criteria. The linear estimate from ExtNIC is improved by using false alarms only (*L*_{1}^{[0,1]}, solid line). The extreme case of a very nonlinear system shown in the left panels is very unlikely; more likely scenarios are those in the middle and right plots. Both techniques work well for these cases, and are reasonably robust to criterion bias. ExtNIC estimates obtained using second-order moments (Equation 4) were very similar to those obtained using covariance (Equation 5), except in the estimate of the nonlinear kernel for a very linear system (dotted line). Using covariance is slightly more robust to criterion bias, as anticipated in Section 2.

*NoTrialsPerBlock* = 200). Clearly, there is no way here to assess the goodness of the derived kernels as was done in Figure 3, because we do not have direct knowledge of the system’s structure.

*W*_{2}.

*statistically significant* contribution of second-order nonlinearity). The measure taken above (what percentage of psychophysical responses is correctly predicted by Equation 1) applies only to the threshold condition explored in the experiment. That is, one cannot predict, from the estimate at threshold, what the importance of nonlinear behavior in the system would be in suprathreshold conditions. It is possible that a nonlinear mechanism that is being exposed at threshold plays a more important role at suprathreshold levels: the fact that it is being studied at threshold has to do only with the technical requirements that are necessary to expose the mechanism, but this does not mean that the mechanism is being studied in its “ecological” regime for signal-to-noise ratio. In light of these considerations, it becomes very important to devote effort to the characterization of nonlinear mechanisms, even if their impact on performance may be small (but nonetheless significant) within the limited threshold range explored in the experiment.

*Z* scores were computed using this technique for panels on the left (ExtNIC); for fMin kernels, they were estimated using the Hessian matrix at the minimum.

^{1}This is a simplified description of the Volterra expansion. The original formulation involves *convolution* between **L**_{i} and *I* to obtain *V*_{i}, so that the dimensionality of *V*_{i} is equal to that of the input, *I*. In Equation 2, *V*_{i} are scalars, because it is simply the *cross-correlation* between **L**_{i} and *I* that is being computed. Whether this simplified representation is applicable or not depends on the specific context being studied. It incorporates the assumption that the nervous system is basing its behavioral decisions on outputs from the most sensitive mechanism(s). A convolution operation can be thought of as extended sampling by a bank of linear filters; of all these filters, only those sampling the region in the vicinity of the target will be used to drive behavior, as those provide useful information for the task at hand (i.e., they are the most sensitive to the target). Equation 2 refers only to these mechanisms. Another (more practical) reason for adopting this simplified representation is that it conforms to Ahumada’s formulation.

- for a linear filter (**L**_{2} = 0) followed by a static nonlinearity (LN system), *L*_{1}^{est} (Equation 3) provides a good estimate of the first-order kernel **L**_{1} (Equation 16; this is in line with Ahumada’s [2002] demonstration of the classical Bussgang result [Bussgang, 1952]);
- the target-present average noise images *L*_{1}^{[1,o]} in Equation 3 (i.e., those for hits and misses) are affected, for a system with nonzero second-order kernel **L**_{2}, by “pollution” from this nonlinear term (Equation 15), providing an explanation for some observations made by Ahumada in relation to target-present averages (Ahumada, 1967);
- for an LN system and unbiased criterion, *L*_{2}^{est} = 0 (Equation 17); this result is central to this “Appendix”; however,
- when the criterion is biased, *L*_{2}^{est} (Equation 4) may contain modulations related to *L*_{1} by a *L*_{1}(ν)·*L*_{1}(ξ) term (Equation 18);

**L**_{i} = 0 for *i* > 2), and input space is one-dimensional (this is the system considered in all the simulations in this work). Its (unthresholded) output on trial *i* is:

*r*_{i}^{[s]} = Σ_{j=1}^{Dim} **L**_{1}(j)·*I*_{i}^{[s]}(j) + Σ_{j,k=1}^{Dim} **L**_{2}(j,k)·*I*_{i}^{[s]}(j)·*I*_{i}^{[s]}(k),

where *s* is the stimulus configuration (*s* = 0 is noise only, *s* = 1 is target+noise), *Dim* is the size of input space (equal to filter size), and:

*I*_{i}^{[s]}(j) = *n*_{i}(j) + *s*·*t*(j),

where *n*_{i}(j) is the noise image on trial *i*, and *t*(j) is the function defining the target. For Gaussian noise *n*:

*E*{*n*(ν_{1})·*n*(ν_{2})⋯*n*(ν_{2M})} = ΣП *E*{*n*(ν_{i})·*n*(ν_{j})},

where the symbol ΣП stands for summation over all distinct ways of partitioning the 2*M* random variables into products of averages of pairs (Schetzen, 1980).
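The ΣП rule can be made concrete by enumerating the pair partitions explicitly. The sketch below (function names are mine) evaluates the Gaussian moment formula for an arbitrary covariance matrix; for 2*M* = 4 unit-variance variables it recovers the familiar *E*{*n*^{4}} = 3:

```python
import numpy as np

def pair_partitions(idx):
    """Yield all distinct ways of splitting an even-length index tuple
    into unordered pairs (the partitions summed over by the Sigma-Pi rule)."""
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for k in range(len(rest)):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, rest[k])] + tail

def isserlis(cov, idx):
    """E{n(i1)...n(i2M)} for zero-mean Gaussian noise: sum over all pair
    partitions of products of second moments (Schetzen, 1980)."""
    return sum(np.prod([cov[i, j] for i, j in part])
               for part in pair_partitions(idx))
```

For 2*M* variables the number of such partitions is (2*M* − 1)·(2*M* − 3)⋯1, which grows quickly; this is why the appendix keeps to second-order kernels.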

*L*_{1}^{est}, the estimate for the linear kernel *L*_{1}. Average noise images for the four stimulus-response classes are indicated by *L*_{1}^{[s,o]}, where *s* is, as above, the target (0, absent; 1, present) and *o* is the response (e.g., the average noise image for the false alarms is *L*_{1}^{[0,1]}).

*L*_{1}^{[s,1]}, as computed using Ahumada’s method, can be written as: where *p*_{i}^{[s,1]} is the probability that trial *i* will be of type [*s*,1], and *p*_{i}^{[s,1]} = *g*(*r*_{i}^{[s]}), where *g* is a static nonlinear function that maps the output response from the system onto the probability of psychophysical response “yes.” We now expand *g* up to its second-order Taylor term around the mean response *r̄*^{[s]} (implying that we are assuming *g* to have continuous derivatives up to order 3). Equation 13 can then be written as Equation 14. Substituting Equations 9 to 12 into Equation 14, this becomes Equation 15, and a corresponding expression follows from Equation 13. Equation 3 can then be rewritten in these terms. Let us verify something very familiar: for **L**_{2} = 0, this reduces to Equation 16. Figure 5 is a useful tool for thinking about *g*. This nonlinearity is assumed to be approximately odd-symmetric around its midpoint. As a matter of fact, most decisional transducers enjoy this symmetry (e.g., a noisy threshold belongs to this category; see Nykamp & Ringach, 2002, for more examples). When the criterion is unbiased, the two regions of *g* that map the target-absent and target-present mean responses onto their respective “yes” probabilities are mirror-symmetric, so that the corresponding Taylor terms are related by this symmetry. This means that, for an unbiased criterion, the bias-related terms cancel. Despite lack of bias in the criterion, *L*_{1}^{est} is still affected by terms that depend on the interaction between **L**_{2} and the target. These terms come, of course, from target-present averages. This result may be related to the observation made by Ahumada (1967) that for a system pooling nonlinearly (e.g., max rule) from a bank of linear filters, averages for noise-only trials would return the average of the bank (i.e., the linear part of the process), whereas averages from target+noise trials would return something heavily affected by the target shape. In fact, Equation 15 shows that when the target is present (*s* = 1), there are extra terms that involve **L**_{2}·*t*; for a nonlinear system (**L**_{2} ≠ 0), these terms affect the target-present averages *L*_{1}^{[1,1]} and *L*_{1}^{[1,0]}, making *L*_{1}^{est} (the rewritten Equation 3) depart from Equation 16.

*L*_{1}^{[s,1]}, one can show that: where *δ*(*x*) is the Dirac delta function (*δ*(*x*) = 0 for *x* ≠ 0, and ∫ *δ*(*x*) d*x* = 1). We can derive the estimate for **L**_{2} according to Equation 4:

*g* to orders higher than the second. This means that if an experiment is carried out at unbiased criterion and *L*_{2}^{est} ≠ 0, then the system cannot be modeled as LN (which is the most widely used model in vision). In other words, the effect of the nonlinear decisional step *g* on *L*_{2}^{est} can be neutralized by having a balanced criterion (provided *g* enjoys approximate odd-symmetry).

*K*’s are constants (for a given overall criterion bias). The last term is the reason for using Equation 5 rather than 4: computing covariance partly compensates for the **L**_{1}(*ν*)·**L**_{1}(*ξ*) term in the estimate above.

*δ*(*ν* − *ξ*)·*constant*. If the second-order term in the expansion of *g* is sizeable, then the term *B* appears in *L*_{2}^{est}, whether the criterion is biased or not. However, the simulations in Section 4 show that this term is not particularly disruptive of *L*_{2}^{est}.