Research Article  |   November 2009
Nonlinear characterization of a simple process in human vision
Journal of Vision November 2009, Vol.9, 1. doi:10.1167/9.12.1

      Peter Neri; Nonlinear characterization of a simple process in human vision. Journal of Vision 2009;9(12):1. doi: 10.1167/9.12.1.



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Perceptual processes are often modeled as linear filters followed by a decisional rule. This simple model is central to the understanding of visual processing in humans. Its scope may be extended to capture a wider range of behaviors by the addition of nonlinear operators or kernels, but there is no evidence in human sensory processing that these operators are able to enhance the linear description. We focused on a simple process in human vision, the perception of brightness in a center-surround annular stimulus. We used psychophysical reverse correlation to fully characterize this process up to its second-order nonlinearity. The resulting characterization was then used to reconstruct/predict individual responses by the human observers, a process that was significantly enhanced by the addition of the nonlinear kernels. These results provide direct evidence that behavioral second-order kernels can be successfully derived using reverse correlation, and furthermore that they can be effectively exploited to simulate human vision. We show that the former result does not imply the latter by performing a second series of experiments involving orientation-defined textures, for which no measurable benefit was gained from the addition of second-order kernels.

Introduction
The most common description for the response of a human observer performing a behavioral task is $r = g[\mathbf{h}_1 \cdot \mathbf{i}]$, where the linear filter $\mathbf{h}_1$ is applied to the input stimulus $\mathbf{i}$ by template matching ($\cdot$, the dot product), and the output of the filter is converted into a psychophysical decision $r$ (e.g. 0 for incorrect and 1 for correct) by a nonlinear transducer $g$ (boldface refers to a vector, e.g. $\mathbf{h}_1$, $\mathbf{i}$; capital letters refer to matrices (introduced below), e.g. $\mathbf{H}_2$). The expression for $r$ above is known as a linear-nonlinear (LN) cascade model and has played a foundational role in thinking about early vision (Jung & Spillmann, 1970; Neri & Levi, 2006). $\mathbf{h}_1$ can be recovered via reverse correlation (Marmarelis & Marmarelis, 1978), a well-known result that has been exploited in various domains of sensory neuroscience (Victor, 2005; Wu, David, & Gallant, 2006). More specifically in relation to the present study, $\mathbf{h}_1$ can be recovered for human observers using a psychophysical variant of reverse correlation known as noise image classification (Ahumada, 2002; Ahumada & Lovell, 1971). The most general expression for $r$ contains convolution ($*$) rather than template matching ($\cdot$) (Bracewell, 1965; Schetzen, 1980). However, this distinction is immaterial when the output is a scalar (the reader may consult the relevant section in Supplementary section 1 for clarification, but may also skip or postpone it without loss of continuity).
Because in the current study we used a two-alternative forced choice (2AFC) paradigm, there are two stimuli on every trial: the target stimulus $\mathbf{i}_{[1]}$ and the non-target stimulus $\mathbf{i}_{[0]}$. Each stimulus is filtered by $\mathbf{h}_1$, and the final response is based on the difference between the outputs of the filter to the two stimuli, $\mathbf{h}_1 \cdot \mathbf{i}_{[1]} - \mathbf{h}_1 \cdot \mathbf{i}_{[0]} = \mathbf{h}_1 \cdot (\mathbf{i}_{[1]} - \mathbf{i}_{[0]})$ (we assume no internal noise for the purpose of introducing these concepts; the role of internal noise is addressed later in the article). In the expression $r = g(\mathbf{h}_1 \cdot \mathbf{i})$ this means that $\mathbf{i} = \mathbf{i}_{[1]} - \mathbf{i}_{[0]}$ and $g(x)$ is the Heaviside unit step function: this rule responds 0 (incorrect response) if the non-target stimulus elicited a larger response than the target stimulus, 1 (correct response) if the target stimulus elicited a larger response than the non-target stimulus, and randomly otherwise. Although this description is not technically linear due to the presence of the static nonlinearity $g$, it is the closest linear approximation to a human observer performing a 2AFC task (see also Abbey & Eckstein, 2002).
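The 2AFC decision rule just described can be sketched in a few lines of NumPy. This is an illustrative noiseless LN observer only; the function names are hypothetical, stimuli are assumed to be 1-D arrays with one entry per frame, and ties are resolved by a coin flip as stated in the text.

```python
import numpy as np


def step_transducer(x, rng):
    """Heaviside decision g: 1 (correct) if x > 0, 0 (incorrect) if x < 0, coin flip if x == 0."""
    if x > 0:
        return 1
    if x < 0:
        return 0
    return int(rng.integers(0, 2))


def ln_2afc_response(h1, i_target, i_nontarget, rng=None):
    """Noiseless LN observer in 2AFC: r = g(h1 . (i[1] - i[0]))."""
    if rng is None:
        rng = np.random.default_rng()
    # Linearity: h1 . i[1] - h1 . i[0] equals h1 . (i[1] - i[0]).
    x = float(np.dot(h1, i_target - i_nontarget))
    return step_transducer(x, rng)
```

For example, with `h1 = [0, 1, 0]`, a target `[0, 2, 0]` and a non-target `[0, 1, 0]` the observer responds 1; swapping the two stimuli yields 0.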
Linear sensory filters describe how different epochs of the stimulus are individually weighted by the observer: $\mathbf{h}_1$ is the weighting function applied to every time value of the input $\mathbf{i}$. However, they do not specify how modulations at two different times may interact with each other (second-order interactions) to drive the observer's response. To include this class of nonstatic nonlinearities, the LN description can be extended by Volterra expansion (Korenberg & Hunter, 1996; Marmarelis, 2004; Marmarelis & Marmarelis, 1978; Volterra, 1930; Westwick & Kearney, 2003):
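$$ r = g\left[\mathbf{h}_1 \cdot (\mathbf{i}_{[1]} - \mathbf{i}_{[0]}) + \mathbf{H}_2 \cdot (\mathbf{I}_{[1]} - \mathbf{I}_{[0]})\right], \tag{1} $$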
where the matrix $\mathbf{H}_2$ is the second-order Volterra kernel applied to the outer product matrix $\mathbf{I} = \mathbf{i} \otimes \mathbf{i}$ by template matching along two dimensions (Frobenius inner product). As with the earlier expression, the template-matching formulation above is equivalent to the convolution-based Volterra formulation in the present context (see Supplementary section 1). Second-order kernels can be thought of as covariance matrices: they indicate whether two modulations at different times co-varied positively or negatively when the observer was likely to report the corresponding stimulus as target. Like the first-order kernel $\mathbf{h}_1$, the second-order kernel $\mathbf{H}_2$ can be derived via reverse correlation techniques (Marmarelis & Marmarelis, 1978; Neri, 2004; Schetzen, 1980) and has been explicitly recovered for a number of applications in sensory neuroscience (e.g. Emerson, Bergen, & Adelson, 1992; Naka, Sakai, & Ishii, 1988; Nemoto, Momose, Kiyosawa, Mori, & Mochizuki, 2004; Sakai, Wang, & Naka, 1995; Stark, 1969; Victor, Shapley, & Knight, 1977; Yeshurun, Wollberg, & Dyn, 1987). However, its properties have not been investigated nearly as extensively as those of the first-order kernel, and notably there is no comprehensive work on its applicability in the context of human sensory behavior.
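A small numerical check may help with the Frobenius notation in Equation 1: the Frobenius product of $\mathbf{H}_2$ with the outer product $\mathbf{i} \otimes \mathbf{i}$ is the quadratic form $\mathbf{i}^\top \mathbf{H}_2 \mathbf{i}$. The sketch below (hypothetical function names, NumPy arrays assumed) evaluates the argument of $g$ in Equation 1 this way.

```python
import numpy as np


def frobenius(A, B):
    """Frobenius inner product: sum of elementwise products of two matrices."""
    return float(np.sum(A * B))


def volterra_2afc_drive(h1, H2, i_target, i_nontarget):
    """Argument of g in Equation 1: h1.(i[1] - i[0]) + H2.(I[1] - I[0]), with I = outer(i, i)."""
    linear = float(np.dot(h1, i_target - i_nontarget))
    I1 = np.outer(i_target, i_target)
    I0 = np.outer(i_nontarget, i_nontarget)
    return linear + frobenius(H2, I1 - I0)
```

With $\mathbf{H}_2 = 0$ the drive reduces to the LN expression, which is a convenient sanity check.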
This study reports data from two series of experiments involving luminance-defined and texture-defined stimuli. In both cases the stimulus configuration was optimized to elicit perceptual responses that, based on previous literature, are expected to involve highly nonlinear phenomena. We derive first-order and second-order kernels and demonstrate that their overall structure is conserved across different observers. We then show that the system can be synthesized by incorporating the derived kernels into Equation 1. More specifically, Equation 1 is able to replicate human responses on individual trials with high accuracy, and the second-order kernels contribute substantially to this accuracy. For luminance-defined stimuli we found that Equation 1 captured all accountable variance in human responses. A substantial amount of supplementary material is appended to this article, mainly due to the large size of the data set and the inclusion of several topics from nonlinear systems analysis. Although a full understanding of this article requires the entirety of the material presented here, the main results and logic can be followed without reference to the Supplementary material.
Materials and methods
Visual stimuli and task
Luminance experiments
The stimulus was divided into center and surround (Figure 1A). All pixels within the center region (0.5 deg diameter) had equal luminance. Pixels in the surround had equal luminance within a diameter of 1.5 deg. Outside the 1.5 deg diameter, luminance varied smoothly to background luminance (37 cd/m²) following a Gaussian profile of standard deviation (SD) 0.4 deg (see magenta profile in Figure 1A). The luminance of both center and surround was manipulated independently every 20 ms, for a total duration of 260 ms (13 stimulus frames). Surround luminance was randomly modulated according to a Gaussian distribution with mean equal to background luminance and SD 3.3 cd/m². We refer to this modulation as $\mathbf{n}^{\mathrm{sur}}$ (blue trace in Figure 1B). The center was modulated in the same way according to $\mathbf{n}^{\mathrm{ctr}}$ (black trace in Figure 1B). On each trial, two independent stimuli like the one just described were presented on opposite sides of fixation at 5 deg eccentricity (distance between stimulus center and fixation point). One of the two stimuli contained a target signal $\mathbf{s}$, which was added to the center modulation, so that center luminance was $\mathbf{i}^{\mathrm{ctr}} = \mathbf{n}^{\mathrm{ctr}} + \mathbf{s}$, where $\mathbf{s}$ consisted of a 13.2 cd/m² (S1–S2) or 9.9 cd/m² (S3–S6) peak in the middle frame and 0 everywhere else (gray trace in Figure 1B). Observers were asked to indicate which side (left versus right) showed the stimulus containing a bright flash in the center by pressing a button, which triggered the next presentation after a random delay uniformly distributed between 150 and 350 ms. Before data collection they were shown noiseless versions of the stimulus as well as versions with small amounts of noise, to provide them with a clear visual rendition of the target to be detected. They were explicitly instructed to base their judgment on the appearance of the center alone and not on the surround. They received feedback (correct/incorrect) at the end of each trial. All observers found the task easy to learn and understand.
We tested 6 observers, all naive except S3 (author PN). Block length was 100 trials. At the end of each block, observers were informed of their percentage of correct responses for the last block and across all blocks. Number of trials (including half of the double-pass trials): 6150 (S1), 6100 (S2), 6000 (S3), 6000 (S4), 6000 (S5), 6000 (S6). Proportion correct: 0.78 (S1), 0.7 (S2), 0.73 (S3), 0.72 (S4), 0.68 (S5), 0.7 (S6). We chose a discrete-trial technique (similar to Neri & Heeger, 2002; Neri & Levi, 2008a) rather than a reaction-time (RT)-based technique with continuous presentation (similar to Ringach, 1998) because previous studies suggest that the former method can resolve relevant aspects of kernel temporal dynamics that are not resolved by the latter. More specifically, Tadin, Lappin, and Blake (2006) and Neri and Levi (2008a) used a discrete-trial technique that exposed an oscillatory pattern in human directional kernels on a scale of ∼100 ms. This pattern was not resolved by similar experiments using an RT-based technique (Busse, Katzner, Tillmann, & Treue, 2008). Similarly, the ability to resolve temporal dynamics in Ringach (1998) appeared relatively poor.
Figure 1
 
Visual stimuli used in the luminance (A–B) and texture (C–D) experiments. (A) The luminance stimulus consisted of a central region of uniform luminance (bright in the figure) surrounded by a region of uniform luminance out to the dashed circle (dark in the figure), progressively approaching background luminance outside the dashed circle following a Gaussian profile (see magenta cross-section at bottom). (B) The luminance of both center and surround (black and blue traces, respectively) was modulated by a random Gaussian process of equal mean and SD for the two regions (red profile to the right). When present, the target signal consisted of an additive modulation to the central frame in the sequence (gray trace). (C) The texture stimulus was similar to the luminance stimulus in that it was subdivided into center and surround regions, but the (locally averaged) luminance profile was constant across the stimulus (see magenta cross-section at bottom; light-magenta profile shows the contrast envelope). Instead, the orientation of the texture was manipulated smoothly between vertical and horizontal (see texture bar next to the y axis in D), so that positive modulations corresponded to more horizontally oriented textures and negative modulations to more vertically oriented textures. Aside from these differences, the texture stimuli contained very similar modulation patterns to those used for the luminance stimuli (compare panels B and D). See Materials and methods for more details.
Texture experiments
The visual stimulus was conceptually similar to the luminance stimulus (compare Figure 1D with 1B), except that the modulation consisted of a texture variation rather than a luminance variation. If $z$ is the texture modulation (plotted on the y axis in Figure 1D), ranging from −1 to 1, the associated texture image is

$$ T(z) = \int_{\theta_1(z)}^{\theta_2(z)} \left[ G(\theta) * A \right] \, d\theta, $$

where $\theta_1(z) = \pi z$ and $\theta_2(z) = \pi(2 - z)$ for $z > 0$, $\theta_1(z) = \pi(1/2 - z)$ and $\theta_2(z) = \pi(5/2 + z)$ for $z < 0$ ($\theta = \pi$ for horizontal), $A$ is a noise image consisting of a random, uniformly distributed luminance modulation for each pixel (range ±33 cd/m² around background luminance), and $G$ is a two-dimensional Gabor (cosine) filter of orientation $\theta$ (SDs of the Gaussian envelope 0.1 deg and 1 deg, orthogonal and parallel to $\theta$ respectively; carrier frequency 3.3 cycles/deg). Center (2.5 deg diameter) and surround were modulated independently. The spatial luminance profile for the texture surround was modulated by an envelope that was constant within 6 deg of diameter and decreased according to a Gaussian profile of 1.8 deg SD outside the 6 deg diameter (light-magenta profile in Figure 1C). On each trial, two independent stimuli were presented on opposite sides of fixation at 8 deg eccentricity (distance between stimulus center and fixation point). Observers were asked to indicate which side showed the stimulus containing a horizontally oriented texture flash in the center. Piloting showed that this task was significantly more difficult than the luminance task, so we had to change the parameters slightly to bring performance within a comfortably measurable range. Frame duration had to be increased to 40 ms, resulting in 7 frames in total so as to maintain a similar stimulus duration (280 ms). The SD of the Gaussian noise modulation was the same for center and surround (shown by the red trace in Figure 1D) but differed slightly across observers; this adjustment was necessary to allow different signal-to-noise ratios (SNR) for the different observers while ensuring that texture modulation would not exceed the maximum value afforded by the stimulus. Noise SD ($\sigma_N$) and signal amplitude were 0.15/0.55 (S1, S4 and S6), 0.125/0.625 (S2), and 0.2/0.4 (S3 and S5).
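The orientation limits above can be written as a short function. This sketch only reproduces the formulas for $\theta_1(z)$ and $\theta_2(z)$; the function name is hypothetical, and because the text gives formulas for $z > 0$ and $z < 0$ only, using the $z > 0$ branch at $z = 0$ is an assumption (both branches span a full $2\pi$ there). Working through the formulas shows that the integrated orientation range is $2\pi(1 - |z|)$, collapsing to horizontal ($\theta = \pi$) at $z = 1$ and to $\theta = 3\pi/2$ at $z = -1$.

```python
import math


def orientation_limits(z):
    """Integration limits (theta1, theta2), in radians, for texture modulation z in [-1, 1].

    theta = pi corresponds to horizontal. Assumption: z == 0 uses the z > 0 branch.
    """
    if not -1.0 <= z <= 1.0:
        raise ValueError("z must lie in [-1, 1]")
    if z >= 0:
        return math.pi * z, math.pi * (2.0 - z)
    return math.pi * (0.5 - z), math.pi * (2.5 + z)
```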
Number of trials (including half of the double-pass trials): 3500 (S1), 3300 (S2), 2150 (S3), 1000 (S4), 6200 (S5), 4200 (S6). Proportion correct: 0.66 (S1), 0.68 (S2), 0.66 (S3), 0.7 (S4), 0.6 (S5), 0.69 (S6).
Derivation of sensory kernels
Each trial is fully described by $\mathbf{n}^{p}_{[s,r]}$, where $p$ refers to either center ($p$ = ctr) or surround ($p$ = sur), $s$ refers to the noise modulation within the target-absent ($s = 0$) or target-present ($s = 1$) stimulus, and $r$ indicates whether the observer responded correctly ($r = 1$) or incorrectly ($r = 0$). Estimated first-order kernels were

$$ \hat{\mathbf{h}}^{p}_1 = \langle \mathbf{n}^{p}_{[1,1]} \rangle + \langle \mathbf{n}^{p}_{[0,0]} \rangle - \langle \mathbf{n}^{p}_{[1,0]} \rangle - \langle \mathbf{n}^{p}_{[0,1]} \rangle = \sum_{s,r} \Delta(s,r) \, \langle \mathbf{n}^{p}_{[s,r]} \rangle, $$

where $\langle\,\rangle$ is the average across trials of the indexed type and $\Delta(s,r) = 1$ for $s = r$, $\Delta(s,r) = -1$ for $s \neq r$ (Abbey & Eckstein, 2002; Ahumada, 2002). Similarly, estimated second-order kernels were

$$ \hat{\mathbf{H}}^{p}_2 = \sum_{s,r} \Delta(s,r) \, \mathrm{cov}\{\mathbf{n}^{p}_{[s,r]}, \mathbf{n}^{p}_{[s,r]}\}, $$

where cov{ } is the covariance matrix across trials of the indexed type (Neri, 2004). When kernels were derived from target-present images only, the same equations were applied for $s = 1$ only (rather than both 0 and 1); when derived from target-absent images only, $s = 0$. The center-surround second-order kernel $\hat{\mathbf{H}}^{\mathrm{ctr\text{-}sur}}_2$ was derived by computing covariance as $\mathrm{cov}\{\mathbf{n}^{\mathrm{ctr}}_{[s,r]}, \mathbf{n}^{\mathrm{sur}}_{[s,r]}\}$. For LN systems $\hat{\mathbf{h}}_1 \propto \mathbf{h}_1$ (Ahumada, 2002), in line with Bussgang's result (Bussgang, 1952), but this relationship is only approximately valid if $\mathbf{H}_2 \neq 0$ (Neri, 2004; see also Extra Figure 2). $\hat{\mathbf{H}}_2$ is featureless for the LN model provided the decisional transducer $g$ is symmetrical and unbiased (Neri, 2004). Both conditions are satisfied by the 2AFC paradigm used here (see also Extra Figure 1 for related simulations, and Supplementary section 2). This result differs from that obtained for standard applications of the Volterra-Wiener approach, for which it is known that the LN model generates a second-order kernel of structure $\mathbf{h}_1 \otimes \mathbf{h}_1$ (Chen, Ishii, & Suzumura, 1986; Hung & Stark, 1991; James, 1992; Marmarelis, 2004; Westwick & Kearney, 2003). The closest equivalent in the present context is offered by kernels derived from noise image sub-classes (target-present and target-absent), where this structure is observed (see Extra Figures 1F and 1J).
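The two estimators above can be sketched in NumPy for a single region $p$. This is a minimal illustration, not the author's analysis code; the array layout (one row per trial, one column per frame, separate arrays for the noise added to the target and non-target stimulus, `correct` coded 1/0) and the name `estimate_kernels` are assumptions. The center-surround kernel would instead use the cross-covariance between center and surround fields.

```python
import numpy as np


def estimate_kernels(n_tgt, n_non, correct):
    """First- and second-order kernel estimates for one stimulus region.

    n_tgt, n_non: (n_trials, n_frames) noise added to the target (s = 1) and
    non-target (s = 0) stimulus; correct: (n_trials,) array of 1 / 0.
    """
    n_frames = n_tgt.shape[1]
    h1 = np.zeros(n_frames)
    H2 = np.zeros((n_frames, n_frames))
    for s, fields in ((1, n_tgt), (0, n_non)):
        for r in (0, 1):
            sel = fields[correct == r]          # noise fields of class [s, r]
            delta = 1.0 if s == r else -1.0     # Delta(s, r)
            h1 += delta * sel.mean(axis=0)
            H2 += delta * np.cov(sel, rowvar=False)
    return h1, H2
```

Applied to responses from a simulated LN observer with internal noise, the recovered $\hat{\mathbf{h}}_1$ should closely resemble the simulated template, consistent with the Bussgang argument above.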
Computation of combined kernels (Figure 4)
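Each observer was characterized using 2 estimated first-order kernels, $\hat{\mathbf{h}}^{\mathrm{ctr}}_1$ and $\hat{\mathbf{h}}^{\mathrm{sur}}_1$, and 3 estimated second-order kernels, $\hat{\mathbf{H}}^{\mathrm{ctr}}_2$, $\hat{\mathbf{H}}^{\mathrm{sur}}_2$ and $\hat{\mathbf{H}}^{\mathrm{ctr\text{-}sur}}_2$ (see James, 1992, for an analogous characterization of large monopolar cells (LMC) in the fly, and Benardete & Kaplan, 1997, for primate P retinal ganglion cells). For some analyses we combined these into 1 first-order kernel $\hat{\mathbf{h}}^{\mathrm{comb}}_1 = \hat{\mathbf{h}}^{\mathrm{ctr}}_1 - \hat{\mathbf{h}}^{\mathrm{sur}}_1$ and 1 second-order kernel $\hat{\mathbf{h}}^{\mathrm{comb}}_2 = \mathrm{diag}(\hat{\mathbf{H}}^{\mathrm{ctr}}_2) + \mathrm{diag}(\hat{\mathbf{H}}^{\mathrm{sur}}_2) - \mathrm{diag}(\hat{\mathbf{H}}^{\mathrm{ctr\text{-}sur}}_2)$, where diag() takes the diagonal of the covariance matrix (the variance in the case of $\hat{\mathbf{H}}^{\mathrm{ctr}}_2$ and $\hat{\mathbf{H}}^{\mathrm{sur}}_2$). These combination rules are empirical, i.e. based on the observed symmetry of the kernels; they are not theoretically motivated.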
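The combination rules are simple enough to state directly in code. The sketch assumes the kernels are NumPy arrays (vectors for first-order, square matrices for second-order) and uses a hypothetical function name.

```python
import numpy as np


def combine_kernels(h1_ctr, h1_sur, H2_ctr, H2_sur, H2_ctr_sur):
    """Combined first-order kernel and combined second-order diagonal, as defined in the text."""
    h1_comb = h1_ctr - h1_sur
    # Diagonals: variances for ctr and sur, same-time cross-covariance for ctr-sur.
    h2_comb = np.diag(H2_ctr) + np.diag(H2_sur) - np.diag(H2_ctr_sur)
    return h1_comb, h2_comb
```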
Scalar metrics for assessing early/late kernel bias (Figure 5)
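Given a one-dimensional kernel $\hat{\mathbf{h}}$ (i.e. either $\hat{\mathbf{h}}_1$ or $\mathrm{diag}(\hat{\mathbf{H}}_2)$), the centroid (center of mass) was computed as $\langle t\,|\hat{h}| \rangle / \langle |\hat{h}| \rangle$, where $\langle\,\rangle$ is the average across time $t$ (Bracewell, 1965). The peak was computed as the time point corresponding to $\max(\hat{\mathbf{h}})$. For luminance kernels, the early/late ratio was computed as $\langle \hat{h}^2_{t<130\,\mathrm{ms}} \rangle / \langle \hat{h}^2_{t>130\,\mathrm{ms}} \rangle$, i.e. by taking the modulation ratio between points within the time window indicated by the top blue line in Figure 4A and points within the time window indicated by the top yellow line in Figure 4A. The edges/center ratio was computed as $\langle \hat{h}^2_{t<70\,\mathrm{ms} \,\vee\, t>190\,\mathrm{ms}} \rangle / \langle \hat{h}^2_{50\,\mathrm{ms}<t<210\,\mathrm{ms}} \rangle$, i.e. by taking the modulation ratio between points within the time windows indicated by the bottom blue lines in Figure 4A and points within the time window indicated by the bottom yellow line in Figure 4A. For texture kernels, the two ratios were $\langle \hat{h}^2_{t<140\,\mathrm{ms}} \rangle / \langle \hat{h}^2_{t>140\,\mathrm{ms}} \rangle$ and $\langle \hat{h}^2_{t<100\,\mathrm{ms} \,\vee\, t>180\,\mathrm{ms}} \rangle / \langle \hat{h}^2_{60\,\mathrm{ms}<t<220\,\mathrm{ms}} \rangle$.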
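The scalar metrics above map onto a few NumPy expressions. In this sketch the function names are hypothetical, and the time stamp attached to each kernel sample (e.g. frame onsets 0, 20, ..., 240 ms for the 13 luminance frames) is an assumption, since the text does not state which time within a frame is used.

```python
import numpy as np


def centroid(h, t):
    """Centre of mass <t|h|> / <|h|> of a 1-D kernel h sampled at times t."""
    w = np.abs(h)
    return float(np.sum(t * w) / np.sum(w))


def peak_time(h, t):
    """Time of the (signed) maximum of h."""
    return float(t[np.argmax(h)])


def power_ratio(h, num_mask, den_mask):
    """Ratio of mean squared modulation inside two time windows."""
    return float(np.mean(h[num_mask] ** 2) / np.mean(h[den_mask] ** 2))


def luminance_ratios(h, t):
    early_late = power_ratio(h, t < 130, t > 130)
    edges_center = power_ratio(h, (t < 70) | (t > 190), (t > 50) & (t < 210))
    return early_late, edges_center


def texture_ratios(h, t):
    early_late = power_ratio(h, t < 140, t > 140)
    edges_center = power_ratio(h, (t < 100) | (t > 180), (t > 60) & (t < 220))
    return early_late, edges_center
```

A flat kernel gives both ratios equal to 1, and a kernel whose energy is concentrated late in the stimulus gives an early/late ratio below 1.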
Bandpass index
This was simply $\mu_P/\sigma_P$, where $\mu_P$ and $\sigma_P$ are the mean and SD of the power distribution across temporal frequency, obtained by computing (via discrete Fourier transform) and normalizing (to sum 1) the power spectrum of the original time function $f$. If $f$ conforms to a Gabor wavelet, the corresponding (Gaussian) spectrum returns a bandpass index of 1.3236 for low-pass $f$ (Barr & Sherrill, 1999), indicated by dashed lines in Figures 7B and 7D; this is the mean/SD ratio of a half-Gaussian distribution, $\sqrt{2/(\pi - 2)} \approx 1.3236$. The x axis in Figure 7D plots $\mu_P$.
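The bandpass index can be sketched with NumPy's real FFT. Treating only non-negative frequencies as the support of the power distribution is an assumption (the text does not specify it), as is the hypothetical function name. Under that assumption, a densely sampled Gaussian (low-pass Gabor) returns approximately 1.3236, while a Gabor with a carrier well above its spectral width returns a much larger value.

```python
import numpy as np


def bandpass_index(f, dt):
    """mu_P / sigma_P of the normalized power spectrum of f over non-negative frequencies."""
    power = np.abs(np.fft.rfft(f)) ** 2
    p = power / power.sum()                  # normalize to sum 1
    freqs = np.fft.rfftfreq(len(f), d=dt)    # in Hz if dt is in seconds
    mu = np.sum(freqs * p)
    sigma = np.sqrt(np.sum((freqs - mu) ** 2 * p))
    return float(mu / sigma)
```

For short kernels (13 or 7 samples here) the frequency grid is coarse, so values obtained from the raw kernels may deviate somewhat from the continuous-spectrum reference of 1.3236.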
Nonlinear second-order models
By expanding Equation 1, the scalar output r is  
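$$ r = g\Big[ \hat{\mathbf{h}}^{\mathrm{ctr}}_1 \cdot (\mathbf{i}^{\mathrm{ctr}}_{[1]} - \mathbf{i}^{\mathrm{ctr}}_{[0]}) + \hat{\mathbf{h}}^{\mathrm{sur}}_1 \cdot (\mathbf{i}^{\mathrm{sur}}_{[1]} - \mathbf{i}^{\mathrm{sur}}_{[0]}) + k\big( \hat{\mathbf{H}}^{\mathrm{ctr}}_2 \cdot [\mathbf{I}^{\mathrm{ctr}}_{[1]} - \mathbf{I}^{\mathrm{ctr}}_{[0]}] + \hat{\mathbf{H}}^{\mathrm{sur}}_2 \cdot [\mathbf{I}^{\mathrm{sur}}_{[1]} - \mathbf{I}^{\mathrm{sur}}_{[0]}] + \hat{\mathbf{H}}^{\mathrm{ctr\text{-}sur}}_2 \cdot [\mathbf{I}^{\mathrm{ctr\text{-}sur}}_{[1]} - \mathbf{I}^{\mathrm{ctr\text{-}sur}}_{[0]}] \big) \Big], \tag{2} $$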
where $\mathbf{i}^{p}$ and $\mathbf{I}^{p}$ are defined and indexed using the conventions detailed earlier. The system kernels $\mathbf{h}_1$ and $\mathbf{H}_2$ are replaced by estimates ($\hat{\mathbf{h}}_1$ and $\hat{\mathbf{H}}_2$) because Equation 2 is implemented as an approximation to the observer (whereas Equation 1 defines an abstract model) and, in the absence of further assumptions, the only available specification for the kernels consists of their empirical estimates. $k$ (the relative contribution of the nonlinear kernels) is not a free parameter because, provided $\hat{\mathbf{H}}_2$ is appropriately scaled with respect to $\hat{\mathbf{h}}_1$ (i.e. by $1/(2\sigma_N^2)$), it is theoretically expected to equal 1 (Marmarelis & Marmarelis, 1978; Neri, 2004; Schetzen, 1980). However, kernel estimation is subject to several sources of error, some of which affect the gain ratio between $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{H}}_2$ (Marmarelis & Marmarelis, 1978). These issues are relevant in the present context due to the addition of a target signal, the presence of internal noise and the decisional nonlinearity (see, for example, the equation immediately after Equation 16 (page 89) in Neri, 2004). We demonstrate the dependence of the $\hat{\mathbf{H}}_2$-to-$\hat{\mathbf{h}}_1$ gain ratio on the shape of $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{H}}_2$, as well as on internal noise and signal intensity, in Extra Figure 7. To demonstrate that the poor performance of the nonlinear second-order model in the texture experiments was not simply due to these factors, in a subsequent step of the analysis (Figures 9C–9E) we optimized $k$ individually for each subject to maximize trial-by-trial model-human consistency $c_{mh}$, defined as the % of trials on which the model response matches the human response. We computed consistency for $k$ ranging from $10^{-2}$ to $10^{2}$, including $k = 0$ and $k = \infty$ (the latter was achieved by eliminating the first-order kernels from the equation); Figures 9C and 9D show examples of consistency (y axis) versus $k$ (x axis).
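A vectorized sketch of Equation 2 and of the $k$ sweep may make the procedure concrete. It relies on the identity that the Frobenius product of $\mathbf{H}$ with $\mathbf{a} \otimes \mathbf{b}$ equals $\mathbf{a}^\top \mathbf{H} \mathbf{b}$ (with $\mathbf{I}^{\mathrm{ctr\text{-}sur}} = \mathbf{i}^{\mathrm{ctr}} \otimes \mathbf{i}^{\mathrm{sur}}$). The dictionary keys, function names and the particular grid of 9 log-spaced $k$ values are assumptions for illustration; ties in the drive are ignored because they have zero probability with continuous noise.

```python
import numpy as np

SECOND_ORDER = (("H2_ctr", "ctr", "ctr"), ("H2_sur", "sur", "sur"), ("H2_ctr_sur", "ctr", "sur"))


def _quad(A, H, B):
    # Row-wise a_n^T H b_n, i.e. Frobenius product of H with outer(a_n, b_n).
    return np.einsum("nj,jk,nk->n", A, H, B)


def eq2_drive(kern, tgt, non, k):
    """Argument of g in Equation 2 for every trial.

    kern: dict of kernel estimates; tgt / non: dicts with 'ctr' and 'sur' arrays
    of shape (n_trials, n_frames) for the target / non-target stimulus.
    """
    lin = (tgt["ctr"] - non["ctr"]) @ kern["h1_ctr"] + (tgt["sur"] - non["sur"]) @ kern["h1_sur"]
    quad = sum(_quad(tgt[a], kern[name], tgt[b]) - _quad(non[a], kern[name], non[b])
               for name, a, b in SECOND_ORDER)
    if np.isinf(k):
        return quad  # k = infinity: first-order terms dropped
    return lin + k * quad


def consistency(kern, tgt, non, human_correct, k):
    """c_mh: fraction of trials on which model and human responses agree."""
    model_correct = (eq2_drive(kern, tgt, non, k) > 0).astype(int)
    return float(np.mean(model_correct == human_correct))


def sweep_k(kern, tgt, non, human_correct):
    ks = [0.0, *np.logspace(-2, 2, 9), np.inf]
    scores = [consistency(kern, tgt, non, human_correct, k) for k in ks]
    return ks[int(np.argmax(scores))], scores
```

If the "human" responses are themselves generated by Equation 2 with $k = 1$, the sweep recovers perfect consistency, which is a useful check of the implementation.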
Uncertainty models
Using the same notation adopted for the nonlinear second-order model:  
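$$ r = g\left( \max\left[ w \left( f_{\mathrm{ctr}} * \mathbf{i}^{\mathrm{ctr}}_{[1]} + f_{\mathrm{sur}} * \mathbf{i}^{\mathrm{sur}}_{[1]} \right) \right] - \max\left[ w \left( f_{\mathrm{ctr}} * \mathbf{i}^{\mathrm{ctr}}_{[0]} + f_{\mathrm{sur}} * \mathbf{i}^{\mathrm{sur}}_{[0]} \right) \right] \right), \tag{3} $$

$$ f(t) = t^{\alpha} \exp(-t/\tau) \cos(2\pi t/\nu + \pi\phi), \tag{4} $$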
where $w$ is a fixed weighting function and $f$ is a cosine-gamma function (Chen, Wang, & Qian, 2001). We used $f_{\mathrm{ctr}} = -f_{\mathrm{sur}}$ for all uncertainty models except the version in which the two $f$'s corresponded to the empirically derived first-order kernels from target-present noise fields, and $w$ = constant or $w = 1 - t/\tau$ (linearly decreasing), where here $\tau$ = 500 ms for luminance and 520 ms for texture experiments (this $\tau$ is a fixed constant, distinct from the free parameter $\tau$ of Equation 4). For each observer separately, we computed $c_{mh}$ across combinations of the 4 parameters of $f$ (exhaustive search) within the following ranges: $\alpha$ = 0 to 4.5 (step size 0.5); $\tau$ = 20 to 260 (step size 20); $\nu$ = 100 to 460 (step size 40); $\phi$ = −1 to 1 (step size 0.2). We then selected the 4 values yielding the largest consistency as optimal values for that observer. The ranges spanned by our search allowed for great variability in the shape of possible $f$'s (gray traces in Figure 9F). It is known that uncertainty models generate modulations within first-order kernels (Tjan & Nandy, 2006); we show in Supplementary section 2 that they also generate modulations within second-order kernels. The uncertainty model also represents the most obvious adaptation of a temporal probability summation model to a 2AFC design. In the probability summation model (Watson, 1979), the stimulus $\mathbf{i}$ is a potential target whenever its time-varying output $o = f_{\mathrm{ctr}} * \mathbf{i}^{\mathrm{ctr}} + f_{\mathrm{sur}} * \mathbf{i}^{\mathrm{sur}}$ exceeds a threshold value at any time point, or equivalently whenever $\max(o)$ exceeds that threshold. In the 2AFC context this criterion is replaced by the decisional transducer $g$ (the N stage of the LN cascade), thus equating the two models.
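A minimal sketch of Equations 3 and 4 follows. It assumes each filter and stimulus is sampled at the frame times, implements $*$ as causal discrete convolution truncated to the stimulus length, and uses hypothetical function names; the exhaustive parameter search would simply loop this over the grid of $(\alpha, \tau, \nu, \phi)$ values and keep the combination with the highest $c_{mh}$.

```python
import numpy as np


def cosine_gamma(t, alpha, tau, nu, phi):
    """Equation 4: f(t) = t**alpha * exp(-t / tau) * cos(2*pi*t / nu + pi*phi)."""
    return t ** alpha * np.exp(-t / tau) * np.cos(2 * np.pi * t / nu + np.pi * phi)


def uncertainty_drive(f_ctr, f_sur, w, i_ctr, i_sur):
    """max over time of w * (f_ctr * i_ctr + f_sur * i_sur), '*' = causal convolution."""
    n = len(i_ctr)
    o = np.convolve(f_ctr, i_ctr)[:n] + np.convolve(f_sur, i_sur)[:n]
    return float(np.max(w * o))


def uncertainty_response(f_ctr, f_sur, w, tgt, non):
    """Equation 3; tgt and non are (i_ctr, i_sur) pairs. Returns 1 (correct) or 0."""
    x = uncertainty_drive(f_ctr, f_sur, w, *tgt) - uncertainty_drive(f_ctr, f_sur, w, *non)
    return int(x > 0)
```

With $f_{\mathrm{sur}} = -f_{\mathrm{ctr}}$ as in the text, a bright pulse in the target center drives the model to the correct choice, and placing the same pulse in the non-target reverses it.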
Retinal model
The time-varying output response o to stimulus i was determined by the differential equation  
$$ \frac{do}{dt} = \left[ q(t) - b \cdot o(t - \tau_D) + 1 \right]_+, \tag{5} $$
where $q = \mathbf{h}^{\mathrm{RGC}}_1 * \mathbf{i}^{\mathrm{ctr}} - \mathbf{h}^{\mathrm{RGC}}_1 * \mathbf{i}^{\mathrm{sur}}$, $\mathbf{h}^{\mathrm{RGC}}_1$ is the normalized (to peak value = 1) temporal impulse response of primate M retinal ganglion cells estimated by Benardete and Kaplan (1999) (magenta trace in Figure 7C), $[\,]_+$ implements half-wave rectification ($[x]_+ = x\,g(x)$), $b$ = 15, $\tau_D$ = 60 ms and $o = 0$ for $t \le 0$. We used $\sigma_N^2 = 1$ and target intensity = 10/3 (matched to the average human value for the luminance experiments). The final response was $r = g\left[\sum_t o(\mathbf{i}_{[1]}) - \sum_t o(\mathbf{i}_{[0]})\right]$. We ran 50 simulations of 36K trials each (matching the aggregate number of trials for the luminance experiments).
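Equation 5 is a delay differential equation; one simple way to integrate it is forward Euler with a fixed time step. The sketch below is an illustration under stated assumptions, not the simulation code used here: the drive $q$ must be supplied by the caller (the Benardete and Kaplan impulse response is not reproduced), the step size `dt` and feedback gain `b` are left as parameters, and the function names are hypothetical. Because the right-hand side is half-wave rectified, $o$ can never decrease, which gives a simple property to check.

```python
import numpy as np


def retinal_output(q, dt, b, tau_d):
    """Forward-Euler solution of do/dt = [q(t) - b*o(t - tau_d) + 1]_+, with o = 0 for t <= 0.

    q: drive sampled every dt (same time units as tau_d). Returns o at t = dt, 2*dt, ...
    """
    lag = int(round(tau_d / dt))
    o = np.zeros(len(q) + 1)                         # o[0] is t = 0
    for k in range(len(q)):
        delayed = o[k - lag] if k >= lag else 0.0    # o(t - tau_d), zero before onset
        rate = q[k] - b * delayed + 1.0
        o[k + 1] = o[k] + dt * max(rate, 0.0)        # half-wave rectified derivative
    return o[1:]


def retinal_2afc(o_target, o_nontarget):
    """r = g[sum_t o(i[1]) - sum_t o(i[0])]; exact ties return 0 in this sketch."""
    return int(np.sum(o_target) - np.sum(o_nontarget) > 0)
```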
Estimation of maximum achievable model-human consistency
Model-human consistency $c_{mh}$ (% of trials on which the model response matches the human response) for the best possible model must lie in the interval

$$ c_{hh} < c_{mh} < \frac{1 + \sqrt{2c_{hh} - 1}}{2} $$

for a 2AFC task, where $c_{hh}$ is human-human consistency (Neri & Levi, 2006). The latter is estimated by performing a double-pass experiment in which the same set of stimuli is presented twice and the % of identical responses to the two sets is computed (Burgess & Colborne, 1988). We performed a series of double-pass experiments consisting of 100-trial blocks (as in the main experiment). Observers were not aware that these blocks differed from those of the main experiment. In double-pass blocks, the second half of 50 trials showed the same stimuli presented during the first half of 50 trials, but in randomly permuted order. We collected the following number of double-pass trials for luminance experiments: 300 (S1), 1400 (S2), 1000 (S3), 600 (S4), 600 (S5), 600 (S6). Number of double-pass trials for texture experiments: 1000 (S1), 600 (S2), 300 (S3), 200 (S4), 400 (S5), 200 (S6). Half of these (the first 50 trials of each block) were extracted and combined with trials from the main experiment for the purpose of computing kernels.
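The double-pass quantities can be computed as below. One way to see where the upper bound comes from: if, on each pass independently, the observer agrees with a fixed noiseless model with probability $p$, then $c_{hh} = p^2 + (1 - p)^2$, and solving for $p$ gives $(1 + \sqrt{2c_{hh} - 1})/2$. This reading is offered as an illustration; the function names are hypothetical.

```python
import math

import numpy as np


def double_pass_consistency(resp_first, resp_second):
    """c_hh: fraction of identical responses across the two passes."""
    a, b = np.asarray(resp_first), np.asarray(resp_second)
    return float(np.mean(a == b))


def cmh_bounds(c_hh):
    """Interval for the best achievable model-human consistency in 2AFC."""
    if not 0.5 <= c_hh <= 1.0:
        raise ValueError("c_hh must lie in [0.5, 1] for this bound")
    return c_hh, (1.0 + math.sqrt(2.0 * c_hh - 1.0)) / 2.0
```

For instance, $c_{hh} = 0.8$ bounds $c_{mh}$ between 0.8 and roughly 0.887.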
Cross-validation and final prediction error (FPE) metrics for model estimation
We adopted a ‘leave-one-out’ n-fold cross-validation technique (Allen, 1974; Hong et al., 2008; Kohavi, 1995; Stone, 1974). If $n$ is the total number of trials, we predict the psychophysical response on trial number $j$ by computing kernels from the remaining $n - 1$ trials (i.e. from 1 to $j - 1$ and from $j + 1$ to $n$) and by applying Equation 2 to the stimulus presented on trial $j$. This is repeated for $j$ from 1 to $n$, and model-human cross-consistency is computed across all trials in the same way as model-human consistency. A related approach is to use final prediction error (FPE) to factor out bias in $c_{mh}$ due to model parameterization, where $\mathrm{FPE} = (1 - c_{mh})(n + m)/(n - m)$ and $m$ is the number of regressors/parameters (Akaike, 1970; Claeskens & Hjort, 2008; Hong et al., 2008). This metric is closely related to both leave-one-out cross-validation and information criteria (Hong et al., 2008).
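The sketch below illustrates both ideas for the simplest case, a first-order-only model ($k = 0$ in Equation 2), using the same hypothetical array layout as the kernel estimator (separate target and non-target noise arrays, `correct` coded 1/0, a known additive `signal` vector). Class sums are computed once and trial $j$ is subtracted from its own class, so each prediction uses kernels from the other $n - 1$ trials only. It assumes every response class keeps at least one trial after removal; the function names are hypothetical.

```python
import numpy as np


def fpe(c_mh, n, m):
    """Final prediction error applied to the model-human mismatch rate 1 - c_mh."""
    return (1.0 - c_mh) * (n + m) / (n - m)


def loo_first_order_consistency(n_tgt, n_non, correct, signal):
    """Leave-one-out cross-consistency for a first-order-only model (k = 0)."""
    n = len(correct)
    counts = {r: int(np.sum(correct == r)) for r in (0, 1)}
    sums = {(s, r): fields[correct == r].sum(axis=0)
            for s, fields in ((1, n_tgt), (0, n_non)) for r in (0, 1)}
    hits = 0
    for j in range(n):
        rj = int(correct[j])
        h1 = np.zeros(n_tgt.shape[1])
        for (s, r), total in sums.items():
            count = counts[r]
            if r == rj:  # drop trial j from its own class
                total = total - (n_tgt[j] if s == 1 else n_non[j])
                count -= 1
            delta = 1.0 if s == r else -1.0
            h1 += delta * total / count
        model = int(h1 @ (n_tgt[j] + signal - n_non[j]) > 0)
        hits += int(model == rj)
    return hits / n
```

As a usage example, `fpe(0.8, 100, 10)` inflates a mismatch rate of 0.2 to about 0.244 to account for 10 parameters estimated from 100 trials.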
Results
General structure of luminance kernels
We delivered a noisy process to the observer by randomly modulating the luminance of a simple visual stimulus (Figure 1A) during a 260 ms pulse (Figure 1B). The stimulus was subdivided into a center (shown by the bright dot in Figure 1A) and a surround (indicated by the dashed magenta line in Figure 1A). The luminance of the center was modulated independently of the luminance of the surround, so that the two modulations (black and blue traces for center and surround, respectively, in Figure 1B) were uncorrelated. Observers saw two such stimuli on each trial, one to the left and one to the right of fixation. For one of the two stimuli, the luminance of the center was briefly increased during a 20 ms epoch by adding a target signal (gray trace in Figure 1B). Observers were required to indicate the stimulus containing this brief increase in luminance. We specifically designed the luminance stimuli to generate a strong simultaneous contrast illusion (which we verified informally by setting the center to background luminance and the surround to bright and dark on opposite sides of fixation). Our motivation for focusing on this phenomenon is that we expected it to involve highly nonlinear processes, and therefore to maximize our chances of measuring robust second-order kernels.
We derived the underlying sensory filters using reverse correlation (Ahumada, 2002; see Materials and methods). The black and blue traces in Figure 2A show linear sensory filters for center and surround, respectively (aggregate across observers). As expected from previous work (Ahumada & Lovell, 1971; Nagai, Bennett, & Sekuler, 2007; Neri & Heeger, 2002; Thomas & Knoblauch, 2005), they peak at target occurrence (indicated by the gray-shaded region). The linear filter for the surround (blue) is approximately equal to that for the center, except that it takes the opposite sign: the correlation between center and surround kernels was −0.75 ± 0.06 (mean ± SD across observers). This pattern is consistent with a Mexican-hat spatial profile (a positive center modulation surrounded by negative flanking modulations), dovetailing with classical accounts of the simultaneous contrast illusion based on retinal center-surround receptive fields (Spillmann, 1971).
Figure 2
 
Luminance first-order and second-order kernels for the aggregate observer (36,250 trials). (A) shows the first-order kernels for center (black) and surround (blue), as well as the diagonals of second-order kernels for center (red), surround (yellow) and center-surround (magenta). (B–D) show full second-order kernels for center (B), surround (C) and center-surround (D). (A–D) show kernels derived from all noise fields, (E–H) only from images containing a target, and (I–L) only from images not containing a target (see Materials and methods). Filter amplitude for first-order kernels is in units of external noise SD (σN) and refers to the black numeric labels on the y axis; for second-order kernel diagonals, amplitude is in units of external noise variance and refers to the red numeric labels. Second-order kernels are plotted as Z score maps (colored for ∣Z∣ > 2, green for positive and blue for negative). White traces show marginal averages along the main diagonal. Target occurrence is indicated by the gray vertical bar in A, E and I. Shaded regions in A, E and I show ±1 SEM.
As mentioned in the Introduction, linear sensory filters do not specify how modulations at two different times may interact with each other. If present, these interactions can be estimated by deriving second-order kernels via a reverse correlation analysis similar to that used for deriving the linear (first-order) kernels (Neri, 2004; see Materials and methods). Examples of second-order kernels for our experiment are shown in Figures 2B–2D for all three types of second-order interactions: center modulations interacting with center modulations (Figure 2B), surround modulations interacting with surround modulations (Figure 2C), and center modulations interacting with surround modulations (Figure 2D); green indicates positive and blue negative modulations. All three kernels contain many significant modulations (see Z-score legends to the right), indicating that the underlying process cannot be modeled using an LN model (Neri, 2004; see Materials and methods, Supplementary section 2 and Extra Figure 1). For visualization purposes we focus on the diagonal (variance) modulations, shown by the red, yellow and magenta traces in Figure 2A alongside the first-order kernels described previously. We consider off-diagonal modulations later in the Results section.
First-order and second-order kernels display different temporal dynamics. The first-order kernels (black and blue), as mentioned previously, peak at target occurrence around 130 ms and show similar modulations before and after the target epoch. Second-order kernels (red, yellow and magenta) peak before target occurrence and progressively decline as the stimulus unfolds. This asynchrony between the kernels has previously been reported for a luminance detection task using bar stimuli (Neri & Heeger, 2002). In line with that report, and as we document below, the peak asynchrony is in the range 70–100 ms (Figure 5B). Interestingly, James (1992) observed an analogous asynchrony for fly LMC neurons, although the first-order kernel lagged behind the second-order kernel by a much smaller amount (5–10 ms) than we report for human vision. French and Kuster (1985) measured a similar result, on a similar timescale, for locust photoreceptors (see also Pece, French, Korenberg, & Kuster, 1990).
It has been recognized by previous authors that sensory filters derived using psychophysical reverse correlation often differ depending on the noise fields used to compute them (e.g. Abbey & Eckstein, 2002; Ahumada, Marken, & Sandusky, 1975; Neri & Heeger, 2002; Solomon, 2002; Thomas & Knoblauch, 2005). More specifically, large differences may be observed when the filters are derived from target-present images as opposed to target-absent images (Ahumada et al., 1975; Barth, Beard, & Ahumada, 1999; Solomon, 2002; Thomas & Knoblauch, 2005), as predicted by theoretical considerations (Neri, 2004; see Extra Figure 2). The kernels discussed above, shown in Figures 2A–2D, were derived from both image classes. Figures 2E–2H show kernels derived from target-present images only, and Figures 2I–2L from target-absent images only (see Materials and methods). There are several differences, the most evident being that the first-order kernels from target-absent images do not show the pronounced peak at target occurrence that is observed for the other two image classes (compare black/blue traces in Figure 2I with those in Figures 2A and 2E). Notwithstanding these differences, we show later that the asynchrony is present in all three classes and that they share the same overall characteristics (Figure 5).
General structure of texture kernels
We wished to establish whether these characteristics are common to other visual attributes besides luminance. We therefore designed and performed a second series of experiments using stimuli that probed texture processing rather than luminance processing. The texture stimuli were designed to target orientation-selective neurons with antagonistic surrounds (Jones, Wang, & Sillito, 2002; Zenger-Landolt & Koch, 2001). Extra-classical receptive fields show well-characterized nonlinear properties (Allman, Miezin, & McGuinness, 1985). In the case of orientation-selective neurons, the extra-classical surround may be unresponsive to an oriented stimulus presented only within the surround itself, but the response to an oriented center stimulus is nevertheless modulated by the stimulus presented in the surround (Blakemore & Tobin, 1972, and the extensive literature that followed). This highly nonlinear behavior may underlie well-known orientation-contrast illusions such as the repulsion of adjacent, differently oriented patterns (Gibson, 1937), as well as the effects of oriented flanks on contrast discrimination (e.g. Solomon & Morgan, 2006; Yu & Levi, 2000). As with the luminance experiments, we reasoned that designing our stimuli to target nonlinearities that are well documented in the physiological and psychophysical literature might increase our chances of observing highly significant modulations in the second-order kernels. The task in this case involved detection of an orientation-defined signal (Figure 1C), but the conceptual structure of the stimulus was essentially identical to that of the luminance stimulus (compare Figure 1D with 1B). This similarity allowed us to study the associated input-output characteristics using the same tools that we applied to the luminance experiment, and thus to compare the two directly.
The most conspicuous difference between luminance and texture kernels, shown in Figures 2 and 3 respectively, is that texture second-order kernels peak at (rather than before) target occurrence. This is clearly visible for the center-center second-order kernel (diagonal shown by red traces in Figures 3A, 3E, and 3I), which peaks at target occurrence (gray-shaded region) regardless of which image class it is derived from. To summarize Figures 2 and 3 with a small number of descriptors in Figure 4, we combined both first-order kernels into one trace (black) and all three second-order kernels into another trace (red), separately for luminance (top row) and texture experiments (bottom row) and for kernels derived from each of the three noise image classes (three different columns; see Materials and methods for details). Figure 4 shows that the asynchrony between first-order linear and second-order nonlinear kernels is less pronounced (if present at all) for the texture kernels.
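The combination rule is spelled out in the Figure 4 caption: the surround first-order kernel is subtracted from the center first-order kernel, and the center-surround diagonal is subtracted from the sum of the center and surround diagonals. A minimal NumPy sketch of that arithmetic follows; the function name is hypothetical, and any normalization or error propagation applied in the study (see Materials and methods) is omitted.

```python
import numpy as np

def combine_kernels(h1_center, h1_surround, H2_cc, H2_ss, H2_cs):
    """Collapse center/surround kernels into one first-order trace and one
    second-order trace, following the recipe in the Figure 4 caption.

    h1_center, h1_surround : first-order kernels (1-D, same length)
    H2_cc, H2_ss, H2_cs    : center-center, surround-surround and
                             center-surround second-order kernels (2-D)
    """
    first = np.asarray(h1_center, dtype=float) - np.asarray(h1_surround, dtype=float)
    # Only the main diagonals (variance modulations) enter the combined trace
    second = np.diag(H2_cc) + np.diag(H2_ss) - np.diag(H2_cs)
    return first, second
```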
Figure 3
 
Texture first-order and second-order kernels for the aggregate observer (20,350 trials). Same plotting conventions as Figure 2. Shaded regions in A, E and I show ±1 SEM.
Figure 4
 
Combined kernels for both luminance and texture experiments (aggregate observer). Top row shows combined kernels for the luminance experiments obtained from all noise images (A), target-present images only (B) and target-absent images only (C). First-order combined kernels (black) were computed by subtracting the surround first-order kernel from the center first-order kernel. Second-order combined kernels (red) were computed by summing the diagonal second-order kernels for center and surround, and by subtracting that for center-surround (see Materials and methods). (D–F) show combined kernels for the texture experiments. Yellow and blue lines in A show regions used to compute modulation ratios (see Materials and methods and Figure 5). Yellow trace in C shows best-fitting exponential to the red trace (thin yellow lines show 95% confidence intervals for the fit). Shaded regions show ±1 SEM.
Because we found some variability across observers, it is difficult to draw conclusions simply by inspecting individual perceptual filters (Extra Figures 3 and 4). We therefore performed additional analyses that captured relevant aspects of both first-order and second-order kernels, quantifying each aspect with a single value per perceptual filter (scalar metrics, see Materials and methods). This made it possible to apply simple population statistics in the form of t-tests, and to confirm or reject specific hypotheses about the overall shape of the perceptual filters. Our conclusions are therefore based on individual observer data, not on the aggregate observer. This distinction is important because there is no generally accepted procedure for generating an average perceptual filter from individual images for different observers (see Neri & Levi, 2008b for a detailed discussion of this issue).
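The population statistics can be sketched as a paired t-test on one scalar metric per observer, for example second-order versus first-order centroid: points fall systematically off the unity line when the per-observer differences depart from zero. The sketch below, using only the Python standard library, computes the t statistic and its degrees of freedom. The choice of a paired test (a one-sample test on the differences) is an assumption about the analysis, and the function name is hypothetical. With six observers (df = 5), |t| must exceed about 2.571 for two-tailed significance at p < 0.05.

```python
import math
import statistics

def paired_t(x, y):
    """One-sample t statistic on the per-observer differences x - y.

    Tests whether points (x_i, y_i) fall systematically off the unity line.
    Returns (t, degrees_of_freedom).
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    standard_error = statistics.stdev(d) / math.sqrt(n)
    return statistics.mean(d) / standard_error, n - 1
```

With made-up values x = [1, 2, 3, 4, 5, 6] and y = [0, 1, 2, 3, 4, 5.5] (illustrative numbers, not data from this study), the sketch returns t = 11 with df = 5.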
Assessment of kernel structure using scalar metrics
There are numerous aspects of the general shape of the kernels that may be quantified. Our focus here is on the asynchrony between first-order and second-order kernels. For this reason we adopted four metrics that reflect whether a given trace operates mostly within the early as opposed to the late epoch of the stimulus (see Materials and methods): 1) centroid of the kernel across time; 2) time of occurrence of the largest peak; 3) ratio between modulations in the early epoch and late epoch; 4) ratio between modulations within the early + late epochs and the central epoch (around target occurrence). For the luminance experiments, it is reasonable to focus on the results obtained from the combined traces (Figure 4) because the first-order kernels share the same overall shape, as do the second-order kernels (Figure 2). By combining them into two traces (one for first-order and one for second-order) we gain resolving power from averaging more data without distorting the interpretation of their overall structure.
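The four metrics can be sketched in NumPy as follows. Their exact definitions are given in Materials and methods and are not reproduced here, so the choices below are assumptions: modulation within a window is taken as the sum of absolute kernel values, the centroid is weighted by absolute amplitude, the peak is the time of the largest (positive) value, and the epoch boundaries are supplied by the caller as half-open windows. The function name is hypothetical.

```python
import numpy as np

def scalar_metrics(kernel, times, early, center, late):
    """Four scalar summaries of a kernel trace.

    kernel : 1-D array of kernel values at each time point
    times  : 1-D array of time points (ms), same length as kernel
    early, center, late : (start, stop) windows in ms, half-open [start, stop)
    Returns (centroid, peak_time, early_late_ratio, edges_center_ratio).
    """
    k = np.asarray(kernel, dtype=float)
    t = np.asarray(times, dtype=float)
    mag = np.abs(k)

    def modulation(window):
        lo, hi = window
        inside = (t >= lo) & (t < hi)
        return mag[inside].sum()

    centroid = (t * mag).sum() / mag.sum()
    peak_time = t[np.argmax(k)]
    early_late = modulation(early) / modulation(late)
    edges_center = (modulation(early) + modulation(late)) / modulation(center)
    return centroid, peak_time, early_late, edges_center
```

As a toy check, a triangular kernel centered at 130 ms gives a centroid of 130 ms and an early/late ratio of 1, whereas an exponentially decaying kernel starting at 40 ms gives an earlier centroid and an early/late ratio above 1, mimicking the first-order versus second-order contrast described in the text.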
Results from the combined luminance traces are indicated by red symbols in Figure 5. Regardless of which image class (different rows) the kernels are computed from, all combined traces show that the centroid is displaced to earlier times for second-order compared with first-order kernels (red points lie below unity line in Figures 5A, 5E, and 5I), the peak for second-order kernels occurs 70–90 ms before that for first-order kernels (red points lie below unity line in Figures 5B, 5F, and 5J; mean peak asynchrony (± SD across observers) is indicated in each panel), the early/late modulation ratio is greater than unity for second-order but not for first-order kernels (red points lie above horizontal dotted line but on vertical dotted line in Figures 5C, 5G, and 5K), the edges/center modulation ratio is smaller than unity for first-order but not for second-order kernels (red points lie to the left of vertical dotted line but on horizontal dotted line in Figures 5D, 5H, and 5L). All these effects were statistically significant (red asterisks in the different panels). 
Figure 5
 
Assessment of luminance kernel structure using scalar metrics. First column (A, E and I) shows centroid for second-order diagonal (y axis) versus first-order kernel (x axis); black symbols for center, blue symbols for surround and red symbols for the combined kernels. Gray shading shows temporal ranges spanned by the target signal. Asterisks (color-coded like the data) indicate whether data points fall significantly away from the unity line. Second column (B, F and J) is plotted to similar conventions but reports times of peak occurrence (the plotted values are averages of bootstrap estimates (Efron, 1979) in order to avoid excessive overlap of data points due to peak times being discrete; however, all statistical comparisons in the text were carried out on actual peak times). Digits show average time difference (in ms) between first-order and second-order peaks (±1 SD across observers) for combined kernels. Third column (C, G and K) plots early/late modulation ratios (see Materials and methods) using similar conventions, but asterisks next to the horizontal dashed line indicate whether data points fall significantly above or below it, while asterisks next to the vertical dashed line indicate whether data points fall significantly to its left or right. Fourth column (D, H and L) is plotted to the same conventions as the third column, but plots edges/center modulation ratios (see Materials and methods). Inset to A plots centroid values for center versus surround. Error bars (smaller than symbol when not visible) show ±1 SEM. Different symbol shapes refer to different observers: S1 •, S2 ▪, S3 ♦, S4 ▸, S5 ◂, S6 ▴.
This remarkable degree of consistency across metrics and noise image classes can be summarized as follows: luminance first-order kernels peak around target occurrence and are approximately symmetric before and after this time, while second-order kernels peak ∼0.1 sec before target occurrence and modulate mostly in the early epoch. These conclusions match the qualitative observations made earlier (Figure 2). It is unlikely that the value of the measured asynchrony is simply a consequence of the chosen number of pre-target noise frames, because a similar estimate was obtained from target-absent noise fields (Figure 5J). The underlying mechanisms are likely to be hard-wired, because the second-order modulation indicates that observers used a suboptimal strategy to perform the task (the early part of the stimulus does not provide useful information for detecting the target). In contrast to previous work (Neri & Heeger, 2002), here we delivered trial-by-trial feedback to the observers, which gave their perceptual mechanisms the opportunity to readjust and optimize themselves for the task at hand (Droll, Abbey, & Eckstein, 2009); the suboptimal strategy therefore cannot be attributed to a lack of feedback.
When the metric analysis is performed on the texture kernels, the results indicate that first-order and second-order kernels are more similar in structure than we observed for luminance kernels. In general, both first-order and second-order kernels peaked at the same time and modulated mostly within the same epoch (Extra Figure 5). It is of interest that, although there was no detectable difference in peak occurrence between second-order and first-order kernels for the center texture (solid black symbols in Extra Figure 5B), second-order (but not first-order) kernels showed a significant early versus late modulation bias (solid black symbols in Extra Figure 5C) in line with a reverse correlation study of orientation processing that bears some relation to the texture experiments presented here (Nagai et al., 2007). Although their study differed from ours in several important respects, Nagai et al. (2007) found that variance kernels modulated mostly before the target and displayed an asynchrony of ∼0.1 sec with respect to the first-order linear kernels (see their Discussion, page 12) while at the same time they were unable to identify a reliable difference in peak occurrence between variance and first-order kernels, similarly to what we found here. 
Assessment of kernel structure for NL(N) and LNL(N) modulations
The addition of one or two simple stages to the L(N) model does predict second-order modulations. In particular, two other standard parametric models are commonly considered in the literature (Marmarelis, 2004; Naka et al., 1988; Westwick & Kearney, 2003): the Hammerstein NL(N) model (Figure 6A) and the Korenberg LNL(N) model (Figure 6E). The former predicts that the main-diagonal (variance) modulation in the second-order kernel is a scaled first-order kernel, while the latter predicts that the marginal average across either columns or rows of the second-order kernel is a scaled first-order kernel (see 1). We estimated the extent to which these predictions apply despite the presence of a target signal, internal noise and the decisional transducer (see Extra Figure 6). 
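The two predictions can be checked analytically on noise-free cascades. The NumPy sketch below builds the Volterra kernels of a Hammerstein (NL) and a Korenberg (LNL) cascade whose static nonlinearity is the quadratic f(u) = c1·u + c2·u², an assumed form chosen because it yields exactly first- and second-order kernels; the final decisional transducer, internal noise and target signal considered in the study (Extra Figure 6) are left out. For the Hammerstein cascade the diagonal of k2 equals (c2/c1)·k1; for the Korenberg cascade each row sum (and hence each marginal average) of k2 equals (c2·Σg/c1)·k1, provided both filters fit entirely within the kernel window. Function names are hypothetical.

```python
import numpy as np

def hammerstein_kernels(h, c1, c2):
    """Volterra kernels of an NL cascade: f(u) = c1*u + c2*u**2 applied to
    each input sample, followed by the linear filter h."""
    h = np.asarray(h, dtype=float)
    k1 = c1 * h
    k2 = c2 * np.diag(h)          # all second-order structure on the diagonal
    return k1, k2

def korenberg_kernels(g, m, c1, c2):
    """Volterra kernels of an LNL cascade: causal filter g, static
    nonlinearity f(u) = c1*u + c2*u**2, then causal filter m."""
    g = np.asarray(g, dtype=float)
    m = np.asarray(m, dtype=float)
    L = len(g) + len(m) - 1
    k1 = c1 * np.convolve(m, g)
    k2 = np.zeros((L, L))
    for s, m_s in enumerate(m):
        g_shift = np.zeros(L)
        g_shift[s:s + len(g)] = g  # front-end filter delayed by s samples
        k2 += c2 * m_s * np.outer(g_shift, g_shift)
    return k1, k2
```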
Figure 6
 
Applicability of Hammerstein (A) and Korenberg (E) cascade models to the experimental data (see 1 and Extra Figure 6 for detailed description of the implementation used here). Amplitude of first-order kernel at each time point is plotted on x axis against either amplitude of second-order diagonal at each corresponding time point (B–C) or amplitude of second-order marginal (F–G) separately for each subject, for both luminance (B and F) and texture kernels (C and G) and for both center (black) and surround (blue), in units of their SD. Red and yellow ovals reflect the linear fit characteristics for center and surround respectively (the oval is tilted to align with best-fit line, positioned at center of mass, with parallel-to-line and orthogonal-to-line widths equal to standard deviations of the data across the two axes). The correlation values for the linear fits from B–C are plotted in D (for texture on the y axis versus luminance on the x axis), and from F–G in H. The shaded regions show ranges expected from simulations of both cascade models with varying degrees of internal noise and external SNR, constrained by output performance of the human observers (see Extra Figure 6). Projected ranges are only plotted with reference to the texture axis because luminance correlations were not significantly different from 0 in either D or H (see text), and are shown separately for center (gray) and surround (light blue). Error bars (smaller than symbol when not visible) show ±1 SEM.
Figures 6B and 6C test the NL(N) prediction for both luminance (B) and texture (C) kernels by plotting the second-order diagonal (y axis) versus the first-order kernel (x axis) for all subjects. For both center (black) and surround (blue) kernels, there seems to be little correlation in the luminance case (red and yellow ovals in Figure 6B are often circular) but significant correlations in the texture case (ovals are elongated in 6C). This difference is apparent when the corresponding correlation coefficients are plotted for luminance (x axis) versus texture (y axis) in Figure 6D: luminance correlations are not different from 0 (they fall on the vertical dashed line) for both center (black, p = 0.8) and surround (blue, p = 0.47) kernels. In contrast, texture center kernels show positive correlations (black points fall above the horizontal dashed line, p < 10⁻³) and surround kernels show negative correlations (blue points fall below the horizontal dashed line, p < 0.03).
The shaded regions show expected ranges for the correlation values as estimated from simulation (see Extra Figure 6). They represent values that we may reasonably expect for our sample of observers, and show that center kernels are expected to fall within mid-to-high positive correlation values (shaded black region) while surround kernels are expected to show either no correlation or positive correlation values (shaded blue region). Neither the luminance nor the texture data fall within the expected ranges: as detailed above, the luminance center kernels do not show significant positive correlations, while the texture surround kernels show significant negative correlations. The sign of the simulated ranges can be inverted by choosing a different function for the early nonlinearity (Marmarelis, 2004), but as long as this function is the same for center and surround it will predict the same sign for both and will therefore fail to explain the texture data.
Figures 6F and 6G test the LNL(N) prediction by plotting the second-order marginal average versus the first-order kernel, and Figure 6H shows the corresponding correlation coefficients. Similarly to Figure 6D, correlation values for luminance kernels are not significantly different from 0 for either center (p = 0.35) or surround (p = 0.65), while texture kernels show positive correlations for the center (black points lie above the horizontal dashed line, p < 10⁻³) and negative correlations for the surround (blue points lie below the horizontal dashed line, p < 0.01). The correlation values for both center and surround texture kernels fall within the ranges expected from simulation (shaded regions), indicating that the LNL(N) model may provide an adequate description of the mechanism underlying texture (but not luminance) processing in our experiments; this result is in line with existing models of human texture processing, which are mostly of the LNL type, also termed filter-rectify-filter (FRF) models (see Supplementary Text “Relation of texture kernels to filter-rectify-filter modeling schemes”). It is also relevant in this context that the uncertainty model can be approximated by an LNL cascade with a highly expansive N (see 2).
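For a single observer, the tests in Figures 6D and 6H reduce to Pearson correlations, computed across time points, between the first-order kernel and either the second-order diagonal or its marginal average. A minimal NumPy sketch, assuming the marginal is the average across the columns of the second-order kernel (rows and columns are interchangeable for a symmetric kernel); the rescaling to SD units used in Figure 6 does not change a Pearson correlation and is omitted. The function name is hypothetical.

```python
import numpy as np

def cascade_correlations(h1, H2):
    """Correlate a first-order kernel with (i) the diagonal of the
    second-order kernel (NL(N)/Hammerstein test) and (ii) its marginal
    average across columns (LNL(N)/Korenberg test).

    Returns (r_diagonal, r_marginal).
    """
    h1 = np.asarray(h1, dtype=float)
    H2 = np.asarray(H2, dtype=float)
    r_diagonal = np.corrcoef(h1, np.diag(H2))[0, 1]
    r_marginal = np.corrcoef(h1, H2.mean(axis=1))[0, 1]
    return r_diagonal, r_marginal
```

Across observers, the resulting correlation values could then be tested against 0 with a t-test, as in the text.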
Intuitively, the reason why the above-detailed parametric models fail to capture the structure of the luminance kernels is that they all predict that second-order kernels should indirectly reflect the structure of first-order kernels, relayed via static nonlinearities. However, as demonstrated by the results obtained using scalar metrics, even a coarse assessment of kernel structure shows that first-order and second-order luminance kernels exhibit different temporal dynamics (Figure 5), making it unlikely that the latter reflect the former in a straightforward fashion. In this sense the two analyses detailed above (assessment of kernel structure using coarse metrics of temporal dynamics in the previous section and parametric models in this section) offer two views of the same underlying result, i.e. that while first- and second-order kernels look similar for the texture experiments, their structure differs for the luminance experiments. This consideration applies not only to the comparison between the first-order kernel and the diagonal of the second-order kernel, as afforded by the preceding analysis, but also to modulations outside the diagonal region (off-diagonal modulations): both first- and second-order kernels are lowpass in the texture experiments (Figure 3), while the first-order is lowpass and the second-order is bandpass in the luminance experiments (Figure 2). These characteristics are analyzed in detail in the following two sections.
Off-diagonal modulations
The analyses presented so far have focused primarily on the diagonal region of second-order kernels. The results presented in Figures 6F–6H were computed from the whole kernel, but marginals are dominated by the diagonals in our experiments (hence the similarity between Figures 6D and 6H). However, the off-diagonal regions also contain modulations of interest. More specifically, it appears that luminance second-order kernels contain antagonistic flanks (negative modulations running along the sides of the main diagonal for center-center and surround-surround second-order kernels in Figures 2B and 2C, and positive modulations for center-surround kernels in Figure 2D). This feature is absent from texture second-order kernels (see Figure 3).
Because most second-order modulations appear aligned with the diagonal, we can capture off-diagonal modulations by averaging across the diagonal axis. The resulting marginal averages are shown by tilted white traces in Figures 2 and 3 and confirm the presence of a Mexican-hat profile for luminance second-order kernels, but not for texture kernels. This difference in shape is reflected in the Fourier power: luminance traces (black) generate a bandpass spectrum while texture traces (red) generate a lowpass spectrum (Figure 7A).
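Averaging along lines parallel to the main diagonal yields one value per time lag (the tilted white traces), and the Fourier power of that profile indicates whether it is lowpass or bandpass. The NumPy sketch below assumes the kernel is a square array sampled at a fixed frame duration `dt` (a hypothetical parameter; the actual frame duration is given in Materials and methods) and that `max_offset` is smaller than the kernel size. Function names are hypothetical.

```python
import numpy as np

def diagonal_profile(H2, max_offset):
    """Average a second-order kernel along lines parallel to the main
    diagonal. Returns lags (in samples) and the mean value at each lag."""
    H2 = np.asarray(H2, dtype=float)
    offsets = np.arange(-max_offset, max_offset + 1)
    profile = np.array([np.diagonal(H2, offset=d).mean() for d in offsets])
    return offsets, profile

def power_spectrum(profile, dt):
    """One-sided Fourier power of a profile sampled every dt seconds."""
    power = np.abs(np.fft.rfft(np.asarray(profile, dtype=float))) ** 2
    freqs = np.fft.rfftfreq(len(profile), d=dt)
    return freqs, power
```

A Mexican-hat (difference-of-Gaussians) profile with zero net area produces a spectrum that peaks away from zero frequency, while a Gaussian profile produces one that peaks at zero.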
Figure 7
 
Structure of second-order off-diagonal regions. Black trace in A corresponds to white trace (marginal average along main diagonal) in Figure 2B for luminance center-center second-order kernel, red trace to white trace in Figure 3B for texture kernel. Right traces show corresponding Fourier power spectra, from which we computed a bandpass index (see Materials and methods) for both texture (plotted on y axis in B) and luminance kernels (x axis) shown separately for center-center (black), surround-surround (blue), center-surround (gray) and combined (across all previous three) kernels (red). Dashed lines show values expected for a Gaussian filter (see Materials and methods). Bars next to axes show distributions pooled across all data points. Black and blue traces in C show on-axis modulations within center-center and surround-surround kernels respectively. Inset shows center-center kernel from Figure 2B, whose on-axis modulation profile is indicated by the red rectangle. Magenta trace shows first-order kernel of M (parasol) retinal ganglion cell estimated by Benardete and Kaplan (1999). Thin black line shows the uncertainty filter from parameter average (see Figure 9F). All traces in C have been rescaled (across y axis) and time-shifted (along x axis) so that their peaks equal 1 and are aligned at time 0. Bandpass index for on-axis profiles is plotted on the y axis in D versus mean frequency of power spectrum distribution on the x axis for each subject (different symbols), for both luminance (solid) and texture (open) kernels and for both center (black) and surround (blue). Bars next to axes show corresponding distributions. Dashed horizontal line as in B, vertical magenta line indicates expected value for magenta trace in C. Shaded regions (A and C) and error bars (B and D) show ±1 SEM.
To assess the significance of these differences across observers we computed a simple bandpass index (see Materials and methods). Figure 7B shows that this index is larger for luminance than for texture second-order kernels (p < 0.02; data points fall below the unity line), with the exception of center-surround kernels (p = 0.49). Furthermore, texture kernels generate values that are not statistically different (p > 0.05) from the bandpass index expected for a lowpass Gabor filter (i.e. a Gaussian), indicated by the dashed line (see distribution bars next to y axis), whereas luminance indices (except for center-surround, p = 0.06) fall significantly above (p < 10⁻³) this value (see distribution bars next to x axis). We conclude from this analysis that our earlier qualitative observations on off-diagonal modulations are quantitatively correct and conserved across observers.
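The definition of the bandpass index is given in Materials and methods and is not reproduced here. As a stand-in, the NumPy sketch below uses a hypothetical index, 1 − P(0)/max(P), computed from the Fourier power P of the profile: it is exactly 0 for any profile whose power peaks at zero frequency (e.g. a Gaussian) and approaches 1 when the zero-frequency component vanishes (e.g. a balanced Mexican hat). Its absolute values therefore need not match those plotted in Figure 7B.

```python
import numpy as np

def bandpass_index(profile):
    """Hypothetical bandpass index: 1 - P(0) / max(P), where P is the
    one-sided Fourier power of the profile. 0 for lowpass profiles whose
    power peaks at zero frequency; close to 1 when P(0) is negligible."""
    power = np.abs(np.fft.rfft(np.asarray(profile, dtype=float))) ** 2
    return 1.0 - power[0] / power.max()
```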
Estimation of front-end filtering
The bandpass nature of luminance second-order kernels is of particular interest because previous literature has identified first-order operators of biphasic structure (i.e. relatively bandpass) in retinal cells (e.g. Baccus, Ölveczky, Manu, & Meister, 2008; Benardete & Kaplan, 1999; Marmarelis & Naka, 1972; Victor, 1987) and similar temporal impulse responses in human vision (Stromeyer & Martini, 2003; Watson, 1982). No such structure is visible in the luminance first-order kernels (black traces in Figures 4A–4C; note that first-order kernels do not necessarily correspond to the system impulse response (Korenberg & Hunter, 1996)). However, it is known that for the large class of models of the LNL(N) type, which are directly relevant to retinal processing (Field & Rieke, 2002; Spekreijse, 1969; Spekreijse & Oosting, 1970; Victor, 1988), the first row (or first column) of second-order kernels (termed on-axis modulation here) reflects the shape of the front-end linear stage (Korenberg & Hunter, 1996; Marmarelis, 2004; see Materials and methods). Although we showed in Figure 6H that our luminance results are unlikely to be explained by an LNL(N) model, they may be captured at least in part by this class of models, raising the possibility that the on-axis modulations of second-order luminance kernels preserve some trace of the front-end filtering stage.
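The on-axis property can be verified for a noise-free LNL cascade. Writing the second-order kernel as k2[t1, t2] = c2 Σs m[s]·g[t1 − s]·g[t2 − s], where g is the front-end filter, m the back-end filter and c2 the quadratic coefficient of the static nonlinearity (a simplified, assumed form), the first row reduces to c2·m[0]·g[0]·g[t2] when both filters are causal, so it reproduces g up to a scale factor. The NumPy sketch below additionally assumes that m[0] and g[0] are nonzero; if either filter starts with a pure delay, the first nonzero row plays the same role. Function names are hypothetical.

```python
import numpy as np

def lnl_second_order_kernel(g, m, c2):
    """Second-order kernel of a causal LNL cascade with quadratic
    coefficient c2: k2[t1, t2] = c2 * sum_s m[s] * g[t1 - s] * g[t2 - s]."""
    g = np.asarray(g, dtype=float)
    m = np.asarray(m, dtype=float)
    L = len(g) + len(m) - 1
    shifted = np.zeros((len(m), L))   # row s holds g delayed by s samples
    for s in range(len(m)):
        shifted[s, s:s + len(g)] = g
    return c2 * shifted.T @ np.diag(m) @ shifted

def on_axis_profile(H2):
    """First row of a second-order kernel (the on-axis modulation),
    rescaled so that its largest absolute value is 1."""
    row = np.asarray(H2, dtype=float)[0]
    return row / np.abs(row).max()
```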
This speculation is supported by the similarity between on-axis second-order modulations, shown by the black (center) and blue (surround) traces in Figure 7C, and the impulse response of primate parasol (M-projecting) retinal ganglion cells (magenta trace). We quantified this similarity by comparing their bandpass characteristics across observers. Figure 7D shows that on-axis modulations, like the diagonal marginal averages, are bandpass in nature (solid points fall above horizontal dashed line at p < 10⁻³; see also solid distribution bars next to y axis), and the average frequency from their power distribution is very close to the value for retinal ganglion cells (solid distribution bars next to x axis fall near vertical magenta line, p = 0.26). Texture kernels are less bandpass (open distribution bars next to y axis fall near horizontal dashed line, although they differ from it at p = 0.04) and slower than retinal ganglion cells (open distribution bars next to x axis fall to the left of magenta line, p < 10⁻⁷). We conclude from this analysis that, notwithstanding the lack of biphasic structure in the luminance first-order kernels, visual processing in the conditions of our luminance experiments likely involved biphasic front-end filters with characteristics similar to those reported for retinal ganglion cells.
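The comparison above summarizes each temporal trace by an average frequency taken from its power distribution. A minimal sketch of one plausible definition, the power-weighted mean of the one-sided discrete Fourier frequencies, is shown below. This definition is an assumption (the exact formula is given in Materials and methods and may differ), and the function name `mean_frequency` is hypothetical.

```python
import numpy as np


def mean_frequency(trace, dt):
    """Power-weighted mean frequency (Hz) of a temporal trace sampled every dt seconds.

    Assumed definition: sum(f * P(f)) / sum(P(f)) over the one-sided DFT frequencies,
    where P(f) is the one-sided power spectrum (non-DC, non-Nyquist bins doubled).
    """
    trace = np.asarray(trace, dtype=float)
    power = np.abs(np.fft.rfft(trace)) ** 2           # squared DFT magnitudes
    weights = np.ones_like(power)
    weights[1:] = 2.0                                 # fold negative frequencies onto positive ones
    if trace.size % 2 == 0:
        weights[-1] = 1.0                             # Nyquist bin appears only once
    power = power * weights
    freqs = np.fft.rfftfreq(trace.size, d=dt)         # matching frequencies in Hz
    return float(np.sum(freqs * power) / np.sum(power))


# Example: a 5 Hz sinusoid sampled for 1 s at 10 ms resolution.
t = np.arange(100) * 0.01
print(mean_frequency(np.sin(2 * np.pi * 5 * t), dt=0.01))
```

A faster, more biphasic impulse response shifts power to higher frequencies and therefore raises this index.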
Gain-control model of retinal processing
The analysis in Figures 7C and 7D prompted us to develop a retinal model of luminance processing that would capture the main characteristics of the observed kernels (Figure 8A). In this model the visual stimulus is linearly filtered by the temporal biphasic response function of M retinal ganglion cells (magenta trace in Figure 7C) via convolution. The output of this linear stage is half-wave rectified (Niu & Yuan, 2008) and incorporated into a differential equation to determine the time-varying response of the system. This response is integrated across time to generate the final decision variable for selecting either the ‘left’ or the ‘right’ stimulus (see Materials and methods). These few components can replicate several features of first-order kernels, but they fail to account for the structure of second-order kernels. We therefore equipped the model with an additional component, which played a critical role in replicating the nonlinear kernels: a gain-control feedback loop (red in Figure 8A). The choice of a servo-mechanism was mainly prompted by the damped profile of the second-order main diagonal (red trace in Figure 4C), which is common in nonlinear feedback systems (Marmarelis, 1991; Wiener, 1948). Gain control is also a well-characterized phenomenon in both vertebrate (Sakai et al., 1995; Shapley & Enroth-Cugell, 1984) and invertebrate (van Hateren & Snippe, 2001) retinae.
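The cascade described above can be sketched as a discrete-time simulation. This is a minimal illustration under stated assumptions, not the model specified in Materials and methods. The biphasic filter (a fast lobe minus a weighted slow lobe), the leaky accumulator with time constant `tau_acc`, the feedback strength `alpha` and the 10 ms time step are all hypothetical choices. Only the ordering of stages follows the text and the Figure 8 caption: linear filtering, half-wave rectification, accumulation, integration over time, and accumulator output fed back after 60 ms to divide the gain of the linear stage. The function takes a single luminance-modulation trace. A center-surround version would instead feed it a weighted combination of the center and surround traces.

```python
import numpy as np


def gamma_lobe(t, tau):
    """Unimodal temporal lobe (t/tau)*exp(-t/tau), normalized to unit sum (hypothetical shape)."""
    k = (t / tau) * np.exp(-t / tau)
    return k / k.sum()


def biphasic_filter(dt=0.01, length=0.3, tau_fast=0.01, tau_slow=0.02, w_slow=0.8):
    """Hypothetical biphasic impulse response: a fast positive lobe minus a weighted slow lobe."""
    t = np.arange(0.0, length, dt)
    return gamma_lobe(t, tau_fast) - w_slow * gamma_lobe(t, tau_slow)


def gain_control_dv(drive, dt=0.01, delay=0.06, tau_acc=0.1, alpha=5.0):
    """Scalar decision variable from linear filter -> rectifier -> leaky accumulator,
    with the accumulator output fed back after `delay` seconds to divide the gain of
    the linear stage. All parameter values are illustrative assumptions."""
    drive = np.asarray(drive, dtype=float)
    x = np.convolve(drive, biphasic_filter(dt))[: drive.size]  # causal linear stage
    lag = int(round(delay / dt))                               # feedback delay in samples
    y = np.zeros(drive.size + 1)                               # accumulator state over time
    for t in range(drive.size):
        fb = y[t - lag] if t >= lag else 0.0                   # delayed accumulator output
        gain = 1.0 / (1.0 + alpha * fb)                        # divisive gain control
        r = max(gain * x[t], 0.0)                              # half-wave rectification
        y[t + 1] = y[t] + dt * (-y[t] / tau_acc + r)           # Euler step of the accumulator
    return float(y[1:].sum() * dt)                             # integrate over time


def choose_left(left_drive, right_drive, **params):
    """2AFC rule: respond 'left' when the left stimulus yields the larger decision variable."""
    return gain_control_dv(left_drive, **params) > gain_control_dv(right_drive, **params)
```

Setting `alpha=0.0` removes the feedback loop. This gives the reduced model without gain control, which the text reports captures first-order but not second-order kernel structure.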
Figure 8
 
Retinal model and associated kernels. (A) shows model structure: input stimulus is convolved (*) with center-surround receptive field (black and blue smooth traces) using first-order filter from primate retinal ganglion cells (magenta trace in Figure 7C). Output from this stage is half-wave rectified and fed into an accumulator (inverted triangular symbol; implemented by differential equation, see Materials and methods), then integrated across time (Σt) to yield scalar decision variable further converted into psychophysical decision. Time-varying signal from accumulator stage is carried back into circuit with temporal delay τ = 60 ms to control gain of linear stage via divisive inhibition. Gain control loop is highlighted in red. The model was used to simulate a psychophysical experiment adopting the same parameters selected for human observers. Associated kernels are plotted in B–J (because center-center and surround-surround kernels are intrinsically symmetric it was sufficient to plot only half of each). Plotting conventions are similar to Figure 2, except shaded regions in B, E and H show ±1 SD across repeated simulations and second-order kernels are plotted as Zm score maps where Zm is in units of SD rather than SEM (colored for ∣Zm∣ > 2). Yellow trace in H shows best-fitting exponential to red trace (thin yellow lines show 95% confidence intervals) similarly to Figure 4C. Second-order traces for surround-surround kernels (yellow) overlap with center-center kernels (red) so are not visible separately in B, E and H.
As shown in Figure 8, the simple model outlined above is able to capture all the essential features of the empirical results (compare Figures 8B–8J with Figure 2). The decreasing profile of the second-order diagonal (red trace) is well fitted by an exponential decay function with τ = 202 ± 9 ms (yellow trace in Figure 8H), which overlaps with 185 ± 28 ms for the best-fitting exponential from the aggregate experimental kernel (yellow trace in Figure 4C). Most importantly, the model replicates the different temporal evolution of first-order and second-order kernels from target-absent noise images (Figure 8H), a result which required the introduction of a temporal delay of 60 ms within the gain-control loop (τ in Figure 8A). This value is in excellent agreement with relevant physiological and psychophysical estimates (see Supplementary Text “Potential relation of luminance kernels to underlying physiology”). Although only indirectly related, our earlier work on the temporal dynamics of directional processing in humans (Neri & Levi, 2008a) required the introduction of a comparable delay (90 ms) in the divisive feedback loop of the motion detector model.
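The time constants quoted above come from fitting an exponential decay to the second-order diagonal. Below is a minimal sketch of such a fit. It uses a log-linear least-squares fit as a simple stand-in, which is an assumption: the fitting and confidence-interval procedures are described in Materials and methods. This shortcut requires strictly positive values and weights points differently from a nonlinear fit. The function name is hypothetical.

```python
import numpy as np


def fit_exponential_decay(t, y):
    """Fit y ~ a * exp(-t / tau) by linear least squares on log(y).

    Requires strictly positive y; on noisy data the estimates can differ from
    those of a nonlinear least-squares fit.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fit requires strictly positive values")
    slope, intercept = np.polyfit(t, np.log(y), 1)   # log y = log a - t / tau
    return float(np.exp(intercept)), float(-1.0 / slope)


# Noise-free check: a decay with a 0.2 s time constant is recovered.
t = np.linspace(0.0, 0.6, 31)
a, tau = fit_exponential_decay(t, 2.0 * np.exp(-t / 0.2))
```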
Simulation of trial-by-trial human responses by nonlinear second-order models
As noted earlier, significant modulations within second-order kernels (Figures 2 and 3) demonstrate that human behavior cannot be described by linear filters alone (Neri, 2004; see also Extra Figures 1 and 2), but they do not imply that second-order kernels would allow better prediction of human responses on individual trials. The latter possibility can be tested directly by computing trial-by-trial responses from a simulated observer that uses only the first-order kernels, as well as from one that uses both first-order and second-order kernels, and measuring how well the two models match the corresponding human responses to the same set of stimuli. We use the term ‘consistency’ to refer to the percentage of trials on which two processes, for example model and human, gave the same answer (Neri & Levi, 2006), a metric directly related to Hamming distance (Hamming, 1950). Figures 9A and 9B plot consistency between the first-order model and human responses on the x axis versus consistency between a series of nonlinear models and human responses on the y axis (we use the linear model as benchmark). Consistency for the nonlinear second-order model is statistically better than for the linear first-order model only in the luminance experiments (red symbols in Figure 9A fall significantly above the unity line (p < 0.02, paired t-test) and red distribution bars at bottom left fall to the left of the main diagonal line), not in the texture experiments (Figure 9B, p = 0.88).
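The comparison can be sketched as follows. The code makes several hypothetical assumptions:

- each trial presents a left and a right stimulus, each flattened into a vector;
- the simulated observer picks the side with the larger kernel output;
- that output is the first-order term h1 • i, plus, optionally, k times the quadratic term iᵀH2i (k = 1 by default).

Consistency is the percentage of matching binary responses, i.e. 100·(1 − Hamming distance / number of trials). Array layouts and function names are illustrative.

```python
import numpy as np


def kernel_output(stims, h1, H2=None, k=1.0):
    """Per-trial output h1 . i, plus k * i^T H2 i when a second-order kernel is supplied.

    stims: (n_trials, n_dims); h1: (n_dims,); H2: (n_dims, n_dims).
    """
    out = stims @ h1
    if H2 is not None:
        out = out + k * np.einsum("ti,ij,tj->t", stims, H2, stims)
    return out


def model_choices(left, right, h1, H2=None, k=1.0):
    """Hypothetical 2AFC rule: respond 1 ('left') when the left stimulus gives the larger output."""
    return (kernel_output(left, h1, H2, k) > kernel_output(right, h1, H2, k)).astype(int)


def consistency(a, b):
    """Percentage of trials on which two binary response vectors agree (100 * (1 - Hamming / n))."""
    a, b = np.asarray(a), np.asarray(b)
    return 100.0 * float(np.mean(a == b))
```

The linear model corresponds to calling `model_choices` without `H2`. The second-order model passes the second-order kernel as `H2`, and both sets of choices are then scored against the human responses with `consistency`.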
Figure 9
 
Replicability of human responses for different nonlinear models. (A) Linear-human consistency (percentage of trials on which the linear first-order model and the human observer gave the same response to the same set of stimuli) is plotted on the x axis for luminance kernels against model-human consistency for seven different nonlinear models. Red and light-red symbols refer to models that incorporate second-order kernels, blue and light-blue symbols refer to variants of the uncertainty model (see Materials and methods). Solid red symbols refer to the full nonlinear second-order model; light-red symbols refer to the same model when regions outside the diagonal of the second-order kernels are set to 0, open red symbols when regions within the diagonal are set to 0. Solid blue symbols refer to the uncertainty model with different f filters for different observers (individually optimized) and front-end convolution weighted by a linearly decreasing function, open blue symbols when weighted by constant function; solid light-blue symbols refer to same f for all observers (obtained by averaging individually optimized f parameters) and linearly decreasing weighting function; open light-blue symbols refer to f equal to the first-order kernel derived from target-present images (different f's were applied to center and surround) and linearly decreasing weighting function. Bars at bottom-left and top-right show corresponding distributions after pooling across all second-order models (red) or all uncertainty models (blue). Green symbols refer to a model that responds correctly on every trial. (B) plots the same for texture kernels. C and D plot model-human consistency for the full nonlinear second-order model as a function of k (relative contribution of nonlinear kernel) ranging from 0 to ∞. Red vertical lines show peak locations for the different traces (one trace per observer). Corresponding k values are plotted in E for luminance (x axis) versus texture (y axis). 
Dashed lines mark unity. (F) Filter functions used in the uncertainty model for both luminance (left) and texture (right) simulations, when the weighting function (red line) was constant (top) and linearly decreasing (bottom). Gray traces show individually optimized filters for different observers, blue traces show the function corresponding to the average of the individually optimized parameters. Error bars (smaller than symbol when not visible) show ±1 SEM. For panel E they were computed using Hessian-based delta method (Seber & Wild, 2003).
Both first-order and second-order models have no free parameters. The relative contribution of all nonlinear kernels compared with linear kernels (which we term k here) is theoretically expected to equal 1 provided kernels are appropriately scaled (Marmarelis & Marmarelis, 1978; Neri, 2004); however, k may depend on a number of factors and therefore differ from 1 in practice (see Materials and methods). This could explain the poor performance of the nonlinear second-order model in the texture experiments if the true k were significantly different from the theoretical expectation of unity for the texture experiments but not for the luminance experiments. To demonstrate that this is not the reason for the discrepancy, we computed consistency for a large range of k values and then selected the best value for each subject separately. The resulting consistency values for the nonlinear second-order model were nevertheless statistically no different from those of the first-order model for the texture experiments (p > 0.05). Furthermore, the curve describing how consistency varies with k was unimodal and remarkably similar across subjects only for the luminance experiments (Figure 9C, k = 1.25 ± 0.44 SD across observers), not for the texture experiments (Figure 9D, k = 15.6 ± 25.2 SD), and k was close to the expected value of 1 only for luminance kernels (horizontal red bar in Figure 9E spans a tight region overlapping with the vertical dashed line). Finally, the average increase in consistency from adding second-order kernels was 3% in the luminance experiments (∼0% in the texture experiments). Because the effective dynamic range for consistency extends only from 50% (chance) to 100%, this 3% gain corresponds to a proportionally larger improvement over the linear model.
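The per-observer search over k can be sketched as a grid search. The sketch assumes that the linear and quadratic contributions to the left-minus-right output difference have already been computed for every trial; `lin_diff` and `quad_diff` are hypothetical inputs. The model then responds ‘left’ when lin_diff + k·quad_diff > 0. The grid of k values is also an assumption, and very large k approximates a purely second-order model.

```python
import numpy as np


def consistency_vs_k(lin_diff, quad_diff, human, ks=None):
    """Consistency (%) between model and human choices for each k on a grid.

    lin_diff, quad_diff: per-trial left-minus-right first- and second-order outputs
    (assumed precomputed); human: 1 for 'left', 0 for 'right'.
    """
    lin_diff, quad_diff, human = map(np.asarray, (lin_diff, quad_diff, human))
    if ks is None:
        ks = np.concatenate(([0.0], np.logspace(-2, 2, 81)))  # illustrative grid
    ks = np.asarray(ks, dtype=float)
    model = (lin_diff[None, :] + ks[:, None] * quad_diff[None, :] > 0).astype(int)
    curve = 100.0 * np.mean(model == human[None, :], axis=1)
    return ks, curve


def best_k(lin_diff, quad_diff, human, ks=None):
    """Grid value of k giving the highest consistency (the first one in case of ties)."""
    ks, curve = consistency_vs_k(lin_diff, quad_diff, human, ks)
    i = int(np.argmax(curve))
    return float(ks[i]), float(curve[i])
```

Plotting `curve` against `ks` for each observer gives consistency-versus-k traces of the kind described for Figures 9C and 9D.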
When consistency is converted into d′ (a sensitivity index for how well the model replicates the human observer), which maps chance to 0 (see black numeric labels in Figures 9A and 9B), the improvement is more appropriately estimated at ∼14%.
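The section does not state the exact consistency-to-d′ mapping. One common choice that maps 50% to 0 is assumed here: the two-alternative forced-choice conversion d′ = √2·z(p), where z is the inverse of the standard normal cumulative distribution. Under this mapping the √2 factor cancels in any relative improvement. The sketch uses only the Python standard library, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def consistency_to_dprime(consistency_pct):
    """Map consistency (%) to d' with the assumed 2AFC relation d' = sqrt(2) * z(p).

    50% (chance) maps to 0; the percentage must lie strictly between 0 and 100.
    """
    p = consistency_pct / 100.0
    return math.sqrt(2.0) * NormalDist().inv_cdf(p)


def relative_dprime_gain(c_linear, c_nonlinear):
    """Percentage improvement in d' of the nonlinear model over the linear model."""
    d_lin = consistency_to_dprime(c_linear)
    d_non = consistency_to_dprime(c_nonlinear)
    return 100.0 * (d_non - d_lin) / d_lin
```

For example, a hypothetical rise in consistency from 70% to 73% corresponds to roughly a 17% gain in d′ under this mapping. This shows how a few percentage points of consistency can translate into a much larger proportional improvement.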
The ability to capture individual observer differences is relevant in this context, as demonstrated by testing the nonlinear second-order model using the aggregate kernels for all observers (rather than individually derived kernels). This procedure leads to a significant drop (∼2%) in model-human consistency for the second-order model (p < 0.01), to the extent that it no longer outperforms the linear model that uses the individually derived first-order kernels (p = 0.22), and the associated k values are ∼5× smaller (significant difference at p < 10⁻³). However, this model still outperforms the corresponding linear model that uses the aggregate first-order kernel (p < 0.01), because a larger drop in performance (∼3%) is observed for the linear model when it is not tailored to individual observer variations (p < 0.01).
We were also interested in whether the contribution of the second-order kernels in the luminance experiments was entirely attributable to the diagonal of the kernels, or whether the off-diagonal modulations also played a role (Hung & Stark, 1979). We therefore computed consistency when the regions outside the diagonal were set to 0 (light-red symbols) and when the region within the diagonal was set to 0 (open red symbols). We found that both versions still significantly outperformed the linear model (p < 0.03 and p < 0.01 respectively), but both were also worse than the original full second-order model (p < 0.01 and p < 0.04 respectively) and did not differ from each other (p = 0.52). We conclude that regions outside the main diagonal in second-order kernels do play a role, possibly as significant as the role played by the diagonal region. This result is consistent with the analysis in Figure 7 demonstrating that off-diagonal regions are highly structured in the luminance kernels.
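Setting parts of a second-order kernel to zero amounts to masking it. A minimal NumPy sketch follows. Treating the ‘diagonal’ as a band of half-width `width` samples around the main diagonal (width 0 meaning the main diagonal alone) is an assumption, since this section does not specify how wide the diagonal region was. The function name is hypothetical.

```python
import numpy as np


def split_diagonal(H2, width=0):
    """Return (diagonal-band part, off-diagonal part) of a second-order kernel.

    Entries with |row - col| <= width are treated as 'diagonal' (an assumed definition);
    the two returned matrices sum back to H2.
    """
    H2 = np.asarray(H2, dtype=float)
    rows, cols = np.indices(H2.shape)
    band = np.abs(rows - cols) <= width
    return np.where(band, H2, 0.0), np.where(band, 0.0, H2)
```

A diagonal-only model would use the first returned matrix in place of the full kernel, and an off-diagonal-only model the second.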
Consistency-based comparisons between Volterra models of different order do not allow meaningful statements about their relative ‘goodness’ or power/complexity trade-off, because the models involve large and very different numbers of estimated parameters (Figure 2 in Watanabe & Stark, 1975; see also Korenberg & Hunter, 1996); this can lead to significant overfitting for limited data sets. Comparisons based on methods that account for the number of model parameters (Akaike, 1974; Miller, 1990) place a nearly insurmountable penalty on the full second-order model, particularly when the dimensionality of the system output is much smaller than the dimensionality of the input and the system internal noise is high (Marmarelis & Naka, 1972; see also Marmarelis, 1993 for parameter reduction using Laguerre expansions and Korenberg & Hunter, 1996 for potentially associated pitfalls (Korenberg, Bruder, & McIlroy, 1988)). However, partial second-order models incorporating only main diagonals or off-diagonals (detailed above) have significantly fewer parameters. These reduced models outperformed the linear model (p < 0.05) after accounting for parameterization via final prediction error (FPE) analysis (Akaike, 1970; Hong et al., 2008; see Materials and methods). As shown by the texture experiments (Figure 9B), this result is not trivially obtained in general (see, for example, early studies on compensatory motor tracking in humans (Hung & Stark, 1977, 1991), where the contribution of nonlinear processing to human behavior cannot be resolved due to excessive system noise).
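Akaike's final prediction error penalizes a model fitted on N observations. If V is the mean squared prediction error and d the number of fitted parameters, FPE = V(1 + d/N)/(1 − d/N). The sketch below implements that standard formula. It also counts the free parameters of a symmetric T × T kernel, to show why diagonal-only models are so much cheaper. How the paper defined V, d and N for binary responses is given in Materials and methods and is not reproduced here. The function names are hypothetical.

```python
def final_prediction_error(mse, n_params, n_obs):
    """Akaike's final prediction error: mse * (1 + d/N) / (1 - d/N)."""
    if not 0 <= n_params < n_obs:
        raise ValueError("need 0 <= n_params < n_obs")
    ratio = n_params / n_obs
    return mse * (1.0 + ratio) / (1.0 - ratio)


def n_params_symmetric_kernel(T, diagonal_only=False):
    """Free parameters of a symmetric T x T second-order kernel (upper triangle) or of its diagonal."""
    return T if diagonal_only else T * (T + 1) // 2
```

Take hypothetical values of T = 20 time samples and N = 2000 trials, counting only second-order parameters. A full symmetric kernel adds 210 parameters and inflates the error by a factor of about 1.23, whereas the 20 diagonal parameters inflate it by only about 1.02.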
Simulation of trial-by-trial human responses by uncertainty models
For direct comparison with the nonlinear second-order model, we implemented four different versions of the most common nonlinear model in visual psychophysics: the uncertainty model (Pelli, 1985). In a 2AFC experiment this class of models also approximates probability summation models (Watson, 1979; see Materials and methods). In the first version of the uncertainty model, the visual stimulus is convolved with front-end filter f and the largest value in the convolution is selected as output. The filter f is a wavelet defined by 4 parameters, which we optimized individually for each subject (see Materials and methods). The top row in Figure 9F shows the optimal filters for all subjects in both luminance (left) and texture (right) experiments. The second version of the uncertainty model is similar, except that the output from the convolution is weighted by a linearly decreasing function before selecting the largest value (red trace in bottom row of Figure 9F). In this version the simulated observer relies more heavily on the early portion of the stimulus. We found that this weighting function provided better results, so we adopted it for the remaining two versions of the model. The third version is identical to the second except that f was the same for all observers and was derived from the average of the optimal parameters defined for the second version (blue in Figure 9F). In the fourth version f corresponded to the empirically determined first-order kernels from target-present noise images. We tested this version because previous authors have suggested that sensory kernels derived from target-present noise images expose the underlying filter in the face of uncertainty (Tjan & Nandy, 2006), and because this result was indirectly implied by our earlier theoretical work (Neri, 2004).
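The first two versions of the uncertainty model can be sketched as follows. Several choices here are hypothetical:

- the four-parameter wavelet is taken to be a Gabor (Gaussian envelope times cosine) with center `t0`, width `sigma`, frequency `freq` and `phase`, since this section does not specify the wavelet family;
- the convolution uses NumPy's centered (`mode="same"`) alignment rather than a strictly causal one;
- the linearly decreasing weight runs from 1 at the first output sample to 0 at the last.

```python
import numpy as np


def wavelet(t, t0, sigma, freq, phase):
    """Hypothetical 4-parameter temporal wavelet (Gabor form): Gaussian envelope times cosine."""
    return np.exp(-((t - t0) ** 2) / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * (t - t0) + phase)


def uncertainty_output(stim, f, decreasing=True):
    """Largest value of the front-end convolution, optionally after weighting it by a
    linearly decreasing envelope (1 at the first sample, 0 at the last)."""
    resp = np.convolve(np.asarray(stim, dtype=float), f, mode="same")
    if decreasing:
        resp = resp * np.linspace(1.0, 0.0, resp.size)
    return float(resp.max())


def uncertainty_choice(left, right, f, decreasing=True):
    """2AFC rule: respond 1 ('left') when the left stimulus yields the larger maximum."""
    return int(uncertainty_output(left, f, decreasing) > uncertainty_output(right, f, decreasing))
```

The third and fourth versions differ only in how `f` is chosen: averaged parameters in one case, and an empirical first-order kernel sampled on the same time grid in the other. The selection rule stays the same.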
Compared with the linear first-order model, the uncertainty model was at best equally good but most often worse (blue and light-blue symbols in Figures 9A and 9B). It never performed as well as the nonlinear second-order model (blue symbols fall below red symbols, and blue distribution bars at top right fall to the right of the main diagonal line): the best uncertainty model (version 2, indicated by solid blue symbols) was significantly worse than the full second-order model (solid red symbols) in both luminance (p < 0.01) and texture (p < 0.03) experiments. The associated kernels were sometimes similar to those derived empirically (Extra Figure 8), although they struggled to replicate the asynchrony for kernels derived from target-absent noise images (see Extra Figure 8C and Extra Figure 9), except for the uncertainty model in which f is derived from the optimal parameters averaged across subjects (Extra Figure 10I); the model-human consistency associated with this model was worse than that of the linear model (p < 0.02) and (in d′ units) 32% poorer than that of the nonlinear second-order model. The main weakness of all uncertainty models was that, regardless of f, the decreasing temporal envelope was critical for simulating second-order temporal dynamics, and this model component is entirely arbitrary. If the envelope is viewed as an intrinsic property of the detection mechanism (e.g. fast adaptation), the model itself does not provide a mechanistic explanation for its origin (it is simply imposed by the modeler); if, on the other hand, it is viewed as a top-down cognitive operator (e.g. observers pay more attention to the beginning of the stimulus presentation), then the question arises as to the temporal resolution at which observers may be able to access the temporally varying output of a given channel.
Channel output may be accessed dynamically over several seconds (Anderson, Murphy, & Jones, 2007), but such access is less likely on timescales shorter than ∼1/4 s, roughly equivalent to attentional dwell time (Duncan, Ward, & Shapiro, 1994). Whichever interpretation is adopted (low-level or higher-level), this aspect of the uncertainty models appears unsatisfactory.
The second relevant component of the uncertainty model is the linear receptive field f that supports the convolution. The luminance function obtained by averaging the optimized parameters across observers (thin black trace in Figure 7C) differs significantly from previous estimates of temporal impulse responses for contrast processing: it is more sluggish by a factor of 2 and is triphasic, inconsistent with the fast biphasic impulse response commonly observed in retinal ganglion cells (magenta trace in Figure 7C) and human vision (Stromeyer & Martini, 2003; Watson, 1982). Finally, the kernels generated by uncertainty models failed to simulate the differential effects of replicability on correct and incorrect trials observed experimentally and the correlation of kernel outputs (Extra Figure 11). 
Model-human consistency against maximum accountable consistency
Figures 9A and 9B show that the nonlinear second-order model was able to replicate human responses more efficiently than all other models we tested, but we wished to establish whether the performance of this model could still be improved or whether it reached maximum achievable performance. It is known that human responses cannot be replicated with 100% accuracy due to observer internal noise (Burgess & Colborne, 1988; Green, 1964; Green & Swets, 1966), thus establishing an upper limit on model-human consistency dictated by the statistics of internal noise. It is possible to determine a range of consistency for the best possible model without making any assumption about the distribution of internal noise (Neri & Levi, 2006). If a given model lies within this range, the model is very close to the best possible model. The range varies as a function of an indirect measure of internal noise statistics, the consistency between two different sets of responses given by the same human observer to the same set of stimuli (double-pass technique, see Materials and methods). 
Figure 10A plots human-human consistency obtained in this way on the x axis versus model-human consistency for the nonlinear second-order model on the y axis. The gray-shaded region shows the range spanned by the best possible model (see Materials and methods). For the luminance experiments (solid symbols) the second-order model performs well, in that all observers except possibly one fall within the gray region. The result is not as good for the texture experiments (open symbols): half the observers fall below the gray region and only one is clearly within it. We conclude that, at least in the case of the luminance experiments, the ability to replicate human responses achieved by the nonlinear second-order model is satisfactory not only in relative terms with respect to other models (Figure 9A) but also in absolute terms with respect to the best possible model (Figure 10A).
Figure 10
 
Accounting for internal noise. (A) Model-human consistency for full nonlinear second-order models is plotted on y axis against human-human consistency on x axis (estimated from double-pass experiments, see Materials and methods). Gray area shows region spanned by best possible model (see Neri & Levi, 2006 and Materials and methods). (B) plots signal discriminability (d′ adjusted for internal noise) on x axis versus internal noise (units of external noise SD) on y axis. Dashed vertical line marks unity. Shaded horizontal region shows range estimated by Burgess and Colborne (1988). Black bars next to axes show mean ± SD for the data points, thick for solid and thin for open. Gray bars next to x axis show output (unadjusted) discriminability (see text). C plots experimentally measured absolute efficiency on x axis versus efficiency predicted by L(N) linear amplifier model (Murray et al., 2005). Inset plots measured efficiency for luminance versus texture. Except for inset, solid symbols refer to luminance data and open symbols to texture data across entire figure. Error bars (smaller than symbol when not visible) show ±1 SEM.
From Figure 10A it appears that observers were affected by larger amounts of internal noise in the texture experiments than in the luminance experiments (open symbols are shifted to the left of solid symbols, almost significant at p = 0.054). This conclusion cannot be reached on the basis of human-human consistency alone, however, because this metric depends on both internal noise and signal discriminability (Green & Swets, 1966). These two quantities can be estimated from human-human consistency and human % correct responses, given certain assumptions about the underlying statistics and a signal detection model of observer choice (see Burgess & Colborne, 1988 for details). Figure 10B plots both signal discriminability and internal noise estimated using this model (on x and y axes respectively) for both luminance and texture experiments (solid and open symbols respectively). Internal noise was similar between the two sets of experiments, possibly indicating that it was mainly late decisional noise rather than stimulus-specific, and its values were within the range (horizontal shaded region) expected from previous research (Burgess & Colborne, 1988). Although it did not differ significantly between the two conditions (p = 0.24), internal noise was slightly larger in the texture experiments (1.02 ± 0.42 SD across observers) than in the luminance experiments (0.81 ± 0.23; vertical thin bar to the right of y axis is above vertical thick bar). Similarly, signal discriminability was slightly worse in the texture experiments (0.91 ± 0.31) than in the luminance experiments (1.07 ± 0.18), although the difference was not statistically significant (p = 0.32). It should be emphasized that Figure 10B plots the estimated signal discriminability in the absence of internal noise, which differs from the combined output discriminability normally denoted by d′.
Output discriminability (indicated by gray horizontal bars in Figure 10B) was significantly higher for the luminance experiments (p < 0.05). We did not find any significant correlations between metrics or between experiments.
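The estimation of internal noise from double-pass data can be sketched under the simplest signal-detection assumptions. The approach follows the spirit of Burgess and Colborne (1988) but is not necessarily identical to the procedure in Materials and methods. The sketch assumes a single Gaussian decision variable d + e + n per trial, with a response counted as correct when the decision variable is positive:

- e is the external-noise term, with unit SD, identical on both passes;
- n is the internal-noise term, with SD sigma_int, redrawn on each pass.

Percent correct then fixes d/√(1 + sigma_int²). The between-pass agreement, which falls as sigma_int grows, fixes sigma_int, and the code solves for it by bisection. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def predicted_pc_and_agreement(d, sigma_int, n_grid=2001, lim=8.0):
    """Proportion correct and double-pass agreement for decision variable d + e + n.

    e ~ N(0, 1) is external noise shared by both passes; n ~ N(0, sigma_int**2) is
    internal noise drawn independently on each pass; a response is correct when > 0.
    Agreement is integrated numerically over e on [-lim, lim].
    """
    pc = _STD.cdf(d / math.sqrt(1.0 + sigma_int**2))
    step = 2.0 * lim / (n_grid - 1)
    agree = 0.0
    for i in range(n_grid):
        e = -lim + i * step
        q = _STD.cdf((d + e) / sigma_int)  # P(correct on one pass | external noise e)
        agree += _STD.pdf(e) * (q * q + (1.0 - q) ** 2) * step
    return pc, agree


def estimate_internal_noise(pc, agreement, lo=1e-3, hi=10.0, iters=40):
    """Return (sigma_int, d) that reproduce the observed proportion correct and agreement.

    At fixed pc, agreement decreases as sigma_int grows, so sigma_int is found by bisection.
    """
    z = _STD.inv_cdf(pc)

    def agreement_at(sigma):
        return predicted_pc_and_agreement(z * math.sqrt(1.0 + sigma**2), sigma)[1]

    if not agreement_at(hi) <= agreement <= agreement_at(lo):
        raise ValueError("agreement outside the range this model can produce")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if agreement_at(mid) > agreement:  # passes agree too often: more internal noise needed
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return sigma, z * math.sqrt(1.0 + sigma**2)
```

In this sketch, d plays the role of the noise-free signal discriminability (in external-noise SD units) and d/√(1 + sigma_int²) that of the combined output discriminability. This mirrors the distinction drawn in the text.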
Measured versus predicted absolute efficiency
The applicability of kernel estimation to the problem of replicating human performance can also be assessed using a separate method developed by Murray, Bennett, and Sekuler (2005). This approach is limited to the first-order kernel with associated L(N) model and only relies on the coarse behavioral metric of absolute efficiency (which does not capture trial-by-trial response patterns). However it presents the significant advantage of being able to account for human internal noise without explicitly measuring it. Figure 10C shows predicted versus measured absolute efficiency for both luminance (solid) and texture (open) experiments. Interestingly, the L(N) model (with internal noise matched to human) underestimates human efficiency for the luminance experiments (solid symbols fall below unity line at p < 0.01), but overestimates it for the texture experiments (open symbols fall above unity line at p < 0.02). It therefore fails to account for both sets of experiments, but it appears that the reasons for its failure differ in the two cases. 
For the luminance experiments, our earlier analysis indicates that the failure of the L(N) model derives from its inability to capture relevant second-order nonlinearities: the analysis used to generate Figure 10C assumes an L(N) model with additive internal noise, so its failure in this case may result from the presence of second-order nonlinearities. For the texture experiments we concluded that an LNL(N) model is appropriate (Figure 6H), but an L(N) model yielded similar model-human consistency (Figure 9B), so the latter should also provide a reasonable approximation for the purpose of computing absolute efficiency. However, the internal noise process associated with the texture experiments, and possibly with the luminance experiments too, may not have been additive as assumed by the analysis in Figure 10C. In particular, nonlinear processing may be treated by the L(N) model as a form of consistent internal noise (Li, Klein, & Levi, 2006) and therefore conflated with inconsistent internal noise. Finally, measured absolute efficiency was significantly higher for the luminance condition (data points fall below the unity line in the inset to Figure 10C, p < 0.02). Based on Figure 10B, this difference in efficiency should be interpreted as a consequence of both lower signal discriminability and higher internal noise for texture processing.
Differential effects of replicability conditional upon correct/incorrect human responses
Model-human consistency conflates trials on which both human and model responded correctly with trials on which both responded incorrectly (classed as model-human match), and similarly it conflates trials on which one responded correctly and the other one incorrectly (classed as model-human mismatch). Below we consider model-human consistency conditional upon whether the human observer was correct or incorrect on a given trial. We focus on the output of the nonlinear second-order kernel alone (rather than the full second-order model) so that first-order and second-order components can be assessed more transparently. 
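Model-human consistency, as defined above, is the proportion of trials on which the model's correct/incorrect outcome matches the human's. A minimal sketch, with hypothetical function names, that computes it over all trials and conditionally on whether the human was correct; the usage example reproduces the expectation for a model that responds correctly on every trial (100% consistency on correct trials, 0% on incorrect trials).

```python
import numpy as np

def consistency(human_correct, model_correct, subset=None):
    """Fraction of trials on which model and human agree (both correct or both
    incorrect), optionally restricted to a boolean subset of trials."""
    match = np.asarray(human_correct, dtype=bool) == np.asarray(model_correct, dtype=bool)
    if subset is not None:
        match = match[np.asarray(subset, dtype=bool)]
    return float(match.mean())

def conditional_consistency(human_correct, model_correct):
    """Consistency on all trials and conditional on the human being correct/incorrect."""
    human = np.asarray(human_correct, dtype=bool)
    return {
        "all": consistency(human, model_correct),
        "human_correct": consistency(human, model_correct, human),
        "human_incorrect": consistency(human, model_correct, ~human),
    }

# A model that is always correct, compared with a human who errs on 1 of 4 trials.
human = [True, True, True, False]
ideal = conditional_consistency(human, [True] * len(human))
print(ideal)  # {'all': 0.75, 'human_correct': 1.0, 'human_incorrect': 0.0}
```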
Figure 11A plots first-order (x axis) versus second-order (y axis) consistency for both luminance (solid) and texture (open) experiments. When consistency is computed from all trials (as in Figure 9), the output from first-order kernels captures human responses better than that from second-order kernels in the luminance experiments (solid black symbols fall below the unity line, p < 0.01), but equally well in the texture experiments (open black symbols fall on the unity line, p = 0.54). At first this may appear inconsistent with the previous analysis, where we showed that second-order kernels augmented first-order kernels only for the luminance experiments and not for the texture experiments. However, whether the two results conflict depends on whether first-order and second-order kernels successfully replicate human responses on the same subset of trials or on different subsets. 
Figure 11
 
Differential analysis of correct and incorrect trials. (A) Consistency (full colors) and cross-consistency (lighter colors) between human responses and second-order kernels (sum of all three) are plotted on y axis against consistency between human responses and first-order kernels (sum of both) on x axis, for all trials (black) and for trials on which observers responded incorrectly (blue) or correctly (red). Solid for luminance and open for texture. Dashed lines mark chance (50%). Bars next to axis show mean ± SD for corresponding full-color data points. Cross-consistency was computed using a leave-one-out cross-validation technique (see Materials and methods). Inset shows final prediction error (FPE) values for luminance kernel outputs (see Materials and methods and Results). (B) Output from both first-order (black) and second-order (red) luminance kernels is plotted for every trial (one point per trial) on x axis against output on the same trial from corresponding leave-one-out kernels (computed from all trials except the one for which output is being plotted) on y axis, for a representative observer (S1). Spread is more pronounced for second-order as opposed to first-order kernels (correlations between x and y for the former are larger than the latter across observers at p < 10⁻³). Inset magnifies region around the origin directly relevant for predicting human responses. High-contrast data points take opposite signs for x and y coordinates, indicating trials classified one way by full kernels and the opposite way by leave-one-out kernels. There are many more trials on which second-order kernels give opposite responses (high-contrast red dots) than trials on which first-order kernels give opposite responses (high-contrast black dots), demonstrating that (as expected) the cross-validation procedure impacts second-order kernels more than first-order kernels. (C–D) Cross-consistency for center and surround kernels separately. 
(E) Cross-consistency for first-order center + surround on x axis (same x values as light symbols in A) versus second-order center-surround kernel on y axis. Error bars (smaller than symbol when not visible) show ±1 SEM.
We can gain some preliminary insight into this issue by computing consistency separately for trials on which the human observer was correct and those on which the observer was incorrect (red and blue symbols, respectively, in Figure 11A). For the texture experiments there is still a good match between first-order and second-order kernels (open symbols tend to fall on the unity line). Both kernels are far better at replicating human responses on correct than on incorrect trials (open red symbols are shifted to the right (p < 0.001) and upward (p < 0.001) with respect to open blue symbols), and both are worse than chance on incorrect trials (open blue symbols fall below (p < 10⁻³) the horizontal dashed line and to the left (p < 10⁻⁵) of the vertical dashed line). A related behavior is expected for a model that responds correctly on every trial or nearly so, like the ideal observer: such a model would show 100% consistency on correct trials and 0% consistency on incorrect trials. As shown by the green symbols in Figure 9B, the model-human consistency associated with this model is as high as that of the linear or any other model (p > 0.05) for the texture experiments. 
A different trend is observed for the luminance experiments (solid symbols). As in the texture experiments, the first-order kernel is better at replicating human responses on correct trials (solid red symbols are shifted rightward with respect to solid blue symbols, p < 10⁻⁴). However, the second-order kernel is equally good on correct and incorrect trials (solid red and blue symbols fall on the same horizontal line (p = 0.63) and are both greater than chance, at p < 0.01 and p < 0.03 respectively), and far better than the first-order kernel on incorrect trials (solid blue symbols fall above the unity line, p < 0.003). In addition, a model that responds correctly on every trial falls well below (p < 0.02) the linear model (green symbols in Figure 9A). These results indicate that the improvement in model-human consistency delivered by second-order kernels in the luminance experiments (red symbols in Figure 9A) may be explained by their ability to correct the output from first-order kernels on trials on which first-order kernels performed poorly. Conversely, the negligible improvement in the texture experiments (red symbols in Figure 9B) most likely resulted from the fact that second-order kernels, though as good as first-order kernels at replicating human responses (Figure 11A), succeeded in the same way on the same trials and therefore added little useful information for replicating human behavior. 
We confirmed these results by computing predictive power, here termed cross-consistency, using a leave-one-out cross-validation technique (Hong et al., 2008; Stone, 1974). This procedure penalizes second-order more than first-order kernels (Figure 11B), thus taking into account differences in parameterization and overfitting between the two models. Cross-consistency is indicated by the light-colored symbols in Figure 11A. As expected, the overall pattern is very similar to that observed for model-human consistency. In particular, second-order kernels outperform first-order kernels on incorrect trials (solid light-blue symbols fall above the unity line, p < 0.01) but not on correct trials (solid light-red symbols fall below the unity line, p < 0.01). We also applied FPE analysis (as discussed in relation to Figure 9), and second-order kernels still outperformed first-order kernels on incorrect trials (blue symbols fall below the unity line in the inset to Figure 11A, p < 0.003). 
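The leave-one-out logic can be sketched for a first-order kernel as follows. This is a minimal illustration, not the estimator described in Materials and methods: it assumes a simplified classification image (mean target-minus-non-target noise on correct trials minus the same mean on incorrect trials), a simulated linear observer with internal noise, and hypothetical names and parameter values throughout. Each trial is predicted by a kernel that never saw it, which is what penalizes models with many free parameters.

```python
import numpy as np

def first_order_kernel(noise_diff, correct):
    """Simplified classification image: mean target-minus-non-target noise on
    correct trials minus the same mean on incorrect trials."""
    return noise_diff[correct].mean(axis=0) - noise_diff[~correct].mean(axis=0)

def cross_consistency(stim_diff, noise_diff, correct):
    """Leave-one-out cross-consistency of a first-order kernel model.

    For each trial n the kernel is re-estimated from all other trials and used
    to predict whether the observer was correct on trial n (the model says
    'correct' when the target interval wins the template match)."""
    n_trials = len(correct)
    keep = np.ones(n_trials, dtype=bool)
    predicted = np.empty(n_trials, dtype=bool)
    for n in range(n_trials):
        keep[n] = False
        kernel = first_order_kernel(noise_diff[keep], correct[keep])
        keep[n] = True
        predicted[n] = stim_diff[n] @ kernel > 0
    return float(np.mean(predicted == correct))

# Hypothetical simulated observer: linear template plus internal noise.
rng = np.random.default_rng(0)
n_trials, n_dims = 1000, 8
template = np.exp(-0.5 * ((np.arange(n_dims) - 3.5) / 1.5) ** 2)
signal = 0.5 * template
noise_diff = rng.standard_normal((n_trials, n_dims)) - rng.standard_normal((n_trials, n_dims))
stim_diff = signal + noise_diff
internal = np.std(noise_diff @ template) * rng.standard_normal(n_trials)
correct = stim_diff @ template + internal > 0
cc = cross_consistency(stim_diff, noise_diff, correct)
print(f"cross-consistency: {cc:.2f}")
```

Estimating the kernel once from all trials instead gives the uncross-validated model-human consistency; the gap between the two typically grows with the number of kernel parameters, which is why cross-validation affects second-order kernels more.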
To determine whether the correct/incorrect differential effect described above was attributable to both center and surround information, to only one of them or to the interaction between them, we computed cross-consistency separately for center kernels ( Figure 11C), surround kernels ( Figure 11D) and center-surround kernels ( Figure 11E). This analysis identified the center-surround second-order kernels as the operator carrying the relevant information: on incorrect trials, only this kernel subclass generated cross-consistency values that were both above chance (blue points in Figure 11E fall above the horizontal dashed line at p < 0.001) and higher than the corresponding linear output (blue points in Figure 11E fall above unity line at p < 0.002). 
Trial-by-trial correlation between first-order and second-order outputs
We can test the above-mentioned hypothesis of kernel redundancy more directly by studying the correlation between the output from first-order kernels (plotted on the x axis in Figures 12A and 12B) and that from second-order kernels (plotted on the y axis) for both luminance (panel A) and texture (panel B) experiments in one representative observer. For the luminance kernels (Figure 12A) there appears to be little or no correlation on both correct (red) and incorrect (blue) trials (yellow and black oval shapes are nearly circular), whereas a positive correlation appears to be present for the texture kernels, as indicated by the tilted oval shapes of the response clouds in Figure 12B.
Figure 12
 
Correlation (or lack thereof) between first-order and second-order outputs. (A) Output from first-order kernels (sum of both) is plotted on x axis against output from second-order kernels (sum of all three) on y axis for a representative observer (S1). Each point refers to an individual trial, red if the observer responded correctly and blue if incorrectly. Traces next to x and y axes plot marginal distributions. Linear-fit ovals (see caption to Figure 6 for explanation) are yellow for red data points and black for blue. (B) Same as A but for texture kernels. (C) Correlation between first-order and second-order outputs is plotted for texture kernels on y axis versus luminance kernels on x axis, when computed from all trials (black) and from trials on which observers responded correctly (red) or incorrectly (blue). Open symbols show results obtained after removing the target signal from the stimuli. Dashed lines mark 0 correlation. Solid lines mark unity lines (y = ∣x∣). Error bars in C show ±1 SEM (smaller than symbol when not visible).
Figure 12C plots correlation coefficients for all observers in both luminance (x axis) and texture (y axis) experiments (solid symbols). Overall, the magnitude of the correlations is clearly larger for texture kernels (solid points fall above the solid diagonal lines, p < 10⁻⁸). Moreover, texture kernels show exclusively positive correlations for all trial types (solid points fall above the horizontal dotted line). The output from luminance kernels is somewhat correlated across all trials (solid black symbols fall to the right of the vertical dotted line, p < 10⁻⁴), but not significantly correlated for correct and incorrect trials separately (solid red and blue symbols are not statistically different from 0, p > 0.05). 
The positive correlations observed for texture kernels are mainly driven by the target signal, as demonstrated by the open symbols in Figure 12C, computed after removing the signal from the stimulus images. Signal removal changes the pattern of results very little for the luminance kernels but drastically for the texture kernels: correlations are not significantly different from zero for all trials (open black symbols fall on the horizontal dotted line, p = 0.31) or for correct trials (open red symbols fall on the horizontal dotted line, p = 0.74), although they do differ from 0 for incorrect trials (open blue symbols fall below the horizontal dotted line, p < 0.01) and (perhaps surprisingly) they are negative. The different impact of signal removal on luminance and texture results is unlikely to depend on SNR alone, because the average (across observers) signal amplitude in units of noise SD was identical (3.33) in the two sets of experiments. 
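A sketch of the output-correlation analysis, under stated assumptions: each trial's first-order output is taken as h1·(i_target − i_nontarget), and its second-order output as H2·(I_target − I_nontarget) with I = i iᵀ, so that H2·I equals the quadratic form iᵀH2i. How the original analysis combined the two intervals, and the helper names used here, are assumptions. The signal-removal control corresponds to passing i_target minus the target signal in place of i_target.

```python
import numpy as np

def kernel_outputs(h1, H2, i_target, i_nontarget):
    """Per-trial first- and second-order outputs (target minus non-target).
    i_target, i_nontarget: arrays of shape (n_trials, n_dims)."""
    def quadratic(x):
        # H2 . (i i^T) equals the quadratic form i^T H2 i, one value per trial
        return np.einsum("ni,ij,nj->n", x, H2, x)

    o1 = i_target @ h1 - i_nontarget @ h1
    o2 = quadratic(i_target) - quadratic(i_nontarget)
    return o1, o2

def output_correlations(o1, o2, correct):
    """Pearson correlation between outputs on all, correct and incorrect trials."""
    correct = np.asarray(correct, dtype=bool)
    subsets = {"all": np.ones_like(correct), "correct": correct, "incorrect": ~correct}
    return {name: float(np.corrcoef(o1[m], o2[m])[0, 1]) for name, m in subsets.items()}

# Hand-checkable example with hypothetical 2-pixel kernels and a single trial.
h1 = np.array([1.0, 0.0])
H2 = np.array([[0.0, 1.0], [1.0, 0.0]])
o1, o2 = kernel_outputs(h1, H2, np.array([[1.0, 2.0]]), np.array([[0.0, 1.0]]))
print(o1, o2)  # [1.] [4.]
```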
Discussion
Block-structure models like the retinal circuit in Figure 8A typically involve a sizeable number of free parameters and assumptions, as well as trial-and-error exploratory strategies combined with the modeler's intuition. The value of these models depends on the application for which they are intended: if the application is to provide a potential link to the underlying physiological mechanisms, their value is clear. If, on the other hand, the application is to simulate human behavioral decisions, we favor the more general and better-specified approach offered by the nonlinear second-order kernel expansion (Equation 1). First, we have shown that second-order kernels can be derived with high statistical reliability. This is demonstrated not only by the associated Z scores (Figures 2 and 3), but also by the remarkable consistency we observed across observers for the scalar metrics used to assess the overall shape of these operators (Figure 5). Second, we have shown that when luminance kernels are incorporated into a Volterra scheme (Equation 2) and used to simulate human responses on a trial-by-trial basis, they outperform every other model we tested (Figure 9A). Finally, we have shown that the ability to simulate human responses afforded by this approach is within the maximum achievable range (Figure 10A). 
Perhaps the most significant result in relation to the replication/prediction of human responses is provided by the differential analysis of correct and incorrect trials (Figure 11A). Figure 11A shows that while second-order kernels are poorer than first-order kernels at predicting human responses for luminance trials on which the observer responds correctly, they are far better for trials on which the observer responds incorrectly. The implication of this result is that the utility of second-order kernels depends critically on the pay-off matrix used to evaluate the model. If, for example, a specific application requires the model to predict when the observer will score a hit (or equivalently a correct rejection) but does not require accurate prediction of false alarms (or equivalently misses), then a first-order model may be preferable. If, on the other hand, what is at a premium is the prediction of human false alarms, and potential failure to predict hits is not critical, then a second-order model (with little or no contribution from the first-order kernel) may be the better choice. 
The power of Volterra's formulation for capturing the behavior of nonlinear systems lies mainly in its general applicability. Provided the system satisfies a few reasonable assumptions (Sandberg, 1983; Volterra, 1930), its behavior is accurately described by Volterra kernels regardless of its specific nature, whether it is a physical device (Salgado & O'Reilly, 1996) or a neural structure (although, of relevance to certain neural applications, Volterra representations do not in general apply to hysteretic systems (Korenberg & Hunter, 1996)). To provide a few examples of its versatility, this approach has been exploited to describe the response properties of individual neurons (Emerson et al., 1992; Victor et al., 1977), the pupillary reflex in the classic work of Stark and collaborators (Sandberg & Stark, 1968; Stark, 1969), and memory constraints on human information processing (Heath, 2000). To our knowledge, the current results represent the first example in the human literature on sensory processing. 
In order to establish whether the general applicability of this approach translates into a valuable asset, a critical question at this stage is how far it can be extended (Simoncelli, 2003): will it apply to other forms of perceptual processing? Our texture experiments have taken a first step toward answering this question. The estimated second-order kernels did not increase model-human consistency (Figure 9B), but their structure provided important insights into the possible architecture of the underlying circuitry (Figures 6E–6H; see also Supplementary Text “Relation of texture kernels to filter-rectify-filter modeling schemes”). Despite their different characteristics, the mechanisms probed by luminance and texture stimuli are both relatively low-level: further work is necessary to establish the applicability of this approach to visual processing in general, particularly to the perception of more complex (higher-level) visual phenomena. A further issue relating to applicability is the range of stimulus regimes across which the kernel characterization may transfer when derived from the more restricted range afforded in the laboratory. This question, like the one stated before it, cannot be answered satisfactorily at present. 
Supplementary Materials
Supplementary Figure 1. Simulated first-order and second-order kernels for a first-order L(N) observer model. Same plotting conventions as Figure 2. The simulated observer used the empirically derived first-order kernels for center and surround in Figure 2A (aggregate observer). For a first-order observer, second-order kernels derived from all image classes (B-D) are featureless. Notice that first-order and second-order kernels are plotted to a 10-fold scaling difference (compare with ~2-fold in Figure 2). 
Supplementary Figure 2. Estimated first-order kernels ĥ1 for a model implementing equation 1. Left column shows h1 (blue trace) and H2 (surface plot); right column shows ĥ1 derived from target-present noise fields (red) or target-absent noise fields (black). When H2 = 0 (A), ĥ1 ∝ h1 for both noise field classes. This relationship is distorted by H2 ≠ 0, and ĥ1 becomes dependent on the noise field class that is used to estimate it. Target intensity was equal to σN. These simulations demonstrate specific aspects of a more general result that has been analytically derived in the Appendix to Neri (2004). Shaded regions in B, D and F show ±1 SD across 100 simulations. Dashed lines indicate 0. 
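The dependence of ĥ1 on the noise field class can be reproduced with a small simulation. This is a sketch under hypothetical choices (8-dimensional stimuli, a linear template h1 orthogonal to the signal direction u, H2 = k·uuᵀ, unit internal noise per interval, 5000 trials, target intensity equal to the noise SD), not the simulation behind the figure. With k = 0 both estimates align with h1 and carry no component along u; with k > 0 the second-order kernel overlaps the target signal, and only the target-present estimate acquires a component along u.

```python
import numpy as np

def simulate_2afc(h1, H2, signal, n_trials, internal_sd, rng):
    """2AFC observer whose per-interval decision variable is
    h1·i + H2·(i i^T) + internal noise (h0 = 0); returns noise fields and outcome."""
    n_dims = len(h1)
    noise_t = rng.standard_normal((n_trials, n_dims))   # target-present noise
    noise_nt = rng.standard_normal((n_trials, n_dims))  # target-absent noise

    def decision(i):
        quad = np.einsum("ni,ij,nj->n", i, H2, i)
        return i @ h1 + quad + internal_sd * rng.standard_normal(len(i))

    correct = decision(signal + noise_t) > decision(noise_nt)
    return noise_t, noise_nt, correct

def class_kernels(noise_t, noise_nt, correct):
    """First-order estimates from target-present and target-absent noise fields,
    signed so that both point along h1 for a purely linear observer."""
    present = noise_t[correct].mean(axis=0) - noise_t[~correct].mean(axis=0)
    absent = noise_nt[~correct].mean(axis=0) - noise_nt[correct].mean(axis=0)
    return present, absent

rng = np.random.default_rng(1)
n_dims = 8
u = np.eye(n_dims)[2]    # hypothetical signal direction
h1 = np.eye(n_dims)[5]   # hypothetical linear template, orthogonal to u
signal = 1.0 * u         # target intensity equal to the noise SD
results = {}
for k in (0.0, 0.5):
    H2 = k * np.outer(u, u)  # second-order kernel overlapping the target signal
    results[k] = class_kernels(*simulate_2afc(h1, H2, signal, 5000, 1.0, rng))
    present, absent = results[k]
    print(f"k={k}: present·u={present @ u:+.2f}, absent·u={absent @ u:+.2f}")
```

In the target interval the quadratic term k(u·(s + n))² contains the cross term 2k·(u·s)(u·n), which acts as an extra linear template along u in that interval only; in the target-absent interval the term is even in n and leaves the first-order estimate unchanged.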
Supplementary Figure 3. Luminance first-order and second-order kernels for individual observers (different rows). Same plotting conventions as Figure 2. Shown are only kernels computed from all noise images. 
Supplementary Figure 4. Texture first-order and second-order kernels for individual observers (different rows). Same plotting conventions as Supplementary Figure 3.
Supplementary Figure 5. Assessment of texture kernel structure using scalar metrics. Same plotting conventions as Figure 5.
Supplementary Figure 6. Estimation of diagImage not available correlation range expected from SDT implementation of NL(N) model. The scalar output o in response to stimulus i was Image not available, where Image not available is a normal (Gaussian) function with SD = 32 ms and mean centered at target occurrence, shown by the grey line in B (as indicated by the above expression, the function used for the surround (light-blue line) was simply Image not available). Image not available is a Gaussian noise source with mean 0 and SD Image not available, Image not available is the SD of Image not available across all trials (i.e. internal noise is in units of noiseless output SD Image not available), and ε is internal noise intensity. The final response Image not available was 1 for ‘correct’ and 0 for ‘incorrect’. The model was used to simulate a psychophysical experiment of 6,000 trials. The associated first-order kernels are shown in B for both center (black) and surround (blue), for target signal = 0 (open) or target signal = 5 (solid). Clearly, the presence of a target signal affects the first-order estimate for the center (compare solid black points with open black points in B). The associated center-center second-order kernel is shown in C (red trace shows slice along main diagonal). (D) Amplitude of the first-order kernel (from B) at each time point is plotted on the x axis against amplitude of the second-order diagonal at each corresponding time point. The corresponding correlation value for center kernels is plotted in F as a function of signal intensity (y axis) and internal noise intensity ε (x axis). E plots the corresponding model performance as percent of correct responses. Upper and lower limits for the performance range spanned by human observers are indicated by red lines in E. The corresponding range in F is shown by the red-tinted region. 
We took maximum and minimum correlation values within this region to represent a reasonable estimate of the correlation range that we may expect for our sample of human observers. This range is indicated for both center and surround by shaded regions in Figure 6D. A similar procedure was used to estimate marginImage not available correlation ranges for the LNL(N) model (indicated by shaded regions in Figure 6H) with Image not available
Supplementary Figure 7. Dependence of nonlinear/linear ratio k on the shape of Volterra operators, target signal intensity and internal noise. The first-order kernel Image not available (yellow trace) and second-order kernel (surface plot) shown in A were incorporated into equation 2 to simulate a psychophysical experiment. We computed the output kernel gain ratio for the associated perceptual kernels Image not available and Image not available (derived via reverse correlation) as Image not available, where Image not available is the first-order coefficient (slope) of the best linear fit between Image not available and Image not available (Image not available and Image not available). Output kernel gain ratio is plotted on the y axis in B versus the associated k value in equation 2 (input kernel gain ratio) on the x axis, in the absence of internal noise (red traces) or with internal noise equal to 2.5 (blue traces). We scaled the raw estimated kernels by 1/(n! vⁿ) (Marmarelis & Marmarelis, 1978; Schetzen, 1980), where v is noise power (here v = Image not available) and n is kernel order. Given this scaling, we expect traces to fall on the black unity line (Marmarelis & Marmarelis, 1978). This prediction holds when there is no target signal (light-red trace in B). As target intensity is increased (increasingly saturated hues), there is a marked departure from this prediction that depends on system nonlinearity (greater for larger input gain ratio). In addition, the relationship depends on internal noise: C plots output kernel gain ratio (bright when greater than input ratio and dark when smaller) as a function of both target signal intensity (y axis) and internal noise intensity (x axis) for simulated input ratio = 0.025. Red lines show contours (line thickness reflects contour value). The dependence of kernel gain ratio on the two variables is not separable. Additionally, kernel gain ratio depends on the shapes of the Volterra operators, as demonstrated in D–F. 
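The 1/(n! vⁿ) scaling is the classical Lee–Schetzen cross-correlation result for a system with continuous output driven by zero-mean Gaussian white noise of variance v. The sketch below, with hypothetical kernels and names, recovers h1 and H2 from such a system with no target signal and no decision stage, which is the regime in which the caption expects traces to fall on the unity line; it does not model the 2AFC psychophysical case. Subtracting the mean output before the second-order cross-correlation is what makes the diagonal of H2 come out correctly.

```python
import numpy as np

def lee_schetzen_kernels(x, y, v):
    """Estimate first- and second-order kernels of a system driven by zero-mean
    Gaussian white noise x (shape: n_samples x n_dims, variance v per element).
    Raw cross-correlations are scaled by 1/(n! v^n) for kernel order n."""
    y_c = y - y.mean()  # removing the mean output fixes the H2 diagonal
    raw1 = x.T @ y_c / len(y)                            # E[y x]
    raw2 = np.einsum("n,ni,nj->ij", y_c, x, x) / len(y)  # E[(y - E[y]) x x^T]
    return raw1 / v, raw2 / (2 * v ** 2)                 # 1/(1! v), 1/(2! v^2)

rng = np.random.default_rng(0)
n_samples, n_dims, v = 200_000, 4, 1.0
h1 = np.array([0.5, 1.0, -0.5, 0.2])          # hypothetical first-order kernel
A = 0.3 * rng.standard_normal((n_dims, n_dims))
H2 = (A + A.T) / 2                            # hypothetical symmetric second-order kernel
x = np.sqrt(v) * rng.standard_normal((n_samples, n_dims))
y = x @ h1 + np.einsum("ni,ij,nj->n", x, H2, x)  # noiseless second-order system
h1_hat, H2_hat = lee_schetzen_kernels(x, y, v)
print(np.abs(h1_hat - h1).max(), np.abs(H2_hat - H2).max())
```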
For the orthogonal kernels in D (even waveform for first-order kernel and odd waveform underlying the shape of second-order kernel) the kernel gain ratio shows less dependence on signal intensity (E) as one may expect from the result that the first-order kernel estimate is affected by the overlap between target signal and second-order kernel (Neri, 2004). There is however a degree of dependence on both target intensity and internal noise (F). 
Supplementary Figure 8. Simulated combined kernels for both luminance and texture experiments for the uncertainty model. Plotted to the same conventions as Figure 4, with the addition of light-colored traces for individual observers (full color traces show average across observers). The uncertainty model used individually optimized filter functions and a linearly decreasing weighting function. 
Supplementary Figure 9. Assessment of simulated luminance kernel structure using scalar metrics for the uncertainty model. Plotted to the same conventions as Figure 5. The uncertainty model used individually optimized filter functions and a linearly decreasing weighting function (same as Supplementary Figure 8). For target-absent kernels (bottom row) there is no difference in centroid (I), no difference in peak occurrence (J), both combined kernels show an early/late asymmetry (K) and neither shows an edges/center asymmetry (L). 
Supplementary Figure 10. Simulated first-order and second-order kernels for the uncertainty model (average f) plotted to the same conventions as Figure 2. The uncertainty model used the filter function corresponding to the average of the individually optimized parameters (shown by thin blue trace in A) and a linearly decreasing weighting function (shown by thin green trace in A). The kernels shown here represent the best approximation to the luminance data we were able to obtain after extensive simulations with uncertainty models. 
Supplementary Figure 11. Model-human consistency (left panels) and kernel output correlation (right panels) for simulated kernels from uncertainty models. (A-B) refer to the uncertainty model that used individually optimized filter functions, (C-D) to the uncertainty model that used the filter function corresponding to the average of the individually optimized parameters (both used a linearly decreasing weighting function). (A) and (C) are plotted to the same conventions as Figure 11A, (B) and (D) to the same conventions as Figure 12C. The second-order kernels generated by the uncertainty model do not replicate human response above chance on incorrect trials, neither when f is individually optimized (solid blue symbols fall on the horizontal dashed line in A, p = 0.7) nor when the average f is used (C, p = 0.24). Both models are poor at simulating output correlations for texture kernels (compare B and D with Figure 12C). 
Supplementary Figure 12. Uncertainty models predict second-order kernel modulations (see Appendix B for details). Black traces in both A and B show variance (diag(Image not available)) for derived second-order kernel (target-absent) pixels in a 2AFC experiment, plotted against input dimensionality d (number of Gaussian processes) for target signal level = 1. Remaining traces in A show variance for individual stimulus-response classes. Solid horizontal line shows variance = 1. Grey-to-black traces in B show diag(Image not available) for increasing target signal levels (from 0 in lightest grey to 5 in black). Inset plots diag(Image not available) as a function of signal level for d = 4 (indicated by dashed red vertical line). Shaded regions in A and thin red lines in inset to B show ±1 SD across 100 repeated simulations. Each simulation consisted of 10K trials. 
Appendix A
Parametric models of signal detection observers: L(N), NL(N) and LNL(N)
Within the context of signal detection theory (SDT) adopted here the process of interest necessarily involves the application of a nonlinear rule to generate the final psychophysical choice. For 2AFC experiments we can specify this rule exactly: stimulus number 1 is processed by the system (including the addition of internal noise) and assigned a scalar value x 1, the same is done for stimulus 2 to generate x 2, and the response r is simply g(x 2 − x 1) where g(x) is the Heaviside unit step function (although in general not ideal (Pelli, 1985) this rule is highly plausible for the human observer). We make the reasonable assumption that observers, as well as conforming to basic SDT principles (Green & Swets, 1966), applied this rule with no bias, i.e. they showed no tendency to prefer stimulus 1 over stimulus 2 or vice-versa for reasons other than their respective processing outputs (this may not apply to temporal (as opposed to spatial) 2AFC tasks (Yeshurun, Carrasco, & Maloney, 2008)). This assumption is supported by the lack of detectable bias in observers' overall responses: the proportion of ‘right’ responses was 0.48 ± 0.04 (mean ± SD across observers) and 0.50 ± 0.10 for luminance and texture experiments respectively. Neither survived a t-test as different from 0.5 (p = 0.33 for luminance, p = 0.94 for texture). By adopting this assumption we can translate the above formulation into one where we define x2 as the target stimulus, x1 the non-target, and r as either ‘correct’ (1) or ‘incorrect’ (0), which is the formulation used throughout this article. 
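The 2AFC rule and its recoding can be checked numerically. The sketch assumes hypothetical Gaussian decision variables and random target placement, and uses hypothetical function names; it shows that applying the Heaviside step to x_right − x_left and then scoring a response as correct when it names the target side yields exactly g(x_target − x_nontarget), the correct/incorrect formulation adopted here (ties have probability zero for continuous variables).

```python
import numpy as np

def respond_right(x_left, x_right):
    """2AFC rule r = g(x_right - x_left), g = Heaviside step (1 means 'right')."""
    return np.heaviside(x_right - x_left, 0.0).astype(int)

def recode_as_correct(resp_right, target_right):
    """Unbiased recoding: a response is correct when it names the target side."""
    return (resp_right == target_right).astype(int)

rng = np.random.default_rng(0)
n = 1000
x_target = rng.standard_normal(n) + 1.0   # hypothetical decision variables
x_nontarget = rng.standard_normal(n)
target_right = rng.integers(0, 2, n)      # target side: 1 = right, 0 = left
x_right = np.where(target_right == 1, x_target, x_nontarget)
x_left = np.where(target_right == 1, x_nontarget, x_target)
resp = respond_right(x_left, x_right)
correct = recode_as_correct(resp, target_right)
print(resp.mean(), correct.mean())  # proportion 'right', proportion correct
```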
The adoption of this framework implies that all models contain a final nonlinear stage, and that the preceding layers must return a single scalar quantity. Our primary interest is in characterizing these preceding layers rather than the final nonlinear decisional stage. The former are believed to reflect general and physiologically relevant properties of how visual processing operates. The latter depends on the specific question asked of observers and on the form in which the response is acquired. We denote this distinction by bracketing the end nonlinearity; for example, L(N) indicates a standard Wiener system in which (N) is the 2AFC decisional transducer. In this context, consider any static, monotonically increasing nonlinearity immediately preceding (N), such as a neuronal spike threshold (Priebe & Ferster, 2008). Such a nonlinearity does not affect the final response, and therefore affects neither the computation of ĥ1 nor that of ĥ2. This holds for a noiseless model, or for one in which internal noise is added before the nonlinearity. The notion is sometimes referred to as Birdsall's theorem (Lasley & Cohn, 1981; Tanner, 1961). Any plausible static nonlinearity preceding (N) would have the above characteristics, so we are effectively using L(N) as shorthand for LN(N).
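The invariance argument can be checked directly. In the minimal sketch below, the scalar outputs x1 and x2 and the choice f = exp are hypothetical. If internal noise enters before a strictly increasing static nonlinearity f, every 2AFC decision is unchanged. If noise is added after f, some decisions change. This is why the noise placement matters in the argument above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000
# Hypothetical noisy linear-stage outputs for non-target (x1) and target (x2) intervals
x1 = rng.standard_normal(n)
x2 = 0.5 + rng.standard_normal(n)

def decide(a, b):
    """Heaviside 2AFC rule g(b - a): 1 if the second interval wins."""
    return (b - a > 0).astype(int)

f = np.exp  # any strictly increasing static nonlinearity (hypothetical choice)

# Noise added BEFORE f: the ordering of x1, x2 is preserved, so decisions are identical
r_linear = decide(x1, x2)
r_static = decide(f(x1), f(x2))

# Noise added AFTER f: the ordering is no longer guaranteed, so some decisions differ
late_noise = 0.5 * rng.standard_normal((2, n))
r_late_lin = decide(x1 + late_noise[0], x2 + late_noise[1])
r_late_nl = decide(f(x1) + late_noise[0], f(x2) + late_noise[1])

print(np.array_equal(r_linear, r_static))
print(np.mean(r_late_lin != r_late_nl))
```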
A further consequence of the SDT framework with its terminal (N) stage concerns Volterra's original formulation of a second-order nonlinear system, $o = h_0 + \mathbf{h}_1^{*} * \mathbf{i} + \mathbf{H}_2^{*} * \mathbf{I}$, where * is convolution (2-dimensional in the case of $\mathbf{H}_2^{*}$). This formulation can be rewritten using Equation 1, with template matching (•) replacing convolution. The term $h_0$ is irrelevant because it cancels in the 2AFC comparison, so we set it to 0. Because g(x) in Equation 1 takes a scalar argument, the temporally modulated output of the combined convolution in Volterra's equation must be summed across time according to some weighting function w to yield one value. It follows that
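$$\left(\mathbf{h}_1^{*} * \mathbf{i} + \mathbf{H}_2^{*} * \mathbf{I}\right) \cdot \mathbf{w} = \mathbf{h}_1 \cdot \mathbf{i} + \mathbf{H}_2 \cdot \mathbf{I} \qquad (6)$$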
where $\mathbf{h}_1 = \mathbf{h}_1^{*} \star \mathbf{w}$ and $\mathbf{H}_2 = \mathbf{H}_2^{*} \star \mathbf{w}$. The latter is more explicitly written as $\mathbf{H}_2(\tau_1, \tau_2) = \sum_t \mathbf{H}_2^{*}(t - \tau_1, t - \tau_2)\, w(t)$. In other words, $\mathbf{h}_1$ and $\mathbf{H}_2$ correspond to $\mathbf{h}_1^{*}$ and $\mathbf{H}_2^{*}$ after cross-correlation with w. Equation 1 is therefore equivalent to Volterra's convolution-based expression in the present context. Because the human observer's w is not in general known, kernel estimation is restricted to $\mathbf{h}_1$ and $\mathbf{H}_2$ (in the form of ĥ1 and ĥ2; see Materials and methods).
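Equation 6 can be verified numerically. The sketch below uses hypothetical random kernels h1* and H2*, a random input i and a random weighting function w. It first computes the time-varying Volterra output by explicit convolution and sums it across time with weights w. It then builds h1 and H2 by the cross-correlation given above, that is h1(s) = Σt h1*(t − s) w(t), and evaluates the template-matching form h1 · i + H2 · I. The two results agree to floating-point precision.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_h = 12, 4                       # hypothetical input and kernel lengths
i = rng.standard_normal(n_in)           # input stimulus i
h1s = rng.standard_normal(n_h)          # Volterra first-order kernel h1*
H2s = rng.standard_normal((n_h, n_h))   # Volterra second-order kernel H2*
n_out = n_in + n_h - 1                  # length of the full convolution output
w = rng.random(n_out)                   # hypothetical temporal weighting function w

def in_range(k):
    return 0 <= k < n_in

# Convolution form: o(t) = sum_a h1*(a) i(t-a) + sum_{a,b} H2*(a,b) i(t-a) i(t-b)
o = np.zeros(n_out)
for t in range(n_out):
    for a in range(n_h):
        if in_range(t - a):
            o[t] += h1s[a] * i[t - a]
        for b in range(n_h):
            if in_range(t - a) and in_range(t - b):
                o[t] += H2s[a, b] * i[t - a] * i[t - b]
lhs = o @ w                             # sum across time, weighted by w

# Template form: cross-correlate h1*, H2* with w, then take dot products with i and I
h1 = np.zeros(n_in)
H2 = np.zeros((n_in, n_in))
for t in range(n_out):
    for s1 in range(n_in):
        if 0 <= t - s1 < n_h:
            h1[s1] += h1s[t - s1] * w[t]
        for s2 in range(n_in):
            if 0 <= t - s1 < n_h and 0 <= t - s2 < n_h:
                H2[s1, s2] += H2s[t - s1, t - s2] * w[t]
rhs = h1 @ i + np.sum(H2 * np.outer(i, i))   # I is the outer product of i with itself

print(np.isclose(lhs, rhs))
```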
Besides the Wiener L(N) model, two other well-characterized parametric models are the Hammerstein NL(N) model (Hunter & Korenberg, 1986) and the Korenberg LNL(N) model (Korenberg & Hunter, 1986). Their standard formulations, NL and LNL, are modified here because the linear end stage L must be followed by (N) to convert its output into a binary psychophysical decision. These parametric models are expected to generate specific kernel modulations:

(1) for the NL(N) model, diag(ĥ2) ∝ ĥ1.

(2) for the LNL(N) model, margin(ĥ2) ∝ ĥ1, where margin(A) = $\sum_{\tau_2} A(\tau_1, \tau_2)$ is the marginal sum across columns. Because of the intrinsic symmetry of center-center and surround-surround second-order kernels, the same result is obtained when the sum is taken across rows.

(3) for the L1NL2(N) model, ĥ2(t, 0) ∝ f(t), where f is the filter associated with the front-end linear stage L1.

These relationships are well known for a white-noise input (Hung & Stark, 1991; Marmarelis, 2004; Victor & Shapley, 1980; Westwick & Kearney, 2003). In a psychophysical experiment, however, a target signal is added and a decisional transducer (with associated internal noise) is present. Both factors may cause the first-order and second-order kernel estimates to depart from those commonly derived in the reverse correlation literature (see Appendix in Neri, 2004). We therefore simulated both NL(N) and LNL(N) models for a representative set of filter shapes, internal noise values and target signal values (see Extra Figure 6). The simulations allowed us to estimate the correlation ranges that may realistically be expected for the proportionality relations listed above in the specific context of our experiments (shaded regions in Figures 6D and 6H).
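For a noiseless cascade with no target and no decisional stage, relations (1) to (3) follow directly from the kernels of an LNL system. The sketch below assumes a quadratic static nonlinearity N(x) = a·x + b·x² and causal filters f (front end, L1) and g (back end, L2); all values are hypothetical. For this cascade the kernels in lag form are h1(λ) = a Σu g(u) f(λ − u) and H2(λ1, λ2) = b Σu g(u) f(λ1 − u) f(λ2 − u). The code checks these kernels against a direct simulation of the cascade and then verifies the three relations. The NL (Hammerstein) case is treated as an LNL cascade with f = [1]. The sketch does not reproduce the psychophysical departures discussed above.

```python
import numpy as np

def lnl_kernels(f, g, a, b):
    """Lag-form kernels of L1 (filter f) -> N(x) = a*x + b*x**2 -> L2 (filter g)."""
    m = len(f) + len(g) - 1
    h1 = a * np.convolve(g, f)                 # h1[lam] = a * sum_u g[u] f[lam - u]
    H2 = np.zeros((m, m))
    for u, gu in enumerate(g):
        fu = np.zeros(m)
        fu[u:u + len(f)] = f                   # fu[lam] = f[lam - u]
        H2 += b * gu * np.outer(fu, fu)
    return h1, H2

def lnl_output(i, f, g, a, b):
    """Direct simulation of the cascade on input sequence i (causal filtering)."""
    x = np.convolve(i, f)[:len(i)]
    return np.convolve(a * x + b * x**2, g)[:len(i)]

rng = np.random.default_rng(3)
f = np.array([1.0, 0.6, -0.3])                 # hypothetical front-end filter (L1)
g = np.array([0.8, 0.5, 0.2, 0.1])             # hypothetical back-end filter (L2)
a, b = 1.0, 0.4                                # hypothetical polynomial nonlinearity N
h1, H2 = lnl_kernels(f, g, a, b)
m = len(h1)

# Output at time t equals h1 . past + past' H2 past, with past[lam] = i(t - lam)
i = rng.standard_normal(50)
t = 40
past = i[t - np.arange(m)]
o = lnl_output(i, f, g, a, b)
print(np.isclose(o[t], h1 @ past + past @ H2 @ past))

# (2) LNL: the marginal sum of H2 is proportional to h1
print(np.allclose(H2.sum(axis=1), (b / a) * f.sum() * h1))
# (3) L1NL2: H2(t, 0) is proportional to the front-end filter f
print(np.allclose(H2[:len(f), 0], b * g[0] * f[0] * f), np.allclose(H2[len(f):, 0], 0))
# (1) NL (LNL with f = [1]): H2 is diagonal and diag(H2) is proportional to h1
h1_nl, H2_nl = lnl_kernels(np.array([1.0]), g, a, b)
print(np.allclose(np.diag(H2_nl), (b / a) * h1_nl), np.allclose(H2_nl, np.diag(np.diag(H2_nl))))
```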
Appendix B
Second-order kernels under uncertainty
Uncertainty models generate modulations within second-order kernels. We show this for the diagonal (variance) of the second-order kernels, i.e. diag(H2) ≠ 0. We use the simplest version of an uncertainty model, in which the response o to a stimulus i of dimensionality d is o = max(i). Each value of i is drawn from a standard normal distribution. We first consider the simplest linear case o = ⟨i⟩, where ⟨ ⟩ denotes the mean across the vector. In this case both o(i[0]) and o(i[1]) are Gaussian distributed. We approximate a 2AFC experiment as a yes-no experiment with unbiased criterion (threshold point) k; simulations confirm the validity of this approximation (see Extra Figure 12). The criterion truncates the distribution of o(i[0]) into two portions, one for correct rejections and one for false alarms. The resulting distributions match those obtained from the truncation of o(i[1]) into hits and misses respectively, except for a shift and sign inversion of intensity values (Green & Swets, 1966). n[0,1] (noise fields on correct rejections) and n[1,1] (hits) therefore have equal expected pixelwise variances $\sigma^2_{[0,1]}$ and $\sigma^2_{[1,1]}$. The same holds for n[0,0] (false alarms) and n[1,0] (misses). It follows that
$$\mathrm{diag}(\hat{\mathbf{h}}_2) = \sigma^2_{[1,1]} + \sigma^2_{[0,0]} - \sigma^2_{[0,1]} - \sigma^2_{[1,0]} = 0.$$
This offers a simplified demonstration of the result that a linear filter with unbiased criterion returns a featureless second-order kernel (see Neri, 2004 and Extra Figure 1). We now examine the distributions of n[0,1] and n[0,0] for the simplified uncertainty model. When d = 1 this model is equivalent to the linear model and diag(ĥ2) = 0. When d > 1, each pixel of n[0,1] follows a normal distribution truncated from above at k, so $\sigma^2_{[0,1]} < 1$.
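The sign of diag(ĥ2) for the two observers can be checked by Monte Carlo simulation. The sketch below is not the simulation reported in Extra Figure 12. It uses the yes-no approximation with hypothetical values for d, the target amplitude (added to pixel 0 only) and the criterion k. It estimates diag(ĥ2) as σ²[1,1] + σ²[0,0] − σ²[0,1] − σ²[1,0], computed from the pixelwise variances of the external noise within the four stimulus-response classes. For the linear (mean) observer, k is set midway between the target-absent and target-present means. For the max observer, k = 1 is an arbitrary illustrative choice.

```python
import numpy as np

def diag_h2(rule, d, amp, k, n_trials, rng):
    """Yes-no estimate of diag(h2_hat) from pixelwise noise variances in the four classes."""
    noise = rng.standard_normal((n_trials, d))
    present = rng.random(n_trials) < 0.5
    stim = noise.copy()
    stim[present, 0] += amp                  # target occupies pixel 0 only
    correct = (rule(stim) > k) == present    # 'yes' is correct on target-present trials

    def var(s, r):
        sel = (present == bool(s)) & (correct == bool(r))
        return noise[sel].var(axis=0)        # within-class variance, per pixel

    return var(1, 1) + var(0, 0) - var(0, 1) - var(1, 0)

rng = np.random.default_rng(4)
d, amp = 4, 2.0                              # hypothetical dimensionality and target amplitude
linear = diag_h2(lambda x: x.mean(axis=1), d, amp, amp / (2 * d), 200_000, rng)
uncertain = diag_h2(lambda x: x.max(axis=1), d, amp, 1.0, 200_000, rng)
print("linear (mean) observer:", np.round(linear, 3))     # expected near 0 on every pixel
print("max observer:          ", np.round(uncertain, 3))  # expected positive on pixels 1..d-1
```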
Each pixel of n[0,0] has expected mean $\mu_{[0,0]}$ and expected variance $\sigma^2_{[0,0]}$ and follows a mixture of two distributions. The first is the distribution of the maximum of d normally distributed variables, with mean $\mu_M$ and variance $\sigma^2_M$; this distribution approaches a Gumbel distribution as d grows (Kotz & Nadarajah, 2000). The second is a softly truncated normal distribution softT, with mean $\mu_T$ and variance $\sigma^2_T$, in which the truncation point t is not fixed but is determined by the distribution of the maximum. For softT we know that $\sigma^2_{[0,1]} < \sigma^2_T < 1$ (because t > k) and that $\mu_T < 0$ (because t > 0). We can write $\mu_{[0,0]} = p\mu_T + q\mu_M$ and
$$\sigma^2_{[0,0]} = p\sigma^2_T + q\sigma^2_M + p(\mu_T - \mu_{[0,0]})^2 + q(\mu_M - \mu_{[0,0]})^2,$$
where p = (d − 1)/d and q = 1/d. We want to show that $\sigma^2_{[0,0]} > \sigma^2_{[0,1]}$, which can be rewritten as
$$\sigma^2_{[0,0]} = p\sigma^2_T + q\sigma^2_M + (1 - q)\,q\,(\mu_T - \mu_M)^2 > \sigma^2_{[0,1]}.$$
We can replace $\sigma^2_T$ with $\sigma^2_{[0,1]}$ because $\sigma^2_{[0,1]} < \sigma^2_T$. We can also replace $\mu_T$ with 0 because $\mu_T < 0$ and $\mu_M > 0$. Each replacement lowers the left-hand side, so it suffices to show (after dividing by q) that $\sigma^2_M + \mu^2_M(1 - q) > \sigma^2_{[0,1]}$. Because $\sigma^2_{[0,1]} < 1$, it is in turn sufficient that
$$\sigma^2_M + \mu^2_M \frac{d - 1}{d} > 1.$$
$\mu_M$ and $\sigma^2_M$ are expressible in closed form for d up to 5. The inequality holds for d = 3, where $\mu_M = \frac{3}{2\sqrt{\pi}}$ and $\sigma^2_M = \frac{4\pi - 9 + 2\sqrt{3}}{4\pi}$. It also holds for d = 4, where $\mu_M = \frac{3}{2\sqrt{\pi}}\left[1 + \frac{2}{\pi}\sin^{-1}\frac{1}{3}\right]$ and $\sigma^2_M = 1 + \frac{\sqrt{3}}{\pi} - \mu^2_M$. Hence $\sigma^2_{[0,0]} > \sigma^2_{[0,1]}$ for d = 3 and d = 4. For d = 5, with $\mu_M = \frac{5}{4\sqrt{\pi}}\left[1 + \frac{6}{\pi}\sin^{-1}\frac{1}{3}\right]$, the more stringent inequality $\mu^2_M (d - 1)/d > 1$ holds. Both $\mu_M$ and (d − 1)/d increase monotonically with d, so this inequality remains true for d > 5. We note that as d → ∞ we have $\mu_{[0,0]} \to \mu_T$ and $\sigma^2_{[0,0]} \to \sigma^2_T$; that is, the pixelwise distribution of n[0,0] approaches that of softT. As d → ∞, $\mu_M$ and consequently k also become increasingly large. Both softT and n[0,1] therefore approach the standard normal distribution, and we expect $\sigma^2_{[0,0]} \to \sigma^2_{[0,1]}$. It follows that $\sigma^2_{[0,0]} - \sigma^2_{[0,1]}$ is a non-monotonic function of d. It equals 0 for d = 1, tends to 0 as d → ∞, and is positive otherwise (although we have not directly demonstrated this for d = 2). A similar result can be demonstrated for $\sigma^2_{[1,1]} - \sigma^2_{[1,0]}$ by restricting the analysis to pixels that do not contain the target, for which very similar logic applies. In conclusion, for pixels that do not contain the target we have $\sigma^2_{[0,0]} - \sigma^2_{[0,1]} > 0$ and $\sigma^2_{[1,1]} - \sigma^2_{[1,0]} > 0$ (when d > 2), so that diag(ĥ2) > 0. Simulations confirm the case d = 2 (see Extra Figure 12).
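The sufficient condition above can also be checked numerically for any d. The sketch below integrates the density of the maximum of d independent standard normals, d·φ(x)·Φ(x)^(d−1), to obtain μM and σ²M; the integration grid is an arbitrary choice. It compares the results with the closed forms quoted for d = 3 and d = 5 and then evaluates σ²M + μ²M(d − 1)/d. The condition is met for d ≥ 3 but not for d = 2, consistent with the text leaving that case to simulation.

```python
import math
import numpy as np

def max_normal_moments(d, lo=-10.0, hi=10.0, n=40_001):
    """Mean and variance of the maximum of d iid N(0, 1) variables by numerical integration."""
    x = np.linspace(lo, hi, n)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2)) for v in x]))
    pdf = d * phi * Phi ** (d - 1)           # density of the maximum
    dx = x[1] - x[0]

    def integrate(y):                        # trapezoidal rule
        return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

    mu = integrate(x * pdf)
    var = integrate(x**2 * pdf) - mu**2
    return mu, var

# Compare with the closed forms quoted in the text
mu3, var3 = max_normal_moments(3)
mu5, _ = max_normal_moments(5)
err_mu3 = abs(mu3 - 3 / (2 * math.sqrt(math.pi)))
err_var3 = abs(var3 - (4 * math.pi - 9 + 2 * math.sqrt(3)) / (4 * math.pi))
err_mu5 = abs(mu5 - 5 / (4 * math.sqrt(math.pi)) * (1 + 6 / math.pi * math.asin(1 / 3)))
print(err_mu3, err_var3, err_mu5)

# Sufficient condition: sigma_M^2 + mu_M^2 (d - 1) / d > 1
condition = {}
for d in range(2, 9):
    mu, var = max_normal_moments(d)
    condition[d] = var + mu**2 * (d - 1) / d
    print(d, round(condition[d], 4), condition[d] > 1)
```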
A different approach approximates the uncertainty model by an L1NL2 model in which N is a highly expansive nonlinearity, e.g. $o = \langle \mathbf{o}^n \rangle$ for large n. Here the linear stage L2 is simply averaging, and L1 is front-end filtering by the function f, i.e. $\mathbf{o} = f * \mathbf{i}$. If we ensure that $\mathbf{o} > 0$ by adding a baseline response to L1, o becomes increasingly dominated by the largest value of $\mathbf{o}$ as n → ∞. The resulting series of 2AFC responses approximates that obtained from the uncertainty max rule. Consider the simplified max model discussed above, where o = max(i). Here L1 involves no more than adding the baseline response and can therefore be ignored (provided we assume i > 0). This leaves an NL model, for which it is known that diag(ĥ2) = ρĥ1 (see Appendix A), where in general ρ > 0 for an expansive nonlinearity (Marmarelis, 2004). In the simplified model the expression is simply diag(ĥ2) = ρ > 0, confirming the result obtained above.
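The convergence invoked here can be illustrated numerically. For positive values, the power mean ⟨o^n⟩^(1/n) orders the two intervals in the same way as ⟨o^n⟩, and it approaches max(o) as n grows. The sketch below compares 2AFC choices made by this expansive observer with those made by the max observer. It uses hypothetical values for d, the number of trials and the baseline, takes f as the identity and includes no target. Agreement should approach 1 as n increases. The power mean is computed in the log domain so that large n does not overflow.

```python
import numpy as np

def log_power_mean(x, n):
    """log((mean(x**n))**(1/n)) along the last axis, computed stably for x > 0."""
    xmax = x.max(axis=-1, keepdims=True)
    return np.log(xmax[..., 0]) + np.log(np.mean((x / xmax) ** n, axis=-1)) / n

rng = np.random.default_rng(5)
d, trials, baseline = 4, 20_000, 10.0                 # hypothetical; baseline keeps o > 0
o = baseline + rng.standard_normal((trials, 2, d))    # L1 output (f = identity), two intervals

max_choice = o[:, 1].max(axis=-1) > o[:, 0].max(axis=-1)   # uncertainty max rule
agreement = {}
for n in (1, 2, 8, 64, 512):
    lpm = log_power_mean(o, n)                 # monotonic in <o**n>, so same 2AFC choice
    agreement[n] = np.mean((lpm[:, 1] > lpm[:, 0]) == max_choice)
print(agreement)
```

With n = 1 the observer reduces to the linear (mean) rule, so the n = 1 entry shows how far the linear and max observers disagree before the nonlinearity is made expansive.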
List of symbols
  • •, ⊗, *, ★
     

    inner (dot) product, outer product, convolution and cross-correlation

  • i
     

    input stimulus vector i = ( i 1…, i n) where i t is the stimulus value at time point t

  • s, n
     

    signal and noise components of input stimulus

  • I
     

    input stimulus outer product matrix ii

  • h 1
     

    system first-order (linear) kernel

  • H 2
     

    system second-order (nonlinear) kernel

  • ĥ1, ĥ2
     

    kernel estimates

  • σ N
     

    standard deviation of external Gaussian noise source

  • áá P
     

    indexing stimulus region p (p = ctr for center, p = sur for surround)

  • áá [s,r]
     

    indexing target absent/present (s = 0 or 1) on incorrect/correct (r = 0 or 1) trials

  • g
     

    decisional transducer function

  • r: observer response/decision (1 correct, 0 incorrect)
  • o: time-varying output of the system (before g is applied to yield r)
  • chh, cmh: human-human and model-human consistency
Acknowledgments
Supported by Royal Society (University Research Fellowship) and Medical Research Council (New Investigator Research Grant). 
Commercial relationships: none. 
Corresponding author: Peter Neri. 
Email: pn@white.stanford.edu. 
Address: Institute of Medical Sciences, Aberdeen Medical School, Aberdeen AB25 2ZD, UK. 
References
Abbey, C. K., & Eckstein, M. P. (2002). Classification image analysis: Estimation and statistical inference for two-alternative forced-choice experiments. Journal of Vision, 2(1):5, 66–78, http://journalofvision.org/2/1/5/, doi:10.1167/2.1.5.
Ahumada, A. J. (2002). Classification image weights and internal noise level estimation. Journal of Vision, 2(1):8, 121–131, http://journalofvision.org/2/1/8/, doi:10.1167/2.1.8.
Ahumada, A. J., & Lovell, J. (1971). Stimulus features in signal detection. Journal of the Acoustical Society of America, 49, 1751–1756.
Ahumada, A. J., Marken, R., & Sandusky, A. (1975). Time and frequency analyses of auditory signal detection. Journal of the Acoustical Society of America, 57, 385–390.
Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22, 203–217.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16, 125–127.
Allman, J., Miezin, F., & McGuinness, E. (1985). Stimulus specific responses from beyond the classical receptive field. Annual Review of Neuroscience, 8, 407–430.
Anderson, N. D., Murphy, K. M., & Jones, D. G. (2007). Temporal aspects of orientation pooling using visual noise stimuli. Journal of Vision, 7(1):9, 1–11, http://journalofvision.org/7/1/9/, doi:10.1167/7.1.9.
Baccus, S. A., Ölveczky, B. P., Manu, M., & Meister, M. (2008). A retinal circuit that computes object motion. Journal of Neuroscience, 28, 6807–6817.
Barr, D. R., & Sherrill, E. T. (1999). Mean and variance of truncated normal distributions. American Statistician, 53, 357–361.
Barth, E., Beard, B. L., & Ahumada, A. J. (1999). Nonlinear features in Vernier acuity. Proceedings of SPIE, 3644, 88–96.
Benardete, E. A., & Kaplan, E. (1997). The receptive field of the primate P retinal ganglion cell: II. Nonlinear dynamics. Visual Neuroscience, 14, 187–205.
Benardete, E. A., & Kaplan, E. (1999). The dynamics of primate M retinal ganglion cells. Visual Neuroscience, 16, 355–368.
Blakemore, C., & Tobin, E. A. (1972). Lateral inhibition between orientation detectors in the cat's visual cortex. Experimental Brain Research, 15, 439–440.
Bracewell, R. N. (1965). The Fourier transform and its applications. Singapore: McGraw-Hill.
Burgess, A. E., & Colborne, B. (1988). Visual signal detection: IV. Observer inconsistency. Journal of the Optical Society of America A, Optics and Image Science, 5, 617–627.
Busse, L., Katzner, S., Tillmann, C., & Treue, S. (2008). Effects of attention on perceptual direction tuning curves in the human visual system. Journal of Vision, 8(9):2, 1–13, http://journalofvision.org/8/9/2/, doi:10.1167/8.9.2.
Bussgang, J. J. (1952). Cross-correlation functions of amplitude-distorted Gaussian signals. MIT Research Laboratory of Electronics Technical Report, 216, 1–14.
Chen, H. W., Ishii, N., & Suzumura, N. (1986). Structural classification of nonlinear systems by input and output measurements. International Journal of Systems Science, 17, 741–774.
Chen, Y., Wang, Y., & Qian, N. (2001). Modeling V1 disparity tuning to time-varying stimuli. Journal of Neurophysiology, 86, 143–155.
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging. Cambridge, UK: Cambridge University Press.
Droll, J. A., Abbey, C. K., & Eckstein, M. P. (2009). Learning cue validity through performance feedback. Journal of Vision, 9(2):18, 1–22, http://journalofvision.org/9/2/18/, doi:10.1167/9.2.18.
Duncan, J., Ward, R., & Shapiro, K. (1994). Direct measurement of attentional dwell time in human vision. Nature, 369, 313–315.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26.
Emerson, R. C., Bergen, J. R., & Adelson, E. H. (1992). Directionally selective complex cells and the computation of motion energy in cat visual cortex. Vision Research, 32, 203–218.
Field, G. D., & Rieke, F. (2002). Nonlinear signal transfer from mouse rods to bipolar cells and implications for visual sensitivity. Neuron, 34, 773–785.
French, A. S., & Kuster, J. E. (1985). Nonlinearities in locust photoreceptors during transduction of small numbers of photons. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 156, 645–652.
Gibson, J. J. (1937). Adaptation, after-effect, and contrast in the perception of tilted lines: II. Simultaneous contrast and the areal restriction on the after-effect. Journal of Experimental Psychology, 20, 553–569.
Green, D. M. (1964). Consistency of auditory detection judgments. Psychological Review, 71, 392–407.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Hamming, R. W. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29, 147–160.
Heath, R. A. (2000). Nonlinear dynamics: Techniques and applications in psychology. Mahwah, New Jersey: Erlbaum.
Hong, X., Mitchell, R. J., Chen, S., Harris, C. J., Li, K., & Irwin, G. W. (2008). Model selection approaches for non-linear system identification: A review. International Journal of Systems Science, 39, 925–946.
Hung, G. K., & Stark, L. W. (1977). The kernel identification method (1910–1977): Review of theory, calculation, application, and interpretation. Mathematical Biosciences, 37, 135–190.
Hung, G. K., & Stark, L. W. (1979). Interpretation of kernels: III. Positive off-diagonal kernels as correlates of the dynamic process of pupillary escape. Mathematical Biosciences, 46, 159–187.
Hung, G. K., & Stark, L. W. (1991). The interpretation of kernels: An overview. Annals of Biomedical Engineering, 19, 509–519.
Hunter, I. W., & Korenberg, M. J. (1986). The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics, 55, 135–144.
James, A. C. (1992). Nonlinear operator network models of processing in the fly lamina. In R. B. Pinter & B. Nabet (Eds.), Nonlinear vision (pp. 39–73). Boca Raton, FL: CRC Press.
Jones, H. E., Wang, W., & Sillito, A. M. (2002). Spatial organization and magnitude of orientation contrast interactions in primate V1. Journal of Neurophysiology, 88, 2796–2808.
Jung, R., & Spillmann, L. (1970). Receptive-field estimation and perceptual integration in human vision. In Early experience and visual information processing in perceptual and reading disorders (pp. 181–197). Washington: National Academy of Sciences Proceedings.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence (pp. 1137–1143).
Korenberg, M. J., & Hunter, I. W. (1986). The identification of nonlinear biological systems: LNL cascade models. Biological Cybernetics, 55, 125–134.
Korenberg, M. J., Bruder, S. B., & McIlroy, P. J. (1988). Exact orthogonal kernel estimation from finite records: Extending Wiener's identification of nonlinear systems. Annals of Biomedical Engineering, 16, 201–214.
Korenberg, M. J., & Hunter, I. W. (1996). The identification of nonlinear biological systems: Volterra kernel approaches. Annals of Biomedical Engineering, 24, 250–268.
Kotz, S., & Nadarajah, S. (2000). Extreme value distributions. London, UK: Imperial College Press.
Lasley, D. J., & Cohn, T. E. (1981). Why luminance discrimination may be better than detection. Vision Research, 21, 273–278.
Li, R. W., Klein, S. A., & Levi, D. M. (2006). The receptive field and internal noise for position acuity change with feature separation. Journal of Vision, 6(4):2, 311–321, http://journalofvision.org/6/4/2/, doi:10.1167/6.4.2.
Marmarelis, P. Z., & Naka, K. (1972). White-noise analysis of a neuron chain: An application of the Wiener theory. Science, 175, 1276–1278.
Marmarelis, P. Z., & Marmarelis, V. Z. (1978). Analysis of physiological systems: The white-noise approach. New York: Plenum Press.
Marmarelis, V. Z. (1993). Identification of nonlinear biological systems using Laguerre expansions of kernels. Annals of Biomedical Engineering, 21, 573–589.
Marmarelis, V. Z. (2004). Nonlinear dynamic modeling of physiological systems. Piscataway, New Jersey: Wiley-IEEE Press.
Miller, A. J. (1990). Subset selection in regression. Boca Raton, FL: Chapman and Hall/CRC.
Murray, R. F., Bennett, P. J., & Sekuler, A. B. (2005). Classification images predict absolute efficiency. Journal of Vision, 5(2):5, 139–149, http://journalofvision.org/5/2/5/, doi:10.1167/5.2.5.
Nagai, M., Bennett, P. J., & Sekuler, A. B. (2007). Spatiotemporal templates for detecting orientation-defined targets. Journal of Vision, 7(8):11, 1–16, http://journalofvision.org/7/8/11/, doi:10.1167/7.8.11.
Naka, K., Sakai, H. M., & Ishii, N. (1988). Generation and transformation of second-order nonlinearity in catfish retina. Annals of Biomedical Engineering, 16, 53–64.
Nemoto, N., Momose, K., Kiyosawa, M., Mori, H., & Mochizuki, M. (2004). Characteristics of first and second order kernels of visually evoked potentials elicited by pseudorandom stimulation. Documenta Ophthalmologica, 108, 157–163.
Neri, P. (2004). Estimation of nonlinear psychophysical kernels. Journal of Vision, 4(2):2, 82–91, http://journalofvision.org/4/2/2/, doi:10.1167/4.2.2.
Neri, P., & Heeger, D. J. (2002). Spatiotemporal mechanisms for detecting and identifying image features in human vision. Nature Neuroscience, 5, 812–816.
Neri, P., & Levi, D. M. (2006). Receptive versus perceptive fields from the reverse-correlation viewpoint. Vision Research, 46, 2465–2474.
Neri, P., & Levi, D. M. (2008a). Temporal dynamics of directional selectivity in human vision. Journal of Vision, 8(1):22, 1–11, http://journalofvision.org/8/1/22/, doi:10.1167/8.1.22.
Neri, P., & Levi, D. M. (2008b). Evidence for joint encoding of motion and disparity in human visual perception. Journal of Neurophysiology, 100, 3117–3133.
Niu, W., & Yuan, J. (2008). A multi-subunit spatiotemporal model of local edge detector cells in the cat retina. Neurocomputing, 72, 302–312.
Pece, A. E. C., French, A. S., Korenberg, M. J., & Kuster, J. E. (1990). Nonlinear mechanisms for gain adaptation in locust photoreceptors. Biophysical Journal, 57, 733–743.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508–1532.
Priebe, N. J., & Ferster, D. (2008). Inhibition, spike threshold, and stimulus selectivity in primary visual cortex. Neuron, 57, 482–497.
Ringach, D. L. (1998). Tuning of orientation detectors in human vision. Vision Research, 38, 963–972.
Sakai, H. M., Wang, J., & Naka, K. (1995). Contrast gain control in the lower vertebrate retinas. Journal of General Physiology, 105, 815–835.
Salgado, H. M., & O'Reilly, J. J. (1996). Experimental validation of Volterra series nonlinear modeling for microwave subcarrier optical systems. IEE Proceedings: Optoelectronics, 143, 209–213.
Sandberg, A., & Stark, L. W. (1968). Wiener G-function analysis as an approach to nonlinear characteristics of human pupil light reflex. Brain Research, 11, 194–211.
Sandberg, I. W. (1983). Expansions for discrete-time nonlinear systems. Circuits, Systems, and Signal Processing, 2, 179–192.
Schetzen, M. (1980). The Volterra and Wiener theories of nonlinear systems. New York: Wiley.
Seber, G. A. F., & Wild, C. J. (2003). Nonlinear regression. New York: John Wiley.
Shapley, R., & Enroth-Cugell, C. (1984). Visual adaptation and retinal gain controls. Progress in Retinal Research, 3, 263–346.
Simoncelli, E. P. (2003). Seeing patterns in the noise. Trends in Cognitive Sciences, 7, 51–53.
Solomon, J. A. (2002). Noise reveals visual mechanisms of detection and discrimination. Journal of Vision, 2(1):7, 105–120, http://journalofvision.org/2/1/7/, doi:10.1167/2.1.7.
Solomon, J., & Morgan, M. J. (2006). Stochastic re-calibration: Contextual effects on perceived tilt. Proceedings of the Royal Society of London B: Biological Sciences, 273, 2681–2686.
Spekreijse, H. (1969). Rectification in the goldfish retina: Analysis by sinusoidal and auxiliary stimulation. Vision Research, 9, 1461–1472.
Spekreijse, H., & Oosting, H. (1970). Linearizing: A method for analyzing and synthesizing nonlinear systems. Kybernetik, 7, 22–31.
Spillmann, L. (1971). Foveal perceptive fields in the human visual system measured with simultaneous contrast in grids and bars. Pflügers Archiv, 326, 281–299.
Stark, L. W. (1969). The pupillary control system: Its nonlinear adaptive and stochastic engineering design characteristic. Automatica, 5, 655–676.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B: Statistical Methodology, 36, 117–147.
Stromeyer, C. F., & Martini, P. (2003). Human temporal impulse response speeds up with increased stimulus contrast. Vision Research, 43, 285–298.
Tadin, D., Lappin, J. S., & Blake, R. (2006). Fine temporal properties of center-surround interactions in motion revealed by reverse correlation. Journal of Neuroscience, 26, 2614–2622.
Tanner, W. P., Jr. (1961). Physiological implications of psychophysical data. Annals of the New York Academy of Sciences, 89, 752–765.
Thomas, J. P., & Knoblauch, K. (2005). Frequency and phase contributions to the detection of temporal luminance modulation. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 22, 2257–2261.
Tjan, B. S., & Nandy, A. S. (2006). Classification images with uncertainty. Journal of Vision, 6(4):8, 387–413, http://journalofvision.org/6/4/8/, doi:10.1167/6.4.8.
van Hateren, J. H., & Snippe, H. P. (2001). Information theoretical evaluation of parametric models of gain control in blowfly photoreceptor cells. Vision Research, 41, 1851–1865.
Victor, J. D. (1987). The dynamics of the cat retinal X cell centre. The Journal of Physiology, 386, 219–246.
Victor, J. D. (1988). The dynamics of the cat retinal Y cell subunit. The Journal of Physiology, 405, 289–320.
Victor, J. D. (2005). Analyzing receptive fields, classification images and functional images: Challenges with opportunities for synergy. Nature Neuroscience, 8, 1651–1656.
Victor, J. D., & Shapley, R. M. (1980). A method of nonlinear analysis in the frequency domain. Biophysical Journal, 29, 459–483.
Victor, J. D., Shapley, R. M., & Knight, B. W. (1977). Nonlinear analysis of cat retinal ganglion cells in the frequency domain. Proceedings of the National Academy of Sciences of the United States of America, 74, 3068–3072.
Volterra, V. (1930). Theory of functionals and of integral and integro-differential equations. New York: Dover Publications.
Watanabe, A., & Stark, L. (1975). Kernel method for nonlinear analysis: Identification of a biological control system. Mathematical Biosciences, 27, 99–108.
Watson, A. B. (1979). Probability summation over time. Vision Research, 19, 515–522.
Watson, A. B. (1982). Derivation of the impulse response: Comments on the method of Roufs and Blommaert. Vision Research, 22, 1335–1337.
Westwick, D. T., & Kearney, R. E. (2003). Identification of nonlinear physiological systems. Piscataway, New Jersey: Wiley-IEEE Press.
Wiener, N. (1948). Cybernetics: Or control and communication in the animal and the machine. New York: Wiley.
Wu, M. C. K., David, S. V., & Gallant, J. L. (2006). Complete functional characterization of sensory neurons by system identification. Annual Review of Neuroscience, 29, 477–505.
Yeshurun, Y., Wollberg, Z., & Dyn, N. (1987). Identification of MGB cells by Volterra kernels. Biological Cybernetics, 56, 203–208.
Yeshurun, Y., Carrasco, M., & Maloney, L. T. (2008). Bias and sensitivity in two-interval forced choice procedures: Tests of the difference model. Vision Research, 48, 1837–1851.
Yu, C., & Levi, D. M. (2000). Surround modulation in human vision unmasked by masking experiments. Nature Neuroscience, 3, 724–728.
Zenger-Landolt, B., & Koch, C. (2001). Flanker effects in peripheral contrast discrimination: Psychophysics and modeling. Vision Research, 41, 3663–3675.
Figure 1
 
Visual stimuli used in the luminance (A–B) and texture (C–D) experiments. (A) The luminance stimulus consisted of a central region of uniform luminance (bright in the figure) surrounded by a region of uniform luminance out to the dashed circle (dark in the figure). Outside the dashed circle, luminance progressively approached the background following a Gaussian profile (see magenta cross-section at bottom). (B) The luminance of both center and surround (black and blue traces respectively) was modulated by a random Gaussian process of equal mean and SD for the two regions (red profile to the right). When the target signal was present, it consisted of an additive modulation to the central frame in the sequence (gray trace). (C) The texture stimulus was similar to the luminance stimulus in that it was subdivided into center and surround regions. However, the (locally averaged) luminance profile was constant across the stimulus (see magenta cross-section at bottom; light-magenta profile shows the contrast envelope). Instead, the orientation of the texture was manipulated smoothly between vertical and horizontal (see texture bar next to the y axis in D). Positive modulations corresponded to more horizontally oriented textures and negative modulations to more vertically oriented textures. Aside from these differences, the texture stimuli contained very similar modulation patterns to those used for the luminance stimuli (compare panels B and D). See Materials and methods for more details.
Figure 2
 
Luminance first-order and second-order kernels for the aggregate observer (36,250 trials). (A) shows the first-order kernels for center (black) and surround (blue), as well as the diagonals of second-order kernels for center (red), surround (yellow) and center-surround (magenta). (B–D) show full second-order kernels for center (B), surround (C) and center-surround (D). (A–D) show kernels derived from all noise fields, (E–H) only from images containing a target, and (I–L) only from images not containing a target (see Materials and methods). Filter amplitude for first-order kernels is in units of external noise SD (σN) and is read against the black numeric labels on the y axis. For second-order kernel diagonals, the amplitude is in units of external noise variance and is read against the red numeric labels. Second-order kernels are plotted as Z score maps (colored for ∣Z∣ > 2, green for positive and blue for negative). White traces show marginal averages along the main diagonal. Target occurrence is indicated by the gray vertical bar in A, E and I. Shaded regions in A, E and I show ±1 SEM.
Figure 3
 
Texture first-order and second-order kernels for the aggregate observer (20,350 trials). Same plotting conventions as Figure 2. Shaded regions in A, E and I show ±1 SEM.
Figure 4
 
Combined kernels for both luminance and texture experiments (aggregate observer). The top row shows combined kernels for the luminance experiments obtained from all noise images (A), target-present images only (B) and target-absent images only (C). First-order combined kernels (black) were computed by subtracting the surround first-order kernel from the center first-order kernel. Second-order combined kernels (red) were computed by summing the diagonals of the center and surround second-order kernels and subtracting the diagonal of the center-surround kernel (see Materials and methods). (D–F) show combined kernels for the texture experiments. Yellow and blue lines in A show the regions used to compute modulation ratios (see Materials and methods and Figure 5). The yellow trace in C shows the best-fitting exponential to the red trace (thin yellow lines show 95% confidence intervals for the fit). Shaded regions show ±1 SEM.
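The combination rule stated in the caption is simple arithmetic on the kernels, and a minimal sketch is shown below. The exponential fit is a hypothetical stand-in: the caption does not give the functional form, so the single-parameter decay `a * exp(-t / tau)` (no offset) fitted by grid search over `tau`, with `a` solved in closed form for each candidate, is an assumption made for illustration.

```python
import numpy as np

def combined_kernels(h1_c, h1_s, H2_cc, H2_ss, H2_cs):
    """First-order: center minus surround.
    Second-order: diag(center-center) + diag(surround-surround) - diag(center-surround)."""
    h1_comb = np.asarray(h1_c, float) - np.asarray(h1_s, float)
    h2_comb = np.diag(H2_cc) + np.diag(H2_ss) - np.diag(H2_cs)
    return h1_comb, h2_comb

def fit_exponential(t, y, taus):
    """Least-squares fit of y ~ a * exp(-t / tau) (assumed form).

    For each candidate tau the best amplitude a has a closed form; the tau
    with the smallest sum of squared errors is kept.
    """
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    best = None
    for tau in taus:
        basis = np.exp(-t / tau)
        a = (basis @ y) / (basis @ basis)
        sse = np.sum((y - a * basis) ** 2)
        if best is None or sse < best[2]:
            best = (a, tau, sse)
    return best[0], best[1]
```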
Figure 5
 
Assessment of luminance kernel structure using scalar metrics. The first column (A, E and I) shows the centroid for the second-order diagonal (y axis) versus the first-order kernel (x axis); black symbols for center, blue symbols for surround and red symbols for the combined kernels. Gray shading shows the temporal ranges spanned by the target signal. Asterisks (color-coded like the data) indicate whether data points fall significantly away from the unity line. The second column (B, F and J) is plotted to similar conventions but reports times of peak occurrence (the plotted values are averages of bootstrap estimates (Efron, 1979), used to avoid excessive overlap of data points because peak times are discrete; all statistical comparisons in the text were nonetheless carried out on the actual peak times). Numbers show the average time difference (in ms) between first-order and second-order peaks (±1 SD across observers) for combined kernels. The third column (C, G and K) plots early/late modulation ratios (see Materials and methods) using similar conventions, except that asterisks next to the horizontal dashed line indicate whether data points fall significantly above or below it, while asterisks next to the vertical dashed line indicate whether data points fall significantly to the left or right of it. The fourth column (D, H and L) is plotted to the same conventions as the third column, but plots edges/center modulation ratios (see Materials and methods). The inset in A plots centroid values for center versus surround. Error bars (smaller than symbol when not visible) show ±1 SEM. Different symbol shapes refer to different observers: S1 •, S2 ▪, S3 ♦, S4 ▸, S5 ◂, S6 ▴.
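The centroid and peak-time metrics can be sketched as below. Two choices are assumptions rather than definitions taken from the caption: the centroid weights each time point by the absolute kernel amplitude, and the bootstrap re-estimates the first-order kernel as a difference of response-conditioned noise means before averaging the resampled peak times (the averaging step itself, used only for display, follows the caption).

```python
import numpy as np

def centroid(kernel, times):
    """Temporal centroid, weighting each time point by |kernel| (assumed weighting)."""
    w = np.abs(np.asarray(kernel, float))
    return float(np.sum(w * np.asarray(times, float)) / np.sum(w))

def peak_time(kernel, times):
    """Time at which the kernel reaches its maximum."""
    return float(np.asarray(times, float)[np.argmax(kernel)])

def bootstrap_peak_time(noise, resp, times, n_boot=200, seed=0):
    """Average of peak times across bootstrap resamples of the trials.

    Each resample re-estimates h1 = mean(noise|resp=1) - mean(noise|resp=0).
    """
    rng = np.random.default_rng(seed)
    noise = np.asarray(noise, float)
    resp = np.asarray(resp).astype(bool)
    n = len(resp)
    peaks = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        nb, rb = noise[idx], resp[idx]
        h1 = nb[rb].mean(axis=0) - nb[~rb].mean(axis=0)
        peaks.append(peak_time(h1, times))
    return float(np.mean(peaks))
```

Statistical comparisons, as the caption notes, would use `peak_time` on the actual kernels rather than the bootstrap average.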
Figure 6
 
Applicability of Hammerstein (A) and Korenberg (E) cascade models to the experimental data (see 1 and Extra Figure 6 for a detailed description of the implementation used here). The amplitude of the first-order kernel at each time point is plotted on the x axis against either the amplitude of the second-order diagonal at the corresponding time point (B–C) or the amplitude of the second-order marginal (F–G), separately for each subject, for both luminance (B and F) and texture (C and G) kernels and for both center (black) and surround (blue), in units of their SD. Red and yellow ovals summarize the linear fits for center and surround, respectively (each oval is tilted to align with the best-fit line and positioned at the center of mass, with parallel-to-line and orthogonal-to-line widths equal to the standard deviations of the data across the two axes). The correlation values for the linear fits in B–C are plotted in D (texture on the y axis versus luminance on the x axis), and those for F–G in H. Shaded regions show the ranges expected from simulations of both cascade models with varying degrees of internal noise and external SNR, constrained by the output performance of the human observers (see Extra Figure 6). Projected ranges are plotted only with reference to the texture axis because luminance correlations were not significantly different from 0 in either D or H (see text); they are shown separately for center (gray) and surround (light blue). Error bars (smaller than symbol when not visible) show ±1 SEM.
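The logic of panels D and H can be sketched as two correlations computed across time points. For a Hammerstein-type structure the second-order kernel is confined to the diagonal and that diagonal is proportional to the first-order kernel; for an LN (Wiener) or LNL (Korenberg) cascade the second-order marginal is proportional to the first-order kernel. The sketch assumes the marginal is the sum over one index of the second-order kernel; it is not the simulation code referenced in the caption.

```python
import numpy as np

def _zscore(v):
    """Express values in units of their SD (population SD, zero mean)."""
    v = np.asarray(v, float)
    return (v - v.mean()) / v.std()

def cascade_correlations(h1, H2):
    """Pearson correlation of h1 with (i) diag(H2) and (ii) the H2 marginal.

    Returns (r_diag, r_marginal). The marginal is taken here as the row sum
    of H2 (an assumption about how the marginal is defined).
    """
    h1 = np.asarray(h1, float)
    H2 = np.asarray(H2, float)
    diag = np.diag(H2)
    marginal = H2.sum(axis=1)
    r_diag = float(np.mean(_zscore(h1) * _zscore(diag)))
    r_marg = float(np.mean(_zscore(h1) * _zscore(marginal)))
    return r_diag, r_marg
```

Correlations near 1 would support the corresponding cascade; the caption reports that luminance correlations did not differ significantly from 0.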
Figure 7
 
Structure of second-order off-diagonal regions. The black trace in A corresponds to the white trace (marginal average along the main diagonal) in Figure 2B for the luminance center-center second-order kernel; the red trace corresponds to the white trace in Figure 3B for the texture kernel. Traces on the right show the corresponding Fourier power spectra, from which we computed a bandpass index (see Materials and methods) for both texture (plotted on the y axis in B) and luminance kernels (x axis), shown separately for center-center (black), surround-surround (blue), center-surround (gray) and combined (across all three previous) kernels (red). Dashed lines show values expected for a Gaussian filter (see Materials and methods). Bars next to the axes show distributions pooled across all data points. Black and blue traces in C show on-axis modulations within center-center and surround-surround kernels, respectively. The inset shows the center-center kernel from Figure 2B, whose on-axis modulation profile is indicated by the red rectangle. The magenta trace shows the first-order kernel of M (parasol) retinal ganglion cells estimated by Benardete and Kaplan (1999). The thin black line shows the uncertainty filter obtained from the parameter average (see Figure 9F). All traces in C have been rescaled (along the y axis) and time-shifted (along the x axis) so that their peaks equal 1 and are aligned at time 0. The bandpass index for on-axis profiles is plotted on the y axis in D versus the mean frequency of the power spectrum on the x axis for each subject (different symbols), for both luminance (solid) and texture (open) kernels and for both center (black) and surround (blue). Bars next to the axes show the corresponding distributions. Dashed horizontal line as in B; the vertical magenta line indicates the expected value for the magenta trace in C. Shaded regions (A and C) and error bars (B and D) show ±1 SEM.
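Two ingredients of this analysis can be sketched directly: a profile obtained by averaging the second-order kernel along lines parallel to the main diagonal (our reading of "marginal average along the main diagonal", giving one value per diagonal offset), and the power-weighted mean frequency of a profile's Fourier power spectrum, which is the x-axis quantity in panel D. The bandpass index itself is defined in Materials and methods and is not reproduced here; the frame duration `dt` is a hypothetical parameter.

```python
import numpy as np

def offdiagonal_profile(H2):
    """Mean of H2 along each line parallel to the main diagonal.

    Returns the offsets d = j - i (from -(n-1) to n-1) and the average of the
    kernel entries at each offset.
    """
    H2 = np.asarray(H2, float)
    n = H2.shape[0]
    offsets = np.arange(-(n - 1), n)
    profile = np.array([np.diagonal(H2, offset=d).mean() for d in offsets])
    return offsets, profile

def mean_frequency(profile, dt):
    """Power-weighted mean frequency of the profile's Fourier power spectrum.

    dt is the sampling interval (e.g., frame duration in s), so the result is in Hz.
    """
    power = np.abs(np.fft.rfft(profile)) ** 2
    freqs = np.fft.rfftfreq(len(profile), d=dt)
    return float(np.sum(freqs * power) / np.sum(power))
```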
Figure 8
 
Retinal model and associated kernels. (A) shows the model structure: the input stimulus is convolved (*) with a center-surround receptive field (black and blue smooth traces) using the first-order filter from primate retinal ganglion cells (magenta trace in Figure 7C). The output from this stage is half-wave rectified and fed into an accumulator (inverted triangular symbol; implemented by a differential equation, see Materials and methods), then integrated across time (Σt) to yield a scalar decision variable, which is further converted into a psychophysical decision. The time-varying signal from the accumulator stage is fed back into the circuit with temporal delay τ = 60 ms to control the gain of the linear stage via divisive inhibition. The gain control loop is highlighted in red. The model was used to simulate a psychophysical experiment using the same parameters selected for the human observers. Associated kernels are plotted in B–J (because center-center and surround-surround kernels are intrinsically symmetric, it was sufficient to plot only half of each). Plotting conventions are similar to Figure 2, except that shaded regions in B, E and H show ±1 SD across repeated simulations, and second-order kernels are plotted as Zm score maps, where Zm is in units of SD rather than SEM (colored for |Zm| > 2). The yellow trace in H shows the best-fitting exponential to the red trace (thin yellow lines show 95% confidence intervals), similarly to Figure 4C. Second-order traces for surround-surround kernels (yellow) overlap with those for center-center kernels (red) and are therefore not visible separately in B, E and H.
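The model in panel A can be sketched as a discrete-time loop. Apart from the 60 ms feedback delay, everything here is a hypothetical placeholder: the frame duration, the accumulator time constant, the surround weight, the feedback gain, the first-order accumulator equation `da/dt = (-a + r) / tau_acc` integrated with Euler steps, and the use of a single temporal filter for both regions. The retinal ganglion cell filter of Benardete and Kaplan (1999) is not reproduced; any causal impulse response sampled at the frame rate can be passed in.

```python
import numpy as np

def retinal_model_dv(center, surround, temporal_filter, dt=10.0, tau_delay=60.0,
                     tau_acc=50.0, surround_weight=1.0, gain=1.0):
    """Scalar decision variable from a toy center-surround model with delayed
    divisive gain control (hypothetical parameters except tau_delay = 60 ms).

    center, surround : one luminance modulation value per frame
    temporal_filter  : causal impulse response sampled at the frame rate
    dt               : frame duration in ms (assumed)
    """
    center = np.asarray(center, float)
    surround = np.asarray(surround, float)
    n = len(center)
    # Linear stage: causal convolution of the center-minus-surround signal.
    drive = np.convolve(center - surround_weight * surround, temporal_filter)[:n]
    delay = int(round(tau_delay / dt))
    acc = np.zeros(n)
    a = 0.0
    for t in range(n):
        feedback = acc[t - delay] if t >= delay else 0.0
        lin = drive[t] / (1.0 + gain * feedback)  # divisive inhibition by delayed accumulator
        r = max(lin, 0.0)                          # half-wave rectification
        a += dt * (-a + r) / tau_acc               # Euler step of the accumulator
        acc[t] = a
    return float(acc.sum())                        # integrate across time
```

Feeding the model many noise sequences and passing its thresholded decision variable to a kernel estimator is one way to derive simulated kernels analogous to panels B–J.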
Figure 9
 
Replicability of human responses for different nonlinear models. (A) Linear-human consistency (percentage of trials on which the linear first-order model and the human observer gave the same response to the same set of stimuli) is plotted on the x axis for luminance kernels against model-human consistency for seven different nonlinear models. Red and light-red symbols refer to models that incorporate second-order kernels; blue and light-blue symbols refer to variants of the uncertainty model (see Materials and methods). Solid red symbols refer to the full nonlinear second-order model; light-red symbols refer to the same model when regions outside the diagonal of the second-order kernels are set to 0, and open red symbols when regions within the diagonal are set to 0. Solid blue symbols refer to the uncertainty model with different f filters for different observers (individually optimized) and front-end convolution weighted by a linearly decreasing function; open blue symbols refer to the same model weighted by a constant function. Solid light-blue symbols refer to the same f for all observers (obtained by averaging the individually optimized f parameters) and a linearly decreasing weighting function; open light-blue symbols refer to f equal to the first-order kernel derived from target-present images (different f filters were applied to center and surround) and a linearly decreasing weighting function. Bars at bottom-left and top-right show the corresponding distributions after pooling across all second-order models (red) or all uncertainty models (blue). Green symbols refer to a model that responds correctly on every trial. (B) plots the same for texture kernels. (C, D) plot model-human consistency for the full nonlinear second-order model as a function of k (relative contribution of the nonlinear kernel), ranging from 0 to ∞. Red vertical lines show peak locations for the different traces (one trace per observer). The corresponding k values are plotted in E for luminance (x axis) versus texture (y axis). Dashed lines mark unity. (F) Filter functions used in the uncertainty model for both luminance (left) and texture (right) simulations, when the weighting function (red line) was constant (top) and linearly decreasing (bottom). Gray traces show individually optimized filters for different observers; blue traces show the function corresponding to the average of the individually optimized parameters. Error bars (smaller than symbol when not visible) show ±1 SEM; for panel E they were computed using the Hessian-based delta method (Seber & Wild, 2003).
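The second-order model variants and the consistency metric in this figure can be sketched as follows. The form of the decision variable, `h1·s + k * sᵀH2s` with `k = inf` meaning the second-order term alone, and the response rule `dv > 0` are assumptions for illustration; the diagonal-only and off-diagonal-only variants follow the caption's description of zeroing regions of the second-order kernel.

```python
import numpy as np

def second_order_dv(stimuli, h1, H2, k, part="full"):
    """Decision variable dv = h1·s + k * sᵀ H2 s for each trial (assumed form).

    stimuli : (n_trials, n_frames); k = np.inf keeps only the quadratic term.
    part    : 'full', 'diag' (off-diagonal set to 0) or 'offdiag' (diagonal set to 0).
    """
    H2 = np.asarray(H2, float)
    if part == "diag":
        H2 = np.diag(np.diag(H2))
    elif part == "offdiag":
        H2 = H2 - np.diag(np.diag(H2))
    lin = stimuli @ h1
    quad = np.einsum("ti,ij,tj->t", stimuli, H2, stimuli)
    return quad if np.isinf(k) else lin + k * quad

def consistency(model_resp, human_resp):
    """Percentage of trials on which model and human gave the same response."""
    return 100.0 * np.mean(np.asarray(model_resp) == np.asarray(human_resp))
```

Sweeping `k` over a grid and recording `consistency(second_order_dv(...) > 0, human)` traces out curves like those in panels C and D.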
Figure 10
 
Accounting for internal noise. (A) Model-human consistency for full nonlinear second-order models is plotted on the y axis against human-human consistency on the x axis (estimated from double-pass experiments, see Materials and methods). The gray area shows the region spanned by the best possible model (see Neri & Levi, 2006 and Materials and methods). (B) plots signal discriminability (d′ adjusted for internal noise) on the x axis versus internal noise (in units of external noise SD) on the y axis. The dashed vertical line marks unity. The shaded horizontal region shows the range estimated by Burgess and Colborne (1988). Black bars next to the axes show mean ± SD for the data points, thick for solid and thin for open symbols. Gray bars next to the x axis show output (unadjusted) discriminability (see text). (C) plots experimentally measured absolute efficiency on the x axis versus efficiency predicted by the L(N) linear amplifier model (Murray et al., 2005). The inset plots measured efficiency for luminance versus texture. Except in the inset, solid symbols refer to luminance data and open symbols to texture data throughout the figure. Error bars (smaller than symbol when not visible) show ±1 SEM.
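The quantities in this figure can be illustrated with standard signal-detection tools. The sketch below assumes an unbiased two-alternative forced-choice observer (for the d′ conversion), defines absolute efficiency as the squared ratio of human to ideal d′, and estimates internal noise with a Monte Carlo double-pass model in the spirit of Burgess and Colborne (1988): the same external noise is presented twice, internal noise is drawn independently on each pass, and the (input d′, internal/external noise ratio) pair that best reproduces the observed percent correct and percent agreement is selected from a grid. Grid values, sample sizes and the unit-variance external noise on the decision variable are assumptions, and this is not necessarily the procedure in Materials and methods.

```python
from statistics import NormalDist

import numpy as np

def dprime_2afc(pc):
    """d' from proportion correct, assuming an unbiased 2AFC observer."""
    return np.sqrt(2.0) * NormalDist().inv_cdf(pc)

def absolute_efficiency(d_human, d_ideal):
    """Squared ratio of human to ideal-observer discriminability."""
    return (d_human / d_ideal) ** 2

def percent_agreement(pass1, pass2):
    """Human-human consistency: % of repeated trials with identical responses."""
    return 100.0 * np.mean(np.asarray(pass1) == np.asarray(pass2))

def double_pass_predictions(d_in, gamma, n=20_000, seed=0):
    """Predicted (proportion correct, proportion agreement) for input d' and
    internal noise gamma (in units of the external-noise SD on the decision variable)."""
    rng = np.random.default_rng(seed)
    ext = rng.normal(size=n)                      # identical external noise on both passes
    i1, i2 = rng.normal(size=n), rng.normal(size=n)  # independent internal noise
    r1 = d_in + ext + gamma * i1 > 0
    r2 = d_in + ext + gamma * i2 > 0
    return r1.mean(), (r1 == r2).mean()

def estimate_internal_noise(pc, pa, d_grid, gamma_grid, n=20_000, seed=0):
    """Grid search for the (d_in, gamma) pair best matching observed pc and pa."""
    best, best_err = None, np.inf
    for d in d_grid:
        for g in gamma_grid:
            p, a = double_pass_predictions(d, g, n, seed)
            err = (p - pc) ** 2 + (a - pa) ** 2
            if err < best_err:
                best, best_err = (d, g), err
    return best
```

Using the same seed for every grid point (common random numbers) keeps the comparison between candidates free of simulation jitter.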
Figure 11
 
Differential analysis of correct and incorrect trials. (A) Consistency (full colors) and cross-consistency (lighter colors) between human responses and second-order kernels (sum of all three) is plotted on the y axis against consistency between human responses and first-order kernels (sum of both) on the x axis, for all trials (black) and for trials on which observers responded incorrectly (blue) or correctly (red). Solid symbols for luminance and open symbols for texture. Dashed lines mark chance (50%). Bars next to the axes show mean ± SD for the corresponding full-color data points. Cross-consistency was computed using a leave-one-out cross-validation technique (see Materials and methods). The inset shows final prediction error (FPE) values for luminance kernel outputs (see Materials and methods and Results). (B) Output from both first-order (black) and second-order (red) luminance kernels is plotted for every trial (one point per trial) on the x axis against output on the same trial from the corresponding leave-one-out kernels (computed from all trials except the one for which output is being plotted) on the y axis, for a representative observer (S1). Spread is more pronounced for second-order than for first-order kernels (correlations between x and y for the former are larger than for the latter across observers at p < 10⁻³). The inset magnifies the region around the origin, which is directly relevant for predicting human responses. High-contrast data points have opposite signs for their x and y coordinates, indicating trials classified one way by the full kernels and the opposite way by the leave-one-out kernels. There are many more trials on which second-order kernels give opposite responses (high-contrast red dots) than trials on which first-order kernels do so (high-contrast black dots), demonstrating that (as expected) the cross-validation procedure impacts second-order kernels more than first-order kernels. (C–D) Cross-consistency for center and surround kernels separately. (E) Cross-consistency for first-order center + surround on the x axis (same x values as light symbols in A) versus the second-order center-surround kernel on the y axis. Error bars (smaller than symbol when not visible) show ±1 SEM.
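Leave-one-out outputs do not require re-estimating a kernel from scratch for every trial when the kernel is a difference of class means: removing one trial only changes the running sum and count of its own response class. The sketch below assumes that first-order estimator and the response rule `output > 0`; the same bookkeeping extends to covariance-based second-order kernels. The FPE function is Akaike's final prediction error, `V (1 + p/N) / (1 - p/N)` with `V` the mean squared residual; how residuals and parameter counts were defined for the inset is given in Materials and methods, not here.

```python
import numpy as np

def loo_first_order_outputs(noise, resp):
    """Per-trial output of a first-order kernel estimated from all *other* trials.

    h1 = mean(noise | resp=1) - mean(noise | resp=0), updated by removing the
    held-out trial from the running sum of its own response class.
    """
    noise = np.asarray(noise, float)
    resp = np.asarray(resp).astype(bool)
    s1, s0 = noise[resp].sum(axis=0), noise[~resp].sum(axis=0)
    c1, c0 = resp.sum(), (~resp).sum()
    out = np.empty(len(resp))
    for t, (x, r) in enumerate(zip(noise, resp)):
        if r:
            h1 = (s1 - x) / (c1 - 1) - s0 / c0
        else:
            h1 = s1 / c1 - (s0 - x) / (c0 - 1)
        out[t] = x @ h1
    return out

def cross_consistency(noise, resp):
    """% of trials on which the sign of the leave-one-out output matches the response."""
    pred = loo_first_order_outputs(noise, resp) > 0
    return 100.0 * np.mean(pred == np.asarray(resp).astype(bool))

def final_prediction_error(residuals, n_params):
    """Akaike's FPE: V * (1 + p/N) / (1 - p/N), with V the mean squared residual."""
    residuals = np.asarray(residuals, float)
    N = residuals.size
    V = np.mean(residuals**2)
    return V * (1 + n_params / N) / (1 - n_params / N)
```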
Figure 12
 
Correlation (or lack thereof) between first-order and second-order outputs. (A) Output from first-order kernels (sum of both) is plotted on the x axis against output from second-order kernels (sum of all three) on the y axis for a representative observer (S1). Each point refers to an individual trial, red if the observer responded correctly and blue if incorrectly. Traces next to the x and y axes plot marginal distributions. Linear-fit ovals (see caption to Figure 6 for explanation) are yellow for red data points and black for blue data points. (B) Same as A but for texture kernels. (C) Correlation between first-order and second-order outputs is plotted for texture kernels on the y axis versus luminance kernels on the x axis, when computed from all trials (black) and from trials on which observers responded correctly (red) or incorrectly (blue). Open symbols show results obtained after removing the target signal from the stimuli. Dashed lines mark 0 correlation. Solid lines mark unity lines (y = |x|). Error bars in C show ±1 SEM (smaller than symbol when not visible).
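The analysis in panel C can be sketched as follows: compute each trial's first-order output `h1·s` and second-order output `sᵀH2s`, then correlate the two across trials, either over all trials or within the correct and incorrect subsets. Passing noise-only stimuli instead of noise-plus-target stimuli corresponds to the open-symbol condition. Summing across center, surround and center-surround kernels is left to the caller; the single `h1`/`H2` pair here is a simplification.

```python
import numpy as np

def kernel_outputs(stimuli, h1, H2):
    """Per-trial first-order (h1·s) and second-order (sᵀ H2 s) outputs."""
    stimuli = np.asarray(stimuli, float)
    lin = stimuli @ np.asarray(h1, float)
    quad = np.einsum("ti,ij,tj->t", stimuli, np.asarray(H2, float), stimuli)
    return lin, quad

def output_correlations(stimuli, h1, H2, correct):
    """Pearson correlation between first- and second-order outputs, computed
    over all trials and separately for correct and incorrect trials."""
    lin, quad = kernel_outputs(stimuli, h1, H2)
    correct = np.asarray(correct, bool)

    def corr(mask):
        return float(np.corrcoef(lin[mask], quad[mask])[0, 1])

    return {"all": corr(np.ones_like(correct)),
            "correct": corr(correct),
            "incorrect": corr(~correct)}
```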
Supplementary Figure 1
Supplementary Figure 2
Supplementary Figure 3
Supplementary Figure 4
Supplementary Figure 5
Supplementary Figure 6
Supplementary Figure 7
Supplementary Figure 8
Supplementary Figure 9
Supplementary Figure 10
Supplementary Figure 11
Supplementary Figure 12
Supplementary File