Open Access
Article  |   January 2025
Estimating the contribution of early and late noise in vision from psychophysical data
Journal of Vision January 2025, Vol.25, 12. doi:https://doi.org/10.1167/jov.25.1.12
Jesús Malo, José Juan Esteve-Taboada, Guillermo Aguilar, Marianne Maertens, Felix A. Wichmann; Estimating the contribution of early and late noise in vision from psychophysical data. Journal of Vision 2025;25(1):12. https://doi.org/10.1167/jov.25.1.12.

Abstract

Human performance in psychophysical detection and discrimination tasks is limited by inner noise. It is unclear to what extent this inner noise arises from early noise (e.g., in the photoreceptors) or from late noise (at or immediately prior to the decision stage, presumably in cortex). Very likely, the behaviorally limiting inner noise is a nontrivial combination of both early and late noise. Here we propose a method to quantify the contributions of early and late noise purely from psychophysical data. Our approach generalizes classical results for linear systems by combining the theory of noise propagation through a nonlinear network with expressions for the perceptual metric induced by such a network. We show that from threshold-only data, the relative contributions of early and late noise can only be disentangled when the experiments include substantial external noise. When full psychometric functions are available, early and late noise sources can be quantified even in the absence of external noise. Our psychophysical estimate of the magnitude of early noise—assuming a standard cascade of linear and nonlinear model stages—is substantially lower than the noise in cone photocurrents computed via an accurate model of retinal physiology, ISETBio. This is consistent with the idea that one of the fundamental tasks of early vision is to reduce the comparatively large retinal noise.

Introduction
One goal of psychophysical research is to relate physical to perceived stimulus properties. An important characteristic of this relationship is its variability. When the same stimulus is presented multiple times (which is common in psychophysical experiments), it might evoke different behavioral responses, at least when it is neither trivially easy nor difficult to perceive (neither “at ceiling” nor “at floor”). Since the advent of signal detection theory (SDT) in the 1950s, the source of this behavioral variability has been conceptualized and modeled as arising from inner noise on a putative decision- or evidence-axis (Tanner & Swets, 1954; Swets, 1961; Green & Swets, 1988). In this widely accepted view, perceptual detection and discrimination behavior depends on the strength of the inner response to a stimulus relative to the inner variability (“signal and noise”). 
From a neurophysiological standpoint, noise sources in sensory systems are multiple and arise at every stage within the system. In the visual system, neuronal variability (noise) is observed for photon detection in the photoreceptors as well as for all neurons along the visual pathways. The effect of the different neurophysiologically measured noise sources on observable psychophysical behavior is unclear, however. 
Despite the neuroscientific identification of many noise sources, perceptual models of human visual behavior commonly refer to all noise sources together as inner noise (Burgess & Colborne, 1988). Often, inner noise is modeled as a single late noise source, which is added as decision noise after completion of all processing stages in a model. The discrimination ability of a model is hence determined by the variability of the model output (i.e., after the stimuli have propagated through and are changed by the system). Discussion of noise in psychophysical models is predominantly limited to the type of late noise, that is, whether its variance is fixed or varies with signal strength (Wichmann, 1999; Georgeson & Meese, 2006). A discussion of the number of noise sources and their position within the (nonlinear) processing stages is comparatively rare (but see Pelli, 1991; Pelli & Farell, 1999; Henning, Bird, & Wichmann, 2002). 
We can think of at least two reasons why the number and type of noise sources in behavioral models of perception have rarely been explored and thus remain open questions: 
  • (1) Psychophysically, all we can measure is the probability of correct detection or discrimination as a function of stimulus intensity or stimulus differences. In both cases, performance is codetermined by nonlinearities of stimulus processing and by noise, and hence inferences about the type of noise can only be made for a fixed nonlinear transformation (and vice versa).
  • (2) Inclusion of an early noise source complicates matters considerably, because early noise, like the stimulus itself, will be nonlinearly transformed through the system. This typically results in a noise term for which no closed-form expression exists. The behavior of such a model would thus require (extensive) numerical simulations, and fitting such a model to psychophysical data is likely to be cumbersome.
Here we propose a strategy to separately estimate at least two different components of noise in the visual system, early and late. Roughly speaking, we attribute early noise to photoreceptors and late noise to all subsequent neuronal processes. We use external noise (i.e., noise applied to the external stimulus, whose magnitude and properties are under the complete control of the experimenter) to estimate early and late noise components from psychophysical data. We adopt ideas from the theory of noise propagation through a network (Ahumada, 1987) and the transforms of distance metrics in a network (Malo, 1999; Malo, Epifanio, Navarro, & Simoncelli, 2006; Laparra, Muñoz, & Malo, 2010) to derive a perceptual metric that depends on the different noise sources and is calibrated by the external noise. 
Outline
The structure of the article is as follows: In Modeling framework and intuition, we introduce our modeling framework, the notation, and a simplified vision model that illustrates our method. We also describe the experimental data that we use to estimate early and late noise parameters. In Noise estimation I: Using only threshold data, we propose a method to estimate the noise sources using threshold-only data. The idea is to propagate a stimulus perturbed by multivariate noise through a vision model and to compute a noise-dependent Mahalanobis distance metric from these multiple propagations. We show how our method generalizes the classical result of Burgess and Colborne (1988) and discuss the key role of external noise to determine the relative magnitude of early and late noise sources in threshold-only experiments. In Noise estimation II: Using full psychometric functions, we show how we can estimate the relative magnitude of early and late noise sources even without external noise when we have the full psychometric function and not just thresholds. The Discussion explores the connection of our method with physiological estimates of (early) retinal noise and with other psychophysical methods for (inner) noise estimation. We also outline the implications of our estimates for other experimental methods and for information-theory approaches to study vision. Appendices present mathematical proofs, fitting procedures, specific simulation results, and suggestions on how to extend the presented methods. 
Modeling framework and intuition
Notation
We assume a vision model S, which transforms the visual stimulus, the vector \({\boldsymbol x}\), into a representation of the stimulus, the vector \({\boldsymbol y}\). We assume the transform S to be deterministic. The input to S is assumed to be a mixture of the stimulus \({\boldsymbol x}\) and early noise \({\boldsymbol n}_e\), which is added to \({\boldsymbol x}\) at an early stage, presumably the retina. We add late noise \({\boldsymbol n}_l\) to the output of S; it subsumes the noise that may arise at the various levels of visual processing summarized in S. Together, early and late noise lead to a single noise term, the inner noise \({\boldsymbol n}_\mathcal {I}\), which is added to the deterministic response of the system. The inner noise \({\boldsymbol n}_\mathcal {I}\) is the difference between the noisy response and the deterministic response to the stimulus, \({\boldsymbol n}_\mathcal {I} = {\boldsymbol y} - S({\boldsymbol x})\). These concepts are summarized in the following diagram:  
\[
{\boldsymbol x} \;\xrightarrow{\;+\,{\boldsymbol n}_e\;}\; {\boldsymbol x} + {\boldsymbol n}_e \;\xrightarrow{\;S\;}\; S({\boldsymbol x} + {\boldsymbol n}_e) \;\xrightarrow{\;+\,{\boldsymbol n}_l\;}\; {\boldsymbol y} = S({\boldsymbol x} + {\boldsymbol n}_e) + {\boldsymbol n}_l = S({\boldsymbol x}) + {\boldsymbol n}_\mathcal {I} \qquad (1)
\]
It is the noise after the transformation S (inner noise, \({\boldsymbol n}_\mathcal {I}\)) that limits the performance of an observer when discriminating two stimuli \({\boldsymbol x_1}\) and \({\boldsymbol x_2}\), in particular, when the difference of system responses \(\Delta S = S({\boldsymbol x_2}) - S({\boldsymbol x_1})\) is small relative to the inner noise, \({\boldsymbol n}_\mathcal {I} \gg \Delta S\). We call the first stage, before S has been applied, “early representation” and the second stage, after S has been applied, “late representation.” 
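The distinction can be made concrete with a small numerical sketch (a hypothetical scalar nonlinearity and made-up noise levels, not the model used in this article): early noise passes through S before reaching the decision stage, whereas late noise is added directly to the response.

```python
import numpy as np

rng = np.random.default_rng(0)

def S(x):
    # Toy deterministic transform: a compressive nonlinearity (illustrative only)
    return np.sqrt(np.abs(x))

x = 4.0                                # stimulus
n_e = rng.normal(0.0, 0.5, 10000)      # early noise samples (added before S)
n_l = rng.normal(0.0, 0.05, 10000)     # late noise samples (added after S)

y = S(x + n_e) + n_l                   # noisy responses
n_inner = y - S(x)                     # inner noise: y minus the deterministic response

# Early noise reaches the decision stage attenuated by the local slope of S
# (roughly |S'(4)| * 0.5 = 0.125 here), while late noise enters unchanged.
print(n_inner.std())
```

Because S is compressive at this stimulus level, the early-noise contribution to the inner noise is much smaller than the early noise itself; with an expansive S the opposite would hold.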
Vision model
The vision model we adopt as the deterministic transform S is a standard linear+nonlinear cascade. The model consists of a linear stage with wavelet filters, which are scaled in agreement with the contrast sensitivity function (CSF; Robson, 1966), followed by a point-wise nonlinearity m, which represents a simple model of masking:  
\[
{\boldsymbol y} = S({\boldsymbol x} + {\boldsymbol n}_e) + {\boldsymbol n}_l = m\left(\mathbb {D}_{\mathrm{CSF}}\cdot W \cdot ({\boldsymbol x} + {\boldsymbol n}_e)\right) + {\boldsymbol n}_l \qquad (2)
\]
The input to the deterministic model S consists of the stimulus plus the early noise, \({\boldsymbol x} +{\boldsymbol n}_e\), expressed in units of cd/m². Our estimation methods (Equations 6 and 9) are general in the sense that they are not tied to a specific noise formulation: the noise may depend on the input (Dayan & Abbott, 2005; Cottaris, Jiang, Ding, Wandell, & Brainard, 2019; Cottaris, Wandell, Rieke, & Brainard, 2020) and may include correlations (Moreno-Bote et al., 2014; Kohn, Coen-Cagli, Kanitscheider, & Pouget, 2016). For the sake of simplicity, here we assume that the early noise follows a simple Gaussian–Poisson distribution (Wichmann, 1999; Dayan & Abbott, 2005; Cottaris et al., 2019; Cottaris et al., 2020) (see Equation A7 in Appendix A for the expression of the signal-dependent covariance). The psychophysical scaling parameter (Fano factor, βe) of this distribution is unknown, and its determination is one of the goals of our method. 
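As a sketch of what such a signal-dependent covariance looks like, consider a generic diagonal Gaussian–Poisson form with an assumed constant noise floor (the exact expression used in the article is Equation A7 in Appendix A and may differ in detail):

```python
import numpy as np

def gauss_poisson_cov(signal, beta, sigma0=0.0):
    """Diagonal covariance of signal-dependent (Poisson-like) noise:
    the variance of each component grows with the signal magnitude
    through the Fano factor beta, plus an optional constant floor."""
    var = beta * np.abs(signal) + sigma0**2
    return np.diag(var)

x = np.array([10.0, 40.0])             # two-pixel stimulus (cd/m^2)
C_e = gauss_poisson_cov(x, beta=0.1)   # beta is the unknown the method estimates
print(np.diag(C_e))
```

The brighter pixel gets the larger variance, which is the level dependence visible in the ellipses of Figure 2B below.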
Figure 1.
 
Illustration of the vision model. The input image, \({\boldsymbol x}\), is corrupted by early noise, \({\boldsymbol n_e}\), leading to an early (noisy) representation. In this example, we assume Poisson noise, and its amplitude increases with the luminance of the input (see the noise in the green circle with respect to the noise in the red circle). Then the signal is analyzed by a set of linear wavelet-like oriented filters (tuned to 0, 3, 6, 12, and 24 cpd), with responses \({\boldsymbol w}\). In the figure, we use the classical representation of subbands in the wavelet literature (Simoncelli, Freeman, Adelson, & Heeger, 1992). The 24-cpd subband is not represented for clarity. Then, the responses are multiplied by the CSF weights in the wavelet domain (Malo, Pons, Felipe, & Artigas, 1997) (\(\mathbb {D}_{\mathit {CSF}}\)), shown in the lower left inset. Here the lighter gray corresponds to bigger weights. The represented CSF shows the bandpass behavior and the oblique effect. CSF weighting is apparent in the attenuation of high-frequency subbands in \({\boldsymbol w^{\prime }}\): See how the energy in responses in the solid blue circle is reduced in the solid yellow circle. Then, the responses undergo a fixed saturating nonlinearity, m(·) (Legge, 1981) that preserves the relative scale of each subband (Martinez, Bertalmío, & Malo, 2019; Malo, Esteve-Taboada, & Bertalmío, 2024). This nonlinearity takes the low-amplitude responses (e.g., in the dashed yellow circle) and leads to enhanced responses (in the dashed blue circle). Finally, late noise, \({\boldsymbol n_l}\), is added to the responses in this late representation. The Poisson nature of the late noise is apparent in that its amplitude in all subbands is larger in the spatial region with larger contrast and hence larger response of texture detectors (see larger noise at the right, e.g., in dashed green circle, and less noise at the left, dashed red circle). 
The amplitude of the early and late noises has been scaled for clarity (×40 and ×4, respectively).
The first step in the model S is a set of local, oriented filters at different scales (see, e.g., Watson, 1987). This linear transform is followed by a scale-dependent weighting, which simulates the effect of the CSF. We use a steerable wavelet transform (Simoncelli et al., 1992) for the linear filters and the method detailed in Malo et al. (1997) to obtain the optimal CSF weights in that domain. This linear stage can be summarized as the product of two matrices: \({\boldsymbol w}^{\prime } = \mathbb {D}_{\mathrm{CSF}}\cdot W \cdot ({\boldsymbol x} + {\boldsymbol n}_e)\), where W contains the wavelet receptive fields in rows and \(\mathbb {D}_{\mathrm{CSF}}\) is a diagonal matrix with weights that represent the CSF in the diagonal. In the final step, we apply point-wise saturating functions to each coefficient of the linear transform (Nachmias & Sansbury, 1974; Legge & Foley, 1980; Legge, 1981). Specifically, \({\boldsymbol y} = m({\boldsymbol w}^{\prime }) + {\boldsymbol n}_l = K \odot \mathrm{sign}({\boldsymbol w^{\prime }})\odot |{\boldsymbol w}^{\prime }|^\gamma + {\boldsymbol n}_l\). \({\boldsymbol a} \odot {\boldsymbol b}\) represents the element-wise product of vectors \({\boldsymbol a}\) and \({\boldsymbol b}\), the exponent γ is applied element-wise to the absolute value of every component in \({\boldsymbol w^{\prime }}\), and the original sign of the components is preserved through the sign(·) function. We apply a correction at the origin to avoid singularities in the derivative (see the formulation of the γ-nonlinearity in Martinez, Cyriac, Batard, Bertalmío, & Malo, 2018; Malo, Esteve-Taboada, & Bertalmío, 2024), and we choose the constant K so as to keep the relative magnitude of the subbands and hence preserve the effect of the CSF (Martinez, Bertalmío, & Malo, 2019). 
Late noise is applied to the output of the model transform S. Again for convenience, we assume that the late noise \({\boldsymbol n}_l\) follows a Gaussian–Poisson distribution (see covariance in Equation A7 in Appendix A), and the goal is to obtain the Fano factor, βl, of this distribution. In this work, we assume level-dependent early and late noise sources for the sake of explicit exposition. However, our method should also make it possible to investigate the role of level-independent late noise in models of spatial vision (see, e.g., Wichmann, 1999; Kontsevich, Chen, & Tyler, 2002; Georgeson & Meese, 2006; Schütt & Wichmann, 2017), an issue we return to in the Discussion. 
A step-by-step visual illustration of the transforms in the model is shown in Figure 1. A MATLAB implementation of the model, the description of its parameters, and the data and methods used to estimate the noise levels are available online.1 The model is somewhat simplistic and surely not state-of-the-art in early spatial vision. Its elements, the wavelet transform and the CSF, are taken off-the-shelf, and we chose reasonable values for its free parameters, the saturation exponent, γ, and the subband-dependent constant, K. However, this reasonably good and reasonably simple model serves our present purpose to illustrate the method of noise propagation to derive a perceptual metric. It is reasonably good, because it allows us to predict human data from detection and discrimination experiments with periodic stimuli (shown below). It is reasonably simple, because it has well-behaved derivatives, ∇S, for computations in the optimization. In principle, our method can be applied to more sophisticated models of the transform S. However, the estimation of the derivatives might become more cumbersome or may require numerical approximations. 
Model response in the presence of noise: A two-pixel example
To provide an intuition for the effects of early versus late noise on the representation of the stimulus, we look at the model responses in a toy example of two-pixel images. In the special case of two-pixel images, the effects of noise and the model transform (S) can be illustrated in the two-dimensional space of brightness and contrast. 
Figure 2.
 
Modeling framework: propagation of stimulus and inner noise through the system in the absence of external noise. (A) In this toy example, input stimuli consist of two pixels of varying luminance (axes) shown at their corresponding positions in the two-dimensional coordinate system. (B) Early stimulus representation with added early noise (blue). Ellipses indicate the magnitude of variation resulting from the added noise. (C) The early representation goes through the fixed, deterministic vision model (S), which results in nonlinear transformations of the output. The x-axis is now nonlinear brightness and the y-axis nonlinear contrast. Note the different position and orientation of the blue ellipses. Then, late noise is added (red ellipses at the same positions as the stimulus representations). (D) Late representation of stimulus and inner (early + late) noise, which limits discrimination performance. The standard deviations and Fano factors control the size of the ellipsoids. The specific values were chosen to be illustrative, taking into account that in this example, the range of luminances is normalized to [0,1]. The interested reader can access and edit the expressions of the different noise sources of this two-pixel model available online in the aforementioned website.
Figure 2A shows the representation of two-pixel stimuli varying in their respective luminances in a coordinate system where the x- and y-axis define the luminances of pixels 1 and 2, respectively. Stimulus variations along the main diagonal reflect variations in mean luminance as both pixels have the same luminance (from low in the lower left to high in the upper right). Variations along the other diagonal (orthogonal to mean luminance) represent variations in contrast. Contrast along the main diagonal is zero and increases toward the upper left and lower right corners. We sampled the two-pixel images at constant steps in luminance. This makes it easier to see the nonlinear deformation of the noisy stimulus representation before and after application of the model transform S. 
Figure 2B shows the noisy representation of the stimulus after adding early noise (blue ellipses indicate one standard deviation in each direction). The area of the ellipses increases with increasing stimulus level, because we use level-dependent noise. Specifically, we defined the early noise as independent, Poisson-like noise, following Cottaris et al. (2019), Cottaris et al. (2020), and Dayan and Abbott (2005). For this type of noise, the horizontal and vertical widths of the blue ellipses increase with luminance, in agreement with the fact that the standard deviation of Poisson noise increases with the signal. The main diagonals of the blue ellipses are parallel to each of the axes in Figure 2B, respectively, because there is no correlation between adjacent sensors: the Poisson-like noise is independent at every sensor response. This early representation is the input to the model transform S. 
Figure 2C shows the representation after the model S has been applied to the early representation. The coordinate axes are now labeled brightness and contrast, respectively, because the linear transform W had the effect of rotating the original pixel space by 45 degrees. In the new coordinate system, the axes roughly correspond to these perceptual variables. After the masking transform m, the distances between stimuli are no longer equal. Stimuli of low luminance or low contrast are now further away from each other, and stimuli of higher luminance or contrast are closer. This is consistent with Weber’s law (Fairchild, 2013) and with the reduced discriminability of high-contrast stimuli (Nachmias & Sansbury, 1974; Legge & Foley, 1980; Legge, 1981). The transformation of distances resulting from S depends on the Jacobian of the deterministic transform S. 
Figure 2D shows the representation of the model output after late noise (ellipses in red) was added (Figure 2C). Similar to the early noise, the late noise is chosen to be independent and Poisson-like (red ellipses in Figure 2). 
The contributions of early (passed through S) and late noise combined define the inner noise (ellipses in purple). To illustrate the separate contributions of early and late noise to the late (stimulus+inner noise) representation in Figure 2D, and their respective effects on discrimination behavior, the purple ellipses are enlarged for two stimuli (zoom-insets). At low brightness and contrast (left inset), variability (inner noise, purple ellipses) in the late representation is dominated by early noise (blue and purple ellipses are very similar). At high brightness and contrast (right inset), variability in the late representation is fully determined by late noise (red and purple ellipses are almost identical). 
Discrimination between two stimuli, \({\boldsymbol x}\) and \({\boldsymbol x}+\Delta {\boldsymbol x}\), is possible when the Euclidean distance in the late representation, \(\Delta S = S({\boldsymbol x}+\Delta {\boldsymbol x}) - S({\boldsymbol x})\), is larger than the standard deviation of the inner noise in the same direction. Thus, discriminability depends not only on the distance between two points in the late representation but also on the distribution of the inner noise (ellipses in purple). These two factors can be taken into account simultaneously by using the multivariate Mahalanobis metric (Mahalanobis, 1936). We derive the theoretical Mahalanobis metric for stimuli passed through a vision model (S) with known early and late noise magnitudes and compare these with an empirically obtained Mahalanobis metric—the empirical detection and discrimination thresholds—for the same stimuli. We can repeat this for several early and late noise magnitudes and find the values that best account for what is observed experimentally. This is the first core idea of our method. 
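A minimal sketch of the Mahalanobis criterion, with illustrative (made-up) response vectors and inner-noise covariance:

```python
import numpy as np

def mahalanobis(s1, s2, cov):
    """Mahalanobis distance between two late representations, given the
    inner-noise covariance at the reference stimulus."""
    d = s2 - s1
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

s1 = np.array([1.0, 0.2])              # S(x)
s2 = np.array([1.1, 0.5])              # S(x + dx)
cov_inner = np.array([[0.01, 0.00],    # illustrative inner-noise covariance
                      [0.00, 0.04]])

# A response difference of a given Euclidean length counts for more
# along the low-variance axis than along the high-variance axis.
print(mahalanobis(s1, s2, cov_inner))
```

With identity covariance, the Mahalanobis metric reduces to the Euclidean distance; the noise ellipses are what reweight the two factors mentioned above into a single discriminability measure.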
External noise as a tool to estimate inner noise
In addition to the early and late noise sources of the visual system, we consider an extra noise source, the external noise, \({\boldsymbol n}_\varepsilon\). External noise is under the control of the experimenter and will be added to the stimulus as the tool to gauge the amount of early and late noise. Adding external noise expands the scenario described in Equation 1 in the following way:  
\[
{\boldsymbol y} = S({\boldsymbol x} + {\boldsymbol n}_\varepsilon + {\boldsymbol n}_e) + {\boldsymbol n}_l = S({\boldsymbol x}) + {\boldsymbol n}_\mathcal {I} \qquad (3)
\]
where now the inner noise \({\boldsymbol n}_\mathcal {I}\) also includes a contribution from the external noise. This modification can be applied to any model, for instance, to our vision model S considered in Equation 2. 
Figure 3 illustrates the effects of external noise in our modeling framework for our toy example with two-pixel stimuli. Green ellipses indicate the luminance variations in the two-pixel stimuli, which depend on the type and amount of external noise applied in a particular experiment. Here we use pink noise as external noise, which has a 1/f amplitude spectrum (Henning, Bird, & Wichmann, 2002). This type of external noise has more energy (induces larger variation) in the mean luminance direction and less energy (smaller variation) in the contrast direction. It has the same covariance matrix (the same energy) for every stimulus across the stimulus space. These two features are evident in the stimulus representation (Figure 3A). Green ellipses are elongated in the main diagonal (more variation in mean luminance), and all ellipses have the same size.2 
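Pink external noise of this kind can be generated by imposing a 1/f amplitude spectrum on random phases; a minimal one-dimensional sketch (the experiments used two-dimensional spatial noise):

```python
import numpy as np

def pink_noise(n, rng):
    """1-D noise with a 1/f amplitude spectrum: impose the amplitude on
    random phases in the Fourier domain and transform back."""
    f = np.fft.rfftfreq(n)
    amp = np.zeros_like(f)
    amp[1:] = 1.0 / f[1:]              # 1/f amplitude; drop the DC term
    spectrum = amp * np.exp(2j * np.pi * rng.random(f.shape))
    x = np.fft.irfft(spectrum, n)
    return x / x.std()                 # normalize to unit variance

rng = np.random.default_rng(1)
noise = pink_noise(512, rng)
print(noise.std())
```

The 1/f weighting concentrates energy at low frequencies, which in the two-pixel picture corresponds to larger variation along the mean-luminance direction than along the contrast direction.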
Figure 3.
 
Modeling framework: propagation of stimulus and noise through the system in the presence of external noise. The figure is organized analogous to Figure 1. (A) External noise is applied to the luminances of each pixel so that the values of the two luminances vary around their mean (green ellipses). (B) Noise at early representation (cyan ellipses) comes from the superposition of external noise (green ellipses) with early noise (blue ellipses). (C) The early representation goes through the vision model, S, as in the previous figure. Then, late noise is added (red ellipses). (D) Late representation of noisy stimuli, with external, early, and late contributions.
Figure 3B shows the stimulus representation (light-blue ellipses) in the presence of early noise (dark blue). Early noise and late noise are identical to the previous case (Figure 2B). Note that this time, however, the external and early noise are summed together and commonly contribute to the noisy early representation. Figure 3C shows how the noisy stimulus representations are transformed by S, again leading to a deformation of the ellipses. Late noise (red ellipses) is added in the same way as in the previous case, resulting in the late representation (light-purple in Figure 3D). In this scenario, the late representation also includes external noise in addition to inner noise, which is evident from overall larger ellipses than in the scenario without external noise. This is illustrated in Figure 4, which shows the late inner representation of the stimuli in the presence (light violet) and absence of external noise (dark violet). 
Figure 4.
 
Late representation with and without external noise. Same as in Figures 2D and 3D, but visualized together for easy comparison. Differences imply that controlled variations in the external noise induce different variations of the thresholds for different pedestals and directions.
The pink noise we used here has higher energy along the mean luminance direction (see above). Hence, differences in variability between late representations with and without external noise are larger along the brightness axis than along the contrast axis (see Figure 4). Also, this type of external noise has its largest effect on stimuli with low brightness and low contrast (lower left corner of Figure 4). It is evident from the figure that the inner noise ellipses of the two lowest contrast stimuli at the lowest brightness level have substantial overlap in the presence of external noise (light violet) and would hence be less discriminable than without external noise. The two highest contrast noise ellipses at the highest brightness level did not change with respect to their overlap. The change in discriminability as a function of varying types and/or magnitudes of external noise is the critical piece of information that renders external noise the gauging stick to calibrate the scale of internal noise. 
This is the second core idea in our noise estimation method, which we describe in what follows. The theory allows us to compute the covariance matrices, which represent the noise-induced variability, and their associated ellipses at the different levels of representation, that is, before and after the transformation S. We can calculate the ellipses, that is, the magnitudes, of the three noise components (external, early, and late) at every representation level, regardless of where they actually arise. This means that we can go back and forth through the system and can, for instance, express the inner noise and its components in the coordinates of the stimulus. This allows us not only to study the effect of noise on performance at the late representation but also to project the noise back to the stimulus space and predict discrimination performance in units of the stimulus (via the psychometric function; see below). 
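The back-and-forth projection rests on standard first-order noise propagation: with ∇S the Jacobian of S at the stimulus, a covariance C in the early representation maps to ∇S · C · ∇Sᵀ in the late representation, and a late covariance maps back with the inverse Jacobian. A sketch with a hypothetical element-wise power nonlinearity standing in for S:

```python
import numpy as np

def jacobian_S(x, gamma=0.6, eps=1e-6):
    # Jacobian of a hypothetical element-wise nonlinearity S(x) = sign(x)|x|^gamma
    return np.diag(gamma * (np.abs(x) + eps) ** (gamma - 1.0))

x = np.array([2.0, 5.0])               # stimulus (two-pixel example)
J = jacobian_S(x)

# Forward: early-noise covariance expressed at the late representation
C_early = np.diag([0.1, 0.1])
C_early_late = J @ C_early @ J.T

# Backward: a late-noise covariance expressed in stimulus coordinates
C_late = np.diag([0.01, 0.01])
J_inv = np.linalg.inv(J)
C_late_stim = J_inv @ C_late @ J_inv.T

print(np.diag(C_early_late), np.diag(C_late_stim))
```

Because the Jacobian depends on the stimulus, the same early noise contributes differently to the inner noise at different brightness and contrast levels, which is exactly the behavior of the purple ellipses in Figure 2D.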
Interim summary to our method’s intuition
We used a toy example of two-pixel images to illustrate the two core ideas of our method. First, discrimination between stimuli depends on two factors, their distance in the late representation and their variability, both taken into account by the Mahalanobis metric. Two stimuli \({\boldsymbol x}_A\) and \({\boldsymbol x}_B\) can be discriminated if their deterministic responses fulfill the following criterion: Given one of the deterministic response vectors, say \(S({\boldsymbol x}_A)\), the other, \(S({\boldsymbol x}_B)\) is far enough away from \(S({\boldsymbol x}_A)\) that it cannot be confused with its noisy version \(S({\boldsymbol x}_A) + {\boldsymbol n}_\mathcal {I}({\boldsymbol x}_A)\). Second, using noise propagation theory and metric transformations, we can represent external and early noise at the level of the late representation. Together, the different noise sources determine discrimination performance at this level of representation. 
In what follows, we present two versions of our approach to estimate early and late noise from psychophysical data. Noise estimation I: Using only threshold data explains a first approach using threshold-only data. We show how to estimate early and late noise contributions in the presence of external noise. Noise estimation II: Using full psychometric functions presents a second, simulation-based nonparametric approach. Here we derive early and late noise contributions as well, but this time, we use the full psychometric function and thus do not need external noise. 
Experimental data used for modeling
We apply our proposed method to detection and discrimination data with well-controlled external noise reported by Henning, Bird, and Wichmann (2002). Henning, Bird, and Wichmann (2002) compared the detection and discrimination of standard sine gratings with so-called pulse-train stimuli, which contain not only energy at the fundamental frequency—as in sine gratings—but equal energy at all harmonics (within the limits of the display system). Both detection and discrimination experiments were repeated with and without added external pink (1/f) noise. Experiments were conducted as two-alternative forced choice at a presentation duration of 79 ms within a rectangular temporal window. All gratings were horizontally oriented within a spatial Hanning window nominally subtending 3.8 degrees at the observers’ eyes. Data were collected using the method of constant stimuli with 50 trials per block and 500 to 600 trials in total per psychometric function. 
We chose to model the data from Henning, Bird, and Wichmann (2002) because in their paper, the authors argue that the pattern of results they obtained is consistent with the notion that low-contrast detection is limited by an early inner noise source but high-contrast discrimination is limited by late inner noise. They thus experimentally addressed the very same question we now believe to be able to address theoretically and computationally—hence, we felt it was appropriate to use their data for our analysis. 
All our simulations below use exactly the same stimuli as in the original experiments because we run the same code to generate the images and the pink noise. This code is available on the aforementioned website. 
Noise estimation I: Using only threshold data
Theory: Perceptual distance in terms of thresholds and noise
The key of our proposal is to relate the psychophysical thresholds for detection and discrimination with or without external noise to a perceptual metric derived from a noisy nonlinear model. In a discrimination experiment, where a target stimulus \({\boldsymbol x}\) is varied (“distorted”) in one direction of image space \(\Delta {\boldsymbol x}\), experimental thresholds in that direction \(|\Delta {\boldsymbol x}|_{\tau }\) define an empirical perceptual distance:  
\begin{eqnarray} D_{\rm {emp}} = \frac{1}{|\Delta {\boldsymbol x}|_{\tau }} \qquad \end{eqnarray}
(4)
As a consequence, a high threshold—a large \(|\Delta {\boldsymbol x}|_{\tau }\)—indicates a small perceptual distance in that direction of image space: The stimulus has to change a lot in this direction before an observer is able to discriminate the original from the distorted stimulus. 
However, in the previous section, we suggested that discriminability is determined not only by the departure in the inner domain but also by the inner noise. Therefore, our proposal for the theoretical perceptual distance uses the Mahalanobis metric, as it takes both factors into account simultaneously (Mahalanobis, 1936): the departure in the response, ΔS, weighted by the covariance of the inner noise:  
\begin{equation} D_{\rm {th}}^2 = \Delta S^\top \cdot \left( \Sigma _{\mathcal {I}} \right)^{-1} \cdot \Delta S \end{equation}
(5)
where \(\Delta S = S({\boldsymbol x}+\Delta {\boldsymbol x}) - S({\boldsymbol x})\), and \(\Sigma _{\mathcal {I}}\) is the covariance matrix of \({\boldsymbol n}_\mathcal {I}\).  
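As a minimal numerical sketch of Equation 5 (not part of the original analysis), the snippet below evaluates the Mahalanobis distance for a two-pixel stimulus. The square-root response standing in for S and the covariance values are illustrative assumptions, not the model fitted in this paper.

```python
import numpy as np

def S(x):
    # Toy saturating nonlinearity standing in for the deterministic
    # response (the actual model S in the text is more elaborate).
    return np.sqrt(x)

def perceptual_distance(x, dx, Sigma_I):
    """Mahalanobis distance of Equation 5: D^2 = dS^T Sigma_I^{-1} dS."""
    dS = S(x + dx) - S(x)
    return np.sqrt(dS @ np.linalg.inv(Sigma_I) @ dS)

x = np.array([4.0, 9.0])    # two-pixel stimulus (luminances)
dx = np.array([0.5, 0.0])   # distortion along the first pixel

# Isotropic vs. anisotropic (hypothetical) inner-noise covariances:
d_iso = perceptual_distance(x, dx, 0.01 * np.eye(2))
d_aniso = perceptual_distance(x, dx, np.diag([0.04, 0.01]))
```

Inflating the noise variance along the direction of distortion shrinks the perceptual distance, mirroring the threshold argument above: more noise in a direction means a larger threshold in that direction.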
In order to express this proposed theoretical distance in terms of (1) the stimulus and (2) the different noise sources—external, early, and late noise—we invoke two known results: first, the change of the perceptual metric matrix under deterministic transforms (Malo et al., 2006) and, second, the change of the covariance of the noise under deterministic transforms (Ahumada, 1987). Both of these results use the Taylor approximation of the nonlinear behavior of the system, S. This approximation is accurate if the nonlinearity of the system is moderate or if the noise is small relative to the signal (low-noise limit). 
Applying the work of Malo et al. (2006) and Ahumada (1987) to Equation 5, we see how the different noise sources contribute to the theoretical perceptual distance induced by a variation in the stimulus \(\Delta {\boldsymbol x}\) (see Appendix A for derivation):  
\begin{eqnarray}\begin{array}{@{}l@{}} D_{\rm {th}}^2 = \Delta {\boldsymbol x}^\top \nabla S^\top \left( \underbrace{k^2 \, \Sigma _l}_{{\it late\ noise}} + \underbrace{k^2 \, \nabla S \Sigma _e \nabla S^\top }_{{\it early\ noise}} + \underbrace{2 k^2 \mathbb {E}[{\boldsymbol n}_l {\boldsymbol n}_e^\top \nabla S^\top ] }_{{\it corr.\ late-early}} \right.\\ \qquad +\left. \overbrace{ \underbrace{\nabla S \Sigma _\varepsilon \nabla S^\top }_{{\it external\ noise}} + \underbrace{2 k \nabla S \mathbb {E}[{\boldsymbol n}_e {\boldsymbol n}_\varepsilon ^\top ] \nabla S^\top }_{{\it corr.\ early-external}} + \underbrace{2 k \mathbb {E}[{\boldsymbol n}_l {\boldsymbol n}_\varepsilon ^\top \nabla S^\top ] }_{{\it corr.\ late-external}} }^{{\it external\ dependent}} \right)^{-1} \!\!\!\!\!\! \nabla S \Delta {\boldsymbol x} \end{array}\nonumber\\ \end{eqnarray}
(6)
where ∇S is the Jacobian of the model at \({\boldsymbol x}\), and the matrix Σl is the covariance of the late noise \({\boldsymbol n}_l\) at the point \(S({\boldsymbol x})\). The term that depends on Σe describes the early noise at \({\boldsymbol x}\) propagated up to the late representation. In addition, we introduced an arbitrary scale factor, k, on all the observer’s noise sources (early and late) to indicate the critical role the external noise with covariance Σε plays to obtain the scale of the uncertainties—we will return to this issue below. The terms with the expected value, \(\mathbb {E}[\cdot ]\), are all zero in case of independent noise sources. However, if noise sources are dependent, the terms with \(\mathbb {E}[\cdot ]\) are nonzero. In general, they are not zero: For instance, in the conventional Poisson choice we have made here, the different noises depend on the signal and thus the cross-correlation matrices do not vanish. The cross-correlation terms imply that the covariance of the sum of the contributions to the inner noise is not simply the sum of individual covariance matrices. The proper combination of the noise sources has to be calculated. 
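The combination of noise covariances inside the inverse of Equation 6 can be sketched numerically. The snippet below assumes a toy elementwise response and mutually independent Poisson-like sources (so the cross terms vanish); it propagates the early and external covariances to the late representation through a finite-difference Jacobian. All numeric values are illustrative assumptions.

```python
import numpy as np

def S(x):
    return np.sqrt(x)   # toy nonlinearity (illustrative, not the paper's model)

def jacobian(S, x, eps=1e-6):
    """Central finite-difference Jacobian of S at x."""
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (S(x + e) - S(x - e)) / (2 * eps)
    return J

x = np.array([4.0, 9.0])
J = jacobian(S, x)

Sigma_e = np.diag(0.05 * x)      # early noise: Poisson-like, variance ∝ signal
Sigma_l = np.diag(0.02 * S(x))   # late noise: Poisson-like on the response
Sigma_ext = 0.03 * np.eye(2)     # external noise added to the stimulus
k = 1.0

# Covariance of the total noise at the late representation, assuming the
# three sources are mutually independent (cross terms of Equation 6 dropped):
Sigma_total = k**2 * Sigma_l + k**2 * J @ Sigma_e @ J.T + J @ Sigma_ext @ J.T
```

Note how the early and external contributions enter only after being filtered by the Jacobian, which is what allows a saturating response to compress noise injected at the input.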
Generality of Equation 6 and derivation of classical expressions as special cases
Equation 6 is general insofar as it makes no assumptions about the nature of the noise sources or their (in)dependence. Moreover, it can be applied to any model S—thus, Equation 6 holds for more complex (spatial vision) models than the one we consider here and introduced in Vision model. It also holds if one were to explore different noise models than the Poisson and pink noises we consider here (and in the particular cases further developed in Appendix A). The validity of Equation 6 is only tied to the Taylor expansions in Malo et al. (2006) and Ahumada (1987), and these expansions are reasonable in case of low noise and moderately nonlinear systems. 
From our general expression in Equation 6, one may deduce classic special cases if one considers the appropriate restrictions in the system and in the noise sources. One such special case is that of Burgess and Colborne (1988): a linear shift-invariant system without early noise and with stationary, signal-independent external and late noises. In our terminology, in Burgess and Colborne (1988), we have zero early noise, \({\boldsymbol n}_e = 0\), and the late noise \({\boldsymbol n}_l\) is not Poisson but constant (e.g., Gaussian with constant variance independent of the input \({\boldsymbol x}\) to the system S). This considerably simplifies Equation 6, as all terms with \({\boldsymbol n}_e\) vanish, as well as all those with \(\mathbb {E}[\cdot ]\). 
Moreover, we can derive the special case for a linear shift-invariant system in the Fourier domain. For such a system, the Jacobian can be written as ∇S = F⊤ · λ · F = ∇S⊤, where F is a matrix with the Fourier basis functions (in rows), and λ is a diagonal matrix with the weights applied to each spatial frequency (the filter that represents the system). The covariances of stationary noises that do not depend on the signal can be formulated in the Fourier domain as well: Σε = F⊤ · Nε · F and Σl = F⊤ · k2Nl · F, where Nε is a diagonal matrix with the energy spectrum of the external noise, and k2Nl is the corresponding energy spectrum of the late noise (where we kept the scaling factor k to denote its amplitude). 
Introducing the above special cases in Equation 6, taking Σe = 0 and \({\boldsymbol n}_e = 0\), using the orthogonality of the Fourier matrix, and considering the values (in the Fourier domain) in the diagonals of λ, Nl, and Nε, we finally have  
\begin{equation} D_{\rm {th}}^2 = \Delta {\boldsymbol x}^\top \cdot F^\top \cdot \left( \frac{\lambda ^2}{k^2 \, N_l + \lambda ^2 \cdot N_\varepsilon } \right)\cdot F \cdot \Delta {\boldsymbol x} \end{equation}
(7)
which is equivalent to Equation (1a) in Burgess and Colborne (1988)
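The reduction to Equation 7 can be checked numerically. The sketch below substitutes an arbitrary orthonormal matrix for F (the derivation only uses the orthogonality of F) and compares the general Mahalanobis form with the closed-form Fourier expression; the filter weights and noise spectra are made-up diagonal values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 1.3

# Orthonormal basis standing in for the Fourier matrix F (the algebra
# only requires F's orthogonality, so any orthonormal basis works here).
F, _ = np.linalg.qr(rng.normal(size=(n, n)))

lam = np.diag([1.0, 0.7, 0.4, 0.2])        # filter (CSF-like weights)
N_l = np.diag([0.02, 0.03, 0.05, 0.08])    # late-noise spectrum
N_eps = np.diag([0.01, 0.01, 0.01, 0.01])  # external-noise spectrum

grad_S = F.T @ lam @ F                     # linear system: Jacobian = filter
Sigma_l = F.T @ (k**2 * N_l) @ F
Sigma_eps = F.T @ N_eps @ F

dx = rng.normal(size=n)

# General Mahalanobis form (Equation 5 with the combined inner covariance):
Sigma_I = Sigma_l + grad_S @ Sigma_eps @ grad_S.T
D2_general = dx @ grad_S.T @ np.linalg.inv(Sigma_I) @ grad_S @ dx

# Closed-form special case (Equation 7):
w = np.diag(lam)**2 / (k**2 * np.diag(N_l) + np.diag(lam)**2 * np.diag(N_eps))
D2_fourier = dx @ F.T @ np.diag(w) @ F @ dx
```

The two distances agree to machine precision, confirming that Equation 7 is the diagonalized version of the general expression.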
The special role of the external noise
Another interesting feature of Equation 6 is that it highlights the critical role of the external noise as a calibration parameter (or reference) to determine the scale of the other noise sources in the system (the subject of our study). In the absence of such a reference, the other noise sources could be arbitrarily scaled with no impact on the correlation between theory and experiment. 
Note that if no external noise is used in the experiments, the external-dependent terms in Equation 6 vanish, and every remaining term carries the factor k2. As a result, the theoretical perceptual distance reduces to  
\begin{eqnarray*} D_{\rm {th}}^2 = k^{-2} \big(\Delta {\boldsymbol x}^\top \nabla S^\top \big( \Sigma _l + \nabla S \Sigma _e \nabla S^\top\nonumber\\ +\;2 \mathbb {E}[{\boldsymbol n}_l {\boldsymbol n}_e^\top \nabla S^\top ] \big)^{-1} \nabla S \Delta {\boldsymbol x} \big) \end{eqnarray*}
This means that an arbitrary scaling k of the size of the noise sources only leads to a corresponding scaling k−2 of the theoretical distance, which has no effect on the correlation between Dth and Demp. 
As a consequence, by maximizing the correlation with no external noise, one could fit the structure of the covariance of the noise sources but not its absolute scale, which would be of limited interest—external noise effectively anchors the absolute scale of the internal noises (early and late) relative to the known external noise. 
The sum of terms in Equation 6 implies that the different noise sources actually serve as relative references for each other: Scaling all the noise sources globally at the same time has no effect on the correlation, but increasing the contribution of one source (say, the late noise) without varying the others would affect how well the experiments are reproduced. 
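The scaling argument can be illustrated with synthetic distances: multiplying the theoretical distances by any positive constant, as a global rescaling of the internal noise would do, leaves the Pearson correlation with the empirical distances exactly unchanged. All numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

# Hypothetical theoretical and empirical distances for 20 experimental
# conditions (purely synthetic, for illustration of the scaling argument).
D_th = rng.uniform(0.5, 3.0, size=20)
D_emp = D_th * rng.uniform(0.8, 1.2, size=20)   # noisy "measurements"

# Without external noise, scaling every internal noise source by k only
# rescales D_th by 1/k: the correlation cannot pin down the scale k.
r_unscaled = pearson(D_th, D_emp)
r_scaled = pearson(D_th / 7.3, D_emp)
```

Since the correlation is invariant to the scale of `D_th`, only a known external reference (the external noise) can anchor the absolute magnitude of the internal noises.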
This observation suggests that experiments used to determine the noise should use a substantial amount of external noise, and that the formulation should include early noise, to provide the necessary constraints on the scale of the late noise and even of noise at the (putative) decision stage. The role of the external noise as a scale was also apparent in the classical Equation 7: In the absence of external noise, Nε = 0, the scale of the late noise k2 is arbitrary, as it comes out of the denominator and simply scales the distance. The advantage of our Equation 6 is that it explicitly contains the early noise, so it can also be compared to the external noise. This special role is the reason to use accurate threshold measurements in external noise, such as those in Henning, Bird, and Wichmann (2002). 
Estimation I
Our proposal for noise estimation, stated in Model response in the presence of noise: a two-pixel example, consists of finding the noise parameters that maximize the correlation between Demp (Equation 4) and Dth (Equation 6, which depends on the noise). In principle, given n experimental conditions to measure n thresholds, this optimization reduces to computing \(\frac{\delta D_{\rm {th}}^i}{\delta \theta }\), where θ are the noise parameters and \(D_{\rm {th}}^i\) is the distance for the ith experimental condition, with i = 1, …, n (Martinez et al., 2018). 
However, the inverse in Equation 6 poses serious computational problems. Note that the derivative of the distance depends on the derivative of the metric, and the inverse implies a dependence on \((\Sigma _\mathcal {I})^{-2}\). Computing the inverses of very large matrices on every iteration of the optimization is not feasible in practice³ unless strong restrictions on the nature of the noise are assumed.⁴ 
Therefore, while the parametric expression presented above in Equation 6 is helpful to understand the problem—for instance, to link it to the classical results or to understand the relevance of the external noise as a scaling factor for the unknowns—in practice, the optimization is easier by taking a nonparametric computation of the theoretical distance. We consider the difference in response and the noise at the same time by taking the expected value of the Euclidean distance between the noisy responses, \({\boldsymbol y}({\boldsymbol x})\) and \({\boldsymbol y} ({\boldsymbol x}+\Delta {\boldsymbol x}) = {\boldsymbol y({\boldsymbol x})} + \Delta {\boldsymbol y}\):  
\begin{eqnarray} D_{\rm {th}}^2\; &=& \mathbb {E}\left[ \, \Delta {\boldsymbol y}^\top \cdot \Delta {\boldsymbol y} \, \right] \nonumber \\ &=& \mathbb {E}\left[\left|\Delta S + {\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}} \right|^2 \right]\qquad \end{eqnarray}
(8)
where \(\Delta S = S({\boldsymbol x}+\Delta {\boldsymbol x}) - S({\boldsymbol x})\) is the difference between the deterministic responses; \({\boldsymbol n}_{\mathcal {I}}\) and \({\boldsymbol n^{\prime }}_{\mathcal {I}}\) are different realizations of the inner noise at the points \(S({\boldsymbol x})\) and \(S({\boldsymbol x}+\Delta {\boldsymbol x})\), respectively; and \(\mathbb {E}[\cdot ]\) refers to expectation. Equation 8 implies that, when judging the difference between two stimuli, the observer compares two noisy responses: \(S({\boldsymbol x})+{\boldsymbol n}_{\,\mathcal {I}}\) and \(S({\boldsymbol x}+\Delta {\boldsymbol x}) + {\boldsymbol n^{\prime }}_{\,\mathcal {I}}\)
From a technical point of view, the nonparametric Equation 8 is not restricted by the local linear approximation required in the analytical Equation 6. Appendix B shows that derivatives of Equation 8 do not involve matrix inversion, so the optimization of the noise parameters to maximize the correlation between this Dth and Demp is feasible. 
In our results below, the pair of stimuli with and without the test \(\Delta {\boldsymbol x}\) to be detected or discriminated were put through the model, and then we computed the distances between the corresponding responses in the inner representation. \(\mathbb {E}[\cdot ]\) is estimated as an average over 5,000 realizations of inner noise. Appendix C shows that this amount is sufficient; in numerical simulations, we show that the nonparametric distance based on the average over the noisy samples (Equation 8) is equivalent to the parametric distance in Equation 6
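The Monte Carlo evaluation of Equation 8 can be sketched as follows, again with a toy response and assumed Gaussian inner noise; for independent zero-mean noises the expectation has a closed form, |ΔS|² + tr(Σ) + tr(Σ′), which serves as a sanity check on the averaging.

```python
import numpy as np

rng = np.random.default_rng(2)

def S(x):
    return np.sqrt(x)   # toy deterministic response (illustrative only)

x = np.array([4.0, 9.0])
dx = np.array([0.5, 0.2])
Sigma = np.diag(0.02 * S(x))          # assumed inner-noise covariance at S(x)
Sigma_p = np.diag(0.02 * S(x + dx))   # ... and at S(x + dx)

dS = S(x + dx) - S(x)
n1 = rng.multivariate_normal(np.zeros(2), Sigma, 5000)
n2 = rng.multivariate_normal(np.zeros(2), Sigma_p, 5000)

# Nonparametric distance of Equation 8: expected squared distance between
# the two noisy responses, averaged over the noise realizations.
D2_mc = np.mean(np.sum((dS + n2 - n1) ** 2, axis=1))

# Closed form for independent zero-mean noises, used only as a check:
D2_exact = dS @ dS + np.trace(Sigma) + np.trace(Sigma_p)
```

With 5,000 realizations, the Monte Carlo estimate lands within a few percent of the closed form, consistent with the convergence analysis reported in Appendix C.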
Results I
Using the data and model considered above and the computationally convenient noise-dependent distance described in Equation 8, we looked for the noise parameters that maximize the correlation between theory and experiment. The corresponding covariances of the noise are given in Appendix A. These covariances were used to generate noisy inputs and responses in the nonparametric computation of the distances. 
Figure 5.
 
Early and late noise parameters from threshold data. (Left) Correlation between the theoretical distance (based on the noise) and the experimental distance (based on the thresholds) for various late (x-axis) and early (y-axis) noise Fano factors. The optimal parameters (red dot) that maximize the correlation were βe = 0.023 and βl = 1.52. (Right) Scatterplot for the best (maximum) correlation, for the data of both observers and conditions (with and without external noise) considered together. In this optimal case, the Pearson correlation is 0.65. The correlation drops to 0.64 and 0.61 if one neglects early noise or late noise, respectively. Because βl is more important for explaining the data (it has a bigger impact on the correlation), βe is less constrained and hence more uncertain.
Figure 5 shows the correlations obtained when the noises are assumed to be purely Poisson (with no Gaussian component, and hence αe = αl = 0). The blue-to-yellow color scale represents low-to-high correlations between theory and experiment. We used all the experimental detection and discrimination data using pulse-trains and sinusoids of different frequencies in external pink noise and with no external noise. We fitted the results of both observers at the same time, and the resulting noise parameters were βe = 0.023 and βl = 1.52. The figure on the right shows the best (maximum correlation) scatterplot. 
Figure 5 also illustrates the basic problem in determining the noise parameters from thresholds: The uncertainty of the optimum is large because the correlation surface is very flat. This problem persists even if a separate set of noise parameters is fitted for each individual observer (see the per-observer results in Appendix D). Compared with the results in the next section, the use of restricted information (thresholds only, as opposed to all the information in the psychometric functions) implies looser constraints and hence larger uncertainties in the result. 
Interim conclusions
The proposed expression for the theoretical distance, Equation 6, has two interesting consequences: 
  • It generalizes the classical result in Burgess and Colborne (1988) for nonlinear models with sources of noise at different depths (not only signal-independent late noise).
  • It points out the special role of the noise at the input. Some noise at the input (either external noise or early noise) is necessary to find the scale of the late noise.
Noise estimation II: Using full psychometric functions
In this section, we follow Green (1960) and Chapter 5 of Wichmann (1999) on the importance of using full psychometric functions, not only thresholds, for modeling behavior. In particular, we show that if the full psychometric function is used, there is no need for external noise to scale the size of the early and late noises. 
Specifically, we use the definition of the psychometric function in terms of the density of the inner noise given in May and Solomon (2013): The probability of correct detection of a variation \(\Delta {\boldsymbol x}\) over the stimulus \({\boldsymbol x}\) is given by the cumulative density function of the inner noise in the direction of variation of the stimulus,  
\begin{eqnarray} P_{\rm {correct}} = \frac{1}{2} +\int _{0}^{|\Delta {\boldsymbol x}|} p({\boldsymbol u}^\top \cdot {\boldsymbol n}^x_\mathcal {I}) \,\,\,\, {\boldsymbol u}^\top \cdot d{\boldsymbol n}^x_\mathcal {I} \qquad \end{eqnarray}
(9)
where \(p({\boldsymbol u}^\top \cdot {\boldsymbol n}^x_\mathcal {I})\) is the probability density function (PDF) of the noise in the late representation expressed in the stimulus space and projected in the direction of variation of the signal, \({\boldsymbol u} = \frac{\Delta {\boldsymbol x}}{|\Delta {\boldsymbol x}|}\). Figure 6 illustrates this concept using the two-pixel example described in Modeling framework and intuition. In this context, departures from a certain background or pedestal \({\boldsymbol x}\) in a certain direction \(\Delta {\boldsymbol x}\) will be visible when the size of the departure can be discriminated over the noise in that specific direction, which is related to the cumulative density function (CDF) of the noise in that direction (i.e., Equation 9). 
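Equation 9 suggests a direct simulation recipe: draw samples of the inner noise expressed in stimulus coordinates, project them onto the direction of signal variation, and read the probability correct off the empirical CDF. The covariance below is an arbitrary illustrative choice, not an estimate from the data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Samples of the inner noise expressed in stimulus coordinates (drawn
# here from an assumed 2-D Gaussian purely for illustration).
Sigma_x = np.array([[0.30, 0.10],
                    [0.10, 0.15]])
noise_x = rng.multivariate_normal(np.zeros(2), Sigma_x, 20000)

def p_correct(delta_x, noise_x):
    """Equation 9: probability correct from the empirical CDF of the
    noise projected onto the direction u = delta_x / |delta_x|."""
    d = np.linalg.norm(delta_x)
    u = delta_x / d
    proj = noise_x @ u                       # projected 1-D noise samples
    return 0.5 + np.mean((proj > 0) & (proj < d))

# Increasing |delta_x| along a fixed direction traces out the
# psychometric function:
levels = [0.1, 0.5, 1.0, 2.0]
u0 = np.array([1.0, 1.0]) / np.sqrt(2)
pf = [p_correct(s * u0, noise_x) for s in levels]
```

The resulting proportions rise monotonically from 0.5 toward 1, exactly the sigmoids sketched in green and brown in Figure 6.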
Figure 6.
 
Psychometric functions as a function of the inner noise projected back into the stimulus space. (Left) Two-pixel example (as in Figures 2A and 3A) showing samples of noisy responses of nine stimuli of different luminance (along the diagonal direction) and different contrast (along the perpendicular-to-the-diagonal direction). For two of them, we represent departures from the average \({\boldsymbol x}\) in a specific direction \(\Delta {\boldsymbol x}\) with black and pink lines. The sigmoids in green and brown represent the CDFs of the noise projected in the direction of variation of the signal. (Right) Probability density functions (PDFs, top right) and cumulative density functions (CDFs, bottom right) for the two considered clusters. Note how the CDF in brown squeezes when the abscissa is expressed in contrast units (by dividing the variation by the average luminance).
Estimation II
Given a model that includes noise (e.g., Equation 2) and discrimination data for increments \(\Delta {\boldsymbol x}\) over certain pedestals \({\boldsymbol x}\) (such as the data by Henning, Bird, & Wichmann, 2002), we can estimate the noise parameters from the best reproduction of the psychometric functions. Specifically, the model is used to generate noisy responses for the stimuli given certain noise parameters. Then, we use our noise propagation method to express the noisy responses back in the stimulus domain. The cumulative histograms of the noisy samples in the direction of the signal variation are the prediction of the experimental psychometric functions for the considered noise parameters. We then calculate the root mean square error (RMSE) as a measure of discrepancy between experimental and predicted psychometric functions and use an optimization routine to find the noise parameters that minimize the RMSE. 
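The fitting logic can be sketched as a grid search. The forward model below (`predict_pf`) is a deliberately simple, hypothetical stand-in for the full noisy-model simulation described above; recovering the generating parameters from synthetic data illustrates the idea of minimizing the RMSE over (βe, βl).

```python
import numpy as np

def predict_pf(beta_e, beta_l, levels):
    """Hypothetical forward model: predicted proportion correct at each
    stimulus level for given early/late noise parameters. This is a toy
    placeholder for the simulation pipeline described in the text."""
    sigma = np.sqrt(beta_e + beta_l * levels)   # toy signal-dependent noise
    return 0.5 + 0.5 * (1.0 - np.exp(-(levels / sigma) ** 2))

levels = np.linspace(0.05, 1.0, 8)
pf_data = predict_pf(0.01, 0.2, levels)         # synthetic "observed" data

# Grid search for the (beta_e, beta_l) pair minimizing the RMSE between
# observed and predicted psychometric functions:
grid_e = np.linspace(0.002, 0.05, 25)
grid_l = np.linspace(0.02, 0.5, 25)
rmse = np.array([[np.sqrt(np.mean((predict_pf(be, bl, levels) - pf_data) ** 2))
                  for bl in grid_l] for be in grid_e])
i, j = np.unravel_index(rmse.argmin(), rmse.shape)
best_e, best_l = grid_e[i], grid_l[j]
```

The `rmse` array plays the role of the error surfaces in Figure 7: its minimum identifies the best-fitting early and late noise parameters.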
Note that at a certain stimulus in Figure 6, a smaller (bigger) amount of noise or a different orientation of the multivariate PDF would lead to steeper (shallower) predictions for the corresponding psychometric function. In this case (as opposed to Noise estimation I: Using only threshold data), there is no need for external noise for scaling: Here, variation of the inner-noise parameters directly leads to properly scaled predictions. 
Results II
The color surfaces in Figure 7 show the errors in predicting the psychometric functions for a range of early and late noise parameters, obtained with the above procedure for the psychometric function data of the two observers in Henning, Bird, and Wichmann (2002). We used, and needed, only the experimental data without external noise. The blue-to-yellow color scale represents low-to-high prediction error. The optimal noise parameters (minimum error) were βe = 0.005 and βl = 7.69 for observer GBH and βe = 0.01 and βl = 7.69 for observer CMB. The experimental data are shown as blue dots, and the continuous red and green curves represent different predictions from the model (Equation 9). The curves in black correspond to the optimal predictions for each observer using both early and late noise. The curves in red and green come from disregarding early noise and late noise, respectively. Models with only early or only late noise clearly lead to worse predictions: Both early and late noise are required to account for the experimental data. 
Figure 7.
 
Early and late noise parameters using full psychometric functions. The color surfaces show the average RMSE between experimental and predicted psychometric functions for varying late (x-axis) and early (y-axis) noise parameters. In each case, the optimum is given by the minimum in the error surface (marked in red). Please note that the optimum here is a low RMSE (dark blue), whereas in the previous section, the optimum was a maximal correlation (light yellow). For the observers GBH and CMB, the optima, highlighted in red, are located at (βe = 0.005, βl = 7.69) and (βe = 0.01, βl = 7.69), respectively. The plots at the top represent the experimental data of the psychometric functions (blue dots) and different optimal predictions considering both early and late Poisson noise (in black), only late noise (in red), and only early noise (in green).
Early noise seems particularly relevant in detection, where contrast is small and hence (signal-dependent) late noise is negligible. In detection, neglecting the early noise leads to unrealistically steep psychometric functions (see red curves in Figure 7). These predictions (according to the error surfaces) are not improved even if substantially bigger late noise is considered. In contrast, early noise seems irrelevant for explaining discrimination. In discrimination, the neural responses are already high, and hence the signal-dependent contribution of late noise is much larger than that of early noise transformed into the late representation. Note that early noise is compressed by the saturating nonlinearity at high amplitudes in the late representation. 
Late noise seems particularly relevant in discrimination. If late noise is neglected in discrimination, the predictions are unrealistically steep (see green curves in Figure 7). These predictions (according to the error surfaces) are not improved even if substantially bigger early noise is considered. Conversely, late noise seems irrelevant in detection: Note that the no-late-noise green curve matches the black curve in detection. Explicit examples of the role of early and late noise in the PDF of the inner noise are given in Appendix E. 
Note, however, that the above result and interpretation does not hold in general for all spatial vision models: It is specific for the type of Gaussian–Poisson noise sources assumed in our model as well as for the specific, simple model S we use—this is not (yet) the answer to the long-standing question regarding the role of level-(in)dependent late noise in models of spatial vision (see, e.g., Wichmann, 1999; Kontsevich, Chen, & Tyler, 2002; Georgeson & Meese, 2006; Schütt & Wichmann, 2017), the issue we return to in the paragraph Ambiguity between transducer and input-dependent noise in Related work
On the other hand, the full model (black line) does not achieve a perfect fit of the blue dots. This may be due to the limitations of the model S assumed for the illustration: Note that the masking nonlinearity is just a fixed saturation, which is a gross oversimplification. This may well limit the maximum performance of the model. More accurate models, including input-dependent saturations (e.g., divisive normalization; Schütt & Wichmann, 2017; Martinez et al., 2018), could lead to better fits. However, note that the procedure to estimate the noise contributions proposed here is completely general and also applicable to more accurate models. 
Discussion
We first address the reliability of the proposed methods and then we discuss the connections of our psychophysical methods with other physiological models and previous psychophysical methods for noise estimation. Finally, we discuss the implications of our noise estimates on other experimental methods and on information-theoretic approaches to study vision. We conclude the work with an overview of our methods and findings. 
Reliability of the proposed methods
In optimization, the accuracy of the estimation is given by the steepness of the goal function at the optimum (Press, Teukolsky, Vetterling, & Flannery, 2007). In our case, accuracy is substantially higher for the full-psychometric method. Note the clear minima in Figure 7 as opposed to the flat plateaus obtained in the threshold-only method in Figures 5 and A2. Consider, for example, the size of the region that leads to a change in the first significant figure of the goal variable (either correlation or error): The uncertainty regions of the parameters defined in this way (Press et al., 2007) are small when using the full psychometric functions and huge in the threshold-only case. Moreover, the consistency between the results of the two observers is higher for the full-psychometric method than for the threshold-only method. In fact, the results of the threshold-only method can be considered compatible only because of their low accuracy (and hence high uncertainty). 
In summary, for the dataset and model explored here, we find improved accuracy and consistency from the full psychometric method over the threshold-only method. However, the data by Henning, Bird, and Wichmann (2002) only contain a single level of external noise, and we can thus not rule out that in experiments measuring thresholds only, but at various different noise levels (cf. Legge, Kersten, & Burgess, 1987; Pelli & Farell, 1999; Goris, Zaenen, & Wagemans, 2008), our threshold-only method could also lead to more accurate estimates of the internal noise. 
Related work
Psychophysical versus physiological estimates of early noise. Our psychophysical estimates of the noise should, ideally, have physiological correlates. One might think (naively) that our psychophysical estimate of the early noise is directly related to the noise in the electrical response of LMS retinal cones. Similarly, one may think that our late noise could be related to the noise at the cortex. However, direct comparison is not that simple. In fact, following the literature that derives basic psychophysical behavior (such as the contrast sensitivity functions or center-surround sensors) from noise removal goals (Atick, Li, & Redlich, 1992; Li, Gomez-Villa, Bertalmío, & Malo, 2022), it makes sense that the psychophysically estimated early noise has substantially less variance than the noise actually present at the retina. Appendix F compares our psychophysical estimates with a reasonable physiological estimate of the retinal noise obtained using an accurate model of the retina, the ISETBio (first used in Cottaris et al., 2019; Cottaris et al., 2020). Simulating the noise on top of illustrative images (Appendix F) shows that the early and late components of our psychophysical noise depend sensibly on luminance, contrast, and frequency, and that their variance is low compared to the physiological estimates. This makes sense because the psychophysical noise should be barely visible. Our methods estimate the effective noise of the retina, which, due to the downstream circuitry (not included here), has a much smaller impact on behavioral performance than the actual physiological noise. This can also be interpreted as an underestimation of the early noise because the noise removal processes in the retina and early visual cortex are not included in the model itself. In any case, the methods estimate the psychophysically effective noise. 
Ambiguity between transducer and input-dependent noise. Our work is focused on the separate determination of the early and late components of the inner noise, so it does not directly address the long-standing question of the ambiguity between the transducer function and the input-dependent variance of the inner noise (Wichmann, 1999; Kontsevich, Chen, & Tyler, 2002; Georgeson & Meese, 2006; García & Alcalá, 2009; Kingdom, 2016). However, the proposed formulation and nonparametric methods have some connection with that interesting discussion. 
First, May and Solomon (2013) factorize the shape parameter of the psychometric functions into the product of an internal noise component and a transducer component. According to this product, the data may be fitted in different (equivalent) ways. Our expression, Equation 6 (or, more clearly, the less detailed Equations 10 and A2), which considers the product of the covariance of the noise and the Jacobian of the transducer, is an alternative to the product in May and Solomon (2013) to describe this ambiguity. For instance, in Equations 10 and A2, it is obvious that a variation in the covariance of the inner noise can be compensated by a corresponding variation in the Jacobian; therefore, there is an ambiguity. However, Equation 6 explicitly includes the contributions of the early and external noises (not considered by May & Solomon, 2013), and this (1) clarifies that different models S′ affect the estimates of the late noise, but not of the early noise, and (2) suggests possible ways to break the ambiguity. On the one hand, consider first a model S′ with a linearly scaled sensitivity so that ∇S′ = α∇S. In that case, Equation 6 says that the distance D stays the same if the late noise is scaled accordingly, that is, if \({\boldsymbol n^{\prime }_l} = \alpha \, {\boldsymbol n_l}\) and \(\Sigma ^{\prime }_l = \alpha ^2 \Sigma _l\). In the general case with a more complicated S′, α will not be constant but input-dependent, and this local/adaptive scaling will affect the late noise in an equivalent adaptive way. However, the (same) early and external noises will be automatically scaled when transferred to the inner representation by S′. As a result, only the late noise has to be adjusted to keep the relative scale of all the noise sources in the inner domain. On the other hand, note that the distances stay the same (equivalent reproduction of the data) only if the Jacobian covaries with the late noise while the early noise stays fixed, which is a rather specific situation. 
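This scaling argument can be checked numerically. The sketch below (an illustrative random Jacobian and diagonal covariances, not the fitted model of the paper) builds the inner covariance as the early and external noise transferred through ∇S plus the late noise, and verifies that the distance is invariant under ∇S′ = α∇S together with Σ′<sub>l</sub> = α²Σ<sub>l</sub> while the early and external covariances stay fixed:

```python
# Numerical check of the scaling ambiguity (illustrative linear-Jacobian
# sketch): scaling the sensitivity, ∇S' = α∇S, leaves the Mahalanobis distance
# unchanged provided the late noise is rescaled as Σ_l' = α²Σ_l, while the
# early and external covariances (transferred through ∇S) are untouched.
import numpy as np

rng = np.random.default_rng(0)
n = 4
J = rng.normal(size=(n, n))        # Jacobian ∇S at some stimulus
Sigma_e = 0.10 * np.eye(n)         # early noise (input domain)
Sigma_x = 0.05 * np.eye(n)         # external noise (input domain)
Sigma_l = 0.30 * np.eye(n)         # late noise (inner domain)
dx = rng.normal(size=n)            # stimulus increment Δx

def distance2(J, Sigma_l):
    """Squared distance of Eq. A2 with the inner covariance built as in Eq. 6."""
    Sigma_I = J @ (Sigma_e + Sigma_x) @ J.T + Sigma_l
    v = J @ dx
    return v @ np.linalg.solve(Sigma_I, v)

alpha = 2.5
D2 = distance2(J, Sigma_l)                            # original model S
D2_scaled = distance2(alpha * J, alpha**2 * Sigma_l)  # scaled model S'
assert np.isclose(D2, D2_scaled)   # data reproduced equally: ambiguity
```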
However, we have not pursued this possibility given the low accuracy of the results obtained with the threshold-only method (flat plateaus in Figures 5 and A2), even for a fixed nonlinearity. As in the discussion between Kontsevich, Chen, and Tyler (2002) and Georgeson and Meese (2006), the accuracy and significance of the results are critical, and hence it is better not to address this issue with methods that are intrinsically of low accuracy, as shown here for the threshold-only method. 
In this regard, our nonparametric method that uses the data from all the psychometric functions could be more appropriate for two reasons: One is its better accuracy (illustrated in Figure 5), and the other is that being nonparametric is not attached to the small-increment (or local linear) assumptions behind Equation 6, which was one of the factors for the ambiguity pointed out for simple univariate models in Kontsevich, Chen, and Tyler (2002) (see also the related discussion in Chapter 5 by Wichmann, 1999). 
Differences with other psychophysical estimations of the noise. Maximum likelihood difference scaling (MLDS) (Maloney & Yang, 2003) is a well-founded method to derive nonlinear responses and the size of the inner noise from simple comparisons of suprathreshold distortions. While the default method was designed to estimate signal-independent noise, it can be extended to incorporate signal-dependent noise (Kingdom & Prins, 2010). Actually, the comparison between the nonlinearities found through MLDS (e.g., Shooner & Mullen, 2022) and the more traditional integration of incremental thresholds (Watson & Solomon, 1997) has been suggested as a way to disentangle the nonlinearity and the variability of the inner noise (Kingdom, 2016). In contrast discrimination tasks similar to the ones considered here, MLDS has been found to give substantially higher values for the variance of the noise than simple univariate estimations obtained from the integral of the incremental thresholds (Shooner & Mullen, 2022). More importantly, the original MLDS and its different variants focus on the noise in the response or decision domain; that is, by construction, they do not isolate the relative contributions of the early and late sources of noise. That is a fundamental difference from the methods proposed here, which explicitly take into account sources of noise at different stages of the model, something that is not easy in MLDS. 
The psychometric functions in May and Solomon (2013) and Baldwin, Baker, and Hess (2016) could be used together with an image-computable model in nonparametric ways, as we did in Estimation II. However, similarly to MLDS, these methods usually subsume all the noise sources into a single one at the inner representation, for example, what the authors in Baldwin, Baker, and Hess (2016) call equivalent noise, following Pelli and Farell (1999). In contrast, here we express metrics and covariance matrices at any depth in the same representation, which generalizes the models used in the other approaches so that they include uncertainty at different depths along the model. 
Our methods can deal with noise at the decision stage. Our Equations 6, 8, and 9 allow us to obtain the contributions to the multivariate inner noise \({\boldsymbol n}_\mathcal {I}\). However, previous literature has also considered the univariate uncertainty added to the variable that determines the decision (Pelli, 1985; Neri, 2010). Appendix G shows how the proposed methods can be extended to estimate this univariate component as well. The extension is straightforward because this additional source of uncertainty can be incorporated directly into the covariance of the inner noise. 
Our methods can deal with noise correlations. An important difference from all the psychophysical methods considered in the previous paragraphs is that our formulation is intrinsically multivariate. The methods mentioned above essentially obtain the noise parameters along the direction in which the stimulus is experimentally varied. In this way, the consideration of multiple pedestals and incremental stimuli leads to a set of (in principle) disconnected univariate noise estimates for the different experimental conditions. In contrast, our multivariate formulations (either the analytic one based on thresholds or the nonparametric one that uses the psychometric functions) are designed to be used with image-computable models, so any multivariate expression of the noise can be assumed and fitted using all the experimental data at the same time. In our simulations, we considered diagonal covariance matrices for Σe and Σl just for simplicity of the explanation, but there is no restriction against assuming correlations in the noise between the sensors at the early or late stages. This is a fundamental advantage of our method over univariate psychophysical estimations given the relevance of noise correlations, at least for neuroscience (Averbeck, Latham, & Pouget, 2006; Moreno-Bote et al., 2014; Goris, Movshon, & Simoncelli, 2014; Kohn et al., 2016). 
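A toy example (hypothetical two-sensor Jacobian and covariances, purely for illustration) shows why correlations matter in this formulation: two late-noise covariances with identical per-sensor variances but different off-diagonal terms lead to different perceptual distances, so the correlations are, in principle, constrained by the same discrimination data.

```python
# Sketch: the formulation accepts any (not necessarily diagonal) noise
# covariance. Same per-sensor variances, different correlations -> different
# predicted distances, hence correlations are recoverable in principle.
import numpy as np

J = np.array([[1.0, 0.5], [0.2, 1.0]])      # toy Jacobian ∇S
dx = np.array([1.0, 1.0])                   # stimulus increment Δx
Sigma_e = 0.05 * np.eye(2)                  # early noise (input domain)

def dist2(Sigma_l):
    """Squared Mahalanobis distance with early noise transferred through ∇S."""
    Sigma_I = J @ Sigma_e @ J.T + Sigma_l
    v = J @ dx
    return v @ np.linalg.solve(Sigma_I, v)

diag = np.array([[0.2, 0.0], [0.0, 0.2]])   # uncorrelated late noise
corr = np.array([[0.2, 0.15], [0.15, 0.2]]) # same variances, correlated
assert not np.isclose(dist2(diag), dist2(corr))
```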
Maximum differentiation and noise estimates. The concept of maximum differentiation has been proposed as an experimental way to decide between vision models (Wang & Simoncelli, 2008) or as a way to measure the free parameters of vision models (Malo & Simoncelli, 2015) using the associated perceptual distance. Originally, Wang and Simoncelli (2008) proposed a method to generate stimuli maximally/minimally discernible by observers according to an iterative procedure of perceptual distance maximization/minimization. Depending on the model, this approach may be extremely expensive. However, following the second-order approximation of the non-Euclidean distance (as used in Appendix A), the maximum differentiation method was simplified and reduced to the estimation of the eigenvectors of the discrimination ellipsoid (Malo & Simoncelli, 2015; Martinez et al., 2018). 
The theory proposed here allows the estimation of the covariance matrix of the inner noise and hence the design of interesting stimuli in cardinal directions of the noise-dependent metric. Note that according to the transform of the covariance of noise along nonlinear networks (Ahumada, 1987), the directions of maximum and minimum discrimination in the spatial domain depend on the Jacobian of the model and the covariance of the noise in the input:  
\begin{eqnarray} (\Sigma ^{x}_{\mathcal {I}})^{-1} = \nabla S^\top \cdot \Sigma ^{-1}_{\mathcal {I}} \cdot \nabla S \quad \end{eqnarray}
(10)
Eigenfunctions of this covariance (where ∇S and \(\Sigma ^{-1}_{\mathcal {I}}\) are point-dependent) lead to optimal stimuli in the spatial domain (maximally and minimally discriminable) that can be used to check the effect of the noise components and model parameters at different points of the stimulus space (e.g., detection and discrimination). 
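A minimal sketch of this stimulus design (random Jacobian and diagonal inner covariance, purely for illustration) computes the metric of Equation 10 and extracts the most and least discriminable directions as its extreme eigenvectors:

```python
# Sketch of selecting maximally/minimally discriminable directions from the
# metric of Eq. 10: eigenvectors of M = ∇S^T Σ_I^(-1) ∇S with the largest
# (smallest) eigenvalue give the stimulus directions of best (worst)
# discrimination at that point (cf. Malo & Simoncelli, 2015).
import numpy as np

rng = np.random.default_rng(1)
n = 5
J = rng.normal(size=(n, n))                   # point-dependent Jacobian ∇S
Sigma_I = np.diag(rng.uniform(0.1, 1.0, n))   # inner-noise covariance

M = J.T @ np.linalg.solve(Sigma_I, J)         # metric in the input domain
eigval, eigvec = np.linalg.eigh(M)            # M is symmetric positive definite

e_min, e_max = eigvec[:, 0], eigvec[:, -1]    # least/most discriminable
# A unit step along e_max produces a larger perceptual distance than along e_min:
assert e_max @ M @ e_max > e_min @ M @ e_min
```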
Noise estimates in information-theoretic approaches. The proposed approach, which describes discrimination using a noise-dependent Mahalanobis metric, is related to methods that describe discrimination using Fisher information (Abbott & Dayan, 1999; Moreno-Bote et al., 2014; da Fonseca & Samengo, 2016; Berardino, Ballé, Laparra, & Simoncelli, 2018; Zhou, Duong, & Simoncelli, 2024). In fact, the linear Fisher information matrix (Kohn et al., 2016) has the same expression as our Mahalanobis metric in the input domain (our Equation 10). For instance, in Zhou, Duong, and Simoncelli (2024), the Fisher information is \(F = \frac{\nabla \mu ^2}{\sigma ^2}\), where μ is the deterministic transform and σ2 is the variance of the inner noise, which, in matrix form, is \(F = \nabla \mu ^T \cdot \Sigma _{\mathcal {I}}^{-1} \cdot \nabla \mu\) (i.e., our Equation 10). Therefore, their expressions for the discriminable distance and for the threshold stimuli are the same as those obtained from our Equation A2. First, from our Equation A2, the perceptual distance is proportional to the slope of the response and inversely proportional to the square root of the covariance (as in Equation 10 in Zhou, Duong, & Simoncelli, 2024). Second, also from our Equation A2, if a just noticeable variation \(\Delta {\boldsymbol x}_\tau\) is the one that leads to a certain threshold value of the perceptual distance, \(\Delta {\boldsymbol x}_\tau ^\top \cdot \nabla S^\top \cdot \Sigma _\mathcal {I}^{-1} \cdot \nabla S \cdot \Delta {\boldsymbol x}_\tau = \tau ^2\), then all possible just noticeable distortions lie on an ellipsoid described by F−1. In this way, the magnitudes of the just noticeable variations, \(\Delta {\boldsymbol x}_\tau\), are inversely proportional to the square root of the Fisher information matrix (as discussed after Equation 10 in Zhou, Duong, & Simoncelli, 2024). 
Moreover, the expected value of random just noticeable variations would fulfill \(\mathbb {E}[\Delta {\boldsymbol x}_\tau \cdot \Delta {\boldsymbol x}_\tau ^\top ] = \tau ^2 \,\, \left( \nabla S^\top \cdot \Sigma _\mathcal {I}^{-1} \cdot \nabla S \right)^{-1}\). In this case, one would be describing the covariance matrix of the uncertainty of the observer at that point, and it is proportional to the inverse of the linear Fisher information matrix, as in Seung and Sompolinsky (1993) and Kohn et al. (2016)
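This relation can be verified with a Monte Carlo sketch (a generic symmetric positive-definite F, not a fitted model). Sampling directions isotropically in the whitened domain and rescaling each sample onto the threshold ellipsoid, the second moment of the just noticeable variations is proportional to F−1; note that with this particular sampling scheme the proportionality constant works out to τ2/n rather than τ2.

```python
# Monte Carlo sketch: draw just-noticeable variations Δx_τ on the ellipsoid
# Δx^T F Δx = τ² and check that their second moment is proportional to F^(-1)
# (the inverse linear Fisher information). With directions sampled
# isotropically in the whitened domain, the constant is τ²/n.
import numpy as np

rng = np.random.default_rng(2)
n, tau = 3, 1.0
A = rng.normal(size=(n, n))
F = A @ A.T + n * np.eye(n)                  # generic Fisher matrix (SPD)

# Sample directions, then rescale each sample onto the threshold ellipsoid.
u = rng.multivariate_normal(np.zeros(n), np.linalg.inv(F), size=200_000)
q = np.einsum('ij,jk,ik->i', u, F, u)        # quadratic form u^T F u per sample
dx = tau * u / np.sqrt(q)[:, None]           # each satisfies Δx^T F Δx = τ²

second_moment = dx.T @ dx / len(dx)          # empirical E[Δx·Δx^T]
expected = tau**2 / n * np.linalg.inv(F)     # proportional to F^(-1)
assert np.allclose(second_moment, expected, atol=0.01)
```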
The consistency with the Fisher information results is not surprising given the equivalence of our noise-based Mahalanobis metric with the linear Fisher information matrix. However, note that the explicit decomposition into early, late, and external noises in \(\Sigma _{\mathcal {I}}\), in our Equation 6, was not considered in the information-theoretic literature mentioned above. As a consequence, our contributions here enrich the Fisher-information results. 
On the other hand, our estimations of the noise have an impact on the literature that quantifies information loss in the retina–cortex pathway. In this regard, values derived from psychophysical data, such as those presented here, are more relevant for describing the effective bottleneck than physiological estimates, which may be compensated by denoising or enhancing mechanisms (Atick, Li, & Redlich, 1992; Martinez-Otero, Molano, Wang, Sommer, & Hirsch, 2014; Li et al., 2022). Noise estimates are critical to quantify information flow (Malo, 2020) and functional connectivity (Li, Steeg, & Malo, 2024), and to derive subjective distortion metrics based on information loss (Sheikh & Bovik, 2006). In all these applications, the estimated variances of the early and late noise allow a more accurate account of the mutual information between the retinal image and the inner representation. With the proposed method, realistic noise values can be obtained for any model, and hence the results in Malo (2020) on the information transmitted by spatial versus chromatic mechanisms, or by linear versus nonlinear transforms, could become more than rough estimates based on reasonable noise assumptions. Similarly, better noise estimates could improve image quality measures (e.g., Sheikh & Bovik, 2006) based on modeling information transfer along the visual pathway (Kheravdar, Li, & Malo, 2021). 
Final remarks: Early and late noise in psychophysical models of early vision
In this work, we proposed and compared two methods to obtain the early and late multivariate contributions to the inner noise that limits visual performance in psychophysical models of early vision. Both methods are based on measuring detection and discrimination thresholds. The first method is based on the exclusive use of threshold data, while the second uses the full psychometric functions measured around the thresholds. 
The first method, based on the Mahalanobis distance, leads to an analytical expression of the perceptual distance, Equation 6, that can be compared with empirical estimates of the distance based on the thresholds. That expression generalizes the result of Burgess and Colborne (1988) to any nonlinear model with noise sources at different depths (not only late noise) and shows that external noise is strictly required in the experiments to obtain the scale of the early and late noise sources when only the threshold values are used. The second method, which consists of a nonparametric fit of the psychometric functions, shows that the early noise contribution is strictly required to explain detection results, while the late noise contribution is more important in explaining discrimination (see Figure 7). This is because the inner noise is more similar to the early noise for low-contrast stimuli and more similar to the late noise for high-contrast stimuli (see Figure A4). The second method (which uses more experimental data) is substantially more accurate than the first one. In fact, the results of both methods are compatible only due to the large uncertainty in the result of the first method. 
Explicit representation of the inner noise in the image domain (in luminance units, as in Figure A5) and its comparison with accurate estimates of photoreceptor noise (Cottaris et al., 2019; Cottaris et al., 2020) shows that our results based on psychophysics make more behavioral sense than results based on retinal physiology, whose noise would be far too visible. Our results thus agree with previous suggestions on the role of postretinal processing in reducing retinal noise (Atick, Li, & Redlich, 1992; Martinez-Otero et al., 2014; Li et al., 2022; Akbarinia, Morgenstern, & Gegenfurtner, 2023). 
Given the low accuracy of the threshold-only method with the available data, we did not try to solve the debates on disentangling the noise and the nonlinearities of the models (Wichmann, 1999; Kontsevich, Chen, & Tyler, 2002; Georgeson & Meese, 2006). However, Equation 6 does show that the variation of the inner noise cancels the effect of the transducer (there is ambiguity) only for very specific variations of the early and late noises. This suggests that, with more data to constrain the results, the different factors (early, late, and model nonlinearity) could be fitted at the same time. 
Beyond the simultaneous estimation of the noise at different depths of the vision model (early, late, and even decision level), another fundamental difference of our work with regard to previous psychophysical methods (Neri, 2010; May & Solomon, 2013; Kingdom, 2016; Shooner & Mullen, 2022) is its multivariate nature. Our expressions allow us to accommodate correlations between the different sensors at any depth. This is important in information-theoretic approaches because some correlations may be more harmful than others (Moreno-Bote et al., 2014; Kohn et al., 2016), and we saw that the results of our noise-dependent Mahalanobis metric are equivalent to the results obtained from the linear Fisher information matrix (Seung & Sompolinsky, 1993; Zhou, Duong, & Simoncelli, 2024). 
Finally, the proposed psychophysical estimation may be used to modify the Euclidean assumptions on the inner metric in recent models of spatial vision (Schütt & Wichmann, 2017; Martinez et al., 2018). This can be used for better quantification of the information flow along the visual pathway (Malo, 2020) and to improve image quality metrics based on information loss along the visual pathway (Sheikh & Bovik, 2006; Kheravdar, Li, & Malo, 2021). 
Acknowledgments
Supported by the German Research Foundation (DFG): SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms, TP C2, project number: 276693517 and by funding from VIS4NN, Programa Fundamentos 2022, Fundación BBVA. 
Commercial relationships: none. 
Corresponding author: Jesus Malo. 
Address: Campus de Paterna, Universidad de Valencia, C/Catedrático José Beltrán, 2, 46980 Paterna (Valencia), Spain. 
Footnotes
2  The statistical properties of the external noise are the calibration features, which the experimenter may choose to probe hypotheses about the system under study. For instance, changing the spatial spectrum of the external noise is equivalent to introducing correlations over the different locations (pixels) and hence changing the orientation of the green ellipses.
3  Note that dealing with 64 × 64 images and a 3-scale and 4-orientation steerable transform implies working with metric matrices of size 25,664 × 25,664. Optimization takes about 20–50 iterations, and in each iteration, we need to compute n of such inverses (one per data point).
4  For instance, if late noise is assumed to be independent of early noise (true in the low-noise limit), and we have restricted Poisson (i.e., only depending on the global energy of the response and not on the energy of each coefficient), the covariance of the transformed input noise can be diagonalized. The corresponding orthogonal matrices can be extracted from the inverse, and the inverse reduces to inverting a diagonal matrix.
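The shortcut in footnote 4 can be sketched as follows (small illustrative matrices instead of the 25,664 × 25,664 case): one eigendecomposition of the transformed early-noise covariance is reused for every value of the global (isotropic) late-noise variance, so each inverse reduces to inverting a diagonal matrix.

```python
# Sketch of the diagonalization shortcut: if the late noise is isotropic up to
# a global energy factor, Σ_I = JΣ_eJ^T + cI, one eigendecomposition of the
# transformed early-noise covariance serves for every value of c; each inverse
# then costs only a diagonal inversion.
import numpy as np

rng = np.random.default_rng(3)
n = 6
J = rng.normal(size=(n, n))                 # toy Jacobian ∇S
Sigma_e = np.diag(rng.uniform(0.1, 1.0, n)) # early-noise covariance

transformed = J @ Sigma_e @ J.T             # symmetric PSD, fixed per point
lam, Q = np.linalg.eigh(transformed)        # computed once

def inv_inner_cov(c):
    """(JΣ_eJ^T + cI)^(-1) via the precomputed eigenbasis."""
    return Q @ np.diag(1.0 / (lam + c)) @ Q.T

c = 0.7                                     # global late-noise variance
direct = np.linalg.inv(transformed + c * np.eye(n))
assert np.allclose(inv_inner_cov(c), direct)
```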
References
Abbott, L. and Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11(1), 91–101. [CrossRef] [PubMed]
Ahumada, A. J. (1987). Putting the visual system noise back in the picture. Journal of the Optical Society of America A Optics and Image Science, 4(12), 2372–2378. [CrossRef] [PubMed]
Akbarinia, A., Morgenstern, Y., and Gegenfurtner, K. R. (2023). Contrast sensitivity function in deep networks. Neural Networks, 164, 228–244. [CrossRef]
Atick, J., Li, Z., and Redlich, A. (1992). Understanding retinal color coding from first principles. Neural Computation, 4(4), 559–572. [CrossRef]
Averbeck, B. B., Latham, P. E., and Pouget, A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7(5), 358–366. [CrossRef] [PubMed]
Baldwin, A., Baker, D., and Hess, R. (2016). What do contrast threshold equivalent noise studies actually measure? Noise vs. nonlinearity in different masking paradigms. PLoS ONE, 11(3), e0150942. [CrossRef] [PubMed]
Berardino, A., Ballé, J., Laparra, V., and Simoncelli, E. P. (2018). Eigen-distortions of hierarchical representations.
Burgess, A. and Colborne, B. (1988). Visual signal detection, IV: Observer inconsistency. Journal of the Optical Society of America A, 5(4), 617–627. [CrossRef]
Cottaris, N. P., Jiang, H., Ding, X., Wandell, B. A., and Brainard, D. H. (2019). A computational-observer model of spatial contrast sensitivity: Effects of wave-front-based optics, cone-mosaic structure, and inference engine. Journal of Vision, 19(4), 8, doi:10.1167/19.4.8. [CrossRef] [PubMed]
Cottaris, N. P., Wandell, B. A., Rieke, F., and Brainard, D. H. (2020). A computational observer model of spatial contrast sensitivity: Effects of photocurrent encoding, fixational eye movements, and inference engine. Journal of Vision, 20(7), 17, doi:10.1167/jov.20.7.17. [CrossRef] [PubMed]
da Fonseca, M. and Samengo, I. (2016). Derivation of human chromatic discrimination ability from an information-theoretical notion of distance in color space. Neural Computation, 28(12), 2628–2655. [CrossRef] [PubMed]
Dayan, P. and Abbott, L. F. (2005). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.
Fairchild, M. (2013). Color appearance models. The Wiley-IS&T Series in Imaging Science and Technology. Chichester, UK: Wiley.
García, M. and Alcalá, R. (2009). Fixed vs. variable noise in 2AFC contrast discrimination: Lessons from psychometric functions. Spatial Vision, 22(4), 273–300. [PubMed]
Georgeson, M. A. and Meese, T. S. (2006). Fixed or variable noise in contrast discrimination? The jury's still out. Vision Research, 46(25), 4294–4303. [CrossRef] [PubMed]
Goris, R., Zaenen, P., and Wagemans, J. (2008). Some observations on contrast detection in noise. Journal of Vision, 8(9):4, 1–15, doi:10.1167/8.9.4. [CrossRef] [PubMed]
Goris, R. L. T., Movshon, J. A., and Simoncelli, E. P. (2014). Partitioning neuronal variability. Nature Neuroscience, 17(6), doi:10.1038/nn.3711. [CrossRef]
Green, D. M. (1960). Psychoacoustics and detection theory. The Journal of the Acoustical Society of America, 32(10), 1189–1203. [CrossRef]
Green, D. M. and Swets, J. A. (1988). Signal detection theory and psychophysics. Los Altos Hills, CA: Peninsula Publishing.
Henning, G., Bird, C., and Wichmann, F. (2002). Contrast discrimination of pulse trains in pink noise. Journal of the Optical Society of America A, 19(7), 1259–1266. [CrossRef]
Kelly, D. H. (1979). Motion and vision. II. Stabilized spatio-temporal threshold surface. Journal of the Optical Society of America, 69(10), 1340–1349. [CrossRef] [PubMed]
Kheravdar, B., Li, Q., and Malo, J. (2021). Visual information fidelity with better vision models and better mutual information estimates. Journal of Vision, 21(9), 2351, doi:10.1167/jov.21.9.2351.
Kingdom, F. (2016). Fixed versus variable internal noise in contrast transduction: The significance of Whittle's data. Vision Research, 128, 1–5. [CrossRef] [PubMed]
Kingdom, F. and Prins, N. (2010). Psychophysics: A practical introduction. London, UK: Elsevier Academic Press.
Kohn, A., Coen-Cagli, R., Kanitscheider, I., and Pouget, A. (2016). Correlations and neuronal population information. Annual Review of Neuroscience, 39, 237–256. [CrossRef] [PubMed]
Kontsevich, L. L., Chen, C.-C., and Tyler, C. W. (2002). Separating the effects of response nonlinearity and internal noise psychophysically. Vision Research, 42(14), 1771–1784. [CrossRef] [PubMed]
Laparra, V., Muñoz, J., and Malo, J. (2010). Divisive normalization image quality metric revisited. Journal of the Optical Society of America A, 27(4), 852–864. [CrossRef]
Legge, G. (1981). A power law for contrast discrimination. Vision Research, 18, 68–91.
Legge, G. and Foley, J. (1980). Contrast masking in human vision. Journal of the Optical Society of America, 70, 1458–1471. [CrossRef] [PubMed]
Legge, G., Kersten, D., and Burgess, A. (1987). Contrast discrimination in noise. Journal of the Optical Society of America A, 4(2), 391–404. [CrossRef]
Li, Q., Gomez-Villa, A., Bertalmío, M., and Malo, J. (2022). Contrast sensitivity functions in autoencoders. Journal of Vision, 22(6), 8, doi:10.1167/jov.22.6.8. [CrossRef]
Li, Q., Steeg, G. V., and Malo, J. (2024). Functional connectivity via total correlation: Analytical results in visual areas. Neurocomputing, 571, 127143. [CrossRef]
Mahalanobis, P. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1), 49–55.
Malo, J. (1999). Redundancy reduction in the human visual system: New formulation and applications to image and video coding. PhD thesis, School of Physics, University of Valencia.
Malo, J. (2020). Spatio-chromatic information available from different neural layers via gaussianization. Journal of Mathematical Neuroscience, 10(18), 1:40.
Malo, J., Epifanio, I., Navarro, R., and Simoncelli, E. (2006). Nonlinear image representation for efficient perceptual coding. IEEE Transactions on Image Processing, 15(1), 68–80. [CrossRef]
Malo, J., Esteve-Taboada, J., and Bertalmío, M. (2024). Cortical divisive normalization from Wilson–Cowan neural dynamics. Journal of Nonlinear Science, 34(35), 1:36.
Malo, J., Pons, A., Felipe, A., and Artigas, J. (1997). Characterization of the human visual system threshold performance by a weighting function in the gabor domain. Journal of Modern Optics, 44(1), 127–148. [CrossRef]
Malo, J. and Simoncelli, E. (2015). Geometrical and statistical properties of vision models obtained via maximum differentiation. In Proceedings of SPIE Electronic Imaging (p. 93940L). San Francisco, USA: International Society for Optics and Photonics.
Maloney, L. T. and Yang, J. N. (2003). Maximum likelihood difference scaling. Journal of Vision, 3(8), 5, doi:10.1167/3.8.5. [CrossRef]
Martinez, M., Bertalmío, M., and Malo, J. (2019). In praise of artifice reloaded: Caution with natural image databases in modeling vision. Frontiers in Neuroscience, 13:8, 1:17, https://doi.org/10.3389/fnins.2019.00008.
Martinez, M., Cyriac, P., Batard, T., Bertalmío, M., and Malo, J. (2018). Derivatives and inverse of cascaded L+NL neural models. PLoS ONE, 13(10), e0201326. [CrossRef]
Martinez-Otero, L., Molano, M., Wang, X., Sommer, F., and Hirsch, J. (2014). Statistical wiring of thalamic receptive fields optimizes spatial sampling of the retinal image. Neuron, 81(4), 943–956. [PubMed]
May, K. A. and Solomon, J. A. (2013). Four theorems on the psychometric function. PLoS ONE, 8(10), 1–34.
Moreno-Bote, R., Beck, J., Kanitscheider, I., Pitkow, X., Latham, P., and Pouget, A. (2014). Information-limiting correlations. Nature Neuroscience, 17, 1410–1417. [PubMed]
Nachmias, J. and Sansbury, R. V. (1974). Grating contrast: Discrimination may be better than detection. Vision Research, 14(10), 1039–1042.
Neri, P. (2010). How inherently noisy is human sensory processing? Psychonomic Bulletin & Review, 17, 802–808. [PubMed]
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A Optics and Image Science, 2(9), 1508–1532. [PubMed]
Pelli, D. G. (1991). Noise in the visual system may be early (pp. 147–151). Cambridge, MA: MIT Press.
Pelli, D. G. and Farell, B. (1999). Why use noise? Journal of the Optical Society of America A Optics and Image Science, 16(3), 647–653.
Press, W., Teukolsky, S., Vetterling, W., and Flannery, B. (2007). Numerical recipes 3rd edition: The art of scientific computing. Cambridge, UK: Cambridge University Press.
Robson, J. G. (1966). Spatial and temporal contrast-sensitivity functions of the visual system. Journal of the Optical Society of America, 56(8), 1141–1142.
Schütt, H. and Wichmann, F. (2017). An image-computable psychophysical spatial vision model. Journal of Vision, 17(12), 12, doi:10.1167/17.12.12. [PubMed]
Seung, H. S. and Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences, 90(22), 10749–10753.
Sheikh, H. R. and Bovik, A. C. (2006). Image information and visual quality. IEEE Transactions on Image Processing, 15(2), 430–444.
Shooner, C. and Mullen, K. T. (2022). Linking perceived to physical contrast: Comparing results from discrimination and difference-scaling experiments. Journal of Vision, 22(1), 13, doi:10.1167/jov.22.1.13. [PubMed]
Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J. (1992). Shiftable multi-scale transforms. IEEE Transactions on Information Theory, 38(2), 587–607. Special Issue on Wavelets.
Swets, J. A. (1961). Is there a sensory threshold? Science, 134(3473), 168–177. [PubMed]
Tanner, W. P. and Swets, J. A. (1954). A decision-making theory of visual detection. Psychological Review, 61(6), 401–409. [PubMed]
Wang, Z. and Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities. Journal of Vision, 8(12), 8, doi:10.1167/8.12.8.
Watson, A. (1987). The cortex transform: Rapid computation of simulated neural images. Computer Vision, Graphics and Image Processing, 39, 311–327.
Watson, A. B. and Solomon, J. A. (1997). Model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A Optics and Image Science, 14(9), 2379–2391.
Wichmann, F. (1999). Some aspects of modelling human spatial vision: Contrast discrimination. PhD thesis, University of Oxford.
Zhou, J., Duong, L. R., and Simoncelli, E. P. (2024). A unified framework for perceived magnitude and discriminability of sensory stimuli. Proceedings of the National Academy of Sciences, 121(25), e2312293121.
Appendix A: Derivation of Equation 6 (general and particular cases)
General case. The theoretical distance in Equation 5 can be written in terms of the stimulus and the different noise sources by using the Taylor approximation of the nonlinear behavior of the system, as already done elsewhere (Ahumada, 1987; Malo, 1999; Malo et al., 2006; Laparra, Muñoz, & Malo, 2010), as follows:  
\begin{equation} S(x+\Delta x) \approx S(x) + \nabla S \cdot \Delta x \end{equation}
(A1)
where ∇S is the Jacobian of the model at \({\boldsymbol x}\). This Taylor approximation is accurate in the low-noise limit or if the nonlinearity of the system is moderate. Therefore, under this approximation, ΔS = S(x + Δx) − S(x) = ∇S · Δx (and, transposing, ΔS⊤ = Δx⊤ · ∇S⊤), and the distance in Equation 5 may be written as (Malo et al., 2006)
\begin{equation} D_{\rm {th}}^2 = \Delta x^\top \cdot \nabla S^\top \cdot \left(\Sigma _{\mathcal {I}}\right)^{-1} \cdot \nabla S \cdot \Delta x \end{equation}
(A2)
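As a concrete illustration, the quadratic form in Equation A2 can be evaluated numerically. The toy two-pixel response, the Poisson-like inner covariance, and all parameter values below are illustrative assumptions, not the model used in the paper:

```python
import numpy as np

def S(x, gamma=0.6):
    """Toy elementwise saturating response (illustrative choice)."""
    return x ** gamma

def jacobian(x, eps=1e-6):
    """Numerical Jacobian (nabla S) of the response at x, by central differences."""
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (S(x + dx) - S(x - dx)) / (2 * eps)
    return J

x = np.array([10.0, 20.0])      # a two-pixel "image"
dx = np.array([0.5, -0.3])      # perturbation Delta x
Sigma_I = np.diag(S(x))         # toy Poisson-like inner-noise covariance

J = jacobian(x)
# Equation A2: D_th^2 = dx' . J' . inv(Sigma_I) . J . dx
D2 = dx @ J.T @ np.linalg.inv(Sigma_I) @ J @ dx
```

For a diagonal Jacobian and diagonal covariance, as in this toy case, Equation A2 reduces to a weighted sum of squared perturbations, and the numerical Jacobian can be checked against the analytic derivative of the power law.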
 
Now, by definition, the covariance matrix of the inner noise \(\Sigma _{\mathcal {I}}\) is given by the expected value of the outer product of the noise  
\begin{eqnarray} &&\Sigma _{\mathcal {I}} = \mathbb {E}\left[{\boldsymbol n}_{\mathcal {I}} \cdot {\boldsymbol n}_{\mathcal {I}}^\top \right] = \mathbb {E}\left[(S({\boldsymbol x} + {\boldsymbol n}_\varepsilon + k {\boldsymbol n}_e)+ k {\boldsymbol n}_l \right.\nonumber\\ && \left. - S({\boldsymbol x})) \cdot ( S({\boldsymbol x} + {\boldsymbol n}_\varepsilon + k {\boldsymbol n}_e) + k {\boldsymbol n}_l -S({\boldsymbol x}))^\top \right] \;\;\quad \end{eqnarray}
(A3)
where, as stated in Theory: perceptual distance in terms of thresholds and noise, we included an unknown scale factor, k, in the early and the late noise. Using the Taylor expansion again, we have  
\begin{eqnarray}\begin{array}{@{}r@{\;}c@{\;}l@{}} \Sigma _{\mathcal {I}} & = & \mathbb {E}\left[( \nabla S \cdot ({\boldsymbol n}_\varepsilon +k {\boldsymbol n}_e) + k {\boldsymbol n}_l ) \cdot ( ({\boldsymbol n}_\varepsilon + k {\boldsymbol n}_e)^\top \cdot \nabla S^\top + k {\boldsymbol n}_l^\top )\right] \\ & = & k^2 \mathbb {E}\left[ {\boldsymbol n}_l \cdot {\boldsymbol n}_l^\top \right] + k^2 \nabla S \cdot \mathbb {E}\left[ {\boldsymbol n}_e \cdot {\boldsymbol n}_e^\top \right] \cdot \nabla S^\top + 2 k \mathbb {E}\left[ {\boldsymbol n}_l \cdot {\boldsymbol n}_e^\top \right] \\ & & \cdot \nabla S^\top + \nabla S \cdot \mathbb {E}\left[ {\boldsymbol n}_\varepsilon \cdot {\boldsymbol n}_\varepsilon ^\top \right] \cdot \nabla S^\top + 2 k \nabla S \cdot \mathbb {E}\left[ {\boldsymbol n}_\varepsilon \cdot {\boldsymbol n}_e^\top \right]\\ && \cdot \nabla S^\top + 2 k^2 \mathbb {E}\left[ {\boldsymbol n}_l \cdot {\boldsymbol n}_\varepsilon ^\top \right] \cdot \nabla S^\top \end{array}\qquad \end{eqnarray}
(A4)
where the expected value of crossed terms does not vanish in general because (1) the early noise may depend on the signal and hence may depend on the external noise, and (2) the late noise may depend on the signal and hence may depend on the early and the external noise as well. In the above equation, we can identify the terms in Equation 6:  
\begin{eqnarray}\begin{array}{@{}l@{}} \Sigma _{\mathcal {I}} = \underbrace{k^2 \, \Sigma _l}_{{\it late\ noise}} + \underbrace{k^2 \, \nabla S \Sigma _e \nabla S^\top }_{{\it early\ noise}} + \underbrace{2 k^2 \mathbb {E}[{\boldsymbol n}_l {\boldsymbol n}_e^\top ] \nabla S^\top }_{{\it corr.\ late-early}}\\ \qquad + \overbrace{ \underbrace{\nabla S \Sigma _\varepsilon \nabla S^\top }_{{\it external\ noise}} + \underbrace{2 k \nabla S \mathbb {E}[{\boldsymbol n}_e {\boldsymbol n}_\varepsilon ^\top ] \nabla S^\top }_{{\it corr.\ early-external}} + \underbrace{2 k \mathbb {E}[{\boldsymbol n}_l {\boldsymbol n}_\varepsilon ^\top ] \nabla S^\top }_{{\it corr.\ late-external}} }^{{\it external\ dependent}} \end{array} \qquad \end{eqnarray}
(A5)
that represent how the covariance of the noise at the input is propagated through the network (Ahumada, 1987). The above equation contains the (general) covariance matrices of the late noise Σl, the early noise Σe, and the external noise Σε, which may include arbitrary dependence on the signal. The next paragraph considers a particular choice for this signal dependence. 
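A Monte Carlo sanity check of this propagation rule can be sketched for the simplest case of mutually independent white noise sources, so that all cross terms in Equation A5 vanish. The 2 × 2 Jacobian and the noise amplitudes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, N = 2, 1.5, 200_000
J = np.array([[0.8, 0.1],
              [0.2, 0.5]])                 # stand-in for the Jacobian nabla S
s_l, s_e, s_eps = 0.3, 0.2, 0.1            # std of each white noise source

n_l = s_l * rng.normal(size=(N, n))        # late noise (inner space)
n_e = s_e * rng.normal(size=(N, n))        # early noise (input space)
n_eps = s_eps * rng.normal(size=(N, n))    # external noise (input space)
n_I = (n_eps + k * n_e) @ J.T + k * n_l    # linearized inner noise

Sigma_emp = np.cov(n_I.T)                  # empirical covariance
Sigma_theo = (k**2 * s_l**2 * np.eye(n)    # late-noise term
              + k**2 * s_e**2 * J @ J.T    # early noise propagated through J
              + s_eps**2 * J @ J.T)        # external noise propagated through J
```

With this many samples, the empirical covariance of the linearized inner noise matches the closed form of Equation A5 (without cross terms) to within sampling error.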
Particular case: Gaussian–Poisson noise. If early and late noises are signal-dependent Gaussian–Poisson variables, their realizations are  
\begin{eqnarray} {\boldsymbol n}_e &\;=& \left( \alpha _e I + \beta _e\,\mathbb {D}_{|{\boldsymbol x}|^{1/2}} \right) \cdot {\boldsymbol n}_{G1}\nonumber\\ {\boldsymbol n}_l &\;=& \left( \alpha _l I + \beta _l\,\mathbb {D}_{|S({\boldsymbol x})|^{1/2}} \right) \cdot {\boldsymbol n}_{G2} \qquad \end{eqnarray}
(A6)
where \({\boldsymbol n}_{G1}\) and \({\boldsymbol n}_{G2}\) denote realizations of unit covariance Gaussian noise in the input space and in the inner space (which have different dimensions as there may be a different number of photoreceptors and cortical neurons), and \(\mathbb {D}_{(\cdot )}\) stands for a diagonal matrix with the elements of ( · ) in the diagonal. The corresponding covariance matrices of the different noise sources can be written as  
\begin{eqnarray} \Sigma _e({\boldsymbol x}) &\; = & \alpha _e^2 I + \beta _e^2 \, \mathbb {D}_{|{\boldsymbol x}|} + 2 \alpha _e \beta _e \, \mathbb {D}_{|{\boldsymbol x}|^{1/2}} \nonumber\\ \Sigma _l(S({\boldsymbol x})) &\;=& \alpha _l^2 I + \beta _l^2 \, \mathbb {D}_{|S({\boldsymbol x})|} + 2 \alpha _l \beta _l \, \mathbb {D}_{|S({\boldsymbol x})|^{1/2}}\qquad \end{eqnarray}
(A7)
where \(\alpha _e^2\) is the variance of the Gaussian component of the early noise, \(\beta _e^2\) is the Fano factor of the Poisson component of the early noise, and the equivalent definitions hold for the late noise. 
In this work, we assume that both the early and the late noise are pure Poisson sources, a usual assumption in neural systems (Cottaris et al., 2019; Cottaris et al., 2020; Dayan & Abbott, 2005), and hence our goal is to determine the scale factors βe and βl.
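For concreteness, a realization of the early noise in Equation A6 for the pure-Poisson case (αe = 0) can be sampled and checked against the variance predicted by Equation A7. The value of βe and the stimulus below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
beta_e = 0.5                        # illustrative Poisson scale factor
x = np.array([4.0, 16.0, 64.0])     # illustrative stimulus values
N = 200_000

n_G1 = rng.normal(size=(N, x.size))            # unit-covariance Gaussian
n_e = (beta_e * np.sqrt(np.abs(x))) * n_G1     # Equation A6 with alpha_e = 0

# Equation A7 with alpha_e = 0 predicts Var[n_e] = beta_e^2 |x|
emp_var = n_e.var(axis=0)
```

The empirical variance grows linearly with |x|, which is the defining signature of the Poisson component.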
Appendix B: Optimization of correlation from Equation 8
Instead of optimizing the noise through the metric using the mathematical expression for the distance shown in Equation 6 (which requires the derivative of the metric and, due to the matrix inverse, implies a dependence on \((\Sigma _\mathcal {I})^{-2}\)), we suggest optimizing the noise using a mathematical expression for the distance based on the difference between the noisy responses:  
\begin{eqnarray} {\boldsymbol y}({\boldsymbol x}) &\;=& S({\boldsymbol x})+{\boldsymbol n}_{\,\mathcal {I}}\nonumber\\ {\boldsymbol y}({\boldsymbol x}+\Delta {\boldsymbol x}) &\;=& S({\boldsymbol x}+\Delta {\boldsymbol x}) + {\boldsymbol n^{\prime }}_{\,\mathcal {I}} = {\boldsymbol y({\boldsymbol x})} + \Delta {\boldsymbol y} \qquad \end{eqnarray}
(A8)
and take the average Euclidean distance between them, resulting in  
\begin{eqnarray} D_{\rm {th}}^2 &\;=& \mathbb {E}\left[ \, \Delta {\boldsymbol y}^\top \cdot \Delta {\boldsymbol y} \, \right] \nonumber \\ &\;=& \mathbb {E}\big[\left|\Delta S + {\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}} \right|^2 \big]\qquad \end{eqnarray}
(A9)
where \(\Delta S = S({\boldsymbol x}+\Delta {\boldsymbol x}) - S({\boldsymbol x})\) is the difference between the deterministic responses, and \({\boldsymbol n}_{\mathcal {I}}\) and \({\boldsymbol n^{\prime }}_{\mathcal {I}}\) are different realizations of the inner noise at the points \(S({\boldsymbol x})\) and \(S({\boldsymbol x}+\Delta {\boldsymbol x})\), respectively. Note that, as already said, Equation A9 means that when judging the difference between two stimuli, the brain compares two noisy responses, \({\boldsymbol y}({\boldsymbol x})\) and \({\boldsymbol y}({\boldsymbol x})+\Delta {{\boldsymbol y}}\), that is, \(S({\boldsymbol x})+{\boldsymbol n}_{\,\mathcal {I}}\) and \(S({\boldsymbol x}+\Delta {\boldsymbol x}) + {\boldsymbol n^{\prime }}_{\,\mathcal {I}}\)
As stated above, our proposal for noise estimation, explained in Theory: perceptual distance in terms of thresholds and noise, consists of finding the noise parameters that maximize the correlation between the experimental Dexp and theoretical Dth distances. As shown in Equations (S7.3) and (S7.5) in Martinez et al. (2018), given n experimental conditions to measure n thresholds, the maximization of this correlation requires its derivative with respect to the parameters of the model, which reduces to computing \(\frac{\delta D_{\rm {th}}^i}{\delta \theta }\), where θ are the noise parameters and \(D_{\rm {th}}^i\) is the distance for the ith experimental condition, with i = 1, …, n. According to Equation (S7.6) in Martinez et al. (2018), these derivatives can be written as  
\begin{eqnarray} &&\frac{\delta D_{\rm {th}}^i}{\delta \theta } = \frac{1}{D_{\rm {th}}^i} \cdot \left({\boldsymbol y}(x^i+\Delta x^i) - {\boldsymbol y}(x^i) \right)^\top \nonumber\\ &&\qquad\cdot \left[ \frac{\delta {\boldsymbol y}(x^i+\Delta x^i)}{\delta \theta } - \frac{\delta {\boldsymbol y}(x^i)}{\delta \theta }\right]\qquad\quad \end{eqnarray}
(A10)
To compute the derivatives \(\frac{\delta {\boldsymbol y}(\cdot )}{\delta \theta }\) we can apply the Taylor approximation to Equation 3 to get  
\begin{eqnarray} {\boldsymbol y}({\boldsymbol x}) &\;=& S({\boldsymbol x}+{\boldsymbol n}_\varepsilon + {\boldsymbol n}_e) + {\boldsymbol n}_l\, \approx\, S({\boldsymbol x})\nonumber\\ && + \nabla S \cdot {\boldsymbol n}_{\varepsilon } + \nabla S \cdot {\boldsymbol n}_e + {\boldsymbol n}_l \quad\qquad \end{eqnarray}
(A11)
 
And now, for the case in which early and late noises are signal-dependent Gaussian–Poisson variables, plugging Equation A6 into Equation A11, we obtain  
\begin{eqnarray} {\boldsymbol y}({\boldsymbol x}) &=& S({\boldsymbol x}) + \nabla S \cdot {\boldsymbol n}_{\varepsilon } + \nabla S \cdot \left( \alpha _e I + \beta _e\,\mathbb {D}_{|{\boldsymbol x}|^{1/2}} \right) \cdot {\boldsymbol n}_{G1}\nonumber\\ && + \left( \alpha _l I + \beta _l\,\mathbb {D}_{|S({\boldsymbol x}+{\boldsymbol n}_\varepsilon +{\boldsymbol n}_e)|^{1/2}} \right) \cdot {\boldsymbol n}_{G2} \end{eqnarray}
(A12)
Applying again the Taylor approximation to the term \(|S({\boldsymbol x}+{\boldsymbol n}_\varepsilon +{\boldsymbol n}_e)|^{1/2}\) we finally get  
\begin{eqnarray}&& {\boldsymbol y}({\boldsymbol x}) = S + \nabla S \cdot {\boldsymbol n}^{\varepsilon } + \nabla S \cdot \left( \alpha _e I + \beta _e\,\mathbb {D}_{|{\boldsymbol x}|^{1/2}} \right) \cdot {\boldsymbol n}_{G1} \nonumber\\ && + \left( \alpha _l I + \beta _l\,\mathbb {D}_{|S|^{1/2}+\frac{1}{2}|S|^{-1/2}\cdot \rm {sign}(S) \cdot \nabla S \cdot ({\boldsymbol n}_\varepsilon + {\boldsymbol n}_e)} \right) \cdot {\boldsymbol n}_{G2} \qquad \end{eqnarray}
(A13)
where, for readability, the function \(S({\boldsymbol x})\) has been abbreviated as S, and sign( · ) denotes the sign of ( · ). 
With this result for \({\boldsymbol y}({\boldsymbol x})\), it is possible to obtain the derivatives with respect to the early (αe, βe) and late (αl, βl) noise parameters that are needed to maximize the correlation between the experimental Dexp and theoretical Dth distances:  
\begin{eqnarray} \begin{array}{@{}l@{\;}c@{\;}l@{}} \frac{\delta {\boldsymbol y}({\boldsymbol x})}{\delta \alpha _e} &=& \nabla S \cdot I \cdot n_{G1} + \left( \beta _l\,\mathbb {D}_{\frac{1}{2}|S|^{-1/2}\cdot \rm {sign}(S) \cdot \nabla S \cdot n_{G1}}\right) \cdot n_{G2}\\ \frac{\delta {\boldsymbol y}({\boldsymbol x})}{\delta \beta _e} &=& \nabla S \cdot \mathbb {D}_{|{\boldsymbol x}|^{1/2}} \cdot n_{G1} + \left( \beta _l\,\mathbb {D}_{\frac{1}{2}|S|^{-1/2}\cdot \rm {sign}(S) \cdot \nabla S \cdot \mathbb {D}_{|{\boldsymbol x}|^{1/2}} \cdot n_{G1}}\right) \cdot n_{G2} \\ \frac{\delta {\boldsymbol y}({\boldsymbol x})}{\delta \alpha _l} &=& n_{G2} \\ \frac{\delta {\boldsymbol y}({\boldsymbol x})}{\delta \beta _l} &=& \left( \mathbb {D}_{|S|^{1/2}+\frac{1}{2}|S|^{-1/2}\cdot \rm {sign}(S) \cdot \nabla S \cdot \left( \alpha _e I + \beta _e\,\mathbb {D}_{|{\boldsymbol x}|^{1/2}} \right) \cdot {\boldsymbol n}_{G1}}\right) \cdot n_{G2} \end{array}\nonumber\\ \end{eqnarray}
(A14)
 
As can be seen from these last equations, in this nonparametric case, the derivatives with respect to the early and late noise parameters do not involve matrix inversion, and hence they are easy to compute. This allows a practical optimization of the noise parameters to maximize the correlation between the experimental and theoretical distances, regardless of the complexity of the noise, as long as ∇S is easy to compute. 
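The overall strategy can be sketched in one dimension: theoretical distances are obtained by averaging over noisy responses (Equation A9), and the noise parameters are chosen to maximize the correlation with a set of reference distances. The toy response function, the parameter grid, and the synthetic reference distances below are illustrative assumptions, not the model or data of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def S(x):
    """Toy compressive deterministic response (illustrative)."""
    return np.sqrt(np.abs(x))

def D_th(x, dx, beta_e, beta_l, N=5000):
    """Monte Carlo estimate of Equation A9 with pure Poisson noises."""
    y1 = S(x + beta_e * np.sqrt(np.abs(x)) * rng.normal(size=N))
    y1 = y1 + beta_l * np.sqrt(np.abs(y1)) * rng.normal(size=N)
    y2 = S(x + dx + beta_e * np.sqrt(np.abs(x + dx)) * rng.normal(size=N))
    y2 = y2 + beta_l * np.sqrt(np.abs(y2)) * rng.normal(size=N)
    return np.sqrt(np.mean((y2 - y1) ** 2))

xs = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # experimental conditions
dxs = 0.1 * xs                               # measured increments

# Synthetic "experimental" distances generated with known parameters
D_exp = np.array([D_th(x, d, 0.3, 0.1) for x, d in zip(xs, dxs)])

def corr(beta_e, beta_l):
    """Correlation between reference and candidate theoretical distances."""
    D = np.array([D_th(x, d, beta_e, beta_l) for x, d in zip(xs, dxs)])
    return np.corrcoef(D_exp, D)[0, 1]

grid = [(corr(be, bl), be, bl)
        for be in (0.1, 0.3, 0.5) for bl in (0.05, 0.1, 0.2)]
best_corr, best_be, best_bl = max(grid)
```

In toy runs like this one, the correlation typically varies slowly over wide regions of the (βe, βl) grid, mirroring the plateaus reported for threshold-only data, which is why the full psychometric functions are needed to pin down the parameters.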
Appendix C: Equivalence of parametric and nonparametric distances
In this appendix, we show that the considered distances, Equations 6 and 8, are equally valid for estimating the noise. First we show that both distances have an interesting anisotropic behavior that depends on the noise: The perceptual distance is markedly different for distortions in different directions \(\Delta {\boldsymbol x}\). This noise-dependent anisotropy is obvious in Equation 6 because the metric depends on the covariance of the noise, but it is less obvious in the nonparametric Equation 8. 
The first part of this appendix analytically shows that the nonparametric distance displays this anisotropy. Afterward, we numerically show that the distance computed in both ways can be linearly related. As a result, since correlation is independent of a linear transformation applied to one of the axes, both definitions of Dth are equivalent for our purposes. 
Analytical part: nonparametric distance is anisotropic. The general expression for the average Euclidean distance in the nonparametric case, Equation A9, \(D_{\rm {th}}^2 = \mathbb {E}[|\Delta S + {\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}|^2 ]\), can be written as  
\begin{eqnarray} D_{\rm {th}}^2 = \mathbb {E}\big[( \Delta S + ({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}))^\top \cdot ( \Delta S + ({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}) ) \big] \nonumber\\ \end{eqnarray}
(A15)
 
which can be expanded as follows:  
\begin{eqnarray} D_{\rm {th}}^2 &\;=& \mathbb {E}\left[\Delta S^\top \cdot \Delta S + \Delta S^\top \cdot \left({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}\right)\right.\nonumber\\ && + \left({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}\right)^\top \cdot\left. \Delta S + \left({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}\right)^\top \cdot \left({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}\right)\right]\nonumber \\ &=& \Delta S^\top \cdot \Delta S + 2 \, \mathbb {E}\left[\Delta S^\top \cdot \left({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}\right)\right] \nonumber \\ &&+ \mathbb {E}\left[{\boldsymbol n^{\prime }}_{\,\mathcal {I}}^\top \cdot {\boldsymbol n^{\prime }}_{\mathcal {I}}\right] - 2\,\mathbb {E}\left[{\boldsymbol n^{\prime }}_{\,\mathcal {I}}^\top \cdot {\boldsymbol n}_{\mathcal {I}}\right] + \mathbb {E}\left[{\boldsymbol n}_{\,\mathcal {I}}^\top \cdot {\boldsymbol n}_{\mathcal {I}}\right]\nonumber \\ &=& \left|\Delta S\right|^2 + \left|{\boldsymbol n}_{\,\mathcal {I}}\right|^2 + \left|{\boldsymbol n^{\prime }}_{\,\mathcal {I}}\right|^2 - 2\,\mathbb {E}\left[{\boldsymbol n^{\prime }}_{\,\mathcal {I}}^\top \cdot {\boldsymbol n}_{\mathcal {I}}\right]\nonumber \\ && + 2 \mathbb {E}\left[\Delta S^\top \cdot \left({\boldsymbol n^{\prime }}_{\,\mathcal {I}} - {\boldsymbol n}_{\mathcal {I}}\right)\right] \end{eqnarray}
(A16)
where the last term vanishes because the noise is zero-mean, so the expectation of its product with the deterministic difference ΔS is zero. When ΔS = 0, \({\boldsymbol n^{\prime }}_{\,\mathcal {I}} = {\boldsymbol n}_{\,\mathcal {I}}\), and then \(2\,\mathbb {E}[{\boldsymbol n^{\prime }}_{\,\mathcal {I}}^\top \cdot {\boldsymbol n}_{\mathcal {I}}] = 2 |{\boldsymbol n}_{\,\mathcal {I}}|^2\), so \(D_{\rm {th}}^2 = 0\). When ΔS ≠ 0, since \({\boldsymbol n}_{\,\mathcal {I}}\) and \({\boldsymbol n^{\prime }}_{\,\mathcal {I}}\) are uncorrelated, \(\mathbb {E} [{\boldsymbol n^{\prime }}_{\,\mathcal {I}}^\top \cdot {\boldsymbol n}_{\mathcal {I}}] = 0\), so we get  
\begin{equation} D_{\rm {th}}^2 = \left|\Delta S\right|^2 + \left|{\boldsymbol n}_{\,\mathcal {I}}\right|^2 + \left|{\boldsymbol n^{\prime }}_{\,\mathcal {I}}\right|^2 \end{equation}
(A17)
Note that, in this expression, |ΔS| and \(|{\boldsymbol n}_{\,\mathcal {I}}|\) are constants for a given \({\boldsymbol x}\). However, in general, the energy \(\left|{\boldsymbol n^{\prime }}_{\,\mathcal {I}}\right|^2\) depends on \(\Delta {\boldsymbol x}\) (or ΔS). This dependence is what generates the anisotropic behavior of the distance. For instance, if the inner noise has a Poisson component, the direction of ΔS matters: Increasing the response in a certain direction may increase the energy, while going in the opposite direction (reducing the response) may reduce the energy of the noise. 
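This direction dependence can be made explicit for pure Poisson late noise, where the expected noise energy is \(\beta _l^2 \sum _i |S_i|\) and hence grows or shrinks with the displaced response. The response values below are illustrative:

```python
import numpy as np

beta_l = 0.5
S0 = np.array([4.0, 4.0])            # deterministic response S(x)

def noise_energy(S):
    """Expected |n|^2 for pure Poisson noise with covariance beta^2 D_|S|."""
    return float(np.sum(beta_l ** 2 * np.abs(S)))

dS = 1.0
up = noise_energy(S0 + np.array([0.0, dS]))      # increase one response
down = noise_energy(S0 + np.array([0.0, -dS]))   # decrease it instead

# Equation A17: the distance inherits the direction dependence of |n'|^2
D2_up = dS ** 2 + noise_energy(S0) + up
D2_down = dS ** 2 + noise_energy(S0) + down
```

Two perturbations of equal Euclidean size thus yield different distances depending on whether they increase or decrease the response, which is the anisotropy discussed above.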
Figure A1.
 
Equivalence between parametric and nonparametric distances. Plots (a) and (b) show the 4 × 5 locations (4 contrasts times 5 luminances) considered in the space of two-pixel images and in the inner representation of the illustrative model introduced in Model response in the presence of noise: a two-pixel example. The noisy images and responses are represented by the random samples in blue. Plots (c) and (d) magnify the highlighted rectangles in green and orange. In each location in the inner representation, we computed distances from the central point in the directions indicated by the black squares. The plot (e) represents the distances from the central point in plot (d) in the different directions computed through the parametric and nonparametric methods, Equations 6 and 8, in blue and red respectively. The plots in panel (f) show the same result at the other considered points all over the image space.
Numerical illustration: nonparametric and parametric distances are linearly related. In this section, we use the simplified vision model introduced in Model response in the presence of noise: a two-pixel example (on two-pixel images) to illustrate explicitly how the nonparametric distance of Equation 8 follows the trends of the parametric metric in Equation 6. Using the model implemented here,5 we considered 4 × 5 locations (4 contrasts times 5 luminances) in the space of two-pixel images, and we generated 5,000 noisy responses for each of these images. See the noisy samples and responses in Figure A1. Then, we considered 20 points at constant Euclidean distance from each average response in equidistant directions along the Euclidean sphere (e.g., the black squares in the zoom in Figure A1.d). These points are convenient to point out the anisotropy of the measures and their eventual equivalence. 
The metric defined by the noisy samples implies that the locus of points equidistant from the center is the ellipsoid highlighted in red. As a result, the perceptual metric is anisotropic: Starting from the vertical direction in Figure A1.d, the distance is maximum, and it oscillates down and up periodically, as given by the blue line in plot A1.e, computed according to the parametric distance. The distance to each black square was also computed from the expected value in Equation 8, leading to the red curve in plot A1.e. The plots in panel (f) show that the equivalence between the two measures holds over the whole image space: All the red curves were obtained from the nonparametric distances using the same linear relation to the parametric distance. 
In summary, the (simpler to compute) nonparametric distance based on the average over the noisy samples is equivalent to the parametric distance (i.e., it leads to the same correlation), while avoiding the inversion of huge matrices during the optimization. 
Appendix D: Noise from thresholds for individual observers
The surfaces in Figure A2 show the correlation between the threshold-based experimental distances and the theoretical distances based on pure Poisson early and late noise for each individual observer. In this case, the data of each observer are considered separately, so we obtain different optima (highlighted in red) for each observer. The scatterplots on top show the predictions with the best linear correlation. 
In this threshold-only method, separating the data per observer does not improve the optimization problem: As in Figure 5, the correlation surfaces display large plateaus of almost constant values, indicating uncertainty in the optima. Moreover, the observers display quite different results. In particular, observer GBH seems to drive the global result in Figure 5, because the data from observer CMB display a small correlation anyway and hence have a small impact on the global correlation. 
This uncertainty and disparity stand in contrast with the narrow minima and the consistency between observers found with the method that considers all the data in the psychometric functions (Figure 7). This suggests that considering the full psychometric functions properly constrains the problem toward more accurate and consistent solutions. 
Figure A2.
 
Early and late noise parameters from threshold data for each individual observer. The top plots represent the best (maximum correlation) linear fits between the theoretical distance (that depends on the noise sources) and the experimental distance (based on the thresholds) for the considered observers. The surfaces at the bottom display the correlation between the theoretical distance and the experimental distance depending on the noise parameters. The optimal noise parameters, here highlighted in red, are those that maximize the correlation for each observer (βe = 0.04, βl = 0.91) for GBH and (βe = 0.02, βl = 17.5) for CMB.
Appendix E: Role of early and late noise in detection and discrimination
To better understand the implications of the estimated noise parameters, we simulated the model with those parameters and analyzed the representations for a reduced case similar to our two-pixel toy example from Modeling framework and intuition. For this, we generated noisy responses for 2 cycles/deg gratings of different contrasts (0.01, 0.15, and 0.7) and different average luminances (15, 50, and 100 cd/m2) using the model in Equation 2 and the Poisson noise parameters obtained with the full-psychometric method. 
Figure A3 shows the noisy signals with different contributions of the noise at the two levels of representation. Computations are done with a full-scale model that works with 256 × 256 images, but the two-dimensional projections are obtained by selecting two sensors of the image representation and two sensors of the late representation. For illustrative purposes, for the input we took the sensors corresponding to the darkest and lightest locations of the gratings, and from the late representation we took a wavelet unit from the low-frequency residual and another tuned to the 2 cycles/deg band. In this way, the meaning of the representations in Figure A3 is qualitatively similar to the representations in the toy scenario presented in Figures 2 and 3: The input spaces are completely equivalent, and in the projection considered here, the horizontal axis also represents brightness and the vertical axis also represents contrast. 
Figure A3.
 
Gratings corrupted by early and late noise with noise parameters as estimated in Results II. We show samples in two-dimensional projections of the stimulus space (top row) and the late representation (middle and bottom rows). The signal in the input representation corresponds to the brightest and the darkest locations of a 2 cpd grating for different values of luminance and contrasts. Here, red, green, and blue clusters correspond to progressively higher values of contrast [0.01, 0.15, 0.7]. Also, clusters progressively further away from the origin in the diagonal direction of the stimulus space correspond to images with higher average luminance, with values [15, 50, 100] cd/m2. The middle row shows the samples in the two-dimensional representation along the zero-frequency dimension (horizontal axis) and along a frequency of 2 cycles/deg (vertical axis). The rectangles in dashed style highlight the region of low-luminance gratings of different contrast. The bottom row zooms in on this region.
In this context, the clusters of the samples corrupted by the early noise in the input domain (top-left plot) display the properties of the assumed early Poisson noise: (1) The width (or variance) of the clusters increases with the luminance, and (2) the ellipsoidal clusters are aligned with the axes because we assumed no correlation. Similarly, the late noise in the late representation (middle-right and bottom-right plots) also displays the Poisson properties: The variance increases with the mean, and the uncertainties of different units are uncorrelated. Interestingly, when the late noise is transformed back into the early representation (e.g., top-center plot), there is a strong correlation between the uncertainties of the different photoreceptors. The same is true for the early noise in the late representation (its covariance is nondiagonal), although this is hard to see given the small scale of the early noise (middle-left and bottom-left plots). 
The interesting interaction between the Poisson nature of the noise sources and the nonlinearities of the model implies that, in the late representation (where decisions are made), the different sources of noise (early or late) determine the performance in different conditions (detection or discrimination). The saturating nonlinearity of the contrast response compresses the clusters of early noise at progressively higher contrasts (the red cluster is wider than the blue one in the bottom-left plot). Conversely, the Poisson nature of the late noise implies that the uncertainty it introduces for high-contrast signals is bigger than the variability it introduces in the responses to low-contrast gratings (the blue cluster is wider than the red one in the bottom-right plot). 
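A one-dimensional sketch of this effect can be written using a saturating Naka-Rushton contrast response as a stand-in for the model nonlinearity; the functional form and all parameters below are illustrative assumptions, not the fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)

def R(c, c50=0.2, n=2.0):
    """Toy saturating contrast response (Naka-Rushton form)."""
    return c ** n / (c ** n + c50 ** n)

beta_e, beta_l, N = 0.02, 0.1, 100_000

def early_only_std(c):
    """Std of the response when only Poisson-like early noise acts."""
    ne = beta_e * np.sqrt(c) * rng.normal(size=N)
    return R(np.abs(c + ne)).std()

def inner_std(c):
    """Std of the response with both early and Poisson late noise."""
    ne = beta_e * np.sqrt(c) * rng.normal(size=N)
    y = R(np.abs(c + ne))
    y = y + beta_l * np.sqrt(y) * rng.normal(size=N)
    return y.std()

low, high = 0.05, 0.7   # low- and high-contrast conditions
```

The early-only standard deviation shrinks with contrast because saturation flattens the response, whereas the full inner standard deviation grows with contrast because the Poisson late noise dominates at high response levels, which is the pattern described above.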
Figure A4.
 
Marginal PDFs of the early, late, and (early+late) noise sources in the inner representation. These PDFs are computed from the samples of the 2 cycles/deg sinusoidal gratings of different contrast (with average luminance of 15 cd/m2) shown in the bottom row of Figure A3. The marginals are taken in the direction of the sensor tuned to 2 cycles/deg (vertical axis in Figure A3).
This is even more clearly represented in Figure A4, which shows the marginal PDFs for three cases in the bottom plots of Figure A3. Here the PDFs of the early and late noise are basically the same for low contrast (C = 0.01), but when the contrast is increased, the PDFs of the early noise are squeezed (solid curves) while the PDFs of the late noise widen (dashed curves). For low-contrast gratings, the total inner noise is basically determined only by early noise (and this effect would be even bigger for close-to-zero contrast used in actual detection experiments), while for high-contrast gratings, the inner noise is determined by late noise. 
This explains why considering both noise sources is important to reproduce all the experimental psychometric functions in Figure 7, and it accounts for the different roles of early and late noise in detection and discrimination. 
Appendix F: Psychophysical versus physiological estimates of early noise
The results and discussion in this appendix clarify why it is sensible for a psychophysical model to have one order of magnitude less noise than a physiological model of the retina. 
Figure A5 (top panel) shows the standard deviation of our estimated early and late noise sources on top of a sinusoid for different average luminances and different contrasts. The equivalent ISETBio noise in cd/m2 is plotted on top as a useful reference. Figure A5 (bottom panel) shows sample images with ISETBio noise and with the inner noise estimated by us, both inverted back into the spatial domain. 
Figure A5.
 
Comparison of the physiological (retinal) noise estimated in Appendix E following Brainard and Wandell (2020) and the psychophysical (early and late) estimations proposed here. Top panel: The standard deviation of the early and the late noise as a function of the luminance and contrast of a 4 cpd sinusoid. As expected from the Weber law, noise increases with luminance. Moreover, in accordance with masking, the late noise increases with contrast, which is not the case for early noise. The standard deviation of the retinal noise (see Appendix E) is plotted in the same units as a reference. Bottom panel: An illustrative stimulus calibrated in luminance with the noise of both models inverted back into the image space. Luminance of the flat square is 30 cd/m2. Average luminance of the sinusoids is 60 cd/m2. Frequencies are 2, 4, and 8 cpd, and contrasts are 0.25, 0.75, and 0.9, respectively.
If noise limits discriminability, it should be just noticeable (or almost invisible) for the average observer. Clearly, that is not the case for the physiological noise in the retina: ISETBio noise is too large and clearly visible. This indicates the presence of downstream mechanisms to remove this uncertainty (motion compensation, evidence accumulation over time, spatial low-pass filtering, etc.). Interestingly, our noise estimate is almost invisible on top of the background pattern. In order to explore its nature and assess its early/late components, we show scaled versions of the noises on top of a flat background of 60 cd/m2. While the Poisson early noise (trivially) inherits the spatial structure of the luminance, the late noise reveals a contrast/frequency dependence due to the inner workings of the nonlinear wavelet filters of the deterministic model. Note that more noise is allocated to the high-contrast regions. This is consistent with the results in Figure A3 (bottom right) and Figure A4 (right), which highlight the increase of the variance of late noise with contrast. Moreover, note also in Figure A5 that the frequency of the noise depends on the frequency of the background pattern because more active sensors are more affected by the late noise. A final factor that affects the magnitude of the late noise in the input domain is the inverse of the CSF, which enhances the amplitude of the (less visible) high-frequency noise. 
Below we describe how we used ISETBio (Cottaris et al., 2019; Cottaris et al., 2020) to obtain a reasonable physiological estimate of retinal noise. ISETBio allows the definition of a stimulus in physical units, for example, an achromatic stimulus (flat spectral radiance) of a certain luminance in cd/m2 subtending a certain angle in the visual field. The software incorporates accurate models of human optics and of the retinal photoreceptors: it computes the blurred image at the retina and models the isomerization process, so that the retinal image is transformed into noisy photocurrents at the L, M, and S sensors. This happens dynamically and incorporates fixation and microsaccade motion. 
One can use this accurate model to estimate the noise added to the input in cd/m2 units by doing two things: (1) modeling the relation between input luminance and brightness expressed in terms of cone photocurrents and (2) computing the noise in the brightness domain and transforming it back to luminance using the inverse of the luminance–brightness relation. In particular, we used the demo t_dynamicStimulusToPhotocurrent.m with default parameters. We modified the function only to access intermediate results and to switch the eye motion model on or off when necessary; we did not change the default choices. 
The estimation of the luminance–brightness relation is straightforward if one assumes that brightness perception is driven by the sum of the responses of the L and M cones, as is usual in color vision models (Fairchild, 2013). One can then build a series of stimuli of different luminances (see the top row of Figure A6) and register the L+M responses in pA. In order to compare the spatial description of the stimuli and the photoreceptor mosaic, we spatially interpolated the L and M mosaics using linear interpolation (i.e., we introduced minimal spatial blur). In particular, we did not assume the wider spatial summation happening downstream in retinal ganglion cells or LGN cells. We used Gabors of 2 cpd with different average luminances and fixed contrast and put them through the model to empirically derive a luminance–photocurrent function. The responses integrated over short periods of time (e.g., 200 ms) display a substantial amount of noise, but the average value of L+M in pA can be used to derive a simple input–output curve that can be interpreted as a luminance–brightness response. The nonlinear plots at the right of Figure A6 display such curves (the top one in pA and the bottom one rescaled to values in the range of the input luminance). We fitted a conventional exponential function to transform the input luminance into this brightness (see the parameters of the fit in the bottom-right plot of Figure A6). This function can be inverted in order to transform photocurrents into luminance values. It is this inverse that allows us to represent the noisy responses obtained with short exposure times (200 ms) on a luminance scale (as, for instance, in the second row of responses to the stimuli shown in the first row). 
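The fit-and-invert logic can be sketched as follows. We assume a saturating-exponential brightness function with hypothetical parameters and synthetic data standing in for the mean ISETBio photocurrents; the paper's actual fitted parameters are those shown in its Figure A6:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical saturating-exponential luminance-to-brightness function;
# parameter values below are illustrative, not the paper's fit.
def brightness(L, B_max, L0):
    return B_max * (1.0 - np.exp(-L / L0))

def luminance(B, B_max, L0):
    # analytic inverse: maps (rescaled) photocurrents back to cd/m2
    return -L0 * np.log(1.0 - B / B_max)

# synthetic "mean L+M photocurrent" data in place of the ISETBio output
L = np.linspace(1.0, 200.0, 50)                  # luminances in cd/m2
B_noisy = brightness(L, 150.0, 80.0) \
    + np.random.default_rng(1).normal(0.0, 0.5, L.size)

params, _ = curve_fit(brightness, L, B_noisy, p0=(100.0, 50.0))
L_back = luminance(brightness(L, *params), *params)
print(params, np.max(np.abs(L_back - L)))        # round trip recovers luminance
```

The same inverse applied to the noisy short-exposure responses is what expresses the retinal noise on a luminance scale.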
We explicitly checked that the ocular motion has no effect on this estimation. Consistent with the reports in Kelly (1979), the speed induced by the ocular motion in ISETBio is about 0.1 deg/s, which in 200 ms implies a spatial displacement of just 0.02 degrees, too small to induce a major luminance change in the 2 cpd Gabors used in our experiment. Therefore, the luminance–photocurrent curves do not change if we switch off the ocular motion in the model. 
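The displacement argument is simple arithmetic. A quick check with the values from the text shows that 200 ms of drift shifts a 2 cpd grating by only 4% of a cycle:

```python
import numpy as np

speed = 0.1   # deg/s, drift speed in ISETBio (cf. Kelly, 1979)
T = 0.2       # s, short exposure time
f = 2.0       # cpd, frequency of the Gabors used here

dx = speed * T                  # spatial displacement in degrees
shift_cycles = f * dx           # displacement as a fraction of a cycle
# worst-case change of a unit-amplitude sinusoid under this displacement
max_change = 2.0 * np.sin(np.pi * shift_cycles)
print(dx, shift_cycles, max_change)
```

Even in the worst case, the change is a small fraction of the grating's modulation amplitude, which is why switching the eye motion model off leaves the luminance–photocurrent curves unchanged.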
We estimated the noise in the brightness domain by subtracting the equivalent signal averaged over a long exposure time, which "artificially" removes all the noise in the response, from the noisy L+M signal integrated over a short period of time (200 ms). In this long-exposure case (where we integrated over 8 seconds), we switched off the ocular motion in the model so that the noise removal does not come from changes in the spatial structure of the input. The temporally denoised responses in luminance units can be seen in the third row of Figure A6. The subtraction of the clean response (third row) from the noisy responses (second row) leads to the noise realizations shown in the fourth row of the figure. This row displays the physiological noise for different input luminances, and the variance of this noise clearly increases with the input luminance. 
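The subtraction scheme can be sketched with a toy Poisson-like photocurrent generator (hypothetical sampling interval and variance model; the real computation uses ISETBio responses): the 8 s average stands in for the denoised signal, and the residual standard deviation grows with luminance, as in the fourth row of Figure A6:

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 0.005   # s per sample (hypothetical sampling interval, not ISETBio's)

def mean_response(L, T):
    # toy photocurrent proxy: per-sample mean L and variance L (Poisson-like),
    # averaged over an exposure of T seconds
    n = int(T / dt)
    return rng.normal(L, np.sqrt(L), n).mean()

noise_std = []
for L in (15.0, 50.0, 100.0):                 # luminances in cd/m2
    clean = mean_response(L, 8.0)             # long exposure ~ denoised signal
    noise = [mean_response(L, 0.2) - clean for _ in range(500)]
    noise_std.append(float(np.std(noise)))
print(noise_std)   # standard deviation grows with luminance
```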
Figure A6.
 
Procedure to estimate retinal noise in cd/m2 using ISETBio. The top row represents a series of achromatic Gabors of 2 cpd of fixed contrast with controlled luminance. The top-right plot displays the nonlinear relation between input luminance and L+M photocurrents in pA (with 200 ms exposure time). The bottom-right plot displays the same relation with the vertical axis rescaled to values in the luminance range. The plot also displays the parameters of an exponential luminance–brightness fit. The responses of the ISETBio retina corresponding to the stimuli in the first row, computed either with a short exposure time (hence noisy) or averaged over a long exposure time (hence denoised), are shown in the second and third rows, respectively. The luminance–brightness fit has been used to express the responses of the second and third rows (given in pA by ISETBio) in luminance units (cd/m2). The subtraction of the second and third rows leads to the noise realizations in the fourth row.
Measuring the standard deviation of the noise in this series of realizations in the fourth row, one obtains the result shown in Figure A5: According to ISETBio, the standard deviation of the noise at the retina depends on the luminance L as \(\sigma _{\rm {retina}} \approx 0.33 \cdot L^{0.57}\) cd/m2. 
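A power law of this form is recovered by linear regression in log-log coordinates. The sketch below generates noise-free data from the fitted values and shows the regression returning them (up to float precision); the luminance samples are arbitrary:

```python
import numpy as np

# Recover a power law sigma = a * L**b by linear regression in log-log space;
# a_true and b_true are the values of the ISETBio fit quoted in the text.
a_true, b_true = 0.33, 0.57
L = np.array([15.0, 30.0, 60.0, 100.0, 150.0])   # cd/m2, arbitrary samples
sigma = a_true * L**b_true

# log(sigma) = log(a) + b * log(L) is linear, so a degree-1 fit suffices
b_fit, log_a_fit = np.polyfit(np.log(L), np.log(sigma), 1)
a_fit = np.exp(log_a_fit)
print(a_fit, b_fit)   # ~0.33 and ~0.57, since the data are noise-free
```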
Appendix G: Readout metric and noise at the decision stage
When there is noise at the decision stage or when the readout is not Euclidean, the proposed Equation 6 changes as outlined below, but the role of the input noise in setting the scale of the different noise components mentioned above remains the same. 
On the one hand, univariate noise at the decision stage can be introduced in the distance variable: \(D_{\rm {th}}^{\rm {noisy}} = D_{\rm {th}} + n_{\rm {decision}}\). This is equivalent to an extra expansion of the uncertainty in the direction of stimulus modification, whatever that direction is (i.e., an extra isotropic noise). This isotropic noise is equivalent to including an extra diagonal term in the covariance of the inner noise: \(\Sigma _{\mathcal {I}} + k^2 \sigma ^2 I\). Again, the scale factor k of this isotropic noise cannot be obtained in the absence of external or early noise. 
On the other hand, a non-Euclidean readout of the variations of the response according to some readout metric matrix R implies that \(D^2 = \Delta {\boldsymbol y}^\top \cdot R \cdot \Delta {\boldsymbol y}\). With R symmetric, it can be diagonalized by orthonormal transforms: \(R = B \cdot \lambda \cdot B^\top\). Therefore, this non-Euclidean metric in the response domain is equivalent to including an extra transform to a domain \({\boldsymbol y^{\prime }} = \lambda ^{\frac{1}{2}} \cdot B^\top \cdot {\boldsymbol y}\) where the readout is Euclidean. The role of this extra transform (for the metric matrix) is similar in spirit to a whitening transform (for the covariance matrix). In this new domain, the covariance of the inner noise would be \(\Sigma _{\mathcal {I}}^{y^{\prime }} = \lambda ^{\frac{1}{2}} \cdot B^\top \cdot \Sigma _{\mathcal {I}} \cdot B \cdot \lambda ^{\frac{1}{2}}\). Therefore, Equation 5 can be generalized to  
\begin{eqnarray} D_{\rm {th}}^2 = \Delta S^\top \cdot \left( \lambda ^{\frac{1}{2}} \cdot B^\top \cdot \Sigma _{\mathcal {I}} \cdot B \cdot \lambda ^{\frac{1}{2}} + k^2 \sigma ^2 I \right)^{-1} \cdot \Delta S \end{eqnarray}
(A18)
where \(\Sigma _{\mathcal {I}}\) has the same expression as the one in Equation 6. As a consequence, all the variables can be found through optimization in the same way, and all the discussion about the role of the external noise is the same: Even if both factors (non-Euclidean readout and noise in the decision) are taken together, the scale of the late noise and the decision noise would remain unknown if there is no external noise at the input. 
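The diagonalization argument can be checked numerically: for a symmetric, positive-definite R, the metric readout \(\Delta {\boldsymbol y}^\top \cdot R \cdot \Delta {\boldsymbol y}\) equals the Euclidean norm in the transformed domain \({\boldsymbol y^{\prime }} = \lambda ^{\frac{1}{2}} \cdot B^\top \cdot {\boldsymbol y}\). A sketch with a random 4-dimensional metric (dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# random symmetric, positive-definite readout metric R
A = rng.standard_normal((4, 4))
R = A @ A.T + 4.0 * np.eye(4)

lam, B = np.linalg.eigh(R)        # R = B @ diag(lam) @ B.T, lam > 0
dy = rng.standard_normal(4)       # a response variation

D2_metric = dy @ R @ dy                    # non-Euclidean readout
y_prime = np.sqrt(lam) * (B.T @ dy)        # Euclidean-readout domain
D2_euclid = y_prime @ y_prime
print(D2_metric, D2_euclid)                # the two distances coincide
```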
Figure 1.
 
Illustration of the vision model. The input image, \({\boldsymbol x}\), is corrupted by early noise, \({\boldsymbol n_e}\), leading to an early (noisy) representation. In this example, we assume Poisson noise, and its amplitude increases with the luminance of the input (see the noise in the green circle with respect to the noise in the red circle). Then the signal is analyzed by a set of linear wavelet-like oriented filters (tuned to 0, 3, 6, 12, and 24 cpd), with responses \({\boldsymbol w}\). In the figure, we use the classical representation of subbands in the wavelet literature (Simoncelli, Freeman, Adelson, & Heeger, 1992). The 24-cpd subband is not represented for clarity. Then, the responses are multiplied by the CSF weights in the wavelet domain (Malo, Pons, Felipe, & Artigas, 1997) (\(\mathbb {D}_{\mathit {CSF}}\)), shown in the lower left inset. Here the lighter gray corresponds to larger weights. The represented CSF shows the bandpass behavior and the oblique effect. CSF weighting is apparent in the attenuation of high-frequency subbands in \({\boldsymbol w^{\prime }}\): See how the energy in responses in the solid blue circle is reduced in the solid yellow circle. Then, the responses undergo a fixed saturating nonlinearity, m(·) (Legge, 1981), that preserves the relative scale of each subband (Martinez, Bertalmío, & Malo, 2019; Malo, Esteve-Taboada, & Bertalmío, 2024). This nonlinearity takes the low-amplitude responses (e.g., in the dashed yellow circle) and leads to enhanced responses (in the dashed blue circle). Finally, late noise, \({\boldsymbol n_l}\), is added to the responses in this late representation. The Poisson nature of the late noise is apparent in that its amplitude in all subbands is larger in the spatial region with larger contrast and hence larger response of texture detectors (see larger noise at the right, e.g., in the dashed green circle, and less noise at the left, dashed red circle). 
The amplitude of the early and late noises has been scaled for clarity (×40 and ×4, respectively).
Figure 2.
 
Modeling framework: propagation of stimulus and inner noise through the system in the absence of external noise. (A) In this toy example, input stimuli consist of two pixels of varying luminance (axes) shown at their corresponding positions in the two-dimensional coordinate system. (B) Early stimulus representation with added early noise (blue). Ellipses indicate the magnitude of variation resulting from the added noise. (C) The early representation goes through the fixed, deterministic vision model (S), which results in nonlinear transformations of the output. The x-axis is now nonlinear brightness and the y-axis nonlinear contrast. Note the different position and orientation of the blue ellipses. Then, late noise is added (red ellipses at the same positions as the stimulus representations). (D) Late representation of stimulus and inner (early + late) noise, which limits discrimination performance. The standard deviations and Fano factors control the size of the ellipsoids. The specific values were chosen to be illustrative, taking into account that in this example, the range of luminances is normalized to [0,1]. The interested reader can access and edit the expressions of the different noise sources of this two-pixel model, available online on the aforementioned website.
Figure 3.
 
Modeling framework: propagation of stimulus and noise through the system in the presence of external noise. The figure is organized analogously to Figure 2. (A) External noise is applied to the luminances of each pixel so that the values of the two luminances vary around their mean (green ellipses). (B) Noise at the early representation (cyan ellipses) comes from the superposition of external noise (green ellipses) with early noise (blue ellipses). (C) The early representation goes through the vision model, S, as in the previous figure. Then, late noise is added (red ellipses). (D) Late representation of noisy stimuli, with external, early, and late contributions.
Figure 4.
 
Late representation with and without external noise. Same as in Figures 2D and 3D, but visualized together for easy comparison. Differences imply that controlled variations in the external noise induce different variations of the thresholds for different pedestals and directions.
Figure 5.
 
Early and late noise parameters from threshold data. (Left) Correlation between the theoretical distance (based on the noise) and the experimental distance (based on the thresholds) for various late (x-axis) and early (y-axis) noise Fano factors. The optimal parameters (red dot) that maximize the correlation were βe = 0.023 and βl = 1.52. (Right) Scatterplot for the best (maximum) correlation, for the data of both observers and conditions (with and without external noise) considered together. In this optimal case, the Pearson correlation is 0.65. The correlation drops to 0.64 and 0.61 if one neglects either early noise or late noise, respectively. As βl is more important for explaining the data (it has a bigger impact on the correlation), βe is less constrained and hence more uncertain.
Figure 6.
 
Psychometric functions as a function of the inner noise projected back into the stimulus space. (Left) Two-pixel example (as in Figures 2A and 3A) showing samples of noisy responses of nine stimuli of different luminance (along the diagonal direction) and different contrast (along the perpendicular-to-the-diagonal direction). For two of them, we represent departures from the average \({\boldsymbol x}\) in a specific direction \(\Delta {\boldsymbol x}\) with black and pink lines. The sigmoids in green and brown represent the CDFs of the noise projected in the direction of variation of the signal. (Right) Probability density functions (PDFs, top right) and cumulative distribution functions (CDFs, bottom right) for the two considered clusters. Note how the CDF in brown is squeezed when the abscissa is expressed in contrast units (by dividing the variation by the average luminance).
Figure 7.
 
Early and late noise parameters using full psychometric functions. The color surfaces show the average RMSE between experimental and predicted psychometric functions, for varying noise parameters for the late (x-axis) and early (y-axis) noises. In each case, the optimum is given by the minimum in the error surface (marked in red). Please note that the optimum here is a low RMSE (dark blue), whereas in the previous section, the optimum was a maximal correlation (light yellow). For the observers GBH and CMB, the optima, highlighted in red, are located at (βe = 0.005, βl = 7.69) and (βe = 0.01, βl = 7.69), respectively. The plots at the top represent the experimental data of the psychometric functions (blue dots) and different optimal predictions considering both early and late Poisson noise (in black), only late noise (in red), and only early noise (in green).
Figure A1.
 
Equivalence between parametric and nonparametric distances. Plots (a) and (b) show the 4 × 5 locations (4 contrasts times 5 luminances) considered in the space of two-pixel images and in the inner representation of the illustrative model introduced in "Model response in the presence of noise: a two-pixel example." The noisy images and responses are represented by the random samples in blue. Plots (c) and (d) magnify the highlighted rectangles in green and orange. At each location in the inner representation, we computed distances from the central point in the directions indicated by the black squares. Plot (e) represents the distances from the central point in plot (d) in the different directions computed through the parametric and nonparametric methods, Equations 6 and 8, in blue and red, respectively. The plots in panel (f) show the same result at the other considered points all over the image space.
Figure A2.
 
Early and late noise parameters from threshold data for each individual observer. The top plots represent the best (maximum correlation) linear fits between the theoretical distance (that depends on the noise sources) and the experimental distance (based on the thresholds) for the considered observers. The surfaces at the bottom display the correlation between the theoretical distance and the experimental distance depending on the noise parameters. The optimal noise parameters, here highlighted in red, are those that maximize the correlation for each observer: (βe = 0.04, βl = 0.91) for GBH and (βe = 0.02, βl = 17.5) for CMB.
Figure A3.
 
Gratings corrupted by early and late noise with noise parameters as estimated in Results II. We show samples in two-dimensional projections of the stimulus space (top row) and the late representation (middle and bottom rows). The signal in the input representation corresponds to the brightest and the darkest locations of a 2 cpd grating for different values of luminance and contrast. Here, red, green, and blue clusters correspond to progressively higher values of contrast [0.01, 0.15, 0.7]. Also, clusters progressively further away from the origin in the diagonal direction of the stimulus space correspond to images with higher average luminance, with values [15, 50, 100] cd/m2. The middle row shows the samples in the two-dimensional representation along the zero-frequency dimension (horizontal axis) and along a frequency of 2 cycles/deg (vertical axis). The dashed rectangles highlight the region of low-luminance gratings of different contrast. The bottom row zooms in on this region.
Figure A4.
 
Marginal PDFs of the early, late, and (early+late) noise sources in the inner representation. These PDFs are computed from the samples of the 2 cycles/deg sinusoidal gratings of different contrast (with average luminance of 15 cd/m2) shown in the bottom row of Figure A3. The marginals are taken in the direction of the sensor tuned to 2 cycles/deg (vertical axis in Figure A3).
Figure A5.
 
Comparison of the physiological (retinal) noise estimated in Appendix E following Brainard and Wandell (2020) and the psychophysical (early and late) estimations proposed here. Top panel: The standard deviation of the early and the late noises as a function of the luminance and contrast of a 4 cpd sinusoid. As expected from the Weber law, noise increases with luminance. Moreover, according with masking, the late noise increases with contrast, which is not the case for early noise. The deviation of the retinal noise (see Appendix E) is plotted in the same units as a reference. Bottom panel: An illustrative stimulus calibrated in luminance with the noise of both models inverted back into the image space. Luminance of the flat square is 30 cd/m2. Average luminance of the sinusoids is 60 cd/m2. Frequencies are 2, 4, and 8 cpd, and contrasts are 0.25, 0.75, and 0.9, respectively.
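The qualitative trends in the top panel can be captured with a toy parameterization (the power-law exponents and gains below are placeholders for illustration only, not the fitted values from Results II): early noise grows with luminance in a Weber-like fashion but is contrast-independent, while late noise additionally grows with contrast, as in masking.

```python
def sigma_early(L, gamma=0.8, k=0.1):
    """Illustrative luminance-dependent early noise (Weber-like power law)."""
    return k * L**gamma

def sigma_late(L, c, gamma=0.8, k=0.05, m=2.0):
    """Illustrative late noise: luminance-dependent, increased by contrast
    (masking). All parameters are placeholders, not fitted values."""
    return k * L**gamma * (1.0 + m * c)

L = 60.0  # cd/m^2, the average luminance of the sinusoids in the figure
for c in [0.0, 0.25, 0.75]:
    print(c, sigma_early(L), sigma_late(L, c))
```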
Figure A6.
 
Procedure to estimate retinal noise in cd/m2 using ISETBio. The top row shows a series of achromatic Gabors of 2 cpd of fixed contrast with controlled luminance. The top-right plot displays the nonlinear relation between input luminance and L+M photocurrents in pA (with 200 ms exposure time). The bottom-right plot displays the same relation with the vertical axis rescaled to values in the luminance range; it also displays the parameters of an exponential luminance–brightness fit. The responses of the ISETBio retina to the stimuli in the first row, using either a short exposure time (hence noisy) or an average over a long exposure time (hence denoised), are shown in the second and third rows, respectively. The luminance–brightness fit has been used to express the responses of the second and third rows (given in pA by ISETBio) in luminance units (cd/m2). Subtracting the third row from the second yields the noise estimates in the fourth row.
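The logic of this procedure can be sketched without the ISETBio toolbox itself (ISETBio is a MATLAB toolbox; its actual API is not reproduced here). In the sketch below, synthetic photocurrents stand in for the ISETBio outputs, a power-law fit stands in for the paper's luminance–brightness fit, and the noise is the difference between the noisy short-exposure response and the denoised long-exposure average, converted back into cd/m2 by locally inverting the fit.

```python
import numpy as np

def fit_brightness(luminance, photocurrent):
    """Fit B = a * L**b by least squares in log-log coordinates
    (a stand-in for the paper's luminance-brightness fit)."""
    b, log_a = np.polyfit(np.log(luminance), np.log(photocurrent), 1)
    return np.exp(log_a), b

def retinal_noise(short_exposure, long_exposure_avg):
    """Noise estimate: noisy short-exposure response minus the
    denoised long-exposure average."""
    return short_exposure - long_exposure_avg

# Synthetic stand-in for ISETBio photocurrents (pA): power law plus noise
rng = np.random.default_rng(1)
L = np.linspace(15.0, 100.0, 20)                      # luminances, cd/m^2
a_true, b_true = 2.0, 0.6
pA_clean = a_true * L**b_true                         # "long exposure" average
pA_noisy = pA_clean + rng.normal(0.0, 0.5, L.shape)   # "short exposure"

a, b = fit_brightness(L, pA_clean)
noise_pA = retinal_noise(pA_noisy, pA_clean)
# Convert noise to luminance units: since dB/dL = a*b*L**(b-1),
# dL = dB * L**(1-b) / (a*b)
noise_cdm2 = noise_pA * L**(1.0 - b) / (a * b)
```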