Free
Article  |   May 2015
Connecting psychophysical performance to neuronal response properties I: Discrimination of suprathreshold stimuli
Author Affiliations
Journal of Vision May 2015, Vol.15, 8. doi:10.1167/15.6.8
  • Views
  • PDF
  • Share
  • Tools
    • Alerts
      ×
      This feature is available to Subscribers Only
      Sign In or Create an Account ×
    • Get Citation

      Keith A. May, Joshua A. Solomon; Connecting psychophysical performance to neuronal response properties I: Discrimination of suprathreshold stimuli. Journal of Vision 2015;15(6):8. doi: 10.1167/15.6.8.

      Download citation file:


      © ARVO (1962-2015); The Authors (2016-present)

      ×
  • Supplements
Abstract

One of the major goals of sensory neuroscience is to understand how an organism's perceptual abilities relate to the underlying physiology. To this end, we derived equations to estimate the best possible psychophysical discrimination performance, given the properties of the neurons carrying the sensory code.We set up a generic sensory coding model with neurons characterized by their tuning function to the stimulus and the random process that generates spikes. The tuning function was a Gaussian function or a sigmoid (Naka-Rushton) function.Spikes were generated using Poisson spiking processes whose rates were modulated by a multiplicative, gamma-distributed gain signal that was shared between neurons. This doubly stochastic process generates realistic levels of neuronal variability and a realistic correlation structure within the population. Using Fisher information as a close approximation of the model's decoding precision, we derived equations to predict the model's discrimination performance from the neuronal parameters. We then verified the accuracy of our equations using Monte Carlo simulations. Our work has two major benefits. Firstly, we can quickly calculate the performance of physiologically plausible population-coding models by evaluating simple equations, which makes it easy to fit the model to psychophysical data. Secondly, the equations revealed some remarkably straightforward relationships between psychophysical discrimination performance and the parameters of the neuronal population, giving deep insights into the relationships between an organism's perceptual abilities and the properties of the neurons on which those abilities depend.

Introduction
The key motivation behind this work is to facilitate the construction of physiologically plausible models of psychophysical performance that are mathematically tractable. Pioneering work in the 1960s and 1970s led to a standard type of model of early vision consisting of a bank of independent channels selective for different stimulus feature values (Graham, 1989). Within each channel, the signal strength increased (often nonlinearly) with stimulus contrast. Performance was limited by the addition of (usually Gaussian) noise somewhere in the model. This kind of model has the advantage of mathematical simplicity: The relationships between the model's parameters and its psychophysical performance can be described by simple mathematical expressions, allowing us to (a) easily fit the model to data and (b) understand why the model behaves as it does. Unfortunately, it has become increasingly clear that the connection between real neurons and the psychophysical channels in this kind of model is tenuous at best (Goris, Putzeys, Wagemans, & Wichmann, 2013). Recent modeling work has shown that more physiologically plausible population-coding models of early vision can provide a unified account of many diverse psychophysical phenomena while at the same time providing a stronger connection to physiology than has traditionally been the case (Goris et al., 2013). However, the complexity of this kind of model can sometimes obscure the relationships between properties of the model and characteristics of performance, making it difficult to understand the model's behavior. In this article, we describe sensory coding models that have a close connection to physiology but are nevertheless simple enough to be described by equations that reveal straightforward relationships between the neuronal parameters and psychophysical performance. 
We assume a generic sensory coding model in which the observer monitors a set of spiking neurons and makes a maximum-likelihood estimate of the stimulus from the set of spike counts generated by those neurons. Thus, we find the best possible decoding performance that the encoding scheme allows. The reciprocal of the variance of the estimated stimulus values is called the precision. The precision determines the expected performance on perceptual tasks, and so, if we can estimate the precision from the neuronal parameters, we can estimate task performance. 
For an unbiased maximum-likelihood decoder, the precision cannot exceed a quantity called the Fisher information. This limit is known as the Cramér-Rao bound (Rao, 1945; Cramér, 1946; see Dayan & Abbott, 2001, pp. 120–121). For a reasonable spike rate or number of neurons, the precision of the generic sensory coding model that we describe in this article is actually very close to the Cramér-Rao bound, so we can use the Fisher information as an approximation of the decoding precision. The Fisher information is calculated from the properties of the neurons, and therefore forms a bridge between the neuronal properties and perceptual performance. 
For each model parameterization that we studied, we set up a Monte Carlo simulation which showed that the model's performance is very close to that predicted from the Fisher information. We used two measures of model performance: One was the precision, defined already, and the other was the discrimination threshold in a simulated two-alternative forced-choice (2AFC) discrimination experiment. Both were very well predicted from the Fisher information. 
Our work has two main benefits. One is a practical one: We can quickly calculate the performance of realistic population-coding models by evaluating simple equations rather than using slow and laborious Monte Carlo simulations, which can be difficult to program. This allows us to quickly fit the model to data. The second benefit is that the equations give a deep insight into the relationships between psychophysical performance and the properties of the neurons that carry the sensory code, showing us why these relationships occur and how generally they apply. 
Our equations give the upper limit on the possible psychophysical performance level, given the neuronal parameters. There is plenty of evidence that human perceptual performance falls far short of the maximum level that is theoretically possible (Banks, Geisler, & Bennett, 1987; Geisler, 1989; Pelli, 1990; Pelli & Farell, 1999; Dakin, 2001; Pelli, Burns, Farell, & Moore-Page, 2006; Morgan, Chubb, & Solomon, 2008; Solomon, 2010). Nevertheless, there are several reasons why it is useful to know this theoretical limit, and we outline these reasons in remainder of this Introduction
Firstly, knowing the theoretical maximum performance allows us to compare different encoding schemes. By “encoding scheme,” we mean the population of neurons that are used to encode the stimulus. It is of great interest to know why neurons have the properties that they do, and a lot of progress has been made by showing that observed physiological properties reflect encoding strategies that are optimal in some way (Laughlin, 1981; Srinivasan, Laughlin, & Dubs, 1982; Atick & Redlich, 1990, 1992; Atick, 1992; Atick, Li, & Redlich, 1992, 1993; Tadmor & Tolhurst, 2000; Zhaoping, 2014, chapter 3). Given a particular encoding scheme, the psychophysical performance will depend on how appropriate the decoding algorithm is for that particular encoding scheme (Beck, Ma, Pitkow, Latham, & Pouget, 2012). A fair way to compare two encoding schemes is to compare the maximum performance that each allows. 
A second reason for wanting to know the maximum performance level that a set of neurons can support is that this knowledge allows us to perform sequential ideal-observer analysis. This approach, developed by Geisler (1989), calculates the efficiency with which the information at each stage is processed. Roughly speaking, efficiency is the proportion of the available information that the observer appears to use. If we have a sufficiently good model of processing up to a particular point in the processing stream, we can construct an ideal observer for processing the information known to exist at that point and then compare the ideal observer's performance against that of a real observer. An efficiency of 1 would mean that the real observer's performance matched that of the ideal observer, so no further information was lost beyond that point in the processing stream. We can proceed like this from very early stages, e.g., the optics of the eye, through to later stages, seeing at each stage what proportion of the available information is lost by later stages of processing. To perform this kind of analysis, we need to be able to calculate the best possible performance allowed by the information at each stage. 
Sometimes, it is not necessary to calculate the absolute efficiency. A third reason why it is useful to know the maximum performance theoretically possible is that it can explain patterns of performance. If we assume that, after a certain point in the processing stream, efficiency is constant across different stimulus values, then we can predict the detection or discrimination threshold up to a multiplicative factor. This kind of approach has been used to explain observed patterns of psychophysical performance, such as the shapes of the contrast sensitivity function and wavelength discrimination function (Banks et al., 1987; Geisler, 1989; Zhaoping, Geisler, & May, 2011). If we know the proportions of neurons of different types at a particular point in the processing stream, we can calculate the decoding precision up to a multiplicative factor; assuming constant efficiency of processing beyond that point, this gives us the expected variance of the decoded stimulus values (and hence discrimination thresholds) up to a multiplicative factor. As we will show, the neuronal parameters (or functions of the neuronal parameters) have multiplicative effects on the decoding precision and thus will have the same effect on the ideal decoder's performance as they will on the performance of the decoder with constant inefficiency. 
Many of the derivations and technical details are included in appendices in the supplementary material. Each supplementary appendix is labeled with a letter. Equations and figures in an appendix are labeled with the appendix's letter, followed by a dot, followed by the equation or figure number within that appendix. Supplementary Appendix A provides a list of the main symbols used in the text, and their meanings. 
The sensory coding model
Throughout this article, we represent random variables using uppercase letters and their values on particular trials1 using corresponding lowercase letters. Thus X is a random variable representing the stimulus level, and its value is x. R is a random variable representing a neuron's mean number of spikes, and its value is r(x), the output of the neuron's tuning function, which gives the neuron's mean spike count for a given stimulus x (note that the tuning function's output is often measured in spikes per second, but we find it more convenient to use units of spikes, without making assumptions about the time period over which the neuron's output is integrated; this is equivalent to measuring the output in spikes per unit of time using units of time scaled so that the spike integration period is one unit). N is a random variable representing the spike count of a neuron, i.e., the number of spikes produced by the neuron, and n is its value on a particular trial. Because we will often be dealing with populations of neurons, we use bold letters N and n to represent vectors holding the spike counts of all the neurons in the population. N is a random variable representing the population response, and n is its value on a particular trial. 
We assume a generic sensory coding model that has three elements: (a) a tuning function r(x) for each neuron, which specifies the neuron's mean spike count for a given stimulus x; (b) a random process that generates spikes at the given rate; and (c) a method of decoding the spike counts of the neurons to give an estimate of the stimulus. We now describe each of these processes in turn. 
The tuning function
In this article, we consider two different tuning functions: the Naka-Rushton function and the Gaussian function. Both have parameters r0, rmax, z, q, and b, which serve the same or analogous purposes in the two functions. 
The Naka-Rushton tuning function
For visual stimulus contrast, the tuning function is called the contrast-response function. It usually has a sigmoidal shape that is well described by the Naka-Rushton function, also known as the hyperbolic ratio function (Naka & Rushton, 1966; Albrecht & Hamilton, 1982):  The variable c is the contrast in linear (e.g., Michelson) units, r0 is the spontaneous firing rate in response to zero contrast, and rmax is the asymptotic increment from r0 as contrast increases. The term Display FormulaImage not available , called the semisaturation contrast, is the contrast for which the mean response exceeds r0 by rmax/2. We use a left subscript on r(·) to indicate the form of the tuning function, in this case “N-R” for Naka-Rushton.  
Examples of the Naka-Rushton function are plotted in Figure 1A. Some readers may not think of the Naka-Rushton function as a tuning function, but we use the latter term in its most general sense, to mean the function that gives the mean response for a particular stimulus value. 
Figure 1
 
Tuning functions. (A) Three Naka-Rushton functions, with q equal to 1, 2, and 4. Each function has c1/2 = 0.1. The value of r(c1/2) falls halfway between r0 and r0 + rmax. (B) The same three Naka-Rushton functions as in (A), plotted as functions of log10 contrast. The log semisaturation contrast is given by z = −1. (C) Three Gaussian tuning functions, with z = −1 and q equal to 1, 2, and 4.
Figure 1
 
Tuning functions. (A) Three Naka-Rushton functions, with q equal to 1, 2, and 4. Each function has c1/2 = 0.1. The value of r(c1/2) falls halfway between r0 and r0 + rmax. (B) The same three Naka-Rushton functions as in (A), plotted as functions of log10 contrast. The log semisaturation contrast is given by z = −1. (C) Three Gaussian tuning functions, with z = −1 and q equal to 1, 2, and 4.
For our purposes, it is convenient to represent the contrast in units of log Michelson contrast, so the stimulus value x is given by  which gives  Using Equation 3 to substitute for c in Equation 1, we can re-express the Naka-Rushton function as a function of log contrast x:  where    
Figure 1B plots the Naka-Rushton function as a function of log contrast. On the log contrast scale, the gradient of the Naka-Rushton function peaks at a log contrast of x = z. In what follows, whenever we use the term “contrast” without specifying the units, we mean log Michelson contrast. In all our simulations, we used log to base 10 (i.e., b = 10), but our equations are derived for any b
The Gaussian tuning function
We will use z to represent the stimulus value corresponding to the center (peak) of the Gaussian tuning function. This is analogous to the semisaturation contrast z, which is at the center (peak of gradient) of the Naka-Rushton contrast-response function when expressed in terms of log contrast. The Gaussian tuning function is given by  where x is a value along the (unspecified) stimulus dimension, rmax is the maximum increment from the spontaneous firing rate r0, and q is a tuning sharpness parameter, analogous to the exponent q of the Naka-Rushton function. As before, the left subscript on r(x)—in this case “Gauss” for Gaussian—indicates the form of the tuning function. Examples of the Gaussian tuning function are plotted in Figure 1C.  
If σtuning is the standard deviation of the Gaussian tuning function, then its bandwidth (full width at half height) w is given by w2 = (8ln2)Display FormulaImage not available . Thus we have    
If x is measured in units that are the log to base b of the physical stimulus units, then w will also be in logb units. We can convert w to octaves (i.e., log2 units) by multiplying by log2b (this is because logby × log2b = log2y). If ω is the bandwidth in octaves, then  Using Equation 8 to substitute for w in Equation 7 gives    
The random process for spike generation
In our model, the spikes are generated by a model of neuronal spiking recently proposed by Goris, Movshon, and Simoncelli (2014). Each model neuron has a Poisson spiking process whose mean spike rate is modulated by a multiplicative gain mechanism that is also a random process, so the overall spiking process is a doubly stochastic Poisson process. Specifically, for each model neuron, the rate of its Poisson spiking process is determined by the output of its tuning curve r(x) multiplied by a gain value that changes randomly from trial to trial. We let G represent the gain random variable, and g its value on a particular trial. The spiking distribution conditioned on the gain is an ordinary Poisson distribution:    
In the spiking model of Goris et al. (2014), the gain values are sampled from a gamma distribution with a mean of 1 and a standard deviation σG that is a free parameter. The gamma gain distribution and Poisson spiking distribution combine to produce a gamma-Poisson mixture distribution that has the form of a negative binomial distribution of spike counts, given by  where Γ(·) is the gamma function. The distribution in Equation 11 fits well to the spike distributions obtained in physiological recordings (see figure 1c and 1d of Goris et al., 2014). Because the mean gain is 1, the mean of the spike distribution in Equation 11 is r(x).  
Goris et al. (2014) allowed both the spiking process and the gain process to be correlated between neurons. The covariance between the spike counts of neurons i and j is then given by  where ri(x) and rj(x) are the outputs of the tuning functions of neurons i and j given a stimulus x; Display FormulaImage not available and Display FormulaImage not available are the standard deviations of the two neurons' gain processes; Display FormulaImage not available is the Pearson correlation between the two neurons' Poisson spiking processes; and Display FormulaImage not available is the Pearson correlation between the two neurons' gain processes. If we let i = j in Equation 12, then we obtain the variance of neuron j:    
This can be converted to the neuron's Fano factor (i.e., the ratio of variance to mean) by dividing by the mean response rj(x):    
Equation 14 shows that the model neuron's Fano factor is an affine (i.e., straight line) function of the spike rate, with gradient equal to the variance of the gain process: The Fano factor is around 1 for very low spike rates and increases linearly with the spike rate. 
In order to meet our stated goal of mathematical simplicity, we imposed three restrictions on the aforementioned parameters: 
  1.  
    Each neuron has the same gain standard deviation, which is set as a model parameter σG.
  2.  
    For all i and j, Display FormulaImage not available = 1.
  3.  
    For all ij, Display FormulaImage not available = 0.
The second and third restrictions state, respectively, that each neuron shares the same fluctuating gain signal and that each neuron's Poisson spiking process is independent. This is an extreme version of the model of Goris et al. (2014), but it seems not too far from the truth for many pairs of neurons: Very high-quality electrophysiological recordings indicate that, in awake monkeys whose only task is to fixate the stimulus, V1 neurons “show virtually no correlated variability” (Ecker et al., 2010, p. 584), suggesting that Display FormulaImage not available is usually close to zero. Ecker et al. (2014) argue that the very much higher correlations shown in other circumstances, especially under anesthesia, can be accounted for by a single gain variable that varies from trial to trial. They showed that a simple model with independent Poisson spiking processes and a single random gain signal shared between all the neurons was able to capture the dependence of spike-count correlations on firing rate and tuning similarity shown by real neurons. Thus, a model very similar to ours, in which Display FormulaImage not available = 0 and Display FormulaImage not available = 1, has been shown to account for real neuronal data.  
These restrictions help us to simplify the analysis of the model considerably. Firstly, instead of specifying the entire covariance matrix of the population, we can just specify the gain standard deviation σG. The covariance matrix itself still has a complicated structure because each term in the covariance matrix depends on σG and the sensitivities of the two neurons to the stimulus, but all of this complexity can be reduced to a single scalar random variable G with a single parameter σG. Secondly, as explained earlier, we aim to calculate the performance of the decoder that makes best use of the encoded signals; and the best possible decoder will have access to the shared gain signal. If Display FormulaImage not available = 0, then although the neurons' spike counts are correlated due to the shared gain signal, the conditional spiking probabilities (conditioned upon the gain signal) are independent. Thus, a decoder that knows the gain signal can express the spike distributions as independent Poisson distributions, which considerably simplifies both the maximum-likelihood decoding algorithm and our mathematical analysis of its performance. We also investigated the performance of two suboptimal decoders that did not know the gain signal. Loss of knowledge of the gain signal greatly impaired performance with the Naka-Rushton tuning function, but with the Gaussian tuning function, decoding performance was only slightly affected.  
With the three restrictions listed, we can rewrite Equation 12 as    
The Pearson correlation between the two neurons' spike counts is then given by     
As noted by Goris et al. (2014), their model elegantly captures two key physiological findings, which both follow from their equations: Pair-wise correlations generally increase with both firing rate (de la Rocha, Doiron, Shea-Brown, Josic, & Reyes, 2007; Ecker et al., 2010, 2014; Cohen & Kohn, 2011) and tuning similarity (van Kan, Scobey, & Gabor, 1985; Zohary, Shadlen, & Newsome, 1994; Smith & Kohn, 2008; Ecker et al., 2010, 2014). Supplementary Appendix B confirms that, for nonzero σG, both of these characteristics emerge naturally from Equation 17. The dependence of ρij on tuning similarity is particularly noteworthy, since in our parameterizations of the model we have forced the Poisson and gain correlations Display FormulaImage not available and Display FormulaImage not available to be constant across all pairs of neurons, regardless of tuning similarity.  
The decoding method
In our sensory coding model, a stimulus triggers a set of spike counts n in the neurons being monitored by the observer: The vector n holds the spike counts of all the different neurons. We investigated three methods for decoding these spike counts. 
The first is a maximum-likelihood method that uses knowledge of the gain signal to express the spike distributions as independent Poisson distributions; in this case, maximum-likelihood decoding is straightforward because the likelihood function for the population response is simply the product of the likelihood functions for the individual responses. We call this the Known Gain method. 
We were also interested in finding out how well the spike counts could be decoded without knowledge of the gain signal, to see how critically the accuracy of our equations depended on the assumption that the gain signal is known. If the decoder does not know the gain signal, then it can only express the spike distributions as correlated gamma-Poisson mixture distributions. Maximum-likelihood decoding in this case would require an expression for the multivariate gamma-Poisson mixture distribution, so that we could calculate the likelihood function for the population response. Unfortunately, a closed-form expression for this distribution beyond the bivariate case is not known, so we used two suboptimal methods, which we refer to collectively as the Unknown Gain methods. The first of these methods simply ignores the correlations and decodes the neurons as if they were independent (an approach also taken by Goris et al., 2013). This approach is not always suboptimal: Averbeck, Latham, and Pouget (2006) showed that even in cases where correlations cause a large performance deficit, it can still be optimal to decode the neuronal responses as if they were independent (see figure 3a of Averbeck et al., 2006). More generally, however, it is more informative to take account of the correlations when decoding a set of correlated neurons. The second of our Unknown Gain methods uses an analytical form for the bivariate gamma-Poisson mixture distribution to decode the population in a way that takes into account all pair-wise statistical dependencies but not higher order dependencies. We refer to these two Unknown Gain methods as the Univariate and Bivariate methods, respectively. We find that the Bivariate method performs better than the Univariate method. Surprisingly, when the tuning function is a Gaussian, both methods are almost as good as the Known Gain method. 
The Known Gain decoder
Here, the spike counts of the neurons are decoded using a maximum-likelihood method in which the gain signal is known. In maximum-likelihood decoding, we find the stimulus value x that had the highest probability of giving rise to the obtained set of spike counts, i.e., the value of x that maximizes the probability P(N = n|X = x, G = g). Because of the statistical independence of the spiking distributions conditioned on the gain signal, we can write   where K is the number of neurons and the neurons are indexed by j. The second equality (Equation 19) arises because each rj (x) is a deterministic function of the stimulus value x. For large populations, the product in Equation 19 can be too small to represent using floating-point values on a standard computer, so instead we calculated the logarithm of this value, which peaks at the same value of x and is given by  The probability in each term in Equation 20 is evaluated using Equation 10, and the decoder uses the simplex search method (Nelder & Mead, 1965) to find the stimulus value x that maximizes this sum.  
The Univariate (Unknown Gain) decoder
Without knowledge of the gain signal, the spiking distributions are negative binomial distributions with the form given in Equation 11. The Univariate decoder decodes the spike counts as if they were independent. Thus we found the stimulus value x that maximized the sum of log probabilities Display FormulaImage not available , where the probability in each term is evaluated using Equation 11.  
The Univariate decoder is not as good as the Bivariate decoder (described next), and our principal motivation for including it was to confirm that the Bivariate decoder did indeed perform better. 
The Bivariate (Unknown Gain) decoder
Arbous and Kerrich (1951) derived an expression for the bivariate gamma-Poisson mixture distribution, which gives the joint probability of the spike counts of any pair of neurons in our restricted parameterizations of the spiking model of Goris et al. (2014). Substituting our variables for the corresponding variables or expressions in Arbous and Kerrich's equation 5.11, we obtain   Here we have deliberately expressed Arbous and Kerrich's equation in a form that makes explicit the many parallels between the expressions for the bivariate and univariate distributions (Equations 22 and 11, respectively).  
To decode the neuronal population, we took each pair-wise combination of neurons and calculated the likelihood for that pair, using Equation 22. The estimated stimulus value x was that which maximized the sum of log likelihoods across all pairs of neurons. Thus, we decoded the population as if each pair of neurons were statistically independent from each other pair. As noted earlier, this decoding method takes account of pair-wise statistical dependencies but not higher order dependencies. 
Parameterizing the population
In principle, every parameter of every neuron in the population could vary independently. However, to maximize the simplicity of the model, we consider two simpler, more restricted classes of parameterization. 
The Constant parameterization
To begin with, we consider a very simple class of parameterizations, in which rmax, r0, q, and σG are each constant across different neurons, and the tuning curve centers z are equally spaced along the x-axis, with constant spacing δz between values of zmin and zmax. We define a density parameter h equal to 1/δz. The model thus has seven parameters: rmax, r0, q, σG, h, zmin, and zmax. Since each of these parameters is assumed to be constant across different neurons, we call this the Constant parameterization. 
One important property of the Constant parameterization is that if the stimulus value x is the logarithm of the physical stimulus level ξ, then the model will behave according to Weber's law. The reason for this is that with all the parameters constant, if we move from a low stimulus value to a high stimulus value, we are faced with an identical decoding situation, just shifted along the x-axis: The tuning functions have the same shape and density, and the noise properties are the same. Thus the standard deviation of the decoded value of x will be constant across stimulus levels. This means that Δxθ, the just-noticeable difference in x, will be constant. If ξp is the pedestal stimulus (i.e., the lower of the two stimuli to be discriminated) expressed in physical units, and Δξθ is the just-noticeable difference in physical units for a pedestal of ξp, then Δxθ = logb(ξp + Δξθ) − logb(ξp). This can be rearranged to give Δξθ/ξp = Display FormulaImage not available − 1, which is constant for constant Δxθ. Thus, the Weber fraction Δξθ/ξp is constant, which is the definition of Weber's law.  
The Exponential parameterization
Although Weber's law has often been found in suprathreshold discrimination experiments, the Weber fraction for contrast discrimination generally decreases with increasing contrast, to give a slope of around 0.6–0.7 when the threshold difference of Michelson contrasts is plotted as a function of the pedestal Michelson contrast on log-log axes (Legge, 1981; Meese, Georgeson, & Baker, 2006). This is often referred to as the near-miss to Weber's law. To capture this behavior, we used a parameterization in which h, rmax, and q can increase with the tuning function's position along the x-axis. Specifically, we let h, rmax, and q vary as exponential functions of the tuning function position z:    The parameters kh, Display FormulaImage not available , and kq give the values of h, rmax, and q when z = 0; mh, Display FormulaImage not available , and mq are parameters that determine how rapidly h, rmax, and q change as a function of z. We call this the Exponential parameterization. It is a generalization of the Constant parameterization: The Constant parameterization is the Exponential parameterization with mh = Display FormulaImage not available = mq = 0. For simplicity, we will assume that r0/rmax is constant in the Exponential parameterization. As before, z falls between zmin and zmax. Supplementary Appendix C shows how to generate a set of z values when h varies exponentially across the stimulus axis.  
Predicting model performance
As noted in the Introduction, the decoding precision (i.e., the reciprocal of the variance of the decoded stimulus value ) is closely approximated by the Fisher information. In this section, we derive expressions for the Fisher information for decoding the neurons when the gain is known. On each trial, the tuning function of each neuron in our model is multiplied by the gain signal g, so the effective tuning function for neuron j on that trial is grj (x). If the decoder knows the gain signal, then it knows the effective tuning functions grj (x) and it can express the spiking distributions as independent Poisson distributions. For a set of independent Poisson-spiking neurons with tuning functions grj (x), the Fisher information is given by  where Display FormulaImage not available (x) is the first derivative with respect to x of neuron j's tuning function (see Dayan & Abbott, 2001, chapter 3). Thus the variance of the estimated stimulus value on trials with stimulus x and gain g will be approximated by  Over all trials with stimulus x, the variance will be given by   Thus, the precision τ(x) for decoding a stimulus with value x is given by    
Supplementary Appendix D shows that, for the gamma-distributed gain signal in the model of Goris et al. (2014), mean[1/G] = 1/(1 − Display FormulaImage not available ), giving    
From Relation 31, we have τ(x) ≈ J(x, 1 − Display FormulaImage not available ). Interestingly, 1 − Display FormulaImage not available is the value of the gain with the highest probability density. Thus, the decoding precision is the modal value of the Fisher information across trials, not the mean.  
In this article, we consider two-alternative forced-choice discrimination experiments, in which observers have to discriminate a pedestal stimulus value from a slightly higher stimulus value. Suppose x is the log to base b of the physical stimulus value ξ. If the pedestal stimulus value is ξp and the difference in physical stimulus values at threshold is Δξθ, then the Weber fraction W is defined as    
Supplementary Appendix E shows that, in this case, the Weber fraction is approximated by  where Φ is the integral of a Gaussian with unit area and variance and zero mean, and Pθ is the proportion of correct responses that defines the threshold level. We can then calculate the Weber fraction by using Relation 31 to substitute for τ(x) in Relation 33.  
In summary, to predict the model's performance, we first calculate J(x, 1), the Fisher information that would occur when the gain is 1. Then we multiply this by 1 − Display FormulaImage not available to give τ(x), the overall precision when the gain varies with standard deviation σG (Relation 31). We can then calculate the Weber fraction from the precision using Relation 33. The next two sections derive various expressions for J(x, 1) for different model parameterizations. The first of these two sections derives exact expressions; the second derives compact approximations of these expressions, which can often give a better insight into the way that the different neuronal parameters are related to perceptual performance in a population-coding model.  
Exact expressions for the Fisher information
Using the Naka-Rushton function (Equation 4) to expand rj (x) in Equation 26, we get  where the parameters on the right-hand side are the neuronal parameters of the Naka-Rushton function from Equation 4, and can vary from neuron to neuron (strictly speaking, each parameter should be indexed by the neuron number j, but we omit these indices for clarity). The left subscript on N-RJ(x, 1) indicates the form of the tuning function, in this case “N-R” for Naka-Rushton. When r0 = 0, Equation 34 reduces to    
Going through a similar process for the Gaussian tuning function, we obtain  which, when r0 = 0, reduces to    
Equations 34 through 37 are good for fitting the model to psychophysical data, but each of these equations is a sum with one term for each neuron, and this complexity can make it difficult to see how the different neuronal parameters are related to perceptual performance in a population-coding model. We therefore derived approximations of these equations by approximating the sum using an integral. This allowed us to write down the Fisher information for the whole population in a single compact expression. We call these expressions integral approximations. To distinguish them from the true Fisher information expressions, we use the letter I (for “integral”) to represent them rather than J
These approximations are described in the next section. We consider three classes of parameterization: the Constant parameterization with the Naka-Rushton and the Gaussian tuning functions and the Exponential parameterization with the Naka-Rushton tuning function. For each of these, we derive an integral approximation of the Fisher information. These expressions for Fisher information can be used in place of J(x, 1) in Relation 31 to predict the decoding precision. 
Integral approximations of the Fisher information
The Constant Naka-Rushton parameterization
The Constant Naka-Rushton parameterization is the Constant parameterization that uses the Naka-Rushton tuning function. Since all the parameters except z are constant, and δz = 1/h, we can rearrange the right-hand side of Equation 34 to give  As δz approaches zero, the right-hand side of Equation 38 can be approximated by an integral, which we call Display FormulaImage not available (x, 1). As long as the stimulus value x is sufficiently far from the edges of the range of z values, we can take the limits of z to be ±∞, giving    
We used Mathematica (Wolfram Research) to solve the integral and obtained  where  Figure 2 plots Q(r0/rmax) for 0 ≤ r0/rmax ≤ 1, and it can be seen that Q(r0/rmax) smoothly decreases with increasing r0/rmax. Thus, as r0 increases from 0, Display FormulaImage not available (x, 1) undergoes a multiplicative attenuation that is a function only of the ratio r0/rmax. Because the attenuation is a function of the ratio r0/rmax rather than r0 alone, we will often take this ratio, rather than r0, to be a model parameter. We refer to this ratio as the relative spontaneous firing rate. When r0 = 0, Q(r0/rmax) = 1, and Equation 40 reduces to  The ln(b)/2 part of this expression is just a constant that depends on the arbitrary choice of base of logarithm that we use to represent contrast (and reduces to 1 for b = e2). The interesting part is rmaxqh: This is the simplest expression that we could possibly imagine, given that the Fisher information has to increase with increasing rmax, q, and h. Equation 42 therefore reveals a remarkably straightforward relationship between the Fisher information and the neuronal parameters.  
Figure 2
 
Attenuation of Fisher information for nonzero r0. The curve plots Q(r0/rmax) as defined in Equation 41.
Figure 2
 
Attenuation of Fisher information for nonzero r0. The curve plots Q(r0/rmax) as defined in Equation 41.
Note that both expressions for Display FormulaImage not available (x, 1) (Equations 40 and 42) are independent of the stimulus value x. From Relation 33, this leads to a constant Weber fraction, i.e., Weber's law.  
The Constant Gaussian parameterization
The Constant Gaussian parameterization is the Constant parameterization that uses the Gaussian tuning function. We begin by assuming that r0 = 0. Since all the parameters except z are constant, and δz = 1/h, we can rearrange the right-hand side of Equation 37 to get  When δz approaches zero, the right-hand side of Equation 43 can be approximated by an integral, which we call Display FormulaImage not available (x, 1):  The integral in Equation 44 is a standard definite integral, and Equation 44 reduces to    
Equation 45 applies only to the case of r0 = 0 because we started with Equation 37. If instead we start with Equation 36, we obtain an expression that applies to all r0:    
When δz approaches zero, the right-hand side of Equation 46 can be approximated by an integral:  where    
In Supplementary Appendix F, we show that  Using Equation 49 to substitute for S(q) in Relation 47, we obtain  We can write S(1) as  Note that we were able to drop the x that appears in the function being integrated, because this just shifts the function horizontally by a finite amount x along the z-axis but does not change its integral between infinite limits; so S(1) is a function of r0/rmax only. Unfortunately, for r0 > 0, we cannot find a closed-form expression for S(1). However, for the range of relative spontaneous firing rates likely to occur, we have found that it can be very closely approximated by  where the function Q is defined in Equation 41. Supplementary Appendix F shows that, for 0 < r0/rmax < 0.119, the approximation on the right-hand side of Relation 52 slightly overestimates S(1) by a factor that never exceeds 0.7% of the true value; for r0/rmax > 0.120, the right-hand side of Relation 52 underestimates S(1), but not by much: Even for r0/rmax as high as 1, the underestimation is only about 3%, and it is always less than 6% of S(1). Using Relation 52 to substitute for S(1) in Relation 50, we obtain our integral approximation Display FormulaImage not available (x, 1) for any r0:    
Note that, apart from having a different multiplicative constant—2Display FormulaImage not available instead of ln(b)/2—Equation 53 is identical to Equation 40, which gives the corresponding expression for the Naka-Rushton tuning function. Similarly, Display FormulaImage not available (x, 1) is independent of x, which leads to Weber's law.  
The Exponential Naka-Rushton parameterization
The Exponential Naka-Rushton parameterization is the Exponential parameterization that uses the Naka-Rushton tuning function. We begin by assuming that r0 = 0, so the Fisher information is given by Equation 35. For now, let us also assume that q is constant, while h and rmax vary with z according to Equations 23 and 24. Then we can rearrange the right-hand side of Equation 35 to get  Using Equation 24 to substitute for rmax in Equation 54, we have    
To convert to an integral, we need to transform the z values so that the transformed values ζ are equally spaced. Supplementary Appendix C shows that an appropriate transformation is given by  giving  Using Equation 57 to substitute for z in Equation 55, we obtain  As shown in Supplementary Appendix C, the definition of ζ in Equation 56 causes the neurons to be separated in equal steps of size δζ = 1/kh along the ζ axis when h varies exponentially with z according to Equation 23. Thus we have  As δζ approaches zero, the right-hand side of Equation 59 can be approximated by an integral, which we call Display FormulaImage not available (x, 1):   The limits of 0 and ∞ on the integral arise because, as before, we assume that the stimulus value x is far from the ends of the range of z values, and so the limits of z are effectively ±∞; from Equation 56, as z approaches −∞, ζ approaches zero; and as z approaches ∞, ζ approaches ∞. In Supplementary Appendix G, we derive an expression for the integral in Equation 60. This integral has a finite solution if qlnb > mh + Display FormulaImage not available , in which case the solution is given by Equation G.18. Using Equation G.18 to substitute for the integral in Equation 60, and simplifying, we obtain  where  and  subject to the restriction that  It can be shown that Display FormulaImage not available (x, 1) approaches Display FormulaImage not available (x, 1) as m approaches zero, which is essential because, if the m parameters are all zero, then the Exponential parameterization reduces to the Constant parameterization.  
Equation 61 shows two notable features. Firstly, the Fisher information is an exponential function of x: Specifically, when h ∝ exp(mhz) and rmax ∝ exp(Display FormulaImage not availablez), the Fisher information is proportional to exp[(mh + Display FormulaImage not available )x]. Secondly, Display FormulaImage not available (x, 1) is a function of the sum of mh and Display FormulaImage not available , regardless of their individual values. Thus, mh and Display FormulaImage not available can be exactly traded off against each other and the Fisher information will not change, as long as mh + Display FormulaImage not available remains constant.  
Equation 61 assumes that q is constant (i.e., mq = 0). Allowing q to vary with z greatly complicates the integral, and we were unable to solve it, so instead we used an approximation. First, we extend the definition of γ (Equation 63) as follows:  Here, q is treated as a function of x, but using the parameters that define how it varies with z (Equation 25); however, the approximation is good enough, because the performance for a log contrast of x will be dominated by the neurons with z close to x. If we then use Equation 65 instead of Equation 63 to substitute for γ in Equation 61, we obtain a very good approximation of the Fisher information when q varies as an exponential function of z.  
Equation 61 also assumes r0 = 0. We found that increasing r0 causes an approximately multiplicative attenuation that is close to the factor Q(r0/rmax) (Equation 41) derived for the Constant parameterization. Thus, even though the Q function was not derived for the Exponential parameterization, we can borrow it to approximate the effect of nonzero r0 for this parameterization:  where m is given by Equation 62 and γ is given by Equation 65.  
Fitting the model to psychophysical data
In this section, we test the accuracy of our equations by comparing their predictions against the true performance of the model, determined from Monte Carlo simulations. In principle, we could make this comparison for any set of model parameters. However, we are most interested in testing the accuracy of our equations when the model's performance is close to human levels. The Cramér–Rao bound given by the Fisher information can substantially overestimate the decoding precision when the performance level or number of neurons is low (Xie, 2002; May & Solomon, 2015); if the Fisher information deviated substantially from the model's decoding precision at performance levels shown by humans, then our equations would be of little value. It is therefore important to verify that our equations closely predict the model's decoding precision for the performance levels shown by observers in psychophysical tasks. The best way to be sure of this is to compare the simulations and equations at human performance levels, i.e., to fit the model to psychophysical data. 
Because the model performance can be estimated so quickly from the equations, we used the equations to fit the model to psychophysical data, and then we carried out Monte Carlo simulations using the fitted parameter values. This section therefore serves two purposes: As well as validating our equations for relevant performance levels, it provides a demonstration of how our equations can be used to fit the generic sensory coding model to psychophysical data. 
The parameters were fitted to the psychophysical data using the simplex algorithm to minimize the sum of squared differences between predicted and actual log discrimination thresholds. On each iteration of the fitting procedure, the population of neurons was set up as described in Supplementary Appendix C; then the exact Fisher information for each pedestal level was found, using Equation 34 or 36 as appropriate, and this was used to calculate the precision τ(x) using Relation 31. The precision was then used to calculate the Weber fraction W, using Relation 33 with Pθ set to the threshold performance level that had been used in the psychophysical study. The predicted Weber fraction was multiplied by the pedestal level to give the predicted threshold (Δfθ in Figure 3, or Δcθ in Figures 6 and 7). We took the logarithms of these predicted thresholds and found the sum of squared differences between the predicted log thresholds and the log thresholds from the psychophysical data. The simplex algorithm adjusted the model parameters to minimize this sum. 
Figure 3
 
Fitting the Constant Gaussian parameterization to data from Mayer and Kim (1986). The black triangles in the bottom panels show Mayer and Kim's spatial frequency discrimination data for their subject MJ, condition R 2PFC, transcribed from their figure 7. Mayer and Kim's data actually show the difference in spatial frequency between the 0.25 point and 0.75 point of the psychometric function. The pedestal frequency should fall halfway between these points, so we halved their thresholds to obtain Δfθ, the difference between target and pedestal at a threshold performance level of Pθ = 0.75. Four of the model's parameters were fixed as follows: zmin = −0.3, zmax = 1.7, r0/rmax = 0.03, and ω = 1.5 octaves. The values of rmax and σG were set as indicated above the top of each panel in the top row. The remaining parameter, h, was fitted to Mayer and Kim's data; the fitted values are shown in the top panels. Having fitted this parameter, we carried out Monte Carlo simulations as described in the text and in Supplementary Appendix H. The blue circles in the top row of panels plot the decoding precision from the Monte Carlo simulations using the Known Gain decoder. These points show decoding precision for every 30th value along the x-axis that we calculated. The rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used in the 2AFC simulations, where we needed a fairly fine sampling of the x-axis to fit the psychometric function and hence find the discrimination threshold. The red lines in the top row of panels plot the precision predicted from the Fisher information GaussJ(x, 1) (Equation 36), while the thick gray lines plot the precision predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 53); in both cases, we converted ω to q using Equation 9 before applying Equation 36 or 53. The predicted precision is found by multiplying the Fisher information by 1 − Image not available (see Relation 31). In the bottom row, the blue circles plot the thresholds Δfθ obtained from the Monte Carlo simulations of the 2AFC discrimination task with the Known Gain decoder. The red lines in the bottom row plot the thresholds predicted from GaussJ(x, 1) using Relation 33 with Pθ = 0.75. The thick gray lines plot the thresholds predicted in the same way, except using Image not available (x, 1) to approximate the Fisher information. Note that the abscissas in the top row are identical to those on the bottom row, i.e., each position on the abscissa in the top row represents the same stimulus as the same position on the abscissa in the bottom row. In the top row, we have marked the abscissas with log units, since the precision is calculated from the values in these units; in the bottom row, we have marked the abscissas with linear units, to be compatible with the threshold, which is defined as the difference of spatial frequencies.
Figure 3
 
Fitting the Constant Gaussian parameterization to data from Mayer and Kim (1986). The black triangles in the bottom panels show Mayer and Kim's spatial frequency discrimination data for their subject MJ, condition R 2PFC, transcribed from their figure 7. Mayer and Kim's data actually show the difference in spatial frequency between the 0.25 point and 0.75 point of the psychometric function. The pedestal frequency should fall halfway between these points, so we halved their thresholds to obtain Δfθ, the difference between target and pedestal at a threshold performance level of Pθ = 0.75. Four of the model's parameters were fixed as follows: zmin = −0.3, zmax = 1.7, r0/rmax = 0.03, and ω = 1.5 octaves. The values of rmax and σG were set as indicated above the top of each panel in the top row. The remaining parameter, h, was fitted to Mayer and Kim's data; the fitted values are shown in the top panels. Having fitted this parameter, we carried out Monte Carlo simulations as described in the text and in Supplementary Appendix H. The blue circles in the top row of panels plot the decoding precision from the Monte Carlo simulations using the Known Gain decoder. These points show decoding precision for every 30th value along the x-axis that we calculated. The rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used in the 2AFC simulations, where we needed a fairly fine sampling of the x-axis to fit the psychometric function and hence find the discrimination threshold. The red lines in the top row of panels plot the precision predicted from the Fisher information GaussJ(x, 1) (Equation 36), while the thick gray lines plot the precision predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 53); in both cases, we converted ω to q using Equation 9 before applying Equation 36 or 53. The predicted precision is found by multiplying the Fisher information by 1 − Image not available (see Relation 31). In the bottom row, the blue circles plot the thresholds Δfθ obtained from the Monte Carlo simulations of the 2AFC discrimination task with the Known Gain decoder. The red lines in the bottom row plot the thresholds predicted from GaussJ(x, 1) using Relation 33 with Pθ = 0.75. The thick gray lines plot the thresholds predicted in the same way, except using Image not available (x, 1) to approximate the Fisher information. Note that the abscissas in the top row are identical to those on the bottom row, i.e., each position on the abscissa in the top row represents the same stimulus as the same position on the abscissa in the bottom row. In the top row, we have marked the abscissas with log units, since the precision is calculated from the values in these units; in the bottom row, we have marked the abscissas with linear units, to be compatible with the threshold, which is defined as the difference of spatial frequencies.
Modeling Weber's law with Gaussian-tuned neurons: The Constant Gaussian parameterization
As noted earlier, the Constant parameterizations give rise to Weber's law. Mayer and Kim's (1986) data on spatial frequency discrimination conform to Weber's law, and spatial frequency tuning functions in primary visual cortex are approximately Gaussian functions of log spatial frequency (De Valois, Albrecht, & Thorell, 1982), so it is appropriate to fit the Constant Gaussian parameterization to Mayer and Kim's data. These data are plotted as black triangles in Figure 3
In the case of Weber's law for spatial frequency discrimination, if the pedestal frequency is fp and the frequency difference at threshold is Δfθ, then the plot of Δfθ against fp on log-log axes will be a straight line of gradient 1. There is only one degree of freedom in this plot—the vertical position—so we needed only one free parameter to fit the data. Our approach was to hold all the model parameters constant except for the density h, and to fit h to the psychophysical data. 
For the modeling shown in Figure 3, we set the tuning bandwidth ω to 1.5 octaves, which is close to the median value found physiologically (De Valois et al., 1982); q was then calculated from ω using Equation 9. We set r0/rmax to 0.03 for all the modeling; the rationale for this choice was that Geisler and Albrecht (1997) found that the median r0 for a 200-ms stimulus was 0.17 for monkey V1 neurons, which is 0.03 when expressed as a proportion of median rmax = 5.7 spikes for neurons tuned to the stimulus. Different columns of panels in Figure 3 show results for different combinations of σG and rmax (as indicated above each panel in the top row). The values of σG = 0.2 and 0.4 are close to the mean values obtained by Goris et al. (2014) for awake and anaesthetized monkeys, respectively. The lower value of rmax = 4 was close to the median value (5.7 spikes) reported by Geisler and Albrecht (1997) for a 200-ms stimulus, and the purpose of the higher value (rmax = 16) was to show how this affects the size of the neuronal correlations and Fano factors. 
Having fitted h, we constructed a set of model neurons with z equally spaced along the log spatial frequency axis between zmin = −0.3 and zmax = 1.7 at spacing δz = 1/h. Then we performed Monte Carlo simulations. The full details of the Monte Carlo simulations are given in Supplementary Appendix H, but briefly, they were carried out as follows. First, we sampled a large number of points along the stimulus (x) axis. For each stimulus value x, we used the stochastic spiking model to generate 10,000 sets of spike counts and decoded each set of spike counts to give 10,000 estimated stimulus levels. The decoding precision was then calculated as the reciprocal of the variance of the stimulus estimates (decoding precision for the Known Gain decoder is plotted as blue circles in the top row of panels in Figure 3). The stimulus estimates were also used to simulate a 2AFC discrimination task (described in detail in Supplementary Appendix H). 2AFC trials consisted of two stimulus presentations, each with a different randomly generated gain value g. The lower stimulus value was the pedestal, and the higher value was the target. On each 2AFC trial, the model selected as the target the stimulus with the highest estimated value, and we found the proportion of correct responses for each combination of pedestal and target. For each pedestal, we fitted a Weibull psychometric function (May & Solomon, 2013) to the model's proportion-correct data and obtained a discrimination threshold from the fitted function as described in Supplementary Appendix H. The discrimination thresholds for the Known Gain decoder are plotted as blue circles in the bottom row of panels in Figure 3
The red lines in the top row of panels give the predicted decoding precision, calculated from the Fisher information Display FormulaImage not available (x, 1) using Relation 31 and Equation 36. Table 1 shows that the true decoding precision obtained from the Monte Carlo simulations with known gain differs from the predicted value by less than 0.5%. This close match confirms that the Fisher information gives a sufficiently close approximation of decoding precision to allow us to calculate model performance and gain insights into the relationships between physiology and behavior.  
Table 1
 
Precision scores expressed as a proportion of the precision predicted from GaussJ(x, 1) for the Constant Gaussian parameterizations of Figure 3. For each of the 104 stimulus levels x between 0.6 and 0.8, we expressed the precision for each decoder as a proportion of that predicted from GaussJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 104 ratios. For each ratio, the precisions were calculated from 10,000 trials, except for the Bivariate decoder on the two conditions with rmax = 4. These conditions had a very large number of neurons, so the Bivariate decoding algorithm, which calculated a likelihood for every pair of neurons, was very slow. Due to time constraints, we were only able to decode 4,200 of the 10,000 trials for σG = 0.2, rmax = 4, and 3,500 of the 10,000 trials for σG = 0.4, rmax = 4. The slightly higher precision score for the Bivariate over the Known Gain decoder for σG = 0.2, rmax = 4 is a result of sampling error due to an insufficient number of trials.
Table 1
 
Precision scores expressed as a proportion of the precision predicted from GaussJ(x, 1) for the Constant Gaussian parameterizations of Figure 3. For each of the 104 stimulus levels x between 0.6 and 0.8, we expressed the precision for each decoder as a proportion of that predicted from GaussJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 104 ratios. For each ratio, the precisions were calculated from 10,000 trials, except for the Bivariate decoder on the two conditions with rmax = 4. These conditions had a very large number of neurons, so the Bivariate decoding algorithm, which calculated a likelihood for every pair of neurons, was very slow. Due to time constraints, we were only able to decode 4,200 of the 10,000 trials for σG = 0.2, rmax = 4, and 3,500 of the 10,000 trials for σG = 0.4, rmax = 4. The slightly higher precision score for the Bivariate over the Known Gain decoder for σG = 0.2, rmax = 4 is a result of sampling error due to an insufficient number of trials.
The red lines in the bottom row of panels show the discrimination thresholds predicted from Display FormulaImage not available (x, 1) using Relation 33 with Pθ = 0.75. It was these predicted thresholds that were used to fit the model to Mayer and Kim's (1986) data in the first place, so it is no surprise that they fit well to Mayer and Kim's data. What is more important is how accurately the thresholds predicted from the Fisher information match those obtained from the Monte Carlo simulations with the Known Gain decoder (compare the red lines against the blue circles in the bottom row of Figure 6).  
The thick gray lines in the top row of Figure 3 plot the precision predicted from the integral approximation of the Fisher information Display FormulaImage not available (x, 1), as defined in Equation 53. The thick gray lines in the bottom row plot the thresholds predicted from Display FormulaImage not available (x, 1). The integral approximation of the Fisher information differs from the true Fisher information by less than 1% (see Table 1) and provides a better insight into the relationship between psychophysical performance and the neuronal parameters than we get from Display FormulaImage not available (x, 1), which is a sum with one term for each neuron.  
In the Monte Carlo simulations described so far, the decoder knew the gain on each stimulus presentation. We also carried out analogous simulations using the Univariate and Bivariate decoders, which did not know the gain signal. These decoders were applied to the same simulated spike data as the Known Gain decoder—it was just the decoding algorithm that differed. Table 1 shows that, for three of the four conditions, the precision of the Bivariate decoder was within 2% of the value predicted from the Fisher information. For the most physiologically plausible condition (σG = 0.2, rmax = 4), the Bivariate decoder's performance was almost identical to the predicted value; even on the Bivariate decoder's worst condition (σG = 0.4, rmax = 16), its precision was within 6% of the value predicted from the Fisher information. This is remarkably close, considering that the estimate based on the Fisher information assumes that the decoder knows the gain signal on each trial and is performing maximum-likelihood estimation; in reality, the Bivariate decoder does not know the gain signal and is not a maximum-likelihood decoder, as it can only take account of pair-wise statistical dependencies. The true maximum-likelihood decoder for the Unknown Gain situation would almost certainly yield a precision even closer to the predicted value. 
Why should knowledge of the gain signal yield so little benefit? For a Gaussian-tuned neuron, increasing the gain has a similar effect to moving the stimulus value toward the peak of the neuron's function curve. In response to an unknown change of gain, neurons with tuning peaks on either side of the true stimulus value will tend to pull the decoded stimulus value in opposite directions, so the effects largely cancel out, and gain fluctuations have little effect on the decoded stimulus values even when the gain value is unknown to the decoder. Because the gain signal conveys so little extra information, we can relax the assumption that the gain signal is known, and our equations still predict performance closely. 
The closeness of the Bivariate and Univariate decoders' precision values shows that it is not even particularly important for the decoder to take into account pair-wise statistical dependencies. The Univariate decoder decodes the neuronal responses as if they were entirely independent of one another, yet for three of the four conditions, the precision of the Univariate decoder was within 1% or 2% of the value predicted from the Fisher information. 
We also analyzed the spike counts generated by the model to check that they showed the expected Fano factors and spike-count correlations. As already noted, the Fano factor of these model neurons depends on the mean spike count and the standard deviation of the gain: Equation 14 shows that the plot of Fano factor against mean spike count should be a straight line, with gradient Display FormulaImage not available , passing through the point (0, 1). Figure 4 confirms that our model neurons do show this relationship. Figure 5 confirms that the spike-count correlations between pairs of model neurons follow the pattern predicted by Equation 17. The ranges covered by the thick pink lines in Figures 4 and 5 approximately indicate the ranges of Fano factors and correlations that occur in each parameterization.  
Figure 4
 
Fano factors of the neurons in the modeling of Figure 3. Each panel in this figure shows the Fano factors for the parameterization in the corresponding column of panels in Figure 3. For each model neuron and each stimulus level, we found the mean and variance of the spike count across the 10,000 repetitions. We divided the variance by the mean to give the Fano factor. Because of the large number different combinations of neuron and stimulus level (>105 in each panel), there were too many points to plot as individual dots, so we sorted the points in order of mean spike count and then chunked them into groups of 1,000. The thick pink lines plot the average mean spike count against the average Fano factor for each group. The black line in each plot is the predicted relationship between mean and Fano factor given by Equation 14.
Figure 4
 
Fano factors of the neurons in the modeling of Figure 3. Each panel in this figure shows the Fano factors for the parameterization in the corresponding column of panels in Figure 3. For each model neuron and each stimulus level, we found the mean and variance of the spike count across the 10,000 repetitions. We divided the variance by the mean to give the Fano factor. Because of the large number different combinations of neuron and stimulus level (>105 in each panel), there were too many points to plot as individual dots, so we sorted the points in order of mean spike count and then chunked them into groups of 1,000. The thick pink lines plot the average mean spike count against the average Fano factor for each group. The black line in each plot is the predicted relationship between mean and Fano factor given by Equation 14.
Figure 5
 
Spike-count correlations between pairs of neurons in the modeling of Figure 3. Each panel in this figure shows the correlations for the parameterization in the corresponding column of panels in Figure 3. Because of the large number of pairs of neurons, we just analyzed the responses to the stimulus level in the middle of the range of stimuli used in the modeling. For each pair of model neurons, we found the Pearson correlation between the spike counts of the two neurons across the 10,000 repetitions. We then sorted the data points in order of the predicted correlation given by Equation 17 and chunked them into groups of 1,000. The thick pink lines plot the mean predicted correlation against the mean actual correlation for each group. The black lines plot the outcome that would occur if the actual correlation were always exactly equal to the predicted correlation.
Figure 5
 
Spike-count correlations between pairs of neurons in the modeling of Figure 3. Each panel in this figure shows the correlations for the parameterization in the corresponding column of panels in Figure 3. Because of the large number of pairs of neurons, we just analyzed the responses to the stimulus level in the middle of the range of stimuli used in the modeling. For each pair of model neurons, we found the Pearson correlation between the spike counts of the two neurons across the 10,000 repetitions. We then sorted the data points in order of the predicted correlation given by Equation 17 and chunked them into groups of 1,000. The thick pink lines plot the mean predicted correlation against the mean actual correlation for each group. The black lines plot the outcome that would occur if the actual correlation were always exactly equal to the predicted correlation.
Modeling Weber's law for suprathreshold contrast discrimination: The Constant Naka-Rushton parameterization
The Constant parameterization yields Weber's law, and the Naka-Rushton function is the tuning function for contrast. Thus, it is appropriate to fit the Constant Naka-Rushton parameterization to the data from a contrast discrimination experiment that gave rise to Weber's law. Although contrast discrimination generally shows a near-miss to Weber's law (Legge, 1981; Meese et al., 2006), there have been some reports of Weber's law for this task (Swift & Smith, 1983; Bird, Henning, & Wichmann, 2002). We fitted the model to some of the data from Bird et al. (plotted as black triangles in Figure 6) and carried out simulations analogous to those in Figure 3, but using the Naka-Rushton tuning function instead of the Gaussian. The model parameters are given in the caption of Figure 6. In this simulation, we only considered psychophysical data from pedestals that were clearly suprathreshold: Very low contrasts would fall below the range of semisaturation contrasts found physiologically and would not stimulate many neurons, and in these circumstances, the Fisher information is unlikely to be close to the decoding precision, and cannot be used for estimating the model's psychophysical performance. 
Figure 6
 
Fitting the Constant Naka-Rushton parameterization to data from Bird et al. (2002). The black triangles in the bottom panels show the contrast discrimination data from Bird et al. (2002) for their subject GBH, for 4.19-c/deg sine-wave gratings, transcribed from their figure 3b. Four of the model's parameters were fixed as follows: q = 3, r0/rmax = 0.03, zmin = −3, zmax = 1 (q = 3 is close to the mean value found physiologically; see table 1 of May & Solomon, 2015). The values of rmax and σG were set as indicated above the top of each panel in the top row . The remaining parameter, h, was fitted to the data from Bird et al.; the fitted values are shown in the top panels. Monte Carlo simulations were conducted in a similar way to those in Figure 3, except using the Naka-Rushton tuning function. All plotting conventions are the same as or analogous to those in Figure 3. The blue circles plot Monte Carlo simulation results for every 18th value along the x-axis that we calculated (the rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used for fitting psychometric functions in the 2AFC simulations). The red lines plot performance predicted from the true Fisher information Image not available (x, 1) (Equation 34), while the thick gray lines plot performance predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 40). Precision was predicted by multiplying the Fisher information by 1 − Image not available (see Relation 31). The predicted discrimination threshold was derived from the predicted precision using Relation 33 with Pθ = 0.75.
Figure 6
 
Fitting the Constant Naka-Rushton parameterization to data from Bird et al. (2002). The black triangles in the bottom panels show the contrast discrimination data from Bird et al. (2002) for their subject GBH, for 4.19-c/deg sine-wave gratings, transcribed from their figure 3b. Four of the model's parameters were fixed as follows: q = 3, r0/rmax = 0.03, zmin = −3, zmax = 1 (q = 3 is close to the mean value found physiologically; see table 1 of May & Solomon, 2015). The values of rmax and σG were set as indicated above the top of each panel in the top row . The remaining parameter, h, was fitted to the data from Bird et al.; the fitted values are shown in the top panels. Monte Carlo simulations were conducted in a similar way to those in Figure 3, except using the Naka-Rushton tuning function. All plotting conventions are the same as or analogous to those in Figure 3. The blue circles plot Monte Carlo simulation results for every 18th value along the x-axis that we calculated (the rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used for fitting psychometric functions in the 2AFC simulations). The red lines plot performance predicted from the true Fisher information Image not available (x, 1) (Equation 34), while the thick gray lines plot performance predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 40). Precision was predicted by multiplying the Fisher information by 1 − Image not available (see Relation 31). The predicted discrimination threshold was derived from the predicted precision using Relation 33 with Pθ = 0.75.
The blue circles in Figure 6 show the Known Gain decoder's decoding precision and discrimination thresholds; these are very well predicted from both the true Fisher information expressions (thin red lines in Figure 6) and the integral approximations (thick gray lines). Table 2 shows that, in each case, the true decoding precision is within 2% or 3% of that predicted from the Fisher information. The integral approximation of the Fisher information is exceptionally close to the true Fisher information. 
Table 2
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Constant Naka-Rushton parameterizations of Figure 6. For each of the 139 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 139 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Table 2
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Constant Naka-Rushton parameterizations of Figure 6. For each of the 139 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 139 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Table 2 shows that, unlike with Gaussian tuning functions, both Unknown Gain decoders were much worse than the Known Gain decoder. The Bivariate decoder does better than the Univariate decoder, but in both cases, the gain fluctuations have a catastrophic effect on decoding precision. It is easy to see why this happens. For any Naka-Rushton-tuned neuron, an increase in gain has the same effect as an increase in contrast, so if the gain is unknown, then random gain fluctuations will be interpreted as fluctuations in contrast, and on each stimulus presentation, the decoded contrast will be biased in the direction of the gain value on that stimulus presentation, leading to inaccurate estimation. 
Modeling the near-miss to Weber's law for contrast discrimination: The Exponential Naka-Rushton parameterization
The black triangles in the bottom row of Figure 7 show human contrast discrimination data from Meese et al. (2006). The log-log slope is clearly shallower than that obtained by Bird et al. (2002), plotted in Figure 6. We fitted the Exponential Naka-Rushton parameterization to the data of Meese et al. by using the Fisher information to predict the model's thresholds, as described earlier, and performed similar Monte Carlo simulations (see Supplementary Appendix H). The psychophysical data have two degrees of freedom: the height and slope of the log-log plot of threshold against pedestal. Thus, we now needed two degrees of freedom in our model fit. In all the parameterizations in Figure 7, we fitted the parameter kh, which gives the density for a stimulus value of x = 0. Each column of Figure 7 uses a different choice of which other parameter(s) to fit. In column A, we fitted mh, so that h varied with z, and set the other m parameters (Display FormulaImage not available and mq) to zero so that rmax and q were constant across the neurons (equal to Display FormulaImage not available and kq, respectively). In column B, we fitted Display FormulaImage not available , so that rmax varied with z, and set the other m parameters (mh and mq) to zero. In column C, we fitted mq, so that q varied with z, and set the other m parameters (mh and Display FormulaImage not available ) to zero. In column D, we fitted all three m parameters, subject to the constraint that mh = Display FormulaImage not available = mq; thus, there were still only two degrees of freedom in the fit, but h, rmax, and q all varied with z. In each parameterization, all the parameters that were not fitted were set to reasonable values (see the caption of Figure 7 for these values).  
Figure 7
 
Fitting the Exponential Naka-Rushton parameterization to data from Meese et al. (2006). The data are plotted as black triangles in the bottom row of this figure (these data are from the Binocular condition of Meese et al., plotted as squares in their figure 5, and kindly supplied by Tim Meese). Each column of panels gives the data for one parameterization. In each case, σG = 0.2, r0/rmax = 0.03, zmin = −3, and zmax = 1. We fitted two parameter values to the data (fitted values are given in the top panel) and chose reasonable values for the others. One of the fitted parameters was always kh. The other fitted and fixed parameters in the different parameterizations were as follows. (A) The parameter mh was fitted, so that h varied with z, and we set Image not available = mq = 0, Image not available = 5.7, and kq = 3. (B) The parameter Image not available was fitted, so that rmax varied with z, and we set mh = mq = 0, Image not available = 16, and kq = 3. (C) The parameter mq was fitted, so that q varied with z, and we set mh = Image not available = 0, Image not available = 5.7, and kq = 7. (D) The parameters mh, Image not available , and mq were fitted, subject to the constraint that mh = Image not available = mq, so that h, rmax, and q varied with z, and we set Image not available = 8 and kq = 4. Plotting conventions are the same as in Figure 6. The value of N-RJ(x, 1) was calculated using Equation 34, and Image not available (x, 1) was calculated using Equation 66. Predicted discrimination threshold was predicted from each Fisher information expression using Relation 33 with Pθ = 1 − 0.5/e = 0.816…. Blue circles plot performance for every 18th stimulus level that we evaluated.
Figure 7
 
Fitting the Exponential Naka-Rushton parameterization to data from Meese et al. (2006). The data are plotted as black triangles in the bottom row of this figure (these data are from the Binocular condition of Meese et al., plotted as squares in their figure 5, and kindly supplied by Tim Meese). Each column of panels gives the data for one parameterization. In each case, σG = 0.2, r0/rmax = 0.03, zmin = −3, and zmax = 1. We fitted two parameter values to the data (fitted values are given in the top panel) and chose reasonable values for the others. One of the fitted parameters was always kh. The other fitted and fixed parameters in the different parameterizations were as follows. (A) The parameter mh was fitted, so that h varied with z, and we set Image not available = mq = 0, Image not available = 5.7, and kq = 3. (B) The parameter Image not available was fitted, so that rmax varied with z, and we set mh = mq = 0, Image not available = 16, and kq = 3. (C) The parameter mq was fitted, so that q varied with z, and we set mh = Image not available = 0, Image not available = 5.7, and kq = 7. (D) The parameters mh, Image not available , and mq were fitted, subject to the constraint that mh = Image not available = mq, so that h, rmax, and q varied with z, and we set Image not available = 8 and kq = 4. Plotting conventions are the same as in Figure 6. The value of N-RJ(x, 1) was calculated using Equation 34, and Image not available (x, 1) was calculated using Equation 66. Predicted discrimination threshold was predicted from each Fisher information expression using Relation 33 with Pθ = 1 − 0.5/e = 0.816…. Blue circles plot performance for every 18th stimulus level that we evaluated.
Note that the individual fitted m parameter values in column D are approximately one third of the values obtained when only one of the three parameters was fitted, indicating that the contribution of all three of these parameters to the Fisher information is additive. The precise additivity of mh and Display FormulaImage not available was proved analytically earlier, and the fitting results in column D suggest that this additivity also applies to mq, at least approximately.  
The plotting conventions in Figure 7 are the same as in Figure 6. The top row shows predicted and actual precision, while the bottom row shows 2AFC discrimination thresholds. Blue circles show the precision and thresholds from the Monte Carlo simulations with the Known Gain decoder; thin red curves show the precision and thresholds predicted from the true Fisher information N-RJ(x, 1) (Equation 34); thick gray curves show the precision and thresholds predicted from the integral approximation of the Fisher information Display FormulaImage not available (x, 1) (Equation 66). Thresholds were predicted from the Fisher information using Relation 33 with Pθ = 1 − 0.5/e = 0.816…, which was the performance level that defined the threshold in the study by Meese et al. (2006). These threshold predictions map out almost straight lines on the log-log plots, which fit well to the data from Meese et al.  
Recall that the derivation of Display FormulaImage not available (x, 1) was exactly correct only when r0 = mq = 0 (i.e., zero spontaneous firing rate, and q constant with respect to z). When either r0 ≠ 0 or mq ≠ 0 (as is the case in each panel of Figure 7), the integral was intractable, so we used work-arounds to give an approximate expression. Nevertheless, Table 3 shows that the precision predicted from the integral approximations of the Fisher information never differed by more than about 6% from that predicted from the true Fisher information. Meanwhile, the actual precision from the Known Gain decoder was never more than 6% lower than that predicted from the Fisher information. As with our other simulations of contrast discrimination, the performance of the Unknown Gain decoders was much worse than that of the Known Gain decoder.  
Table 3
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Exponential Naka-Rushton parameterizations of Figure 7. For each of the 191 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 191 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Table 3
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Exponential Naka-Rushton parameterizations of Figure 7. For each of the 191 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 191 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Discussion
The purpose of this study was to construct psychophysical models whose performance could easily be connected to the parameters of the neurons that encode the stimulus. We used a restricted parameterization of the model of neuronal spiking from Goris et al. (2014), in which each neuron has an independent Poisson spiking process and the spike rate of each neuron is modulated by a multiplicative, gamma-distributed gain signal that is shared between all the neurons. The individual neurons were characterized by the parameters of their tuning function: r0 (spontaneous spike rate), rmax (maximum increment in spike rate from r0), q (tuning sharpness), and z (position along the stimulus axis). Tuning functions could be sigmoidal (Naka-Rushton) or Gaussian. The population of neurons was characterized by the density h of neurons along the stimulus axis. The gain fluctuations (parameterized by the standard deviation σG) had both neuron-specific and population-wide effects. The neuron-specific effects of the gain fluctuations were the Fano factors (Figure 4), and the population-wide effects were the spike-count correlations that resulted from having the neurons share the same gain signal (Figure 5); both effects arise from Equation 12, which describes the covariance matrix for a population of model neurons of this kind. 
Model performance was calculated analytically by using the neuronal parameters to calculate the Fisher information, from which we can estimate the decoding precision. Although one can always find the performance level by setting up a Monte Carlo simulation (Chirimuuta, Clatworthy, & Tolhurst, 2003; Clatworthy, Chirimuuta, Lauritzen, & Tolhurst, 2003; Chirimuuta & Tolhurst, 2005), the long time required to complete the simulations for a single parameter set makes it difficult to fit the model to data (Chirimuuta & Tolhurst, 2005), and this method provides little insight into why the observed patterns of results were obtained or whether the results generalize to other parameterizations of the model. Our work solves all of these problems by providing fairly simple equations that can be evaluated to give a close approximation of the decoding precision and thus the discrimination threshold. 
Our expressions for the decoding precision were derived assuming that the decoder knows the gain signal g. In this case, we can derive the Fisher information as if the neurons were statistically independent; we then take the Fisher information for the case of g = 1 and multiply it by 1 − Display FormulaImage not available to obtain the predicted decoding precision (Relation 31). This predicted decoding precision is the modal value of the Fisher information across all stimulus presentations. With Gaussian tuning functions, we can relax the assumption that the decoder knows the gain signal and our equations still provide an accurate prediction of the model's performance.  
We derived two kinds of expression for the Fisher information. One was an exact expression (represented by the letter J), which consists of a sum with one term for each neuron. The other kind of expression (represented by the letter I) approximates this sum using an integral. These integral approximations are much more compact and can help to shed light on the relationships between psychophysical performance and the neuronal parameters. 
We outlined two basic types of parameterization of the generic sensory coding model: the Constant and Exponential parameterizations. In the Constant parameterization, all the neuronal parameters are constant with respect to z, and z is distributed with constant density along the stimulus axis. Our integral approximations revealed some particularly simple relationships between Fisher information and the neuronal parameters for the Constant parameterization. For both Naka-Rushton and Gaussian tuning functions, if r0/rmax is held constant, the Fisher information is proportional to rmaxqh (see Equation 40 for the Naka-Rushton function and Relation 50 for the Gaussian tuning function). For the Naka-Rushton tuning function, the Fisher information is proportional to a decreasing function of the relative spontaneous firing rate r0/rmax, which we call Q (see Equation 41 and Figure 2); as r0 increases, the Fisher information undergoes a multiplicative attenuation that is a function only of the ratio r0/rmax. The same effect holds to a very close approximation for the Gaussian tuning function (see Supplementary Appendix F). Another feature of the Constant parameterization is that the Fisher information is approximately constant across the stimulus axis (i.e., it is independent of x). Because of this, if x is the logarithm of the physical stimulus value, then the performance of the Constant parameterization will obey Weber's law (discrimination threshold proportional to pedestal). We used the exact Fisher information expressions to fit the Constant parameterizations of the model to real psychophysical data that conformed to Weber's law: Mayer and Kim's (1986) spatial frequency discrimination data (Figure 3) and the suprathreshold contrast discrimination data (Figure 6) of Bird et al. (2002). In all cases, the thresholds predicted from the Fisher information expressions gave excellent matches to the thresholds obtained from Monte Carlo simulations. 
Many studies of suprathreshold contrast discrimination have found a near-miss to Weber's law, where the plot of threshold against pedestal is a straight line with a slope of about 0.6–0.7 on a log-log plot. We showed that this could be accounted for by allowing rmax,q, or h to vary exponentially with z. The rates of increase were determined by parameters Display FormulaImage not available , mq, and mh, respectively. We call this the Exponential parameterization, and it is a generalization of the Constant parameterization (the Constant parameterization is the Exponential parameterization with Display FormulaImage not available = mq = mh = 0). The integral approximation of the Fisher information (Equation 66) revealed two features that were not explicit in the exact expressions. Firstly, the Fisher information for the Exponential parameterization is an exponential function of the stimulus level: It is proportional to exp(mx), where Display FormulaImage not available . Secondly, the Fisher information is a function of the sum of Display FormulaImage not available and mh, regardless of their individual values. We also found that, as with the Constant parameterizations, we could closely model the effect of r0 by multiplying the Fisher information by Q(r0/rmax) as defined in Equation 41. We used the exact Fisher information expressions to fit the Exponential Naka-Rushton parameterization to the contrast discrimination data of Meese et al. (2006). The thresholds predicted from the Fisher information expressions gave a good match to the thresholds obtained from Monte Carlo simulations.  
The fact that we needed an exponential increase in at least one neuronal parameter with increasing semisaturation contrast shows that the physiological constraints imposed by the near-miss to Weber's law are quite different from those imposed by the true Weber's law, which results if all the neuronal parameters are constant. This fact seems not to have been widely appreciated, because contrast discrimination performance is usually modeled using transducer models with additive noise (e.g., Wilson, 1980; Legge & Foley, 1980; Meese et al., 2006), and the transducer only needs a slight tweak to alter its predictions from Weber's law to the near-miss. 
Implicit decorrelation when the gain is known
In our model, if the decoder knows the gain signal, it can express the neuronal spike distributions as independent Poisson distributions. This is a form of decorrelation, but the neurons' responses are not changed and are therefore not explicitly decorrelated. The neuronal responses are correlated across all trials but are uncorrelated within each subset of trials that share the same gain signal. The decoder's knowledge of the gain signal allows it to identify which uncorrelated subset of trials the current trial belongs to, so the neuronal spiking distributions, conditioned on this knowledge, are statistically independent. The neurons have thus been decorrelated implicitly by virtue of the decoder's knowledge of what caused the correlations. Given the finding that a large proportion of neuronal variability is explained by fluctuating internal gain signals that can be shared between neurons (Ecker et al., 2014; Goris et al., 2014), it could be that many apparently correlated populations of neurons are implicitly uncorrelated to a large extent. 
Effect of correlations when the gain is unknown
Although knowledge of the gain signal can result in an implicit decorrelation of the neuronal responses, our simulations with the Gaussian tuning function show that it is not always necessary to know the gain in order to achieve near-optimal performance. The Bivariate decoder does not know the gain signal and expresses each pair of neuronal responses as a correlated gamma-Poisson mixture distribution. Nevertheless, its decoding performance is nearly as good as that of the Known Gain decoder. Furthermore, performance was not much worse with the Univariate decoder, which knows nothing whatever about the correlation structure of the population. This is highly convenient, as it means that for Gaussian tuning functions, our analytical expressions for the model's performance are not restricted to the class of models for which the decoder knows the gain signal or even knows about the pair-wise correlations. 
With the Naka-Rushton tuning function, a decoder that does not know the gain signal cannot distinguish changes in gain from changes in stimulus contrast, so the gain fluctuations greatly impair performance. Because of this, for Naka-Rushton tuning functions, our analytical expressions for the model's decoding performance apply only to the Known Gain decoder. 
So far, we have explained this difference between Gaussian and Naka-Rushton tuning functions intuitively: For the Naka-Rushton function, a change of gain will bias all the neurons' likelihood functions in the same direction, leading to inaccurate decoding, whereas for the Gaussian tuning function, neurons with tuning peaks on either side of the true stimulus value will have their likelihood functions biased in opposite directions, so the effect of a gain fluctuation cancels out and the stimulus estimate is largely unaffected. We can gain a more formal insight into this difference between the Gaussian and Naka-Rushton tuning functions by considering the Fisher information for decoding a pair of our model neurons that share a gain signal unknown to the decoder. In this two-neuron system, the likelihood function is given by Equation 22. Supplementary Appendix I shows that the Fisher information for decoding a stimulus of value x in this case is given by  where r1(x) and r2(x) are the tuning functions of the two neurons, and Display FormulaImage not available (x) and Display FormulaImage not available (x) are their first derivatives. If the tuning functions are Gaussian, then the slopes Display FormulaImage not available (x) and Display FormulaImage not available (x) will often be opposite in sign and will partially or completely cancel out in the third (subtractive) term of Equation 67. When x falls exactly midway between the peaks of two identically shaped Gaussian tuning functions, the third term is zero and the Fisher information is the same as for two independent Poisson-spiking neurons (i.e., equivalent to σG = 0); this gives an insight into why gain fluctuations have so little effect on decoding Gaussian-tuned neurons, even when the gain signal is unknown. On the other hand, the slopes of the Naka-Rushton functions are always positive and so they never cancel out in this way, so the third term of Equation 67 generally subtracts more from the Fisher information for Naka-Rushton-tuned neurons than it does for Gaussian-tuned neurons.  
It is important not to read too much into this analysis, because the Fisher information can be a poor estimator of decoding precision when the number of neurons is small (Xie, 2002; May & Solomon, 2015; also see the discussion of this issue in Supplementary Appendix I). The key point is that, for Gaussian tuning functions, the Cramér-Rao upper bound on the decoding precision can be independent of the gain variance, whereas for Naka-Rushton tuning functions, the upper bound on the precision will always decrease with increasing gain variance. 
Comparisons with other studies
Sanborn and Dayan (2011) argued that the central puzzle for contrast discrimination at high contrast is that the near-miss to Weber's law for contrast discrimination is difficult to reconcile with neuronal noise. They explained the near-miss to Weber's law using a model in which the neuronal noise was Gaussian with response variance that was nearly constant for low mean response levels and nearly proportional to the mean response for high mean response levels; they called this the “hinge noise” model. They set up a model in which the stimulus was encoded by a population of neurons with this form of noise. Each neuron had a linear contrast-response function, and the neurons differed from each other in their sensitivity to the stimulus orientation. Sanborn and Dayan showed that optimal decoding of this population gave rise to a contrast discrimination function with a log-log slope of 0.84 at high contrasts, i.e., a near-miss to Weber's law. 
A weakness of Sanborn and Dayan's model is that although it attempts to reconcile psychophysical and physiological findings, none of the key elements of the model are physiologically plausible. Firstly, the function mapping mean response to variance in their hinge noise model is quite unlike neuronal noise, which is proportional to the mean for low responses and proportional to the square of the mean for high responses—a relationship that is well captured by the gamma-Poisson mixture distribution that we used (see Goris et al., 2014). Secondly, Sanborn and Dayan's linear contrast-response function is quite different from those of real neurons, which are well described by the Naka-Rushton function (Albrecht & Hamilton, 1982). Thirdly, although Sanborn and Dayan's model has a range of orientation channels, it has only one contrast channel. In the visual cortex, neurons tend to be sensitive to changes in contrast over a relatively narrow contrast range, which suggests that the contrast code is distributed across a population of neurons with different semisaturation contrasts (Teo & Heeger, 1994; Chirimuuta et al., 2003; Clatworthy et al., 2003). Our work shows that, if we adopt this more physiologically plausible notion, then the “central puzzle” raised by Sanborn and Dayan disappears: The effect of the pedestal on threshold depends not just on the form of noise, but also on the distribution of the contrast-response functions along the contrast axis. 
Our approach of coding contrast across a range of model neurons with different semisaturation contrasts was used by Chirimuuta and Tolhurst (2005) to model contrast discrimination, and indeed our work was originally inspired by theirs. However, Chirimuuta and Tolhurst lacked an analytical description of their model's performance. This seriously hampered their attempts to fit their model to psychophysical data, as they were only able to try a small number of model parameter sets, due to the slowness of the Monte Carlo simulations. In their figure 9A, they used a model with eight Naka-Rushton-tuned neurons. Each neuron had q = 2, and there was additionally a threshold applied to the output of the Naka-Rushton function. For each of the eight neurons, rmax and z were free parameters, and Chirimuuta and Tolhurst adjusted these 16 parameters by hand to fit their psychophysical data. Our analytical approach allows us to reduce the parameter set and, more importantly, to understand exactly what contribution each parameter makes to the decoding precision. We can then fix most parameters to physiologically plausible values and adjust no more parameters than we need to fit the data (i.e., one parameter for fitting Weber's law and two parameters for fitting the near-miss to Weber's law). 
Our general approach to modeling psychophysical performance also has much in common with those of Goris et al. (2013) and Itti, Koch, and Braun (2000). In both cases, the stimulus was encoded over a population of model neurons which were assumed to be decoded in a way that was optimal (or quasioptimal) for the psychophysical task. The emphasis in both of those studies was to explain a large set of data using a single model. The emphasis in our study has been to derive mathematical relationships between the neuronal response properties and characteristics of performance. 
The distribution of neurons along the stimulus axis
Although our Constant and Exponential parameterizations of the model with Naka-Rushton tuning functions are successful in giving rise to, respectively, Weber's law and the near-miss to it, one might object that these models cannot be right, because the semisaturation contrasts measured in physiological experiments have neither a constant nor an exponential distribution on the log-contrast axis—the distribution is close to Gaussian (see figure 6 of Clatworthy et al., 2003, and supplementary appendix I of May & Solomon, 2015). However, it is difficult to know which neurons are being used for a particular task; the distribution of Clatworthy et al. may include many neurons that do not contribute to contrast discrimination performance. In addition, it should be noted that their empirical distribution was compiled by pooling a large group of neurons, regardless of the neurons' stimulus preferences. Psychophysical contrast sensitivity (the reciprocal of detection threshold) varies greatly across spatial and temporal frequency (Robson, 1966), suggesting that the lowest semisaturation contrast (zmin) would vary greatly between different subpopulations tuned to different spatiotemporal frequency combinations. Even if the distribution of z were flat, say, between zmin and zmax for each subpopulation, by pooling the subpopulations we would be adding together distributions with different lower limits, creating the graded drop-off that we see in the full population. A similar argument could be made regarding the upper end of the distribution. 
Different kinds of noise
In most psychophysical models, the noise is a random variable added to the model's deterministic response, and so the variance of the noise is the variance of the model's output signal. In our model, the output signal is the estimated stimulus value, and the variance on this output signal does not necessarily show the same characteristics as the noise on the neurons itself. For example, in our model, the variance of the noise on a single neuron increases with the mean response according to Equation 13: Variance is approximately proportional to the mean for low firing rates and approximately proportional to the square of the mean for high firing rates. However, for the Constant parameterization, the variance of the estimated log contrast is constant with respect to the stimulus level. Similarly, Sanborn and Dayan (2011) showed that in their model, with a linear transducer and Gaussian noise with variance proportional to contrast, the variance of the stimulus estimate was proportional to contrast only at very high contrasts: At low contrasts, the variance of the stimulus estimate was proportional to the square of the contrast. In summary, with a population coding model, the characteristics of the noise on the model's output are not necessarily similar to the characteristics of the noise shown by individual neurons. 
Physiological plausibility versus mathematical tractability
In any attempt to model brain processes, there is a trade-off between physiological plausibility and mathematical tractability. As we make the model more and more realistic from a biological point of view, it becomes harder and harder to understand the model's psychophysical behavior. Thus in modeling psychophysical behavior, it is not always desirable for the mechanisms of the model to be as realistic and complex as when we are modeling the behavior of individual neurons. 
Traditionally, psychophysical performance has usually been modeled using very simple models, in which real-valued signals are sent through a deterministic transducer and (usually Gaussian) noise is added to the transducer's output (e.g., Legge & Foley, 1980; Wilson, 1980; Meese et al., 2006). This kind of model is straightforward to understand mathematically and can provide a useful functional description of the system, because the relationships between model characteristics and performance characteristics are well understood (May & Solomon, 2013). However, the transducer model is such a simplification of the biological reality that it may shed little light on the connection between psychophysical performance and the properties of the neurons. 
In the work presented here, we have attempted to push our psychophysical modeling much closer towards physiological plausibility without losing the mathematical tractability that traditional models benefit from. To allow mathematical analysis of our model, we had to make some simplifying assumptions. Nevertheless, our model has realistic tuning functions or contrast-response functions, and responds with discrete, integer spike counts that are generated by a promising new model of neuronal variability (Goris et al., 2014), which gives rise to a realistic correlation structure in which correlations increase with spike rate and tuning similarity. We derived equations that accurately predict the model's performance and reveal surprisingly simple relationships between psychophysical performance and the neuronal parameters. 
Acknowledgments
This work was supported by EPSRC grant EP/H033955/1 to JAS. The authors would like to thank Julia Föster for helpful discussions and Tim Meese for providing the data from Meese et al. (2006), which we plotted in Figure 7. We also thank Wei Ji Ma and Peter Neri for insightful reviews of our manuscript. 
Commercial relationships: none. 
Corresponding author: Keith A. May. 
Email: keith@keithmay.org. 
Current address: Department of Computer Science, University College London, London, UK. 
References
Albrecht D. G., Hamilton D. B. (1982). Striate cortex of monkey and cat: Contrast response function. Journal of Neurophysiology, 48, 217–237.
Arbous A. G., Kerrich J. E. (1951). Accident statistics and the concept of accident-proneness. Biometrics, 7, 340–432.
Atick J. J. (1992). Could information theory provide an ecological theory of sensory processing? Network: Computation in Neural Systems, 3, 213–251.
Atick J. J., Li Z., Redlich A. N. (1992). Understanding retinal color coding from first principles. Neural Computation, 4, 559–572.
Atick J. J., Li Z., Redlich A. N. (1993). What does post-adaptation color appearance reveal about cortical color representation? Vision Research, 33, 123–129.
Atick J. J., Redlich A. N. (1990). Towards a theory of early visual processing. Neural Computation, 2, 308–320.
Atick J. J., Redlich A. N. (1992). What does the retina know about natural scenes? Neural Computation, 4, 196–210.
Averbeck B. B., Latham P. E., Pouget A. (2006). Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7, 358–366.
Banks M. S., Geisler W. S., Bennett P. J. (1987). The physical limits of grating visibility. Vision Research, 27, 1915–1924.
Beck J. M., Ma W. J., Pitkow X., Latham P. E., Pouget A. (2012). Not noisy, just wrong: The role of suboptimal inference in behavioral variability. Neuron, 74, 30–39.
Bird C. M., Henning G. B., Wichmann F. A. (2002). Contrast discrimination with sinusoidal gratings of different spatial frequency. Journal of the Optical Society of America A, 19, 1267–1273.
Chirimuuta M., Clatworthy P. L., Tolhurst D. J. (2003). Coding of the contrasts in natural images by visual cortex (V1) neurons: A Bayesian approach. Journal of the Optical Society of America A, 20, 1253–1260.
Chirimuuta M., Tolhurst D. J. (2005). Does a Bayesian model of V1 contrast coding offer a neurophysiological account of human contrast discrimination? Vision Research, 45, 2943–2959.
Clatworthy P. L., Chirimuuta M., Lauritzen J. S., Tolhurst D. J. (2003). Coding of the contrasts in natural images by populations of neurons in primary visual cortex (V1). Vision Research, 43, 1983–2001.
Cohen M. R., Kohn A. (2011). Measuring and interpreting neuronal correlations. Nature Neuroscience, 14, 811–819.
Cramér H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Dakin S. C. (2001). Information limit on the spatial integration of local orientation signals. Journal of the Optical Society of America A, 18, 1016–1026.
Dayan P., Abbott L. F. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.
de la Rocha J., Doiron B., Shea-Brown E., Josic K., Reyes A. (2007). Correlation between neural spike trains increases with firing rate. Nature, 448, 802–806.
De Valois R. L., Albrecht D. G., Thorell L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22, 545–559.
Ecker A. S., Berens P., Cotton R. J., Subramaniyan M., Denfield G. H., Cadwell C.R., Tolias A. S. (2014). State dependence of noise correlations in macaque primary visual cortex. Neuron, 82, 235–248.
Ecker A. S., Berens P., Keliris G. A., Bethge M., Logothetis N. K., Tolias A. S. (2010). Decorrelated neuronal firing in cortical microcircuits. Science, 327, 584–587.
Geisler W. S. (1989). Sequential ideal-observer analysis of visual discriminations. Psychological Review, 96, 267–314.
Geisler W. S., Albrecht D. G. (1997). Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14, 897–919.
Goris R. L. T., Movshon J. A., Simoncelli E. P. (2014). Partitioning neuronal variability. Nature Neuroscience, 17, 858–865.
Goris R. L. T., Putzeys T., Wagemans J., Wichmann F. A. (2013). A neural population model for visual pattern detection. Psychological Review, 120, 472–496.
Graham N. V. S. (1989). Visual pattern analyzers. New York: Oxford University Press.
Itti L., Koch C., Braun J. (2000). Revisiting spatial vision: Toward a unifying model. Journal of the Optical Society of America A, 17, 1899–1917.
Laughlin S. (1981). A simple coding procedure enhances a neuron's information capacity. Zeitung für Naturforschung C, 36, 910–912.
Legge G. E. (1981). A power law for contrast discrimination. Vision Research, 21, 457–467.
Legge G. E., Foley J. M. (1980). Contrast masking in human vision. Journal of the Optical Society of America, 70, 1458–1471.
May K. A., Solomon J. A. (2013). Four theorems on the psychometric function. PLoS ONE, 8 (10), e74815.
May K. A., Solomon J. A. (2015). Connecting psychophysical performance to neuronal response properties: II. Contrast decoding and detection. Journal of Vision, 15 (6): 10, 1–21, http://www.journalofvision.org/content/15/6/10, doi:10.1167/15.6.10.[Article]
Mayer M. J., Kim C. B. Y. (1986). Smooth frequency discrimination functions for foveal, high-contrast, mid spatial frequencies. Journal of the Optical Society of America A, 3, 1957–1969.
Meese T., Georgeson M. A., Baker D. H. (2006). Binocular contrast vision at and above threshold. Journal of Vision, 6 (11): 7, 1224–1243, http://www.journalofvision.org/content/6/11/7, doi:10.1167/6.11.7.[PubMed] [Article]
Morgan M., Chubb C., Solomon J. A. (2008). A ‘dipper' function for texture discrimination based on orientation variance. Journal of Vision, 8 (11): 9, 1–8, http://www.journalofvision.org/content/8/11/9, doi:10.1167/8.11.9.[PubMed] [Article]
Naka K. I., Rushton W. A. H. (1966). S-potentials from colour units in the retina of fish (Cyprinidae). Journal of Physiology, 185, 536–555.
Nelder J. A., Mead R. (1965). A simplex method for function minimization. The Computer Journal, 7, 308–313.
Pelli D. G. (1990). The quantum efficiency of vision. In Blakemore C. (Ed.), Vision: Coding and efficiency (pp. 3–24). Cambridge, UK: Cambridge University Press.
Pelli D. G., Burns C. W., Farell B., Moore-Page D. C. (2006). Feature detection and letter identification. Vision Research, 46, 4646–4674.
Pelli D. G., Farell B. (1999). Why use noise? Journal of the Optical Society of America A, 16, 647–653.
Rao C. R. (1945). Information and accuracy obtainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–89.
Robson J. G. (1966). Spatial and temporal contrast-sensitivity functions of the visual system. Journal of the Optical Society of America, 56, 1141–1142.
Sanborn A. N., Dayan P. (2011). Optimal decisions for contrast discrimination. Journal of Vision, 11 (14): 9, 1–13, http://www.journalofvision.org/content/11/14/9, doi:10.1167/11.14.9.[PubMed] [Article]
Smith M. A., Kohn A. (2008). Spatial and temporal scales of neuronal correlation in primary visual cortex. Journal of Neuroscience, 28, 12591–12603.
Solomon J. A. (2010). Visual discrimination of orientation statistics in crowded and uncrowded arrays. Journal of Vision, 10 (14), 19, 1–16, http://www.journalofvision.org/content/10/14/19, doi:10.1167/10.14.19.[PubMed] [Article]
Srinivasan M. V., Laughlin S. B., Dubs A. (1982). Predictive coding: A fresh view of inhibition in the retina. Proceedings of the Royal Society of London B, 216, 427–459.
Swift D. J., Smith R. A. (1983). Spatial frequency masking and Weber's law. Vision Research, 23, 495–505.
Tadmor Y., Tolhurst D. J. (2000). Calculating the contrasts that retinal ganglion cells and LGN neurones encounter in natural scenes. Vision Research, 40, 3145–3157.
Teo P. C., Heeger D. J. (1994). Perceptual image distortion. Proceedings of SPIE, 2179, 127–141.
van Kan P. L. E., Scobey R. P., Gabor A. J. (1985). Response covariance in cat visual cortex. Experimental Brain Research, 60, 559–563.
Wilson H. R. (1980). A transducer function for threshold and suprathreshold human vision. Biological Cybernetics, 38, 171–178.
Xie X. (2002). Threshold behaviour of the maximum likelihood method in population decoding. Network: Computation in Neural Systems, 13, 447–456.
Zhaoping L. (2014). Understanding vision: Theory, models, and data. Oxford, UK: Oxford University Press.
Zhaoping L., Geisler W. S., May K. A. (2011). Human wavelength discrimination of monochromatic light explained by optimal wavelength decoding of light of unknown intensity. PLoS ONE, 6 (5), e19248.
Zohary E., Shadlen M. N., Newsome W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370, 140–143.
Footnotes
1  In this article, we use the word “trial” in two ways. Firstly, we use it in the way a physiologist would, to mean a stimulus presentation. Secondly, we use it to mean a trial on a two-alternative forced-choice psychophysical experiment, in which the observer is presented with two stimuli and has to make a response. To distinguish these two meanings, we always refer to the latter type of trial as a “2AFC trial.”
Figure 1
 
Tuning functions. (A) Three Naka-Rushton functions, with q equal to 1, 2, and 4. Each function has c1/2 = 0.1. The value of r(c1/2) falls halfway between r0 and r0 + rmax. (B) The same three Naka-Rushton functions as in (A), plotted as functions of log10 contrast. The log semisaturation contrast is given by z = −1. (C) Three Gaussian tuning functions, with z = −1 and q equal to 1, 2, and 4.
Figure 1
 
Tuning functions. (A) Three Naka-Rushton functions, with q equal to 1, 2, and 4. Each function has c1/2 = 0.1. The value of r(c1/2) falls halfway between r0 and r0 + rmax. (B) The same three Naka-Rushton functions as in (A), plotted as functions of log10 contrast. The log semisaturation contrast is given by z = −1. (C) Three Gaussian tuning functions, with z = −1 and q equal to 1, 2, and 4.
Figure 2
 
Attenuation of Fisher information for nonzero r0. The curve plots Q(r0/rmax) as defined in Equation 41.
Figure 2
 
Attenuation of Fisher information for nonzero r0. The curve plots Q(r0/rmax) as defined in Equation 41.
Figure 3
 
Fitting the Constant Gaussian parameterization to data from Mayer and Kim (1986). The black triangles in the bottom panels show Mayer and Kim's spatial frequency discrimination data for their subject MJ, condition R 2PFC, transcribed from their figure 7. Mayer and Kim's data actually show the difference in spatial frequency between the 0.25 point and 0.75 point of the psychometric function. The pedestal frequency should fall halfway between these points, so we halved their thresholds to obtain Δfθ, the difference between target and pedestal at a threshold performance level of Pθ = 0.75. Four of the model's parameters were fixed as follows: zmin = −0.3, zmax = 1.7, r0/rmax = 0.03, and ω = 1.5 octaves. The values of rmax and σG were set as indicated above the top of each panel in the top row. The remaining parameter, h, was fitted to Mayer and Kim's data; the fitted values are shown in the top panels. Having fitted this parameter, we carried out Monte Carlo simulations as described in the text and in Supplementary Appendix H. The blue circles in the top row of panels plot the decoding precision from the Monte Carlo simulations using the Known Gain decoder. These points show decoding precision for every 30th value along the x-axis that we calculated. The rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used in the 2AFC simulations, where we needed a fairly fine sampling of the x-axis to fit the psychometric function and hence find the discrimination threshold. The red lines in the top row of panels plot the precision predicted from the Fisher information GaussJ(x, 1) (Equation 36), while the thick gray lines plot the precision predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 53); in both cases, we converted ω to q using Equation 9 before applying Equation 36 or 53. The predicted precision is found by multiplying the Fisher information by 1 − Image not available (see Relation 31). In the bottom row, the blue circles plot the thresholds Δfθ obtained from the Monte Carlo simulations of the 2AFC discrimination task with the Known Gain decoder. The red lines in the bottom row plot the thresholds predicted from GaussJ(x, 1) using Relation 33 with Pθ = 0.75. The thick gray lines plot the thresholds predicted in the same way, except using Image not available (x, 1) to approximate the Fisher information. Note that the abscissas in the top row are identical to those on the bottom row, i.e., each position on the abscissa in the top row represents the same stimulus as the same position on the abscissa in the bottom row. In the top row, we have marked the abscissas with log units, since the precision is calculated from the values in these units; in the bottom row, we have marked the abscissas with linear units, to be compatible with the threshold, which is defined as the difference of spatial frequencies.
Figure 3
 
Fitting the Constant Gaussian parameterization to data from Mayer and Kim (1986). The black triangles in the bottom panels show Mayer and Kim's spatial frequency discrimination data for their subject MJ, condition R 2PFC, transcribed from their figure 7. Mayer and Kim's data actually show the difference in spatial frequency between the 0.25 point and 0.75 point of the psychometric function. The pedestal frequency should fall halfway between these points, so we halved their thresholds to obtain Δfθ, the difference between target and pedestal at a threshold performance level of Pθ = 0.75. Four of the model's parameters were fixed as follows: zmin = −0.3, zmax = 1.7, r0/rmax = 0.03, and ω = 1.5 octaves. The values of rmax and σG were set as indicated above the top of each panel in the top row. The remaining parameter, h, was fitted to Mayer and Kim's data; the fitted values are shown in the top panels. Having fitted this parameter, we carried out Monte Carlo simulations as described in the text and in Supplementary Appendix H. The blue circles in the top row of panels plot the decoding precision from the Monte Carlo simulations using the Known Gain decoder. These points show decoding precision for every 30th value along the x-axis that we calculated. The rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used in the 2AFC simulations, where we needed a fairly fine sampling of the x-axis to fit the psychometric function and hence find the discrimination threshold. The red lines in the top row of panels plot the precision predicted from the Fisher information GaussJ(x, 1) (Equation 36), while the thick gray lines plot the precision predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 53); in both cases, we converted ω to q using Equation 9 before applying Equation 36 or 53. The predicted precision is found by multiplying the Fisher information by 1 − Image not available (see Relation 31). In the bottom row, the blue circles plot the thresholds Δfθ obtained from the Monte Carlo simulations of the 2AFC discrimination task with the Known Gain decoder. The red lines in the bottom row plot the thresholds predicted from GaussJ(x, 1) using Relation 33 with Pθ = 0.75. The thick gray lines plot the thresholds predicted in the same way, except using Image not available (x, 1) to approximate the Fisher information. Note that the abscissas in the top row are identical to those on the bottom row, i.e., each position on the abscissa in the top row represents the same stimulus as the same position on the abscissa in the bottom row. In the top row, we have marked the abscissas with log units, since the precision is calculated from the values in these units; in the bottom row, we have marked the abscissas with linear units, to be compatible with the threshold, which is defined as the difference of spatial frequencies.
Figure 4
 
Fano factors of the neurons in the modeling of Figure 3. Each panel in this figure shows the Fano factors for the parameterization in the corresponding column of panels in Figure 3. For each model neuron and each stimulus level, we found the mean and variance of the spike count across the 10,000 repetitions. We divided the variance by the mean to give the Fano factor. Because of the large number different combinations of neuron and stimulus level (>105 in each panel), there were too many points to plot as individual dots, so we sorted the points in order of mean spike count and then chunked them into groups of 1,000. The thick pink lines plot the average mean spike count against the average Fano factor for each group. The black line in each plot is the predicted relationship between mean and Fano factor given by Equation 14.
Figure 4
 
Fano factors of the neurons in the modeling of Figure 3. Each panel in this figure shows the Fano factors for the parameterization in the corresponding column of panels in Figure 3. For each model neuron and each stimulus level, we found the mean and variance of the spike count across the 10,000 repetitions. We divided the variance by the mean to give the Fano factor. Because of the large number different combinations of neuron and stimulus level (>105 in each panel), there were too many points to plot as individual dots, so we sorted the points in order of mean spike count and then chunked them into groups of 1,000. The thick pink lines plot the average mean spike count against the average Fano factor for each group. The black line in each plot is the predicted relationship between mean and Fano factor given by Equation 14.
Figure 5
 
Spike-count correlations between pairs of neurons in the modeling of Figure 3. Each panel in this figure shows the correlations for the parameterization in the corresponding column of panels in Figure 3. Because of the large number of pairs of neurons, we just analyzed the responses to the stimulus level in the middle of the range of stimuli used in the modeling. For each pair of model neurons, we found the Pearson correlation between the spike counts of the two neurons across the 10,000 repetitions. We then sorted the data points in order of the predicted correlation given by Equation 17 and chunked them into groups of 1,000. The thick pink lines plot the mean predicted correlation against the mean actual correlation for each group. The black lines plot the outcome that would occur if the actual correlation were always exactly equal to the predicted correlation.
Figure 5
 
Spike-count correlations between pairs of neurons in the modeling of Figure 3. Each panel in this figure shows the correlations for the parameterization in the corresponding column of panels in Figure 3. Because of the large number of pairs of neurons, we just analyzed the responses to the stimulus level in the middle of the range of stimuli used in the modeling. For each pair of model neurons, we found the Pearson correlation between the spike counts of the two neurons across the 10,000 repetitions. We then sorted the data points in order of the predicted correlation given by Equation 17 and chunked them into groups of 1,000. The thick pink lines plot the mean predicted correlation against the mean actual correlation for each group. The black lines plot the outcome that would occur if the actual correlation were always exactly equal to the predicted correlation.
Figure 6
 
Fitting the Constant Naka-Rushton parameterization to data from Bird et al. (2002). The black triangles in the bottom panels show the contrast discrimination data from Bird et al. (2002) for their subject GBH, for 4.19-c/deg sine-wave gratings, transcribed from their figure 3b. Four of the model's parameters were fixed as follows: q = 3, r0/rmax = 0.03, zmin = −3, zmax = 1 (q = 3 is close to the mean value found physiologically; see table 1 of May & Solomon, 2015). The values of rmax and σG were set as indicated above the top of each panel in the top row . The remaining parameter, h, was fitted to the data from Bird et al.; the fitted values are shown in the top panels. Monte Carlo simulations were conducted in a similar way to those in Figure 3, except using the Naka-Rushton tuning function. All plotting conventions are the same as or analogous to those in Figure 3. The blue circles plot Monte Carlo simulation results for every 18th value along the x-axis that we calculated (the rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used for fitting psychometric functions in the 2AFC simulations). The red lines plot performance predicted from the true Fisher information Image not available (x, 1) (Equation 34), while the thick gray lines plot performance predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 40). Precision was predicted by multiplying the Fisher information by 1 − Image not available (see Relation 31). The predicted discrimination threshold was derived from the predicted precision using Relation 33 with Pθ = 0.75.
Figure 6
 
Fitting the Constant Naka-Rushton parameterization to data from Bird et al. (2002). The black triangles in the bottom panels show the contrast discrimination data from Bird et al. (2002) for their subject GBH, for 4.19-c/deg sine-wave gratings, transcribed from their figure 3b. Four of the model's parameters were fixed as follows: q = 3, r0/rmax = 0.03, zmin = −3, zmax = 1 (q = 3 is close to the mean value found physiologically; see table 1 of May & Solomon, 2015). The values of rmax and σG were set as indicated above the top of each panel in the top row . The remaining parameter, h, was fitted to the data from Bird et al.; the fitted values are shown in the top panels. Monte Carlo simulations were conducted in a similar way to those in Figure 3, except using the Naka-Rushton tuning function. All plotting conventions are the same as or analogous to those in Figure 3. The blue circles plot Monte Carlo simulation results for every 18th value along the x-axis that we calculated (the rest are omitted from the figure for clarity, although the stimulus estimates for these x values were used for fitting psychometric functions in the 2AFC simulations). The red lines plot performance predicted from the true Fisher information Image not available (x, 1) (Equation 34), while the thick gray lines plot performance predicted from the integral approximation of the Fisher information Image not available (x, 1) (Equation 40). Precision was predicted by multiplying the Fisher information by 1 − Image not available (see Relation 31). The predicted discrimination threshold was derived from the predicted precision using Relation 33 with Pθ = 0.75.
Figure 7
 
Fitting the Exponential Naka-Rushton parameterization to data from Meese et al. (2006). The data are plotted as black triangles in the bottom row of this figure (these data are from the Binocular condition of Meese et al., plotted as squares in their figure 5, and kindly supplied by Tim Meese). Each column of panels gives the data for one parameterization. In each case, σG = 0.2, r0/rmax = 0.03, zmin = −3, and zmax = 1. We fitted two parameter values to the data (fitted values are given in the top panel) and chose reasonable values for the others. One of the fitted parameters was always kh. The other fitted and fixed parameters in the different parameterizations were as follows. (A) The parameter mh was fitted, so that h varied with z, and we set Image not available = mq = 0, Image not available = 5.7, and kq = 3. (B) The parameter Image not available was fitted, so that rmax varied with z, and we set mh = mq = 0, Image not available = 16, and kq = 3. (C) The parameter mq was fitted, so that q varied with z, and we set mh = Image not available = 0, Image not available = 5.7, and kq = 7. (D) The parameters mh, Image not available , and mq were fitted, subject to the constraint that mh = Image not available = mq, so that h, rmax, and q varied with z, and we set Image not available = 8 and kq = 4. Plotting conventions are the same as in Figure 6. The value of N-RJ(x, 1) was calculated using Equation 34, and Image not available (x, 1) was calculated using Equation 66. Predicted discrimination threshold was predicted from each Fisher information expression using Relation 33 with Pθ = 1 − 0.5/e = 0.816…. Blue circles plot performance for every 18th stimulus level that we evaluated.
Figure 7
 
Fitting the Exponential Naka-Rushton parameterization to data from Meese et al. (2006). The data are plotted as black triangles in the bottom row of this figure (these data are from the Binocular condition of Meese et al., plotted as squares in their figure 5, and kindly supplied by Tim Meese). Each column of panels gives the data for one parameterization. In each case, σG = 0.2, r0/rmax = 0.03, zmin = −3, and zmax = 1. We fitted two parameter values to the data (fitted values are given in the top panel) and chose reasonable values for the others. One of the fitted parameters was always kh. The other fitted and fixed parameters in the different parameterizations were as follows. (A) The parameter mh was fitted, so that h varied with z, and we set Image not available = mq = 0, Image not available = 5.7, and kq = 3. (B) The parameter Image not available was fitted, so that rmax varied with z, and we set mh = mq = 0, Image not available = 16, and kq = 3. (C) The parameter mq was fitted, so that q varied with z, and we set mh = Image not available = 0, Image not available = 5.7, and kq = 7. (D) The parameters mh, Image not available , and mq were fitted, subject to the constraint that mh = Image not available = mq, so that h, rmax, and q varied with z, and we set Image not available = 8 and kq = 4. Plotting conventions are the same as in Figure 6. The value of N-RJ(x, 1) was calculated using Equation 34, and Image not available (x, 1) was calculated using Equation 66. Predicted discrimination threshold was predicted from each Fisher information expression using Relation 33 with Pθ = 1 − 0.5/e = 0.816…. Blue circles plot performance for every 18th stimulus level that we evaluated.
Table 1
 
Precision scores expressed as a proportion of the precision predicted from GaussJ(x, 1) for the Constant Gaussian parameterizations of Figure 3. For each of the 104 stimulus levels x between 0.6 and 0.8, we expressed the precision for each decoder as a proportion of that predicted from GaussJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 104 ratios. For each ratio, the precisions were calculated from 10,000 trials, except for the Bivariate decoder on the two conditions with rmax = 4. These conditions had a very large number of neurons, so the Bivariate decoding algorithm, which calculated a likelihood for every pair of neurons, was very slow. Due to time constraints, we were only able to decode 4,200 of the 10,000 trials for σG = 0.2, rmax = 4, and 3,500 of the 10,000 trials for σG = 0.4, rmax = 4. The slightly higher precision score for the Bivariate over the Known Gain decoder for σG = 0.2, rmax = 4 is a result of sampling error due to an insufficient number of trials.
Table 1
 
Precision scores expressed as a proportion of the precision predicted from GaussJ(x, 1) for the Constant Gaussian parameterizations of Figure 3. For each of the 104 stimulus levels x between 0.6 and 0.8, we expressed the precision for each decoder as a proportion of that predicted from GaussJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 104 ratios. For each ratio, the precisions were calculated from 10,000 trials, except for the Bivariate decoder on the two conditions with rmax = 4. These conditions had a very large number of neurons, so the Bivariate decoding algorithm, which calculated a likelihood for every pair of neurons, was very slow. Due to time constraints, we were only able to decode 4,200 of the 10,000 trials for σG = 0.2, rmax = 4, and 3,500 of the 10,000 trials for σG = 0.4, rmax = 4. The slightly higher precision score for the Bivariate over the Known Gain decoder for σG = 0.2, rmax = 4 is a result of sampling error due to an insufficient number of trials.
Table 2
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Constant Naka-Rushton parameterizations of Figure 6. For each of the 139 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 139 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Table 2
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Constant Naka-Rushton parameterizations of Figure 6. For each of the 139 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 139 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Table 3
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Exponential Naka-Rushton parameterizations of Figure 7. For each of the 191 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 191 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Table 3
 
Precision scores expressed as a proportion of the precision predicted from N-RJ(x, 1) for the Exponential Naka-Rushton parameterizations of Figure 7. For each of the 191 stimulus levels x between −1.5 and −0.5, we expressed the precision for each decoder as a proportion of that predicted from N-RJ(x, 1). The table shows the mean value of this proportion for each condition and decoder. The top row shows the precision predicted from Image not available (x, 1), expressed as a proportion in the same way. Each value in the table is the mean of 191 ratios. For each ratio, the precisions were calculated from 10,000 trials.
Supplement 1
×
×

This PDF is available to Subscribers Only

Sign in or purchase a subscription to access this content. ×

You must be signed into an individual account to use this feature.

×