**In a wide variety of neural systems, neurons tuned to a primary dimension of interest often have responses that are modulated in a multiplicative manner by other features such as stimulus intensity or contrast. In this methodological study, we present a demonstration that it is possible to use psychophysical experiments to compare competing hypotheses of multiplicative gain modulation in a neural population, using the specific example of contrast gain modulation in orientation-tuned visual neurons. We demonstrate that fitting biologically interpretable models to psychophysical data yields physiologically accurate estimates of contrast tuning parameters and allows us to compare competing hypotheses of contrast tuning. We demonstrate a powerful methodology for comparing competing neural models using adaptively generated psychophysical stimuli and demonstrate that such stimuli can be highly effective for distinguishing qualitatively similar hypotheses. We relate our work to the growing body of literature that uses fits of neural models to behavioral data to gain insight into neural coding and suggest directions for future research.**

[…] $N$ neurons, a *neural encoding* model $P(\mathbf{r} \mid \mathbf{s}, \boldsymbol{\theta})$ specifies the probability of observing neural responses $\mathbf{r} = (r_1, \ldots, r_N)^T$ as a function of stimulus parameters $\mathbf{s}$ and neuronal population parameters $\boldsymbol{\theta}$ (Borst & Theunissen, 1999; Paninski et al., 2007). Perhaps the simplest possible neural encoding model is a set of tuning curves specifying the expected firing rate of each neuron in the population as a function of the sensory variable $\mathbf{s}$, for instance, the orientation-tuning curves shown in Figure 1a. In this case, the population parameters $\boldsymbol{\theta}$ would represent the properties of this set of tuning curves, for instance, the centers $\mu_1, \ldots, \mu_N$, tuning curve width $\sigma$, and amplitude $A$. Similarly, a *neural decoding* model $P(\mathbf{s} \mid \mathbf{r}, \boldsymbol{\omega})$ specifies the probability of a stimulus $\mathbf{s}$ being present as a function of the observed neural responses $\mathbf{r}$ and possibly additional parameters $\boldsymbol{\omega}$ (Paninski et al., 2007).

**Figure 1**
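A tuning-curve encoding model of the kind described above can be sketched in a few lines. This is only an illustration: the Gaussian tuning shape, the independent Poisson noise, and all numerical values (centers, width, amplitude) are assumptions chosen for the example, and orientation is treated as a non-circular variable for simplicity.

```python
import math
import random

def tuning_curves(s, centers, sigma, amplitude):
    """Expected firing rate of each neuron for stimulus value s (Gaussian shape is assumed)."""
    return [amplitude * math.exp(-0.5 * ((s - mu) / sigma) ** 2) for mu in centers]

def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's multiplication method (fine for modest rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_responses(s, centers, sigma, amplitude, rng):
    """Draw r ~ P(r | s, theta), with independent Poisson noise around the tuning curves."""
    return [sample_poisson(rate, rng) for rate in tuning_curves(s, centers, sigma, amplitude)]

rng = random.Random(0)
centers = [i * 20.0 for i in range(10)]  # hypothetical preferred orientations mu_1..mu_N (degrees)
rates = tuning_curves(90.0, centers, sigma=15.0, amplitude=30.0)
r = sample_responses(90.0, centers, sigma=15.0, amplitude=30.0, rng=rng)
```

Here `centers`, `sigma`, and `amplitude` play the role of the population parameters $\boldsymbol{\theta}$, and `r` is one draw of the population response to a 90° stimulus.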

[…] *behavioral decoding* model $P(b \mid \mathbf{r}, \boldsymbol{\omega})$ as specifying the probability of a behavioral response $b$ as a function of neural responses $\mathbf{r}$ as well as additional decoding parameters $\boldsymbol{\omega}$. In this formulation, the stochastic neural responses $\mathbf{r}$, which are the output of the neural encoding model, serve as the input to the behavioral decoding model, as illustrated in Figure 1b and c. The behavioral decoding model may deterministically specify $b$ as a function of $\mathbf{r}, \boldsymbol{\omega}$, as in the example shown in Figure 1b, which compares the decision variable […] $b$ probabilistically in order to model stimulus-independent “decision noise” (Shadlen, Britten, Newsome, & Movshon, 1996). The joint probability of observing a behavior $b$ and neural response $\mathbf{r}$ as a function of a stimulus $\mathbf{s}$ may be written as the product of a neural encoding model and behavioral decoding model using the basic probability law $P(A, B) = P(A \mid B)P(B)$ (Bishop, 2006), yielding the expression

$$P(b, \mathbf{r} \mid \mathbf{s}, \boldsymbol{\omega}, \boldsymbol{\theta}) = P(b \mid \mathbf{r}, \boldsymbol{\omega})\, P(\mathbf{r} \mid \mathbf{s}, \boldsymbol{\theta}). \tag{1}$$

By marginalizing the joint probability $P(b, \mathbf{r} \mid \mathbf{s}, \boldsymbol{\omega}, \boldsymbol{\theta})$ over $\mathbf{r}$, we can express the probability of a behavior entirely as a function of the stimulus parameters $\mathbf{s}$ and model parameters $\boldsymbol{\omega}, \boldsymbol{\theta}$ without any dependence on unobserved neural responses. This follows from the basic probability law $\int P(A, B)\, dB = P(A)$. Marginalizing Equation 1 over $\mathbf{r}$ yields the equation

$$P(b \mid \mathbf{s}, \boldsymbol{\omega}, \boldsymbol{\theta}) = \int P(b \mid \mathbf{r}, \boldsymbol{\omega})\, P(\mathbf{r} \mid \mathbf{s}, \boldsymbol{\theta})\, d\mathbf{r}, \tag{2}$$

[…] $\mathbf{r}$ conditioned on the stimulus $\mathbf{s}$. In the case of fixed decoding model parameters $\hat{\boldsymbol{\omega}}$, so that $P(\boldsymbol{\omega}) = \delta(\boldsymbol{\omega} - \hat{\boldsymbol{\omega}})$ (where $\delta$ denotes the Dirac delta function), we can use Equation 2 to derive an expression for the posterior probability of the neural encoding model parameters $\boldsymbol{\theta}$ given only psychophysical trial data (Equation 3). […] $P(\boldsymbol{\theta})$ about the neural encoding model parameters, Equation 3 becomes the likelihood. In the application presented in this study, we do not incorporate informative priors on $\boldsymbol{\theta}$ and simply obtain maximum likelihood point estimates.
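The marginalization over unobserved neural responses can be approximated numerically by sampling. The sketch below estimates the probability of a behavior given a stimulus by averaging a decoding model over draws from an encoding model. Both toy models (a two-unit Gaussian encoder and a noisy linear threshold decoder) and the names `sample_r`, `p_b_given_r`, and `p_b_given_s` are hypothetical and are not the models used in the study.

```python
import math
import random

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_r(s, rng):
    """Toy encoding model P(r | s, theta): two units, Gaussian noise with variance equal to the mean."""
    means = (10.0 + s, 10.0 - s)
    return [rng.gauss(m, math.sqrt(m)) for m in means]

def p_b_given_r(r, omega, decision_noise=1.0):
    """Toy decoding model P(b = 1 | r, omega): noisy threshold on a linear readout."""
    u = sum(w * x for w, x in zip(omega, r))
    return phi(u / decision_noise)

def p_b_given_s(s, omega, n_samples, rng):
    """Monte Carlo estimate of the integral over r of P(b | r, omega) P(r | s, theta)."""
    total = sum(p_b_given_r(sample_r(s, rng), omega) for _ in range(n_samples))
    return total / n_samples

rng = random.Random(1)
omega = (1.0, -1.0)
p_zero = p_b_given_s(0.0, omega, 20000, rng)  # no signal: close to 0.5 by symmetry
p_plus = p_b_given_s(2.0, omega, 20000, rng)  # positive signal: well above 0.5
```

In low-dimensional analytical models the integral can often be evaluated in closed form, in which case sampling of this kind is unnecessary.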

[…] $P(\mathbf{r} \mid \mathbf{s}, \boldsymbol{\theta})$, or behavioral decoding, i.e., $P(b \mid \mathbf{r}, \boldsymbol{\omega})$, models. Even with such assumptions, one must be aware that there are practical limitations on the number of neuronal parameters that can be accurately estimated during the course of a psychophysical experiment. As studies with classification images show (Ahumada, 1996; Eckstein & Ahumada, 2002; Mineault, Barthelmé, & Pack, 2009; Murray, 2011), binomial (e.g., yes/no) responses provide relatively little information per trial, necessitating a large number of trials to attain accurate estimates of the perceptual filter. However, we demonstrate here that it is very realistic to use psychophysical data to estimate and compare low-dimensional analytical models (e.g., May & Solomon, 2015a, 2015b; Pestilli et al., 2011; Pestilli et al., 2009) in a process of focused hypothesis testing.

[…] $c$ ($0 \le c \le 100\%$) has been tilted (by $d\phi$°) with respect to vertical. In order to do this, we must specify concretely the hypothesized neural code $\mathbf{r}$, the observable behaviors $b$, the hypothesized neural encoding model $P(\mathbf{r} \mid \mathbf{s}, \boldsymbol{\theta})$, and the hypothesized behavioral decoding model $P(b \mid \mathbf{r}, \boldsymbol{\omega})$.

[…] $\psi(c)$ denotes the contrast tuning (also called *contrast gain*) of neurons in the population, and $K \propto$ […] ($\phi_0 = \pi/2$) at 100% contrast.

[…] $\psi(c)$. One form suggested from neurophysiological findings (Albrecht & Hamilton, 1982) is the Naka-Rushton function (Equation 5), having parameters $\boldsymbol{\eta}^{(1)} = (n, c_{50})^T$. This functional form is also sometimes referred to as the hyperbolic ratio function (Albrecht & Hamilton, 1982). Another form is the hyperbolic tangent (tanh) function commonly used in machine learning (Bishop, 2006; Equation 6), having parameter $\boldsymbol{\eta}^{(2)} = (b)^T$. Both of these functional forms (Naka-Rushton, Tanh) are shown in Figure 2. Finally, we consider a Gaussian form (Equation 7) that allows for the possibility of a nonmonotonic relationship between contrast and firing rate, with parameters $\boldsymbol{\eta}^{(3)} = (\mu, \sigma)^T$.

**Figure 2**
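The three candidate contrast gain functions can be written down compactly. In the sketch below, the Naka-Rushton function uses its standard hyperbolic ratio form with unit maximum. The exact parameterizations of the tanh and Gaussian forms, and the psychometric form $\Phi(K\, d\phi\, \sqrt{\psi(c)})$ used in `p_correct`, are assumptions made for illustration (the latter is suggested by the $d'$ derivation in the appendix) and may differ from the equations in the original article.

```python
import math

def naka_rushton(c, n, c50):
    """Hyperbolic ratio c^n / (c^n + c50^n), with contrast c in percent."""
    return c ** n / (c ** n + c50 ** n)

def tanh_gain(c, b):
    """Saturating gain tanh(b * c); this parameterization is an assumption."""
    return math.tanh(b * c)

def gaussian_gain(c, mu, sigma):
    """Nonmonotonic gain peaking at contrast mu; this parameterization is an assumption."""
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2)

def p_correct(c, dphi, K, gain):
    """Assumed psychometric form Phi(K * dphi * sqrt(psi(c))), where gain is a function of c."""
    z = K * dphi * math.sqrt(gain(c))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example: probability correct for a 2 degree tilt at 5% contrast (illustrative values).
example = p_correct(5.0, 2.0, 0.5, lambda c: naka_rushton(c, 2.0, 5.0))
```

Passing the gain as a function keeps the psychometric model fixed while the hypothesis about $\psi(c)$ is swapped, which mirrors how the competing models differ only in their contrast tuning.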

[…] *qualitative* comparison because the two models are qualitatively very different (monotonic vs. nonmonotonic), whereas the comparison between Naka-Rushton and Tanh is a fine-grained *quantitative* comparison because the two models are both monotonic and qualitatively very similar (Figure 2).

[…] $d'$ (25), we can use thresholds taken at multiple contrasts to estimate the psychometric function parameters using least-squares curve fitting. Figure 3 shows the best fit of the model (Equation 4) with Naka-Rushton contrast gain (Equation 5) to the data from Skottun, Bradley, Sclar, Ohzawa, and Freeman (1987; their figure 1). We see in Figure 3 that this model provides an excellent fit to their data (Supplementary Figure S1). We find that the values recovered for the Naka-Rushton contrast function parameters $n$, $c_{50}$ from their threshold data lie within the range measured in previous neurophysiological work (Albrecht & Hamilton, 1982) as shown in Figure 4 (red circles).

**Figure 3**

**Figure 4**

[…] $\boldsymbol{\eta}^{(1)} = (n, c_{50})^T$ were not obtained as in most psychophysical experiments, in which one finds the maximum likelihood estimate of model parameters using stimulus-response data […] $P(b = 1 \mid \mathbf{s} = (c, d\phi)^T, K, \boldsymbol{\eta}^{(1)})$ are shown in Figure 5 (and Supplementary Figure S3) with fits of the model (Equation 4) with Naka-Rushton gain (Equation 5) to subject data in the middle column. We found in a subsequent experiment (Supplementary Material) that this model could also generalize reasonably well for most (but not all) subjects to predict responses to a small validation set of novel stimuli (Supplementary Figure S4).

**Figure 5**
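Maximum likelihood estimation from stimulus-response data can be illustrated with a coarse grid search on simulated trials. This is a minimal sketch rather than the fitting procedure used in the study: the psychometric form $\Phi(K\, d\phi\, \sqrt{\psi(c)})$, the stimulus grid, the parameter grids, and the "true" parameter values are all assumptions chosen for the example.

```python
import math
import random

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_correct(c, dphi, K, n, c50):
    """Assumed model: Phi(K * dphi * sqrt(psi(c))) with Naka-Rushton gain psi."""
    psi = c ** n / (c ** n + c50 ** n)
    return phi(K * dphi * math.sqrt(psi))

def log_likelihood(trials, K, n, c50):
    """Sum of Bernoulli log-likelihoods over (contrast, tilt, response) trials."""
    total = 0.0
    for c, dphi, b in trials:
        p = min(max(p_correct(c, dphi, K, n, c50), 1e-9), 1.0 - 1e-9)
        total += math.log(p if b == 1 else 1.0 - p)
    return total

def fit_grid(trials, K_grid, n_grid, c50_grid):
    """Return the grid point (K, n, c50) with the highest log-likelihood."""
    candidates = [(K, n, c50) for K in K_grid for n in n_grid for c50 in c50_grid]
    return max(candidates, key=lambda params: log_likelihood(trials, *params))

rng = random.Random(2)
true_params = (0.4, 2.0, 5.0)  # hypothetical K, n, c50 used to simulate an observer
stimuli = [(c, d) for c in (2.0, 5.0, 10.0, 25.0, 50.0, 100.0) for d in (1.0, 2.0, 4.0, 8.0)]
trials = [(c, d, int(rng.random() < p_correct(c, d, *true_params)))
          for c, d in stimuli for _ in range(25)]
estimate = fit_grid(trials,
                    K_grid=[0.2, 0.3, 0.4, 0.5, 0.6],
                    n_grid=[1.0, 2.0, 3.0],
                    c50_grid=[2.5, 5.0, 10.0, 20.0])
```

In practice a continuous optimizer would replace the grid, but the grid makes the objective explicit: the estimate is the parameter setting under which the observed binary responses are most probable.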

[…] $n$, $c_{50}$ estimated from our Experiment 1 data (black diamonds) lie within the neurophysiologically observed range. Numerical values of these parameters are given in Supplementary Tables S1 and S2. Interestingly, we find that all of our estimates of the half-saturation parameter $c_{50}$ obtained in these experiments (along with five of six estimates of $c_{50}$ from Skottun et al., 1987) lie toward the lower end of the physiologically observed range (i.e., around 5% contrast; see Albrecht & Hamilton, 1982). This suggests the subjects may be using the neurons that are most sensitive to contrast when they perform the task, consistent with the “lower envelope” principle of sensory coding (Egger & Britten, 2013; Mountcastle, LaMotte, & Carli, 1972; L. Wang et al., 2007).

[…] $\psi(c)$ was not known beforehand from physiological recordings, we may be interested in evaluating various possibilities by fitting the model (Equation 4) to psychophysical data with different choices for $\psi(c)$ and seeing which best accounts for the observed results. Such information derived from relatively fast and inexpensive psychophysical experiments could provide important clues to guide subsequent neurophysiology research.

[…] $\psi(c)$ define a discrete space of three competing neural encoding models, which we index by $i = 1, 2, 3$. By fitting each model to psychophysical data, we may evaluate their relative likelihoods using the Akaike Information Criterion (AIC), which measures goodness-of-fit while penalizing model complexity (Akaike, 1974; Burnham & Anderson, 2003). Previous work has shown that it is important that any model comparison method takes complexity into account because an overly complex model often fits training data well but fails to generalize to novel observations (Bishop, 2006; Pitt & Myung, 2002).

[…] $i$-th model by $\mathrm{AIC}_i$, with model $i$ being preferred to model $j$ if $\mathrm{AIC}_i > \mathrm{AIC}_j$. We define a *model preference index* $P_{i-j}$ (Equation 8), where a positive value of $P_{i-j}$ indicates that model $i$ is preferred to model $j$, and a negative value indicates that $j$ is preferred to $i$. The model preference index is defined implicitly with respect to a fixed number of observations, i.e., $P_{i-j} = P_{i-j}(n)$, where $n$ is the number of trials used to compute the AIC. We define a *change in model preference* after $k$ additional trials as

$$\Delta P_{i-j} = P_{i-j}(n + k) - P_{i-j}(n). \tag{9}$$

In our analysis, model 1 assumes Naka-Rushton contrast tuning (Equation 5), model 2 assumes Tanh tuning (Equation 6), and model 3 assumes Gaussian tuning (Equation 7).
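A model preference index of this kind can be sketched as a difference of penalized log-likelihoods. Because the text states that model $i$ is preferred when $\mathrm{AIC}_i > \mathrm{AIC}_j$, the sketch assumes a "larger is better" score, $\ln L - k$ (the conventional AIC multiplied by $-1/2$). That sign convention, the function names, and the numbers in the example are assumptions for illustration only.

```python
def aic_score(log_likelihood, n_params):
    """Penalized fit ln L - k; larger values indicate a better model (assumed convention)."""
    return log_likelihood - n_params

def preference_index(loglik_i, k_i, loglik_j, k_j):
    """Positive when model i is preferred to model j, negative when j is preferred."""
    return aic_score(loglik_i, k_i) - aic_score(loglik_j, k_j)

def preference_change(p_before, p_after):
    """Change in preference after additional trials: P(n + k) - P(n)."""
    return p_after - p_before

# Illustrative numbers: a three-parameter model that fits better than a two-parameter one.
p_12 = preference_index(-520.0, 3, -526.5, 2)  # (-523.0) - (-528.5) = 5.5
delta = preference_change(p_12, 8.0)           # 8.0 - 5.5 = 2.5
```

The example shows the trade-off described in the text: the richer model pays a larger complexity penalty and is preferred only if its improvement in fit exceeds that penalty.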

[…] $P_{1-2}$ (Naka-Rushton–Tanh) and $P_{1-3}$ (Naka-Rushton–Gaussian). We see in Figure 6a that the Naka-Rushton model is preferred over the Gaussian model for all nine subjects and over the Tanh model for seven of nine subjects, with the preference being quite strong for many subjects. Statistical tests show that over these nine subjects, both model preferences are significantly different from zero (sign-rank test, $n = 9$; $P_{1-2} > 0$: $p = 0.02$, $P_{1-3} > 0$: $p = 0.004$). Figure 6b shows how the model preference $P_{1-2}$ evolves with the number of experimental trials. We see that, as more trials are collected, the model preference (for most subjects) seems to change in favor of the Naka-Rushton model, whose better ability to fit the data overcomes the complexity penalty imposed by the AIC. We also see from Figure 6b that the final model preferences are established after about 1,000–1,200 trials. Similar results were obtained using the Bayesian Information Criterion, which more severely penalizes model complexity (Bishop, 2006), changing the final model preference for only one subject (Supplementary Figures S5 and S6).

**Figure 6**

[…] $\mathbf{s} = (c, d\phi)^T$ may be found by maximizing the expression in Equation 10, where $P_0(i)$ is the prior probability of each model, $D_{KL}$ is the Kullback-Leibler divergence (Cover & Thomas, 2006), $p(b \mid \mathbf{s}, i)$ is the response probability conditioned on the stimulus and model, and $p(b \mid \mathbf{s})$ is the overall response probability averaged across models. Intuitively, this method minimizes uncertainty about which model is true by presenting stimuli that are expected to yield a posterior density with most of the probability mass on one or a few models, i.e., a density with minimum entropy (Cover & Thomas, 2006). This information-theoretic criterion has been used in cognitive science to choose stimuli optimized for testing competing hypotheses of memory decay and decision making under risk (Cavagnaro, Gonzalez, Myung, & Pitt, 2013; Cavagnaro, Pitt, & Myung, 2011).
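The stimulus selection criterion can be sketched for a binary response. The code below assumes the criterion takes the standard mutual-information form $\sum_i P_0(i)\, D_{KL}\big(p(b \mid \mathbf{s}, i)\,\|\,p(b \mid \mathbf{s})\big)$, which matches the terms named in the text but is not confirmed to be the article's exact Equation 10. The two toy models and the candidate stimuli are hypothetical.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D_KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def utility(response_probs, priors):
    """Prior-weighted divergence of each model's prediction from the model-averaged prediction."""
    mixture = sum(w * p for w, p in zip(priors, response_probs))
    return sum(w * kl_bernoulli(p, mixture) for w, p in zip(priors, response_probs))

def best_stimulus(stimuli, models, priors):
    """Return the candidate stimulus that maximizes the utility."""
    return max(stimuli, key=lambda s: utility([m(s) for m in models], priors))

# Two toy models that agree at s = 0 and s = 1 but disagree in between.
model_a = lambda s: 0.5 + 0.4 * s
model_b = lambda s: 0.5 + 0.4 * s * s
chosen = best_stimulus([0.0, 0.5, 1.0], [model_a, model_b], [0.5, 0.5])
```

The utility is zero wherever the models make identical predictions, so the search is drawn to stimuli at which the competing hypotheses disagree most in terms of expected responses.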

[…] $N_E = 1{,}200$ trials) a single OCS was found by optimizing Equation 10 based on fits of model 1 (Naka-Rushton) and model 2 (Tanh) to Experiment 1 data. The search for the OCS was restricted to contrasts greater than 1% and orientations from 0° to 20°, based on where the two models appeared to differ the most and on the fact that stimuli presented at less than 1% contrast are often barely visible (Campbell & Robson, 1968). The OCS for each subject are illustrated in Figure 7 (left panels). Note that many of these stimuli have contrast $c \approx 1$ and orientation $d\phi > 5°$ and hence lie outside the range of stimuli (contrasts and orientations) used to estimate the models (Supplementary Figures S2 and S4).

**Figure 7**

[…] $N_C = 200$ trials during the Experiment 2 C-phase, interleaved with 200 stimuli chosen at random with uniform probability from the stimulus grid used during Experiment 1 (Supplementary Figure S2), for 400 trials total. We will hereafter refer to these randomly chosen Experiment 1 (E-phase) stimuli as IID stimuli. We see from Figure 7 (right panels) that for many (but not all) subjects the OCS (blue curves) does a much better job than the IID stimuli (green curves) of shifting the model preference $P_{1-2}$ in the direction of the Naka-Rushton model, $\Delta P_{1-2} = P_{1-2}(N_E + N_C) - P_{1-2}(N_E) > 0$. Statistical analysis demonstrates that over all subjects, the median value of $\Delta P_{1-2}$ is significantly larger for the OCS (median $\Delta P_{1-2} = 5.41$) than IID (median $\Delta P_{1-2} = -0.04$) trials (sign-rank test, $n = 9$, $p = 0.0117$).

[…] $P_{1-2}$ for both IID and OCS data collection strategies in which the Naka-Rushton model was assumed true. In the actual experiments, at the end of the E-phase, there was already a model preference ($P_{1-2} \neq 0$, see Figure 6a), so in order to determine how often the two data collection strategies (OCS, IID) would result in a correct choice given no initial preference, we set the initial model preference to zero so that $\Delta P_{1-2} = P_{1-2}$.

[…] $N_{mc} = 100$ Monte Carlo simulations of Experiment 2 are shown in Figure 8. In each panel, we plot the median value of $P_{1-2}$ (thick lines: blue = OCS, green = IID), the range containing 95% of simulations (thin lines), and the trajectory of $P_{1-2}$ observed experimentally (red lines). For many (but not all) subjects, we see a reasonably good agreement between the simulation predictions and the observed change in model preferences during the C-phase. We find that over the group of subjects, there is a correlation (Pearson, $n = 9$, $r = 0.71$, $p = 0.03$) between the values of $\Delta P_{1-2}$ predicted by the simulations and those observed experimentally (Supplementary Figure S7). The simulations tend to predict a larger value of $\Delta P_{1-2}$ than observed experimentally (median: experiments = 5.41, simulations = 13.59) although, just like the experiments, the median $\Delta P_{1-2}$ obtained is larger for simulations using OCS than IID (median = 0.67) data collection strategies. We also find that one is more likely to make a correct model choice using the OCS data collection method (Supplementary Figure S3), with IID yielding a correct choice after $N_C = 200$ trials (given no initial preference) in 80% of simulations but OCS in about 99%. Additional simulations reveal that OCS stimuli can also be more effective for model comparison in cases in which model 2 is the ground truth (Supplementary Figure S8). These simulations suggest the potential usefulness of this adaptive stimulus optimization method for comparing competing models of neural encoding.

**Figure 8**

*accuracy maximization analysis*, which finds optimal neural encoding models for specific natural perception tasks (Burge, Fowlkes, & Banks, 2010; Burge & Geisler, 2011, 2014, 2015; W. Geisler, Perry, Super, & Gallogly, 2001; W. S. Geisler, 2008; W. S. Geisler et al., 2009). This methodology has been applied to determine the neural receptive fields that would be optimal for performing natural vision tasks, such as separating figure from ground (Burge et al., 2010; W. S. Geisler et al., 2009), estimating retinal disparity (Burge & Geisler, 2014), and estimating the speed of visual motion (Burge & Geisler, 2015). The neural encoding models derived account for experimentally observed neural tuning properties, and although these models were not estimated by fitting psychophysical data (as done here), a Bayesian ideal observer reading out these optimal neural codes manages to accurately account for human psychophysical performance (e.g., Burge & Geisler, 2015).

[…] ($c_{50}$) and shapes ($n$) of contrast gain functions (Albrecht & Hamilton, 1982). Therefore, our results only demonstrate that our subpopulation of interest is sufficient to explain the observed psychophysical behavior and do not rule out the possibility that other neurons not considered by our model may contribute as well.

[…] $I_F(\phi) = F(\phi, \mathbf{v})$ as a function of reference orientation $\phi$, which would serve as the link between the neural population code and psychophysical performance (May & Solomon, 2015a; Wei & Stocker, 2015). After estimating the parameters $\hat{\mathbf{v}}$ from psychophysical experiments, one can then optimize neural population code parameters $\boldsymbol{\theta}$ to minimize $\int_{[0,\pi)} \big(F(\phi, \hat{\mathbf{v}}) - I_F(\phi, \boldsymbol{\theta})\big)^2\, d\phi$, where $I_F(\phi, \boldsymbol{\theta})$ denotes the Fisher information predicted by a neural encoding model having parameters $\boldsymbol{\theta}$. Because many different neural population codes are capable of giving rise to very similar Fisher information profiles (Wei & Stocker, 2015), additional constraints, such as coding efficiency (Ganguli & Simoncelli, 2014), may be necessary in order to get a unique solution for neural population code parameters. Conducting such technically challenging psychophysics experiments aimed at understanding the large-scale organization of neural population codes is an interesting direction of future research.

*Perception*, *25*, ECVP abstract supplement.

*Automatic Control, IEEE Transactions on*, 19 (6), 716–723.

*Journal of Neurophysiology*, 48 (1), 217–237.

*Science*, 299 (5609), 1073–1075.

*The Journal of Neuroscience*, 34 (10), 3632–3645.

*Neuron*, 60 (6), 1142–1152.

*Nature Neuroscience*, 14 (5), 642–648.

*The Journal of Neuroscience*, 28 (3), 776–786.

*The Journal of Neuroscience*, 32 (31), 10618–10626.

*Pattern recognition and machine learning*. New York: Springer.

*The Journal of Neuroscience*, 32 (37), 12684–12701.

*Nature Neuroscience*, 2 (11), 947–957.

*The Journal of Neuroscience*, 12 (12), 4745–4765.

*The Journal of Neuroscience*, 30 (21), 7269–7280.

*Proceedings of the National Academy of Sciences, USA*, 108 (40), 16849–16854.

*Nature Communications, 6*, 7900.

*Model selection and multimodel inference: A practical information-theoretic approach*. New York: Springer Science & Business Media.

*The Journal of Physiology*, 197 (3), 551–566.

*Nature Reviews Neuroscience*, 13 (1), 51–62.

*Management Science*, 59 (2), 358–375.

*Neural Computation*, 22 (4), 887–905.

*Psychonomic Bulletin & Review*, 18 (1), 204–210.

*Vision Research*, 45 (23), 2943–2959.

*Vision Research*, 43 (18), 1983–2001.

*The Journal of Neuroscience*, 29 (20), 6635–6648.

*Elements of information theory*. Hoboken, NJ: John Wiley & Sons.

*Theoretical neuroscience, volume 806*. Cambridge, MA: MIT Press.

*Neural Computation*, 23 (9), 2242–2288.

*Frontiers in Neural Circuits, 7*, 101.

*Proceedings of the National Academy of Sciences, USA*, 110 (33), 13678–13683.

*Proceedings of the National Academy of Sciences, USA*, 95 (23), 13988–13993.

*Vision Research*, 39 (19), 3197–3221.

*Journal of Vision*, 2(1):i, doi:10.1167/2.1.i. [PubMed] [Article]

*Visual Neuroscience*, 30 (5–6), 315–330.

*Advances in neural information processing systems*(pp. 658–666). Cambridge, MA: MIT Press.

*Neural Computation*, 26, 2103–2134.

*Vision Research*, 41 (6), 711–724.

*Annual Review of Psychology*, 59, 167–192.

*Nature Neuroscience*, 14 (7), 926–932.

*Annual Review of Neuroscience*, 30, 535–574.

*Psychological Review*, 120 (3), 472–496.

*Nature Neuroscience*, 14 (2), 239–245.

*Handbook of physiological optics*. Mineola, NY: Dover.

*Discharge patterns of single fibers in the cat's auditory nerve*. Cambridge, MA: MIT.

*Current Biology*, 24 (13), 1542–1547.

*Psychophysics: A practical introduction*. London: Academic Press.

*Neural Computation*, 21 (3), 619–687.

*Journal of Neurophysiology*, 90 (1), 204–217.

*Vision Research*, 50 (22), 2308–2319.

*Nature Neuroscience*, 9 (11), 1432–1438.

*Nature Neuroscience*, 14 (6), 783–790.

*The Journal of Neuroscience*, 32 (11), 3679–3696.

*Journal of Neurophysiology*, 35 (1), 122–136.

*The Journal of Neuroscience*, 27 (43), 11687–11699.

*Journal of Mathematical Psychology*, 57 (3), 53–67.

*Vision Research*, 44 (26), 3053–3064.

*Nature Neuroscience*, 5 (8), 812–816.

*Vision Research*, 46 (16), 2465–2474.

*Annual Review of Neuroscience*, 35, 463–483.

*Progress in Brain Research*, 165, 493–507.

*Annual Review of Neuroscience*, 21 (1), 227–277.

*Neuron*, 72 (5), 832–846.

*Vision Research*, 49 (10), 1144–1153.

*Psychological Review*, 112 (4), 715–743.

*PLoS Computational Biology*, 5 (11), e1000579.

*Trends in Cognitive Sciences*, 6 (10), 421–425.

*Nature Neuroscience*, 8 (1), 99–106.

*Proceedings of the National Academy of Sciences, USA*, 110 (50), 20332–20337.

*Neuroscience*, 296, 116–129.

*Vision Research*, 38 (7), 963–972.

*Progress in Neurobiology*, 103, 41–75.

*The Journal of the Acoustical Society of America*, 56 (6), 1835–1847.

*The Journal of Neuroscience*, 28 (13), 3415–3426.

*PLoS Biology*, 4 (12), e387.

*The Journal of Neuroscience*, 16 (4), 1486–1510.

*Journal of Neurophysiology*, 86 (4), 1916–1936.

*Journal of Neurophysiology*, 57 (3), 773–786.

*Adaptive modeling of marmoset inferior colliculus neurons in vivo*. PhD thesis, Johns Hopkins University, Baltimore, MD.

*Psychological Review*, 121 (1), 124–149.

*The Journal of Neuroscience*, 10 (11), 3543–3558.

*Science*, 145, 1007–1016.

*The Journal of Neuroscience*, 27 (3), 582–589.

*Perception & Psychophysics*, 47 (1), 87–91.

*Nature Neuroscience*, 18 (10), 1509–1517.

*Proceedings of the National Academy of Sciences, USA*, 111 (23), 8619–8624.

*The Journal of Neuroscience*, 24 (7), 1617–1626.

*Philosophical Transactions of the Royal Society of London*, 92, 12–48.

[…] $\boldsymbol{\theta}, \boldsymbol{\omega}$ in terms of the data likelihood (Equation 11), yielding Equation 12, where $Z$ is a normalizing constant, and $P(\boldsymbol{\theta}, \boldsymbol{\omega})$ reflects any prior beliefs about the neural code and the decoding parameters. Using Equations 11 and 2, we may rewrite Equation 12 as Equation 13. Assuming that $\boldsymbol{\theta}$ and $\boldsymbol{\omega}$ are independent, so $P(\boldsymbol{\theta}, \boldsymbol{\omega}) = P(\boldsymbol{\theta})P(\boldsymbol{\omega})$, marginalizing Equation 13 over $\boldsymbol{\omega}$ yields Equation 14. In the case of fixed decoding model parameters $\hat{\boldsymbol{\omega}}$, so that $P(\boldsymbol{\omega}) = \delta(\boldsymbol{\omega} - \hat{\boldsymbol{\omega}})$ (where $\delta$ denotes the Dirac delta function), we obtain Equation 15 from Equation 14. A symmetrical argument of the same form as that presented above can be used to show that we can use psychophysical data to estimate the parameters of a behavioral decoding model given a neural encoding model with known parameters $\hat{\boldsymbol{\theta}}$ using Equation 16.
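As a sketch of the steps just described, one consistent way to write Equations 11 through 16 is given below. It assumes independent trials and writes the trial data as $\mathcal{D} = \{(\mathbf{s}_t, b_t)\}_{t=1}^{n}$, a notation introduced only for this sketch; the article's own notation may differ.

```latex
\begin{align}
P(\mathcal{D} \mid \boldsymbol{\theta}, \boldsymbol{\omega})
  &= \prod_{t=1}^{n} P(b_t \mid \mathbf{s}_t, \boldsymbol{\omega}, \boldsymbol{\theta}) \tag{11} \\
P(\boldsymbol{\theta}, \boldsymbol{\omega} \mid \mathcal{D})
  &= \frac{1}{Z}\, P(\mathcal{D} \mid \boldsymbol{\theta}, \boldsymbol{\omega})\,
     P(\boldsymbol{\theta}, \boldsymbol{\omega}) \tag{12} \\
P(\boldsymbol{\theta}, \boldsymbol{\omega} \mid \mathcal{D})
  &= \frac{1}{Z}\, P(\boldsymbol{\theta}, \boldsymbol{\omega})
     \prod_{t=1}^{n} \int P(b_t \mid \mathbf{r}, \boldsymbol{\omega})\,
     P(\mathbf{r} \mid \mathbf{s}_t, \boldsymbol{\theta})\, d\mathbf{r} \tag{13} \\
P(\boldsymbol{\theta} \mid \mathcal{D})
  &= \frac{1}{Z}\, P(\boldsymbol{\theta}) \int P(\boldsymbol{\omega})
     \prod_{t=1}^{n} \left[ \int P(b_t \mid \mathbf{r}, \boldsymbol{\omega})\,
     P(\mathbf{r} \mid \mathbf{s}_t, \boldsymbol{\theta})\, d\mathbf{r} \right] d\boldsymbol{\omega} \tag{14} \\
P(\boldsymbol{\theta} \mid \mathcal{D})
  &= \frac{1}{Z}\, P(\boldsymbol{\theta})
     \prod_{t=1}^{n} \int P(b_t \mid \mathbf{r}, \hat{\boldsymbol{\omega}})\,
     P(\mathbf{r} \mid \mathbf{s}_t, \boldsymbol{\theta})\, d\mathbf{r} \tag{15} \\
P(\boldsymbol{\omega} \mid \mathcal{D})
  &= \frac{1}{Z}\, P(\boldsymbol{\omega})
     \prod_{t=1}^{n} \int P(b_t \mid \mathbf{r}, \boldsymbol{\omega})\,
     P(\mathbf{r} \mid \mathbf{s}_t, \hat{\boldsymbol{\theta}})\, d\mathbf{r} \tag{16}
\end{align}
```

Each line follows from the previous one by a single step named in the text: Bayes' rule (12), substitution of the marginalized response probability (13), integration over the decoding parameters under the independence assumption (14), and collapse of that integral by the delta-function prior (15).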

[…] $\mathbf{s}_- = (\phi_0 - d\phi, c)^T$ denote the clockwise stimulus and $\mathbf{s}_+$ denote the counterclockwise stimulus with parameters $\mathbf{s}_+ = (\phi_0 + d\phi, c)^T$. We will assume that orientation and contrast are coded by a population of $N$ independent neurons, whose expected noisy (Poisson) response $r_i$ to a stimulus $\mathbf{s} = (\phi, c)^T$ is given by the 2-D contrast-modulated tuning curve $f_i(\phi, c) = \psi(c) f_i(\phi)$, where $f_i(\phi)$ describes the orientation tuning of the $i$th unit and $\psi(c)$ describes the contrast gain with $0 \le \psi(c) \le 1$ for contrast (in percentage) $0 \le c \le 100$. We will also assume that all of the units decoded for the behavioral decision have the same contrast gain function $\psi(c)$. For tractability, we approximate the Poisson noise response by Gaussian noise with mean and variance $\mu = \sigma^2 = f_i$. This fully specifies our neural encoding model $P(\mathbf{r} \mid \mathbf{s}, \boldsymbol{\theta})$ as a factorial Gaussian distribution.

[…] $P(b \mid \mathbf{r}, \boldsymbol{\omega})$, where $b = 1$ indicates a correct response ($b = 0$ incorrect), we assume that the responses of all units are pooled linearly to form a new decision variable (Equation 17), where the $\omega_i$ are dependent on the perceptual task. Because the weighted sum of Gaussian variables is also Gaussian, this new decision variable (Equation 17) is Gaussian, and the expected value for stimulus $\mathbf{s}_0 = (\phi_0, c)^T$ is given by Equation 18, with variance given by Equation 19. Because our perturbed stimuli $\mathbf{s}_-$ and $\mathbf{s}_+$ are assumed to be very close to the reference $\mathbf{s}_0$, we will assume the variance of the response to these stimuli is also equal to the same $\sigma^2$ in Equation 19. The expected value of responses to stimulus $\mathbf{s}_\pm = (\phi_0 \pm d\phi, c)^T$ is given by Equation 20, and from Equations 18 through 20, we obtain an expression for the well-known psychophysical quantity $d'$ for a perturbation of size $d\phi$ (Equations 21 and 22), where Equation 22 is obtained readily by plugging Equations 19 and 20 into Equation 21.
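The chain from the pooled decision variable to $d'$ can be written out explicitly. The following is one reconstruction consistent with the stated assumptions (linear pooling, Gaussian noise with variance equal to the mean, and a common contrast gain $\psi(c)$); the symbols $u$, $\mu_0$, and $\mu_\pm$ are names chosen for this sketch and may not match the article's equations.

```latex
\begin{align}
u &= \sum_{i=1}^{N} \omega_i r_i \tag{17} \\
\mu_0 &= \psi(c) \sum_{i=1}^{N} \omega_i f_i(\phi_0) \tag{18} \\
\sigma^2 &= \psi(c) \sum_{i=1}^{N} \omega_i^2 f_i(\phi_0) \tag{19} \\
\mu_\pm &= \psi(c) \sum_{i=1}^{N} \omega_i f_i(\phi_0 \pm d\phi) \tag{20} \\
d' &= \frac{\mu_+ - \mu_-}{\sigma} \tag{21} \\
d' &= \sqrt{\psi(c)}\;
      \frac{\sum_{i=1}^{N} \omega_i \left[ f_i(\phi_0 + d\phi) - f_i(\phi_0 - d\phi) \right]}
           {\sqrt{\sum_{i=1}^{N} \omega_i^2 f_i(\phi_0)}} \tag{22}
\end{align}
```

The contrast gain enters the mean difference linearly and the standard deviation as a square root, which is why $d'$ scales with $\sqrt{\psi(c)}$ in the last line.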

[…] $\mathbf{f}_\phi = (f_1(\phi), \ldots, f_N(\phi))^T$ and $\boldsymbol{\Sigma}_\phi = \mathrm{diag}[\mathbf{f}_\phi]$ and suppressing arguments, we can rewrite Equation 22 as Equation 23, and we recognize the term in brackets as the ratio of variability between groups to that within groups when observations are projected onto the vector $\boldsymbol{\omega}$. The vector maximizing this ratio is known as the Fisher linear discriminant and is given by Equation 24. For small perturbations $d\phi$, the direction of the vector $\boldsymbol{\omega}_F$ does not depend on $d\phi$ because we may approximate $\phi_+ - \phi_- = 2\,d\phi$. Substituting Equation 24 into Equation 23 and using this approximation yields Equation 25, where $I_F$ is the population Fisher information about the stimulus orientation $\phi$ around the reference stimulus $\phi_0$, a well-known result from population coding theory (Dayan & Abbott, 2001). Given our expression (Equation 25) for $d'$ and using the fact that the probability of correct response ($b = 1$) in the two-alternative forced choice task is $\Phi(d'/2)$ (single interval) or $\Phi(d'/$ […] $K \propto$ […] $\phi_0$ at 100% contrast.
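The link between the optimal linear readout and Fisher information can be checked numerically. The sketch below assumes Gaussian orientation tuning with a small baseline, variance equal to the mean, and the relation $d' \approx 2\, d\phi\, \sqrt{\psi(c)\, I_F(\phi_0)}$ for the Fisher linear discriminant weights. The tuning parameters and function names are hypothetical, and orientation is treated as non-circular for simplicity.

```python
import math

def tuning(phi, mu, sigma=20.0, amplitude=30.0, baseline=1.0):
    """Hypothetical orientation tuning curve f_i(phi); the baseline keeps the variance positive."""
    return baseline + amplitude * math.exp(-0.5 * ((phi - mu) / sigma) ** 2)

def d_prime(weights, centers, phi0, dphi, psi):
    """d' = (mu_plus - mu_minus) / sigma for a linear readout with variance equal to the mean."""
    delta = sum(w * (tuning(phi0 + dphi, m) - tuning(phi0 - dphi, m))
                for w, m in zip(weights, centers))
    variance = sum(w * w * tuning(phi0, m) for w, m in zip(weights, centers))
    return math.sqrt(psi) * delta / math.sqrt(variance)

def fisher_weights(centers, phi0, dphi):
    """Fisher linear discriminant for a diagonal covariance: (f_plus - f_minus) / f_0 per unit."""
    return [(tuning(phi0 + dphi, m) - tuning(phi0 - dphi, m)) / tuning(phi0, m)
            for m in centers]

def fisher_information(centers, phi0, h=1e-4):
    """Population Fisher information sum_i f_i'(phi0)^2 / f_i(phi0) at full gain (psi = 1)."""
    total = 0.0
    for m in centers:
        slope = (tuning(phi0 + h, m) - tuning(phi0 - h, m)) / (2.0 * h)
        total += slope ** 2 / tuning(phi0, m)
    return total

centers = [float(m) for m in range(0, 181, 10)]
phi0, dphi, psi = 90.0, 0.5, 0.6
w_opt = fisher_weights(centers, phi0, dphi)
d_opt = d_prime(w_opt, centers, phi0, dphi, psi)
d_approx = 2.0 * dphi * math.sqrt(psi * fisher_information(centers, phi0))
d_other = d_prime([m - phi0 for m in centers], centers, phi0, dphi, psi)  # a suboptimal readout
```

For a small perturbation the two values `d_opt` and `d_approx` agree closely, and any other weight vector, such as the one used for `d_other`, gives a $d'$ no larger than the discriminant's.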