When performing a perceptual task, precision pooling occurs when an organism's decisions are based on the activities of a small set of highly informative neurons. The Adaptive Precision Pooling Hypothesis links perceptual learning and decision making by stating that improvements in performance occur when an organism starts to base its decisions on the responses of neurons that are more informative for a task than the responses that the organism had previously used. We trained human subjects on a visual slant discrimination task and found their performances to be suboptimal relative to an ideal probabilistic observer. Why were subjects suboptimal learners? Our computer simulation results suggest a possible explanation, namely that there are few neurons providing highly reliable information for the perceptual task, and that learning involves searching for these rare, informative neurons during the course of training. This explanation can account for several characteristics of human visual learning, including the fact that people often show large differences in their learning performances with some individuals showing no performance improvements, other individuals showing gradual improvements during the course of training, and still others showing abrupt improvements. The approach described here potentially provides a unifying framework for several theories of perceptual learning including theories stating that learning is due to adaptations of the weightings of read-out connections from early visual representations, external noise filtering or internal noise reduction, increases in the efficiency with which learners encode task-relevant information, and attentional selection of specific neural populations which should undergo adaptation.

^{1}A contribution of this article is its examination of an extreme form of this idea in which the weightings are biased to be sparse, meaning that only a relatively small number of weights are nonzero, and thus, only a small subset of neurons contribute to decision making (see also Liu & Weinshall, 2000).

$\tau$, was calculated. The subject then performed 4 blocks of training trials in which each block consisted of 240 trials. Surface slants on training trials were either $-\tau$ or $+\tau$. A subject performed 5 training blocks on each of Days 2 and 3.

$\theta \in \{0, \pm 15, \pm 30, \pm 45\}$, spatial frequency $f \in \{1, 1.4, 2, 2.8, 4\}$, and phase $\phi \in \{0, 90, 180, 270\}$. The filtered images were then rectified using the half-squaring operator. The resulting values formed a set of phase-sensitive maps (one for each orientation, frequency, and phase), which can be interpreted as activation patterns across a retinotopic population of simple cells in area V1 (Heeger, 1992). The retinotopic maps of phase-sensitive units were then combined into phase-invariant maps (analogous to complex cells) by pooling phase-sensitive units that form quadrature pairs, as is done in energy models of visual motion perception (Adelson & Bergen, 1985). Each phase-invariant map was normalized by dividing it by a frequency-dependent normalization term, thereby producing a map whose total activation was approximately constant for above-threshold stimulus contrasts (Heeger, 1992). Lastly, each phase-invariant map was pooled across space using a Gaussian weighting kernel.
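The processing stages described above (oriented filtering, half-squaring, quadrature pooling, normalization, Gaussian spatial pooling) can be illustrated with a minimal NumPy sketch. It is not the front end used in the simulations. The Gabor envelope width, the unit of spatial frequency (cycles per image), the FFT-based circular filtering, the per-frequency normalization term, and the 3 × 3 grid of pooling locations are all assumptions made here for illustration; the grid size was chosen only so that the output has 315 units. All function names are hypothetical.

```python
import numpy as np

ORIENTATIONS = (0, 15, -15, 30, -30, 45, -45)  # degrees
FREQUENCIES = (1, 1.4, 2, 2.8, 4)              # assumed unit: cycles per image
PHASES = (0, 90, 180, 270)                     # degrees


def grid_coords(size):
    """Pixel coordinates in image units, with zero at index size // 2."""
    c = (np.arange(size) - size // 2) / size
    return np.meshgrid(c, c)


def gabor(size, freq, theta_deg, phase_deg, sigma=0.15):
    """Gabor filter; sigma is an assumed envelope width."""
    x, y = grid_coords(size)
    theta = np.deg2rad(theta_deg)
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * u + np.deg2rad(phase_deg))


def filter_image(image, kernel):
    """Circular convolution via the FFT (assumes a square image)."""
    shifted = np.fft.ifftshift(kernel)  # move the kernel center to the origin
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(shifted)))


def half_square(r):
    """Half-wave rectification followed by squaring."""
    return np.maximum(r, 0.0) ** 2


def front_end(image, grid=3, pool_sigma=0.2, eps=1e-9):
    size = image.shape[0]
    x, y = grid_coords(size)
    centers = np.linspace(-0.3, 0.3, grid)  # assumed pooling locations
    outputs = []
    for freq in FREQUENCIES:
        energies = []
        for theta in ORIENTATIONS:
            # Phase-sensitive maps (simple-cell analogues), one per phase.
            simple = [half_square(filter_image(image, gabor(size, freq, theta, p)))
                      for p in PHASES]
            # Summing the four half-squared phases gives r_0^2 + r_90^2,
            # the phase-invariant energy of the quadrature pair.
            energies.append(np.sum(simple, axis=0))
        # Stand-in for the frequency-dependent normalization term.
        norm = np.sum(energies) + eps
        for energy in energies:
            for cy in centers:
                for cx in centers:
                    w = np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * pool_sigma**2))
                    outputs.append(np.sum(w * energy) / np.sum(w) / norm)
    return np.array(outputs)
```

With 7 orientations, 5 spatial frequencies, and 9 pooling locations, `front_end` returns a vector of 315 non-negative values for any square input image.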

Let $\mathbf{x}_t$ denote the values of the output units of the representational front end on trial $t$ (because the front-end outputs are the inputs to a model, $\mathbf{x}_t$ also denotes these inputs). In addition, let $y_t^*$ be a binary variable indicating whether the image depicted a left-closer ($y_t^* = 1$) or right-closer ($y_t^* = 0$) surface on trial $t$. Finally, let $\boldsymbol{\theta}_t$ denote the observer's parameter values (mean vectors and covariance matrices for the left-closer and right-closer distributions) on trial $t$. The observer computed the posterior probability that the image depicted a left-closer surface via Bayes' rule:

$$P(y_t^* = 1 \mid \mathbf{x}_t, \boldsymbol{\theta}_t) = \frac{p(\mathbf{x}_t \mid y_t^* = 1, \boldsymbol{\theta}_t)\, P(y_t^* = 1)}{\sum_{k \in \{0, 1\}} p(\mathbf{x}_t \mid y_t^* = k, \boldsymbol{\theta}_t)\, P(y_t^* = k)}$$

where the prior probabilities $P(y_t^* = 1)$ and $P(y_t^* = 0)$ were each set to 1/2. If $P(y_t^* = 1 \mid \mathbf{x}_t, \boldsymbol{\theta}_t) \geq 0.5$, then the observer decided “left-closer”; otherwise, it decided “right-closer.”
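As an illustration of this decision rule, the following sketch computes the posterior for two Gaussian class-conditional distributions and applies the 0.5 threshold. It assumes a diagonal covariance shared by both classes, passed as a vector of variances, and works in log space for numerical stability; the function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the variances)."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def posterior_left(x, mean_left, mean_right, var, prior_left=0.5):
    """Posterior probability of 'left-closer' via Bayes' rule."""
    log_l = log_gauss_diag(x, mean_left, var) + np.log(prior_left)
    log_r = log_gauss_diag(x, mean_right, var) + np.log(1.0 - prior_left)
    m = max(log_l, log_r)  # subtract the maximum before exponentiating
    num = np.exp(log_l - m)
    return float(num / (num + np.exp(log_r - m)))


def decide(x, mean_left, mean_right, var):
    """Threshold the posterior at 0.5, as in the text."""
    p = posterior_left(x, mean_left, mean_right, var)
    return "left-closer" if p >= 0.5 else "right-closer"
```

An input lying exactly midway between the two class means yields a posterior of 0.5 and is therefore classified as "left-closer" under the $\geq$ convention.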

$t$, the observer had viewed the images on all trials up to and including trial $t$, along with the corresponding correct judgments of these images. The observer should use all of this information to optimally set its parameters. Let $X_{1,\ldots,t}$ denote the values of the front-end output units on trials 1 through $t$ (i.e., $\mathbf{x}_1, \ldots, \mathbf{x}_t$), and let $Y^*_{1,\ldots,t}$ denote the corresponding values of the binary target variable (i.e., $y_1^*, \ldots, y_t^*$). The optimal setting of the parameter values (in a maximum a posteriori sense) is the set of values that maximizes the posterior distribution of the parameter values given the data $X_{1,\ldots,t}$ and $Y^*_{1,\ldots,t}$. Using Bayes' rule, this distribution can be written as

Let $I_L$ be the set of trial indices for trials in which a left-closer surface was depicted, and let $I_R$ be the set of trial indices for trials in which a right-closer surface was depicted. $I_L$ and $I_R$ are disjoint sets whose union is the set $\{1, \ldots, t\}$. The mean vector of an observer's left-closer Gaussian distribution was set to the mean of the data $\{\mathbf{x}_i\}_{i \in I_L}$, whereas the mean vector of the right-closer distribution was set to the mean of the data $\{\mathbf{x}_i\}_{i \in I_R}$.

Let $\Sigma_L$ and $\Sigma_R$ denote the covariances of the data $\{\mathbf{x}_i\}_{i \in I_L}$ and $\{\mathbf{x}_i\}_{i \in I_R}$, respectively. In some simulations (those of the Anisotropic Observer described below), $\Sigma_L$ and $\Sigma_R$ were restricted to be diagonal matrices, meaning they only represented the variances of the inputs, and covariances among pairs of inputs were assumed to be zero. In other simulations (those of the Isotropic Observer described below), each of these matrices was further restricted to have the same value along the diagonal, equal to the average variance of the inputs. Lastly, our simulations assumed that the left-closer and right-closer distributions had equal covariance matrices. This common matrix, denoted $\Sigma$, was set to the pooled covariance matrix:

$n_L$ and $n_R$ are the number of elements in $I_L$ and $I_R$, respectively. In the field of statistics, this pooled covariance matrix is also referred to as the average class-conditional covariance matrix.
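The parameter estimates can be sketched as follows. The sketch assumes that the pooled matrix is the sample-size-weighted average $(n_L \Sigma_L + n_R \Sigma_R)/(n_L + n_R)$ and that variances are maximum-likelihood estimates (divisor $n$); both are assumptions, as is the function name. Only the diagonal of each covariance matrix is stored, which matches the diagonal restriction of the Anisotropic and Isotropic Observers.

```python
import numpy as np


def fit_observer(X, y, kind="anisotropic"):
    """Estimate class means and a shared diagonal covariance.

    X    : (trials, units) array of front-end outputs
    y    : (trials,) array, 1 = left-closer, 0 = right-closer
    kind : "anisotropic" (one variance per unit) or
           "isotropic" (a single variance shared by all units)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_left, X_right = X[y == 1], X[y == 0]
    n_left, n_right = len(X_left), len(X_right)

    mean_left = X_left.mean(axis=0)
    mean_right = X_right.mean(axis=0)

    # Diagonals of Sigma_L and Sigma_R (assumed ML estimates, ddof=0).
    var_left = X_left.var(axis=0)
    var_right = X_right.var(axis=0)

    # Assumed pooling: weight each class by its number of trials.
    pooled = (n_left * var_left + n_right * var_right) / (n_left + n_right)

    if kind == "isotropic":
        # Replace every diagonal entry by the average variance.
        pooled = np.full_like(pooled, pooled.mean())
    return mean_left, mean_right, pooled
```

For 315 front-end units, the anisotropic fit returns 315 + 315 + 315 = 945 numbers, whereas the isotropic fit carries only one distinct variance value.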

^{2}This observer had 945 parameters: 315 mean parameters for each distribution and 315 variance parameters shared by both distributions. The Isotropic Observer was identical to the Anisotropic Observer except that its covariance matrix was further restricted such that its diagonal entries were equal to each other. Because the same variance parameter was used for all inputs, the Isotropic Observer used a uniform scaling in all input dimensions. This observer had 631 parameters: 315 mean parameters for each distribution and 1 shared variance parameter.

$\tau$ on training trial $t$, then an observer was exposed to an image depicting a surface with slant $\tau$ on its $t$th training trial. The replications of each observer differed in the noise added to the grids defining each surface (see Figure 1).

$d'$, a measure based on signal detection theory (Green & Swets, 1966). If we let $d'_{\text{subj}}(b)$ denote a subject's performance on training block $b$, and let $d'_{\text{AO}}(b)$ denote the Anisotropic Observer's performance on training block $b$, then $\left(d'_{\text{subj}}(b) / d'_{\text{AO}}(b)\right)^2$ is the subject's discrimination efficiency at block $b$ (this quantity ranges from 0 to 1; a value of 1 means that the subject performed optimally in the sense that the subject performed as well as the ideal probabilistic observer, whereas a value less than 1 indicates that the subject performed suboptimally; Tanner & Birdsall, 1958). On average, subjects' discrimination efficiencies were 0.09 on block 1 and increased to 0.31 on block 14. These data indicate that subjects did not learn as much as they theoretically could have (based on the assumptions underlying the Anisotropic Observer).
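The efficiency calculation is simple enough to state in a few lines. In the sketch below, `efficiency` implements the squared ratio defined above; `d_prime` uses the standard hit-rate and false-alarm-rate formula from signal detection theory, which is an assumption here because the text does not state how $d'$ was computed from responses. Both function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard signal-detection d': z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def efficiency(d_subject, d_ideal):
    """Discrimination efficiency: squared ratio of subject d' to ideal d'."""
    return (d_subject / d_ideal) ** 2
```

Because efficiency is a squared ratio, the reported values of 0.09 and 0.31 correspond to subject $d'$ values of about 30% and 56% of the Anisotropic Observer's $d'$, respectively.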

Let $\mu_i^L(t)$ denote the mean of the $i$th unit's activities over trials $1, \ldots, t$ when images depicted left-closer surfaces, and let $\mu_i^R(t)$ denote the corresponding value when images depicted right-closer surfaces. Let $\sigma_i^2(t)$ denote the unit's average class-conditional variance. The regression coefficient corresponding to the $i$th front-end output unit at training trial $t$ is $w_i(t) = (\mu_i^L(t) - \mu_i^R(t)) / \sigma_i^2(t)$. To quantify the degree to which the regressor makes use of unit $i$ at trial $t$, we define the relative weight for unit $i$ as $r_i(t) = |w_i(t)| / \sum_j |w_j(t)|$.
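These coefficients follow from expanding the log posterior odds of two Gaussians that share a diagonal covariance and have equal priors: the quadratic terms in the input cancel, leaving a linear function of the inputs. The sketch below computes the coefficients and relative weights and, as a check on that expansion, evaluates the equivalent logistic regressor. The bias term and all function names are derived or chosen here and are not taken from the text.

```python
import numpy as np


def regression_weights(mean_left, mean_right, var):
    """w_i = (mean_left_i - mean_right_i) / var_i for each unit i."""
    mean_left, mean_right, var = (
        np.asarray(a, dtype=float) for a in (mean_left, mean_right, var)
    )
    return (mean_left - mean_right) / var


def relative_weights(w):
    """r_i = |w_i| / sum_j |w_j|; the values are non-negative and sum to 1."""
    magnitude = np.abs(np.asarray(w, dtype=float))
    return magnitude / magnitude.sum()


def logistic_posterior(x, mean_left, mean_right, var):
    """Posterior of 'left-closer' written as a logistic regressor.

    Assumes equal priors and a shared diagonal covariance, so that
    log odds = sum_i w_i * x_i + b,
    with b = -0.5 * sum_i (mean_left_i^2 - mean_right_i^2) / var_i.
    """
    mean_left, mean_right, var = (
        np.asarray(a, dtype=float) for a in (mean_left, mean_right, var)
    )
    w = regression_weights(mean_left, mean_right, var)
    b = -0.5 * np.sum((mean_left**2 - mean_right**2) / var)
    log_odds = w @ np.asarray(x, dtype=float) + b
    return float(1.0 / (1.0 + np.exp(-log_odds)))
```

A relative weight near 1 for a single unit indicates that the regressor relies almost entirely on that unit, which is the sparse regime that precision pooling describes.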

Let $\mu_i^L(t)$ and $\mu_i^R(t)$ denote the means of the $i$th neuron's activities over trials $1, \ldots, t$ when images depicted left-closer and right-closer surfaces, respectively, and let $\sigma_i(t)$ denote the neuron's average class-conditional standard deviation. The information carried by model neuron $i$ at trial $t$ can be quantified as $d'_i(t) = |\mu_i^L(t) - \mu_i^R(t)| / \sigma_i(t)$ (Green & Swets, 1966). If neuron $i$'s activities are very different when images depict left-closer versus right-closer surfaces, then $d'_i$ will be a large number. Otherwise, it will be near zero.
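A per-neuron $d'$ of this kind can be computed directly from a matrix of responses, and sorting the values gives a ranking of neurons by informativeness. The sketch assumes that the average class-conditional standard deviation is the square root of the sample-size-weighted average of the two class variances; that reading, and the function names, are assumptions.

```python
import numpy as np


def neuron_d_prime(X, y):
    """Per-neuron d' from responses X (trials x neurons) and labels y (1 = left-closer)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_left, X_right = X[y == 1], X[y == 0]
    n_left, n_right = len(X_left), len(X_right)
    # Assumed: average class-conditional variance, weighted by trial counts.
    pooled_var = (n_left * X_left.var(axis=0)
                  + n_right * X_right.var(axis=0)) / (n_left + n_right)
    return np.abs(X_left.mean(axis=0) - X_right.mean(axis=0)) / np.sqrt(pooled_var)


def rank_neurons(d):
    """Neuron indices ordered from most to least informative."""
    return np.argsort(-np.asarray(d, dtype=float))
```

Indexing the sorted order at positions 0, 4, 9, and 19 would select the neurons ranked 1, 5, 10, and 20.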

$d'$ values at the end of training. The results are shown in Figure 4. It is consistently the case that a small number of neurons are highly informative for the visual slant discrimination task, whereas the vast majority of neurons are only moderately or mildly informative.^{3}

$d'(t)$ values at the end of training. We then simulated observers based on neurons ranked 1, 5, 10, and 20. The results are shown in Figure 5. When averaged across subjects, observers based on neurons ranked 1, 5, 10, and 20 achieved performances of 99%, 95%, 87%, and 80% correct at the end of training, respectively. Surprisingly, to achieve approximately the same level of performance as the human subjects, an observer needed to use the 10th most informative neuron (out of the population of 315 neurons).

$t$ indexes the trial number, $\varepsilon$ is a learning rate parameter, $\mathbf{x}(t)$ is a vector of front-end output neuron responses on trial $t$, and $\boldsymbol{\mu}_{\{L,R\}}(t)$ is the mean vector of either the left-closer or right-closer Gaussian distribution, depending on which type of visual stimulus was displayed on trial $t$. The update rule for the observer's covariance matrix was

**diag** operator turns a vector into a diagonal matrix, and where the square of a vector is computed on an element-by-element basis. The only free parameter is the learning rate $\varepsilon$, which was set to a value ($\varepsilon = 0.001$) that produced levels of performance at the end of training approximately equal to those of the human subjects. The results are shown in Figure 7. In all simulations, the learning curves have a stereotyped pattern in which performance improved in a slow, steady manner. Although these learning curves resemble those obtained by averaging the curves of many subjects, they do not resemble those of individual subjects. That is, the learning process of the adaptive global pooling observer does not produce a range of learning performance dynamics matching the range of dynamics exhibited by individual human subjects.
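An incremental learner of this kind can be sketched as below. The sketch assumes a delta-rule (running-average) form for both updates, in which each estimate moves a fraction $\varepsilon$ of the way toward the value suggested by the current trial, and it assumes that the deviation is measured from the mean before that mean is updated. These choices and the function name are assumptions rather than the exact update equations of the simulations.

```python
import numpy as np


def update_observer(mean, cov, x, eps=0.001):
    """One assumed delta-rule step after observing response vector x.

    mean : mean vector of the class (left- or right-closer) shown on this trial
    cov  : diagonal covariance matrix shared by both classes
    x    : front-end output vector on this trial
    eps  : learning rate
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    x = np.asarray(x, dtype=float)

    deviation = x - mean
    new_mean = mean + eps * deviation
    # diag turns the element-wise squared deviation into a diagonal matrix.
    new_cov = cov + eps * (np.diag(deviation**2) - cov)
    return new_mean, new_cov
```

Under this form, a learning rate of 0.001 moves each estimate only 0.1% of the way toward the current sample on every trial, which is consistent with the slow, steady learning curves described above.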

^{1}There exist in the literature at least two explanations at the level of neural processing for visual learning, namely that learning results from changes in the tunings of neurons' sensitivity functions (tuning functions might shift, broaden, or sharpen) and that learning results from changes in the weights assigned to the responses of neurons contributing to a psychological response. These two explanations are compatible because reweightings of neural responses will necessarily result in changes to the tuning functions of all mechanisms (including decision mechanisms) subsequent to the reweighting.

^{2}Our reasons for using the term “scaling” can be illustrated as follows. As discussed below, a Gaussian mixture model can be converted to an equivalent logistic regressor that maps a model's inputs to the probability that the inputs represent the image of a left-closer surface. The regressor corresponding to the Anisotropic Observer multiplies input $i$ by $1/\sigma_i^2$, where $\sigma_i^2$ is input $i$'s average class-conditional variance. That is, the regressor scales each input dimension by the inverse of the variance along that dimension. The regressor corresponding to the Isotropic Observer multiplies all inputs by the same value, meaning that this regressor uses a uniform scaling in all input dimensions.

^{3}Recall that a subject was trained to distinguish two surface slants ($\tau$ and $-\tau$). In addition, recall that the model neurons are tuned to specific spatial locations, frequencies, and orientations. Depending on the slant values used when training a subject, the receptive field properties of a neuron may make its responses highly diagnostic of whether a left-closer or right-closer surface was displayed on a trial, or may make it only moderately or mildly diagnostic.

$d'$ and $\eta$ as psychophysical measures. Journal of the Acoustical Society of America, 30, 922–928.