Human performance in visual detection, discrimination, identification, and search tasks typically improves with practice. Psychophysical studies suggest that perceptual learning is mediated by an enhancement in the coding of the signal, and physiological studies suggest that it might be related to plasticity in the weighting or selection of sensory units coding task-relevant information (learning through attention optimization). We propose an experimental paradigm (the optimal perceptual learning paradigm) for systematically studying the dynamics of perceptual learning in humans by allowing comparisons to an optimal Bayesian algorithm and a number of suboptimal learning models. We measured improvement in human localization performance (eight-alternative forced-choice with feedback) for a target randomly sampled from four elongated Gaussian targets with different orientations and polarities and kept as the target for a block of four trials. The results suggest that human perceptual learning can occur within the span of four trials (<1 min) but that human learning is slower and incomplete with respect to the optimal algorithm (a 23.3% reduction in human efficiency from the 1^{st}-to-4^{th} learning trials). The greatest improvement in human performance, occurring from the 1^{st}-to-2^{nd} learning trial, was also present in the optimal observer and thus reflects a property inherent to the visual task rather than a property particular to the human perceptual learning mechanism. One notable source of human inefficiency is that, unlike the ideal observer, humans rely more heavily on previous decisions than on the provided feedback, resulting in no human learning on trials following an incorrect localization decision. Finally, the proposed theory and paradigm provide a flexible framework for future studies to evaluate the optimality of human learning of other visual cues and/or sensory modalities.

On the last (4^{th}) trial of a learning block, the observers have to identify the target present throughout that learning block. At the end of each learning trial, feedback is provided to the observers about the location of the target for that trial but not the target's identity. At the end of the last (4^{th}) learning trial, following the identification decision, feedback is provided about the identity of the target present throughout that learning block. Figure 1 outlines the timeline of the experimental procedure.

Each learning block comprises a fixed number of learning trials (1^{st} through 4^{th} for the current study). From the 1^{st} to the 4^{th} trial, the observer can use the location feedback to collect evidence about the presence of one or another specific target in that learning block. The varying amounts of evidence for each of the possible targets at the target location specified by the feedback can be used on the subsequent trial to increase the weights of sensory units tuned to targets associated with higher evidence and to reduce the weights of sensory units tuned to targets associated with lower evidence. Performance in the localization task improves because the weighting of the different sensors becomes closer to optimal.

The optimal Bayesian observer computes, from the data at all locations (*g*), the posterior probability of signal presence at each location and chooses the location with the highest posterior probability. The posterior probability at the *i*^{th} location can be related to the likelihood of the data at all locations given signal presence at the *i*^{th} location through Bayes' rule (Peterson, Birdsall, & Fox, 1954; Green & Swets, 1966):

*P*(*i*|*g*) = *P*(*g*|*i*)*P*(*i*)/*P*(*g*), (Equation 1)

where *P*(*i*|*g*) is the posterior probability of the signal being present at the *i*^{th} location given the data at all locations (*g*), *P*(*g*|*i*) is the probability of the data at all locations given target presence at the *i*^{th} location and is typically known as the likelihood (*l*), and *P*(*i*) is the prior probability of the signal being present at the *i*^{th} location. *P*(*g*) is the probability of the data, which is independent of location and can therefore be replaced by 1 without affecting the outcome of the decisions.
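As an illustrative sketch (not the authors' code; the function name and list-based inputs are hypothetical), the normalized posterior over locations can be computed as:

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(i|g) is proportional to P(g|i) * P(i). The
    normalizing constant P(g) is common to all locations, so dividing
    by the sum leaves the ranking of locations unchanged."""
    raw = [l * p for l, p in zip(likelihoods, priors)]
    p_g = sum(raw)  # plays the role of P(g)
    return [r / p_g for r in raw]
```

The optimal decision picks the location with the highest posterior; because *P*(*g*) is shared across locations, the argmax is the same with or without normalization.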

On the first learning trial (*t* = 1) of a learning block, the optimal Bayesian observer (see Figure 2) calculates the posterior probability. However, because there is uncertainty about which of the signals is present for that block of learning trials, it computes the posterior probability for each of the *J* possible signals (*J* = 4 for the task in the present work). This is equivalent to computing a ratio of the likelihood of the data at the *i*^{th} location given signal presence (*P*(**g**_{i}|*s*_{j})) and the likelihood of the data at the *i*^{th} location given signal absence (*P*(**g**_{i}|*n*)) (Green & Swets, 1966). The optimal observer then sums the likelihood ratios across signal types to compute a sum of weighted likelihoods for each location. The individual likelihood ratios are weighted by the prior expectation of each of the possible signals. On the first learning trial, the prior is 1/*J*, given that each signal has equal probability of being sampled. On trial *t*, the location with the highest weighted sum of likelihood ratios (*SLR*) is chosen as containing the target:

*SLR*_{i,t} = Σ_{j=1}^{J} π_{j,t} ℓ_{i,t,j}, (Equation 2)

where ℓ_{i,t,j} is the likelihood ratio of the data at location *i*, for the *t*^{th} learning trial and the *j*^{th} signal, and π_{j,t} is the weight (known as the prior) given to the likelihood of the *j*^{th} signal on the *t*^{th} trial. For white Gaussian noise, the likelihood ratio for each location and signal is given by (Peterson et al., 1954)

ℓ_{i,t,j} = exp[(**s**_{j}^{T} **g**_{i,t} − *E*_{j}/2)/*σ*^{2}], (Equation 3)

where **s**_{j} is a column vector containing the *j*^{th} signal, **g**_{i,t} is a column vector containing the data at the *i*^{th} location for the *t*^{th} trial, and *E*_{j} is the energy of the *j*^{th} signal (*E*_{j} = **s**_{j}^{T} **s**_{j}, where the superscript *T* stands for transpose). Note that **s**_{j}^{T} **g**_{i,t} can be thought of as the response of a linear sensor (matched to the *j*^{th} signal) to the data (**g**) at the *i*^{th} location. Also, *σ*^{2} is the variance of the noise at each pixel.
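The two computations above can be sketched in a few lines of Python (a hedged illustration with hypothetical names; vectors are plain lists of pixel values):

```python
import math

def likelihood_ratio(s, g, sigma2):
    """Likelihood ratio for white Gaussian noise:
    exp((s.g - E/2) / sigma^2), where E = s.s is the signal energy
    and s.g is the matched linear sensor response."""
    response = sum(si * gi for si, gi in zip(s, g))  # s^T g
    energy = sum(si * si for si in s)                # E = s^T s
    return math.exp((response - energy / 2.0) / sigma2)

def slr(signals, g_by_location, priors, sigma2):
    """Sum of weighted likelihood ratios, one value per location:
    SLR_i = sum over signals j of pi_j * l_{i,j}."""
    return [sum(p * likelihood_ratio(s, g, sigma2)
                for p, s in zip(priors, signals))
            for g in g_by_location]
```

The localization response is then the index of the maximum SLR across locations.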

Following the 1^{st} trial decision, feedback is given about the target location. The ideal observer has perfect memory; therefore, the algorithm retains the likelihood of the data at the target-present location for each of the *J* possible signals:

ℓ_{sp,t,j} = exp[(**s**_{j}^{T} **g**_{sp,t} − *E*_{j}/2)/*σ*^{2}], (Equation 4)

where the subscript *sp* for the likelihood and the data vector **g** refers to the location that contains the signal (*sp* = signal present). All other symbols are defined as in Equation 3.

On the 2^{nd} trial, the optimal observer again calculates the individual likelihood ratios for each location and target for the new image. However, on this trial it weights (i.e., *π*_{j,t}, the prior in Equation 2) the 2^{nd} trial likelihoods for each signal by the calculated likelihoods from the signal-present location of the 1^{st} trial (ℓ_{sp,1,j}, Equation 4). In other words, if there was more evidence for one of the signals on the 1^{st} trial, then the optimal observer increases the weight given to that signal on the 2^{nd} trial. This process is repeated for the 3^{rd} and 4^{th} trials, updating the prior for each possible signal with the likelihoods from the previous trials as given by

π_{j,t} = Π_{t′=1}^{t−1} ℓ_{sp,t′,j} / Σ_{j′=1}^{J} Π_{t′=1}^{t−1} ℓ_{sp,t′,j′}, (Equation 5)

where the likelihood of the data at the signal-present location on the *t*′^{th} trial given a particular signal *j* (ℓ_{sp,t′,j}) is calculated by Equation 4.
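The running update can be sketched as multiplying each signal's current weight by its signal-present likelihood from the latest trial and renormalizing (a minimal sketch with hypothetical names, equivalent to accumulating the product of previous-trial likelihoods described in the text):

```python
def update_priors(priors, sp_likelihoods):
    """After location feedback, multiply each signal's prior by the
    likelihood of the data at the signal-present location and
    renormalize, so the weights used on the next trial sum to 1."""
    raw = [p * l for p, l in zip(priors, sp_likelihoods)]
    total = sum(raw)
    return [r / total for r in raw]

# Start from uniform priors over J = 4 signals and apply two trials of
# feedback evidence; weight concentrates on the signal with most evidence.
priors = [0.25] * 4
for trial_likelihoods in ([2.0, 1.0, 1.0, 1.0], [3.0, 1.0, 1.0, 1.0]):
    priors = update_priors(priors, trial_likelihoods)
```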

On the 4^{th} learning trial, the optimal observer calculates the joint likelihood of the data at the signal-present locations on the 1^{st} through 4^{th} trials given the presence of each of the possible signals and chooses the signal with the highest joint likelihood. The joint likelihood across all trials is calculated as

ℓ_{sp,j} = Π_{t=1}^{4} ℓ_{sp,t,j}. (Equation 6)
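The identification decision reduces to a product over trials followed by an argmax (an illustrative sketch; the function name and list inputs are hypothetical):

```python
def identify(sp_likelihoods_by_trial):
    """Multiply per-signal likelihoods from the signal-present location
    across trials and pick the signal with the largest product."""
    n_signals = len(sp_likelihoods_by_trial[0])
    joint = [1.0] * n_signals
    for trial in sp_likelihoods_by_trial:
        joint = [j * l for j, l in zip(joint, trial)]
    return max(range(n_signals), key=lambda j: joint[j])
```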

(*Pc* = 80%). For the numerator, the investigator calculates the signal contrast that leads the ideal observer to perform at the same level of proportion correct (*Pc*) measured experimentally for the human observer; the denominator is the signal contrast at which the human observer attained that performance level (*Pc*).
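A common operationalization of efficiency is the squared ratio of ideal-to-human threshold contrast at matched *Pc*; this standard form is an assumption here, since the excerpt does not reproduce the paper's exact expression:

```python
def efficiency(c_ideal, c_human):
    """Statistical efficiency as the squared ratio of the ideal
    observer's threshold contrast to the human observer's at the same
    proportion correct (assumed standard definition, not quoted from
    the paper)."""
    return (c_ideal / c_human) ** 2
```

Because the ideal observer needs less contrast than the human to reach the same *Pc*, this ratio is at most 1.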

where *π*_{j,t} is the weight for the *j*^{th} signal on the *t*^{th} trial, and 1/*N* stands for the uniform prior at the initial learning trial.

The subscript *ch* for the likelihood (ℓ) indicates that the likelihood of each signal is calculated at the chosen location (the location with the highest sum of likelihoods). Note that on trials in which the localization was correct, the model updates the priors in the same way as the ideal observer. However, on trials in which the localization decision is incorrect, the model updates its priors based on a location that contained only noise and no signal information. Thus, this model's performance will decrease on trials following incorrect localization decisions.

On the proportion, *p*, of trials in which the decision was correct on the *t*^{th} trial, the model learns optimally on the (*t*+1)^{th} trial, whereas on the proportion, 1−*p*, of trials in which the decision was incorrect, there is no prior updating and, therefore, no learning on the (*t*+1)^{th} trial. The progression of the distribution of priors for this model can be compared to that of the optimal Bayesian observer using the relative entropy measure. Figure 8 shows that the relative entropy of the "prior update on correct trials only" model increases more slowly than that of the optimal observer.
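The "prior update on correct trials only" model can be sketched as a gate on a multiplicative prior update (hypothetical function names; a sketch, not the authors' implementation):

```python
def update_priors_correct_only(priors, sp_likelihoods, was_correct):
    """Suboptimal model: apply the multiplicative prior update only
    after a correct localization; after an incorrect one, leave the
    priors unchanged, so no learning occurs on the following trial."""
    if not was_correct:
        return list(priors)
    raw = [p * l for p, l in zip(priors, sp_likelihoods)]
    total = sum(raw)
    return [r / total for r in raw]
```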

where *r*_{j,t} is the linear response of the sensor matched to the *j*^{th} signal, **s**_{j} is a vector containing the elements of the *j*^{th} signal, **g**_{sp,t} is a vector containing the data at the signal-present location for the *t*^{th} learning trial, and *b* is a constant added to avoid negative priors.

The signals were elongated Gaussian targets (major axis *SD* = 0.301°, 8 pixels; minor axis *SD* = 0.075°, 2 pixels) with one of four orientations (0°, 45°, 90°, and 135°) and two polarities: positive for the 0° and 90° orientations and negative for the 45° and 135° orientations (see Figure 1). The white Gaussian noise had an *SD* of 4.9 cd/m^{2} (25 gray levels of the linearized luminance scale). The display was calibrated to result in a linear relationship between digital gray level and luminance. Experiment images were displayed on an Image Systems M17LMAX monochrome monitor with a maximum resolution of 1664 × 1280 pixels (Image Systems, Minnetonka, MN).

On the 4^{th} trial, the observer was asked to make an identification decision by placing the mouse cursor on top of one of the four high-contrast copies of the possible signals shown at the top of the screen. Human efficiency relative to the ideal observer on the 1^{st} through 4^{th} learning trials was also calculated.

Figure 9 shows proportion correct (*Pc*) as a function of trial number for three naïve observers. Although absolute performance differed significantly across observers, all three showed similar improvements across learning trials. The average improvement in *Pc* from the 1^{st}-to-4^{th} learning trial was 6.5% for KC, 6.2% for AB, and 7.5% for LL. All improvements were statistically significant (*p* < .01). For all observers, the largest improvement occurred between the 1^{st} and 2^{nd} learning trials (Figure 9).

Efficiency decreased from the 1^{st}-to-4^{th} learning trial by 5.34% for KC, 4.36% for AB, and 4.34% for LL. Measured as a percentage of the efficiency in the 1^{st} learning trial, these decreases represent reductions of 20.3% (KC), 27.9% (AB), and 21.7% (LL). Patterns of efficiency as a function of trial number were similar across all three observers.

Learning for the "prior update on correct trials only" model was larger than human learning. Figure 11b compares human performance to two other suboptimal models: the linear prior update model and the "prior update based on chosen location" model. The linear prior update model produced learning comparable to that of humans, although its learning on the 2^{nd} trial appears lower than humans' and its learning on the 4^{th} trial appears consistently larger. The "prior update based on chosen location" model, in contrast, resulted in virtually no learning.

Figure 12 shows localization performance on the 2^{nd}, 3^{rd}, and 4^{th} learning trials for blocks in which localization on the 1^{st} trial was correct (continuous lines) versus blocks in which it was incorrect (dashed lines). For all three observers, performance improvement across learning trials was significantly larger when the observers correctly localized the signal on the 1^{st} trial. Figures 12a–12d show localization performance following correct and incorrect 1^{st} trial localizations for the optimal Bayesian observer, the "prior update on correct trials only" model, and the linear prior update model. Both the optimal Bayesian and linear prior update models also showed sequential effects, but the effects were smaller than those in humans. The sequential effects of the "prior update on correct trials only" model were more comparable to the lack of human learning on 2^{nd} trials following incorrect 1^{st} trial localizations (Figure 12a vs. 12c).

Identification performance on the 4^{th} learning trial for all three observers (plotted above the *x*-axis label: ID) was 0.959 (KC), 0.848 (AB), and 0.83 (LL). The identification efficiencies were 11.23% (KC), 4.75% (AB), and 4.28% (LL).

Human learning was fastest between the 1^{st} and 2^{nd} learning trials and slower after the 2^{nd} trial. This early fast learning followed by reduced late learning might be interpreted as reflecting two learning algorithms or different learning-dependent neurophysiological events evolving within different time frames (Atienza, Cantero, & Dominguez-Marin, 2002). However, comparison to the optimal Bayesian learner suggests otherwise. The larger amount of learning from the 1^{st}-to-2^{nd} learning trial is also present in the optimal observer (Figure 11), suggesting that this effect is not particular to the human neural learning algorithm but might be a property inherent to the task and stimuli. Furthermore, the efficiency analysis (Figure 10) shows that for all three human observers the largest drop in efficiency occurred between the 1^{st} and 2^{nd} learning trials. This suggests that even though humans learned the most between the 1^{st} and 2^{nd} trials, they improved by only a fraction of what the optimal observer does.

Humans showed no learning on 2^{nd} trials following incorrect 1^{st} trials (Figure 10a). This result suggests that observers were unable to use the location feedback following incorrect localization decisions to update the signal priors. This outcome might be due to observers' inability to remember the image presented at a missed target location.

In contrast, human learning was present on 2^{nd} trials in which localization was correct on the 1^{st} trial. Figures 13a and 13b show the prior updating following correct and incorrect 1^{st} trial localizations for an optimal Bayesian observer in learning blocks with signal 1 as the target. Note that for the 2^{nd} learning trial, the priors for signals 2 and 4 are close to zero following 1^{st} trials with correct localization (Figure 13a), whereas they are nonzero following 1^{st} trials with incorrect localization (Figure 13b). In addition, the weighting of the relevant signal 1 is 0.8 following 1^{st} trials with correct localization and 0.6 following 1^{st} trials with incorrect localization. This higher weighting of the relevant target leads to higher performance localizing the target on 2^{nd} trials following correct 1^{st} trials (Figure 12a).

The dependence of human learning on the 2^{nd} trial on whether the 1^{st} learning trial was correct is much larger than that of the optimal observer and also larger than that of the linear prior update model (Figure 12d).

For the "prior update on correct trials only" model, the priors on the 2^{nd} trial following incorrect 1^{st} trials are unchanged, leading to no performance improvement from the 1^{st}-to-2^{nd} trial (Figure 13c).

An alternative, non-learning account of the improvement would require a reduction of internal noise across the 1^{st}-to-4^{th} learning trials, together with the unlikely scenario that internal noise resets itself to a high level for the 1^{st} learning trial of the next learning block.

Human learning was slower and incomplete with respect to the optimal algorithm (average reduction in efficiency from the 1^{st}-to-4^{th} trial = 23.3%). The largest improvement in human performance, occurring from the 1^{st}-to-2^{nd} trial, reflects a property inherent to the visual task and not a property particular to the human perceptual learning mechanism. One important difference between the human and ideal observers is that human learning relies (suboptimally) more heavily on previous decisions than on the feedback, resulting in no human learning on trials following an incorrect localization decision.

^{1}Note that uncertainty in this context is used to refer to the general idea of lack of full knowledge about the visual properties of the signal being presented and not to a particular nonlinear decision rule to integrate information across possible signals, such as in previous work (Pelli, 1985; Eckstein, Ahumada, & Watson, 1997).

^{2}Previous studies have compared human performance to an ideal observer in standard tasks in which the ideal observer does not learn (Gold et al., 1999). Such studies allow identification of the mechanism mediating the learning, but they do not allow comparisons of the amount of learning in humans and in an optimal learner.