**Using an asymmetrical set of vernier stimuli (−15″, −10″, −5″, +10″, +15″) together with reverse feedback on the small subthreshold offset stimulus (−5″) induces response bias in performance (Aberg & Herzog, 2012; Herzog, Ewald, Hermens, & Fahle, 2006; Herzog & Fahle, 1999). These conditions are of interest for testing models of perceptual learning because the world does not always present balanced stimulus frequencies or accurate feedback. Here we provide a comprehensive model for the complex set of asymmetric training results using the augmented Hebbian reweighting model (Liu, Dosher, & Lu, 2014; Petrov, Dosher, & Lu, 2005, 2006) and the multilocation integrated reweighting theory (Dosher, Jeter, Liu, & Lu, 2013). The augmented Hebbian learning algorithm incorporates trial-by-trial feedback, when present, as another input to the decision unit and uses the observer's internal response to update the weights otherwise; block feedback alters the weights on bias correction (Liu et al., 2014). Asymmetric training with reversed feedback incorporates biases into the weights between representation and decision. The model correctly predicts the basic induction effect, its dependence on trial-by-trial feedback, and the specificity of bias to stimulus orientation and spatial location, extending the range of augmented Hebbian reweighting accounts of perceptual learning.**

*right* response. Variations on the basic training set used different relative proportions of the stimuli during training and different proportions of feedback reversal (Herzog & Fahle, 1999) and showed sensitivity to the specific asymmetric and reversed feedback training.

**Figure 1**

**Figure 2**

*left*–*right*) responses as outputs. It learns on each trial by adjusting the connection weights between the representation units and the decision unit. The model replays each experimental protocol, including the number of trials of each kind of stimulus, the nature of feedback, and the number of training sessions; that is, it reprises the experimental protocols experienced by the observers. Each simulated experiment was repeated 1,000 times to generate the predictions of the model.

$w_i = (\theta_i / 45)\,w_{\mathrm{init}}$. These parameters were held constant over all of the simulations reported here.

$\sigma_m$, decision noise $\sigma_d$, scaling factor $a$, the weight on feedback $w_f$, and (model) learning rate $\eta$. The learning rate and weights on the bias and feedback units were adjusted to approximate the pattern of learning and bias in the data. Detailed optimization of the fits of the model to the data, carried out in some of our previous articles (Liu et al., 2010, 2012; Lu et al., 2010), is extremely time consuming, sometimes taking months of grid search computations to yield just slightly better fits. However, many regions of the parameter space generate predictions consistent with the qualitative properties of the observed data pattern(s). Here we perform only approximate fits of model to data in order to enable us to examine a wider range of experimental findings. The match between model and data is indexed by the rank-order correlation between the model predictions and the observed data points, Kendall's $\tau$ (Kendall, 1938; Kendall & Gibbons, 1990), a measure of concordance between data and model predictions that is relatively robust to distributional issues (Newson, 2002):

$$\tau = \frac{n_c - n_D}{n(n-1)/2},$$

where $n_c$ is the number of data pairs with concordant order between the two data sets, $n_D$ is the number of data pairs with discordant order between the two data sets, and $n(n-1)/2$ is the total number of pairs among the $n$ data points. We also report the parametric estimate of the proportion of variance accounted for by the model,

$$r^2 = 1 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x})^2},$$

where $x_i$ is an observed data value, $\hat{x}_i$ is the corresponding value predicted by the model, and $\bar{x}$ is the mean of the observed data. Since we do not carry out precise quantitative fits of the model to data, the values of the proportion of variance accounted for by the model are lower than if we had. For a number of experiments, these values were also limited by the small range and noise in the behavioral data sets.

*right* responses, as though the misleading feedback on this relatively ambiguous (below-threshold offset) stimulus induces observers to lower their criterion for a *right* response.

**Figure 3**

**Table 1**

*right* responses. This increasingly reduces the percentage correct for the left offset stimuli (left offset stimuli were the only data shown in the original article). Then, when the reversed feedback is replaced with accurate feedback (at the vertical dashed line), performance shifts rapidly toward more *left* responses and therefore increased percentage correct on left offset stimuli. The model predictions qualitatively replicate the pattern in the data; they are rank-order consistent with the observed data (Kendall's *τ* = 0.692, *p* ≪ 0.001). The proportion of variance accounted for by the model is *r*² = 0.636 (*p* < 0.01). Model parameters, selected to approximately mimic the levels in observed data, are listed in Table 1. A time-intensive grid search on model parameters and the corresponding simulated results would almost surely provide a slightly improved detailed fit to the data, but the predicted ordinal pattern occurs over most of the model parameter space.

*right*), performance of all left verniers dropped and then quickly rebounded with the introduction of correct feedback. Also, if the probability of reversed feedback is higher, the biasing effect is more prominent. We show the AHRM prediction for experiment 3, but the AHRM is broadly consistent with all the experiments in the study.

*right* response. The reverse feedback for this small offset drives the postsynaptic activity at the decision unit toward the incorrect response, shifting weights to favor the rewarded response through Hebbian learning. These changes are concentrated in the weights for orientation channels near the vertical (0°, +15°, and −15°) that are most sensitive to the very small angles of the vernier stimuli. Subsequent training with accurate feedback shifts weights to favor the now-dominant *left* feedback.

*left* and right stimuli categorized as *right*) for each stimulus type (labeled *big left*, *middle left*, *small left*, *middle right*, and *big right*). Figure 4b shows the same data as percentage *right* in order to more clearly show the separation of performance for the five vernier offsets and more clearly track the overall shifts toward *left* or *right* responses both in the data and in the model predictions. Figure 4e duplicates the data from Aberg and Herzog (2012, figure 3) for derived criteria from signal detection analysis, and Figure 4f shows the AHRM predictions.

**Figure 4**

*right* responses, corresponding with the distribution of feedback. The AHRM correctly predicts shifts downward (toward increased *left* responses) for the trial-by-trial correct-feedback condition (upper middle subpanel), in which *left* feedback dominates, reflecting the relative frequencies of the stimuli. The model also correctly predicts weak or nonexistent induced bias in the no-feedback and various blocked-feedback conditions.

*right* responses) for the reverse-feedback conditions and downward shifts (more *left* responses) for the correct-feedback conditions. Other than minor adjustments in the initial scaling constants (*a*s) and noise terms used to approximately match performance in the initial training blocks, the same model parameters were used to predict performance in the different feedback groups. Figures 4e and 4f show criterion estimates derived from a signal detection analysis for the data and the model; the data show smaller deviations between the derived criteria of the reversed and accurate trial-by-trial feedback conditions at a single point in block 11, not mirrored in the model predictions (for discussion, see Aberg & Herzog, 2012). This issue is not seen in corresponding fits to the percentage *right*. (The model predictions are a better match to criterion estimates from all data rather than from the subthreshold singleton used in Figure 4e by Aberg and Herzog [2012].)

*τ* = 0.772, *p* ≪ 0.01). The proportion of variance accounted for by the model is *r*² = 0.883 (*p* ≪ 0.01). Although parameters in the model are varied to get the scale and overall rates, the differential feedback conditions all use the same parameters, with the exception of small differences in scaling factor and noise terms to slightly adjust for minor overall group differences. Model parameters are listed in Table 1. Further search of the parameter space might slightly improve the fit. These model results are consistent with an earlier AHRM account of differential learning when using different forms of feedback in standard (unbiased) vernier perceptual learning (see Liu et al., 2014).

*right* response, while accurate trial-by-trial feedback shifts toward *left* responses. As found previously (Herzog & Fahle, 1997), no-feedback conditions do little to improve performance accuracy. Although block feedback can in some circumstances promote learning (Herzog & Fahle, 1997), in this case it is ineffective.

*right*. The weights tend not to change substantially in the absence of feedback because the (early) postsynaptic activation at the decision unit tends to be very small for vernier stimuli: The orientations are all very close to vertical or zero. Trial-by-trial accurate feedback shifts weights and performance in favor of the dominant *left* response. Although block feedback can lead to learning in balanced designs, it is not sufficient to induce the overall shifts to *right* or *left* response because only trial-by-trial feedback can shift the postsynaptic weight toward a particular response. See Appendix B for a depiction of the changes in the weight structures and a more detailed discussion.

*τ* = 0.575, *p* ≪ 0.01). The simulation shows a good qualitative fit of the data pattern: The biases developed for vertical and horizontal verniers are independent. The proportion of variance accounted for by the model is *r*² = 0.667 (*p* < 0.01). Model parameters, selected to approximately parallel the data, are also listed in Table 1. A time-intensive grid search of the model parameters might improve the quantitative fit, which tracks the sum of the squared deviations between the model predictions and the data. In this case the model and data also differ for structural reasons. Our simulation did not implement the choice of Herzog et al. (2006) to assign the direction of induced bias in initial training in the same direction as a predetermined response bias for each observer, which led to an amplified initial bias in the empirical data. The simulation shows the predicted results for symmetric initial weights (priors) and a criterion control unit seeking a 50%–50% response distribution. The AHRM could be modified to incorporate either initial or ongoing preference for one response over the other in criterion control to mimic a natural bias in responses; we elected not to complicate the simulation in this way. Instead, we focus here on predicting the striking general patterns in the data in which the responses to the larger left and right stimuli diverge during the false feedback phase and converge once feedback is corrected.

**Figure 5**

*singleton* and *partner (large)* and the smaller offset stimuli as *partner (small)*; the graph shows the hit rate for each (rather than percentage *right*). The induced biases are in the opposite direction in the two locations.

**Figure 6**

*τ* = 0.750, *p* ≪ 0.01). The proportion of variance accounted for by the model is *r*² = 0.860 (*p* < 0.01). Model parameters were selected to approximate the levels in observed data (see Table 1). Further search of the parameter space might improve the match to overall performance level and hence the proportion of variance accounted for by the model. The model is exactly symmetric in the two locations, while the data appear to show lower performance for the right location (Figure 6c, right), accounting for much of the reduction in the proportion of variance accounted for by the model. This might have been handled by incorporating differences in scaling factors or internal noises in the two locations, but we elected not to pursue this because, as indicated by the high Kendall's *τ*, the model does a very good job of accounting for the qualitative patterns of opposite bias induction followed by convergence with accurate feedback. The biases developed for two locations are generally independent of each other in both the data and the model.

*right* and *left* responses, and the biases induced from reversed feedback average out in the weights from the location-independent representation layer to decision, so this layer does not contribute to the induced biases. The running average of postsynaptic activation of verniers was independently tracked for each location, which supported segregated bias learning. Appendix B provides examples of changing model weights and discusses the impact of location-specific representations and other implementation details.

*d′* and criterion *c* estimates of standard signal detection. Indeed, the model predicts that the learned biases predominantly reflect shifts in evidence distributions feeding into decision, and only secondarily compensatory variations in criterion offsets. This same caveat is true for all signal detection theory–based estimates in evaluation of behavioral data: What looks like a shift in bias or criterion can in many cases be equivalently produced by a shift in evidence distributions. That is, moving a criterion down can be equivalent to moving the mean of evidence distributions up.
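The equivalence noted here can be checked numerically. The sketch below assumes the equal-variance Gaussian signal detection model, with a *right* response whenever the evidence exceeds the criterion; the function names are illustrative, and the rates in the usage lines are made up rather than taken from any experiment.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for z-transforms


def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Equal-variance SDT estimates: d' = z(H) - z(F), c = -(z(H) + z(F)) / 2."""
    z_h = _STD.inv_cdf(hit_rate)
    z_f = _STD.inv_cdf(false_alarm_rate)
    return z_h - z_f, -0.5 * (z_h + z_f)


def p_right(mean, criterion, sd=1.0):
    """Probability that Gaussian evidence with the given mean exceeds the criterion."""
    return 1.0 - NormalDist(mean, sd).cdf(criterion)


# Lowering the criterion by 0.3 with the evidence mean fixed at 0 ...
criterion_shift = p_right(mean=0.0, criterion=-0.3)
# ... gives the same response probability as raising the evidence mean by 0.3.
evidence_shift = p_right(mean=0.3, criterion=0.0)

d_prime, c = dprime_and_criterion(0.84, 0.16)
```

In this sketch the two manipulations are indistinguishable from response proportions alone, which is why a change in the estimated *c* does not by itself identify where in the system the change occurred.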

*Vision Research*, 49, 2087–2094.

*Journal of Neuroscience*, 17, 8621–8644.

*Journal of Vision*, 12 (9): 767, doi:10.1167/12.9.767. [Abstract]

*Proceedings of the National Academy of Sciences, USA*, 110, 13678–13683.

*Proceedings of the National Academy of Sciences, USA*, 95, 13988–13993.

*Vision Research*, 39, 3197–3221.

*Learning & Perception*, 1, 37–58.

*Perceptual learning*. Cambridge, MA: MIT Press.

*Visual Neuroscience*, 9, 181–197.

*Vision Research*, 46, 3761–3770.

*Vision Research*, 37, 2133–2141.

*Biological Cybernetics*, 78, 107–117.

*Vision Research*, 39, 4232–4243.

*Vision Research*, 61, 25–32.

*The Journal of the Acoustical Society of America*, 133, 970–981.

*Science*, 265, 679–682.

*Biometrika*, 30, 81–93.

*Rank correlation methods* (5th ed.). London, United Kingdom: Griffin.

*Nature Neuroscience*, 12, 655–663.

*Vision Research*, 99, 46–56.

*Vision Research*, 61, 15–24.

*Encyclopedia of the sciences of learning* (pp. 3415–3418). Berlin, Germany: Springer.

*Vision Research*, 50, 375–390.

*The Stata Journal*, 2, 45–64.

*Psychological Review*, 112, 715–743.

*Vision Research*, 46, 3177–3197.

*Science*, 256, 1018–1021.

*Vision Research*, 35, 519–527.

*Vision Research*, 51, 1552–1566.

*Nature Reviews Neuroscience*, 11, 53–60.

*The Journal of Physiology*, 483 (Pt. 3), 797–810.

*Journal of Vision*, 13 (9): 248, doi:10.1167/13.9.248. [Abstract]

*Journal of Vision*, 14 (10): 474, doi:10.1167/14.10.474. [Abstract]

*Neural Computation*, 5, 694–718.

*Psychonomic Bulletin & Review*, 13, 656–661.

*x*, *y*) are tuned to spatial frequency *f*, orientation *θ*, and spatial phase *ϕ*. The set of filters consisted of the joint product of five spatial frequencies (8, 11.3, 16, 22.6, 32 cycles/degree), seven orientations (0°, ±15°, ±30°, ±45°), and four spatial phases (0°, 90°, 180°, 270°). (The simulation with both vertical and horizontal verniers requires 12 orientations spanning from −90° to 75° with a step size of 15°.) Spatial frequency tuning and orientation tuning bandwidths were set at $h_f = 1$ octave and $h_\theta = 30^\circ$ (half amplitude, full bandwidth). These values are the same as those used in prior applications of this form of the AHRM (Dosher et al., 2013; Liu et al., 2010, 2012; Petrov et al., 2005, 2006) and were based on estimates of cellular tuning bandwidths in the primary visual cortex.
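To make the size of the filter set concrete, the joint product described in this paragraph can be enumerated as below. This is only a bookkeeping sketch: it lists the tuning parameters and does not construct the filters themselves. The variable names are illustrative, and the count of channels remaining after pooling over phase is an inference from the pooling step described in this appendix.

```python
from itertools import product

SPATIAL_FREQS = [8, 11.3, 16, 22.6, 32]           # cycles/degree
ORIENTATIONS = [-45, -30, -15, 0, 15, 30, 45]     # degrees from vertical
PHASES = [0, 90, 180, 270]                        # degrees

# One filter per (frequency, orientation, phase) combination.
filters = list(product(SPATIAL_FREQS, ORIENTATIONS, PHASES))

# Pooling over spatial phase leaves one channel per (frequency, orientation) pair.
channels = list(product(SPATIAL_FREQS, ORIENTATIONS))

# The vertical-plus-horizontal simulation uses 12 orientations, -90 to 75 in steps of 15.
ORIENTATIONS_12 = list(range(-90, 90, 15))
```

Under these assumptions the basic set has 140 filters feeding 35 phase-pooled channels, and the 12-orientation variant has 60 channels.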

$I(x, y)$ is convolved with each unit, followed in succession by half-squaring rectification, spatial phase pooling, and then inhibitory normalization (Heeger, 1992).

$N_f$ is tuned weakly for spatial frequency and is independent of orientation (see Petrov et al., 2005). $a$ is a scaling factor; the saturation constant $k$ is relevant for extremely small contrasts. In this application, we pool over spatial phase and a stimulus evidence region with kernel of radius $W_r$.

$\varepsilon_1$ has mean 0 and standard deviation $\sigma_1$, with a Gaussian distribution. The internal multiplicative noise $\varepsilon_2$, of mean 0 and standard deviation $\sigma_2$, introduces another source of stochastic variability. The activation in each orientation and spatial frequency tuned unit is computed as follows:

$w_i$ and a bias factor $b$ with weight $w_b$ and incorporating random Gaussian decision noise $\varepsilon_d$ (mean 0 and standard deviation $\sigma_d$):

A negative $o'$ maps to one response (*left*), while a positive $o'$ maps to the other response (*right*).

$A_{\max} = \pm 1$); smaller feedback weights may only slightly shift activation toward the correct response. If feedback is not present, learning operates without the benefit of this shift toward a correct response ($o = o'$). Except for very low accuracy conditions, the learned weights tend to move toward a more optimal weight distribution because $o'$ tends to correlate with the correct response.
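The decision stage described in this part of the appendix can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text here: the saturating output function is taken to be a scaled hyperbolic tangent with a hypothetical `gain` parameter, the bias term is assumed to enter with a negative sign so that it opposes the recent response tendency, and feedback is coded as −1 (*left*) or +1 (*right*).

```python
import math
import random


def saturating_output(u, gain=1.0, a_max=1.0):
    """Sigmoid bounded by +/- a_max (assumed form): a_max * tanh(gain * u / 2)."""
    return a_max * math.tanh(gain * u / 2.0)


def decision_unit(activations, weights, bias, w_b, sigma_d, rng,
                  feedback=None, w_f=1.0, gain=1.0):
    """Return (response, early output o', late output o) for one trial."""
    noise = rng.gauss(0.0, sigma_d)                       # decision noise
    u = sum(w * a for w, a in zip(weights, activations))  # weighted evidence
    u = u - w_b * bias + noise                            # assumed sign of bias term
    o_early = saturating_output(u, gain)
    response = 1 if o_early > 0 else -1                   # +1 = right, -1 = left
    if feedback is None:
        o_late = o_early                                  # no feedback: o = o'
    else:
        o_late = saturating_output(u + w_f * feedback, gain)
    return response, o_early, o_late


rng = random.Random(0)
# Noise-free toy trial: weak evidence for "left", then strong "right" feedback.
resp, o1, o2 = decision_unit([0.1, 0.2], [1.0, -1.0], bias=0.0, w_b=0.0,
                             sigma_d=0.0, rng=rng, feedback=+1, w_f=10.0)
```

In this toy trial the response follows the weak stimulus evidence, while the late output that would drive learning is pulled toward the feedback, which is the mechanism by which reversed feedback can bias the weights.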

*η*, the presynaptic activation $A(\theta, f)$, how far the postsynaptic activation is from its long-term average, $(o - \bar{o})$, and the distance between the current weights and their saturation values, $w_{\min}$ or $w_{\max}$. Weights are changed (learned) according to this rule:

$$\Delta w_i = (w_i - w_{\min})\,[\delta_i]_- + (w_{\max} - w_i)\,[\delta_i]_+ ,$$

with

$$\delta_i = \eta\, A(\theta_i, f_i)\,(o - \bar{o}),$$

where $[\delta_i]_-$ equals $\delta_i$ when it is negative and 0 otherwise, and $[\delta_i]_+$ equals $\delta_i$ when it is positive and 0 otherwise.

The average postsynaptic activation $\bar{o}$ is inherited from block to block and is independently tracked for inputs of different orientations or from different locations. It is only reset between sessions, such as after a night's sleep as in Aberg and Herzog (2012). This treatment better replicates the qualitative pattern of the behavioral results.
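A minimal sketch of a bounded Hebbian update of the kind described in this paragraph is given below. The weight bounds of ±1 and the function name are assumptions chosen for illustration; the update scales each change by the remaining distance to the bound in the direction of change, so weights approach but do not cross their saturation values.

```python
def hebbian_update(weights, activations, o, o_bar, eta, w_min=-1.0, w_max=1.0):
    """One learning step: each weight moves in proportion to eta * A_i * (o - o_bar)."""
    updated = []
    for w, a in zip(weights, activations):
        delta = eta * a * (o - o_bar)
        if delta < 0:
            w = w + (w - w_min) * delta   # decrease, scaled by distance to w_min
        else:
            w = w + (w_max - w) * delta   # increase, scaled by distance to w_max
        updated.append(w)
    return updated


# Output above its running average strengthens weights on active channels.
up = hebbian_update([0.0, 0.5], [1.0, 0.0], o=1.0, o_bar=0.0, eta=0.1)
# Output below its running average weakens them.
down = hebbian_update([0.0, 0.5], [1.0, 0.0], o=-1.0, o_bar=0.0, eta=0.1)
```

Channels with zero activation are left unchanged, which is consistent with the orientation specificity discussed in Appendix B.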

*left*–*right* decisions, which indirectly augments learning. The bias correction term $b$ tracks deviations of the recent response frequencies from 50% (or the instructed presentation probabilities) of the simulated observer. Criterion control input $b$ weighted by $w_b$ is input to the decision unit. The bias on each trial is an exponentially weighted average of the responses with a time constant of 50 trials ($\rho = 0.02$):

$$r(t+1) = \rho\, R(t) + (1 - \rho)\, r(t),$$

where $R(t)$ is the current trial's response (*left* = −1 and *right* = +1), and $r(t)$ is the response running average that exponentially discounts past trials. Bias control is more important to learning in the absence of trial-by-trial external feedback (Petrov et al., 2006).
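The running response average with a 50-trial time constant can be illustrated with a short sketch. The recursive form used here is the standard exponentially weighted average and is an assumption about the exact update; the starting value of zero and the all-*right* response sequence are made up for the example.

```python
def update_running_average(r, response, rho=0.02):
    """Exponentially weighted average of responses coded -1 (left) / +1 (right)."""
    return rho * response + (1.0 - rho) * r


# Starting unbiased, 50 consecutive "right" responses move the average
# most of the way (about 1 - 1/e) toward +1.
r = 0.0
for _ in range(50):
    r = update_running_average(r, +1)
```

A value of `r` near zero indicates balanced recent responding, while values toward ±1 signal a run of one response that the criterion control input can then counteract.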

*right* responses. The experiments by Herzog and colleagues present more *left* stimuli but systematically bias feedback toward *right* and are designed to generate biases in responding. This is equivalent to shifting the criterion in a compensatory direction. Higher bias weights ($w_b$) increase the impact of the bias correction term. Liu et al. (2014) used a hypothesized relationship between the accuracy in the last block of trials (either from block feedback or estimated in trial feedback conditions) and the bias weight ($w_b$). In essence, the system has more confidence in the bias information when accuracy is high and less confidence in the bias information when accuracy is low. The minimum and maximum of the bias weight are at 0 and 1 for performance accuracies (proportion correct) between chance at 0.50 and perfect performance at 1.0, with the bias weight set to twice the proportion correct minus one. The bias weight changes after every block in the block-feedback conditions.
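The mapping from block accuracy to bias weight described here is a simple linear rule, sketched below. Clipping values outside the 0-to-1 range (for example, for below-chance blocks) is an assumption added for safety; the function name is illustrative.

```python
def bias_weight(proportion_correct):
    """Bias weight w_b = 2 * PC - 1, limited to the range [0, 1]."""
    return min(1.0, max(0.0, 2.0 * proportion_correct - 1.0))


chance = bias_weight(0.50)     # no confidence in the bias information
midway = bias_weight(0.75)     # intermediate accuracy
perfect = bias_weight(1.00)    # full confidence
```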

$w_i = (\theta_i / 45)\,w_{\mathrm{init}}$. Learning and bias in these experiments reflect relatively subtle changes that tilt these weights toward one response or the other; the *left* and *right* verniers are very similar in representation space (see Figure 3b), and the percentage changes of weights are in many cases quite small. In order to make these subtle changes more visible, we display changes in the weights as a function of training, relative to initial values. We chose to scale these as proportional changes relative to the average magnitude of all the initial weights. In previous applications of the model to experiments with widely varying stimuli, changes were visible in the weights themselves (Dosher et al., 2013; Liu et al., 2014; Petrov et al., 2005, 2006).
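The initial weight pattern and the scaling used for the weight-change displays can be sketched as follows. The value passed for `w_init` is hypothetical, and the sketch treats the weights as depending on orientation only, following the formula for the initial weights; the function names are illustrative.

```python
ORIENTATIONS = [-45, -30, -15, 0, 15, 30, 45]   # degrees


def initial_weights(w_init, orientations=ORIENTATIONS):
    """Initial weights proportional to preferred orientation: (theta / 45) * w_init."""
    return [(theta / 45.0) * w_init for theta in orientations]


def proportional_change(current, initial):
    """Weight change relative to the mean absolute value of the initial weights."""
    scale = sum(abs(w) for w in initial) / len(initial)
    return [(c - w0) / scale for c, w0 in zip(current, initial)]


w0 = initial_weights(0.9)        # hypothetical w_init
trained = list(w0)
trained[3] += 0.1                # pretend the vertical channel drifted rightward
change = proportional_change(trained, w0)
```

Scaling by the mean initial magnitude, rather than by each weight's own initial value, avoids dividing by the zero initial weight of the vertical channel, where some of the largest shifts are reported.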

*right* feedback. Subsequent training with accurate feedback shifts weights toward left, now tracking the more likely *left* feedback. Indeed, the largest shifts are for orientation channels of 0°, +15°, and −15° that are most sensitive to the very small angles of the vernier stimuli. The shifts in the first phase change more slowly than those after the switch to accurate feedback. This occurs for two reasons. In reverse feedback, the slight stimulus information in the small subthreshold offset in the initial weights opposes its false feedback, while with accurate feedback they move in the same direction. Additionally, in the Hebbian rule, the size of the weight change $\delta_i = \eta A(\theta_i, f_i)(o - \bar{o})$ is proportional to the difference of the postsynaptic output and its average over time. The contrast with a right-shifted average postsynaptic output, $\bar{o}$, inherited from the reverse feedback phase, increases the effective weight change $\delta$ at the beginning of the correct trial-by-trial feedback phase.

**Figure B1.**

*left* feedback and the asymmetric stimuli in the correct (accurate) feedback in the final phase of training. The weight change in this third phase reprises that in the first phase of the correct-feedback condition, starting from the weight state at the end of phase two. This rerelease of new learning after the no-feedback training phase is also a peculiar interaction in which the size of weight change depends on the contrast of the postsynaptic output and its running average, or $(o - \bar{o})$. At the beginning of training, the running average begins at zero; as time goes on, the average postsynaptic activity trends negative, or *left*, and so the asymmetric left feedback has a smaller impact and weight change slows in the trial-by-trial feedback conditions. Three blocks of training with vernier stimuli without feedback, where the postsynaptic output reflects only stimulus information and is therefore very close to zero, reinstate the conditions of early learning.

**Figure B2.**

*right* data in the last three blocks of correct feedback training in the behavioral data (see Figure 6), it is not clear how strongly this feature of the model is tested in the current data sets. This peculiar predicted interaction with interspersed no-feedback training seems to be a property of the small offsets of the vernier stimuli combined with the asymmetric stimulus design. If taken seriously, this property seems to predict a possibly testable advantage to cycling feedback training with no-feedback training.

**Figure B3.**

$\delta_i$ depends directly on the activity in that spatial frequency and orientation channel $A(\theta, f)$ and because that activity is largely focused on orientations near the vertical for vertical vernier judgments (and vice versa for horizontal), the weight changes on units relevant to horizontal judgments are largely unchanged (except for random drift) during the vertical vernier training phases. Similarly, the weight changes on units relevant to vertical judgments are largely unchanged (except for random drift) during the horizontal vernier training phases. Otherwise, the percentage weight changes during reverse feedback training and accurate feedback training phases mirror those described earlier.

**Figure B4.**