**Perceptual learning is classically thought to be highly specific to the trained stimuli's retinal locations. However, recent research using a novel double-training paradigm has found dramatic transfer of perceptual learning to untrained locations. These results challenged existing models of perceptual learning and provoked intense debate in the field. Recently, Hung and Seitz (2014) showed that previously reported results could be reconciled by considering the details of the training procedure, in particular whether it involves prolonged training at threshold using a single staircase procedure or multiple staircases. Here, we examine a hierarchical neural network model of the visual pathway, built upon previously proposed integrated reweighting models of perceptual learning, to understand how retinotopic transfer depends on the training procedure adopted. We propose that the transfer and specificity of learning between retinal locations can be explained by considering task difficulty and confidence during training. In our model, difficult tasks lead to greater learning of weights from early visual cortex to the decision unit, and thus to specificity, while easy tasks lead to greater learning of weights from later stages of the visual hierarchy, and thus to more transfer. To model interindividual differences in task difficulty, we relate task difficulty to the confidence of subjects. We show that our confidence-based reweighting model can account for the results of Hung and Seitz (2014) and makes testable predictions.**

The model consists of two input components (referred to as the V1 layer), one intermediate component (referred to as the V4 layer), and one decision unit. Each V1 component represents a group of orientation-selective neurons possibly located in V1 or early in the visual pathway. The units in the V1 layer represent sets of neurons in two distinct retinal locations in different quadrants (but not necessarily in a different meridian) in the visual field, referred to as “Loc-1” and “Loc-2,” respectively. These V1 units generate the location-dependent representations of the input stimulus. The stimulus is presented to the network through activation of the V1 component that corresponds to the stimulus location in that trial (Loc-1 or Loc-2). In addition to orientation-selective units, the V1 layer contains noise units corresponding to the presence of peripheral neurons whose activity is not related to the task but influences the final decision.

**Figure 1**


The units in the V4 layer receive inputs from both V1 components. As a result, they provide a location-independent representation of the inputs. More precisely, each unit in the V4 layer receives its input from three different V1 units in each location. This results in the V4 units having broader tuning in orientation. The decision unit represents high-level decision-making areas in the brain and receives location-specific and location-independent representations from the V1 and V4 layers, respectively. The decision unit determines the output of the network for the task by integrating the weighted inputs it receives from both the V1 and V4 layers. Plasticity of the connections between the location-independent representations in V4 and the decision unit is what allows transfer to occur between retinal locations.

V1 layer

The units in the V1 layer are modelled as in Sotiropoulos et al. (2011). The orientation-selective units in each V1 component model simple cells characterized by elongated receptive fields, which respond maximally to a particular orientation. The additional noise units model the neighboring neurons whose activity is independent of the input presented. They yield additional decisional noise, and their activity is drawn from a scaled standard normal distribution so that its values are similar to those of their orientation-selective counterparts.

The response *r* of each orientation-selective unit is described mathematically in terms of its mean firing rate *r̂* and a variable *ξ*, which introduces multiplicative noise in the final response and is drawn from the standard normal distribution. The mean firing rate *r̂* of the oriented filter, resulting from the projection of a particular input stimulus, is a function of the following quantities: $r_{\max}$ is the maximum possible firing rate, *q* is the total value derived from the receptive field function in response to the stimulus, $q_0$ is the threshold indicating the minimum value of *q* that enables the neuron to fire, and *g* is a gain parameter. The rectification operator $[\cdot]_+$ sets negative values of firing rate to zero, ensuring that firing rates are non-negative.

The value *q* derived from the receptive field function in response to the stimulus is computed from *I*(*x*, *y*), the intensity of the stimulus at the point (*x*, *y*); *ε*, a value drawn from the standard normal distribution that introduces noise in the output of the units; and *G*(*x*, *y*), the mathematical description of the receptive field, which is a Gabor function. In this function, the parameters $\sigma_x$ and $\sigma_y$ define the width of the 2D Gaussian envelope along the *x* and *y* axes, respectively, and *f* and *ϕ* determine the spatial frequency and phase of the sinusoidal component, respectively. The receptive fields are modeled as Gabor functions with parameters specified in Table 1. The orientation preferences of different orientation-selective units are obtained by standard rotation of coordinates for the implementation of the respective receptive fields, where *θ* is the angle of preferred orientation.
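The receptive-field description above can be illustrated with a minimal Python sketch. The equations themselves are not reproduced in this text, so the code assumes the standard textbook forms: a Gaussian envelope multiplied by a cosine carrier, a conventional rotation of coordinates, and additive noise *ε* on *q*. The function names and the parameter values used in the usage note are hypothetical (the actual values are those of Table 1).

```python
import math

def rotate(x, y, theta_deg):
    """Standard rotation of coordinates by the preferred orientation theta."""
    t = math.radians(theta_deg)
    x_r = x * math.cos(t) + y * math.sin(t)
    y_r = -x * math.sin(t) + y * math.cos(t)
    return x_r, y_r

def gabor(x, y, sigma_x, sigma_y, f, phi, theta_deg=0.0):
    """ASSUMED standard Gabor: 2D Gaussian envelope times a cosine carrier."""
    x_r, y_r = rotate(x, y, theta_deg)
    envelope = math.exp(-(x_r ** 2) / (2 * sigma_x ** 2)
                        - (y_r ** 2) / (2 * sigma_y ** 2))
    carrier = math.cos(2 * math.pi * f * x_r + phi)
    return envelope * carrier

def receptive_field_drive(image, eps=0.0, **rf):
    """q: sum over pixels of I(x, y) * G(x, y), plus a noise sample eps.

    `image` maps (x, y) -> intensity. Additive noise is an assumption;
    the text only states that eps adds noise to the output of the units.
    """
    return sum(i * gabor(x, y, **rf) for (x, y), i in image.items()) + eps
```

For example, `gabor(0.0, 0.0, sigma_x=1.0, sigma_y=2.0, f=0.5, phi=0.0)` evaluates the filter at its center, where both the envelope and the carrier equal 1.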

Each V1 component consists of units with 13 different preferred orientations equally spaced in the interval [−90°, 90°]. For each preferred orientation there are seven units with different values of spatial phase, resulting in 91 orientation-selective units. Different values of spatial phase account for the model's accuracy while simulating spatial jittering, facilitating position-invariant judgements. There are 59 additional noise units, and thus each V1 component contains 150 units.
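The unit counts above follow from simple arithmetic, which the short Python sketch below makes explicit. The variable names are illustrative, and because the phase values themselves are not listed in this text, units are identified only by a phase index.

```python
N_ORIENTATIONS = 13
N_PHASES = 7
N_NOISE_V1 = 59

# 13 preferred orientations equally spaced over [-90, 90] degrees.
step = 180 / (N_ORIENTATIONS - 1)
orientations = [-90 + i * step for i in range(N_ORIENTATIONS)]

# One orientation-selective unit per (orientation, phase index) pair.
v1_tuned_units = [(theta, p) for theta in orientations for p in range(N_PHASES)]

# Orientation-selective units plus noise units in one V1 component.
v1_component_size = len(v1_tuned_units) + N_NOISE_V1
```

The spacing works out to 15° between neighboring preferred orientations, which matches the −15°, 0°, 15° example given for the V4 inputs.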

V4 layer

The V4 layer consists of 150 units, and each unit receives input from three different V1 units with adjacent orientation preferences and the same phase preference from both V1 components. This ensures that the V4 units have broader tuning curves than their V1 counterparts. For example, the V4 unit with an orientation preference of 0° receives input from the V1 units with orientation preferences −15°, 0°, and 15° in each V1 location. There are 77 orientation-selective V4 units and 73 additional noise units. The relatively greater number of noise units in the V4 layer simulates a lower signal-to-noise ratio compared to the V1 layer, consistent with biological data (Ahissar & Hochstein, 2004). The noise units are modeled similarly to those in the V1 layer. Each non-noisy unit receives the mean input from the three V1 units it is connected to: its response is the mean, over an index *j*, of the responses of the V1 units to which it is connected. The index *j* varies such that each V4 unit receives input from three V1 units of neighboring orientations.
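The pooling described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: it pools over a single V1 component (the one stimulated on a given trial), and it assumes that only the 11 interior orientations have V4 units, an inference made here because 11 orientations × 7 phases reproduces the stated count of 77. All names are hypothetical.

```python
from statistics import mean

ORIENTATIONS = [-90 + 15 * i for i in range(13)]  # V1 preferred orientations (deg)
N_PHASES = 7

def v4_inputs(center_index):
    """Preferred orientations of the three V1 units feeding one V4 unit."""
    return ORIENTATIONS[center_index - 1:center_index + 2]

def v4_layer(v1_responses):
    """Map V1 responses {(orientation, phase_index): r} to V4 responses.

    Each V4 unit takes the mean of three same-phase V1 units with
    adjacent orientation preferences (ASSUMPTION: interior orientations only).
    """
    v4 = {}
    for c in range(1, len(ORIENTATIONS) - 1):
        for p in range(N_PHASES):
            v4[(ORIENTATIONS[c], p)] = mean(
                v1_responses[(theta, p)] for theta in v4_inputs(c)
            )
    return v4
```

Averaging over neighboring orientations is what broadens the V4 tuning curves relative to V1 in this sketch.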

The decision unit integrates the weighted inputs from V1 and V4 and implements the comparison between the test and reference stimuli by applying a function *f*(.) to the weighted responses of the V1 component and the V4 component for the test and reference stimuli. Here *f*(.) is the log-sigmoid function, $f(x) = 1/(1 + e^{-x})$. Each weighted response is formed from the responses of the units of the $V_k$ component (*k* = 1 or 4), with the response of the *i*th unit multiplied by the corresponding connection weight.

The output of the network is a value *O* in the interval [0, 1], not necessarily a binary value. However, in a 2AFC task the network is required to give a binary output, i.e., 0 or 1. To achieve this, when *O* is greater than 0.5 the response is counted as 1, and when *O* is less than 0.5 the response is counted as 0. This binarized value is used to assess whether the network responses count as correct or incorrect in the material that follows.
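A minimal Python sketch of the decision stage is given below. Equation 8 is not reproduced in this text, so the exact argument of the log-sigmoid is an assumption: here it is the weighted test response minus the weighted reference response, summed over V1 and V4. The handling of *O* = 0.5 exactly is also an assumption, since the text specifies only the greater-than and less-than cases. Function names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Logistic function mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def weighted_response(weights, responses):
    """Sum over units of connection weight times unit response."""
    return sum(w * r for w, r in zip(weights, responses))

def decision_output(w1, w4, v1_test, v1_ref, v4_test, v4_ref):
    """ASSUMED form of the output O: f(test drive minus reference drive)."""
    drive = (weighted_response(w1, v1_test) + weighted_response(w4, v4_test)
             - weighted_response(w1, v1_ref) - weighted_response(w4, v4_ref))
    return log_sigmoid(drive)

def binarize(o):
    """2AFC response: 1 if O > 0.5, else 0 (O == 0.5 maps to 0 by assumption)."""
    return 1 if o > 0.5 else 0
```

When the test and reference stimuli evoke identical responses, the drive is zero and the output sits at 0.5, the point of maximal uncertainty.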

For easy trials, the output *O* (Equation 8) given by the log-sigmoid function takes a value closer to 0 or 1, and for trials at threshold it takes a value closer to 0.5. As a result, the confidence (*C*) takes values closer to 1 for high values of stimulus offset (easy trials) and values closer to 0 for trials at threshold (difficult trials).
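The confidence equation itself (Equation 11) is not reproduced in this text. One simple form consistent with the description, and with the later mention of an abstract linear relationship, is a linear rescaling of the distance of *O* from 0.5. The Python sketch below uses that form purely as an assumption.

```python
def confidence(o):
    """ASSUMED linear confidence: 0 when O = 0.5, 1 when O = 0 or O = 1."""
    return 2.0 * abs(o - 0.5)
```

Under this assumption an output of 0.75 gives a confidence of 0.5, halfway between a threshold trial and a fully confident one.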

V1 & V4 to decision unit

The weights of the connections between the V1/V4 components and the decision unit are updated using the delta rule algorithm (Widrow & Hoff, 1960) and are initialized with values drawn from a uniform distribution in [−1, 1]. The delta rule algorithm minimizes the difference between the desired output (provided as feedback) and the actual response of the network after each trial. According to the confidence-based learning mechanism we propose here, weights connecting the V4 layer to the decision unit learn more on easy trials, and weights connecting the V1 layer to the decision unit learn more on trials at threshold. The weights are updated after each trial, and the weight update is described mathematically below.

Separate update rules apply to the weights connecting the V1 layer to the decision unit and to the weights connecting the V4 layer to the decision unit (Equations 12 and 13). Each rule has its own learning rate for the connections joining the $V_k$ component to the decision unit (*k* = 1, 4). *C* is the confidence calculated for the given trial, given by Equation 11. The output of the network *O* is a function of the connection weights from $V_k$ to the decision unit (Equations 8 and 10). Hence *O′* is the derivative of the output of the network with respect to the corresponding connection weights. Since the output is a log-sigmoid function (Equation 9), this derivative takes the standard log-sigmoid form, $O' = O(1 - O)$.

*Y* is the desired binary output provided as feedback after each trial. The input term of the update is the response of the *i*th unit in the $V_k$ component (*k* = 1, 4), which for the 2AFC task is defined from the responses of the $V_k$ node to the test and reference stimuli.
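The update equations (Equations 12 and 13) are not reproduced in this text, so the following Python sketch is only one reading of the description. It assumes that the V4 update is scaled by the confidence *C*, the V1 update by (1 − *C*), that *C* is a linear function of the output, and that the per-unit input is the test response minus the reference response. All names and the exact gating form are hypothetical.

```python
def delta_rule_step(w1, w4, x1, x4, o, y, eta1, eta4):
    """One confidence-gated delta-rule update; returns new (w1, w4).

    x1, x4: per-unit inputs (ASSUMED: test minus reference responses).
    o: network output in [0, 1]; y: desired binary output (feedback).
    eta1, eta4: learning rates for the V1 and V4 connections.
    """
    c = 2.0 * abs(o - 0.5)       # ASSUMED linear confidence
    o_prime = o * (1.0 - o)      # derivative of the log-sigmoid
    err = y - o                  # delta-rule error term
    # ASSUMED gating: V1 learns more at low confidence, V4 at high confidence.
    new_w1 = [w + eta1 * (1.0 - c) * err * o_prime * x for w, x in zip(w1, x1)]
    new_w4 = [w + eta4 * c * err * o_prime * x for w, x in zip(w4, x4)]
    return new_w1, new_w4
```

With this gating, a trial at threshold (*O* = 0.5) changes only the V1 weights, while a trial with *O* = 0.9 and equal learning rates and inputs changes the V4 weights four times as much as the V1 weights.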

The weights at the *t*th trial are thus obtained from the weights at the preceding trial and the weight change given by Equations 12 and 13 above.

The weights connecting the V1/V4 layers to the decision unit are randomly initialized. To simulate different subjects, we used different seeds to generate random initial weights for these conditions.

[…] V1 units with zero phase and with centers separated by 40 arcmin. The Gabors are misaligned by an offset value specified for each trial (see Figure 2 for an example stimulus).

**Figure 2**


The weights from V4 to the decision unit update at a faster rate with easy trials (high confidence). Conversely, the weights from V1 to the decision unit update faster when training with difficult trials (low confidence). This results in greater optimization of the V4–decision unit weights in the multiple staircase training method and greater optimization of the V1–decision unit weights in the single staircase training method.

[…] the weights from the V1/V4 layers to the decision unit are initialized randomly. To address this discrepancy in the baseline performance, Sotiropoulos et al. (2011) used a probabilistic mechanism that sets the response of the network to the desired output with a probability *p* that depends on the ease (offset) of the task, irrespective of the training received. In principle, this probability *p* can be estimated from the baseline performance in real subjects. Here we simply set the probability *p* to follow a linear function taking the value 0.5 (chance) for the smallest offset and 0.8 for the largest offset. The exact shape of the probability function does not affect the results of our simulations.
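The linear probability function described above can be sketched in Python as follows. The endpoints 0.5 and 0.8 come from the text; the offset range is left as a parameter because its values are not given here. What the network does when the override is not applied is not spelled out, so keeping the network's own response in that case is an assumption. Function names are hypothetical.

```python
import random

def baseline_probability(offset, min_offset, max_offset, p_min=0.5, p_max=0.8):
    """Linear p: p_min at the smallest offset, p_max at the largest."""
    frac = (offset - min_offset) / (max_offset - min_offset)
    return p_min + frac * (p_max - p_min)

def apply_baseline(network_response, desired, offset,
                   min_offset, max_offset, rng=random):
    """With probability p return the desired output.

    ASSUMPTION: otherwise the network's own response is kept.
    `rng` needs a random() method returning a float in [0, 1).
    """
    p = baseline_probability(offset, min_offset, max_offset)
    return desired if rng.random() < p else network_response
```

Passing a seeded `random.Random` instance as `rng` makes a simulated subject reproducible, in the same spirit as the seed-based initial weights.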

[…] V1 units to the decision unit weights. Figure 3A is highly comparable to the behavioral results (e.g., figure 4D in Hung & Seitz, 2014) in the single staircase condition.

**Figure 3**


[…] V4–decision unit weights during the course of training. This enables transfer of learning between retinal locations. The simulation results in Figure 3B are highly comparable to the behavioral results (e.g., figure 4B in Hung & Seitz, 2014).

[…] V1 units and V4 units in the final decision. Thus we modeled the Transfer group by increasing the contribution of V4 units and decreasing the contribution of V1 units in Equation 8, which is modified accordingly.

**Figure 4**


We used *k* = 1.6. The simulation results are shown in Figure 4. In the Transfer group, substantial transfer is observed between pre-training and mid-training testing sessions. No significant additional transfer is observed between mid-training and post-training testing sessions, again matching the behavioral results (e.g., figure 5C in Hung & Seitz, 2014).

The model predicts that a greater contribution of V4 (which has broader tuning and lower signal-to-noise ratio) in the Transfer group will result in lower performance for small offsets and thus, on average and as a result of the staircase procedure, larger offsets during training. As a test of this prediction, we reanalyzed the behavioral data during training from the Hung and Seitz (2014) data set and found that this was indeed the case. As can be seen in Figure 5A, the Transfer group on average had larger offsets (5.2 ± 0.035 arcmin) than the Specificity group (4.7 ± 0.029 arcmin); this difference was highly significant (*p* < 0.0001, unpaired *t* test). The model showed the same pattern of results, although the model did attain lower thresholds during training than the human subjects (Figure 5B). Overall, these data provide further support for the hypothesis that greater precision of training stimuli leads to greater specificity of learning (Jeter, Dosher, Petrov, & Lu, 2009).

**Figure 5**


**Figure 6**


[…] V1 and V4 stages of the model to the decision-making process.

[…] V1 and V4 representations to the decision unit. This allowed us to model the Transfer versus the Specificity groups (Figure 4). The Transfer group was modeled by assuming a greater contribution of location-independent representations (V4 layer) in the final decision process. Since the V4 units have broader tuning and lower signal-to-noise ratio, higher thresholds were observed in the model during training for the Transfer group compared to the Specificity group. This model prediction was tested and found to be true in a novel analysis of the experimental data of Hung and Seitz (2014), where the Transfer group was indeed trained with less precise stimuli.

[…] V4 than the V1 representations in the model.

[…] V1 and V4 layers and have the most reliable layer contributing most to learning. In essence, this would be implemented by splitting Equation 8 into components for each layer ($O_{V1}$ and $O_{V4}$) and by deriving a confidence score for each independently ($C_{V1}$ and $C_{V4}$). The key for any such mechanism to work is that at greater precision V1 provides a more reliable answer (because of tighter tuning and lower noise) and that at lower precision V4 provides a more reliable answer. Also, an interesting future direction would be to conduct a series of experiments that measure subjects' confidence about their decision at the end of each trial. If subjective confidence is directly related to how the decision structures in the model weight their inputs and determine learning, then quantifying subjects' confidence and using this instead of the abstract linear relationship that we adopted in this paper could provide a better account of individual subject differences in learning.
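The per-layer confidence idea discussed above can be illustrated with a speculative Python sketch. Nothing here is specified in the text beyond the notions of per-layer outputs and per-layer confidences: the log-sigmoid of each layer's own drive, the linear confidence, and the proportional sharing rule are all assumptions introduced for illustration, and the function names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Logistic function mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_confidences(drive_v1, drive_v4):
    """Per-layer confidences from per-layer outputs O_V1 and O_V4.

    drive_vk: weighted test-minus-reference response of layer k (ASSUMED form).
    """
    o_v1, o_v4 = log_sigmoid(drive_v1), log_sigmoid(drive_v4)
    c_v1 = 2.0 * abs(o_v1 - 0.5)   # ASSUMED linear confidence
    c_v4 = 2.0 * abs(o_v4 - 0.5)
    return c_v1, c_v4

def learning_shares(c_v1, c_v4):
    """HYPOTHETICAL rule: split learning in proportion to layer confidence."""
    total = c_v1 + c_v4
    if total == 0.0:
        return 0.5, 0.5            # neither layer is informative
    return c_v1 / total, c_v4 / total
```

In this sketch the layer that gives the more decisive answer on a trial receives the larger share of that trial's learning, which is one way to let the most reliable layer contribute most.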

*Proceedings of the National Academy of Sciences, USA*, 90 (12), 5718–5722.

*Trends in Cognitive Sciences*, 8 (10), 457–464.

*Vision Research*, 74, 30–39.

*Proceedings of the National Academy of Sciences, USA*, 110 (33), 13678–13683.

*Proceedings of the National Academy of Sciences, USA*, 95 (23), 13988–13993.

*The Journal of Neuroscience*, 34 (25), 8423–8431.

*Journal of Vision*, 9 (3): 1, 1–13, doi:10.1167/9.3.1.

*Proceedings of the National Academy of Sciences, USA*, 88 (11), 4966–4970.

*Psychological Review*, 112 (4), 715.

*Nature*, 412, 549–552.

*Proceedings of the National Academy of Sciences, USA*, 102 (41), 14895–14900.

*Trends in Cognitive Sciences*, 9 (7), 329–334.

*Perception & Psychophysics*, 52 (5), 582–588.

*Vision Research*, 51 (6), 585–599.

*PLoS One*, 2 (12), 1323.

*Neural Computation*, 5 (5), 695–718.

*Institute of Radio Engineers, Western Electronic Show and Conventions, Convention Record*, 4, 96–104.

*Current Biology*, 18 (24), 1922–1926.

*Journal of Vision*, 13 (4): 19, 1–13, doi:10.1167/13.4.19.

*Vision Research*, 50, 368–374.