Article  |   January 2013
A computational developmental model for specificity and transfer in perceptual learning
Journal of Vision January 2013, Vol.13, 7. doi:https://doi.org/10.1167/13.1.7

      Mojtaba Solgi, Taosheng Liu, Juyang Weng; A computational developmental model for specificity and transfer in perceptual learning. Journal of Vision 2013;13(1):7. https://doi.org/10.1167/13.1.7.


      © ARVO (1962-2015); The Authors (2016-present)

Abstract

How and under what circumstances the training effects of perceptual learning (PL) transfer to novel situations is critical to our understanding of generalization and abstraction in learning. Although PL is generally believed to be highly specific to the trained stimulus, a series of psychophysical studies have recently shown that training effects can transfer to untrained conditions under certain experimental protocols. In this article, we present a brain-inspired, neuromorphic computational model of the Where-What visuomotor pathways which successfully explains both the specificity and transfer of perceptual learning. The major architectural novelty is that each feature neuron has both sensory and motor inputs. The network of neurons is autonomously developed from experience, using a refined Hebbian-learning rule and lateral competition, which altogether result in neuronal recruitment. Our hypothesis is that certain paradigms of experiments trigger two-way (descending and ascending) off-task processes about the untrained condition which lead to recruitment of more neurons in lower feature representation areas as well as higher concept representation areas for the untrained condition, hence the transfer. We put forward a novel proposition that gated self-organization of the connections during the off-task processes accounts for the observed transfer effects. Simulation results showed transfer of learning across retinal locations in a Vernier discrimination task in a double-training procedure, comparable to previous psychophysical data (Xiao et al., 2008). To the best of our knowledge, this model is the first neurally-plausible model to explain both transfer and specificity in a PL setting.

Introduction
Perceptual learning (PL) is the long-lasting improvement in perception that follows repeated practice with a stimulus. The fact that low-level sensory perception is still highly plastic in adult humans sheds light on the underlying mechanisms of learning and plasticity. The subject of PL has long attracted researchers interested in the behavioral (Drever, 1960; Gibson, 1963), modeling (Gibson, 1969; Shi, Zhongzhi, & Zucker, 2003; Tsodyks & Gilbert, 2004), and physiological (Law & Gold, 2008; Schoups, Vogels, Qian, & Orban, 2001) implications of perceptual learning. 
Conventional paradigms of perceptual learning studies have established the specificity (as opposed to transfer) of PL to the trained stimulus: orientation, direction of motion, eye of presentation, and retinal location (for a review of different tasks see Ahissar & Hochstein, 1997; Ball & Sekuler, 1987; Fahle & Edelman, 1993; Fine & Jacobs, 2002; Fiorentini & Berardi, 1980; Jeter, Dosher, & Liu, 2007; O'Toole & Kersten, 1992; Poggio, Fahle, & Edelman, 1992; Ramachandran & Braddick, 1973; Yu, Klein, & Levi, 2004). For example, in a well-known study, Schoups et al. (2001) observed that the slope of the tuning curve of orientation sensitive neurons in V1 increased only at the trained location. Furthermore, the change was retinotopic and orientation specific. Karni and Sagi (1991) reported that in a texture discrimination task, PL effects were retinotopically specific, strongly monocular and orientation specific. 
In recent years, accumulating experimental evidence has challenged the specificity of perceptual learning, suggesting that specificity is not an inherent property of perceptual learning but rather a function of experimental paradigms (e.g., Pavlovskaya & Hochstein, 2011; Xiao et al., 2008; Zhang, Xiao, Klein, Levi, & Yu, 2010; Zhang, Zhang, et al., 2010). As illustrated in Figure 1, there seems to be a general pattern in many of the studies that showed transfer in PL: Training the perceptual task in one condition accompanied by exposure to a second condition results in transfer of learning effects to the second condition. The model of transfer presented in this article is inspired by this general pattern, although we will show that the observed improved performance in the transfer condition is a result of gated self-organization mechanisms rather than literal transfer of the information learned for one condition to a novel condition. 
Figure 1
 
General pattern observed in transfer studies. Regardless of the order, a training and an exposure step seem to be common prior to transfer.
In this paper, we define the term exposure as follows: 
 

Exposure is receiving (being exposed to) any sensory signals that have at least one feature (including location) in common with the Transfer condition. Exposure can be either active (e.g., doing a task) or passive (e.g., only viewing the stimulus).

 
This is a generic definition of exposure which includes what has been called pretest (Zhang, Xiao, et al., 2010), double-training (Xiao et al., 2008), and exposure (Zhang, Zhang, et al., 2010). 
Although exposure to the features of the transfer condition is neither necessary (counterexample: Zhang, Xiao, et al., 2010, figure 4E) nor sufficient (counterexample: Zhang, Zhang, et al., 2010, figure 1d) to cause transfer, we believe that it is an important assisting factor for triggering the neural processes which, in turn, result in transfer. The ease of transfer in PL seems to depend on both the stimulus and the task. For example, Vernier discrimination requires a lot of exposure, such as double training in Xiao et al. (2008), whereas transfer for orientation discrimination seems to require little or no exposure, such as in Zhang, Xiao, et al. (2010). We speculate that such differences are due to different amounts of pre-experiment experience with a particular feature/task and constraints imposed by specific sensory processing mechanisms for different features. In the present study, we investigate how transfer could occur in a neural network model of the visual system. However, we do not aim to explain why and under what circumstances transfer happens, although we present a conjecture for that too: exposure to the untrained transfer conditions. 
Previous models of perceptual learning attribute the improvement in stimulus discrimination to neural modification in either low-level feature representation areas, such as V1, or the connection patterns from the low-level to high-level areas. From a computational point of view, models that predict specificity of training effects are not very difficult to come by.¹ Therefore, not surprisingly, nearly all of the computational models of perceptual learning predict specificity but not transfer. 
The first group of models (we call them low-level based, or lower models) are inspired by the retinotopic nature of the lower visual areas, e.g., Adini, Sagi, and Tsodyks (2002); Teich and Qian (2003); Zhaoping, Herzog, and Dayan (2003a). These models predict specificity—not transfer—of training effects since the stimulus reaches only the parts of the V1 that retinotopically correspond to the specific trained features and locations in the visual field. 
The second group of perceptual learning models (we call them reweighting based, or higher models), unlike the first group, assume that discrimination takes place in higher stages (e.g., post V1) of visual processing (e.g., Dosher & Lu, 1998; Lu, Liu, & Dosher, 2009; Poggio et al., 1992), and perceptual experience improves the readouts from the sensory cortex by modifying (reweighting) the connections from low-level representation areas to high-level decision making areas (Liu, Lu, & Dosher, 2010; Petrov, Dosher, & Lu, 2003). Since practice with visual stimuli at a certain location and feature reaches only certain connections from low to high-level areas, these models also predict specificity of perceptual learning across locations and features. 
How then do the neural circuits manage to generalize (transfer) the learning effects to untrained locations and features? As stated above, existing computational models fail to explain this. A rule-based learning model by Zhang, Zhang, et al. (2010) attempted to address this important question by assuming that a set of location-invariant or feature-invariant heuristics (i.e., rules) can be learned during perceptual practice, given appropriate experimental settings. This theory lacks neuromorphic-level detail and has not been implemented and verified by computer simulation. 
In this paper, we propose a model of perceptual learning based on the brain-inspired computational framework proposed by Weng (2010). The general assumption of the model is that the brain consists of a cross-connected network of neurons in which most of the modules and their connectivity patterns emerge from neural activities. These assumptions were based on neuroanatomical observations that there are extensive two-way connections between brain areas and developmental neurobiological studies showing that the brain develops its network in an individual's life time (see, e.g., Felleman & Van Essen, 1991; Kandel, Schwartz, & Jessell, 2000). 
Before providing the details of the model in the next section, we highlight several key aspects of the model that are relevant to PL. In terms of architecture, the model is distinct from existing models by attributing the training effects to not only the improved connections from the sensory to higher cortical areas (e.g., motor areas) but also the improved representations in the sensory cortex due to neuronal recruitment. Moreover, in order for transfer to occur, a critical role is assumed for descending (top-down) connections, from motor areas that represent concepts down to adaptively selected internal feature neurons. 
In terms of algorithm, we present a rather unconventional and counterintuitive mechanism for transfer in PL, namely gated self-organization. A prevalent assumption in the PL research community seems to be that transfer of learning is caused by the reuse of the representations learned for trained conditions during testing for untrained conditions. Our model, however, does not assume any representational overlap between training and transfer conditions. It assumes a base performance level for the PL task, which simulates the condition where human subjects can always perform at a high level on an easy task without extensive training. The discrimination power existing in this base performance level is improved via gated self-organization as a result of exposure effects accumulated during the prolonged training and testing sessions. These mechanisms occur during off-task processes when the model is not actively engaged in a PL task, resulting in performance improvement as significant as those for the trained conditions. In essence, the training sessions merely prime the neuronal circuits corresponding to the untrained conditions to utilize the information already stored in the network (even before the PL training sessions) and bootstrap their performance to the trained level via self-organization. 
The model is tested on a simulated Vernier discrimination task. It predicts specificity of training effects under conventional experimental settings, as well as transfer of feature discrimination improvement across retinal locations when the subject is exposed to another stimulus at the transfer location (“double training” per Xiao et al. 2008). Although the results presented here are only for the Vernier discrimination task and transfer across locations, the general model presents a detailed network-level explanation of how transfer can happen regardless of task, feature, or location, because the network's developmental mechanisms are independent of stimuli (e.g., Vernier) and outputs of the network (e.g., type, orientation, location, etc.). In other words, since our model is a developmental network in which the internal representations are developed from experience, as opposed to being fixed, predesigned feature detectors such as Gabor filters, the presented results should in principle generalize to other types of stimuli and experimental settings. 
Model
The overall architecture: Introduction to Where-What Networks (WWN)
Where-What Networks (Ji, Weng, & Prokhorov, 2008) are a visuomotor version of the brain-inspired model outlined in Weng (2010), modeling the dorsal (where) stream and the ventral (what) stream of visual and behavioral processing. A major advance over the existing rich studies of the two streams is to attribute the major causality of the where and what representations to the higher concept areas in the frontal cortex, since motor signals participate in the formation of representations along each stream through top-down connections. That is, each feature neuron represents not merely a bottom-up feature vector x from the bottom-up source but a joint feature (x, z) consisting of both the bottom-up feature vector x from receptors and the top-down feature vector z from effectors. In order for such a neuron to win the lateral competition and subsequently fire, its internal representation must match well with both the top-down part of its input signal, z, and the bottom-up part of its input signal, x.
Where-What Networks (WWN) have been successfully trained to perform a number of tasks such as visual attention and recognition from complex backgrounds (Luciw & Weng, 2010), stereo vision without explicit feature matching to generate disparity outputs (Solgi & Weng, 2010), and early language acquisition and language-based generalization (Miyan & Weng, 2010). Figure 2 shows a schematic of the version of the network used in this study to model PL as part of an integrated sensorimotor system. The network is developmental in the sense of Weng et al. (2001), i.e., none of the internal feature-sensitive neurons are predesigned by the programmer, but rather they are developed (learned) via the agent's interactions with the natural stimuli. 
Figure 2
 
A schematic of the Where-What Networks (WWN). It consists of a sensory cortex which is connected to the What area in the ventral pathway and to the Where area in the dorsal pathway.
In order for internal network structures to emerge through such interactions, the initial structure of the network does not impose many restrictions. As illustrated in Figure 2, the network consists of one area of neurons modeling the early sensory areas LGN/V1/V2. The signals then diverge into two pathways, the dorsal (or where) pathway, and the ventral (or what) pathway. The two pathways are bidirectionally connected to the location area and the type area in the frontal cortex, respectively. Unlike the sensory cortex, we assume that the outputs from the location area and the type area can be observed and supervised by teachers (e.g., via the motor areas in the frontal cortex). 
The lobe component analysis (LCA) (Weng & Luciw, 2009) is used as an algorithm for neural learning in a cortical area in WWN. It uses the Hebbian mechanism to enable each neuron to learn based on the presynaptic and post-synaptic activities that are locally available to each synapse. In other words, the learning and operation of WWN do not require a central controller. 
In the following subsection, the learning algorithm and signal processing operations in the network are laid out. It is assumed that the network has the overall structure shown in Figure 2. Namely, the internal sensory cortex consists of a two-dimensional array of cortical columns, laid out in a grid fashion, where each column receives bottom-up input from a local patch on the retina (input image) and has bidirectional connections with all of the neural columns in the concept area. Although the concept areas in the brain have a similar six-laminar structure, we implemented only a single-layer structure for the concept areas since there is no top-down input to the concept areas in this simplified model of the brain. 
The learning algorithm
The learning algorithm in WWN is inspired by the six-layer structure of the laminar cortex (Callaway, 2004). The internal area of the network (see Figure 3) consists of a two-dimensional grid of columns of neurons. As shown in Figure 3C, each column has three functional layers (Layers 2, 3, and 4, shown enclosed in dotted rectangles in the figure) as well as three assistant layers (Layers 5a, 5b, and 6, not shown for simplicity of illustration). No functional role is assumed for Layer 1, hence it is not included in the model. We speculate that the computational advantage of the laminar structure of the neocortex is that each area can process its incoming bottom-up and top-down signals separately before combining them. The bottom-up signals first reach Layer 4, where they are prescreened via lateral interaction in the layer assisted by Layer 6. Similarly, the top-down signals are first captured and prescreened by the lateral interactions in Layer 2, assisted by Layer 5a. The result of these two separate parallel operations is then integrated in Layer 3, processed via the lateral interactions assisted by Layer 5b, and then projected to the next higher level (concept areas in this case). The Hebbian learning rule is used for updating the bottom-up weights of Layer 4 and the top-down weights of Layer 2, while all the other connection weights are one-to-one and fixed. Below is a step-by-step algorithmic description of the operations. For simplicity of notations, the time factor, t, is not shown in the equations. 
Figure 3
 
How training and exposure accompanied by off-task processes can cause the learning effects to transfer. Each circle schematically represents a column of neurons with laminar architecture (see a column's details in Figure 3C), solid lines show connections made (or improved) via direct perceptual learning, and dashed lines are the connections made or improved via off-task processes. (A) Transfer across locations in Where-What Networks. See the text for explanation. (B) Recruitment of more neurons in the sensory and concept areas. Many connections are not shown for the sake of visual simplicity. See text for details. (C) A cortical column from the internal layer magnified, along with its neighboring columns. The column is depicted in the jagged rectangle, and the arrows show the bottom-up and top-down signal passing among the layers. Only three functional layers (two, three, and four) are shown. We conjecture that Layers 5 and 6 have an assistive role in modulating the lateral connections (depicted by gray vertical arrows in each layer). They are not shown here for simplicity of illustration. No information processing role is assumed for Layer 1; hence, it is not shown in the figure. In short, Layer 4 and Layer 2 process bottom-up and top-down signals, respectively. Then, Layer 3 integrates the outputs of Layers 4 and 2 and projects signals to higher concept areas.
Prescreening of bottom-up signals in Layer 4
For each $i$th neuron, $n_i$, in Layer 4, the bottom-up weight vector of the neuron, $w_{b,i}^{(L4)}$, and the bottom-up input to the neuron, $b_i^{(L4)}$, are normalized and then multiplied. The dot product is used to multiply the two vectors, as it measures the cosine of the angle between them, a measure of similarity and match between two vectors:

$$\hat{z}_i^{(L4)} = \frac{b_i^{(L4)}}{\lVert b_i^{(L4)} \rVert} \cdot \frac{w_{b,i}^{(L4)}}{\lVert w_{b,i}^{(L4)} \rVert} \tag{1}$$

We call $\hat{z}_i^{(L4)}$ the initial response, or preresponse, of the $i$th neuron before lateral interactions in the layer. The lateral interactions, which yield the response of the neuron, consist of lateral inhibition and lateral excitation. In the current version of the model, there are no explicit lateral connections, which makes the algorithm more computationally efficient by avoiding the oscillations needed to stabilize lateral signals while achieving essentially the same effects. Lateral inhibition is roughly modeled by the top-k winner rule, i.e., the $k \geq 1$ neurons with the highest preresponse inhibit all the other neurons with lower preresponse from firing by setting their response values to zero. This process simulates the lateral competition process and was proposed by Fukai and Tanaka (1997) and O'Reilly (1998), among others, who used the term k-winner-takes-all (kWTA). The preresponses of these top-k winners are then multiplied by a linearly declining function of the neuron's rank:

$$\hat{z}_i^{(L4)} \leftarrow \frac{k - r_i}{k}\, \hat{z}_i^{(L4)} \tag{2}$$

where $\leftarrow$ denotes the assignment of the value, and $0 \leq r_i < k$ is the rank of the neuron with respect to its preresponse value (the neuron with the highest preresponse has a rank of zero, the second most active neuron gets a rank of one, etc.). Each neuron competes with a number of other neurons for its rank in its local neighborhood in the two-dimensional grid of neurons of the layer. A parameter called the competition window size, $\omega$, determines the local competitors of the neuron. A competition window of size $\omega = 5$, centered on the neuron, is used for the reported results. The modulation in Equation 2 simulates lateral inhibition among the top-k winners. 
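The prescreening step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `preresponse` and `top_k_response` are hypothetical, the rank scaling is assumed to be the linear factor (k - rank) / k, and the competition is taken over a flat list of neurons instead of a local window of size ω on a two-dimensional grid.

```python
import math

def preresponse(weights, inputs):
    """Cosine of the angle between a weight vector and an input vector."""
    dot = sum(w * x for w, x in zip(weights, inputs))
    norm_w = math.sqrt(sum(w * w for w in weights))
    norm_x = math.sqrt(sum(x * x for x in inputs))
    if norm_w == 0.0 or norm_x == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_w * norm_x)

def top_k_response(pre, k):
    """Top-k winner rule: losers are set to zero, winners are scaled by rank.

    Assumption: the linearly declining factor is (k - rank) / k, with
    rank 0 for the neuron that has the highest preresponse.
    """
    order = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)
    response = [0.0] * len(pre)
    for rank, i in enumerate(order[:k]):
        response[i] = pre[i] * (k - rank) / k
    return response
```

With `k = 2`, for example, the best-matching neuron keeps its full preresponse, the runner-up keeps half of its preresponse, and every other neuron is silenced.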
Prescreening of top-down signals in Layer 2
The exact same algorithm of prescreening described above for Layer 4 runs in Layer 2 as well. The only difference is that Layer 2 receives top-down signals from a higher area instead of bottom-up input from a lower area. 
Integration and projection to higher areas in Layer 3
In each cortical column (see Figure 3C), the neuron in Layer 3, $n_i$, receives the response value of the neuron in Layer 4, $b_i$, and the neuron in Layer 2, $e_i$, and sets its preresponse value to be the average of the two values:

$$\hat{z}_i^{(L3)} = \frac{1}{2}\left(b_i + e_i\right) \tag{3}$$

The preresponse value of the Layer 3 neuron, $\hat{z}_i^{(L3)}$, is then updated after lateral interactions with other neurons in Layer 3, following the exact same algorithm for lateral inhibition described for Layer 4 neurons. For simplicity of terminology, we choose to equate the preresponse and response of Layer 3 with the preresponse and response of the whole column. 
To model lateral excitation in the internal area, neuronal columns in the immediate vicinity of each of the k winner columns are also allowed to fire and update their connection weights. In the current implementation, only the eight columns in the 3 × 3 neighborhood (in the two-dimensional sheet of neuronal columns) are excited. The response level of each excited column is set to the response level of its neighboring winner column, multiplied by an exponentially declining function of its distance d (in the two-dimensional grid of columns) to the winner column (Equation 4), where d = 1 for immediate neighbors of the winner column and d = 2 for the diagonal neighbors in the 3 × 3 neighborhood. The outputs of the neurons in Layer 3 are projected to the next higher area (concept areas in the experiments of this article). 
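As a rough sketch of the integration and lateral excitation steps, the Python below averages the Layer 4 and Layer 2 responses and then lets the eight neighbors of each winner column fire. Several details are assumptions made only for illustration: the names `integrate` and `excite_neighbors` are hypothetical, the decay function is taken to be exp(-d) as a stand-in for the exponentially declining function in the text, and a neighbor that is reached by more than one winner keeps the largest value.

```python
import math

def integrate(bottom_up, top_down):
    """Layer 3 preresponse: elementwise average of Layer 4 and Layer 2 responses."""
    return [[(b + e) / 2.0 for b, e in zip(row_b, row_e)]
            for row_b, row_e in zip(bottom_up, top_down)]

def excite_neighbors(response, winners, decay=lambda d: math.exp(-d)):
    """Let the 3 x 3 neighbors of each winner column fire at a discounted level.

    Assumptions: decay(d) = exp(-d) is a placeholder for the exponentially
    declining function; d = 1 for side neighbors and d = 2 for diagonal ones.
    """
    rows, cols = len(response), len(response[0])
    out = [row[:] for row in response]
    for r, c in winners:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue  # the winner keeps its own response
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    d = 1 if dr == 0 or dc == 0 else 2
                    excited = response[r][c] * decay(d)
                    out[nr][nc] = max(out[nr][nc], excited)
    return out
```

Passing a different `decay` callable makes it easy to try other distance profiles without touching the neighborhood logic.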
Hebbian learning in the winning cortical columns
If a cortical column of neurons wins in the multistep lateral competitions described above and projects signals to higher areas, i.e., if the Layer 3 neuron in the column has a nonzero response value, the adaptable weights of the Layer 2 and Layer 4 neurons in the column will be updated using the following Hebbian learning rule:

$$w_{b,i}^{(L4)} \leftarrow \beta_1\, w_{b,i}^{(L4)} + \beta_2\, z_i^{(L4)}\, b_i^{(L4)} \tag{5}$$

where $\beta_1$ and $\beta_2$ determine the retention and learning rate of the neuron, respectively:

$$\beta_1 = \frac{m_i - 1 - \mu(m_i)}{m_i}, \qquad \beta_2 = \frac{1 + \mu(m_i)}{m_i} \tag{6}$$

with $\beta_1 + \beta_2 \equiv 1$. In the equation above, $m_i$ is the column's maturity level (or age), which is initialized to one, i.e., $m_i = 1$ in the beginning, and increments by one, i.e., $m_i \leftarrow m_i + 1$, every time the column wins. The maturity level parameter, $m_i$, is used to simulate the amount of neural plasticity or learning rate in the model. Similar to the brain, the model's plasticity decreases as the maturity level or age increases. This is compatible with human development; neural plasticity decreases as people get older. 
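$\mu$ is a monotonically increasing function of $m_i$ that prevents the learning rate $\beta_2$ from converging to zero as $m_i$ increases:

$$\mu(m_i) = \begin{cases} 0 & \text{if } m_i \leq t_1 \\ c\,(m_i - t_1)/(t_2 - t_1) & \text{if } t_1 < m_i \leq t_2 \\ c + (m_i - t_2)/r & \text{if } t_2 < m_i \end{cases} \tag{7}$$

For the results reported here, we used the typical values $t_1 = 10$, $t_2 = 10^3$, $c = 2$, and $r = 10^4$. See Appendix A for a detailed description of these parameters. 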
Equation 5 is an implementation of the Hebbian learning rule. The second term on the right-hand side of the equation, which implements the learning effect of the current cycle, consists of the presynaptic firing rate vector, $b_i^{(L4)}$, multiplied by the postsynaptic firing rate, $z_i^{(L4)}$. This ensures that a connection weight is strengthened only if the pre- and postsynaptic neurons are firing together; hence, the Hebbian rule. 
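A minimal Python sketch of this maturity-dependent Hebbian update follows. It is an illustration under stated assumptions, not the authors' code: the names `amnesic`, `learning_rates`, and `hebbian_update` are hypothetical, the learning rate is assumed to be (1 + μ(m)) / m with retention rate one minus that, and μ is assumed to take the three-piece form commonly used with lobe component analysis, here with the parameter values t1 = 10, t2 = 10^3, c = 2, and r = 10^4.

```python
def amnesic(m, t1=10.0, t2=1e3, c=2.0, r=1e4):
    """Assumed three-piece form of mu(m): zero, then a linear ramp, then slow growth."""
    if m < t1:
        return 0.0
    if m < t2:
        return c * (m - t1) / (t2 - t1)
    return c + (m - t2) / r

def learning_rates(m):
    """Return (beta1, beta2) for maturity m; they always sum to one."""
    beta2 = (1.0 + amnesic(m)) / m  # learning rate
    beta1 = 1.0 - beta2             # retention rate
    return beta1, beta2

def hebbian_update(weights, inputs, response, maturity):
    """One update of a winning neuron: presynaptic input times postsynaptic response."""
    beta1, beta2 = learning_rates(maturity)
    new_weights = [beta1 * w + beta2 * response * x
                   for w, x in zip(weights, inputs)]
    return new_weights, maturity + 1  # the column ages every time it wins
```

On the first win (maturity 1) the retention rate is zero, so the weight vector is simply overwritten by the response-scaled input; as the column matures, each new input shifts the weights less.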
The same Hebbian learning rule updates the top-down weights of neurons in Layer 2 (Equation 8). The neurons in the Where and What concept areas use the same Hebbian learning rule for updating their weight vectors. They also utilize the same dot-product rule and lateral interactions for computing their response values. During the times when the firing of a concept neuron is imposed, however, e.g., during supervised training or off-task processes, the response value of each neuron in the concept areas is set to either zero (not firing) or one (firing). 
The off-task processes triggered by exposure
Off-task processes in WWN are the neural interactions during the times when the network is not attending to any stimuli or task. In contrast with most neural network models, WWN runs the off-task processes to simulate the internal neural activities of the brain even when sensory input is absent or not attended. The off-task processes are run all the time when the network is not in the training mode, e.g., during perceptual learning. As explained in detail below, these processes may or may not alter the network connections depending on the recent experience of the network. 
During the off-task processes, the cortical columns in the internal area operate using the exact same algorithm described in the section Hebbian learning in the winning cortical column, while the bottom-up input is irrelevant to the trained and transfer tasks (random pixel background images were used in the current implementation). Similarly, the neurons in the Where and What concept areas operate using the same algorithms. Whether or not a concept neuron fires during off-task processes is a function of the amount of recent exposure of the network to the concept (location or feature) that the neuron is representing. 
The probability of a concept neuron firing during off-task processes, given that no other neuron is firing in the same concept area, is modeled as a monotonically increasing function, a logistic sigmoid function, of the amount of recent exposure to the corresponding concept (Equations 9 and 10), where $\gamma_i \geq 0$ measures the amount of recent exposure to the concept that neuron $n_i$ represents. The conditional part of the probabilities in Equations 9 and 10 models lateral inhibition in the concept areas: it ensures that a concept neuron does not fire if there is already another neuron firing in the same area. 
The amount of exposure to the $i$th concept, $\gamma_i$, is formulated as the accumulation of the effect of exposure across trials of the experiment. The effect of each trial, in turn, is a decreasing function of the time passed since the trial, i.e.,

$$\gamma_i = \alpha \sum_j e^{-c\,\Delta t_j} \tag{11}$$

where $\alpha$ is a small positive normalization factor, $c$ is a positive constant, and $\Delta t_j = t - t_j$ is the difference between the time of the $j$th exposure, $t_j$, and the time of the off-task processes, $t$. This ensures that a trial of exposure has the highest effect factor if it is immediately followed by the off-task processes ($\Delta t_j = 0$), and the effect decreases exponentially in time. In the simulations implemented in this paper, the time $t$ of trials is assumed to have the granularity of one hour. Therefore, the trials of experiments in the same day all have the same time stamp $t$, and the following day gets a time stamp of $t + 24$, etc. 
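The accumulation of exposure over trials can be illustrated with a short Python sketch. It assumes that each trial contributes a term exp(-c Δt) scaled by α; the function name `exposure` and the default values of `alpha` and `c` are hypothetical placeholders, not the values used for the reported results (those are given in Appendix A).

```python
import math

def exposure(trial_times, now, alpha=0.01, c=0.05):
    """Accumulated recent exposure to one concept at time `now` (in hours).

    Each past trial at time t contributes exp(-c * (now - t)), so a trial
    that immediately precedes the off-task processes counts the most.
    Trials with a time stamp later than `now` are ignored.
    """
    return alpha * sum(math.exp(-c * (now - t))
                       for t in trial_times if t <= now)
```

For instance, with one-hour granularity, a block of trials stamped at hour 0 yields a larger value of `exposure(..., now=0)` than of `exposure(..., now=24)`, reflecting the decay of the exposure effect by the following day.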
We chose the parameters so that $p(z_i = 1 \mid \forall j \neq i,\ z_j = 0) \simeq 1$ during off-task processes if extensive recent (on the scale of several days) exposure has occurred for the $i$th combination of location and orientation, and $p(z_i = 1) \simeq 0$ otherwise. See Appendix A for the parameter values used for the reported results. 
After the firing of neurons in the concept area is determined from the mechanisms above, the response of the internal area is first computed using the algorithms in the section The learning algorithm. Next, the same probabilistic mechanisms above determine whether or not a winner neuron will fire, depending on its amount of recent exposure. This is an approximate way to simulate the phenomenon that an active brain region tends to keep active for a short while. This could be caused by diffused neurotransmitters such as norepinephrine released from active neurons which indicate a kind of novelty or recency (e.g., Yu & Dayan 2005). 
Using the gated self-organization terminology, the role of exposure, modeled by Equations 5, 8, and 11 above, is to open the gates for the concept neurons corresponding to the exposed conditions to fire in later off-task processes. The nonlinear accumulation of exposure effects in time (Equation 11) can be considered as the gatekeeper for self-organization mechanisms during the off-task processes. 
How off-task signals and neural recruitment result in transfer
Here, we first explain the essence of the idea of transfer in a general form, skipping details for simplicity. We then add more details by describing the idea in the context of Where-What Networks for transfer across locations and later adding the neural recruitment explanation. 
Transfer via off-task processes
Expanding on the general pattern discussed in Introduction section—that training in one condition accompanied by exposure to another condition results in transfer to the second condition—we introduce our theory of transfer in three phases: 
(a) The training phase performs stimulus-specific and concept-specific learning, e.g., L1F1 (green) and L2F2 (red) in Figure 3A. This phase establishes associations between presented stimuli and trained concepts via the internal layer of the network using the neural mechanisms laid out in the section The learning algorithm. 
(b) The testing phase was meant for measuring a baseline performance, corresponding to the transfer skill. But this phase also provides a weak but important memory for the later off-task periods to associate two concepts, e.g., L1 with F2 and L2 with F1 shown in Figure 3A. In the general case, the exposure can be active or passive engagement with stimuli. The section The off-task processes triggered by exposure describes how the effect of exposure is systematically implemented in the model during off-task processes. 
(c) The off-task processes are assumed to take place while the human subject is not required to perform the training or testing tasks, such as during a brief pause. Inside the network, the off-task processes are passes of signals in cortical pathways when no direct sensory signal is present (or attended). The off-task processes cause the concept neurons (e.g., L1 and F2) and internal (feature) columns of neurons (e.g., L1F2) not only to reweight their existing connections but also to grow new ascending and descending connections which did not previously exist, causing new circuits to be developed. These new circuits, along with recruitment of new neurons (see the section Neural recruitment facilitates fine learning and transfer), represent the newly learned network functions that cause performance improvement in the transfer condition.
What we refer to as off-task processes are not limited to processes that occur when the subject is outside training sessions. Rather, they are any kind of top-down driven neuronal processes which could occur when the task at hand does not fully occupy the subject's brain. Such processes are mostly unconscious and could take place during any task pause or even during a trial if the task is not very attentionally demanding.
Example: Transfer across locations
Here we use an example of transfer of learning across retinal locations to explain how the WWN mechanisms enable the above general theory to be realized. Consider Figure 3A, which depicts a WWN. In the following example, we denote the input area (stimulus) as X, the internal area as Y, and the concept areas as Z. The concept area consists of two subareas: Where (location), denoted by ZL, and What (feature), denoted by ZF. The connections in our model are denoted by the following concise notation: X ⇄ Y ⇄ [ZL, ZF] (12), where the sign ⇄ denotes a connection in each of the two directions. Y has bidirectional connections with both areas in [ZL, ZF].
Here we present a step-by-step description of the connection changes during the experimental procedure. 
The training phase:
The area X is presented with stimulus X(L1F1), denoting that the feature F1 is presented at retinal location L1. The subject is taught with two concepts in the area Z: the Where area with concept L1, denoted by ZL(L1), and the What area with concept F1, denoted by ZF(F1). While the top-k firing neuronal columns (see the section The learning algorithm) fire in the response pattern Y(L1F1), they connect with the firing neurons in Where and What bidirectionally due to the Hebbian learning mechanisms. The resulting network connections are denoted as X(L1F1) ⇄ Y(L1F1) ⇄ [ZL(L1), ZF(F1)] (13) and shown by the green links in Figure 3A.
Similarly, in the second part of the training phase, the area X is presented with stimulus X(L2F2), denoting that the feature F2 is presented at retinal location L2. The resulting network connections are denoted as X(L2F2) ⇄ Y(L2F2) ⇄ [ZL(L2), ZF(F2)] (14) and shown by the red links in Figure 3A.
The testing phase:
This phase was meant for measuring performance at three points: at baseline before the intensive training sessions described above, after the intensive training sessions at L1F1, and again after the intensive training sessions at L2F2, so that the improvement after each step could be measured. However, this phase also results in weak but important connections and priming for the off-task processes, as discussed below. For example, the area X is presented with feature F1 at location L2, i.e., X(L2F1), resulting in the following weak network connections: X(L2F1) ⇄ Y(L2F1) ⇄ [ZL(L2), ZF(F1)] (15).
Similarly, the feature F2 is also presented at location L1, i.e., X(L1F2), resulting in the following weak network connections: X(L1F2) ⇄ Y(L1F2) ⇄ [ZL(L1), ZF(F2)] (16).
The off-task processes:
During the off-task processes, the concept neurons which were primed (i.e., fired repeatedly) in the training and testing phases spontaneously fire. This spontaneous firing in the absence of relevant stimuli is justified by an accumulated recency effect, formulated in the section The off-task processes triggered by exposure. In particular, during the off-task processes around the L1F1 sessions (temporal recency), the model thinks about L1 often, which means that ZL(L1) fires often, which excites Y(L1F2) to fire as a recall using the ZL-to-Y link in Equation 16, and vice versa. This process is denoted as: ZL(L1) ⇄ Y(L1F2) (17).
Likewise, the model thinks about F1 often, which means that ZF(F1) fires often, which excites Y(L2F1) to fire as a recall using the ZF-to-Y link in Equation 15, and vice versa: ZF(F1) ⇄ Y(L2F1) (18). Similarly, during the off-task processes around the L2F2 sessions, we have ZL(L2) ⇄ Y(L2F1) (19) from Equation 15 and ZF(F2) ⇄ Y(L1F2) (20) from Equation 16. Consequently, the Hebbian mechanism during off-task processes strengthens all the two-way dashed links in Figure 3A. In particular, the neural changes denoted by Equations 17 and 20 above result in a transfer to the untrained condition L1F2. Similarly, the neural changes denoted by Equations 18 and 19 result in transfer to the untrained condition L2F1. Therefore, a double transfer effect takes place.
Such firing of active neurons in the above four expressions not only reweights and strengthens the connections between the corresponding cofiring Y neurons and Z neurons but also recruits more Y and Z neurons for improving the representation of the L1F2 and L2F1 combinations. The newly recruited neuronal columns for condition L2F1, for example, are depicted by the two gray circles labeled L2F1 in Figure 3B.
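The double-transfer argument can be traced with a toy bookkeeping sketch. It is not the WWN learning rule: the link magnitudes `STRONG`, `WEAK`, and `OFF_TASK` are hypothetical numbers chosen only to show which Z-to-Y links exist after each phase and which ones the off-task recall strengthens.

```python
from collections import defaultdict

# Illustrative magnitudes only; none of these values come from the paper.
STRONG, WEAK, OFF_TASK = 1.0, 0.1, 0.3

def build_links():
    """Trace two-way Z<->Y link strengths through the three phases."""
    w = defaultdict(float)

    def link(loc, feat, amount):
        column = f"Y({loc}{feat})"
        w[(f"ZL({loc})", column)] += amount
        w[(f"ZF({feat})", column)] += amount

    # Training phase: strong links for the trained conditions.
    link("L1", "F1", STRONG)
    link("L2", "F2", STRONG)
    # Testing phase: weak links for the untrained combinations.
    link("L2", "F1", WEAK)
    link("L1", "F2", WEAK)
    # Off-task processes: each primed concept neuron recalls the columns it
    # is already linked to, and the co-firing strengthens those links.
    primed = {"ZL(L1)", "ZF(F1)", "ZL(L2)", "ZF(F2)"}
    for (z, y) in list(w):
        if z in primed:
            w[(z, y)] += OFF_TASK
    return dict(w)

links = build_links()
for key in sorted(links):
    print(key, round(links[key], 2))
```

In this sketch the columns Y(L1F2) and Y(L2F1) each end up with a strengthened link to both of their concept neurons, which is the pattern the text attributes to Equations 17 to 20.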
Example: Activation patterns during off-task processes
Here we present a concrete example of activation patterns and neural changes during one iteration of the off-task processes where F1 and L2 happen to be spontaneously activated. Figure 4 illustrates the example below while showing only 5 × 5 neuronal columns in the internal area (50 × 50 × 2 in the actual implementation) for the sake of explanation. 
Figure 4
 
An example of activation patterns and neuronal changes during the off-task processes in the network. Only 5 × 5 = 25 neuronal columns in the internal area are shown. Each small square represents a neuron, and each grid represents a layer in the model laminar cortex. Neurons at the same location on each grid belong to the same neuronal column. For each of Layers 2, 3, and 4, the pattern on the left grid shows the preresponse of the neurons (activation level before lateral competition), and the right grid shows their final response. In this example, the What neuron for concept F1 and the second neuron in the Where area (corresponding to the concept L2) happen to be active. See the section Example: Activation patterns during off-task processes for a step-by-step description of the neural activation patterns.
(a) Each neuron in Layer 4 receives bottom-up input from its bottom-up receptive field of 10 × 10 pixels. During the off-task processes each input pixel value is a random number between zero and 100 (maximum pixel intensity is 255). Each Layer 4 neuron then computes its preresponse according to Equation 1. Then, neurons in Layer 4 with highest preresponse values win in lateral competition, and their activation level is scaled depending on their rank, according to Equation 2. All the other neurons in Layer 4 lose and their activation level is set to zero. 
(b) Each neuron in Layer 2 receives a joint top-down input from the Where and What concept areas. In this example, the top-down input from one neuron corresponding to an instance of concept F1 (e.g., horizontal bars with an offset of −4) is one, and the input from all the other What neurons is zero. The same applies to the top-down input from Where neurons: input from L2 is one and from L1 is zero. The neurons in Layer 2 then compute their preresponse and final activation level via lateral inhibition simulated by ranking and scaling, similar to the operations explained for Layer 4 above.
(c) After computations in Layer 4 and Layer 2, each Layer 3 neuron receives two types of input signals; the activation level of the Layer 4 neuron in the same column and the activation of the Layer 2 neuron in the same column. Each Layer 3 neuron then takes the average of these two numbers according to Equation 3 as its preresponse value. This value is also considered as the preresponse for the column that the Layer 3 neuron belongs to. 
(d) Each neuronal column computes the rank of its preresponse value among a neighborhood of 5 × 5 × 2 columns. The losing columns are suppressed (activations set to zero) and the winning columns scale their preresponse depending on their rank (the same formula as in Equation 2). The winning columns then laterally excite their immediate 3 × 3 neighboring columns to fire as well. The active columns (winners and their neighbors) then update the bottom-up connection weights to their Layer 4 neuron according to Equation 5 and the top-down connection weights to their Layer 2 neuron according to Equation 8.
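Steps (a) to (d) can be sketched on the 5 × 5 toy grid. This is a minimal illustration under stated assumptions, not the published implementation. The cosine-style preresponse, the rank scaling `(k - rank) / k`, the convex Hebbian update, the value `TOP_K = 3`, the learning rate, and the half-peak response given to excited neighbors are all stand-ins for Equations 1, 2, 5, and 8, which are not reproduced in this section. Competition is global here because the toy grid is the size of one neighborhood.

```python
import numpy as np

GRID, RF, N_WHAT, N_WHERE, TOP_K = 5, 10, 20, 2, 3

def unit(v):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, 1e-12)

def top_k_scale(pre, k=TOP_K):
    """Keep the k largest preresponses, scaled by rank; zero all others."""
    flat = pre.ravel()
    out = np.zeros_like(flat)
    for rank, idx in enumerate(np.argsort(flat)[::-1][:k]):
        out[idx] = flat[idx] * (k - rank) / k
    return out.reshape(pre.shape)

def off_task_iteration(w_bu, w_td, what_idx, where_idx, rng, lr=0.1):
    # (a) Layer 4: noise input in [0, 100), preresponse, top-k competition.
    x = unit(rng.uniform(0, 100, size=(GRID, GRID, RF * RF)))
    layer4 = top_k_scale(np.sum(w_bu * x, axis=-1))
    # (b) Layer 2: one-hot What and Where concepts drive the top-down input.
    z = np.zeros(N_WHAT + N_WHERE)
    z[what_idx] = 1.0
    z[N_WHAT + where_idx] = 1.0
    z = unit(z)
    layer2 = top_k_scale(w_td @ z)
    # (c) Layer 3: column preresponse is the average of Layers 4 and 2.
    column_pre = 0.5 * (layer4 + layer2)
    # (d) Column competition, then 3 x 3 lateral excitation around winners.
    response = top_k_scale(column_pre)
    winners = response > 0
    padded = np.pad(winners, 1)
    active = np.zeros_like(winners)
    for dr in range(3):
        for dc in range(3):
            active |= padded[dr:dr + GRID, dc:dc + GRID]
    # Excited neighbors get half the peak response (an assumption).
    gain = np.where(winners, response, 0.5 * response.max()) * active
    rate = lr * gain[..., None]
    new_bu = unit((1 - rate) * w_bu + rate * x)   # Hebbian-style, bottom-up
    new_td = unit((1 - rate) * w_td + rate * z)   # Hebbian-style, top-down
    return new_bu, new_td, active

rng = np.random.default_rng(0)
w_bu = unit(rng.random((GRID, GRID, RF * RF)))
w_td = unit(rng.random((GRID, GRID, N_WHAT + N_WHERE)))
new_bu, new_td, active = off_task_iteration(w_bu, w_td, what_idx=0, where_idx=1, rng=rng)
print("columns updated:", int(active.sum()))
```

Every active column moves its top-down weight vector toward the jointly active What and Where neurons, which is the sense in which a column is recruited for that combination.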
Improved performance for the condition L2F1 (transfer to L2F1) is due to the newly modified connections of the winning columns and their neighbors (enclosed in dashed, red square in Figure 4 on the response level of Layer 3 neurons). In our terminology, these columns were recruited for the condition L2F1 (or recruited more, if they were already recruited), since they develop connections to both concept neurons L2 and F1. New columns of neurons are recruited because of lateral excitation. These newly recruited neurons provide necessary representational resources for the untrained transfer condition to demonstrate improvements as large as the trained conditions. More discussions on this matter will be presented in the section Neural recruitment facilitates fine learning and transfer. 
An experimental example of this type of transfer is Xiao et al. (2008) where F1 is Vernier of vertical orientation and F2 is Vernier of horizontal orientation as illustrated in Figure 5. It is important to note that the concepts learned by two concept areas of WWN do not have to be location and Vernier of a certain orientation. At least in principle, a concept area can be taught to represent virtually any concept, such as location, feature type (face, object), scale, color, lighting, and so on. 
Figure 5
 
Sample images of Vernier input to the model. (left) Sample vertical Vernier stimulus at upper left corner (loc1_ori1). (middle) Sample horizontal Vernier stimulus at lower left corner (loc2_ori2). (right) Background (no input) used as input during network's off-task mode.
Neural recruitment facilitates fine learning and transfer
The Hebbian learning process among top-winner neuronal columns enables the firing neurons to update their connection weights in a way which makes the neurons more selective for specific inputs (consisting of both bottom-up and top-down components). We say the neuron is recruited for that input. The more often a type of input (e.g., L2F1 ) is present, the more effective is its recruitment. When an increasing number of neurons are recruited to represent a particular input type, each recruited neuron is more sharply tuned (more sensitive) since more neurons partition the same area of input space. In particular, the lateral excitation in the same area during off-task processes enables multiple winner neuronal columns to fire (instead of a single winner column), which results in recruitment of new representations for the transfer condition and can lead to transfer effects as large as the direct learning effect. 
Figure 3B demonstrates neural recruitment for training the task corresponding to feature F1 at location L1 and its transfer to the second location L2. For the sake of visual simplicity, not all the connections are drawn in the figure. The neuronal columns shown in gray are newly recruited columns and the two marked L2F1 are the columns that are specifically recruited during the off-task processes. 
Simulation
In order to verify the model at the algorithmic level, we implemented WWN to simulate transfer across locations in a Vernier discrimination task, as done by Xiao et al. (2008). Similar to the behavioral study by Xiao et al. (2008), in our simulated experiments the input consisted of two horizontal or vertical Gabor patches presented at upper left or lower left corner of the visual field. 
The neuronal resources of the WWN are shown in Figure 2. It has 50 × 50 × 2 (simulated) neuronal columns in the internal sensory area, 20 neurons in the What area, and two neurons in the Where area. Each of the 20 neurons in the What area is taught for a certain orientation of the Vernier stimulus, vertical or horizontal, and a certain Vernier offset, ranging from −5 to +5 pixels, excluding zero (where negative offset indicates left/below, and positive offset indicates right/above). The two neurons in the Where area represent the two locations loc1 and loc2. There are no excitatory lateral connections between the What neurons corresponding to different tasks, e.g., the −5 offset for the vertical Vernier task and the −5 offset for the horizontal Vernier task. There are only implicit inhibitory lateral connections between them, i.e., if one is active, it inhibits the other one from being active. Each sensory neuronal column had a bottom-up receptive field of 10 × 10 pixels and was fully connected to all the concept neurons, which were in turn fully connected to the neuronal columns in the sensory cortex. The Vernier input to the network was two Gabor patches with wavelength λ = 10 pixels and standard deviation of the Gaussian envelope σ = 4 pixels. The offset of the two Gabors (the amount of misalignment) could be any integer value in the range [−5, +5], where the sign of the offset (positive or negative) specifies left versus right (or equivalently, above vs. below), the magnitude of the offset is the amount of misalignment in pixels, and zero offset denotes perfect alignment. The centers of the Vernier stimuli were placed at either loc1 at (r, c) = (20, 20) or loc2 at (r, c) = (75, 20) in the 100 × 100 zero-intensity image. Then noise of random intensity in the range [0, 100) was added to the image, with the maximum intensity clamped at 255.² Figure 5 shows two samples of input to the network.
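A stimulus of this kind can be generated with a short sketch. The wavelength, envelope width, image size, stimulus centers, noise range, and 255 clamp come from the description above. The patch size (`patch`), the distance of each Gabor from the stimulus center (`gap`), the peak amplitude (`peak`), the choice of which patch carries the offset, and the clipping of negative values at zero are assumptions made only for illustration.

```python
import numpy as np

def gabor(patch, wavelength, sigma, vertical):
    """Gabor patch whose stripes are vertical or horizontal."""
    half = patch // 2
    r, c = np.mgrid[-half:half + 1, -half:half + 1]
    carrier_axis = c if vertical else r
    envelope = np.exp(-(r ** 2 + c ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * carrier_axis / wavelength)

def vernier_image(center, offset, vertical, rng,
                  size=100, patch=15, gap=8, peak=155.0):
    """Two Gabors misaligned by `offset` pixels, plus uniform noise."""
    image = np.zeros((size, size))
    g = peak * gabor(patch, wavelength=10, sigma=4, vertical=vertical)
    half = patch // 2
    r0, c0 = center
    if vertical:   # patches stacked vertically, second one shifted sideways
        centers = [(r0 - gap, c0), (r0 + gap, c0 + offset)]
    else:          # patches side by side, second one shifted vertically
        centers = [(r0, c0 - gap), (r0 + offset, c0 + gap)]
    for r, c in centers:
        image[r - half:r + half + 1, c - half:c + half + 1] += g
    image += rng.uniform(0.0, 100.0, size=image.shape)  # noise in [0, 100)
    return np.clip(image, 0.0, 255.0)

rng = np.random.default_rng(0)
loc1, loc2 = (20, 20), (75, 20)
img_loc1_ori1 = vernier_image(loc1, offset=3, vertical=True, rng=rng)
img_loc2_ori2 = vernier_image(loc2, offset=-2, vertical=False, rng=rng)
print(img_loc1_ori1.shape, img_loc2_ori2.shape)
```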
In the simulation results reported below, we used the method of constant stimuli to train and test our model. This is slightly different from the psychophysical experiments conducted by Xiao et al. (2008), which used adaptive staircases. This should not affect our results, as our model is based on the Lobe Component Analysis theory (Weng & Luciw, 2009) in which the effects of training are a function of statistical properties of the stimuli (see Equations 5 and 8). We do not claim that each trial in our simulation is exactly equivalent to one trial in real human experiments—one neural updating in the model could be many times stronger or weaker than what happens in a human brain. Since the staircase procedure tries to adapt the stimulus difficulty level to the subject's performance, it simply is a more efficient way for training which reduces the number of trials, while the number of useful trials (those that contribute to learning) is statistically consistent with the method of constant stimuli with noise. In separate simulations, we have verified that the staircase procedure and method of constant stimuli produced similar results on our network. 
The current implementation of the network assumes that teaching signals are available with great precision (a teacher knows the exact offset of the Vernier bars from −5 to +5). In most PL experiments (including the ones in Xiao et al., 2008), however, participants are given feedback only on the correctness of their response and not on the precise features of the stimuli. We used this standard training regime out of convenience. In fact, we have shown in other works (e.g., Solgi & Weng, 2010, in a supervised learning setting and Paslaski, VanDam, & Weng, 2011, in a reinforcement learning setting) that a probabilistic teaching signal (where, for example, a stimulus is taught to be offset −3 with 0.25 probability, offset −4 with 0.5 probability, and offset −5 with 0.25 probability) also works for our model (indeed sometimes more effectively). Such a training regime is consistent with the intuition that in actual psychophysical experiments, a given offset (e.g., −4) may be mistaken by participants for similar ones (e.g., −3 and −5), but probably not for very different ones (e.g., −1). Our model is indeed quite tolerant of these small mistakes. Had we used this more realistic training regime, we would have obtained the same basic results, as the particulars of the training regime would not affect the outcome of the training phase and off-task processes.
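The probabilistic teaching signal in the example above can be sketched as follows. The 0.25 / 0.5 / 0.25 split is taken from the text. How the cited works treat offsets at the edge of the range, or next to the excluded zero offset, is not stated here, so dropping invalid neighbors and renormalizing is an assumption of this sketch.

```python
import numpy as np

VALID_OFFSETS = [o for o in range(-5, 6) if o != 0]

def teaching_distribution(true_offset):
    """Probability of each taught offset given the true offset."""
    candidates = {true_offset - 1: 0.25, true_offset: 0.5, true_offset + 1: 0.25}
    # Assumption: neighbors outside the valid set are dropped and the
    # remaining probabilities are renormalized.
    kept = {o: p for o, p in candidates.items() if o in VALID_OFFSETS}
    total = sum(kept.values())
    return {o: p / total for o, p in kept.items()}

def sample_taught_offset(true_offset, rng):
    """Draw the offset label that the teacher reports on one trial."""
    dist = teaching_distribution(true_offset)
    offsets = list(dist)
    return int(rng.choice(offsets, p=[dist[o] for o in offsets]))

rng = np.random.default_rng(0)
labels = [sample_taught_offset(-4, rng) for _ in range(10000)]
print({o: labels.count(o) / len(labels) for o in sorted(set(labels))})
```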
Below is a step-by-step description of the simulation, in chronological order: 
Early development
Due to the developmental design of WWNs, the internal feature representations are not predesigned but rather need to emerge while exploiting the sensory input. Therefore, a 50 × 50 × 2 array of naive (with random connection weights) simulated cortical columns was presented with natural images in an unsupervised mode.³ To develop stable representations, 10⁴ natural image patches (randomly selected from larger images) of size 100 × 100 were used. Figure 6 shows that the neuronal columns in this stage develop features that resemble oriented edges and blobs.
Figure 6
 
(left) Sample natural scene images used in an early development step of the simulation. (right) Bottom-up weight vectors (receptive field profile) of 15 × 15 sensory neurons developed after exposure to natural images.
Coarse training
Adult human subjects understand the concept of left versus right and above versus below. Therefore, even naive subjects can do the Vernier discrimination task when the offset is large. In order to enable the WWN to develop this capability, we trained it for only easy (or coarse) discrimination at offsets −5, −4, +4, and +5, ten times for each offset at both locations and both orientations. As a result, the model developed the capability to successfully discriminate the Vernier input at both locations and orientations if the offsets were large enough. The dashed curve in Figure 7 shows the performance of the WWN after this step.
Figure 7
 
Psychometric function for the network's performance before and after perceptual learning.
Perceptual learning at loc1_ori1 and loc2_ori2
Each of the offsets in the range [−5, +5], excluding zero, was trained 100 times at loc1_ori1 for five consecutive sessions. Then the same training procedure was simulated for Vernier stimuli at loc2_ori2. The training regime followed the double-training procedure employed in Xiao et al. (2008).
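The trial schedule for the coarse and perceptual learning steps can be enumerated in a few lines. The sketch reads "trained 100 times ... for five consecutive sessions" as 100 presentations per offset in each of five sessions, which is one possible interpretation of the text; trial order within a session is also left unspecified here.

```python
OFFSETS = [o for o in range(-5, 6) if o != 0]
COARSE_OFFSETS = [-5, -4, 4, 5]
CONDITIONS = [(loc, ori) for loc in ("loc1", "loc2") for ori in ("ori1", "ori2")]

def coarse_trials(reps=10):
    """Coarse training: large offsets only, at every location and orientation."""
    return [(loc, ori, off)
            for loc, ori in CONDITIONS
            for off in COARSE_OFFSETS
            for _ in range(reps)]

def pl_sessions(loc, ori, sessions=5, reps=100):
    """Perceptual learning: all nonzero offsets at one trained condition.

    Assumption: `reps` presentations of each offset per session.
    """
    return [[(loc, ori, off) for off in OFFSETS for _ in range(reps)]
            for _ in range(sessions)]

# Double-training order: loc1_ori1 first, then loc2_ori2.
schedule = {
    "coarse": coarse_trials(),
    "pl_loc1_ori1": pl_sessions("loc1", "ori1"),
    "pl_loc2_ori2": pl_sessions("loc2", "ori2"),
}
print(len(schedule["coarse"]), len(schedule["pl_loc1_ori1"]),
      len(schedule["pl_loc1_ori1"][0]))
```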
Off-task processes and transfer
In the last step of the simulation, new and improved connections between feature neurons at loc1 and concept neurons for ori2 were formed during off-task processes due to spontaneous firing of these neurons and the Hebbian rule. Therefore, the skill originally learned for loc2_ori2 was transferred to loc1_ori2. In a similar scenario, the skill learned for loc1_ori1 was transferred to loc2_ori1. Here we explain the details of the off-task processes in our simulation.
To simulate the off-task processes in each iteration of our simulations, one of the 20 neurons in the What area, say neuron z1, and one of the two neurons in the Where area, say neuron z2, were selected to fire (their output value was set to one) and the remaining concept neurons were imposed not to fire (their output value was set to zero). The selection of firing neurons during the off-task processes was based on the exposure-dependent probabilities in Equations 9 and 10 explained in the section The off-task processes triggered by exposure. Input to the network was a 100 × 100 noise background (see Figure 5, right). The random noisy input guaranteed nondeterministic network behavior while providing unbiased excitation to sensory neurons, i.e., the background input did not influence the discrimination behavior of the network since all the sensory neuronal columns received equal bottom-up excitation on average. 
Top-down connections from the concept areas to the sensory area, however, selectively excited a subset of sensory neurons. The sensory neuronal columns excited by both active What and active Where neurons were more likely to win in lateral competition (see the section The learning algorithm). Let us denote the set of winner neuronal columns in the sensory area by Y1. Due to the Hebbian nature of our learning rule (Equations 5 and 8), repetition of this process caused the weight of connections between Y1 and z1 and connections between Y1 and z2 to increase and eventually converge to similar weight vectors for both concept neurons. 
Similar connections from Y1 in the sensory area to z1 and z2 in the concept areas helped to increase the likelihood for z1 and z2 to fire together, since they receive similar bottom-up input. In other words, although z1 and z2 are not directly connected and therefore cannot excite each other, they become indirectly connected via the Y1 neuronal columns in the sensory area, and after completion of these off-task processes they fire together more frequently. If one of the concept neurons has been trained on a particular stimulus prior to off-task processes, say z1 was trained on Feature 1, then its more frequent simultaneous firing with z2 after the off-task processes stage is behaviorally interpreted as a transfer of training effects to z2, say Location 2. The processes explained above were repeated until complete transfer was achieved.
Moreover, short-range lateral excitation in sensory area caused the sensory neuronal columns close to Y1 in the neuronal plane to also fire and get connected to the concept neurons z1 and z2 during the off-task processes. This results in extended representation (allocation of more neuronal resources) for the concepts encoded by z1 and z2. The newly allocated representations are slight variations of the old representations which result in more inclusive and discriminative coverage of the stimulus space by sensory neurons and hence improved performance when a similar stimulus is presented. Our simulations showed that allocation of more resources was necessary in order for complete transfer to happen. It is worth mentioning that the learning algorithm of the network was not intervened in by the programmer in order for neural recruitment to happen. It was rather a mere consequence of the same Hebbian (reweighting) rule (see Equations 5 and 8) during training and off-task processes. 
Results
Basic perceptual learning effect
In general, the model exhibited a graded response to different Vernier offsets with near-perfect performance at large Vernier offsets and near-chance performance as the offset approached zero (Figure 7), similar to human observers. We fitted the performance data with a Weibull function using psignifit (Wichmann & Hill, 2001) and plotted the resulting psychometric functions in Figure 7. After the initial coarse training, the slope of the psychometric function was relatively shallow (dashed curve); after perceptual learning (fine discrimination), the psychometric function became steeper (solid curve), indicating improved discriminability. We defined threshold as the difference in offset between 0.25 and 0.75 response probability. 
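The threshold definition can be illustrated with a short sketch. The article fits a Weibull function with psignifit; the logistic curve below is only a stand-in so that the example stays self-contained, and `slope` is a hypothetical parameter. The threshold is read off as the difference between the offsets at which the response probability reaches 0.25 and 0.75.

```python
import numpy as np

def logistic(offsets, slope):
    """Stand-in psychometric function: P(response) versus offset."""
    return 1.0 / (1.0 + np.exp(-slope * offsets))

def threshold_from_curve(offsets, p_response):
    """Offset difference between the 0.25 and 0.75 points of a rising curve."""
    x25 = np.interp(0.25, p_response, offsets)
    x75 = np.interp(0.75, p_response, offsets)
    return float(x75 - x25)

offsets = np.linspace(-5.0, 5.0, 1001)
shallow = threshold_from_curve(offsets, logistic(offsets, slope=1.0))
steep = threshold_from_curve(offsets, logistic(offsets, slope=2.0))
print(round(shallow, 3), round(steep, 3))
```

A steeper psychometric function gives a smaller threshold, which is how the sharpening from the dashed to the solid curve in Figure 7 translates into improved discriminability.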
Specificity and transfer of perceptual learning
The pretesting thresholds (right after the coarse training step) for all four combinations of location and orientation were similar, at slightly less than 2.2 pixels (first four points in Figure 8A). In the first training phase (loc1_ori1), the threshold decreased consistently across sessions, with smaller decreases in later sessions. At the end of this training phase, thresholds in the other three conditions were measured and were found to be close to their pretesting counterparts (first three points in Figure 8B). This result shows the specificity of learning in our model: training in loc1_ori1 does not transfer to the untrained location or orientation, as we expected.
Figure 8
 
Performance of the WWN model: perceptual learning and transfer effects. (A) All four combinations of orientation and location were first pretested to measure their thresholds, and then, in Phase 1, the loc1_ori1 condition was trained. The blue curve shows the decrease in threshold for the trained condition. (B) Testing for the three untrained conditions shows no change in their corresponding thresholds at the end of loc1_ori1 training (no transfer). The threshold decreases for loc2_ori2 as a result of training (green curve). At the end of the ninth training session, the thresholds for the two untrained conditions loc1_ori2 and loc2_ori1 drop to the same level as the trained conditions. (C, D) Percentage of improvement in discrimination after training and transfer, plotted from the same data as in (A) and (B). Hollow and filled bars show relative improvement as a result of training and transfer, respectively. See Figures 3C and 3D in Xiao et al. (2008) for comparison.
The training of loc2_ori2 then followed and resulted in a similar decrease in the threshold over five sessions. After this second training phase, the off-task processes were run to simulate what typically happens with a human subject. Finally, the thresholds for all conditions were measured again. Importantly, the thresholds for the untrained conditions loc1_ori2 and loc2_ori1 were at the same level as the trained conditions (the last four points in Figure 8B), demonstrating effective transfer, as we expected. We calculated the percentage of improvement in performance, defined as $100 \times (T_b - T_a)/T_b$, where $T_b$ is the threshold before the phase is started and $T_a$ is the threshold after the phase is completed (Figures 8C and 8D). Specificity was evident after the first training phase (Figure 8C), whereas nearly complete transfer occurred after the second training phase (Figure 8D). Thus, the model showed transfer of the perceptual learning effect across retinal locations in a Vernier task, capturing the basic pattern of results in the original behavioral study of Xiao et al. (2008).
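As a worked example of the improvement measure, the function below computes the percentage improvement from a threshold before and after a phase. The threshold values passed to it are hypothetical and are not read from Figure 8.

```python
def percent_improvement(t_before, t_after):
    """Improvement in percent: 100 * (Tb - Ta) / Tb."""
    if t_before <= 0:
        raise ValueError("threshold before the phase must be positive")
    return 100.0 * (t_before - t_after) / t_before

# Hypothetical thresholds in pixels, for illustration only.
print(percent_improvement(2.0, 1.0))   # a halved threshold
print(percent_improvement(2.0, 2.0))   # no change, i.e., no transfer
```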
Reweighting versus change in sensory representation
As mentioned in the Introduction, a major debate among PL theories is the neuronal locus of the changes that result in performance improvement, i.e., change in sensory representation versus reweighting of connections from sensory to higher level areas. Since all the neurons in our model have plastic weights which change according to the same LCA updating rule, performance improvement is expected to be attributable to change in both the sensory area and the higher concept areas. To quantify this change, we define a metric $d_{i,j}$, the amount of change in the connection weight from neuron $n_i$ to neuron $n_j$ (or from a neuron to a sensory pixel), computed from $w_{i,j}^{\mathrm{preL}}$ and $w_{i,j}^{\mathrm{postT}}$, the corresponding weight values after the pretraining (section Coarse training) and post-transfer (section Off-task processes and transfer) stages, respectively. In the sensory area, only neurons which have overlapping receptive fields with the Vernier stimulus were counted in this measurement. When we normalized all the weights to the [0, 1] range, the average amount of change for sensory neurons was $d_{\mathrm{sensory}} = 0.0098$, while the average value for ascending and descending connections between the sensory area and the concept areas was $d_{\mathrm{reweighting}} = 0.247$. This substantial difference in the amount of change in sensory and higher areas shows that reweighting of sensory readouts is mainly responsible for performance improvement in our model.
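The exact form of the change metric is not reproduced in this section. As a minimal sketch, assuming it is the absolute difference between the pre-learning and post-transfer weights after a joint min-max normalization to [0, 1], the average change for a set of connections could be computed as below. The function name and the toy weight arrays are hypothetical.

```python
import numpy as np

def mean_weight_change(w_pre, w_post):
    """Average |w_post - w_pre| after jointly rescaling weights to [0, 1].

    Assumption: absolute difference with joint min-max normalization; the
    article's own definition may differ.
    """
    w_pre = np.asarray(w_pre, dtype=float)
    w_post = np.asarray(w_post, dtype=float)
    lo = min(w_pre.min(), w_post.min())
    hi = max(w_pre.max(), w_post.max())
    scale = hi - lo if hi > lo else 1.0
    return float(np.mean(np.abs(w_post - w_pre)) / scale)

# Toy weights for illustration only.
sensory_pre = np.array([0.0, 0.5, 1.0])
sensory_post = np.array([0.0, 0.52, 1.0])
readout_pre = np.array([0.0, 0.2, 1.0])
readout_post = np.array([0.0, 0.8, 1.0])
print(mean_weight_change(sensory_pre, sensory_post),
      mean_weight_change(readout_pre, readout_post))
```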
Recruitment and sharpening of tuning curves
How much neural recruitment is involved in learning is currently an unsettled issue in the learning literature. Although there seems to be no known report of neuronal recruitment following perceptual learning, the possibility cannot be ruled out either. What we refer to as neuronal recruitment in this article is the alteration of connections between neurons following perceptual learning. Because neuronal recruitment due to the Hebbian mechanism results in more neurons competing to represent the same range of features, the tuning curve of each neuron involved is consequently sharpened. Sharpening of the tuning curve of an individual neuron is currently measurable using sparse electrodes in cortex, but neural recruitment, as a fine-detailed phenomenon of Hebbian-based self-organization, needs future technical advances to be measured experimentally. Ghose (2004) and Ghose, Yang, and Maunsell (2002) presented evidence that perceptual learning might involve suppression of neuronal signals that interfere with performance. This is consistent with sharpening of the tuning curves. Raiguel, Vogels, Mysore, and Orban (2006) also showed, in a single-cell study of an orientation discrimination task, that the tuning curves of neurons in both V1 and V4 are sharpened after perceptual learning, although the amount of change is greater for V4 neurons. Again, these results imply the recruitment of more neurons to represent the same range of features (orientation here).
To examine how neuronal tuning is affected by learning in our model, we examined three cortical columns of neurons that were tuned for offset +2 of the Vernier bars. The reader should note that there were only several dozen neurons in the internal feature detection area of our model covering the visual field for each of the simulated conditions. Therefore, each neuron in the WWN model is a simplified model of many neurons in the brain, and the three cortical columns which we study here model a large population of biological neurons. The average of the tuning curves of the three cortical columns before and after perceptual learning is plotted in Figure 9 which shows a sharpened tuning curve for Vernier offset after perceptual learning. This result confirms our explanation that neuronal recruitment in our model is responsible for more selective responses for individual neurons and therefore the improved performance in transfers. 
Figure 9
 
Sharpening of tuning curve after perceptual learning. The average response level of three cortical columns tuned on Vernier offset +2 is plotted.
Control simulations
It is well established that a network with the three properties of Hebbian learning, local topology, and lateral competition can self-organize (Kohonen, 2001). The WWN implements Hebbian learning (e.g., note the multiplication of post-synaptic, zi(L4), and presynaptic, bi(L4), values in Equation 5), local topology via short-range lateral excitations, and lateral competition via a top-k winner rule (see the section Hebbian learning in the winning cortical columns). Self-organization has been shown to occur even in the absence of any structured input, e.g., with random noise (Linsker, 1986). Therefore, one may think that random background stimuli during off-task processes are enough to produce the improvements reported in the Results section, and that PL training sessions are not necessary. On the contrary, we argue that the systematic and selective firing of the concept neurons is necessary for improvement in the network's discrimination performance. This selective firing is guaranteed by supervision during training sessions (only concept neurons corresponding to the training condition are allowed to fire) and by the exposure effect of the recent training sessions, as explained in the section Off-task processes and transfer. In a sense, the exposure effect provides a kind of soft supervision for the network during off-task processes. Without the discriminative top-down signals from concept neurons during off-task processes, the network's performance will not improve. 
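As a rough illustration of two of the ingredients named above (Hebbian learning and top-k lateral competition), the following Python sketch performs a single update step. It is not Equation 5 of the model: the function name `top_k_hebbian_step`, the fixed learning rate `lr`, and the unit-norm weight vectors are assumptions made for this example, and lateral excitation is left out. Only k = 2 is taken from Table 1.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k_hebbian_step(weights, x, k=2, lr=0.1):
    """Update `weights` in place for one input and return the winner indices.

    weights -- list of weight vectors, one per neuron (hypothetical layout)
    x       -- presynaptic input vector
    k       -- number of winning neurons allowed to fire and learn
    lr      -- fixed learning rate (an assumption; the model uses its own schedule)
    """
    x = normalize(x)
    # Pre-competition response: inner product of weights and input.
    responses = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    # Lateral competition: only the k strongest responders win.
    winners = sorted(range(len(weights)), key=lambda i: responses[i], reverse=True)[:k]
    for i in winners:
        z = responses[i]  # post-synaptic response of the winner
        # Hebbian term: post-synaptic response times presynaptic input.
        updated = [(1 - lr) * w_j + lr * z * x_j for w_j, x_j in zip(weights[i], x)]
        weights[i] = normalize(updated)
    return winners


neurons = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
winners = top_k_hebbian_step(neurons, [1.0, 0.2], k=2)
```

In this toy run the two neurons best aligned with the input are updated, while the losing neuron keeps its weights. Selective learning of this kind needs structured, selective firing, which unstructured noise alone does not provide.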
To verify the argument above, we ran the following two control simulations. 
Control Simulation 1
This control simulation investigates the effect and necessity of prior PL training in triggering transfer during off-task processes. It does so by skipping the PL training stage in our simulations and comparing the results with the original simulations. The following steps are run in the control simulation, while the discrimination thresholds are measured the same way as in the original simulations reported in Figure 8:
  • (a) Early development stage as in the section Early development.
  • (b) Coarse training stage as in the section Coarse training.
  • (c) Off-task processes stage as in the section Off-task processes and transfer.
Control Simulation 2
This control simulation examines whether or not PL training in only one location, e.g., loc1, is sufficient to produce the transfer effects in Figure 8. The simulation is the same as our original simulation except PL training at loc2_ori2 is skipped. The following steps are performed: 
  • (a) Early development stage as in the section Early development.
  • (b) Coarse training stage as in the section Coarse training.
  • (c) Perceptual learning as in the section Perceptual learning at loc1_ori1 and loc2_ori2 but only at loc1_ori1.
  • (d) Off-task processes stage as in the section Off-task processes and transfer.
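The two control protocols differ from the original simulation only in which perceptual learning (PL) conditions are trained. The sketch below writes the three protocols as ordered stage lists so the differences are easy to compare. The stage labels and the helper `build_protocol` are hypothetical names chosen for illustration; they are not identifiers from the authors' code.

```python
def build_protocol(pl_conditions):
    """Return the ordered list of stages for one simulation run.

    pl_conditions -- conditions that receive PL training (may be empty)
    """
    stages = ["early_development", "coarse_training"]
    stages += ["perceptual_learning:" + cond for cond in pl_conditions]
    stages.append("off_task_processes")
    return stages


original = build_protocol(["loc1_ori1", "loc2_ori2"])  # double training
control_1 = build_protocol([])                         # PL stage skipped entirely
control_2 = build_protocol(["loc1_ori1"])              # PL at loc2_ori2 skipped

# Stages present in the original run but missing from each control.
missing_in_control_1 = [s for s in original if s not in control_1]
missing_in_control_2 = [s for s in original if s not in control_2]
```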
Our reasoning above predicts that the network should not be able to lower the thresholds after Control Simulations 1 and 2, except for the trained condition of loc1_ori1 in Control Simulation 2. Figures 10 and 11 verify that this is indeed the case. In Figure 10, the Vernier thresholds do not change significantly after the off-task processes stage. Therefore, we conclude that the PL training stage was necessary to obtain lowered thresholds (transfer) in our original simulations (Figure 8). Also, the results of Control Simulation 2 in Figure 11 show that the only condition in which the threshold decreases is the trained condition loc1_ori1. Again, no transfer effect is observed without appropriate double-training sessions. 
Figure 10
 
Change in Vernier thresholds before and after the off-task processes stage in Control Experiment 1.
Figure 11
 
Change in Vernier thresholds before and after the off-task processes stage in Control Experiment 2.
Discussion
In this study, we showed that the WWN learning model of the cerebral cortex is able to demonstrate both specificity and transfer of perceptual learning effects using representations of sensory and motor signals, which emerge from the simple rules of Hebbian learning and lateral inhibition and excitation. Similar to previous models of perceptual learning, our work showed performance improvement following extensive practice with the stimulus, where the training effects were specific to the trained feature and location. Our focus in this study, however, was explaining how training effects can transfer (generalize) to untrained but related situations. Although the WWN model was trained and tested only for the Vernier discrimination task, the underlying mechanism for transfer should be applicable to other types of stimuli and tasks, at least in principle. 
How versus why transfer occurs
Similar to any other model, our model is not meant to account for all the intricate details of the many published results on transfer. In particular, the question of when transfer occurs in PL experiments does not yet have a consistent answer in the PL research community and is outside the scope of this article. Numerous factors have been shown to correlate with transfer in different experimental settings. For example, Ahissar and Hochstein (1997) proposed that transfer depends on task difficulty. Aberg, Tartaglia, and Herzog (2009) found that transfer depends on the number of trials per session: with two sessions of 800 trials each there was no transfer, but with four sessions of 400 trials each transfer occurred (see also Jeter, Dosher, Liu, & Lu, 2010). Transfer also occurred in imagery perceptual learning (Tartaglia, Bamert, Mast, & Herzog, 2009). Finally, strong transfer to all kinds of visual tasks was found when observers were trained with action video gaming (e.g., Li, Polat, Makous, & Bavelier, 2009). In this work we outline a general theory of the elements that may lead to transfer in perceptual learning as well as a detailed neuromorphic computational model of how a network of neurons can transfer learning effects. In other words, the algorithm for the computation of the network (see the section The learning algorithm) does not explain whether or not transfer should happen. Rather, it demonstrates the neural mechanisms that can result in transfer, if there is a cause, e.g., enough exposure, to trigger those neural mechanisms. 
Transfer via gated self-organization
The model proposes a rather unorthodox view of transfer in PL whose justification awaits further research. Rather than learning representations that will later be reused by transfer conditions, training sessions provide priming effects which trigger (or open the gate for) the self-organizing mechanisms to better utilize information already present in the connections of the network. A theoretically important aspect of the model is that the accumulated effect of exposure, modeled by γi in Equation 11, stores no relational information about which characteristics of the input image (feature or location) relate to which of the concepts in the network. γi only encodes how frequently and how recently each concept (feature or location) has been active in the network, regardless of its relation to the bottom-up input. This is in contrast with the common intuition about transfer, namely that relational information learned for the trained condition can later make a difference in performance for the untrained transfer condition. Because of this aspect, our model does not require any representational overlap between training and transfer conditions and can still demonstrate improved performance for the transfer condition, assuming there are prior coarse connections between conjunctive representations in the internal area, e.g., Y(L1F2), and the concept areas, e.g., Z(L1) and Z(F2). 
As a direct result of viewing transfer as an exposure-driven self-organization mechanism, instead of literal transfer of relational information learned in training sessions, the following points are important to note regarding the implications of the model. 
Coarse training is necessary
A prior coarse training step (the section Coarse training) is crucial for the off-task processes step to result in improved performance. The learning mechanisms (detailed in the section The learning algorithm) extract information from the bottom-up and top-down training input and store this information in the connection patterns of the neurons in the network. This initial stored information, although incomplete, is critical for any later improvements. If the coarse training step is skipped and neurons are (artificially) allowed to fire in the off-task phase, the result would be a soup of connections where every concept neuron is connected to every internal neuron, hence, no discrimination power. The connections made during coarse training (the solid lines in Figure 3A and 3B) are crucial for guiding any later developments in the network, either during off-task processes or regular perceptual training. 
Coarse training gathers enough information already
The results reported in Results show that the relational information stored in the network from the coarse training step (and correspondingly, the prior information in the adult human brain about how to do a PL task) is already sufficient to achieve a better performance in transfer conditions. The extensive exposure during training and testing sessions only opens the gate for self-organizing neural mechanisms in off-task processes which result in improved performance. 
Extensive exposure is necessary
Extensive training sessions are critical for accumulating enough exposure effect to allow the concept neurons to fire during off-task processes (the section The off-task processes triggered by exposure). Brief pretest sessions (e.g., 50 trials) would not be sufficient for the model to produce transfer effects. On the other hand, if the pretest is very extensive (about the same number of trials as in training), our model will produce transfer, as the accumulated exposure effects will be enough to trigger the gated self-organization mechanisms during off-task processes. 
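The dependence on the amount of exposure can be illustrated with a toy leaky accumulator and a threshold gate. This is a minimal sketch and not Equations 9 to 11 of the model: the update rule, the parameter names `decay`, `gain`, and `threshold`, and their values are all assumptions chosen only to show the qualitative behavior (frequent and recent activity raises the trace, inactivity lets it fade).

```python
def accumulate_exposure(activity, decay=0.002, gain=0.002, gamma=0.0):
    """Leaky accumulation of exposure for one concept neuron.

    activity -- sequence of 0/1 flags, one per trial (1 = concept was active)
    decay    -- fraction of the trace lost per trial (hypothetical value)
    gain     -- increment per active trial (hypothetical value)
    gamma    -- starting value of the exposure trace
    """
    for a in activity:
        gamma = (1.0 - decay) * gamma + gain * a
    return gamma


def gate_open(gamma, threshold=0.5):
    """The concept neuron may fire off-task only if its trace is high enough."""
    return gamma >= threshold


brief_pretest = accumulate_exposure([1] * 50)     # e.g., a 50-trial pretest
extensive = accumulate_exposure([1] * 1000)       # e.g., extensive training
faded = accumulate_exposure([0] * 2000, gamma=extensive)  # long period without exposure

gates = {
    "brief_pretest": gate_open(brief_pretest),
    "extensive": gate_open(extensive),
    "faded": gate_open(faded),
}
```

Note that the trace is indexed by concept only. It records how much and how recently a concept was active, not which input features accompanied it.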
Exposure effect only opens the gates
The temporal recency effect of exposure, modeled by Equations 9, 10, and 11, implements our intuition that a plausible trigger for selective firing of the concept neurons during the off-task processes could be recent exposure to those concepts during extensive training and testing sessions. Therefore, artificially allowing the concept neurons to fire during off-task processes, without the exposure-triggered gating justification, will result in improved performance in transfer conditions similar to those reported in Figure 8, even without any perceptual training. 
Transfer across feature and transfer across location
One cannot label the observed effects transfer across feature or location. For example, for exhibiting improved performance at condition L1F2 (transfer to L1F2) without explicit perceptual training for that condition, our model needs (a) extensive exposure/training at L1 and (b) extensive exposure/training for F2. In the current simulations (a) is satisfied by perceptual learning at L1F1 as well as all the testing sessions at L1 and (b) is satisfied by perceptual learning at L2F2 as well as all the testing for F2. Therefore, the observed lowered threshold at L1F2 could be attributed to both transfer across location (from L2 to L1, since threshold at L2F2 is lowered due to PL) and transfer across feature (from F1 to F2, since threshold at L1F1 is lowered due to PL). 
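The argument above amounts to a conjunction: a condition can improve only if both its location concept and its feature concept have received extensive exposure. The sketch below encodes that reading as a simple boolean check. The function names are hypothetical, and for simplicity only extensive training sessions are counted as exposure (the testing sessions mentioned above are ignored).

```python
def exposed_concepts(sessions):
    """Collect the location and feature concepts that received extensive exposure.

    sessions -- list of (location, feature) pairs, one per trained condition
    """
    locations = {loc for loc, _ in sessions}
    features = {feat for _, feat in sessions}
    return locations, features


def gate_opens_for(condition, sessions):
    """True if both the location and the feature of `condition` were exposed."""
    loc, feat = condition
    locations, features = exposed_concepts(sessions)
    return loc in locations and feat in features


double_training = [("L1", "F1"), ("L2", "F2")]  # original simulation
single_training = [("L1", "F1")]                # as in Control Simulation 2

transfer_double = gate_opens_for(("L1", "F2"), double_training)
transfer_single = gate_opens_for(("L1", "F2"), single_training)
trained_single = gate_opens_for(("L1", "F1"), single_training)
```

Under this reading, L1F2 improves after double training because L1 was exposed through L1F1 and F2 through L2F2, so the effect cannot be assigned to location or to feature alone.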
Top-down and off-task processes and neuronal recruitment in PL
A number of studies have shown the importance of higher brain areas and top-down signals in the perceptual learning context. For example, Law and Gold (2008) trained monkeys to determine the direction of moving visual stimuli while recording from MT (e.g., representing motion information) and LIP (e.g., representing transformation of motion into action). Their results showed that improved sensitivity to motion signals was correlated with neural responses in LIP, more so than in MT. Hence, at least in learning to discriminate motion direction, performance seems to rely on information from higher areas such as LIP, in addition to information in early sensory areas such as MT. Similarly, experiments by Li, Piëch, and Gilbert (2004) showed that task experience can have a long-lasting modulation effect on behavior as well as on the response of V1 neurons in a perceptual learning task. 
Despite the strong evidence for the important role of top-down and off-task signals in perceptual learning, the prior models are either strictly feed-forward networks (e.g., Poggio et al., 1992) or a two-way cascade of internal areas with unsupervised learning without top-down supervision (Hinton, Osindero, & Teh, 2006). Our model, presented in this article, explains transfer of learning effects utilizing top-down and off-task signals as well as the mechanisms of neural recruitment. The model we present here predicts that off-task processes are also essential for generalization in learning, i.e., transfer to novel situations. 
Exposure to untrained situations makes the feature representation neurons corresponding to those situations fire during off-task time. Cofiring of those representation neurons with the trained concept neurons results in improvement of the connections between them through the Hebbian learning rule. These improved connections, as well as the recruitment of new neurons to represent the transfer feature, result in transfer of training effects to the untrained situations. 
Recruitment of new representations is another important component of our model that helps the transfer of learning. Many previous studies show that neurogenesis is related to the acquisition of new knowledge (Kempermann, 2002; Kirn, O'Loughlin, Kasparian, & Nottebohm, 1994; Nottebohm, 2002). Similar to the case of off-task processes, our model predicts that neuronal recruitment is an essential element of transfer/generalization of learning in addition to being essential for learning itself to take place. 
Previous models
A prominent model in the perceptual learning literature is the Hebbian channel reweighting model by Dosher and Lu (1998). Petrov, Dosher, and Lu (2005) expanded this model and conducted a focused study on the locus of perceptual learning (representation [lower] versus decision [higher] areas). Using a simple model consisting of a layer of fixed feature detectors in the representation area and a linear classifier, the perceptron (Rosenblatt, 1958), they suggested that perceptual learning may involve reweighting of connections from lower representation areas to higher decision areas in their feed-forward model, with optional inhibitory-only feedback. Their relatively simple model used a fixed representation of sensory data, making it unable to predict the plasticity in stimulus representation in lower visual areas reported by, e.g., Lee, Yang, Romero, and Mumford (2002) and Schoups et al. (2001). Moreover, the lack of top-down connections from higher areas to representation areas in their model is inconsistent with overwhelming neuroanatomic evidence, as reviewed by, e.g., Felleman and Van Essen (1991). 
Similar to Petrov et al. (2005), several previous models which have been tested on different perceptual learning tasks (e.g., Vaina, Sundareswaran, & Harris, 1995, on motion perception; Zhaoping, Herzog, & Dayan, 2003b, on bisection; Weiss, Edelman, & Fahle, 1993, on Vernier hyperacuity) rely on reweighting of connections from sensory representation to concept areas to explain learning effects. In another influential model, reverse hierarchy theory, Ahissar and Hochstein (2004) suggested that lower representations in the visual hierarchy are not to be altered unless necessary. Without feedback from concept to sensory areas, these feed-forward models cannot explain transfer. 
Unlike previous feed-forward networks, our model suggests that within a fully developed network (simulating an adult human brain), the lower representations still change, not only because of the exposure to the stimuli but also due to the overt and covert actions of the subject, projected via top-down connections. 
Conclusion
In summary, the WWN model presented in this article is analogous to previous models of PL in a number of aspects, including incremental reweighting of connections from sensory areas to concept areas via a biologically plausible Hebbian rule and the use of selective reweighting of connections to account for performance improvement after training. However, we present a more extensive, brain-anatomy-inspired model that goes beyond the previous models in several aspects, including: (a) a novel approach that views transfer as a result of gated self-organization rather than literal transfer of relational information; (b) fully developed feature representations that emerge from presentation of natural image stimuli to the network as well as from top-down signals, as opposed to hand-designed filters (e.g., Gabor filters); (c) adaptive and constantly reweighted connections for neurons which have both top-down and bottom-up components, in contrast with an exclusively feed-forward network design; (d) modeling of both the Where (dorsal) and the What (ventral) visual pathways in an integrated functional system, in which the development of such pathways is attributed to top-down connections from the corresponding concept areas in the frontal cortex, going beyond the classical sensory account of the two streams (Mishkin, Ungerleider, & Macko, 1983); (e) a computational model of the six-layer laminar architecture in the WWN network; (f) the proposal of the off-task processes and a demonstration of their critical role in transfer; and (g) an analysis of the dynamic recruitment of more neurons during learning and transfer, demonstrated through the sharpening of the neuronal tuning curves, to account for the improved performance. The last two aspects of the model (off-task processes and neuronal recruitment) were the key new mechanisms in our model that caused transfer of learning effects to untrained conditions. 
Table 1
 
Parameters of the model.
Parameter | Value | Comment
Competition window size, ω | 5 × 5 | Used to compute ri in Equation 2
Number of winning neurons, k | 2 | Equation 2
Lateral excitation range | 3 × 3 | Section The learning algorithm
t1 | 10 | Equation 7
t2 | 1000 | Equation 7
c | 2 | Equation 7
r | 10^4 | Equation 7
Bottom-up receptive-field size | 10 × 10 pixels | e.g., size of bi(L4) and wb,i(L4) as in Equation 1
Neuronal resources in the internal area | 50 × 50 × 2 cortical columns | Each column has a six-layer laminar architecture as in Figure 3C
Neuronal resources in the Where area | 2 neurons | Representing loc1 and loc2
Neuronal resources in the What area | 20 neurons | Representing Vernier offsets in range [–5, +5], excluding zero, for two orientations
A normalization factor for exposure effect, α | 0.01 | Equation 11
A normalization factor for exposure effect, c | 0.005 | Equation 11
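Equation 7 is not reproduced in this part of the article. Assuming it is the amnesic function used in lobe component analysis (Weng & Luciw, 2009), whose parameters carry the same names (t1, t2, c, r), the following sketch shows how the values in Table 1 would shape the learning rate as a neuron ages. The function names are hypothetical, and r is taken here to be 10^4.

```python
def amnesic_mu(n, t1=10, t2=1000, c=2.0, r=1e4):
    """Amnesic parameter as a function of the neuron's age n (number of updates).

    Assumed piecewise form from lobe component analysis:
      n <= t1       : 0 (plain incremental averaging)
      t1 < n <= t2  : grows linearly from 0 to c
      n > t2        : grows slowly at rate 1/r
    """
    if n <= t1:
        return 0.0
    if n <= t2:
        return c * (n - t1) / (t2 - t1)
    return c + (n - t2) / r


def learning_rates(n, **params):
    """Return (retention rate, learning rate) for age n; the two sum to 1."""
    mu = amnesic_mu(n, **params)
    retention = (n - 1 - mu) / n
    learning = (1 + mu) / n
    return retention, learning


# With mu > 0, the learning rate stays above the plain-average value 1/n,
# so recent inputs are weighted more than in a simple running mean.
retention_old, learning_old = learning_rates(5000)
```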
Table 2
 
Parameters of the stimuli.
Parameter | Value
Wavelength of the Gabor patches in the Vernier stimuli, λ | 10 pixels
Standard deviation of the Gaussian envelope in the Vernier stimuli, σ | 4 pixels
Range of offsets used for the Vernier stimuli | [–5, +5] pixels
Input image size | 100 × 100 pixels
Center of Location 1, loc_1 | (r, c) = (20, 20)
Center of Location 2, loc_2 | (r, c) = (75, 20)
Pixel intensity range | [0, 255]
Maximum pixel intensity in the random noise background | 100
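To make the stimulus parameters concrete, the sketch below builds a Vernier-like image from two Gabor patches on a random noise background, using only the Python standard library. The wavelength, envelope width, image size, location center, noise ceiling, and intensity range come from Table 2. The vertical `separation` of the patches, the contrast `amplitude`, the additive blending, the uniform noise, and the way the offset is split between the patches are assumptions made for this illustration and may differ from the stimuli actually used.

```python
import math
import random


def gabor(dx, dy, wavelength=10.0, sigma=4.0):
    """Gabor value at displacement (dx, dy) from the patch center.

    A cosine carrier along x (wavelength in pixels) under a Gaussian envelope.
    """
    envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    carrier = math.cos(2.0 * math.pi * dx / wavelength)
    return envelope * carrier


def vernier_image(center=(20, 20), offset=2, size=100, separation=16,
                  noise_max=100, amplitude=127.0, seed=0):
    """Return a size x size image (list of rows) with two offset Gabor patches.

    center     -- (row, col) of the stimulus, e.g., loc_1 in Table 2
    offset     -- horizontal misalignment between the patches, in pixels
    separation -- vertical distance between patch centers (assumed value)
    amplitude  -- contrast scaling of the patches (assumed value)
    """
    rng = random.Random(seed)
    r0, c0 = center
    image = [[rng.uniform(0, noise_max) for _ in range(size)] for _ in range(size)]
    upper = (r0 - separation / 2.0, c0 - offset / 2.0)
    lower = (r0 + separation / 2.0, c0 + offset / 2.0)
    for r in range(size):
        for c in range(size):
            g = sum(gabor(c - pc, r - pr) for pr, pc in (upper, lower))
            # Add the patches to the noise and clip to the intensity range [0, 255].
            image[r][c] = min(255.0, max(0.0, image[r][c] + amplitude * g))
    return image


stimulus = vernier_image(center=(20, 20), offset=2)
```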
Table 3
 
Parameters of the simulations.
Parameter | Value
Number of training samples (natural images) for the early development stage | 10^4
Number of training trials for the coarse training stage | 400 = 10 trials for each of 10 offsets at two locations and two orientations
Number of training trials for the perceptual learning (fine discrimination) stage | 5 sessions × 100 trials/session × 10 offsets × 2 trained conditions = 10^4
Number of simulation cycles in off-task processes stage | 1000
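The trial counts in Table 3 and the size of the What area in Table 1 follow from the same set of ten Vernier offsets. The short check below spells out the arithmetic. The variable names are chosen here for illustration only.

```python
# Vernier offsets in [-5, +5] pixels, excluding zero.
offsets = [o for o in range(-5, 6) if o != 0]
n_offsets = len(offsets)

# Coarse training: 10 trials per offset, at two locations and two orientations.
coarse_trials = 10 * n_offsets * 2 * 2

# Perceptual learning: 5 sessions x 100 trials/session x offsets x 2 trained conditions.
pl_trials = 5 * 100 * n_offsets * 2

# What area: one neuron per offset for each of two orientations.
what_neurons = n_offsets * 2
```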
Acknowledgments
This project was supported in part by funding from Michigan State University's Cognitive Science Program. We thank an anonymous reviewer for detailed and insightful comments which greatly improved the clarity of our manuscript. 
Commercial relationships: none. 
Corresponding authors: Mojtaba Solgi; Juyang Weng. 
Email: solgi@cse.msu.edu; weng@cse.msu.edu. 
Address: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA. 
References
Aberg K. C. Tartaglia E. M. Herzog M. H. (2009). Perceptual learning with chevrons requires a minimal number of trials, transfers to untrained directions, but does not require sleep. Vision Research, 49, 2087–2094.
Adini Y. Sagi D. Tsodyks M. (2002). Context-enabled learning in the human visual system. Nature, 415, 790–793.
Ahissar M. Hochstein S. (1997). Task difficulty and the specificity of perceptual learning. Nature, 387, 401–406.
Ahissar M. Hochstein S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457–464.
Ball K. Sekuler R. (1987). Direction-specific improvement in motion discrimination. Vision Research, 27, 953–965.
Callaway E. M. (2004). Feedforward, feedback and inhibitory connections in primate visual cortex. Neural Networks, 17, 625–632.
Dosher B. A. Lu Z. L. (1998). Perceptual learning reflects external noise filtering and internal noise reduction through channel reweighting. Proceedings of the National Academy of Sciences of the United States of America, 95, 13988–13993.
Drever J. (1960). Perceptual learning. Annual Review of Psychology, 11, 131–160.
Fahle M. Edelman S. (1993). Long-term learning in vernier acuity: Effects of stimulus orientation, range and of feedback. Vision Research, 33, 397–412.
Felleman D. J. Van Essen D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.
Fine I. Jacobs R. A. (2002). Comparing perceptual learning across tasks: A review. Journal of Vision, 2(2):5, 190–203, http://www.journalofvision.org/content/2/2/5, doi:10.1167/2.2.5.
Fiorentini A. Berardi N. (1980). Perceptual learning specific for orientation and spatial frequency. Nature, 287, 43–44.
Fukai T. Tanaka S. (1997). A simple neural network exhibiting selective activation of neuronal ensembles: From winner-take-all to winners-share-all. Neural Computation, 9, 77–97.
Ghose G. M. (2004). Learning in mammalian sensory cortex. Current Opinion in Neurobiology, 14, 513–518.
Ghose G. M. Yang T. Maunsell J. H. R. (2002). Physiological correlates of perceptual learning in monkey V1 and V2. Journal of Neurophysiology, 4, 1867–1888.
Gibson E. J. (1963). Perceptual learning. Annual Review of Psychology, 14, 29–56.
Gibson E. J. (1969). Principles of perceptual learning and development. East Norwalk, CT: Lawrence Erlbaum.
Hinton G. E. Osindero S. Teh Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Jeter P. E. Dosher B. A. Liu S. H. (2007). Transfer (vs. specificity) following different amounts of perceptual learning in tasks differing in stimulus orientation and position. Journal of Vision, 7(9):84, http://www.journalofvision.org/content/7/9/84, doi:10.1167/7.9.84.
Jeter P. E. Dosher B. A. Liu S. H. Lu Z. L. (2010). Specificity of perceptual learning increases with increased training. Vision Research, 50, 1928–1940.
Ji Z. Weng J. Prokhorov D. (2008). Where-what network 1: “Where” and “What” assist each other through top-down connections. Proceedings of the IEEE International Conference on Development and Learning (pp. 61–66), Monterey, CA.
Kandel E. R. Schwartz J. H. Jessell T. M. (Eds.). (2000). Principles of neural science (4th ed.). New York: McGraw-Hill.
Karni A. Sagi D. (1991). Where practice makes perfect in texture discrimination: Evidence for primary visual cortex plasticity. Proceedings of the National Academy of Sciences of the United States of America, 88, 4966–4970.
Kempermann G. (2002). Why new neurons? Possible functions for adult hippocampal neurogenesis. The Journal of Neuroscience, 22, 635–638.
Kirn J. O'Loughlin B. Kasparian S. Nottebohm F. (1994). Cell death and neuronal recruitment in the high vocal center of adult male canaries are temporally related to changes in song. Proceedings of the National Academy of Sciences of the United States of America, 91, 7844–7848.
Kohonen T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer-Verlag.
Law C. T. Gold J. I. (2008). Neural correlates of perceptual learning in a sensory-motor, but not a sensory, cortical area. Nature Neuroscience, 11, 505–513.
Lee T. S. Yang C. F. Romero R. D. Mumford D. (2002). Neural activity in early visual cortex reflects behavioral experience and higher-order perceptual saliency. Nature Neuroscience, 5, 589–597.
Li R. Polat U. Makous W. Bavelier D. (2009). Enhancing the contrast sensitivity function through action video game training. Nature Neuroscience, 12, 549–551.
Li W. Piëch V. Gilbert C. D. (2004). Perceptual learning and top-down influences in primary visual cortex. Nature Neuroscience, 7, 651–657.
Linsker R. (1986). From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proceedings of the National Academy of Sciences of the United States of America, 83, 7508–7512, 8390–8394, 8779–8783.
Liu J. Lu Z. L. Dosher B. A. (2010). Augmented Hebbian reweighting: Interactions between feedback and training accuracy in perceptual learning. Journal of Vision, 10(10):29, 1–14, http://www.journalofvision.org/content/10/10/29, doi:10.1167/10.10.29.
Lu Z. L. Liu J. Dosher B. A. (2009). Modeling mechanisms of perceptual learning with augmented Hebbian reweighting. Vision Research, 50, 375–390.
Luciw M. Weng J. (2010). Where What Network 3: Developmental Top-Down Attention with Multiple Meaningful Foregrounds. International Joint Conference on Neural Networks (pp. 4233–4240). Barcelona, Spain.
Mishkin M. Ungerleider L. G. Macko K. A. (1983). Object vision and space vision: Two cortical pathways. Trends in Neurosciences, 6, 414–417.
Miyan K. Weng J. (2010). Where-what network 3: Developmental top-down attention for multiple foregrounds and complex backgrounds. IEEE 9th International Conference on Development and Learning (pp. 280–285). Ann Arbor, MI.
Nottebohm F. (2002). Neuronal replacement in adult brain. Brain Research Bulletin, 57, 737–749.
O'Reilly R. C. (1998). Six principles for biologically based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455–462.
O'Toole A. J. Kersten D. J. (1992). Learning to see random-dot stereograms. Perception, 21, 227–243.
Paslaski S. VanDam C. Weng J. (2011). Modeling dopamine and serotonin systems in a visual recognition network. Proceedings of the International Joint Conference on Neural Networks (pp. 3016–3023). San Jose, CA.
Pavlovskaya M. Hochstein S. (2011). Perceptual learning transfer between hemispheres and tasks for easy and hard feature search conditions. Journal of Vision, 11(1):8, 1–13, http://www.journalofvision.org/content/11/1/8, doi:10.1167/11.1.8.
Petrov A. A. Dosher B. A. Lu Z. L. (2003). A computational model of perceptual learning through incremental channel re-weighting predicts switch costs in non-stationary contexts. Journal of Vision, 3(9):670, http://www.journalofvision.org/content/3/9/670, doi:10.1167/3.9.670.
Petrov A. A. Dosher B. A. Lu Z. L. (2005). The dynamics of perceptual learning: An incremental reweighting model. Psychological Review, 112, 715–743.
Poggio T. Fahle M. Edelman S. (1992). Fast perceptual learning in visual hyperacuity. Science, 256, 1018–1021.
Raiguel S. Vogels R. Mysore S. G. Orban G. A. (2006). Learning to see the difference specifically alters the most informative V4 neurons. The Journal of Neuroscience, 26, 6589–6602.
Ramachandran V. S. Braddick O. (1973). Orientation-specific learning in stereopsis. Perception, 2, 371–376.
Rosenblatt F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Schoups A. Vogels R. Qian N. Orban G. (2001). Practising orientation identification improves orientation coding in V1 neurons. Nature, 412, 549–553.
Shi N. B. Zhongzhi S. Zucker J.-D. (2003). Perceptual learning and abstraction in machine learning. IEEE Transactions on Systems, Man and Cybernetics, Part C, 36, 172–181.
Solgi M. Weng J. (2010). Developmental stereo: Emergence of disparity preference in models of the visual cortex. IEEE Transactions on Autonomous Mental Development, 1, 238–252.
Tartaglia E. M. Bamert L. Mast F. W. Herzog M. H. (2009). Human perceptual learning by mental imagery. Current Biology, 19, 2081–2085.
Teich A. F. Qian N. (2003). Learning and adaptation in a recurrent model of V1 orientation selectivity. Journal of Neurophysiology, 89, 2086–2100.
Tsodyks M. Gilbert C. (2004). Neural networks and perceptual learning. Nature, 431, 775–781.
Vaina L. M. Sundareswaran V. Harris J. G. (1995). Learning to ignore: Psychophysics and computational modeling of fast learning of direction in noisy motion stimuli. Cognitive Brain Research, 2, 155–163.
Weiss Y. Edelman S. Fahle M. (1993). Models of perceptual learning in vernier hyperacuity. Neural Computation, 5, 695–718.
Weng J. (2010). A 5-chunk developmental brain-mind network model for multiple events in complex backgrounds. Proceedings of the International Joint Conference of Neural Networks (pp. 1–8). Barcelona, Spain.
Weng J. Luciw M. (2009). Dually optimal neuronal layers: Lobe component analysis. IEEE Transactions on Autonomous Mental Development, 1, 68–85.
Weng J. McClelland J. Pentland A. Sporns O. Stockman I. Sur M. (2001). Autonomous mental development by robots and animals. Science, 291, 599–600.
Wichmann F. A. Hill N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63, 1293–1313.
Xiao L. Q. Zhang J. Y. Wang R. Klein S. Levi D. Yu C. (2008). Complete transfer of perceptual learning across retinal locations enabled by double training. Current Biology, 18, 1922–1926.
Yu A. J. Dayan P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46, 681–692.
Yu C. Klein S. A. Levi D. M. (2004). Perceptual learning in contrast discrimination and the (minimal) role of context. Journal of Vision, 4(3):4, 169–182, http://www.journalofvision.org/content/4/3/4, doi:10.1167/4.3.4.
Zhang T. Xiao L. Q. Klein S. A. Levi D. M. Yu C. (2010). Decoupling location specificity from perceptual learning of orientation discrimination. Vision Research, 50, 368–374.
Zhang J. Y. Zhang G. L. Xiao L. Q. Klein S. A. Levi D. M. Yu C. (2010). Rule-based learning explains visual perceptual learning and its specificity and transfer. The Journal of Neuroscience, 12323–12328.
Zhaoping L. Herzog M. Dayan P. (2003a). Nonlinear observation and recurrent preprocessing in perceptual learning. Network, 14, 790–793.
Zhaoping L. Herzog M. H. Dayan P. (2003b). Quadratic ideal observation and recurrent preprocessing in perceptual learning. Network: Computation in Neural Systems, 14, 233–247.
Footnotes
1  In fact, a major goal of machine learning research is to create algorithms which can generalize using as few training examples as possible (no specificity).
2  The noise was added to make the LCA algorithm simulate the nondeterministic behavior of humans in psychophysical experiments (i.e., nonidentical responses given the same sensory stimulus).
3  The images used are available from http://research.ics.tkk.fi/ica/imageica/.
Appendix A. Parameter values used for reported results
In the table below all the parameters in the paper are listed with their corresponding values and a short comment, if necessary. 
Figure 1
 
General pattern observed in transfer studies. Regardless of the order, a training and an exposure step seem to be common prior to transfer.
Figure 2
 
A schematic of the Where-What Networks (WWN). It consists of a sensory cortex which is connected to the What area in the ventral pathway and to the Where area in the dorsal pathway.
Figure 3
 
How training and exposure accompanied by off-task processes can cause the learning effects to transfer. Each circle schematically represents a column of neurons with laminar architecture (see a column's details in Figure 3C), solid lines show connections made (or improved) via direct perceptual learning, and dashed lines are the connections made or improved via off-task processes. (A) Transfer across locations in Where-What Networks. See the text for explanation. (B) Recruitment of more neurons in the sensory and concept areas. Many connections are not shown for the sake of visual simplicity. See text for details. (C) A cortical column from the internal layer magnified, along with its neighboring columns. The column is depicted in the jagged rectangle, and the arrows show the bottom-up and top-down signal passing among the layers. Only three functional layers (two, three, and four) are shown. We conjecture that Layers 5 and 6 have an assistive role in modulating the lateral connections (depicted by gray vertical arrows in each layer). They are not shown here for simplicity of illustration. No information processing role is assumed for Layer 1, so it is not shown in the figure. In short, Layer 4 and Layer 2 process bottom-up and top-down signals, respectively. Layer 3 then integrates the outputs of Layers 4 and 2 and projects signals to higher concept areas.
Figure 4
 
An example of activation patterns and neuronal changes during the off-task processes in the network. Only 5 × 5 = 25 neuronal columns in the internal area are shown. Each small square represents a neuron, and each grid represents a layer in the model laminar cortex. Neurons at the same location on each grid belong to the same neuronal column. For each of the Layers 2, 3, and 4 the pattern on the grid on the left shows the preresponse of the neuron (activation level before lateral competition), and the grid on the right shows the final response of the neuron. See the section Example: Activation patterns during off-task processes for a step-by-step description of the neural activation patterns, concept F1, and the second neuron in the Where area (corresponding to the concept L2) that happen to be active.
Figure 5
 
Sample images of Vernier input to the model. (left) Sample vertical Vernier stimulus at upper left corner (loc1_ori1). (middle) Sample horizontal Vernier stimulus at lower left corner (loc2_ori2). (right) Background (no input) used as input during network's off-task mode.
Figure 6
 
(left) Sample natural scene images used in an early development step of the simulation. (right) Bottom-up weight vectors (receptive field profile) of 15 × 15 sensory neurons developed after exposure to natural images.
Figure 7
 
Psychometric function for the network's performance before and after perceptual learning.
Figure 8
 
Performance of the WWN model: perceptual learning and transfer effects. (A) All four combinations of orientation and location were first pretested to measure their thresholds, and then, in Phase 1, the loc1_ori1 condition was trained. The blue curve shows the decrease in threshold for the trained condition. (B) Testing the three untrained conditions shows no change in their corresponding thresholds at the end of loc1_ori1 training (no transfer). The threshold decreases for loc2_ori2 as a result of training (green curve). At the end of the ninth training session, the thresholds for the two untrained conditions, loc1_ori2 and loc2_ori1, drop to the same level as the trained conditions. (C, D) Percentage of improvement in discrimination after training and transfer, plotting the same data as in (A) and (B). Hollow and filled bars show relative improvement as a result of training and transfer, respectively. See Figures 3C and 3D in Xiao et al. (2008) for comparison.
Figure 9
 
Sharpening of tuning curve after perceptual learning. The average response level of three cortical columns tuned on Vernier offset +2 is plotted.
Figure 10
 
Change in Vernier thresholds before and after the off-task processes stage in Control Experiment 1.
Figure 11
 
Change in Vernier thresholds before and after the off-task processes stage in Control Experiment 2.
Table 1
 
Parameters of the model.
| Parameter | Value | Comment |
|---|---|---|
| Competition window size, ω | 5 × 5 | Used to compute r_i in Equation 2 |
| Number of winning neurons, k | 2 | Equation 2 |
| Lateral excitation range | 3 × 3 | Section "The learning algorithm" |
| t_1 | 10 | Equation 7 |
| t_2 | 1000 | Equation 7 |
| c | 2 | Equation 7 |
| r | 10^4 | Equation 7 |
| Bottom-up receptive-field size | 10 × 10 pixels | E.g., size of b_i^(L4) and w_b,i^(L4) as in Equation 1 |
| Neuronal resources in the internal area | 50 × 50 × 2 cortical columns | Each column has a six-layer laminar structure as in Figure 3C |
| Neuronal resources in the Where area | 2 neurons | Representing loc1 and loc2 |
| Neuronal resources in the What area | 20 neurons | Representing Vernier offsets in range [–5, +5], excluding zero, for two orientations |
| A normalization factor for exposure effect, α | 0.01 | Equation 11 |
| A normalization factor for exposure effect, c | 0.005 | Equation 11 |
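The values t_1, t_2, c, and r in Table 1 can be read as the parameters of a piecewise "amnesic" function that sets how strongly a neuron's recent inputs are weighted as its firing age grows. The sketch below assumes that Equation 7 has the same piecewise form as the amnesic function described for lobe component analysis in Weng and Luciw (2009); the function names and the exact update weights are illustrative assumptions and are not taken from this section.

```python
def amnesic_mu(t, t1=10, t2=1000, c=2.0, r=1e4):
    """Assumed piecewise amnesic function mu(t) for a neuron of firing age t.

    Defaults are the Table 1 values; the piecewise form itself is an
    assumption based on the LCA literature, not on this section.
    """
    if t <= t1:
        return 0.0                        # young neuron: plain running average
    if t <= t2:
        return c * (t - t1) / (t2 - t1)   # ramp linearly from 0 up to c
    return c + (t - t2) / r               # slow growth afterwards


def update_weights(t, **params):
    """Return (retention, learning) weights for firing age t >= 1.

    Assumed form: learning = (1 + mu(t)) / t, retention = 1 - learning.
    """
    learning = (1.0 + amnesic_mu(t, **params)) / t
    return 1.0 - learning, learning
```

Under these assumptions, a neuron at age 1 simply copies its input (learning weight 1), while an older neuron keeps a learning weight somewhat above 1/t, so that it continues to track recent stimuli.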
Table 2
 
Parameters of the stimuli.
| Parameter | Value |
|---|---|
| Wavelength of the Gabor patches in the Vernier stimuli, λ | 10 pixels |
| Standard deviation of the Gaussian envelope in the Vernier stimuli, σ | 4 pixels |
| Range of offsets used for the Vernier stimuli | [–5, +5] pixels |
| Input image size | 100 × 100 pixels |
| Center of Location 1, loc_1 | (r, c) = (20, 20) |
| Center of Location 2, loc_2 | (r, c) = (75, 20) |
| Pixel intensity range | [0, 255] |
| Maximum pixel intensity in the random noise background | 100 |
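To make the stimulus parameters in Table 2 concrete, the following is a minimal sketch of how a Vernier image could be generated from them. Only the wavelength, envelope width, image size, location centers, intensity range, and noise ceiling come from the table. The patch size, the Gabor amplitude, the uniform noise distribution, and the way the two patches are stacked and displaced are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def gabor(patch=21, wavelength=10.0, sigma=4.0, vertical=True):
    """Gabor patch in [-1, 1]: cosine carrier under a Gaussian envelope."""
    half = patch // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u = x if vertical else y  # carrier varies across the bars
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * u / wavelength)


def vernier_image(center, offset, vertical=True, size=100, patch=21,
                  noise_max=100.0, amplitude=100.0, seed=0):
    """Two Gabor patches on a noise background, the second one displaced
    by `offset` pixels across the bar direction (assumed layout)."""
    rng = np.random.default_rng(seed)
    img = rng.uniform(0.0, noise_max, (size, size))  # assumed uniform noise
    g = amplitude * gabor(patch, vertical=vertical)  # assumed amplitude
    half = patch // 2
    r, c = center
    for sign in (-1, +1):
        shift = offset if sign > 0 else 0
        if vertical:   # patches stacked along rows, displaced along columns
            pr, pc = r + sign * half, c + shift
        else:          # patches side by side, displaced along rows
            pr, pc = r + shift, c + sign * half
        img[pr - half:pr + half + 1, pc - half:pc + half + 1] += g
    return np.clip(img, 0.0, 255.0)


# loc1_ori1: vertical Vernier at (20, 20); loc2_ori2: horizontal at (75, 20)
loc1_ori1 = vernier_image((20, 20), offset=2, vertical=True)
loc2_ori2 = vernier_image((75, 20), offset=-3, vertical=False)
```

With these patch dimensions, both location centers from Table 2 keep the stimulus inside the 100 × 100 image for every offset in [–5, +5].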
Table 3
 
Parameters of the simulations.
| Parameter | Value |
|---|---|
| Number of training samples (natural images) for the early development stage | 10^4 |
| Number of training trials for the coarse training stage | 400 = 10 trials for each of 10 offsets at two locations and two orientations |
| Number of training trials for the perceptual learning (fine discrimination) stage | 5 sessions × 100 trials/session × 10 offsets × 2 trained conditions = 10^4 |
| Number of simulation cycles in off-task processes stage | 1000 |
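The trial counts in Table 3 and the size of the What area in Table 1 all follow from the same set of ten nonzero offsets. The short check below reproduces that arithmetic. It assumes integer pixel offsets, which is consistent with the "10 offsets" stated in the table; the variable names are illustrative.

```python
# Nonzero integer Vernier offsets in [-5, +5] (integer spacing is assumed)
offsets = [o for o in range(-5, 6) if o != 0]
n_offsets = len(offsets)

n_locations = 2
n_orientations = 2

# Coarse training: 10 trials per offset, at each location and orientation
coarse_trials = 10 * n_offsets * n_locations * n_orientations

# Fine discrimination: 5 sessions x 100 trials x 10 offsets x 2 conditions
fine_trials = 5 * 100 * n_offsets * 2

# What area: one neuron per (offset, orientation) pair
what_neurons = n_offsets * n_orientations

print(coarse_trials, fine_trials, what_neurons)
```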