Open Access
Methods  |   April 2025
Hierarchical Bayesian augmented Hebbian reweighting model of perceptual learning
Author Affiliations
  • Zhong-Lin Lu
    Division of Arts and Sciences, NYU Shanghai, Shanghai, China
    Center for Neural Science and Department of Psychology, New York University, New York, USA
    NYU-ECNU Institute of Brain and Cognitive Science, Shanghai, China
    https://orcid.org/0000-0002-7295-727X
    [email protected]
  • Shanglin Yang
    Division of Arts and Sciences, NYU Shanghai, Shanghai, China
    Cognitive Sciences Department, University of California, Irvine, CA, USA
    [email protected]
  • Barbara Anne Dosher
    Cognitive Sciences Department, University of California, Irvine, CA, USA
    [email protected]
Journal of Vision April 2025, Vol.25, 9. doi:https://doi.org/10.1167/jov.25.4.9
Abstract

The augmented Hebbian reweighting model (AHRM) has proven effective in modeling the collective performance of observers in perceptual learning studies. In this work, we introduce a novel hierarchical Bayesian version of the AHRM (HB-AHRM), which allows us to model the learning curves of individual participants and the entire population within a unified framework. We compare the performance of HB-AHRM with that of a Bayesian inference procedure, which independently estimates posterior distributions of model parameters for each participant without using a hierarchical structure. To address the substantial computational challenges, we propose a method for approximating the likelihood function in the AHRM through feature engineering and linear regression, increasing the speed of the estimation process by a factor of 20,000. This enhancement enables the HB-AHRM to compute the posterior distributions of hyperparameters and model parameters at the population, subject, and test levels, facilitating statistical inferences across these layers. Although developed in the context of a single experiment, the HB-AHRM and its associated methods are broadly applicable to data from various perceptual learning studies, offering predictions of human performance at both individual and population levels. Furthermore, the approximated likelihood approach may prove useful in fitting other stochastic models that lack analytic solutions.

Introduction
Perceptual learning is a powerful process that can significantly enhance human performance in various perceptual tasks (Dosher & Lu, 2020; Fahle & Poggio, 2002; Green, Banai, Lu, & Bavelier, 2018; Lu & Dosher, 2022; Lu, Hua, Huang, Zhou, & Dosher, 2011; Sagi, 2011; Seitz, 2017; Watanabe & Sasaki, 2015). It can lead to improvements in tasks such as orientation, spatial frequency, and motion direction judgements, taking performance from near chance to high proficiency (Ball & Sekuler, 1982; Fiorentini & Berardi, 1980; Poggio, Fahle, & Edelman, 1992). Contrast sensitivity can increase by over 100% (Dosher & Lu, 1998; Huang, Zhou, & Lu, 2008), and response times can decrease by approximately 40% (Petrov, Van Horn, & Ratcliff, 2011). Perceptual learning is increasingly being applied in rehabilitation and the development of perceptual expertise (Cavanaugh, 2015; Gu et al., 2020; Hess & Thompson, 2015; Huang et al., 2008; Huxlin et al., 2009; Levi, 2020; Lu, Lin, & Dosher, 2016; Maniglia, Visscher, & Seitz, 2021; Roberts & Carrasco, 2022; Yan et al., 2015). 
Two main theories, representation enhancement and selective reweighting, have been proposed to explain performance improvements in visual perceptual learning (Ahissar & Hochstein, 2004; Dosher & Lu, 1998; Dosher & Lu, 2009; Fahle, 1994; Karni & Sagi, 1991; Mollon & Danilova, 1996; Sotiropoulos, Seitz, & Seriès, 2011; Talluri, Hung, Seitz, & Seriès, 2015; Watanabe et al., 2002; Watanabe & Sasaki, 2015; Zhang et al., 2010). Representation enhancement suggests that perceptual learning improves performance by altering the responses or tuning characteristics of neurons in early visual cortical areas. In contrast, selective reweighting involves the up-weighting of relevant and down-weighting of irrelevant representations from early visual cortical areas during perceptual decision without changing the representations themselves. Although both processes can contribute to perceptual learning (Kourtzi, Betts, Sarkheil, & Welchman, 2005; Roelfsema, van Ooyen, & Watanabe, 2010; Seitz & Watanabe, 2005; Watanabe & Sasaki, 2015), selective reweighting appears to be the dominant component (Dosher & Lu, 2020). This conclusion is also supported by physiological and brain imaging studies, which indicate that early sensory representation enhancement accounts for only a small fraction of behavioral performance improvements (Ghose, Yang, & Maunsell, 2002; Schoups, Vogels, Qian, & Orban, 2001), while evidence of neural plasticity is strongest in higher visual areas (Adab & Vogels, 2011; Law & Gold, 2008; Yan et al., 2014). Notably, representation enhancement remains primarily a verbal theory, and most existing computational models of visual perceptual learning are based on selective reweighting. 
These models aim to enhance the readout of sensory information during perceptual decision making (Dosher, Jeter, Liu, & Lu, 2013; Dosher & Lu, 1998; Jacobs, 2009; Law & Gold, 2009; Petrov, Dosher, & Lu, 2005; Poggio et al., 1992; Sotiropoulos et al., 2011; Vaina, Sundareswaran, & Harris, 1995; Weiss, Edelman, & Fahle, 1993; Zhaoping, Herzog, & Dayan, 2003). 
The augmented Hebbian reweighting model (AHRM) (Figure 1) is the first full computational process model of perceptual learning (Petrov et al., 2005). It comprises representation, bias, feedback, and decision units. The representation unit computes activations in multiple orientation- and frequency-selective channels from stimulus images. The decision unit weights and sums activations along with a bias unit and yields a response on each trial. The learning unit updates the weights to the decision unit on every trial using augmented Hebbian learning, which moves the late post-synaptic activation in the decision unit towards the correct response when feedback is available, and operates on the early decision activation when there is no feedback (Dosher & Lu, 2009; Petrov et al., 2005; Petrov, Dosher, & Lu, 2006). 
Figure 1.
 
The augmented Hebbian reweighting model (AHRM) with representation, bias, feedback, and decision units.
The AHRM has successfully modeled various phenomena in perceptual learning, including perceptual learning in nonstationary environments with and without feedback (Petrov et al., 2005, Petrov et al., 2006), basic mechanisms of perceptual learning, asymmetric transfer of learning in high and low external noise, and effects of pretraining mechanisms (Lu, Liu, & Dosher, 2010), co-learning of multiple tasks (Huang, Lu, & Dosher, 2012), interaction between training accuracy and feedback (Liu, Lu, & Dosher, 2010; Liu, Lu, & Dosher, 2012), and trial-by-trial and block feedback (Liu, Dosher, & Lu, 2014). It has also led to the development of several related models (Dosher et al., 2013; Jacobs, 2009; Law & Gold, 2009; Sotiropoulos et al., 2011; Talluri et al., 2015). 
Despite its success, fitting the AHRM to data presents a significant challenge. The AHRM is a sequential stochastic learning model that, with a given set of parameters, must be simulated to generate performance predictions with sequential trial-by-trial updates of the decision weights. Simulations typically involve running the model hundreds to thousands of times to generate average predictions and confidence bands for a given set of parameter values. For a fixed set of parameter values, each run of the model leads to a different sequence of responses and somewhat different weight changes owing to stochastic trial-by-trial variations resulting from internal and external noises and different random trial sequences. Fitting the AHRM with typical curve fitting procedures (e.g., maximum likelihood, least squares, Bayesian) is not feasible because the fitting process requires simulations of many potential parameter sets (tens to hundreds of thousands). Instead, estimation of the AHRM parameters is generally done using hierarchical grid search methods. These methods evaluate a matrix of spaced parameter values and then narrow down regions of the parameter space that are more promising, making it difficult to obtain the optimal solutions. 
In this study, we introduce three modeling technologies to facilitate AHRM fitting: 
  • 1) A hierarchical Bayesian AHRM (HB-AHRM): This approach incorporates population, subject, and test levels to estimate the posterior hyperparameter and parameter distribution across all subjects while considering covariance within and between subjects.
  • 2) Vectorization with PyTensor: Leveraging the PyTensor library and GPU acceleration drastically speeds up simulations by optimizing the computation of mathematical expressions involving multidimensional arrays.
  • 3) Likelihood function approximation: We developed an approach to approximate the likelihood function in the AHRM with feature engineering and linear regression. Based on simulated predictions of the AHRM over a large parameter grid, we encoded the functional relationship between the parameters and likelihood, greatly accelerating model computations.
Hierarchical models enable effective combination of information across subjects and groups while preserving heterogeneity (Kruschke, 2014; Rouder & Lu, 2005). These models typically consist of submodels and probability distributions at multiple levels of the hierarchy and can compute the joint posterior distributions of the parameters and hyperparameters using Bayes' theorem based on all available data (Kruschke, 2014; Kruschke & Liddell, 2018). Hierarchical models are valuable for reducing the variance of estimated posterior distributions by decomposing variabilities from different sources using parameters and hyperparameters (Song et al., 2020) and shrinking estimated parameters at lower levels toward the modes of higher levels when there is insufficient data at the lower level (Kruschke, 2014; Rouder et al., 2003; Rouder & Lu, 2005). 
The HB-AHRM consists of three levels: population, subject, and test. In this framework, all subjects belong to a population and may, in principle, run the same experiment (called test) multiple times. The distributions of AHRM parameters at the test level are conditioned on the hyperparameter distributions at the subject level, which, in turn, are conditioned on the hyperparameter distribution at the population level. The HB-AHRM also includes covariance hyperparameters at the population and subject levels to capture the relationship between and within subjects. 
PyTensor is a Python library used to define, optimize, rewrite, and evaluate mathematical expressions, particularly those involving multidimensional arrays. It combines elements of a computer algebra system and an optimizing compiler. PyTensor is particularly useful for tasks where complex mathematical expressions need repeated evaluation, and speed is critical. The library provides a loop mechanism called scan, which can process inputs efficiently. We used PyTensor to represent all variables in the HB-AHRM and applied the scan function, significantly speeding up simulations from 22.2 to 1.6 seconds for 300 repeated runs of the experiment in Petrov et al. (2005) based on one set of AHRM parameters. 
Although PyTensor improved simulation speed, computing the HB-AHRM still involves evaluating hundreds of thousands of parameter sets. Because of the tremendous computational load, we developed a method to approximate the likelihood function in the AHRM with feature engineering and linear regression. It involves simulating AHRM predictions in a large parameter grid using parallel computing with GPU processors, taking less than 24 hours for a 64,000-point mesh grid. We then used feature engineering and linear regression to learn the relationship between the likelihood function and AHRM parameters, which took approximately 30 minutes. The differentiable functional relationship enabled efficient exploration of a large parameter space in fitting the models (<1 ms per sample). Our approach is related to but different from the likelihood approximation networks developed by Fengler, Govindarajan, Chen, and Frank (2021) in the context of the drift diffusion model. 
In this paper, we provide an overview of the AHRM as a generative model of trial-by-trial human performance in perceptual learning. We also introduce a Bayesian inference procedure (BIP) used to independently estimate the posterior distribution of AHRM parameters for each subject. Subsequently, we present the HB-AHRM, designed to collectively estimate the posterior distribution of hyperparameters and parameters at multiple levels of the hierarchy. We discuss the simulation technologies, including PyTensor, and the method for likelihood function approximation. These procedures are applied to data from Petrov et al. (2005). Our analysis involves comparing the goodness of fit of the BIP and HB-AHRM, and evaluating the variability of estimated AHRM parameters, learning curves, and weight structures. In addition, we conducted a simulation study to evaluate parameter recovery and the HB-AHRM's ability to predict the performance of a new simulated observer with no or limited training data. 
Theoretical development
The AHRM
In this section, we briefly describe the AHRM. More details of the model can be found in the original paper (Petrov et al., 2005). 
Representation unit
For subject i in test j, the stimulus image consists of a signal image Sijt and an external noise image Nijt in each trial t. The representation unit encodes the stimulus image into expected activations over a bank of orientation and spatial frequency channels tuned to different orientations φ and spatial frequencies f, E(φ, f|Sijt, Nijt), through convolution, half-wave rectification, contrast normalization and pooling over phase and space (Petrov et al., 2005). We consider 35 channels centered at 7 orientations (φ∈[0°, ±15°, ±30°, ±45°]) and 5 spatial frequencies (f ∈[1, 1.4, 2, 2.8, 4] cycles per degree). The expected activation is then combined with internal noise εφ,f (with mean 0 and standard deviation σr) and passed to a saturating non-linearity to compute the activation in each channel:  
\begin{eqnarray} && A^{\prime}\left( {\varphi ,f|{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}}} \right) = E\left( {\varphi ,f{\rm{|}}{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}}} \right) + {{\varepsilon }_{\varphi ,f}}, \quad \end{eqnarray}
(1)
 
\begin{eqnarray} && A\left( {\varphi ,f|{{S}_{ijt}},{{N}_{ijt}}} \right) \nonumber\\ && = \left\{ {\begin{array}{@{}ll@{}} {{{A}_{max}}\frac{{1 - {{e}^{ - \gamma A^{\prime}\left( {\varphi ,f|{{S}_{ijt}},{{N}_{ijt}}} \right)}}}}{{1 + {{e}^{ - \gamma A^{\prime}\left( {\varphi ,f|{{S}_{ijt}},{{N}_{ijt}}} \right)}}}},}&{if\ A^{\prime}\left( {\varphi ,f|{{S}_{ijt}},{{N}_{ijt}}} \right) \ge 0}\\ {0,}&{otherwise} \end{array}} \right. \nonumber\\ \end{eqnarray}
(2)
 
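As a concrete illustration, Equations 1 and 2 can be sketched in NumPy. The channel counts follow the text; the parameter values and the toy expected activations below are illustrative, not fitted values:

```python
import numpy as np

def channel_activations(expected, sigma_r, gamma, a_max=0.5, rng=None):
    """Eqs. 1-2: add Gaussian internal noise to the expected channel
    activations, then apply the saturating nonlinearity (zero when A' < 0)."""
    rng = rng or np.random.default_rng()
    a_prime = expected + rng.normal(0.0, sigma_r, size=expected.shape)   # Eq. 1
    sat = a_max * (1 - np.exp(-gamma * a_prime)) / (1 + np.exp(-gamma * a_prime))
    return np.where(a_prime >= 0, sat, 0.0)                              # Eq. 2

# 35 channels: 7 orientations x 5 spatial frequencies (toy expected activations)
E = np.abs(np.random.default_rng(0).normal(0.2, 0.05, size=(7, 5)))
A = channel_activations(E, sigma_r=0.1, gamma=2.0, rng=np.random.default_rng(1))
```

Note that the activations are bounded in [0, Amax) by construction, which the decision unit relies on.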
Task-specific decision unit
The decision unit weighs the evidence in the noisy activations from the representation unit to generate a response in each trial. Specifically, it first aggregates the activation pattern A(φ, f|Sijt, Nijt) over the orientation and spatial-frequency channels using current weights wφ,f(t), a top-down bias b(t), and a Gaussian decision noise ε (mean 0 and standard deviation σd):  
\begin{eqnarray} && u\left( t \right) = \mathop \sum \limits_{\varphi ,f} {{w}_{\varphi ,f}}\left( t \right){\rm{\ }}A\left( {\varphi ,f|{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}}} \right) - b\left( t \right) + \varepsilon \quad \end{eqnarray}
(3)
and then computes its output o(t) as a sigmoidal function of u(t):  
\begin{eqnarray}o\left( t \right) = G\left( {u\left( t \right)} \right), \quad \end{eqnarray}
(4)
 
\begin{eqnarray}G\left( {u\left( t \right)} \right) = \frac{{1 - {{e}^{ - \gamma u\left( t \right)}}}}{{1 + {{e}^{ - \gamma u\left( t \right)}}}}{{A}_{max}}. \quad \end{eqnarray}
(5)
 
The decision variable o(t) saturates at ± Amax = ±0.5; the model responds left or counterclockwise if o(t) < 0 and right or clockwise otherwise. 
The weights are initialized to be proportional to the preferred orientations of the channels in the representation unit relative to the vertical: wφ,f(0) = (φ/30°)winit. 
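A minimal NumPy sketch of Equations 3 through 5 and of the weight initialization; the winit value and the all-ones stimulus activations are illustrative stand-ins:

```python
import numpy as np

def decide(A, w, b, gamma, sigma_d=0.1, a_max=0.5, rng=None):
    """Eqs. 3-5: weighted sum of channel activations minus bias, plus
    Gaussian decision noise, passed through the sigmoid G."""
    rng = rng or np.random.default_rng()
    u = np.sum(w * A) - b + rng.normal(0.0, sigma_d)                     # Eq. 3
    o = a_max * (1 - np.exp(-gamma * u)) / (1 + np.exp(-gamma * u))      # Eqs. 4-5
    return o, ('left' if o < 0 else 'right')

# initial weights proportional to preferred orientation: w(0) = (phi/30) * w_init
phis = np.array([-45.0, -30.0, -15.0, 0.0, 15.0, 30.0, 45.0])
w0 = np.tile((phis / 30.0)[:, None], (1, 5)) * 0.17   # shape (7, 5); w_init = 0.17

o, response = decide(np.ones((7, 5)), w0, b=0.0, gamma=2.0,
                     rng=np.random.default_rng(0))
```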
Learning unit
The weights between the representation and decision units are updated on each trial using an augmented Hebbian reweighting rule, in which feedback, F(t)  = ± 0.5, is used as the correct output of the decision unit. The change of weight wφ,f(t) in each channel depends on the activation of the representation A(φ, f|Sijt, Nijt), the correct output of the decision unit and the internal learning rate α:  
\begin{eqnarray}{{\delta }_{\varphi ,f}} = \alpha A\left( {\varphi ,f{\rm{|}}{{S}_{ijt}},{{N}_{ijt}}} \right)F\left( t \right), \quad \end{eqnarray}
(6a)
 
\begin{eqnarray} &&{\rm{\Delta }} {{w}_{\varphi ,f}}\left( t \right) = \left( {{{w}_{\varphi ,f}}\left( t \right) - {{w}_{min}}} \right)[ {{\delta }_{\varphi ,f}} ]_ - \nonumber\\ &&\quad + \left( {{w}_{max}} - {{w}_{\varphi ,f}}\left( t \right) \right) [{{\delta }_{\varphi ,f}}]_ +, \quad \end{eqnarray}
(6b)
where wmin = −1 and wmax = 1.0. 
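The augmented Hebbian rule in Equations 6a and 6b amounts to a soft-bounded weight update, sketched here in NumPy:

```python
import numpy as np

def hebbian_update(w, A, F, alpha, w_min=-1.0, w_max=1.0):
    """Eqs. 6a-6b: Hebbian weight change driven by channel activation A and
    feedback F = +/-0.5, with soft bounds that shrink the step near w_min/w_max."""
    delta = alpha * A * F                                                # Eq. 6a
    dw = ((w - w_min) * np.minimum(delta, 0.0)                           # [delta]_-
          + (w_max - w) * np.maximum(delta, 0.0))                        # [delta]_+
    return w + dw                                                        # Eq. 6b

w_new = hebbian_update(np.full((7, 5), 0.9), np.ones((7, 5)), F=0.5, alpha=1.0)
```

Because positive changes are scaled by (wmax − w) and negative changes by (w − wmin), the weights approach but never cross the bounds.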
Adaptive criterion control
The adaptive criterion control unit shifts the decision variable on each trial to compensate for biases in the immediate history of responses by adding a bias correcting term to the activation at the decision unit. It assumes that observers monitor their own responses and seek to equalize response frequencies—trying to match stimulus probabilities that are often balanced in experiments. A weighted running average of recent responses exponentially discounts the distant past response history:  
\begin{eqnarray}r\left( {t + 1} \right) = \rho {{R}_{ijt}} + \left( {1 - \rho } \right)r\left( t \right) \quad \end{eqnarray}
(7a)
 
\begin{eqnarray}b\left( {t + 1} \right) = \beta r\left( t \right), \quad \end{eqnarray}
(7b)
where Rijt is the response in the current trial, r(t) is the weighted running average of responses, and b(t) is the bias. Following Petrov et al. (2005), we set ρ = 0.02. 
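Equations 7a and 7b reduce to two lines of code. Following Equation 7b as written, the bias tracks the running average from the previous trial; the β value below is illustrative:

```python
def update_bias(r, R, rho=0.02, beta=2.0):
    """Eqs. 7a-7b: exponentially discounted running average of responses
    and the resulting bias term."""
    r_next = rho * R + (1 - rho) * r   # Eq. 7a
    b_next = beta * r                  # Eq. 7b: bias follows the previous average
    return r_next, b_next

# a long run of identical responses drives the running average toward R
r, b = 0.0, 0.0
for _ in range(1000):
    r, b = update_bias(r, R=0.5)
```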
In summary, the AHRM has six free parameters (Table 1), internal learning rate α, bias strength β,  activation function gain γ, standard deviation of decision noise σd, standard deviation of representation noise σr, and initial weight scaling factor winit. Additional parameters, including maximum activation level, weight bounds, orientation spacing, and spatial frequency spacing, are fixed. 
Table 1.
 
AHRM parameters and their corresponding symbols in the BIP and HB-AHRM. AHRM, augmented Hebbian reweighting model; HB-AHRM, hierarchical Bayesian augmented Hebbian reweighting model.
To simplify the notation, we use θij to denote the AHRM parameters for subject i in test j (see Table 1 for the correspondence with the original AHRM parameters). Because the AHRM is a dynamic learning model, for a given subject i in test j with AHRM parameters θij, we can compute the probability of obtaining a correct response in trial t, p(Rijt = 1|θij, Sijt, Nijt), from repeated simulations of the AHRM. The probability of obtaining an incorrect response in trial t is computed as: p(Rijt = 0|θij, Sijt, Nijt) = 1 − p(Rijt = 1|θij, Sijt, Nijt). The two probabilities define the likelihood function for each of T trials. The likelihood of obtaining all the observed responses of subject i in test j is the product of all the trial-by-trial likelihoods:  
\begin{eqnarray}p\left( {{{R}_{ij1:T}}{\rm{|}}{{\theta }_{ij}}} \right) = \mathop \prod \limits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}},{\rm{\ }}{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}}). \quad \end{eqnarray}
(8)
 
The actual response Rijt in each trial is generated from a Bernoulli (single-trial binomial) distribution with p(Rijt = 1|θij, Sijt, Nijt) as the probability of success. 
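Because the AHRM likelihood has no analytic form, p(Rijt = 1|θij, Sijt, Nijt) must be estimated by repeated simulation. A generic Monte Carlo sketch, where a toy one-parameter simulator stands in for a full AHRM run up to the trial of interest:

```python
import numpy as np

def simulated_likelihood(simulate_trial, theta, n_runs=300, rng=None):
    """Estimate p(correct | theta) for one trial by Monte Carlo: run the
    stochastic model n_runs times and average the binary outcomes."""
    rng = rng or np.random.default_rng()
    return float(np.mean([simulate_trial(theta, rng) for _ in range(n_runs)]))

# toy stand-in simulator: correct with probability sigmoid(gain)
def toy_trial(theta, rng):
    return int(rng.random() < 1.0 / (1.0 + np.exp(-theta['gain'])))

p_correct = simulated_likelihood(toy_trial, {'gain': 1.0}, n_runs=2000,
                                 rng=np.random.default_rng(1))
```

The cost of this inner loop, repeated over every candidate parameter set, is exactly what motivates the likelihood approximation developed later in the paper.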
Bayesian inference procedure
The BIP is used to estimate the posterior distribution of θij from the trial-by-trial data Yij = {(Rij1: T, Sij1: T,Nij1: T, )} of subject i in test j via Bayes’ rule (Figure 2a):  
\begin{eqnarray} && p( {{{\theta }_{ij}}{\rm{|}}{{Y}_{ij}}} ) = \frac{{\mathop \prod \nolimits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}},{\rm{\ }}{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}}){{p}_0}( {{{\theta }_{ij}}} )}}{{\smallint \mathop \prod \nolimits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}},{\rm{\ }}{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}}){{p}_0} ( {{{\theta }_{ij}}} )d{{\theta }_{ij}}}}. \;\;\; \end{eqnarray}
(9)
 
Figure 2.
 
(a) The Bayesian inference procedure (BIP). For a given subject i in test j with parameters θij, the likelihood of obtaining response p(Rijt) in trial t is computed from the augmented Hebbian reweighting model (AHRM). (b) The Bayesian version of the AHRM (HB-AHRM) is a three-level HB model in which the population level hyperparameter η is modeled as a mixture of Gaussian distributions with mean µ and covariance Σ, hyperparameter τi at the subject level is modeled as a mixture of Gaussian distributions with mean ρi and covariance ϕ, and the probability distribution of parameters θij is conditioned on τi.
Here, p(θij|Yij) is the posterior distribution of AHRM parameters θij given the trial-by-trial data Yij, p(Rijt|θij, Sijt, Nijt) is the likelihood term that quantifies the probability of observing response Rijt given θij, Sijt, and Nijt, and p0(θij) is the prior probability distribution of θij. 
The prior of θij is set as a uniform distribution in all its dimensions:  
\begin{eqnarray}{{p}_0}\left( {{{\theta }_{ijk}}} \right) = \mathcal{U}\left( {{{\theta }_{0k,min}},{{\theta }_{0k,max}}} \right), \quad \end{eqnarray}
(10)
where θ0k,min and θ0k,max are the lower and upper bounds of the uniform distribution for dimension k (Table 2), which are set based on observed values in prior applications of the model. The denominator of (Equation 9) is an integral across all possible values of θij. In the BIP, the AHRM parameters are estimated independently for each subject. 
Table 2.
 
Lower and upper bounds of the priors.
The HB-AHRM
The HB-AHRM is a three-level HB model used to estimate the joint posterior hyperparameter and parameter distribution in all levels, considering covariance within and between subjects (Figure 2b). The HB-AHRM includes probability distributions at the population, subject, and test levels. 
Population level
The probability distribution of the six-dimensional hyperparameter η of the AHRM parameters (Table 1) at the population level is modeled as a mixture of six-dimensional Gaussian distributions with mean µ and covariance Σ, which have distributions p(μ) and p(Σ):  
\begin{eqnarray}p\left( \eta \right) = \mathcal{N}\left( {\eta ,\mu ,\Sigma } \right)p\left( \mu \right)p\left( \Sigma \right). \quad \end{eqnarray}
(11)
 
Subject level
The probability distribution of hyperparameter τi for subject i at the subject level is modeled as a mixture of six-dimensional Gaussian distributions with mean ρi and covariance ϕ, with distributions pi|η) and p(ϕ):  
\begin{eqnarray}p\left( {{{\tau }_i}{\rm{|}}\eta } \right) = \mathcal{N}\left( {{{\tau }_i},{{\rho }_i},\phi } \right){\rm{\ }}p\left( {{{\rho }_i}|\eta } \right)p\left( \phi \right), \quad \end{eqnarray}
(12)
in which ρi is conditioned on η. 
Test level
piji), the probability distribution of parameters θij is conditioned on τi. The probability of obtaining the entire dataset is computed using probability multiplication, which involves all levels of the model and the likelihood function based on the trial-by-trial data:  
\begin{eqnarray} && p( {{{Y}_{1:I,1:J}}{\rm{|}}X} ) \nonumber\\ && = \mathop \prod \limits_{i = 1}^I \mathop \prod \limits_{j = 1}^J \mathop \prod \limits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}},{\rm{\ }}{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}})p( {{{\theta }_{ij}}{\rm{|}}{{\tau }_i}} )p( {{{\tau }_i}{\rm{|}}\eta } )p( \eta ) \nonumber\\ && = \mathop \prod \limits_{i = 1}^I \mathop \prod \limits_{j = 1}^J \mathop \prod \limits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}},{\rm{\ }}{{S}_{ijt}},{\rm{\ }}{{N}_{ijt}})p( {{{\theta }_{ij}}{\rm{|}}{{\tau }_i}}) \nonumber\\ &&\times\; \mathcal{N} ( {{{\tau }_i},{{\rho }_i},\phi }){\rm{\ }}p ( {{{\rho }_i}|\eta } )p( \phi )\mathcal{N} ( {\eta ,\mu ,\Sigma } )p ( \mu )p ( \Sigma ), \,\,\,\,\,\, \end{eqnarray}
(13)
where X = (θ1: I, 1: J, ρ1: I, µ, ϕ, Σ) are all the parameters and hyperparameters in the HB-AHRM. 
Bayes' rule is used to compute the joint posterior distribution of X, which includes all HB-AHRM parameters and hyperparameters:  
\begin{eqnarray} p( {X{\rm{|}}{{Y}_{1:I,1:J}}} ) = \frac{\begin{array}{@{}l@{}}\mathop \prod \nolimits_{i = 1}^I \mathop \prod \nolimits_{j = 1}^J \mathop \prod \nolimits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}}, {{S}_{ijt}}, {{N}_{ijt}})\\ \quad\times\; p( {{{\theta }_{ij}}{\rm{|}}{{\tau }_i}} )\mathcal{N}( {{{\tau }_i},{{\rho }_i},\phi } ) p( {{{\rho }_i}|\eta } )\\ \quad \times\;{{p}_0}( \phi )\mathcal{N}( {\eta ,\mu ,\Sigma } ){{p}_0}( \mu ){{p}_0}( \Sigma )\end{array}} {\begin{array}{@{}l@{}} \smallint \mathop \prod \nolimits_{i = 1}^I \mathop \prod \nolimits_{j = 1}^J \mathop \prod \nolimits_{t = 1}^T p({{R}_{ijt}}|{{\theta }_{ij}}, {{S}_{ijt}}, {{N}_{ijt}})\\ \quad\times\; p( {{{\theta }_{ij}}{\rm{|}}{{\tau }_i}} )\mathcal{N}( {{{\tau }_i},{{\rho }_i},\phi } ) p( {{{\rho }_i}|\eta } )\\ \quad \times\;{{p}_0}( \phi )\mathcal{N}( {\eta ,\mu ,\Sigma } ){{p}_0}( \mu ){{p}_0}( \Sigma )dX\end{array}},\nonumber\\ \end{eqnarray}
(14)
where the denominator is an integral across all possible values of X and is a constant for a given dataset and HB-AHRM; p0(μ), p0(Σ), and p0(ϕ) are the prior distributions of μ, Σ, and ϕ, with  
\begin{eqnarray}{{p}_0}\left( \mu \right) = {{\mathcal{U}}_k}\left( {{{\theta }_{0k,min}},{{\theta }_{0k,max}}} \right), \quad \end{eqnarray}
(15)
where \({{\mathcal{U}}_k}( {{{\theta }_{0k,min}},{{\theta }_{0k,max}}} )\) denotes a uniform distribution between θ0k,min and θ0k,max in each of the six dimensions, with θ0k,min and θ0k,max defined in Table 2. Both p0(Σ) and p0(ϕ) are set with the LKJ distribution with a shape parameter of 2.0 (Lewandowski, Kurowicka, & Joe, 2009). 
Equation 14 allows us to estimate the joint posterior distribution of HB-AHRM parameters and hyperparameters across all tests and subjects. Unlike the BIP, the HB-AHRM hyperparameters and parameter estimates mutually constrain each other across tests and subjects via the joint distribution. This allows for more robust and interconnected estimates of HB-AHRM parameters and hyperparameters. 
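Read generatively (top down), Equations 11 through 13 describe nested sampling. A NumPy sketch of that forward direction, with illustrative stand-ins for the priors on the means and covariances; inference in the paper runs the other way, from data to posteriors via Equation 14:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6   # number of AHRM parameters (Table 1)

# Population level (Eq. 11): hyperparameter eta ~ N(mu, Sigma)
mu = rng.uniform(0.0, 1.0, size=K)      # stand-in for the uniform priors on mu
Sigma = 0.05 * np.eye(K)                # stand-in for the LKJ-distributed covariance
eta = rng.multivariate_normal(mu, Sigma)

# Subject level (Eq. 12): tau_i ~ N(rho_i, phi), with rho_i conditioned on eta
phi = 0.02 * np.eye(K)
tau = np.array([rng.multivariate_normal(eta, phi) for _ in range(13)])   # 13 subjects

# Test level: theta_ij conditioned on tau_i (test-level spread is illustrative)
theta = np.array([rng.multivariate_normal(t_i, 0.01 * np.eye(K)) for t_i in tau])
```

The shrinking covariances from population to test level mirror how the hierarchy pools information: each subject's parameters scatter around the population hyperparameters rather than varying freely.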
Study 1. Reanalysis of Petrov et al. (2005)
Methods
Data
Petrov et al. (2005) investigated perceptual learning in an orientation identification task involving 13 adult subjects with normal or corrected-to-normal vision. Subjects judged the orientation (±10° from vertical) of a Gabor patch (windowed sinusoidal gratings, peak spatial frequency = 2 c/d) in each trial. The experiment included a nonstationary context where external noise images, predominantly oriented left in context L and right in context R, were superimposed on the target stimuli (Figure 3). 
Figure 3.
 
An illustration of left and right tilted Gabors in context L.
Figure 4.
 
Approximating the likelihood function with feature engineering and linear regression. The simulated learning curves show data for incongruent (top) and congruent (bottom) trials at three contrast levels (colors) over training blocks for each parameter combination.
The study consisted of 8 daily sessions, each with four blocks of 300 trials, totaling 9,600 trials. Subjects were trained in block sequences of either L–8R–8L–8R–6L–R (7 subjects) or R–8L–8R–8L–6R–L (6 subjects) contexts. Context congruency was randomly selected, with the target Gabor and context either congruent or incongruent in orientation. Gabor contrast was randomly selected from three fixed levels (0.106, 0.160, and 0.245). The resulting behavioral data shows a complex pattern related to congruency and contrast. 
The study adhered to ethical standards, with written consent obtained from all subjects before the experiment. The research protocol received approval from the institutional review board for human subject research at the University of California, Irvine, and complied with the principles of the Declaration of Helsinki. 
Likelihood function approximation
Fitting the BIP and HB-AHRM to the data involves using simulations to evaluate the likelihood of a vast set of model parameters. Owing to the significant computation time, previous studies relied on grid search methods for AHRM fitting (Petrov et al., 2005, Petrov et al., 2006). To reduce the computational cost, we approximated the likelihood function by learning its functional relationship with AHRM parameters based on simulated predictions over a large parameter grid. This facilitated fitting the BIP and HB-AHRM models. 
We constructed a six-dimensional mesh grid Θ (Table 3) to train the functional relationship between AHRM parameters and the likelihood function (Figure 4). The mesh grid contained 8 × 8 × 8 × 5 × 5 × 5 = 64,000 sets of model parameters. The ranges and values of the AHRM parameters were chosen based on an exploration of model predictions with various parameters. 
Table 3.
 
Mesh grid used in the simulation.
We calculated the likelihood, representing the trial-by-trial probability of a correct response, for each set of AHRM parameters θij across six stimulus conditions over 9,600 trials. This computation was based on the average of five simulations, each comprising 300 repeated runs with the same AHRM parameters and a different trial sequence. 
The AHRM was used to generate trial-by-trial responses based on the set of parameters and the stimulus sequence, using the PyTensor library's scan function. Because the exact external noise image on each trial was not available, we obtained a cache of 1,200 expected 35-dimensional activations, E(φ, f|Sijt, Nijt), consisting of 100 random samples of each of the 12 combinations of 2 (context) × 3 (Gabor contrast) × 2 (Gabor orientation). In each of the 300 runs, the AHRM started with the same initial weights and no decision bias, generating orientation judgements in eight sessions with four blocks of 300 trials each. The contexts of the blocks were arranged as L–8R–8L–8R–6L–R or R–8L–8R–8L–6R–L, totaling 9,600 trials. Each block of 300 trials consisted of 50 trials in each of the 2 (congruency) × 3 (Gabor contrast) conditions, with a random permutation of the trial types. 
Each simulation took 1.6 seconds, compared with 22.2 seconds when using a for loop. Additionally, we averaged the likelihoods within each block of 300 trials, resulting in one likelihood per experimental condition per block for each set of parameters, that is, 64,000 likelihoods (one per parameter set) in each of the six conditions per block. 
In each block of the six experimental conditions, we computed the functional relationship between the likelihood and AHRM parameters using feature engineering and linear regression in the Scikit-learn library. Across blocks and experimental conditions, we established a total of 192 functional relationships. 
For each relationship, the 64 predictors \(\theta _{ij}^{\prime}\) included an intercept, the six AHRM parameters θij, and 57 additional features obtained through comprehensive feature engineering by exploring all 21 quadratic and 36 cubic terms created from the 6 AHRM parameters. For each block of trials t (note 1), the linear regression is expressed as:  
\begin{eqnarray}p\left( {t|{{\theta }_{ij}}} \right) = a\left( t \right) + \mathop \sum \limits_{k = 1}^{64} {{b}_{k}}\left( t \right)\theta _{ijk}^{\prime}. \quad \end{eqnarray}
(16)
 
The feature engineering and linear regression step took approximately half an hour. 
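The regression step can be sketched as follows. This illustration uses the full degree-3 monomial basis of the six parameters (84 features including the intercept) rather than the 64 selected predictors described in the paper, and plain least squares in place of the Scikit-learn pipeline; all names are hypothetical:

```python
import itertools
import numpy as np

def poly_features(theta):
    """Intercept plus all degree-1, -2, and -3 monomials of the parameters.
    For 6 parameters: 1 + 6 + 21 + 56 = 84 columns (the paper selected 64)."""
    theta = np.atleast_2d(theta)
    cols = [np.ones(theta.shape[0])]
    for deg in (1, 2, 3):
        for idx in itertools.combinations_with_replacement(range(theta.shape[1]), deg):
            cols.append(np.prod(theta[:, list(idx)], axis=1))
    return np.column_stack(cols)

def fit_surrogate(theta_grid, likelihoods):
    """Least-squares coefficients of Equation 16 for one block and condition."""
    coef, *_ = np.linalg.lstsq(poly_features(theta_grid), likelihoods, rcond=None)
    return coef

def predict_surrogate(coef, theta):
    """Approximate likelihood for arbitrary parameters within the grid range."""
    return poly_features(theta) @ coef
```

One such coefficient vector would be fit per block and condition, and the surrogate can then be evaluated at any parameter values inside the grid range, not only at grid points.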
The functional relationship for the six conditions in 32 blocks is encoded in a 192 × 64 matrix, where each row contains the 64 coefficients for one experimental condition in one block. This coefficient matrix allows us to calculate the likelihood function p(Rijt = 1|θij, Sijt, Nijt) for any AHRM parameters within the mesh grid range, not just the parameter sets in the mesh grid. Moreover, the likelihood functions are differentiable, which facilitates gradient-based inference in PyMC (Abril-Pla et al., 2023). 
Estimating the posterior distributions
We used the automatic differentiation variational inference (ADVI) method in the PyMC library to generate representative samples of the posterior distributions in the BIP and HB-AHRM. In this method, the variational posterior distribution is assumed to be a spherical Gaussian with no correlations between parameters and is fit to the true posterior distribution. The means and standard deviations of the variational posterior are referred to as variational parameters. 
We ran ADVI optimization for 300,000 iterations in the BIP and HB-AHRM to ensure good approximations of the posterior distributions. To generate representative samples of the posterior distribution in the BIP, we used the ADVI method to generate 100,000 samples for each subject i. Similarly, in the HB-AHRM, we computed 100,000 representative samples of the posterior distributions of µ (6 parameters), Σ (21 parameters), τik (6 × 13 = 78 parameters), ϕ (21 parameters), and θijk (6 × 13 = 78 parameters). A model is considered converged when the evidence lower bound stabilizes across iterations, indicating that the variational posterior has adequately approximated the true posterior distribution. 
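The mean-field idea underlying ADVI can be illustrated on a toy conjugate model, where the optimal variational parameters can be checked analytically. This sketch uses the analytic ELBO gradient for clarity, rather than the stochastic gradients and parameter transformations used by PyMC's ADVI; the model (a Gaussian mean with a standard normal prior) and function name are illustrative only:

```python
import numpy as np

def mean_field_vi(y, iters=2000, lr=0.01):
    """Fit q(theta) = N(m, s^2) to the posterior of the model
    theta ~ N(0, 1), y_i ~ N(theta, 1), by gradient ascent on the
    evidence lower bound (ELBO) over the variational parameters (m, log s)."""
    n = len(y)
    m, log_s = 0.0, 0.0                       # variational parameters
    for _ in range(iters):
        s2 = np.exp(2.0 * log_s)
        grad_m = np.sum(y - m) - m            # dELBO/dm
        grad_log_s = 1.0 - (n + 1) * s2       # dELBO/d(log s)
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, np.exp(log_s)

# The exact posterior is N(n * mean(y) / (n + 1), 1 / (n + 1)); because it is
# itself Gaussian, the spherical Gaussian variational family recovers it exactly.
```

When the true posterior is not Gaussian, or has correlated parameters, the factorized Gaussian is only an approximation, which is why convergence of the evidence lower bound is monitored in practice.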
Goodness of fit
We used the Watanabe–Akaike information criterion to compare the BIP and HB-AHRM fits. The Watanabe–Akaike information criterion quantifies the likelihood of the observed data based on the posterior distribution of model parameters while penalizing for model complexity (Watanabe & Opper, 2010). Additionally, we assessed the accuracy of model predictions with root mean square error (RMSE), the proportion of variance in the observed data explained by the model (R2), and the uncertainty of the parameter estimates and model predictions with estimated credible intervals. 
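The Watanabe–Akaike information criterion can be computed from the pointwise log-likelihoods of posterior draws. The following is a generic sketch, assuming a draws × observations log-likelihood matrix and the deviance scale (−2 × elpd), one common convention; it is not the authors' exact computation:

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale (-2 * elpd) from a matrix of pointwise
    log-likelihoods with shape (posterior draws, observations)."""
    n_draws = log_lik.shape[0]
    # lppd: log pointwise predictive density, computed stably per observation
    lppd = np.sum(np.logaddexp.reduce(log_lik, axis=0) - np.log(n_draws))
    # p_waic: effective number of parameters (posterior variance of log-lik)
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```

Lower values indicate a better trade-off between fit and complexity, which is how the BIP and HB-AHRM are compared below.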
The RMSE between the predicted and observed quantities is defined as:  
\begin{eqnarray}{\rm{RMSE}} = \sqrt {\sum {{{\left( {{{{\rm{y}}}_{\rm{m}}} - {{{{\rm{\hat{y}}}}}_{\rm{m}}}} \right)}}^2}/{\rm{M}}} , \quad \end{eqnarray}
(17)
where ym is the observed value, \({{{\rm{\hat{y}}}}_{\rm{m}}}\) is the predicted value, \({\rm{\bar{y}}}\) is the mean of all the observations (used in Equation 18), and M is the total number of observations. 
The proportion of variance accounted for, R2, is defined as:  
\begin{eqnarray}{{{\rm{R}}}^2} = 1 - \frac{{\sum {{{\left( {{{{\rm{y}}}_{\rm{m}}} - {{{{\rm{\hat{y}}}}}_{\rm{m}}}} \right)}}^2}/{\rm{M}}}}{{\sum {{{\left( {{{{\rm{y}}}_{\rm{m}}} - {\rm{\bar{y}}}} \right)}}^2}/{\rm{M}}}}. \quad \end{eqnarray}
(18)
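Both measures follow directly from the predicted and observed values; a minimal sketch of Equations 17 and 18:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between observed and predicted values (Equation 17)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r_squared(y, y_hat):
    """Proportion of variance accounted for by the model (Equation 18)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return 1.0 - np.mean((y - y_hat) ** 2) / np.mean((y - y.mean()) ** 2)
```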
 
Results
Likelihood function approximation
Figure 5 shows the predicted likelihood function of the AHRM for one set of parameters: α = 0.0008,  β = 1.8,  γ = 1.2,  σd = 0.2, σr = 0.1, and winit = 0.17. The model predictions exhibit characteristic patterns observed in Petrov et al. (2005): general learning, persistent switch costs, and rapid within-context relearning. 
Figure 5.
 
Predicted likelihood function of the augmented Hebbian reweighting model (AHRM) for one set of AHRM parameters (α = 0.0008,  β = 1.8,  γ = 1.2, σd = 0.2, σr = 0.1, winit = 0.17) from the simulations.
Figure 6 shows a scatter plot of the approximate likelihoods from feature engineering and linear regression against likelihoods generated from the AHRM across all 64,000 sets of AHRM parameters in the mesh grid, with an R2 of 0.991 and an RMSE of 0.016, indicating excellent approximation of the likelihoods by the linear model (Equation 16). 
Figure 6.
 
A scatter plot of the approximate likelihoods from feature engineering and linear regression against likelihoods generated from the augmented Hebbian reweighting model (AHRM).
BIP and HB-AHRM comparison
Both the BIP and HB-AHRM converged, as indicated by the stabilization of the evidence lower bound. The Watanabe–Akaike information criterion values of the BIP and HB-AHRM were −7,908.7 ± 92.5 and −8,754.1 ± 153.3, respectively, with a difference of −845.4 ± 179.0. The HB-AHRM provided a significantly better fit to the data. We will focus on the results from the HB-AHRM in the main body of the paper. Detailed results from the BIP can be found in Supplementary Materials A. 
Posterior distributions (HB-AHRM)
The marginal posterior distributions of the population-level hyperparameter η are depicted in Figure 7. The mean and 94% half-width credible interval (HWCI) of these distributions are listed in Table 4. For most η components, except η5 (representation noise), the HWCIs were quite small relative to their respective means. 
Figure 7.
 
Marginal posterior distributions of the population-level hyperparameter η.
Table 4.
 
Mean and 94% HWCI of the marginal distributions of η. HWCI, half-width credible interval.
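A half-width credible interval can be obtained from posterior samples as half the width of the narrowest interval containing 94% of the samples (the highest-density interval for a unimodal posterior). Whether a highest-density or central interval was used is not specified here, so this sketch shows one plausible reading:

```python
import numpy as np

def hwci(samples, prob=0.94):
    """Half width of the narrowest interval containing `prob` of the samples,
    i.e., the highest-density interval for a unimodal posterior."""
    x = np.sort(np.asarray(samples))
    k = int(np.ceil(prob * x.size))           # samples per candidate interval
    widths = x[k - 1:] - x[: x.size - k + 1]  # widths of all such intervals
    return widths.min() / 2.0
```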
The expected correlation matrix derived from the expected covariance hyperparameter Σ is shown in Table 5 (note 2). The expected between-subject correlations among η components were quite small, and none of them was significantly different from zero, owing to the relatively small range of performance variations across the 13 subjects. 
Table 5.
 
Expected correlation matrix at the population level.
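The conversion from an expected covariance matrix (e.g., Σ or ϕ) to the correlation matrices reported in Tables 5 and 7 is a standard normalization; a minimal sketch:

```python
import numpy as np

def corr_from_cov(cov):
    """Normalize a covariance matrix into the corresponding correlation matrix."""
    d = np.sqrt(np.diag(cov))          # component standard deviations
    return cov / np.outer(d, d)
```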
The marginal posterior distributions of the subject-level hyperparameter τi for a typical subject (i = 6) are depicted in Figure 8. The mean and 94% HWCI of these distributions are listed in Table 6. Compared with η, τ6 components exhibited greater uncertainties, a pattern that held across all subjects. The full table with the mean and HWCI of the marginal distributions for all 13 subjects is available in Supplementary Materials B. 
Figure 8.
 
Marginal posterior distributions of the subject-level hyperparameter τi for subject i = 6.
Table 6.
 
Mean and 94% HWCI of the marginal posterior distributions of τ6. HWCI, half-width credible interval.
The expected correlation matrix derived from the expected covariance hyperparameter Φ is shown in Table 7. The expected correlation between τi2 (bias strength) and τi4 (decision noise) was −0.201, and that between τi4 (decision noise) and τi6 (initial weight scaling factor) was 0.215; however, neither was significantly different from zero. The lack of significant within-subject correlations among τi components suggests that they are essentially independent. 
Table 7.
 
Expected correlation matrix at the subject level.
The marginal posterior distributions of the test-level parameter θi1 for subject 6 are depicted in Figure 9. The mean and 94% HWCI of these distributions are listed in Table 8. Compared with η and τ6, θ61 components exhibited much lower uncertainties because these test-level parameters are constrained directly by the experimental data. The pattern held across all subjects. The full table with the mean and HWCI of the test-level marginal distributions for all 13 subjects is available in Supplementary Materials B. 
Figure 9.
 
Marginal posterior distributions of the test-level parameter θi1 for subject i = 6.
Table 8.
 
Mean and 94% HWCI of the marginal posterior distributions of θ61. HWCI, half-width credible interval.
Predicted learning curves and weight structures (HB-AHRM)
Figure 10 depicts the observed and predicted population-level z-score learning curves in the incongruent and congruent conditions, as well as the derived d′ learning curves. For the z-score learning curves, the average RMSE was 0.173 ± 0.071 and 0.175 ± 0.090 and the average 94% HWCI was 0.064 ± 0.037 and 0.071 ± 0.001 in the congruent and incongruent conditions, respectively, with an R2 of 0.835. 
Figure 10.
 
Observed and predicted population-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c). Data points represent the average learning curves across all 13 subjects.
For the d′ learning curves, the average RMSE was 0.199 ± 0.074 and the average 94% HWCI was 0.105 ± 0.051, with an R2 of 0.887. In comparison, the AHRM with parameters from grid search in Petrov et al. (2005) accounted for 88.2% of the variance of the average d′ learning curve across all the subjects, with an RMSE of 0.209. The results suggest that the HB-AHRM provided a largely comparable solution to the original AHRM at the population level. A table that details the RMSE and 94% HWCI in each of the six experimental conditions is available in Supplementary Materials B. 
Figure 11 depicts population-level weight structure evolution over the course of training. Similar to Petrov et al. (2005), the weights for task-relevant channels (e.g., 2 c/d) increased over the course of training, while the weights for task-irrelevant channels (e.g., 4 c/d) stayed more or less the same. The HB-AHRM also allowed us to estimate the uncertainties of the weights. For the channels at 2 c/d, the average HWCI of the weights was 0.008 after 300 trials of training, 0.011 after 2,700 trials of training, and 0.011 after 9,300 trials of training. For the channels at 4 c/d, the average HWCI of the weights was 0.008 after 300 trials of training, 0.007 after 2,700 trials of training, and 0.006 after 9,300 trials of training. 
Figure 11.
 
Population-level weight structure evolution over the course of training. (Top) Longitudinal weight traces for units tuned to the target frequency (2.0 c/d) and an irrelevant frequency (4.0 c/d). Each trace corresponds to a particular orientation. (Middle and bottom) Cross-sections of the weights at 2.0 c/d, 2.8 c/d, and 4.0 c/d at the end of each epoch.
Figure 12 depicts the observed and predicted test-level z-score learning curves in the incongruent and congruent conditions, as well as the derived d′ learning curves for subject 6. For the z-score learning curves, the average RMSE was 0.423 ± 0.204 and 0.456 ± 0.227 and the average 94% HWCI was 0.035 ± 0.001 and 0.032 ± 0.015 in the congruent and incongruent conditions, respectively, with an R2 of 0.675. For the d′ learning curves, the average RMSE was 0.276 ± 0.071 and the average 94% HWCI was 0.031 ± 0.014, with an R2 of 0.744. A table that details the RMSE and 94% HWCI in each of the six experimental conditions for all the subjects is available in Supplementary Materials B. 
Figure 12.
 
Observed and predicted test-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c) for subject 6.
Figure 13 depicts test-level weight structure evolution over the course of training for subject 6. The pattern of results is very similar to that at the population level, although the quantitative weights were different. For the channels at 2 c/d, the average HWCI of the weights was 0.003 after 300 trials of training, 0.005 after 2,700 trials of training, and 0.006 after 9,300 trials of training. For the channels at 4 c/d, the average HWCI of the weights was 0.003 after 300 trials of training, 0.002 after 2,700 trials of training, and 0.001 after 9,300 trials of training. 
Figure 13.
 
Test-level weight structure evolution over the course of training for subject 6. (Top) Longitudinal weight traces for units tuned to the target frequency (2.0 c/d) and an irrelevant frequency (4.0 c/d). Each trace corresponds to a particular orientation. (Middle and bottom) Cross-sections of the weights at 2.0 c/d, 2.8 c/d, and 4.0 c/d at the end of each epoch.
Study 2: Simulations
In study 2, we conducted a simulation study to evaluate parameter recovery and the HB-AHRM's ability to predict the performance of a new simulated observer with no or limited training data. 
Methods
Simulated datasets
We created simulated dataset1 with 13 simulated observers to evaluate parameter recovery. For each simulated observer i, we set their AHRM parameters by drawing a random sample from the six-dimensional test-level HB-AHRM posterior distribution θi1 (see Supplementary Materials C for a table of all the parameters). We then created a single trial sequence with 9,600 trials based on Petrov et al. (2005) and simulated these observers' performance using the AHRM with the same trial sequence 300 times. Simulated dataset1, therefore, consisted of performance in six experimental conditions, averaged every 300 trials, from each of the 13 simulated observers from the 300 repeated simulations. The structure of the data was identical to that in Petrov et al. (2005). 
To evaluate the HB-AHRM's ability to predict the performance of a new simulated observer with no or limited training data, we created three additional simulated datasets by deleting some training data for a randomly selected observer (subject 13) in simulated dataset1, while keeping all the data from the other 12 simulated observers. Specifically, simulated dataset2, simulated dataset3, and simulated dataset4 comprised 9,600 trials of subjects 1 to 12 and 0 trials, the first 300 trials, and the first 2,700 trials of training data for subject 13, respectively. 
HB-AHRM fitting and statistical evaluation
We fit the HB-AHRM to each of the four simulated datasets and computed the predicted learning curves from the posterior distributions of the HB-AHRM hyperparameters and parameters. The procedure was identical to that of Study 1. 
For simulated dataset1, we evaluated parameter recovery by comparing the mean of the posterior distributions of the AHRM parameters with those of the simulated observers ("the truth") and computed the 94% HWCI for each of the parameters. We also evaluated the goodness of fit using RMSE and R2. 
For simulated dataset2, dataset3, and dataset4, we focused on the predicted learning curves of simulated observer 13 and compared them with the simulated learning curves of the same observer in simulated dataset1 (“the truth”). 
Results
Model recovery
As shown in Figures 14 and 15, the HB-AHRM provided excellent fits to simulated dataset1 at both the population and test levels. At the population level, for the z-score learning curves, the average RMSE was 0.014 ± 0.001 and 0.005 ± 0.002 and the average 94% HWCI was 0.037 ± 0.001 and 0.031 ± 0.015, in the congruent and incongruent conditions, respectively, with an R2 of 0.994. For the d′ learning curves, the average RMSE was 0.078 ± 0.038 and the average 94% HWCI was 0.052 ± 0.031, with an R2 of 0.981. A table that details the RMSE and 94% HWCI in each of the six experimental conditions is available in Supplementary Materials C. 
Figure 14.
 
Simulated and predicted population-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c). Data points represent the average simulated learning curves across all 13 simulated observers.
Figure 15.
 
Simulated and predicted test-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c) for simulated observer 13. Data points represent the simulated learning curves.
At the test level, for the z-score learning curves, the average RMSE was 0.010 ± 0.000 and 0.010 ± 0.000 and the average 94% HWCI was 0.034 ± 0.001 and 0.030 ± 0.018 for simulated observer 13 in the congruent and incongruent conditions, respectively, with an R2 of 0.994. For the d′ learning curves, the average RMSE was 0.068 ± 0.023 and the average 94% HWCI was 0.048 ± 0.028, with an R2 of 0.987. A table that details the RMSE and 94% HWCI in each of the six experimental conditions for all 13 observers is available in Supplementary Materials C. 
Figure 16 shows scatter plots of the recovered AHRM parameters against their true values in simulated dataset1. The RMSEs between the recovered and true AHRM parameters were 0.00019, 0.308, 0.061, 0.017, 0.002, and 0.011 for the learning rate (α), bias strength (β), activation function gain (γ), decision noise (σd), representation noise (σr), and initial weight scaling factor (winit), respectively. The 94% HWCIs for the recovered parameters were 9 × 10−5 ± 2 × 10−5, 0.091 ± 0.040, 0.012 ± 0.002, 0.005 ± 0.001, 7 × 10−17 ± 2 × 10−17, and 0.007 ± 0.001, respectively. For the learning rate, bias strength, decision noise, and initial weight scaling factor, the recovered parameters correlated well with their true values, with Pearson's correlation coefficients of 0.690, 0.956, 0.877, and 0.980, respectively. For the activation function gain, the true values ranged from 0.384 to 0.606, but the recovered values all fell within a narrow range between 0.459 and 0.497, suggesting that the model was not very sensitive to the activation function gain. For the representation noise, the true values were in a very narrow range (0.00366 to 0.00371), and the recovered values were also in a very narrow range (0.00567 to 0.00593), although with a slightly higher mean. This is because the representation noise was very small relative to the external noise in this experiment and did not have much impact on model performance. Overall, these results indicate that the HB-AHRM exhibited very good model recovery. 
Figure 16.
 
Scatter plots of the recovered augmented Hebbian reweighting model (AHRM) parameters against their true values in simulated dataset1. Each panel shows one AHRM parameter and each point represents one simulated observer. Error bars represent the 94% half-width credible interval (HWCI).
Model predictions
Figure 17 shows the predicted learning curves of subject 13 with no data, 300 trials of data, 2,700 trials of data, and 9,600 trials of data. We compared the predictions with the simulated performance of the subject in simulated dataset1. 
Figure 17.
 
Simulated and predicted z-score and d′ learning curves for subject 13 with no data (a), 300 trials of data (b), 2,700 trials of data (c), and 9,600 trials of data (d). Data points represent the simulated learning curves for the subject in simulated dataset 1.
With no data, for the z-score learning curves, the average RMSE was 0.011 ± 0.002 and 0.027 ± 0.009 and the average 94% HWCI was 0.146 ± 0.003 and 0.128 ± 0.068 in the congruent and incongruent conditions, respectively, with an R2 of 0.972. For the d′ learning curves, the average RMSE was 0.104 ± 0.007 and the average 94% HWCI was 0.193 ± 0.115, with an R2 of 0.960. 
With 300 trials of data, for the z-score learning curves, the average RMSE was 0.018 ± 0.006 and 0.025 ± 0.004 and the average 94% HWCI was 0.098 ± 0.002 and 0.088 ± 0.041 in the congruent and incongruent conditions, respectively, with an R2 of 0.968. For the d′ learning curves, the average RMSE was 0.132 ± 0.016 and the average 94% HWCI was 0.146 ± 0.078, with an R2 of 0.942. 
With 2,700 trials of data, for the z-score learning curves, the average RMSE was 0.012 ± 0.004 and 0.022 ± 0.006 and the average 94% HWCI was 0.062 ± 0.002 and 0.055 ± 0.022 in the congruent and incongruent conditions, respectively, with an R2 of 0.980. For the d′ learning curves, the average RMSE was 0.101 ± 0.017 and the average 94% HWCI was 0.084 ± 0.044, with an R2 of 0.966. 
Overall, the HB-AHRM made excellent predictions with no data, 300 trials of data, and 2,700 trials of data for simulated observer 13. A table that details the RMSE and 94% HWCI in each of the six experimental conditions is available in Supplementary Materials C. 
Discussion
In this study, we developed the HB-AHRM and new modeling technologies to address the challenges of fitting the AHRM, a very successful model of visual perceptual learning. A combination of feature engineering and linear regression provided a high-quality approximation of the likelihood function. This approach allowed us to drastically reduce the computation time for fitting the HB-AHRM, which would otherwise have required an estimated more than 125 days for the dataset in Petrov et al. (2005), a practically infeasible amount. The HB-AHRM produced significantly better fits than the BIP, enabling fitting at both the group level, with goodness of fit comparable to Petrov et al. (2005), and at the individual level. In simulation studies, the HB-AHRM, together with the new modeling technologies, demonstrated robust model recovery, accurately predicting simulated observer outcomes with minimal or no initial data. 
The AHRM generates trial-by-trial responses based on its parameters and the stimulus sequence. Because the exact external noise image for each trial in Petrov et al. (2005) was unavailable, we adopted a procedure similar to the original study, sampling from a cache of 1,200 expected activations. The impact of the mismatch with the exact stimulus sequence used in the experiment appeared to be minimal at the group level owing to cross-subject averaging, but it may have influenced the fits at the individual subject level because each subject in the original study used a unique random trial sequence. Therefore, it is crucial to accurately record the exact stimulus sequences in future studies. 
Many traditional statistical methods assume homogeneity or complete independence across subjects and tests. In contrast, hierarchical Bayesian modeling (HBM) integrates heterogeneous information across multiple levels, using Bayes' theorem to combine sub-models and probability distributions from all observed data in a study (Kruschke, 2014; Kruschke & Liddell, 2018; Rouder & Lu, 2005). This yields updated joint posterior distributions of hyperparameters and parameters, enhancing accuracy compared with methods that treat each individual independently (Gu et al., 2016; Kruschke, 2014). Previous studies, including this one, have shown that HBM provides more precise estimates of parameters of interest compared with traditional methods like the BIP, in estimating contrast sensitivity functions (Zhao, Lesmes, Hou, & Lu, 2021), visual acuity behavioral functions (Zhao et al., 2021), and learning curves in perceptual learning (Zhao, Liu, Dosher, & Lu, 2024a, Zhao, Liu, Dosher, & Lu, 2024b). 
Moreover, HBM offers a robust framework for predictions, treating to-be-predicted performance as missing data to compute their posterior distributions based on available information. Leveraging conditional dependencies across the hierarchy and between- and within-subject covariances, HBM facilitates constructing digital twins and making highly accurate and reasonably precise predictions (Zhao, Lesmes, Dorr, & Lu, 2024). 
This study develops a regression-based likelihood function approximation approach and demonstrates its application in fitting the HB-AHRM. Our approach is related to, but different from, the likelihood approximation networks developed by Fengler et al. (2021) in the context of the drift diffusion model. Likelihood approximation may have broader utility in fitting other stochastic models lacking analytic forms, such as the perceptual template model (Lu & Dosher, 2008), integrated reweighting theory (Dosher et al., 2013), and various response time models (Ratcliff, Smith, Brown, & McKoon, 2016), as well as complex models that require extensive computations to generate predictions, such as the population receptive field model in retinotopic map studies (Dumoulin & Wandell, 2008). In this particular application, feature engineering and linear regression were used to generate the approximate likelihood function. Alternatively, other methods, including non-linear neural networks and machine learning, could be used to derive the approximate likelihood function. Our preliminary investigation into this topic suggests that different approaches may be more effective for different models. For instance, using a tree to approximate the solution space, the qPRF algorithm can perform population receptive field modeling of blood oxygen level–dependent functional magnetic resonance imaging at speeds more than 1,000 times faster than the conventional systems designed to fit the model (Waz, Wang, & Lu, 2025). 
In conclusion, we successfully developed and implemented the HB-AHRM using newly developed modeling technologies. These advancements hold promise for widespread applications in fitting stochastic models. 
Acknowledgments
Supported by the National Eye Institute (EY017491). 
Data and code availability: The data will be available on https://alexpetrov.com/softw/, and the analysis code will be available upon reasonable request. 
Competing interests: Z.L.L. has intellectual and equity interests in Adaptive Sensory Technology, Inc. (San Diego, CA) and Jiangsu Juehua Medical Technology Co, LTD (Jiangsu, China). S.Y. and B.D. have no competing interests. 
Commercial relationships: none. 
Corresponding author: Zhong-Lin Lu. 
Address: Center for Neural Science, New York University, 4 Washington Place, New York, NY 10003, USA. 
Footnotes
1  Although the BIP and HBM are formulated to model trial-by-trial data, we use t to denote each block of 300 trials in the rest of the paper.
2  The HB model presented in this paper is based on Gaussian mixtures, where the covariance matrices, Σ and ϕ, represent the covariance structures of the Gaussian components. These covariance matrices are meaningful within the context of the Gaussian mixtures, as they interact with the distributions of the means of the components. They play a crucial role in predicting the performance of a new simulated observer, particularly when limited or no training data is available.
Figure 1.
 
The augmented Hebbian reweighting model (AHRM) with representation, bias, feedback, and decision units.
Figure 2.
 
(a) The Bayesian inference procedure (BIP). For a given subject i in test j with parameters θij, the likelihood p(Rijt) of obtaining response Rijt in trial t is computed from the augmented Hebbian reweighting model (AHRM). (b) The hierarchical Bayesian version of the AHRM (HB-AHRM) is a three-level HB model in which the population-level hyperparameter η is modeled as a mixture of Gaussian distributions with mean µ and covariance Σ, the subject-level hyperparameter τi is modeled as a mixture of Gaussian distributions with mean ρi and covariance ϕ, and the probability distribution of parameters θij is conditioned on τi.
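The per-trial likelihood in panel (a) can be sketched for a Bernoulli response model: given the model's predicted probability of a correct response on each trial, the log-likelihood of an observed response sequence is a sum of Bernoulli terms. The predicted probabilities and responses below are hypothetical stand-ins for the AHRM's trial-by-trial predictions.

```python
import numpy as np

# AHRM-style predicted probability of a correct response, per trial
# (made-up values that rise with practice):
p_correct = np.array([0.55, 0.60, 0.68, 0.72, 0.75])
# Observed responses: 1 = correct, 0 = error (hypothetical)
responses = np.array([1, 0, 1, 1, 1])

# Bernoulli log-likelihood of the response sequence
log_lik = np.sum(responses * np.log(p_correct)
                 + (1 - responses) * np.log(1 - p_correct))
```

In the BIP this quantity is evaluated independently per subject and test; in the HB-AHRM it enters at the bottom of the three-level hierarchy.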
Figure 3.
 
An illustration of left and right tilted Gabors in context L.
Figure 4.
 
Approximating the likelihood function with feature engineering and linear regression. The simulated learning curves show data for incongruent (top) and congruent (bottom) trials at three contrast levels (colors) over training blocks for each parameter combination.
Figure 5.
 
Predicted likelihood function of the augmented Hebbian reweighting model (AHRM) for one set of AHRM parameters (α = 0.0008, β = 1.8, γ = 1.2, σd = 0.2, σr = 0.1, winit = 0.17) from the simulations.
Figure 6.
 
A scatter plot of the approximate likelihoods from feature engineering and linear regression against likelihoods generated from the augmented Hebbian reweighting model (AHRM).
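The feature-engineering idea behind Figures 4 to 6 can be sketched with a toy problem: run a stochastic simulator over a parameter grid once, fit a linear regression on engineered features, and thereafter evaluate the cheap surrogate instead of re-simulating. The simulator, polynomial features, and grid below are illustrative stand-ins, not the AHRM or the paper's actual feature set.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_simulator(x):
    # Stand-in for an expensive stochastic model evaluated on a grid
    return 0.5 + 0.4 * np.tanh(3 * (x - 0.5)) + rng.normal(0, 0.01, x.shape)

grid = np.linspace(0, 1, 50)
y = noisy_simulator(grid)

# Feature engineering: polynomial basis [1, x, x^2, x^3]
X = np.vander(grid, 4, increasing=True)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def approx(x):
    # Fast surrogate: evaluate the fitted regression instead of the simulator
    return np.vander(np.atleast_1d(x), 4, increasing=True) @ coef
```

A scatter of `approx(grid)` against fresh simulator runs would resemble Figure 6: tight agreement up to simulation noise, at a tiny fraction of the cost.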
Figure 7.
 
Marginal posterior distributions of the population-level hyperparameter η.
Figure 8.
 
Marginal posterior distributions of the subject-level hyperparameter τi for subject i = 6.
Figure 9.
 
Marginal posterior distributions of the test-level parameter θi1 for subject i = 6.
Figure 10.
 
Observed and predicted population-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c). Data points represent the average learning curves across all 13 subjects.
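One standard equal-variance signal-detection convention for deriving a d′ curve from the two z-scored accuracies is sketched below, with congruent and incongruent trials playing the roles of the two stimulus classes so that response bias adds to one z-score what it subtracts from the other. Whether this matches the paper's exact derivation is an assumption, and the accuracies are hypothetical.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z-transform of proportion correct)
z = NormalDist().inv_cdf

p_congruent = 0.85     # proportion correct, congruent trials (hypothetical)
p_incongruent = 0.65   # proportion correct, incongruent trials (hypothetical)

# Under this convention, summing the two z-scores cancels the bias term
# and leaves an estimate of sensitivity:
d_prime = z(p_congruent) + z(p_incongruent)
```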
Figure 11.
 
Population-level weight structure evolution over the course of training. (Top) Longitudinal weight traces for units tuned to the target frequency (2.0 c/d) and an irrelevant frequency (4.0 c/d). Each trace corresponds to a particular orientation. (Middle and bottom) Cross-sections of the weights at 2.0 c/d, 2.8 c/d, and 4.0 c/d at the end of each epoch.
Figure 12.
 
Observed and predicted test-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c) for subject 6.
Figure 13.
 
Test-level weight structure evolution over the course of training for subject 6. (Top) Longitudinal weight traces for units tuned to the target frequency (2.0 c/d) and an irrelevant frequency (4.0 c/d). Each trace corresponds to a particular orientation. (Middle and bottom) Cross-sections of the weights at 2.0 c/d, 2.8 c/d, and 4.0 c/d at the end of each epoch.
Figure 14.
 
Simulated and predicted population-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c). Data points represent the average simulated learning curves across all 13 simulated observers.
Figure 15.
 
Simulated and predicted test-level z-score learning curves in the incongruent (a) and congruent (b) conditions, and the derived d′ learning curves (c) for simulated observer 13. Data points represent the simulated learning curves.
Figure 16.
 
Scatter plots of the recovered augmented Hebbian reweighting model (AHRM) parameters against their true values in simulated dataset 1. Each panel shows one AHRM parameter and each point represents one simulated observer. Error bars represent the 94% half-width credible interval (HWCI).
Figure 17.
 
Simulated and predicted z-score and d′ learning curves for subject 13 with no data (a), 300 trials of data (b), 2,700 trials of data (c), and 9,600 trials of data (d). Data points represent the simulated learning curves for the subject in simulated dataset 1.
Table 1.
 
AHRM parameters and their corresponding symbols in the BIP and HB-AHRM. AHRM, augmented Hebbian reweighting model; HB-AHRM, hierarchical Bayesian augmented Hebbian reweighting model.
Table 2.
 
Lower and upper bounds of the priors.
Table 3.
 
Mesh grid used in the simulation.
Table 4.
 
Mean and 94% HWCI of the marginal posterior distributions of η. HWCI, half-width credible interval.
Table 5.
 
Expected correlation matrix at the population level.
Table 6.
 
Mean and 94% HWCI of the marginal posterior distributions of τ6. HWCI, half-width credible interval.
Table 7.
 
Expected correlation matrix at the subject level.
Table 8.
 
Mean and 94% HWCI of the marginal posterior distributions of θ61. HWCI, half-width credible interval.