Inferred mechanisms of learning, such as those involved in improvements resulting from perceptual training, depend on (and reflect) the functional forms that models of learning take. However, previous investigations of the functional forms of perceptual learning have been limited in ways that are incompatible with the known mechanisms of learning. For instance, previous work has overwhelmingly aggregated learning data across participants, across trials, or both. Here we approach the study of the functional form of perceptual learning at the by-person and by-trial levels at which the mechanisms of learning are expected to act. Each participant completed one of two visual perceptual learning tasks over the course of two days, with the first 75% of task performance using a single reference stimulus (i.e., “training”) and the last 25% using an orthogonal reference stimulus (to test generalization). Five learning functions, drawn from either the exponential or the power family, were fit to each participant's data. The exponential family was uniformly supported by Bayesian Information Criterion (BIC) model comparisons. The simplest exponential function was the best fit to learning on a texture oddball detection task, while a Weibull (augmented exponential) function tended to be the best fit to learning on a dot-motion discrimination task. The support for the exponential family corroborates previous by-person investigations of the functional form of learning, while the novel evidence supporting the Weibull learning model has implications for both the analysis and the mechanistic bases of learning.

*n* = 132, *M*_{age} = 19.0, *SD*_{age} = 0.7, 70 female, 72% White, 13% Asian, 15% other, multiple, or no response) from the University of Wisconsin–Madison Introduction to Psychology participant pool. All participants read and signed consent forms and were compensated with course credit. All procedures comply with the Declaration of Helsinki and were approved by the University of Wisconsin–Madison Institutional Review Board.

**TEfits** package (Cochrane, 2020; model code reported in the Supplementary Material). As in the original studies cited above, outcomes in texture detection were defined as thresholds, while outcomes in dot-motion were defined as d-prime. Learning functions are described in detail in the **TEfits** documentation. Note that in Ahissar and Hochstein (2000), texture detection thresholds were fit using the “Quick function,” an alternative parameterization of the Weibull function (Strasburger, 2001), whereas **TEfits** uses a numerically different but functionally equivalent parameterization of the same psychometric function (see the **TEfits** documentation).

**TEfits** threshold values are parameterized as the stimulus strength (i.e., stimulus onset asynchrony [SOA]) necessary to achieve 75% accuracy. For the dot-motion direction discrimination data, we first used the **TEfits** function **tef_acc2dprime** to calculate a by-trial d-prime from Gaussian kernel–weighted percent hits and false alarms (kernel half-width at half-max of two trials). Vectors of d-primes were estimated separately for each participant's training and generalization, and the d-primes in each vector were then fit using a least squares loss function (i.e., maximum likelihood under Gaussian error). We note that, while we did introduce some smoothing and loss of temporal precision by using this Gaussian kernel, in practice a two-trial half-width at half-max induces very little smoothing and is much smaller than any typical blockwise analysis of learning. Texture oddball detection data were likewise fit by maximum likelihood (i.e., minimizing the error of model predictions given the Bernoulli likelihood function). Maximum likelihood estimation used 2,000 randomly initialized parameter combinations, each followed by a Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization run, for each participant and each model, to increase the chances that the best-fitting parameter set was indeed the global likelihood maximum rather than a local maximum.
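The kernel-weighted d-prime computation can be sketched as follows. This is an illustrative Python stand-in for the described procedure, not the **tef_acc2dprime** R implementation; the function name and the rate-clipping bounds are our own choices:

```python
import numpy as np
from statistics import NormalDist

def kernel_dprime(is_signal, responded_signal, hwhm=2.0):
    """By-trial d-prime from Gaussian kernel-weighted hit and
    false-alarm rates (hwhm = kernel half-width at half-max, in trials)."""
    z = NormalDist().inv_cdf                      # probit transform
    sig = np.asarray(is_signal, dtype=float)
    resp = np.asarray(responded_signal, dtype=float)
    n = len(sig)
    sd = hwhm / np.sqrt(2.0 * np.log(2.0))        # HWHM -> Gaussian SD
    t = np.arange(n)
    out = np.empty(n)
    for i in range(n):
        w = np.exp(-0.5 * ((t - i) / sd) ** 2)    # weights centered on trial i
        hit = np.sum(w * sig * resp) / np.sum(w * sig)
        fa = np.sum(w * (1 - sig) * resp) / np.sum(w * (1 - sig))
        # keep rates off 0 and 1 so the probit stays finite
        hit = min(max(hit, 0.01), 0.99)
        fa = min(max(fa, 0.01), 0.99)
        out[i] = z(hit) - z(fa)
    return out
```

Because the Gaussian kernel has unbounded support, every trial contributes (with vanishing weight) to every estimate, so the denominators stay nonzero whenever both trial types occur somewhere in the session.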

In **TEfits**, using maximum likelihood estimation, all parameters are constrained to be within reasonable values (i.e., the likelihood function overwhelmingly penalizes unreasonable values). Most importantly, the start and asymptote parameters must fall within a liberal range of possibility: here, thresholds were limited to between 0 and 0.98 in the texture task (i.e., between zero and two times the maximum SOA), and d-prime was constrained to between 0 and 5 in the dot-motion task (corresponding to percent correct ranging from chance, 50%, to approximately 99.38%, or perfect accuracy with a 0.62% lapse rate; see Supplementary Material for model code). Restrictions on other parameters (e.g., rate) should have less influence on the outcome of the fitting, with the primary precluded outcome being that “learning” cannot happen in an extremely small number of trials (e.g., half of learning in two trials). This restriction on rate parameters assists in the estimation of nonlinear functions by reducing the ability of parameter combinations to imitate one another (i.e., if performance changes very little, this should not be fit as reaching asymptote in two trials with a potentially large difference between start and asymptote; instead, it should take the form of starting performance being nearly identical to asymptotic performance).
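As a quick check on the quoted accuracy range: under the standard equal-variance signal detection model with an unbiased observer, accuracy is Φ(d′/2), which recovers the stated bounds (a worked example assuming that mapping, not **TEfits** code):

```python
from statistics import NormalDist

def accuracy_from_dprime(dprime):
    """Percent correct for an unbiased observer under the standard
    equal-variance Gaussian signal detection model: Phi(d'/2)."""
    return NormalDist().cdf(dprime / 2.0)

chance = accuracy_from_dprime(0.0)    # d' = 0 -> 0.5 (chance)
ceiling = accuracy_from_dprime(5.0)   # d' = 5 -> ~0.9938
```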

*except* for initial ability and rate of learning. Start and rate parameters were estimated as varying between training and generalization, thereby allowing for tests of generalization. Note that, in the Results, we also report comparisons of fits to only the initial training. Representative learning curves for each function can be seen in the Results. After obtaining a Bayesian Information Criterion (BIC) value for each model for each participant (optimized using **TEfits**), model comparisons were conducted by first normalizing model BICs within participants to extract Schwarz weights (Wagenmakers & Farrell, 2004). Schwarz weights sum to 1 within participants, with the highest weight indicating the best-fitting model.
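The normalization of BICs into Schwarz weights takes the same form as Akaike weights (Wagenmakers & Farrell, 2004); a minimal Python sketch (illustrative, not the **TEfits** code):

```python
import numpy as np

def schwarz_weights(bics):
    """Normalize a participant's per-model BICs into Schwarz weights:
    w_i = exp(-dBIC_i / 2) / sum_j exp(-dBIC_j / 2). Weights sum to 1,
    and the largest weight marks the best-fitting model."""
    delta = np.asarray(bics, dtype=float) - np.min(bics)  # BIC differences
    rel = np.exp(-0.5 * delta)                            # relative evidence
    return rel / rel.sum()

w = schwarz_weights([1000.0, 1002.0, 1010.0])  # first model fits best
```

Subtracting the minimum BIC before exponentiating is numerically important: raw BICs in the hundreds would underflow `exp(-0.5 * bic)` to zero.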

**tef_rlm_boot** from the **TEfits** package). These models fit robust linear models, using the **R** package **MASS** (Venables & Ripley, 2002), to each of 2,000 data sets resampled with replacement to inform the 95% CI. Furthermore, the models fit robust linear models to each of 2,000 subsets of data (a random 80% sampled without replacement) and used each fit model to predict the held-out 20% of data. The median proportional reduction in least squares error in the held-out data was reported as Δ*R*^{2}_{oos}.
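The resampling scheme can be sketched as follows. This is a Python illustration of the procedure as described, not the **tef_rlm_boot** implementation: ordinary least squares stands in for the robust **MASS** fit, the intercept-only baseline for the error reduction is our assumption, and all names are ours:

```python
import numpy as np

def median_oos_r2(x, y, n_splits=2000, train_frac=0.8, seed=1):
    """Repeatedly fit a line to a random 80% of the data and score the
    proportional reduction in squared error on the held-out 20%,
    relative to predicting the held-out mean; return the median."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n_train = int(train_frac * len(x))
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        tr, te = idx[:n_train], idx[n_train:]
        slope, intercept = np.polyfit(x[tr], y[tr], 1)   # OLS line fit
        ss_model = np.sum((y[te] - (slope * x[te] + intercept)) ** 2)
        ss_null = np.sum((y[te] - np.mean(y[te])) ** 2)  # baseline error
        scores.append(1.0 - ss_model / ss_null)
    return float(np.median(scores))
```

Negative values (as in several results below) indicate that the fitted predictor generalized worse to held-out data than the baseline did.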

*b* = −0.023, CI [−0.13, 0.065], Δ*R*^{2}_{oos} = −0.0157).

*b* = −0.06, CI [−0.43, 0.23], Δ*R*^{2}_{oos} = −0.017). Of the participants whose learning was best characterized by the Weibull function, 93.1% had shape parameters over 0, indicating that the benefits of the extra parameter derived from fitting learning with a slow start (i.e., a sigmoid shape and increasing hazard rate, as shown in Figure 4). Nearly all participants who were not best fit by the Weibull function were instead best fit by the three-parameter exponential function (25.6% of all participants), which is equivalent to a Weibull with a shape parameter of 0.
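One way a shape parameter of 0 recovers the three-parameter exponential is if the shape enters the curve as an exponent e^shape. The parameterization below is illustrative and consistent with the text, but see the **TEfits** documentation for the exact form used here:

```python
import numpy as np

def learning_curve(t, start, asym, rate, shape=0.0):
    """Illustrative time-evolving Weibull learning curve:
    perf(t) = asym + (start - asym) * 2 ** (-(t / rate) ** exp(shape)).
    With shape = 0 the exponent is e**0 = 1 and the curve reduces to the
    three-parameter exponential, where rate is the number of trials
    needed to reach half of the total change."""
    t = np.asarray(t, dtype=float)
    return asym + (start - asym) * 2.0 ** (-(t / rate) ** np.exp(shape))

trials = np.arange(1, 101)
expo = learning_curve(trials, start=0.5, asym=2.5, rate=20.0)             # exponential
slow = learning_curve(trials, start=0.5, asym=2.5, rate=20.0, shape=0.7)  # slow start
```

With a positive shape, early trials stay close to the starting level before improvement accelerates, producing the sigmoid, increasing-hazard profile described above.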

_{upper} = −0.02), while the difficult group did not demonstrate reliably lower thresholds at the start of generalization (mean = −0.02, CI_{upper} = 0.1; see Figure 7). No reliable pattern of generalization due to a decrease in the time taken to learn was evident in either difficulty condition (rate parameter difference: *easy* mean = −0.48, CI_{upper} = 0.8; *hard* mean = −0.4, CI_{upper} = 0.9). Thus, the only evidence for any generalization was in the form of immediate transfer.

_{lower} = 0.09; see Figure 7). In contrast, starting d-prime was no higher in generalization than in initial learning for participants in the difficult condition (mean = −0.11, CI_{lower} = −0.64). The easier condition, but not the difficult condition, demonstrated learning to learn in generalization via decreases in the time taken to learn (rate parameter difference: *easy* mean = −0.75, CI_{upper} = −0.01; *hard* mean = 0.13, CI_{upper} = 1.33). However, due to concerns that the Weibull rate parameters may be biased by certain patterns in the Weibull shape parameters (shared across initial learning and generalization), as well as the minority support for the three-parameter exponential model in the motion discrimination task, we also tested possible dot-motion generalization in the rate parameter of the three-parameter exponential. In these analyses, we observed null effects when comparing generalization and learning rates in the three-parameter exponential model (rate parameter difference: *easy* mean = 0.00, CI_{upper} = 0.91; *hard* mean = 0.56, CI_{upper} = 1.75). As a whole, then, there was greater support for an immediate-transfer mechanism of generalization than for one involving learning to learn.

*b* = 0.11, CI [0.003, 0.2], Δ*R*^{2}_{oos} = 0.0988). However, this effect was lost when the generalization benefit was calculated as a proportion of total learning (*b* = −0.12, CI [−0.9, 0.63], Δ*R*^{2}_{oos} = −0.0122). Likewise, when examining dot-motion direction discrimination parameters directly, the easy condition generalized more than the difficult condition (*b* = −1.1, CI [−2, −0.24], Δ*R*^{2}_{oos} = 0.2106). When calculating dot-motion generalization benefit as a proportion of learning, there was no difference between difficulties (*b* = 0.03, CI [−0.77, 0.69], Δ*R*^{2}_{oos} = −0.0116).

*p*(*hit*) = *p*(*correct rejection*)), and by-trial accuracies were simulated from a Bernoulli distribution with these probabilities and randomly distributed “same-stimuli” and “different-stimuli” trials.
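That simulation step can be sketched as follows (a hypothetical Python stand-in for the described procedure; the function name is ours):

```python
import numpy as np

def simulate_accuracy(p_correct, seed=0):
    """Simulate by-trial accuracy: randomly assign 'same-stimuli' vs
    'different-stimuli' trial types, then draw each trial's accuracy
    from a Bernoulli distribution with that trial's probability correct
    (here p(hit) = p(correct rejection), so both types share it)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p_correct, dtype=float)
    is_same = rng.integers(0, 2, size=len(p)).astype(bool)  # trial types
    correct = (rng.random(len(p)) < p).astype(int)          # Bernoulli draws
    return is_same, correct
```

Passing a vector of per-trial probabilities (e.g., one traced out by a candidate learning curve) yields simulated binary accuracy data with that curve's expected trajectory.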

*overall amount* of learning in the easy condition was greater than in the difficult condition means that the definition of generalization itself is not clear a priori. Although the benefits of the easy condition relative to the difficult condition were clear when considering only starting parameters (keeping in mind that the two curves shared an asymptote), there was no such effect when considering the overall amount learned in the different difficulty conditions.

*Proceedings of the National Academy of Sciences,* 90(12), 5718–5722, https://doi.org/10.1073/pnas.90.12.5718.

*Nature,* 387(6631), 401–406, https://doi.org/10.1038/387401a0.

*Vision Research,* 40(10–12), 1349–1364, https://doi.org/10.1016/S0042-6989(00)00002-X.

*Trends in Cognitive Sciences,* 8(10), 457–464, https://doi.org/10.1016/j.tics.2004.08.011.

*Philosophical Transactions of the Royal Society B: Biological Sciences,* 364(1515), 285–299, https://doi.org/10.1098/rstb.2008.0253.

*Journal of Experimental Psychology: Learning, Memory, and Cognition,* 25(5), 1120–1136.

*Vision Research,* 27(6), 953–965, https://doi.org/10.1016/0042-6989(87)90011-3.

*Journal of Neuroscience,* 30(45), 14964–14971, https://doi.org/10.1523/JNEUROSCI.4812-10.2010.

*Spatial Vision,* 10, 433–436.

*Learning & Memory,* 2(5), 225–242, https://doi.org/10.1101/lm.2.5.225.

*Journal of Open Source Software,* 5(52), 2535, https://doi.org/10.21105/joss.02535.

*Attention, Perception, & Psychophysics,* 81(3), 621–636, https://doi.org/10.3758/s13414-018-01636-w.

*2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)* (pp. 5349–5357), https://doi.org/10.1109/CVPR.2017.568.

*Ergonomics,* 2(2), 153–166, https://doi.org/10.1080/00140135908930419.

*Psychonomic Bulletin & Review,* 22(5), 1308–1319, https://doi.org/10.3758/s13423-015-0811-x.

*Psychological Science,* 18(6), 531–539, https://doi.org/10.1111/j.1467-9280.2007.01934.x.

*Current Opinion in Neurobiology,* 15(2), 154–160, https://doi.org/10.1016/j.conb.2005.03.010.

*Nature,* 287(5777), 43–44, https://doi.org/10.1038/287043a0.

*Proceedings of the National Academy of Sciences,* 101(36), 13124–13131, https://doi.org/10.1073/pnas.0404965101.

*Psychonomic Bulletin & Review,* 7(2), 185–207.

*Journal of Neuroscience,* 34(25), 8423–8431, https://doi.org/10.1523/JNEUROSCI.0745-14.2014.

*Journal of Vision,* 9(3), 1, https://doi.org/10.1167/9.3.1.

*Current Biology,* 27(6), 840–846, https://doi.org/10.1016/j.cub.2017.01.046.

*Journal of Vision,* 17(11), 3, https://doi.org/10.1167/17.11.3.

*Perception 36 ECVP Abstract Supplement,* 36, 1–16.

*Vision Research,* 33(16), 2287–2300, https://doi.org/10.1016/0042-6989(93)90106-7.

*Nature Neuroscience,* 12(5), 655–663, https://doi.org/10.1038/nn.2304.

*Journal of Mathematical Psychology,* 54(3), 338–340, https://doi.org/10.1016/j.jmp.2010.01.006.

*Scientific Reports,* 7(1), 7421, https://doi.org/10.1038/s41598-017-06989-0.

*Proceedings of the National Academy of Sciences,* 96(24), 14085–14087, https://doi.org/10.1073/pnas.96.24.14085.

*Nature Neuroscience,* 5(7), 677–681, https://doi.org/10.1038/nn864.

*Cognitive skills and their acquisition* (pp. 1–51). Hillsdale, NJ: Lawrence Erlbaum.

*Psychological Review,* 108(1), 57–82.

*Human Movement Science,* 28(6), 655–687, https://doi.org/10.1016/j.humov.2009.07.001.

*Science,* 256(5059), 1018–1021.

*NeuroImage,* 171, 135–147, https://doi.org/10.1016/j.neuroimage.2017.12.093.

*Neuron,* 109(4), 597–610.e6, https://doi.org/10.1016/j.neuron.2020.12.004.

*Annals of the New York Academy of Sciences,* 1316(1), 18–28, https://doi.org/10.1111/nyas.12419.

*Journal of Applied Psychology,* 10(1), 1–36, https://doi.org/10.1037/h0075814.

*Perception & Psychophysics,* 63(8), 1348–1355, https://doi.org/10.3758/BF03194547.

*Journal of Motor Behavior,* 39(6), 503–515, https://doi.org/10.3200/JMBR.39.6.503-516.

*Journal of Experimental Psychology: Learning, Memory, and Cognition,* 42(5), 749–767, https://doi.org/10.1037/xlm0000204.

*Modern applied statistics with S*. New York, NY: Springer, https://doi.org/10.1007/978-0-387-21706-2.

*Psychonomic Bulletin & Review,* 11(1), 192–196, https://doi.org/10.3758/BF03206482.

*Journal of Vision,* 13(7), 5, https://doi.org/10.1167/13.7.5.

*Journal of Vision,* 19(5), 9, https://doi.org/10.1167/19.5.9.

*Journal of Vision,* 19(7), 14, https://doi.org/10.1167/19.7.14.