**We recently demonstrated that observers are capable of encoding not only summary statistics, such as the mean and variance of stimulus ensembles, but also the shape of the ensembles. Here, for the first time, we show the learning dynamics of this process, investigate possible priors for the distribution shape, and demonstrate that observers are able to learn more complex distributions, such as bimodal ones. We used the speeding and slowing of response times between trials (intertrial priming) in visual search for an oddly oriented line to assess internal models of distractor distributions. Experiment 1 demonstrates that two repetitions are sufficient to enable learning of the shape of uniform distractor distributions. In Experiment 2, we compared Gaussian and uniform distractor distributions and found that after only two repetitions, Gaussian distributions are represented differently from uniform ones. Experiment 3 further showed that when distractor distributions are bimodal (with a 30° distance between two uniform intervals), observers initially treat them as uniform, and only with further repetitions do they begin to treat the distributions as bimodal. In sum, observers do not have strong initial priors for distribution shapes; they quickly learn simple ones and can adjust their representations to more complex feature distributions as information accumulates with further repetitions of the same distractor distribution.**

*summary statistics*, such as the mean or variance of color, motion direction, size, orientation, etc. (Alvarez, 2011; Alvarez & Oliva, 2008; Ariely, 2001; Chong & Treisman, 2003; Haberman, Brady, & Alvarez, 2015; Haberman & Whitney, 2012; Rosenholtz, 2001; Utochkin, 2015). Even though stimulus sets used in these experiments usually consist of discrete objects, observers' ability to extract their aggregate properties suggests that, at some level, these objects are integrated into a unified representation. Moreover, the stability of summary statistics is no less important for efficient processing than the stability of individual features. For example, changes in mean or variance of stimulus features over a sequence of trials can increase response times (RTs) on visual tasks (Corbett & Melcher, 2014; Michael, de Gardelle, & Summerfield, 2014).

^{1}corresponding to the shape of the distractor distribution.

^{2}More specifically, the critical measure was RT as a function of the distance in feature space between the current target and the mean of the previous distractor distribution (CT − PD). If there were strong Gaussian priors for the distribution shape, we would expect that with short streaks the representation of the distribution would still reflect these priors—that is, RT as a function of CT − PD would change monotonically with no breaking points. In contrast, if the priors were nonexistent or weak compared to the empirical evidence gained by observers, even short streaks would result in a two-segment function with a flat first segment and a second segment with a negative slope.
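Computing CT − PD requires a distance in orientation space, where a line at 0° and a line at 180° are identical. A minimal Python sketch, assuming orientations in degrees with 180° periodicity and a signed difference wrapped to [−90°, 90°); the function names are hypothetical and not taken from the authors' analysis code:

```python
def orientation_difference(a_deg: float, b_deg: float) -> float:
    """Signed difference a - b between two line orientations, wrapped to [-90, 90)."""
    # Lines have no direction, so orientation space repeats every 180 degrees.
    return (a_deg - b_deg + 90.0) % 180.0 - 90.0


def ct_minus_pd(current_target_deg: float, previous_distractor_mean_deg: float) -> float:
    """CT - PD: current target relative to the mean of the previous distractor distribution."""
    return orientation_difference(current_target_deg, previous_distractor_mean_deg)


# A target at 10 deg after distractors centered on 170 deg lies 20 deg away.
print(ct_minus_pd(10.0, 170.0))  # 20.0
```

The segment analyses reported for Experiment 3 use distances from 0° to 90°, which would correspond to the absolute value of this signed quantity.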

*M* = 24.80) at St. Petersburg State University took part in two experimental sessions taking approximately 20–30 min each. This and the following experiments were run in accordance with the Declaration of Helsinki and were approved by the ethics committee of St. Petersburg State University.

*SD* = 10; values outside the ±20° range were removed. Although this is technically a "truncated Gaussian," we refer to it as Gaussian for brevity) with a streak length of one or two trials (Figure 3). Searching for targets among distractors drawn from such a Gaussian distribution is of intermediate difficulty for observers and was found to be a good candidate for testing priming effects in previous studies (Chetverikov et al., 2016). Each streak length was equally probable within each streak type (prime or test). Each observer participated in two sessions of 180 prime streaks, each lasting approximately 20–30 min (2,703 trials per observer on average). For each test streak, the mean of the prime distractor distribution differed from the mean of the test distractor distribution by −80° to +80° in 20° steps. The target in each prime and test streak was set randomly so that the target-to-distractor distance ranged from 60° to 120° and was constant within the streak (previous studies have shown that the results are similar regardless of whether target orientation is randomized on each trial; Chetverikov et al., 2016).
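The stimulus design above can be sketched in Python. This is an illustration under stated assumptions, not the authors' stimulus code: rejected Gaussian draws are assumed to be redrawn (the text says only that they were "removed"), the set size of 35 distractors is hypothetical, and all function names are hypothetical.

```python
import random

# Offsets between prime and test distribution means: -80 to +80 degrees in 20-degree steps.
PRIME_TEST_OFFSETS = list(range(-80, 81, 20))


def sample_distractors(mean_deg, n_distractors, sd=10.0, half_range=20.0, rng=random):
    """Orientations from a Gaussian (SD = sd) truncated at +/- half_range around mean_deg.

    Assumption: draws outside the window are rejected and redrawn.
    """
    out = []
    while len(out) < n_distractors:
        offset = rng.gauss(0.0, sd)
        if abs(offset) <= half_range:
            out.append((mean_deg + offset) % 180.0)  # wrap into [0, 180)
    return out


def sample_target(distractor_mean_deg, rng=random):
    """Target 60-120 degrees from the distractor mean (+120 is equivalent to -60 mod 180)."""
    return (distractor_mean_deg + rng.uniform(60.0, 120.0)) % 180.0


rng = random.Random(1)
test_mean = rng.uniform(0.0, 180.0)
prime_mean = (test_mean + rng.choice(PRIME_TEST_OFFSETS)) % 180.0
prime_display = sample_distractors(prime_mean, n_distractors=35, rng=rng)  # hypothetical set size
prime_target = sample_target(prime_mean, rng=rng)  # kept constant within the streak
```

Because orientation is 180°-periodic, drawing a single positive offset between 60° and 120° already covers targets on both sides of the distractor mean.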

^{3}In other words, RT as a function of CT − PD should also have two components. The presence of two components was tested with segmented regression, also known as piecewise or broken-line regression, using the *segmented* package in *R* (Muggeo, 2003, 2008). To test whether the two-segment model fits better than a one-segment model, we used the Davies test (provided in the same package), which tests for a nonzero difference in slope between the segments.
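The logic of the two-segment model can be illustrated with a small pure-Python sketch. It is not the algorithm of the *segmented* package, which estimates the breakpoint iteratively (Muggeo, 2003), and it does not implement the Davies test; it simply grid-searches a single breakpoint psi for the continuous broken-line model RT = b0 + b1·x + b2·max(0, x − psi), with x standing for CT − PD. The data and function names are hypothetical.

```python
def _solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_broken_line(xs, ys, candidates):
    """Least-squares fit of y = b0 + b1*x + b2*max(0, x - psi), grid-searching psi.

    Candidates must lie strictly inside the range of xs; otherwise the hinge
    column is all zeros and the normal equations are singular.
    """
    best = None
    for psi in candidates:
        rows = [(1.0, x, max(0.0, x - psi)) for x in xs]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        b = _solve3(xtx, xty)
        sse = sum((y - sum(bi * ri for bi, ri in zip(b, r))) ** 2 for r, y in zip(rows, ys))
        if best is None or sse < best[0]:
            best = (sse, psi, b)
    sse, psi, (b0, b1, b2) = best
    return {"breakpoint": psi, "slope_before": b1, "slope_after": b1 + b2, "sse": sse}


# Synthetic CT - PD data: flat up to 20 deg, then RT falls by 2 ms per degree.
xs = list(range(0, 91, 5))
ys = [600.0 if x <= 20 else 600.0 - 2.0 * (x - 20) for x in xs]
fit = fit_broken_line(xs, ys, candidates=range(10, 80, 5))
```

A flat first segment followed by a negative second slope is the pattern the text predicts when the previous distribution was learned as uniform; a formal comparison against a one-segment model still requires a test such as Davies'.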

*t* tests in the case of simple linear regression and Wald's *Z* for binomial regression. The *lme4* package in *R* (Bates, Maechler, Bolker, & Walker, 2014) does not report *p* values for *t* statistics from mixed-effects regression because of the indeterminacy surrounding the choice of appropriate degrees of freedom. However, absolute values of *t* and *Z* above two can be treated similarly to *p* < 0.05.
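With many observations, the *t* statistics from mixed-effects models are approximately standard normal, so a two-sided *p* value can be read off the normal tail. A short Python sketch of that normal approximation (not something lme4 computes; the function name is hypothetical) shows that |*Z*| = 2 corresponds to *p* ≈ 0.046, whereas a two-sided *p* of 0.005 would require |*Z*| ≈ 2.81:

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p value for a statistic that is standard normal under the null."""
    # P(|Z| >= z) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


for z in (1.96, 2.0, 2.58, 2.81):
    print(f"|Z| = {z:.2f} -> two-sided p = {two_sided_p_from_z(z):.4f}")
```

For *t* statistics with few degrees of freedom, the normal approximation understates *p*, so the rule of thumb is most appropriate for the large trial-level data sets analyzed here.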

*SD* = 10 (Table 1), both in RT, *t*(9.0) = 3.96, *p* = 0.003, and accuracy, *t*(9.0) = 5.27, *p* < 0.001.

*B* = 0.12 (0.01), *t* = 17.07, and less accurate, *B* = −0.34 (0.06), *Z* = −5.88, than on later trials. Search on the second trial was neither slower, *B* = −0.01 (0.01), *t* = −1.78, nor more accurate, *B* = −0.03 (0.07), *Z* = 0.41, than on later trials.^{4}

*B* = 0.67, 95% *CI* = [−1.36, 2.71] (values represent the slope and CI for untransformed search times; the actual analysis was done on log-transformed RTs and yielded the same results), and for the second part, the slope was significantly negative, *B* = −1.86, 95% *CI* = [−2.37, −1.36]. The Davies test confirmed that the difference in slopes between the two parts was significant, *p* = 0.001.

*B* = 0.01 (0.02), *t* = 0.70, and did not interact with previous streak length, *B* = −0.001 (0.003), *t* = 0.48. Outside the previous distractor distribution range, it was negative, *B* = −0.01 (0.01), *t* = −2.29, but also did not interact with previous streak length, *B* = −0.001 (0.001), *t* = −0.68.

*M* = 25.10) took part in three experimental sessions taking approximately 20–30 min each.

*SD* = 15 for a Gaussian with outliers outside ±2 × *SD* removed), whereas test distributions had a 40° range (*SD* = 10 for a Gaussian with outliers removed in the same way). Test distribution type was counterbalanced with prime distribution type and prime streak length. Each observer participated in three sessions of 208 prime and test streaks, each lasting approximately 20–30 min.

*F*(1, 9) = 57.98, *p* < 0.001, $\eta^2_G$ = 0.11, qualified by an interaction with distribution range, *F*(1, 9) = 18.03, *p* = 0.002, $\eta^2_G$ = 0.01. For accuracy, the effects of distribution type, *F*(1, 9) = 74.33, *p* < 0.001, $\eta^2_G$ = 0.17, and range, *F*(1, 9) = 18.45, *p* = 0.002, $\eta^2_G$ = 0.18, did not interact. As Table 2 shows, the interaction arises because RTs were similar for Gaussian distributions of different ranges but differed for uniform distributions of different ranges. A separate analysis of the Gaussian and uniform distributions with the larger range showed differences both in RT, *t*(9.0) = 7.06, *p* < 0.001, and accuracy, *t*(9.0) = 5.17, *p* < 0.001. Similarly, with the smaller range, the uniform distribution resulted in slower responses, *t*(9.0) = 6.54, *p* < 0.001, and lower accuracy, *t*(9.0) = 6.54, *p* < 0.001, than the Gaussian.

*B* = 0.14 (0.01), *t* = 19.84, for the uniform and *B* = 0.14 (0.01), *t* = 25.38, for the Gaussian distribution, and were less accurate, *B* = −0.39 (0.07), *Z* = −5.82, and *B* = −0.43 (0.08), *Z* = −5.48, respectively, on the first trial than on later trials. Responses on the second trial were also slower than on later trials for the Gaussian distribution, *B* = 0.02 (0.01), *t* = 3.69, and marginally slower for the uniform distribution, *B* = 0.01 (0.01), *t* = 1.88, but second-trial accuracy did not differ from that on later trials, *B* = 0.12 (0.09), *Z* = 1.35, and *B* = −0.07 (0.10), *Z* = −0.65, respectively (Figure 6).

*p* = 0.040). The slope of the first part did not significantly differ from zero, *B* = 0.40, 95% *CI* = [−2.64, 3.44], and for the second part, the slope was significantly negative, *B* = −1.43, 95% *CI* = [−1.73, −1.13]. In contrast, following the Gaussian distribution, no significant breaking point was found (Davies test *p* = 0.732). The results were similar for different test distribution shapes (Supplementary Figure S1).

*B* = 0.02 (0.01), *t* = 2.30. However, streak length did not affect the slopes for the uniform, *B* = 0.003 (0.014), *t* = 0.20, or the Gaussian distribution, *B* = 0.003 (0.013), *t* = 0.23. Interestingly, outside the previous distractor distribution range, slopes were shallower for longer than for shorter streaks following the uniform distribution, *B* = 0.01 (0.01), *t* = 2.21, but not following the Gaussian distribution, *B* = 0.01 (0.01), *t* = 1.81.

^{5}(three female, age *M* = 24.91) took part in two experimental sessions taking approximately 20–30 min each.

*t*(10.0) = 6.83, *p* < 0.001, and accuracy, *t*(10.0) = 6.39, *p* < 0.001 (Table 3). There was a significant quadratic effect of target-to-distractor distance for both the Gaussian, *B* = 0.76 (0.25), *t* = 3.02, and the bimodal, *B* = 5.27 (0.35), *t* = 15.19, distributions. The repetition effect was most noticeable after the first trial (Figure 8). Mixed-effects regression (LMER) with Helmert contrasts indicated that search on the first trial was slower, *B* = 0.15 (0.01), *t* = 22.52, and less accurate, *B* = −0.42 (0.05), *Z* = −7.97, than on later trials.
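Footnote 4 describes the Helmert contrasts used here as comparing each trial position with the average of the following trials in the streak. Note that R's `contr.helmert` uses the opposite ordering (each level against the mean of the preceding levels). The Python sketch below only builds the comparison weights and applies them to mean RTs; how the authors scaled the coding in their mixed-effects models is not stated here, so the sketch says nothing about the size of the reported *B* coefficients. The function names and RT values are hypothetical.

```python
from fractions import Fraction


def forward_helmert_weights(k):
    """k x (k-1) weights; column j compares level j with the mean of levels j+1 .. k-1."""
    weights = [[Fraction(0)] * (k - 1) for _ in range(k)]
    for j in range(k - 1):
        n_later = k - j - 1
        weights[j][j] = Fraction(1)
        for later in range(j + 1, k):
            weights[later][j] = Fraction(-1, n_later)
    return weights


def apply_contrasts(weights, level_means):
    """Value of each contrast: sum over levels of weight * level mean."""
    n_contrasts = len(weights[0])
    return [sum(weights[i][j] * level_means[i] for i in range(len(level_means)))
            for j in range(n_contrasts)]


# Hypothetical mean RTs (ms) for trial positions 1-4 within a streak.
means = [700, 640, 630, 620]
print(apply_contrasts(forward_helmert_weights(len(means)), means))
```

Each contrast pools all later trial positions, which is why the number of trials per comparison stays high until the last positions of the longest streaks.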

*p* = 0.868) in slope, we repeated the analysis assuming only one breaking point. The analysis revealed a breaking point at 14.0° (*p* < 0.001). There was a marginally positive slope for the first part, *B* = 3.33, 95% *CI* = [−0.05, 6.71], and for the second part, the slope was significantly negative, *B* = −1.38, 95% *CI* = [−1.66, −1.10]. We then ran a mixed-effects regression that included the effect of CT − PD, split into three segments (0° to 15°, 15° to 30°, and 30° to 90°), controlling for the distance between the previous target and the current distractor and for the accuracy of the preceding response. As in Experiment 1, targets far from the mean of the preceding distribution (>30°) were processed significantly faster, *B* = −0.03 (0.01), *t* = −2.66. Crucially, however, RTs in the 15° to 30° segment were slower than in the 0° to 15° segment, *B* = 0.02 (0.01), *t* = 2.15.
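The three-segment predictor can be sketched as follows. This reproduces only the coding of CT − PD into segments and a descriptive per-segment mean; the authors' model was a mixed-effects regression with additional covariates, which is not reproduced. Assigning boundary values (exactly 15° or 30°) to the higher segment is an assumption, and the function names and data are hypothetical.

```python
from statistics import mean


def ct_pd_segment(ct_minus_pd_deg: float) -> str:
    """Assign |CT - PD| (degrees, 0-90) to one of the three analysis segments."""
    d = abs(ct_minus_pd_deg)
    if d < 15.0:
        return "0-15"
    if d < 30.0:
        return "15-30"
    if d <= 90.0:
        return "30-90"
    raise ValueError("CT - PD must lie within +/-90 degrees")


def mean_rt_by_segment(trials):
    """trials: iterable of (ct_minus_pd_deg, rt_ms) pairs -> {segment: mean RT}."""
    groups = {}
    for ct_pd, rt in trials:
        groups.setdefault(ct_pd_segment(ct_pd), []).append(rt)
    return {segment: mean(rts) for segment, rts in groups.items()}


# Hypothetical trials: (CT - PD in degrees, RT in ms).
trials = [(5, 700), (-10, 720), (20, 760), (-25, 740), (45, 650), (80, 630)]
print(mean_rt_by_segment(trials))
```

In a regression, the segment labels would enter as a categorical predictor so that the 15°–30° segment can be contrasted with its neighbors, as in the comparisons reported here.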

*B* = −0.02 (0.03), *t* = −0.85. However, a significant interaction between the effect of segment and previous streak length indicated that for the longest streaks, RTs for the 15°–30° segment became slower than for the 0°–15° segment, *B* = 0.07 (0.03), *t* = 2.03 (see Figure 10, blue line). Pairwise comparisons confirmed that the difference between the 0°–15° and 15°–30° segments was significant for the longest streaks, *B* = 0.04 (0.02), *p* = 0.033, but not for the mid-length, *B* = 0.03 (0.02), *p* = 0.239, or the shortest streaks, *B* = −0.02 (0.03), *p* = 0.601. The difference between the 30°–90° and 15°–30° segments was already significant for the shortest streaks, *B* = −0.05 (0.02), *t* = −2.17, and did not interact with streak length, *B* = 0.03 (0.03), *t* = 1.17.

*between* the peaks of a previous bimodal distractor distribution (0°–15° in Figure 10) than when it falls outside its range (30°–90° in Figure 10). It is as if the part of feature space between the peaks were also occupied by distractors.
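To make the geometry concrete, here is a Python sketch of sampling from a bimodal distribution made of two uniform intervals separated by a 30° gap. The 15° width of each interval is an assumption inferred from the 15°–30° analysis segment, not a value stated in this section, and the function name and parameters are hypothetical. Under that assumption, no distractor is ever shown in the 0°–15° band, so slowing for targets there cannot come from distractors actually presented at those orientations.

```python
import random


def sample_bimodal(mean_deg, n, gap_deg=30.0, mode_width_deg=15.0, rng=random):
    """Orientations drawn uniformly from two intervals flanking an empty central gap.

    Each draw lies between gap_deg/2 and gap_deg/2 + mode_width_deg away from
    mean_deg, on a randomly chosen side (assumed geometry, see lead-in).
    """
    half_gap = gap_deg / 2.0
    out = []
    for _ in range(n):
        distance = half_gap + rng.uniform(0.0, mode_width_deg)
        side = rng.choice((-1.0, 1.0))
        out.append((mean_deg + side * distance) % 180.0)  # 180-degree periodicity
    return out


rng = random.Random(3)
display = sample_bimodal(90.0, 36, rng=rng)
# Unsigned distance of each distractor from the distribution center.
distances = [abs((x - 90.0 + 90.0) % 180.0 - 90.0) for x in display]
```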

*Trends in Cognitive Sciences*, 15(3), 122–131, doi:10.1016/j.tics.2011.01.003.

*Psychological Science*, 19(4), 392–398, doi:10.1111/j.1467-9280.2008.02098.x.

*Psychological Science*, 12(2), 157–162. Retrieved from http://pss.sagepub.com/content/12/2/157.short

*Vision Research*, 35(22), 3131–3144, doi:10.1016/0042-6989(95)00057-7.

*Journal of Vision*, 15(4):14, 1–15, doi:10.1167/15.4.14.

*Cognition*, 153, 196–210, doi:10.1016/j.cognition.2016.04.018.

*Vision Research*, 43(4), 393–404, doi:10.1016/S0042-6989(02)00596-5.

*Journal of Experimental Psychology: Human Perception and Performance*, 40(5), 1915–1925, doi:10.1037/a0037375.

*Vision Research*, 37(22), 3181–3192, doi:10.1016/S0042-6989(97)00133-8.

*Journal of Vision*, 15(9):10, 1–22, doi:10.1167/15.9.10.

*Nature Neuroscience*, 14(7), 926–932, doi:10.1038/nn.2831.

*Journal of Experimental Psychology: General*, 144(2), 432–446, doi:10.1037/xge0000053.

*From perception to consciousness: Searching with Anne Treisman* (pp. 339–349). Oxford, England: Oxford University Press.

*Visual Cognition*, 13(3), 324–362, doi:10.1080/13506280544000039.

*Attention, Perception & Psychophysics*, 72(1), 5–18, doi:10.3758/APP.72.1.5.

*Vision Research*, 48(10), 1217–1232, doi:10.1016/j.visres.2008.02.007.

*Psychonomic Bulletin & Review*, 15(2), 378–384, doi:10.3758/PBR.15.2.378.

*Vision Research*, 48(1), 30–41, doi:10.1016/j.visres.2007.10.009.

*Journal of Vision*, 13(3):14, 1–19, doi:10.1167/13.3.14.

*Memory & Cognition*, 22(6), 657–672. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7808275

*Visual Cognition*, 13(2), 202–222, doi:10.1080/13506280500277488.

*Proceedings of the National Academy of Sciences, USA*, 111(21), 7873–7878, doi:10.1073/pnas.1308674111.

*Statistics in Medicine*, 22, 3055–3071.

*R News*, 8(1), 20–25. Retrieved from http://cran.r-project.org/doc/Rnews/

*Journal of Neuroscience Methods*, 162(1–2), 8–13, doi:10.1016/j.jneumeth.2006.11.017.

*Psychological Bulletin*, 68(1), 29–46, doi:10.1037/h0024722.

*The Review of Black Political Economy*, 5(2), 5–18, doi:10.1007/BF02689368.

*Journal of Experimental Psychology: Human Perception and Performance*, 27(4), 985–999, doi:10.1037//0096-1523.27.4.985.

*Journal of Vision*, 15(4):8, 1–14, doi:10.1167/15.4.8.

*Acta Psychologica*, 146, 7–18, doi:10.1016/j.actpsy.2013.11.012.

*Journal of Experimental Psychology: Human Perception and Performance*, 42(7), 995–1007, doi:10.1037/xhp0000203.

*Perception & Psychophysics*, 67(2), 239–253, doi:10.3758/BF03206488.

*Journal of Neuroscience*, 33(10), 4415–4423, doi:10.1523/JNEUROSCI.4744-12.2013.

^{1}We did not test for observers' awareness of the difference between distributions, but previous studies (Atchley & Andersen, 1995; Dakin & Watt, 1997) show that observers perform poorly on tasks requiring explicit discrimination between distributions based on skewness or kurtosis. Hence, it is highly unlikely that observers have conscious access to representations of distributions in our studies.

^{2}How exactly physical feature space, perceptual space, and RTs map onto each other is unclear. We assume, however, that the relationship between them is monotonic: the lower the probability of distractors in a region of physical feature space, the lower the corresponding probability in perceptual space, and, consequently, the shorter the RTs for targets in that region.

^{3}We expect RTs to decrease monotonically outside the range of the previous distractor distribution, rather than to drop sharply at its edge and then remain uniformly fast, because previous results show that a sharp drop in distractor probability corresponds to a gradual decrease in RTs (Chetverikov et al., 2016). Whether this reflects the encoding of the distribution's borders or simply a noisy representation is not yet known.

^{4}The lack of an effect of the third and later repetitions is unlikely to result from the different number of trials per repetition. Figure 4 shows that the confidence intervals become only slightly larger with increasing trial number in a streak. Moreover, Helmert contrasts compare each trial with the average of the following trials within the streak; hence, the number of trials per comparison remains high. Only the last comparisons, e.g., the ninth trial (*N* = 879) versus the 10th and 11th (*N* = 835), may begin to suffer from comparatively lower power. We therefore do not think that our conclusions regarding the repetition effects can be explained by differences in the number of trials within different streaks. The comparisons of CT − PD effects across streak lengths are not affected by the different numbers of trials in the streaks: although there are more first trials than 11th trials in prime streaks, the numbers of streaks consisting of one trial and of 11 trials are balanced in the design.