**Intuition suggests that increased viewing time should allow for the accumulation of more visual information, but scant support for this idea has been found in studies of voluntary averaging, where observers are asked to make decisions based on perceived average size. In this paper we examine the dynamics of information accrual in an orientation-averaging task. With orientation (unlike intensive dimensions such as size), it is relatively safe to use an item's physical value as an approximation for its average perceived value. We displayed arrays containing eight iso-eccentric Gabor patterns, and asked six trained psychophysical observers to compare their average orientation with that of probe stimuli that were visible before, during, or only after the presentation of the Gabor array. From the relationship between orientation variance and human performance, we obtained estimates of effective set size, i.e., the number of items that an ideal observer would need to assess in order to estimate average orientation as well as our human observers did. We found that display duration had only a modest influence on effective set size. It rose from an average of ∼2 for 0.1-s displays to an average of ∼3 for 3.3-s displays. These results suggest that the visual computation is neither purely serial nor purely parallel. Computations of this nature can be made with a hybrid process that takes a series of subsamples of a few elements at a time.**

However, when the external noise is much greater than the internal noise, the internal noise has a negligible effect, and efficiency can be computed from the ratio of human performance to that of an ideal observer.^{1} Efficiency can be denoted *M*/*N*, where *N* represents the number of items on display and *M* represents the *effective set size*, i.e., the number of these items an otherwise-ideal observer would need to examine in order to perform as well as a human observer in high levels of external noise. The relationship between external noise, internal noise, effective set size, and performance is described by the noisy, inefficient observer model, a mathematical expression for which is provided at the beginning of the Results section.

^{2} *M* = *N*^{*p*}, where 0.5 ≤ *p* ≤ 0.6. Later studies using small set sizes (*N* ≤ 8) seem to have produced results consistent with Dakin's (*M* ≈ 2–3), but the one later study that used *N* = 100 did not. Thus, Dakin's original study remains unique in its finding of large effective set sizes (*M* > 4) for the voluntary averaging of orientation.
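As a quick illustration of what the power-law relationship *M* = *N*^{*p*} implies at the two set sizes mentioned above, the following sketch evaluates it at the ends of the quoted range of *p*. The function name is hypothetical, and the printed values are consequences of the formula alone, not results from any of the studies cited.

```python
def effective_set_size(n_items: int, exponent: float) -> float:
    """Power-law effective set size, M = N**p (the relationship quoted in the text)."""
    return n_items ** exponent


for n_items in (8, 100):
    low = effective_set_size(n_items, 0.5)
    high = effective_set_size(n_items, 0.6)
    print(f"N = {n_items:3d}: M between {low:.2f} and {high:.2f}")
```

For *N* = 8 the formula gives values near 3, whereas for *N* = 100 it predicts effective set sizes of 10 or more, which is why a small-*N* result can agree with the power law while a large-*N* result does not.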

**Table 1**

^{3} to those of Tibber et al. (2015), in which each observer completed a mere 150 orientation-averaging trials. We must stress, however, that practice does not seem to be a sufficient criterion for high efficiency, because Solomon (2010) describes one professional psychophysicist who completed 2,000 trials, yet achieved an effective set size no greater than 1.

**Figure 1**


*E* = 1.7 degrees (making the center-to-center separation of Gabors 0.77*E*) and there were 48 pixels per degree. Each item in the array was a Gabor pattern. It was the product of a sinusoidal luminance grating and a Gaussian blob. The grating had a spatial frequency of 3 cycles per degree and random spatial phase. The blob had a space constant (i.e., standard deviation *σ*) of 0.25 degree of visual angle. Both grating and blob had maximum contrast. Spatial orientations were selected at random from a Wrapped Normal distribution with standard deviation, *σ*_{G}, either zero or 16°. The mean of this distribution (henceforth referred to as the “expected orientation”) was selected at random from a Uniform distribution over all orientations.
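The stimulus description above can be sketched in code. This is a minimal illustration, not the software used in the experiment: the pixel density, spatial frequency, and space constant come from the text, whereas the patch half-width, the 180° wrapping period, the orientation convention, and all function names are assumptions made for the example.

```python
import math
import random

PIXELS_PER_DEGREE = 48      # from the text
SPATIAL_FREQUENCY = 3.0     # cycles per degree, from the text
SPACE_CONSTANT = 0.25       # Gaussian sigma in degrees, from the text
PATCH_HALF_WIDTH = 0.75     # degrees; an assumed 3-sigma cut-off, not from the text


def sample_orientations(n_items, sigma_g, rng):
    """Draw n_items orientations (degrees, in [0, 180)) around a random expected orientation."""
    expected = rng.uniform(0.0, 180.0)
    # A normal deviate wrapped onto the orientation circle (assumed period: 180 degrees)
    orientations = [(expected + rng.gauss(0.0, sigma_g)) % 180.0 for _ in range(n_items)]
    return expected, orientations


def gabor_patch(orientation_deg, rng):
    """Return a square patch of contrast values in [-1, 1] as a list of rows."""
    theta = math.radians(orientation_deg)
    phase = rng.uniform(0.0, 2.0 * math.pi)   # random spatial phase
    half = int(PATCH_HALF_WIDTH * PIXELS_PER_DEGREE)
    patch = []
    for row in range(-half, half + 1):
        y = row / PIXELS_PER_DEGREE
        line = []
        for col in range(-half, half + 1):
            x = col / PIXELS_PER_DEGREE
            # Position along the direction of luminance modulation
            u = x * math.cos(theta) + y * math.sin(theta)
            grating = math.sin(2.0 * math.pi * SPATIAL_FREQUENCY * u + phase)
            envelope = math.exp(-(x * x + y * y) / (2.0 * SPACE_CONSTANT ** 2))
            line.append(grating * envelope)
        patch.append(line)
    return patch


rng = random.Random(1)
expected, orientations = sample_orientations(8, 16.0, rng)
patches = [gabor_patch(o, rng) for o in orientations]
```

Setting `sigma_g` to zero reproduces the no-external-noise condition, in which every item shares the expected orientation.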

*μ*|) between the probe and the expected orientation of the eight-item array was controlled by a QUEST staircase (Watson & Pelli, 1983) that was unique to each particular combination of display duration (0.1 s, 1.7 s, or 3.3 s), memory condition (probe before, at the same time as, or after the array), and level of external noise (*σ*_{G} = 0 or *σ*_{G} = 16°).^{4} The probe was clockwise or anticlockwise with equal probability (the observer's task was to decide which), and never greater than 37° from the array's expected mean. The staircases converged to 81%-correct thresholds. Different levels of external noise were interleaved within each block of trials, but display duration and memory condition were fixed, so as not to unduly handicap observers with uncertainty regarding stimulus dynamics. Each observer completed a minimum of either two blocks of 132 trials or three blocks of 88 trials in each of the nine conditions. (JAS and KAM completed a few more. QUEST was re-initialized at the beginning of each block.) Consequently, Figure 2 summarizes more than 14,256 trials.
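The trial count quoted above follows directly from the design. The short check below spells out the arithmetic; the variable names are illustrative only.

```python
observers = 6
durations = 3           # 0.1 s, 1.7 s, 3.3 s
memory_conditions = 3   # probe before, with, or after the array
conditions = durations * memory_conditions

# Either blocking scheme yields the same minimum per observer and condition
trials_per_condition = 2 * 132
assert trials_per_condition == 3 * 88

minimum_trials = observers * conditions * trials_per_condition
print(minimum_trials)  # 14256
```

Because two observers completed additional trials, the total summarized in Figure 2 exceeds this minimum.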

**Figure 2**


^{5} Specifically, we found the values of *σ*_{E} and *M* that maximized the joint likelihood of responses when the probability with which an observer responds “anticlockwise” is given by Equation 1, where Φ denotes the standard normal cumulative distribution function. We assumed that lapse rates with anticlockwise and clockwise probes would be similar, and adopted equal values of *γ* and *δ* for each observer, derived from the “easy” trials described in Footnote 4. Specifically, these values were 0.01 for observers JAS, JH, and TMP; and 0.04 for observers KAM, AJ, and CDC.

*M* represents the *effective set size*, i.e., the number of items an ideal observer would need to measure in order to estimate average orientations as well as our human observers. Consequently *M* ≤ *N*, where *N* denotes the number of items in each set. The only further constraint placed on the model was that *M* ≥ 1. When range effects, finger errors, invisible (or unattended) stimuli, and perverse response strategies are eliminated, discrimination must be based on at least one item.
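To make the fitting procedure concrete, here is a minimal sketch of a psychometric function and joint log-likelihood built from the quantities named above (*σ*_{E}, *σ*_{G}, *M*, *γ*, *δ*, and Φ). The functional form is an assumption: it takes the decision variable's variance to be *σ*_{E}² + *σ*_{G}²/*M* and treats positive probe offsets as favoring “anticlockwise” responses. It may differ in detail from the formula used in the study, and the function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_anticlockwise(mu, sigma_e, sigma_g, m, gamma=0.01, delta=0.01):
    """Assumed probability of an "anticlockwise" response for signed offset mu (degrees)."""
    # Assumed variance: equivalent noise plus external noise averaged over m items
    total_sd = math.sqrt(sigma_e ** 2 + sigma_g ** 2 / m)
    return gamma + (1.0 - gamma - delta) * phi(mu / total_sd)


def log_likelihood(trials, sigma_e, m, n_items=8, gamma=0.01, delta=0.01):
    """Joint log-likelihood of (mu, sigma_g, said_anticlockwise) trials."""
    if not 1.0 <= m <= n_items:
        raise ValueError("effective set size must satisfy 1 <= M <= N")
    total = 0.0
    for mu, sigma_g, said_anticlockwise in trials:
        p = p_anticlockwise(mu, sigma_e, sigma_g, m, gamma, delta)
        total += math.log(p if said_anticlockwise else 1.0 - p)
    return total


# Made-up trials for illustration only: (offset, external noise, response)
example_trials = [(4.0, 16.0, True), (-4.0, 16.0, False), (2.0, 0.0, True)]
print(log_likelihood(example_trials, sigma_e=3.0, m=2.0))
```

A maximum-likelihood fit would search over `sigma_e` and `m` (within the bounds 1 ≤ *M* ≤ *N*) for the pair that maximizes this quantity.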

*F* ratio of *F*(1, 107) = 66.6, *p* < 10^{−25}. The main effect of block number was nonsignificant, *F*(1, 3) = 0.4, *p* > 0.75. In other words, we found no effect of practice in our experiment. Maximum-likelihood fits of the noisy, inefficient observer model (Equation 1) appear in Figure 2. Mean values across observers (thick lines) indicate that equivalent noise decreases and effective set size increases as the array duration increases.

*σ*_{E} for each combination of six observers, three display durations, and three memory conditions, and there was a further free parameter that set the same effective set size *M* for each observer and condition. Against this model, we compared the fit of a 60-parameter model, which again had 54 *σ*_{E} parameters but now had six *M* parameters, one for each observer. Because they are nested, the model with more parameters will always fit at least as well as the more-restricted model. To determine whether the fit is significantly better, we calculate the statistic, *D*, given by

$$D = 2\ln\left(\frac{L_2}{L_1}\right),$$

where *L*_{2} is the likelihood of the best-fitting model with more parameters, and *L*_{1} is the likelihood of the more restricted model. If the model with more parameters is no better (i.e., the null hypothesis is true), then *D* is distributed approximately as the *χ*^{2} distribution with degrees of freedom given by the difference between the numbers of parameters in the two models (Mood, Graybill, & Boes, 1974, pp. 440–441). Therefore, a *χ*^{2} test indicates whether the less-restricted model is significantly better. For the comparison between the two models described previously, we have *p* = 0.0001, indicating that the six observers were not all equally efficient. Graphically, this can be appreciated by the scatter of thin lines in Figure 2b.
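The nested-model comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: it takes *D* to be twice the difference in maximized log-likelihoods (the standard likelihood-ratio statistic), implements the *χ*² upper-tail probability for integer degrees of freedom using only the standard library, and uses made-up log-likelihood values. All function names are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variate with integer dof."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over k < dof/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: complementary error function plus a finite series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(log_l_restricted, log_l_full, extra_parameters):
    """Return (D, p) for two nested models fitted by maximum likelihood."""
    d = 2.0 * (log_l_full - log_l_restricted)
    return d, chi2_sf(d, extra_parameters)


# Made-up log-likelihoods, for illustration only (not values from the study).
# The 60-parameter model has 60 - 55 = 5 more parameters than the baseline.
d, p = likelihood_ratio_test(-5010.0, -4997.0, extra_parameters=60 - 55)
print(d, p)
```

The same function applies to the later comparisons by changing `extra_parameters` to the relevant difference in parameter counts.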

*M* parameters, one for each combination of observer and display duration (giving 72 parameters in total). This model fit the data significantly better (*p* = 0.0006), indicating that efficiency did in fact increase with display duration (black line in Figure 2b). The second of the less-restricted models had 18 *M* parameters, one for each combination of observer and memory condition (again giving 72 parameters in total). This model did not fit significantly better than the 60-parameter baseline (*p* = 0.2), indicating that memory condition did not significantly affect efficiency.

*σ*_{E} parameters and 18 *M* parameters, one for each combination of observer and display condition), and the less-restricted model had 108 parameters (54 *σ*_{E} parameters and 54 *M* parameters, i.e., each combination of observer, display duration and memory condition had its own parameter for noise and efficiency). The less-restricted model did not fit significantly better (*p* = 0.999). Fits of the 108-parameter model are illustrated in Supplementary Figure S1. Constraints on the effective set-sizes are illustrated in Supplementary Figure S2.

*consistent* with effective set sizes greater than 1 at the shortest duration, but there were no poststimulus masks in this experiment. It is conceivable, therefore, that 0.125 s plus the duration of iconic memory (Sperling, 1960) provided enough time for a serial mechanism to utilize two items.

*t* = 0, a subsample of maximum size *m* is selected and its average value, the baseline *μ̂*_{1}, is computed efficiently but noisily. We assume that the value of each item in the subsample is independently perturbed by the same stochastic process. This process manifests as early noise in fits of the noisy, inefficient observer model. Size *m* is said to be a maximum because there may be fewer items available in the display. Some *τ* seconds later, another subsample is selected, and its mean *μ*_{2} is computed with the same precision. A new baseline is then formed from the sum *μ̂*_{2} = (1 – *p*_{2})*μ̂*_{1} + *p*_{2}*μ*_{2}. The baseline continues to be updated every *τ* seconds with newly selected subsamples until, at time *t* = *T*, the information disappears. For notational convenience we define *S* to be the total number of selected subsamples, i.e., *S* = ⌈*T*/*τ*⌉. The final value of the baseline *μ̂*_{S} = (1 – *p*_{S})*μ̂*_{S–1} + *p*_{S}*μ*_{S} is then perturbed by another stochastic process, which manifests as late noise in fits of the NIO.
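The update rule above lends itself to a short simulation. The sketch below is one possible reading of the Markovian subsampler, not the authors' implementation. It assumes that subsamples are drawn at random without replacement, that both noise sources are Gaussian, that the number of subsamples is the ceiling of *T*/*τ*, that orientations are expressed as small deviations from a reference so that linear averaging is adequate, and that the weights are *p*_{s} = 1/*s* (a running mean). None of these choices is specified in the text, and all names are hypothetical.

```python
import math
import random


def markovian_subsampler(orientations, m, tau, duration, early_sd, late_sd, rng,
                         weight=lambda s: 1.0 / s):
    """Return the final, late-noise-perturbed baseline for one display."""
    n_subsamples = max(1, math.ceil(duration / tau))   # S (assumed: ceiling of T / tau)
    size = min(m, len(orientations))                   # m is a maximum
    baseline = None
    for s in range(1, n_subsamples + 1):
        subsample = rng.sample(orientations, size)     # assumed random selection
        # Early noise: each selected item is perturbed independently
        noisy = [value + rng.gauss(0.0, early_sd) for value in subsample]
        mean_s = sum(noisy) / size
        if baseline is None:
            baseline = mean_s                          # first baseline
        else:
            p_s = weight(s)                            # assumed weight p_s
            baseline = (1.0 - p_s) * baseline + p_s * mean_s
    # Late noise perturbs the final baseline once
    return baseline + rng.gauss(0.0, late_sd)


rng = random.Random(0)
display = [rng.gauss(0.0, 16.0) for _ in range(8)]     # deviations from expected orientation
estimate = markovian_subsampler(display, m=2, tau=0.5, duration=3.3,
                                early_sd=2.0, late_sd=2.0, rng=rng)
```

Setting `m=1` gives a purely serial variant, while larger `m` with repeated draws gives a hybrid in which longer displays allow more subsamples to contribute to the baseline.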

*m* = 1, this Markovian subsampler will compute average orientation in a purely serial manner. Purely parallel computations require that *m* > 1 and all subsamples must be identical. Regardless of the manner in which subsamples are selected, the baseline will be continually updated. Consequently a decrease in the total equivalent noise is consistent with serial, parallel, and hybrid versions of the Markovian subsampler.

*Psychological Review*, 61, 183–193.

*Spatial Vision*, 10, 433–436.

*Journal of the Optical Society of America A – Optics, Image Science, and Vision*, 18 (5), 1016–1026.

*Vision Research*, 45, 3027–3049.

*Nature Neuroscience*, 14, 239–247.

*Introduction to the theory of statistics*. New York: McGraw-Hill.

*Proceedings of the Royal Society of London: Series B, Biological Sciences*, 279, 2754–2760.

*Journal of the Optical Society of America*, 54, 950–955.

*Journal of Experimental Psychology: Human Perception and Performance*, 19 (1), 108–130.

*Nature Neuroscience*, 4, 739–744.

*Nature Reviews Neuroscience*, 6 (2), 97–107.

*Vision: Coding and efficiency* (pp. 3–24). Cambridge, UK: Cambridge University Press.

*Spatial Vision*, 10, 427–442.

*Vision Research*, 46, 4646–4674.

*Journal of the Optical Society of America A, Optics, Image Science, and Vision*, 16, 647–653.

*Psychological Review*, 119, 807–830.

*Psychological Monographs*, 72, (whole no. 11).

*PLoS ONE*, 10 (2), e0117951, doi:10.1371/journal.pone.0117951.

*Perception and Psychophysics*, 33, 113–120.

*Spatial Vision*, 10, 447–466.

*Nature Reviews Neuroscience*, 5, 1–7.

^{1} In the literature on luminance and contrast processing, this proportion has been called “calculation efficiency,” “central efficiency,” “sampling efficiency,” and “high-noise efficiency” (Pelli & Farell, 1999). More recent literature (e.g., Pelli, Burns, Farell, & Moore-Page, 2006) favors the unmodified term “efficiency,” as does virtually all of the literature on orientation averaging, as summarized in Table 1.

^{2} Palmer, Ames, and Lindsey (1993) advocated a wide separation between search items and spatial precues for manipulating the “relevant” set size for visual search. We agree that inferences regarding efficiency are indeed less tenuous when these methods are applied. At the same time, it seems safe to assume that uncued items are unprocessed only when the separation is sufficient to eliminate crowding. This precaution is rarely taken in studies of visual search.