How well do classification images characterize human observers’ strategies in perceptual tasks? We show mathematically that from the classification image of a noisy linear observer, it is possible to recover the observer’s absolute efficiency. If we could similarly predict human observers’ performance from their classification images, this would suggest that the linear model that underlies use of the classification image method is adequate over the small range of stimuli typically encountered in a classification image experiment, and that a classification image captures most important aspects of human observers’ performance over this range. In a contrast discrimination task and in a shape discrimination task, we found that observers’ absolute efficiencies were generally well predicted by their classification images, although they were consistently slightly (∼13%) higher than predicted. We consider whether a number of plausible nonlinearities can account for the slight underprediction, and of these we find that only a form of phase uncertainty can account for the discrepancy.

*A* and *B*, in white Gaussian noise, and to calculate the classification image as follows:

*A*, and the observer responds “*B*”). The classification image is a weighted sum of noise fields over all trials where the observer responds “*A*,” minus a weighted sum of noise fields over all trials where the observer responds “*B*,” so it shows which first-order features in the noise bias the observer toward one or the other response (Ahumada, 1996; Ahumada & Lovell, 1971; Gold, Murray, Bennett, & Sekuler, 2000; Sekuler, Gaspar, Gold, & Bennett, 2004).
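The calculation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: it assumes the common scheme in which the mean noise field in each of the four stimulus-response classes is combined with a sign set by the response, which may differ from the exact weights used in the original calculation. The function name and the ±1 coding of "A" and "B" are assumptions made for the example.

```python
import numpy as np

def classification_image(noise, signal, response):
    """Combine per-class mean noise fields into a classification image.

    noise:    array of shape (n_trials, height, width), one noise field per trial
    signal:   array of +1 (signal A) or -1 (signal B), one entry per trial
    response: array of +1 (response "A") or -1 (response "B"), one entry per trial

    Assumption: each stimulus-response class contributes its mean noise field,
    added for "A" responses and subtracted for "B" responses.
    """
    noise = np.asarray(noise, dtype=float)
    signal = np.asarray(signal)
    response = np.asarray(response)
    image = np.zeros(noise.shape[1:])
    for s in (+1, -1):
        for r in (+1, -1):
            in_class = (signal == s) & (response == r)
            if in_class.any():  # skip empty classes rather than divide by zero
                image += r * noise[in_class].mean(axis=0)
    return image
```

With this sign convention, positive pixels are those where added noise contrast pushed the observer toward responding "A", and negative pixels are those that pushed toward "B".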

*A* and *B*, possibly in an external noise field *N*, by adding an internal noise *Z*, cross-correlating the corrupted stimulus with a template *T*, and responding “*A*” if the resulting decision variable exceeds a criterion *a*, and responding “*B*” otherwise. Introducing the symbol *I*_{{A,B}} for the signal that is randomly chosen as *A* or *B*, using ⊗ to denote cross-correlation (i.e., $f \otimes g = \sum_{x,y} f(x,y)\,g(x,y)$), and coding the observer’s responses as *R* = ±1, we can describe the linear observer model as follows:
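A minimal simulation of the linear observer described here may help make the model concrete. The sketch below assumes white Gaussian external and internal noise added pixelwise, equal prior probability for the two signals, and the coding *R* = +1 for an "A" response; the function and parameter names are hypothetical.

```python
import numpy as np

def linear_observer_responses(signal_a, signal_b, template, sigma_ext,
                              sigma_int, criterion, n_trials, rng):
    """Simulate a noisy linear observer on n_trials two-alternative trials.

    Returns (is_a, ext_noise, responses), where responses are coded +1 ("A")
    or -1 ("B"). The +1/-1 assignment is an assumption of this sketch.
    """
    # Choose the signal I_{A,B} at random on each trial.
    is_a = rng.random(n_trials) < 0.5
    signals = np.where(is_a[:, None, None], signal_a, signal_b)
    # External noise N and internal noise Z, both white and Gaussian.
    ext_noise = rng.normal(0.0, sigma_ext, size=signals.shape)
    int_noise = rng.normal(0.0, sigma_int, size=signals.shape)
    # Decision variable: (I + N + Z) cross-correlated with the template T.
    decision = ((signals + ext_noise + int_noise) * template).sum(axis=(1, 2))
    # Respond "A" when the decision variable exceeds the criterion a.
    responses = np.where(decision > criterion, 1, -1)
    return is_a, ext_noise, responses
```

The external noise fields are returned alongside the responses because they are the quantities a classification image analysis would later sort by stimulus and response.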

*absolute efficiency F* is equal to the squared ratio of the observer’s performance *d*′ and the ideal observer’s performance *d*′_{I} on the same task (Tanner & Birdsall, 1958): $F = (d'/d'_I)^2$.
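As a small worked example of this definition, the sketch below computes *d*′ from response proportions and then forms the squared ratio. It assumes the standard equal-variance Gaussian formula, *d*′ = z(hit rate) − z(false-alarm rate), and treats an "A" response on a signal-*A* trial as a hit; the function names are hypothetical.

```python
from statistics import NormalDist

def dprime_from_rates(hit_rate, false_alarm_rate):
    """d' under an equal-variance Gaussian model: z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

def absolute_efficiency(dprime, dprime_ideal):
    """Squared ratio of the observer's d' to the ideal observer's d'."""
    return (dprime / dprime_ideal) ** 2

# An observer with 75% hits and 25% false alarms, compared with a
# hypothetical ideal observer whose d' is twice as large.
observer = dprime_from_rates(0.75, 0.25)
print(absolute_efficiency(observer, 2.0 * observer))  # 0.25
```

Because efficiency is a squared ratio, an observer whose *d*′ is half the ideal value has an efficiency of one quarter, not one half.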

*C* with the ideal observer’s template *T*_{I}, recovers an unbiased linear observer’s absolute efficiency in a two-alternative identification task: .

*C* is the classification image calculated as in Equation 1, *T*_{I} is the ideal observer’s template normalized to unit sum-of-squares energy, σ^{2}_{C} is the pixelwise variance of the classification image, *d*′ is the observer’s performance level, *n* is the number of trials used to calculate the classification image, *g* is the standard normal probability density function, and *G* is the standard normal cumulative distribution function. The statistic is slightly biased if there are fewer than 500 trials in the experiment, but normally we need far more trials than this anyway to measure the classification image accurately. (See the Appendix for details.)

(*C* ⊗ *T*_{I})^{2}, the squared cross-correlation of the classification image with the ideal template. Two factors are at play in this term: the similarity of the observer’s template to the ideal template (sometimes called sampling efficiency) and the observer’s internal noise power. Because the classification image estimates the observer’s template, this cross-correlation will be higher when the observer’s template is similar to the ideal template (i.e., when the observer has higher sampling efficiency). Furthermore, because a classification image has a greater signal-to-noise ratio (SNR) for an observer with low internal noise (see discussion of Equation 16 in the Appendix), the cross-correlation will be higher for an observer with low internal noise. Thus sampling efficiency and internal noise power both affect the cross-correlation. They also affect the observer’s absolute efficiency. The fortunate coincidence that makes this method of analyzing classification images possible is that they affect the cross-correlation and the observer’s absolute efficiency in a quantitatively identical way, and consequently we can use the cross-correlation to recover absolute efficiency. The remaining terms in Equation 5 merely correct for the influence of the observer’s performance level, the number of trials, and the stimulus noise power on the cross-correlation of the classification image with the ideal template.

*ideal correlation* method of testing the adequacy of a classification image.

*d*′ appears in Equation 5, so we cannot use this equation to recover the observer’s absolute efficiency from the classification image alone, knowing nothing about performance. Indeed, if we were interested only in finding absolute efficiency, we could use *d*′ directly in Equation 4. This is not a crucial shortcoming, however, because our goal is not to deduce an observer’s efficiency from a classification image of unknown origin, but rather to see whether the strategy revealed in a classification image is consistent with the observer’s performance. It so happens that the quality (i.e., SNR) of a classification image depends on the observer’s performance level in a particular way, and *d*′ enters into Equation 5 only to correct for this effect of performance level, as mentioned above and made clearer in the Appendix. Equation 5 is not in some covert way equivalent to Equation 4, and the predictions of Equation 5 can fail for observers whose strategies are not well described by classification images, as we discuss in detail below.

*c* = (*L* − *L*_{bg})/*L*_{bg}, where *L* is the luminance of the point of interest and *L*_{bg} is background luminance), and the contrast increment was set to each observer’s 70% contrast threshold (which ranged between 3% and 7% contrast) as measured in a pilot session. The pixelwise root-mean-square contrast of the noise was 20%, and the pixel size was 0.027 deg, for a noise power of 29 mdeg^{2}. With these stimulus parameters, the base-contrast dot was well above detection threshold, so observers were confident about the spatial locations of the two dots. The stimulus duration was 200 ms. There were three observers in Experiment 2 and six observers in Experiment 3. Each observer ran in approximately 10,000 trials. Further details can be found in Murray et al. (2002).

^{2}. The stimulus duration was 240 ms. There were five observers, each of whom ran in between 2,800 and 9,000 trials with each of the four types of stimuli. Further details can be found in Murray (2002).

*SE* of predicted absolute efficiency was greater than 40% of the prediction itself.

*squared* ratio of real and ideal *d*′, these values correspond to only a 6% to 7% underprediction of *d*′. Thus, although the predictions are not exact, they are not far off either. (Figures 3 and 4 show that some data points have ratios much larger than 1.14, but these points typically also have large *SE*s, so they have little effect on a maximum-likelihood average in which values are weighted by the inverse of their squared *SE* [Taylor, 1982].)
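The arithmetic behind these two points can be checked directly. The sketch below takes the square root of an efficiency ratio to get the corresponding *d*′ ratio, and computes an inverse-variance weighted average; the input ratios of 1.13 and 1.14 are taken from the figures quoted in the text, while the data in the weighted-average example are made up purely for illustration.

```python
import math

def dprime_ratio(efficiency_ratio):
    """Efficiency is a squared d' ratio, so the d' ratio is its square root."""
    return math.sqrt(efficiency_ratio)

def inverse_variance_mean(values, standard_errors):
    """Average in which each value is weighted by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Efficiency ratios of 1.13 and 1.14 correspond to d' ratios near 1.06 to 1.07.
print(dprime_ratio(1.13), dprime_ratio(1.14))

# Illustrative (made-up) data: a precise ratio of 1.1 and a very noisy ratio of 3.0.
print(inverse_variance_mean([1.1, 3.0], [0.1, 2.0]))
```

In the made-up example the noisy point carries a weight 400 times smaller than the precise one, so the average stays close to 1.1.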

*y*-intercept of 0.004 ± 0.004. (The error values are *SE*s, calculated by bootstrapping.) The *SE*s of these values are too large for us to conclude whether actual efficiency differs from predicted efficiency by a fixed offset or by a scale factor. In Experiment B, the slope and *y*-intercept parameters are 0.99 ± 0.10 and 0.005 ± 0.001, respectively, indicating that the difference is mostly due to a fixed offset.
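For readers who want to reproduce this kind of analysis, the following is one way a line fit with bootstrapped *SE*s might be set up. It is a sketch under stated assumptions: an ordinary least-squares fit of actual against predicted efficiency, and a bootstrap that resamples data points with replacement. The resampling scheme actually used is not specified here, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_line_fit(predicted, actual, n_boot=2000, seed=0):
    """Fit actual = slope * predicted + intercept and bootstrap the SEs.

    Returns ((slope, intercept), (slope_se, intercept_se)).
    Assumption: data points are resampled with replacement as whole pairs.
    """
    rng = np.random.default_rng(seed)
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    n = len(predicted)
    slope, intercept = np.polyfit(predicted, actual, 1)
    boot = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices of one resampled data set
        boot[i] = np.polyfit(predicted[idx], actual[idx], 1)
    # The SE of each parameter is the SD of its bootstrap distribution.
    slope_se, intercept_se = boot.std(axis=0, ddof=1)
    return (slope, intercept), (slope_se, intercept_se)
```

A slope near 1 with a nonzero intercept would point to a fixed offset between actual and predicted efficiency, whereas a slope different from 1 with an intercept near zero would point to a scale factor.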

*any* phase appears in the noise, and over a large number of trials the contribution to the classification image of any grating-like noise pattern will be cancelled by the contribution of the corresponding phase-reversed grating-like noise pattern. This phase-uncertain strategy is a feasible (though suboptimal) strategy for detecting gratings, but by examining the portion of such an observer’s classification image collected on signal-absent trials, one would mistakenly infer that the observer’s performance was near chance. In this case, Equation 5 would fail to predict absolute efficiency, and this failure could provide a warning that the observer’s strategy was poorly described by a classification image.

*A*” or “*B*”), and who simply guessed at the correct response on some proportion of trials. We will discuss in detail only our results with model observers who were uncertain about spatial position or phase, as these results are representative of the other cases. The remaining simulations are discussed in Murray (2002).

*p* pixels cross-correlated the ideal template with the stimulus, centered on each pixel within a distance of *p* pixels of the correct location. The decision variable was the maximum of these cross-correlations: If the maximum exceeded a criterion, the observer gave one response, and otherwise gave the other response (Green & Birdsall, 1978). We chose the criterion to produce unbiased responses. We gave the model observers an early internal noise source that resulted in an internal-to-external noise ratio of 1.0, which is a typical value for human observers (Gold, Bennett, & Sekuler, 1999; Green, 1964). We simulated each model observer in 100 repetitions of an experiment with 10,000 trials. In each repetition of the experiment, we calculated predicted and actual efficiency, and we averaged these values over the 100 repetitions.
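The maximum-of-cross-correlations decision variable of the spatially uncertain observer can be sketched as follows. The sketch assumes Euclidean distance for "within *p* pixels" and, for simplicity, lets the shifted template wrap around at the image border (via `np.roll`); neither detail is specified in the text, and the function name is hypothetical.

```python
import numpy as np

def max_over_positions(stimulus, template, p):
    """Largest cross-correlation of the stimulus with the template, taken over
    all template positions within p pixels of the nominal (unshifted) location.

    Assumptions: Euclidean distance, and circular wrap-around at the borders.
    """
    best = -np.inf
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if dx * dx + dy * dy > p * p:
                continue  # outside the uncertainty region
            shifted = np.roll(template, (dy, dx), axis=(0, 1))
            best = max(best, float((stimulus * shifted).sum()))
    return best
```

The observer would then give one response if this value exceeds the criterion and the other response otherwise. With `p = 0` the function reduces to the ordinary linear observer's single cross-correlation.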

*S* is the stimulus with external and internal noise added (i.e., *S* = *I*_{{A,B}} + *N* + *Z*), *G*_{cos} is the cross-correlation with the cosine-phase Gabor template, and *G*_{sin} is the cross-correlation with the sine-phase Gabor template. This is the ideal decision variable for detecting a Gabor when the observer knows that on average the Gabor is in cosine-phase, but the observer is uncertain about the exact phase on any given trial (Van Trees, 1968, pp. 335–341). The parameter λ determines the degree of phase uncertainty. When λ = 0, the observer has no phase uncertainty, and the decision variable reduces to cross-correlation with a cosine-phase Gabor. When λ = 1, there is complete phase uncertainty, and the decision variable reduces to energy detection (i.e., the summed, squared outputs of Gabor filters that are an approximate quadrature pair). When 0 < λ < 1, there is partial phase uncertainty. Our model observers had uncertainty parameters of 0.00, 0.25, 0.50, 0.75, and 1.00. The model observers also had an early internal noise source that resulted in an internal-to-external noise ratio of 1.0.

*d*′ versus signal contrast [see Equation 10 in the Appendix], but it is well known that psychometric functions are often nonlinear [Legge, Kersten, & Burgess, 1987; Pelli, 1985]). Second, other tests may reveal failings of the linear model that are not detected by the ideal correlation method, even within the same narrow range of stimuli (e.g., early pointwise nonlinearities are not detected by the ideal correlation method, as discussed earlier, but they can be detected by histogram contrast analysis) (Chubb, Econopouly, & Landy, 1994; Murray et al., 2001). Our goal in introducing the ideal correlation method is not to see whether there is a need for more complex models than the linear model as a general model of human vision, for there is ample evidence that more complex models are needed. Rather, our goal is to provide one test of whether classification images provide a reasonable phenomenological description of observers’ strategies, even over a limited range of stimuli.

*f*(*x*) = *f*(*x*_{0}) + *f*′(*x*_{0})(*x* − *x*_{0}). If this approximation is accurate over an interval surrounding *x*_{0}, we do not conclude that *f* is a linear function. We conclude that although the full definition of *f* may involve nonlinear terms, over the interval of interest, the behavior of *f* is adequately characterized by a linear approximation, and we will not be greatly misled about the values of *f* over that interval by examining the linear approximation. Just so, visual processes can often be approximated as being linear over a limited range (e.g., Ahumada, 1987), and this is the approximation that underlies use of classification images to characterize observers’ strategies.

*E*_{I} and the observer’s contrast energy threshold *E* on the same task (Tanner & Birdsall, 1958): $F = E_I/E$.

*d*′_{I} is proportional to the square root of the signal contrast energy (Peterson, Birdsall, & Fox, 1954), and consequently absolute efficiency can be calculated as the squared ratio of an observer’s performance *d*′ and the ideal observer’s performance *d*′_{I}: $F = (d'/d'_I)^2$.

*s* that is the cross-correlation of a signal *I* corrupted by external noise *N* and internal noise *Z*, with a unit-energy template *T*, as described by Equations 2 and 3. (By unit-energy, we mean $T \otimes T = 1$.) In a discrimination task with signals *A* and *B*, the linear observer’s performance is given by the difference between the expected value of *s* on signal-*A* and signal-*B* trials, divided by the *SD* of *s*:

*T* = *T*_{I} and the internal-to-external noise ratio is *ρ* = 0, so Equation 10 shows that the ideal observer’s performance is .
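The verbal definition of the linear observer's performance (difference in the mean of *s* between signal-*A* and signal-*B* trials, divided by the *SD* of *s*) can be turned into a short calculation. The sketch below is not a transcription of Equation 10. It assumes white Gaussian external noise with pixelwise *SD* `sigma_ext`, early internal noise that is also white, *ρ* defined as the ratio of internal to external noise variance, and an ideal template proportional to *A* − *B*; all names are hypothetical.

```python
import numpy as np

def unit_energy(template):
    """Scale a template so that its sum of squares equals 1."""
    template = np.asarray(template, dtype=float)
    return template / np.sqrt((template ** 2).sum())

def linear_dprime(signal_a, signal_b, template, sigma_ext, rho):
    """d' of a linear observer under the assumptions stated above.

    Mean difference of s: T cross-correlated with (A - B).
    SD of s for a unit-energy T: sigma_ext * sqrt(1 + rho), assuming
    rho = internal noise variance / external noise variance.
    """
    t = unit_energy(template)
    difference = np.asarray(signal_a, dtype=float) - np.asarray(signal_b, dtype=float)
    mean_difference = (t * difference).sum()
    return mean_difference / (sigma_ext * np.sqrt(1.0 + rho))

# Example: an observer using the ideal template (A - B) with rho = 1.
a = np.array([[1.0, 0.0]])
b = np.array([[0.0, 1.0]])
d_ideal = linear_dprime(a, b, a - b, sigma_ext=0.2, rho=0.0)
d_noisy = linear_dprime(a, b, a - b, sigma_ext=0.2, rho=1.0)
print((d_noisy / d_ideal) ** 2)  # absolute efficiency under these assumptions
```

Under these assumptions, an observer who uses the ideal template but has internal noise equal in power to the external noise reaches an absolute efficiency of one half.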

*T*, we can express the classification image, considered as a random variable, as the sum of a scaled template *kT* and a field of sampling noise *N*_{C} with a pixelwise mean of zero: $C = kT + N_C$.

*k* is necessary because the expected value of the classification image is merely proportional to the unit-energy template *T*, and not necessarily equal to it.

*n* is the number of trials, *ρ* is the observer’s internal-to-external noise ratio, *d*′ is the observer’s performance, *g* is the standard normal probability density function, and *G* is the standard normal cumulative distribution function.

*C*, σ^{2}_{C}, and *d*′ are all calculated from a single reverse correlation experiment, and in general they are correlated, so finding the expected value of this equation is not straightforward. However, after the large number of trials typically used in a reverse correlation experiment, σ^{2}_{C} and *d*′ will be known quite precisely. We will make the simplifying assumption that σ^{2}_{C} and *d*′ are known constants to derive the expected value of Equation 17, and then we will test the accuracy of the resulting expression in simulations.

*F*. Thus Equation 17 is an unbiased estimator of a linear observer’s absolute efficiency.

*C*, the ideal template *T*_{I}, the number of trials *n*, and the observer’s performance *d*′ are obtained by the usual methods. The classification image is just a weighted sum of noise fields, so the pixelwise variance *σ*^{2}_{C} of the sampling noise can be very closely approximated by assuming that the individual noise fields are independent samples from a multivariate normal distribution (e.g., if the pixelwise variance is 0.04, and the number of hits, misses, false alarms, and correct rejections are 75, 25, 25, and 75, respectively, then *σ*^{2}_{C} = 0.04/75 + 0.04/25 + 0.04/25 + 0.04/75 = 0.0043). In fact, the noise fields within each stimulus-response class of trials are not quite multivariate normal (noise fields fall in a given class because they have a particular relation to the observer’s template), but this approximation is adequate for estimating the variability of a classification image.
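The worked example in the parenthesis can be reproduced in a couple of lines. The sketch below simply sums the variance of each class mean (pixelwise noise variance divided by the number of trials in that class), which is the approximation described in the text; the function name is hypothetical.

```python
def classification_image_variance(pixel_variance, class_counts):
    """Approximate pixelwise variance of a classification image formed from
    the mean noise field of each stimulus-response class.

    Each class mean over n independent noise fields has variance
    pixel_variance / n, and the variances of independent terms add.
    """
    return sum(pixel_variance / n for n in class_counts)

# Hits, misses, false alarms, and correct rejections from the example above.
print(round(classification_image_variance(0.04, [75, 25, 25, 75]), 4))  # 0.0043
```

The two small classes (25 trials each) contribute three quarters of the total variance, which is why rare stimulus-response classes dominate the sampling noise in a classification image.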

*σ*^{2}_{C} and *d*′ are known exactly, although actually they are random variables. We have also assumed that the noise fields in each stimulus-response class of trials are multivariate normal, which again is only approximately true. To validate Equation 17 as an estimator of absolute efficiency, we ran simulated linear observers in classification image experiments, and tested whether Equation 17 really does predict their absolute efficiency. The relevant parameters of the linear model are sampling efficiency (*T* ⊗ *T*_{I})^{2} and internal-to-external noise ratio *ρ*, and we tested linear model observers with a wide range of these parameters. We also varied the number of trials in the simulated experiment. Figure 6 plots the results for linear model observers with *ρ* = 1, and with a range of sampling efficiencies, as a function of the number of trials in the classification image experiment. The horizontal lines show the model observers’ actual absolute efficiencies, and the data points show the efficiencies predicted by Equation 17, averaged over many simulations of the experiment. The plot shows that so long as there are 500 trials or more, the predictions are effectively unbiased. Results for linear model observers with different internal-to-external noise ratios ranging from *ρ* = 0 to *ρ* = 2 were virtually identical.