In signal detection theory, an observer’s responses are often modeled as being based on a decision variable obtained by cross-correlating the stimulus with a template, possibly after corruption by external and internal noise. The response classification method estimates an observer’s template by measuring the influence of each pixel of external noise on the observer’s responses. A map that shows the influence of each pixel is called a *classification image*. Other authors have shown how to calculate classification images from external noise fields, but the optimal calculation has never been determined, and the quality of the resulting classification images has never been evaluated. Here we derive the optimal weighted sum of noise fields for calculating classification images in several experimental designs, and we derive the signal-to-noise ratio (SNR) of the resulting classification images. Using the expressions for the SNR, we show how to choose experimental parameters, such as the observer’s performance level and the external noise power, to obtain classification images with a high SNR. We discuss two-alternative identification experiments in which the stimulus is presented at one or more contrast levels, in which each stimulus is presented twice so that we can estimate the power of the internal noise from the consistency of the observer’s responses, and in which the observer rates the confidence of his responses. We illustrate these methods in a series of contrast increment detection experiments.

In signal detection theory, an observer’s responses are often modeled as being based on a decision variable $s$ obtained by cross-correlating a stimulus $I$ with a template $T$, possibly after corruption by an external Gaussian white noise $N$ and an internal Gaussian white noise $Z$. The decision variable for this noisy cross-correlator can be written as

$$s = (I + N + Z) \otimes T. \tag{1}$$

In a two-alternative identification task, the observer sets a criterion $a$, and gives one response if $s \ge a$, and the other response if $s < a$ (Green & Swets, 1974; Peterson, Birdsall, & Fox, 1954). This model gives a good account of many aspects of observers’ performance in many perceptual tasks. In the “General Discussion,” we briefly review the evidence for the model.

The response classification method estimates an observer’s template $T$ by measuring the influence of each pixel of an external noise field on the observer’s responses (Ahumada & Lovell, 1971; Beard & Ahumada, 1998). An image that shows the influence of each noise pixel is called a *classification image*. Other authors have derived methods of calculating classification images whose expected value is proportional to the template $T$ (Abbey, Eckstein, & Bochud, 1999; Richards & Zhu, 1994), but the optimal calculation has never been determined, and the quality of the resulting classification images has never been evaluated. Response classification experiments require large amounts of data, so it would be useful to know how to use the data most efficiently and to know how the quality of a classification image depends on experimental variables, such as the number of trials and the observer’s level of performance. Here we show how to calculate a classification image by taking a weighted sum of external noise fields from individual trials, in such a way as to maximize the signal-to-noise ratio (SNR) of the classification image, and we give expressions for the SNR. We derive optimal methods for two-alternative identification experiments in which the stimulus is presented at one or more contrast levels, in which each stimulus is presented on two separate trials so that we can estimate the power of the internal noise from the consistency of the observer’s responses, and in which an observer rates the confidence of his responses. From the expressions for the SNR, we show how to choose experimental parameters, such as the observer’s performance level and the power of the external noise, to obtain classification images with a high SNR. We illustrate these methods in a series of contrast increment detection experiments.

In a two-alternative identification experiment, one of two signals, $A$ or $B$, is presented in Gaussian white noise $N$, and the observer’s task is to state which signal was presented. On some trials, the noise causes the observer to make mistakes if, by chance, the noise is distributed in such a way as to make $A$ look more like $B$, or to make $B$ look more like $A$. The response matrix of such an experiment has four cells, corresponding to the four stimulus-response pairs: $AA$, $AB$, $BA$, and $BB$ (Figure 1). Other authors have shown that if the observer is a linear discriminator who responds ‘$A$’ if a decision variable $s$ of form (1) is greater than some criterion $a$ and responds ‘$B$’ otherwise, then the expected value of the external noise field $N$ on trials where the observer responds ‘$A$’ is proportional to the template $T$, and the expected value on trials where the observer responds ‘$B$’ is proportional to the negated template $-T$ (Abbey et al., 1999; Richards & Zhu, 1994). Hence we can estimate the template by finding the average of the external noise fields $N$ over trials where the observer responds ‘$A$’, and subtracting the average of the noise fields over trials where the observer responds ‘$B$’.

We define the SNR of a measurement $M$ contaminated by white noise as

$$\mathrm{SNR} = \frac{\lVert E[M] \rVert^2}{\sigma_M^2}. \tag{2}$$

Here $\lVert U \rVert$ is the vector magnitude of $U$, $E[M]$ is the expected value of $M$, and $\sigma_M^2$ is the variance of each pixel of $M$. [Some authors define SNR as the square root of the right-hand side of (2).] In Appendix A we show that in general the SNRs of the estimates of $T$ given by individual noise fields in the four classes of trials are not equal. Specifically, noise fields have higher SNRs on trials where the observer gives the incorrect response than on trials where the observer gives the correct response, so it is inefficient to combine noise fields in a weighted average that does not distinguish between correct and incorrect trials.

$A$ and the observer responded ‘$B$’. Expression (3) states that in a two-alternative identification experiment, the best classification image is obtained by calculating the average of the noise fields within each of the four classes of trials, then adding together the means of classes $AA$ and $BA$ and subtracting the means of classes $AB$ and $BB$. This is the formula for calculating classification images that appears most often in the psychophysics literature (e.g., Beard & Ahumada, 1998), and in Appendix A, we show that it is the optimal weighted sum of noise fields when the observer is unbiased. In Appendix A, we also derive the optimal weighted sum for a biased observer. Although (3) is the most commonly used formula, other formulas have been proposed that are suboptimal because they weight correct and incorrect trials equally, and do not take account of observer bias (Abbey et al., 1999; Richards & Zhu, 1994). The optimal expression (3) follows from the noisy cross-correlator model (1), but it is an empirical question whether it is actually optimal for human observers. In Experiment 2, we report data from a contrast increment detection experiment that indicate that (3) is the optimal expression.

$n$ is the total number of trials, $\sigma_N^2$ is the variance of each pixel of external noise $N$, $\sigma_Z^2$ is the variance of each pixel of internal noise $Z$, and $d'$ is the observer’s performance level. The function $g$ is the standard normal probability density function, and $G$ is the standard normal cumulative distribution function.

$n$. The SNR depends nonlinearly on $d'$. As shown in Figure 2, the quality of the classification image declines rapidly at high performance levels, but does not vary greatly below approximately 75% correct (e.g., the SNR at 60% correct is only 15% higher than the SNR at 75% correct). Finally, the SNR is proportional to the ratio of the external noise variance to the total noise variance, $\sigma_N^2/(\sigma_N^2+\sigma_Z^2)$, indicating that the more the observer’s performance is limited by internal noise, the lower the quality of the classification image. Internal noise is typically the sum of a noise of fixed variance $\sigma_{ZF}^2$ and a noise whose variance $\sigma_{ZP}^2$ is proportional to a weighted sum of the signal energy $E$ and the external noise variance $\sigma_N^2$ (Burgess & Colborne, 1988; Lillywhite, 1981; Lu & Dosher, 1998). This implies that the noise ratio is highest when the external noise power is high. In many foveal tasks, estimates are available for the power spectral density of the fixed internal noise, and the constant of proportionality of the internal proportional noise is approximately $q \approx 1$ (e.g., Burgess & Colborne, 1988; Pelli & Farell, 1999). This suggests that to obtain a high noise ratio $\sigma_N^2/(\sigma_N^2+\sigma_Z^2)$, the power of the external noise should be several times the power of the internal fixed noise, e.g., at least $10^{-5}\,\mathrm{deg}^2$, which at a typical pixel width of 0.02 degrees corresponds to a root mean square (RMS) noise contrast of 16%.
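The conversion from noise power to RMS contrast quoted above can be checked with a short calculation. The sketch assumes the usual convention that the power spectral density of pixelwise white noise equals the pixel variance times the pixel area; the function name `rms_contrast` is a hypothetical helper introduced only for this illustration.

```python
import math

def rms_contrast(noise_psd_deg2, pixel_width_deg):
    """RMS contrast of white pixel noise with the given power spectral density.

    Assumes PSD = (pixel variance) * (pixel area), so
    sigma = sqrt(PSD) / pixel width.
    """
    return math.sqrt(noise_psd_deg2) / pixel_width_deg

# Values quoted in the text: 1e-5 deg^2 at a pixel width of 0.02 deg.
sigma = rms_contrast(1e-5, 0.02)
percent = round(sigma * 100)   # RMS contrast as a whole-number percentage
```

Under this convention, halving the pixel width at a fixed power spectral density doubles the required RMS contrast, which is worth keeping in mind when choosing display parameters.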

*per se*. It is less obvious whether the increase in information from the incorrect trials exceeds the decrease in information from the correct trials at higher performance levels, but our discussion of the SNR (4) of a classification image showed that the SNR is lower at higher performance levels. In the appendix, we show that the optimal method of summing noise fields across trials with different performance levels is first to calculate separate classification images $C_i$ from the trials at each contrast level $i$, using Equation (3), and then to take a weighted sum (5) of the separate classification images $C_i$, in which each classification image is weighted according to the number of trials $n_i$ and the observer’s performance level $d'_i$ at the corresponding contrast level.

The SNR of the pooled classification image $C$ is the sum of the SNRs of the classification images $C_i$ at each contrast level. This SNR depends on the noise variances $\sigma_N^2$ and $\sigma_Z^2$ in the same way as the single-contrast classification image (3) discussed in the previous section. Furthermore, this SNR is highest when most trials are collected at low performance levels $d'_i$.
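The way the SNR per trial falls with performance level can be explored numerically. The sketch below rests on an assumption that is not displayed in the text above: that for an unbiased observer with proportion correct $p = G(z)$, the performance-dependent factor of the SNR has the form $g(z)^2 / [p(1-p)]$. This assumed form does reproduce the figure quoted earlier (the SNR at 60% correct being about 15% higher than at 75% correct), but it should be read as an illustration, not as the derived expression. The name `snr_factor` is hypothetical.

```python
from statistics import NormalDist

_std_normal = NormalDist()   # standard normal: pdf is g, cdf is G

def snr_factor(p_correct):
    """Assumed performance-dependent SNR factor, g(z)^2 / (p (1 - p))."""
    z = _std_normal.inv_cdf(p_correct)      # z = G^{-1}(p)
    return _std_normal.pdf(z) ** 2 / (p_correct * (1.0 - p_correct))

ratio_60_vs_75 = snr_factor(0.60) / snr_factor(0.75)
ratio_98_vs_75 = snr_factor(0.98) / snr_factor(0.75)
```

Under this assumption the factor changes little between chance and 75% correct but drops steeply toward ceiling performance, consistent with the advice to collect most trials at low performance levels.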

$d'$ versus the contrast increment. The signal levels covered a wide performance range, from $d'$ near zero to approximately 4.0, which in terms of proportion correct covers a range of 0.50 to 0.98. When measured in terms of $d'$, performance was an approximately linear function of signal contrast, at least up to high performance levels (around $d' = 3.5$, which corresponds to 96% correct) at which point even infrequent keypress errors and lapses of attention can cause performance to level off. This linearity is consistent with the noisy cross-correlator model (1), and with the findings of earlier studies of contrast increment detection (Legge, Kersten, & Burgess, 1987).

$kT + N_C$, i.e., as the sum of a signal $kT$ that is proportional to the observer’s template $T$, and a sampling noise $N_C$. If we scale the template $T$ to have unit energy, the signal energy in the classification image is $k^2$, the noise variance is the pixelwise variance of the sampling noise $\sigma_C^2$, and the SNR of the classification image is $k^2/\sigma_C^2$. If we knew the observer’s template $T$ exactly, we could estimate the SNRs of the optimal and suboptimal classification images. First, the classification image is a weighted sum of noise fields, $C = \sum_j w_j N_j$, so if the weights and noise fields are independent, then the pixelwise variance of the classification image is $\sigma_C^2 = \sigma_N^2 \sum_j w_j^2$. The weights and noise fields are not independent (e.g., the weight assigned to a noise field depends on the observer’s response to the noise field, which in turn depends on how similar the noise field is to the observer’s template), but in Appendix A [Equation (A5)], we show that this approximation to $\sigma_C^2$ is very accurate and gives a simple and effective way of calculating the variance of the classification image. Second, if we knew the observer’s template $T$, we could calculate the signal energy $k^2$ from the cross-correlation $C \otimes T$, which has an expected value of $k$. Of course, we do not know the observer’s template $T$ exactly, so we cannot directly compare the SNRs of the optimal and suboptimal classification images this way.

$k^2/\sigma_C^2$, and can be used to estimate the SNRs of the optimal and suboptimal classification images up to a common scale factor. In principle, the choice of the approximation $T'$ is arbitrary, but if we make a poor approximation, the cross-correlation is small compared to the noise term, and our calculation of the rSNR will be noisy. The closer our approximation $T'$ is to the true template $T$, the better.

*T*′ to the human observers’ templates. The ideal template is the signal-right stimulus minus the signal-left stimulus, so it consists of a positive-contrast dot to the right of fixation and a negative-contrast dot to the left. For the numerator of (8), we cross-correlated the ideal observer’s template with the classification images, and for the denominator, we calculated the pixelwise variance from the variance of individual noise fields and the weights in the weighted sum that produced the classification image.
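The numerator and denominator described above can be sketched in a few lines. Expression (8) itself is not reproduced here, so the code assumes a form inferred from the description: the squared cross-correlation of the classification image with the approximate template $T'$, divided by a pixelwise variance computed from the noise variance and the squared weights. The function name `relative_snr` and its arguments are hypothetical.

```python
def relative_snr(noise_fields, weights, template_guess, sigma_n):
    """Relative SNR of a classification image built as a weighted sum.

    noise_fields   : list of noise fields, each a list of pixel values
    weights        : one weight per noise field
    template_guess : approximation T' to the observer's template
    sigma_n        : standard deviation of each external noise pixel
    """
    m = len(template_guess)
    # Classification image: weighted sum of the individual noise fields.
    image = [sum(w * field[j] for w, field in zip(weights, noise_fields))
             for j in range(m)]
    # Numerator: cross-correlation of the image with the approximate template.
    cross = sum(image[j] * template_guess[j] for j in range(m))
    # Denominator: pixelwise variance, treating weights and noise as independent.
    variance = sigma_n ** 2 * sum(w * w for w in weights)
    return cross ** 2 / variance

# Tiny deterministic example: two 2-pixel noise fields with weights +1 and -1.
example = relative_snr([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], [1.0, 0.0], 1.0)
```

Because the same approximate template is used for every method being compared, any scale error in $T'$ affects all rSNR values equally, which is why the comparison is valid only up to a common scale factor.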

$\sigma_C^2$, with $\sigma_C^2$ calculated individually for each observer as described earlier. The optimal rSNRs were consistently higher than the suboptimal rSNRs, and although the differences were not statistically significant for individual observers, taken together they did reject the null hypothesis that the optimal rSNRs were the same as the corresponding suboptimal rSNRs: under the null hypothesis, differences this large for all three observers are improbable ($p < .05$). On average, the rSNRs of the optimal classification images were 13% higher than the rSNRs of the suboptimal images. We conclude that the optimal method (5) improves on the standard method (3), although the standard method is reasonably efficient considering the wide range of performance levels covered in the experiment.

$C_i$ is determined by the SNR of the classification image $C_i$, as predicted by model (1): classification images that are predicted to have a high SNR are weighted heavily, and classification images that are predicted to have a low SNR are weighted lightly. Consequently, expression (5) is the optimal weighted sum for calculating the pooled classification image $C$ only if our predictions of the SNRs of the subordinate classification images are correct. To see whether the SNR actually varied across performance levels as predicted by (4), we calculated the rSNR of the classification images at each performance level. Figure 6 plots the rSNRs of the classification images $C_i$ at each performance level, divided by the number of trials $n_i$ collected at each performance level, along with the predictions of expression (4) scaled to minimize the sum-of-squares error between the predictions and the data. The rSNRs roughly followed the predicted pattern of declining as a function of $d'$, indicating that expression (5) assigns approximately correct weights to the subordinate classification images $C_i$ when calculating the pooled classification image $C$.

$d' = 1$, and for two of the observers (A.N.C. and O.R.W.), the rSNR may have been lower than predicted below this level as well. The deviations are small compared to the overall accuracy of the predictions, and the measurement errors are too large for us to say with certainty where the rSNR peaks. Nevertheless, this discrepancy calls for further investigation, because if it is genuine, then it indicates a failure of the noisy cross-correlator model, i.e., a failure of linearity. For instance, the keypress errors and lapses of attention that may have caused the psychometric functions to saturate at high performance levels (Figure 4) might have to be included in the model as a form of internal noise that grows with the signal level, lowering the SNR at high performance levels. Furthermore, this discrepancy implies that expression (5) is a slightly suboptimal method of calculating classification images, as it assigns too large a weight to noise fields collected at performance levels far from $d' = 1$. Most of the deviations are small, but a revised model that predicted the SNRs correctly would be useful because it would allow us to derive the truly optimal method of calculating classification images when combining trials across performance levels. Another possibility that we will not investigate here is empirical estimation of the rSNRs of trials at different performance levels, as in Figure 6, and the use of these estimates to weight the classification images at different performance levels optimally when combining them into a single classification image. In any case, one practical consequence of this finding is that classification images should be collected at a performance level of approximately $d' = 1$. Above this point, the SNR drops even more rapidly than predicted, and the SNR may drop below this point as well. Furthermore, there are other reasons for not collecting classification images at very low performance levels, such as the changes in observers’ strategies that can occur due to spatial uncertainty (Ahumada & Beard, 1999; Pelli, 1985) or the frustration that observers experience when a task is too difficult.

$d' \approx 1$, the rSNR at $d' \approx 4$ is lower than expected, or we could equally well say that given the rSNR at $d' \approx 4$, the rSNR at $d' \approx 1$ is higher than expected. This ambiguity does not matter for our test of whether expression (5) is the optimal method of calculating classification images, because as we mentioned earlier, we need only predict the SNRs correctly up to a common scale factor. This ambiguity does matter, though, when we attempt to explain why the model’s predictions do not match the observed rSNRs.

$T$, and by the power of the internal noise $Z$. One way of measuring the power of the internal noise that limits an observer’s performance is to present each stimulus twice, on separate trials, and to measure the proportion of repeated trials on which the observer gives the same response twice (Burgess & Colborne, 1988; Gold et al., 1999; Green, 1964). We emphasize that in this two-pass method, the repeated stimulus, including the external noise, is identical pixel-by-pixel on both presentations. The two presentations are separated by many trials, so the observer does not know when the stimulus is repeated, and treats the two trials as showing independent stimuli. If the observer’s responses are based on a noiseless decision rule, then the observer will give the same response to a stimulus every time it is presented, whereas if the observer’s performance is largely limited by internal noise, then the observer’s responses to repeated presentations of the same stimulus will be less consistent. Burgess and Colborne (1988) showed how to use the consistency of an observer’s responses in a two-pass experiment to calculate the power of the internal noise.
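The link between internal noise and response consistency can be illustrated with a small simulation of a two-pass experiment. This is a sketch under simplifying assumptions, not the estimator of Burgess and Colborne (1988): the decision variable is simulated directly as a signal mean plus an external component that is identical on both passes and an internal component that is drawn afresh on each pass. The function name `two_pass_agreement` and all parameter values are hypothetical.

```python
import random

def two_pass_agreement(signal_mean, sigma_ext, sigma_int,
                       n_pairs=20000, criterion=0.0, seed=1):
    """Proportion of repeated stimuli that receive the same response twice."""
    rng = random.Random(seed)
    same = 0
    for _ in range(n_pairs):
        # Signal A shifts the decision variable up, signal B shifts it down.
        mean = signal_mean if rng.random() < 0.5 else -signal_mean
        external = rng.gauss(0.0, sigma_ext)        # identical on both passes
        first = mean + external + rng.gauss(0.0, sigma_int) >= criterion
        second = mean + external + rng.gauss(0.0, sigma_int) >= criterion
        same += (first == second)
    return same / n_pairs

noiseless = two_pass_agreement(0.5, 1.0, 0.0)    # no internal noise
low_noise = two_pass_agreement(0.5, 1.0, 0.5)
high_noise = two_pass_agreement(0.5, 1.0, 2.0)
```

With no internal noise the simulated observer is perfectly consistent, and agreement falls toward the level expected from independent responses as the internal noise grows relative to the external noise.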

$A$ and $B$) and four possible pairs of responses on repeated presentations of a single stimulus ($AA$, $AB$, $BA$, and $BB$). In Appendix C, we show that noise fields from trials in different cells of this matrix have different SNRs, and we derive the optimal weighted sum for calculating a classification image for an unbiased observer. To calculate the weighting parameter $w$ in that sum, we need to know the observer’s performance level ($d'$), which is easily determined, and the internal-to-external noise ratio $\sigma_Z^2/\sigma_N^2$, which can be calculated from the consistency of the observer’s responses (Burgess & Colborne, 1988).

$p_{CC}$ is the probability of the observer giving two correct responses on repeated trials, $p_{CI}$ is the probability of one correct and one incorrect response, and $p_{II}$ is the probability of two incorrect responses. For instance, when stimulus $A$ is presented, $p_{CC}$ is the probability of two $A$ responses, $p_{CI}$ is the probability of one $A$ response and one $B$ response, in either order, and $p_{II}$ is the probability of two $B$ responses.

When $\sigma_Z^2 = 0$, it follows that $p_{CC} = p_C$, $p_{CI} = 0$, $p_{II} = p_I$, and $w = 0.5$. With these values, the SNR (10) of the two-pass classification image is half the SNR (4) of the one-pass classification image. When $\sigma_Z^2 \to \infty$, it follows that $p_{CC} = p_C^2$, $p_{CI} = p_C p_I$, $p_{II} = p_I^2$, and $w = p_C$. With these values, the two-pass SNR (10) equals the one-pass SNR (4). That is, the quality of a classification image obtained in a two-pass experiment can approach but never exceed the quality of a classification image obtained in the corresponding one-pass experiment.

$AAA$ cell of the two-pass response matrix would be counted twice in the $AA$ cell of the one-pass matrix, each trial in the $AAB$ cell of the two-pass matrix would be counted once in the $AA$ cell and once in the $AB$ cell of the one-pass matrix, and so on. Using this regrouping of trials and expressions (C1) through (C4) in Appendix C for the SNR of the noise fields in each cell of the two-pass response matrix, it is possible to show that over a wide range of values of $p_C$ and $\sigma_Z^2/\sigma_N^2$ (e.g., as $p_C$ ranges from 0.50 to 0.95 and $\sigma_Z^2/\sigma_N^2$ ranges from 0 to 3), the SNR obtained using the standard method (3) is only a few percent lower than the SNR obtained using the optimal method (9), so the optimal method does not improve appreciably on the standard method. In the following experiment, we show that the standard method (3) does work almost as well as the optimal method (9) in a contrast increment detection task.

$r$-point rating scale to indicate his confidence that stimulus $A$ or $B$ was presented. We will take response 1 to mean that the observer is confident that the stimulus was $B$, and response $r$ to mean that he is confident that it was $A$. Figure 11 shows the response matrix for an experiment with a six-point rating scale.

$r + 1$ criteria $a_i$, and giving response $i$ if the decision variable $s$ falls between $a_i$ and $a_{i+1}$ (Egan, Schulman, & Greenberg, 1959). (This formulation requires that $a_1 \to -\infty$ and $a_{r+1} \to +\infty$.) In the appendix we show that noise fields from different cells of the response matrix of a rating scale experiment have different SNRs, and we derive the optimal weighted sum (11) for calculating a classification image. In that expression, $p_{Ai-}$ is the probability that the observer gives a rating less than $i$ when stimulus $A$ is presented, and $p_{Bi-}$ is the probability that the observer gives a rating less than $i$ when stimulus $B$ is presented. The function $G^{-1}$ is the inverse of the normal cumulative distribution function (i.e., it is the z-transform function used in signal detection theory). Expression (11) states that the optimal weighted sum adds the average noise fields in each cell of the response matrix, with the average of each cell weighted by a quantity that is a function of the normal deviates $z_i$ and $z_{i+1}$ of the criteria $a_i$ and $a_{i+1}$ that bound the decision variable in that cell.
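The rating-scale decision rule described above (response $i$ when the decision variable falls between $a_i$ and $a_{i+1}$) can be written compactly. The sketch below is a minimal illustration; the function name `rating_response` and the criterion values are hypothetical and are not the criteria used by any observer in the experiments.

```python
import bisect
import math

def rating_response(s, criteria):
    """Return the rating i (1-based) such that a_i <= s < a_{i+1}.

    criteria is the sorted list [a_1, ..., a_{r+1}] with a_1 = -inf and
    a_{r+1} = +inf, so there are r possible ratings.
    """
    # bisect_right counts the criteria that are <= s, which is the rating.
    return bisect.bisect_right(criteria, s)

# Hypothetical criteria for a six-point scale (r = 6, so r + 1 = 7 criteria).
criteria = [-math.inf, -1.0, -0.3, 0.0, 0.3, 1.0, math.inf]
ratings = [rating_response(s, criteria) for s in (-5.0, -1.0, 0.1, 2.0)]
```

Low ratings correspond to confident $B$ responses and high ratings to confident $A$ responses, matching the convention adopted for the rating scale in this section.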

$d'$, $0.10\,d'$, $0.50\,d'$, $0.90\,d'$, $1.39\,d'$, and $+\infty$ so that he gives each response equally often, expression (12) predicts that the SNR will be 1.67 times the SNR obtained in the corresponding binary-response identification experiment. Evidently, the advantage of using a rating scale can be substantial. In the following experiment, we show that recording rating scale responses can improve the quality of classification images obtained in a contrast increment detection task.

*p*< .01), and on average were 15% higher. Equation (11) gives the optimal method of calculating classification images for an observer who performs the rating scale task by comparing a Gaussian-distributed decision variable of form (1) to a number of fixed criteria, as in the standard signal detection account that we outlined above. Clearly, our observers did not follow this strategy.

*p*< .001), and on average were 21% higher. Figure 15 shows the average rSNRs of noise fields in each cell of the rating scale response matrix. The agreement between the predicted SNRs and actual rSNRs is much better for these observers than for the first three, which explains why the optimal method (11) gave better results for this set of observers.

$d'$ is often a linear function of signal contrast (e.g., Legge et al., 1987). It leads to the concepts of sampling efficiency and internal-to-external noise ratio, which are useful ways of describing many factors that limit observers’ performances (Burgess & Colborne, 1988; Burgess, Wagner, Jennings, & Barlow, 1981). Nevertheless, it does not account for all aspects of observers’ performances, and the model has been elaborated in various ways by many authors. Most of these elaborations are unimportant for our purposes, because we require only that the model described by (1) is *locally* valid, in the sense that it describes observers’ performance in a single discrimination task. In the following paragraphs, we consider a few examples of how models that differ from the noisy cross-correlator described by Equation (1) may be locally equivalent to it.

$Z$ added to the stimulus at the input (Ahumada, 1987; Ahumada & Watson, 1985). The internal noise $Z$ affects the observer’s decisions only via the term $Z \otimes T$ that is added to the decision variable, so any late noise $Z_L$ added after the cross-correlation is equivalent to an early noise $Z$ added before the cross-correlation that produces a term of the same variance. Hence the difference between early- and late-noise models is not important for our purposes.

$Z_N$ that affects the observer’s decisions only after it has passed through the template is equivalent to a white noise $Z$ that produces a term of the same variance after the template, i.e., $Z_N$ and $Z$ are equivalent so long as $Z_N \otimes T$ and $Z \otimes T$ have the same variance.

$A$ and signal-$B$ trials in a threshold discrimination task will produce little difference in the observer’s proportional noise. For this reason, we can consider proportional noise as just another form of internal noise that can be incorporated in the early noise ($Z$). This said, we should also point out that it is easy to modify the methods we have presented to handle tasks where the internal noise power is very different on signal-$A$ and signal-$B$ trials. The derivation in Appendix A considers signal-$A$ and signal-$B$ trials separately, and if we need to obtain a more general expression for calculating classification images, we can simply drop the assumption that the internal noise power is the same on the two types of trials.

$c$ is the signal grating contrast and $c_0$ is an arbitrary reference contrast. This is clearly a nonlinear model, but in a task where the observer discriminates between two gratings of fixed contrast $c_A$ and $c_B$, the nonlinearity can be accommodated within the noisy cross-correlator model. Let $I$ be a unit-contrast grating, so that $cI$ is a grating of contrast $c$. We can incorporate Foley and Legge’s (1981) power-law transduction nonlinearity by writing the decision variable in response to a stimulus

*cI*+

*N*as . With no external noise, this decision variable has mean and fixed variance, as in Foley and Legge’s (1981) model. If the external noise

*N*causes the term to vary over only a small range, as in an experiment where observers discriminate between gratings of similar contrasts

*c*

_{A}and

*c*

_{B}, we can use a Taylor series approximation that is linear in the external noise term

*N*: . If we rescale the decision variable, multiplying by , we can rewrite it as . As we pointed out when we discussed early and late noise, we can choose

*Z*so that , and rewrite the decision variable as . Hence over a small contrast range, the observer behaves like a noisy cross-correlator, except that the internal-to-external noise ratio depends on the signal contrast

$c$. When an observer discriminates between gratings of two similar contrasts $c_A$ and $c_B$, the internal noise power will be approximately the same on signal-$A$ and signal-$B$ trials, and the methods we have derived will be approximately optimal. When the grating contrast ratio $c_A/c_B$ is very different from 1, as when $c_A$ or $c_B$ is zero in a detection experiment, the internal-to-external noise ratio may be very different on the two types of trials. As we pointed out in our discussion of proportional noise, the methods we have derived can be easily modified to handle this case as well.

*c*to

*f(c)*. Chubb and Nam (2000) reported an extreme example of such a nonlinearity: they found that observers used a half- or full-wave rectifying nonlinearity to judge the contrast variance of a texture patch. Clearly, an observer who used full-wave rectification would not produce a classification image because the contrast of each pixel of the stimulus would be uncorrelated with the observer’s response. The precise effect of less extreme nonlinearities, such as a logarithmic transform, is unclear. On the other hand, Nam and Chubb (2000) found that early pointwise nonlinearities were negligible when observers judged the luminance of a texture patch, and we have found similar results in complex shape discrimination tasks (Murray, Bennett, & Sekuler, 2001), suggesting that such nonlinearities might be unimportant in first-order tasks.

$10^{-5}\,\mathrm{deg}^2$, which at a typical pixel width of 0.02 degrees corresponds to an RMS noise contrast of 16%. Third, we found that the SNR of our classification images peaked at a performance level of approximately $d' = 1$. Finally, we found that classification images had a higher SNR when we recorded responses on a six-point rating scale than when we recorded binary identification responses.

$I + N$ as one of two alternatives, $A$ or $B$, by corrupting it with an additive internal noise $Z$, cross-correlating the corrupted stimulus with a template $T$ to obtain a decision variable $s$, and responding $A$ if and only if $s$ exceeds a criterion $a$: $s = (I + N + Z) \otimes T \ge a$. We will call $I^* = I + N + Z$ the corrupted stimulus.

$I$, the noises $N$ and $Z$, and the template $T$ as vectors in an $m$-dimensional vector space, where $m$ is the number of pixels in the stimulus. The cross-correlation then becomes the vector dot product $I^* \cdot T$. An observer who follows strategy (A2) divides the $m$-space in two with a hyperplane $\Pi_T$ perpendicular to $T$, and responds ‘$A$’ if and only if the corrupted stimulus $I^*$ falls on one side of $\Pi_T$. Without loss of generality, we can assume that $T$ has unit magnitude.

*A*. We will adopt an orthonormal coordinate frame

*F*′ with the origin at

*A*, with the first coordinate vector parallel to

*T*and with the remaining coordinate vectors parallel to

*Π*

_{T}. We can represent the transformation from our original coordinate frame

*F*to the new frame

*F*′ by , where

*R*is a rotation matrix . In

*F*′, a signal-A stimulus is represented as . We will define and , and write . The coordinates of

*N*are independent, equal-variance Gaussian random variables, so the coordinates of

*N*′ are also independent, equal-variance Gaussian random variables. Similarly, the coordinates of

*Z*and

*Z*′ are independent, equal-variance Gaussian random variables.

In *F*, the cross-correlation on signal-*A* trials amounts to $s = (A + N + Z) \cdot T$, and we have assumed that the observer’s responses depend on whether *s* ≥ *a*. Equivalently, we can define the decision variable as $s' = (N + Z) \cdot T$, and assume that the observer uses a criterion $a' = a - A \cdot T$. The vector dot product is invariant under rotation, so we can rewrite this new decision variable as $s' = (N' + Z') \cdot RT$. We have defined the rotation *R* such that $RT = (1, 0, \ldots, 0)$, so the decision variable *s*′ takes the particularly simple form $s' = N'_1 + Z'_1$. That is, the observer’s responses depend only on whether the first coordinate of the corrupted stimulus exceeds a criterion *a*′, and the observer’s responses are statistically independent of coordinates 2 through *m* of the noises *N*′ and *Z*′.

What is the expected value of the transformed noise field *N*′ on trials where the signal is *A* and the observer responds ‘*A*’? We will denote this expected value by $E[N' \mid AA]$. The observer’s response is independent of components $N'_2$ through $N'_m$, so the mean of these components, conditional on the observer having responded *A*, is equal to their unconditional mean, which is zero. The conditional mean of the first component is $E[N'_1 \mid N'_1 + Z'_1 \ge a']$. In Appendix F, we derive an expression (F1) for conditional means of this form, and this expression shows that the expected value of $N'_1$ is

$$E[N'_1 \mid AA] = \frac{\sigma_N^2}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g\left(G^{-1}(p_{AA})\right)}{p_{AA}} \tag{A3}$$

Here $p_{AA}$ is the probability that the observer gives response *A* on a trial where stimulus *A* is presented, $\sigma_N^2$ is the variance of each pixel of the external noise field *N*′, and $\sigma_Z^2$ is the variance of each pixel of the internal noise field *Z*′. The function *g* is the standard normal probability density function, and $G^{-1}$ is the inverse of the standard normal cumulative distribution function.
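The conditional mean in (A3) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the closed form is $\sigma_N^2 / \sqrt{\sigma_N^2 + \sigma_Z^2}$ times $g(G^{-1}(p_{AA})) / p_{AA}$, and compares it with a Monte Carlo estimate of the mean of $N'_1$ on simulated trials where $N'_1 + Z'_1$ reaches the criterion. The function names and the choice of 200,000 simulated trials are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is g, cdf is G, inv_cdf is G^{-1}


def expected_first_component(p_aa, sigma_n, sigma_z):
    """Closed-form mean of N'_1 on signal-A, response-A trials (assumed form of A3)."""
    scale = sigma_n ** 2 / math.sqrt(sigma_n ** 2 + sigma_z ** 2)
    return scale * STD.pdf(STD.inv_cdf(p_aa)) / p_aa


def simulate_first_component(p_aa, sigma_n, sigma_z, trials=200_000, seed=0):
    """Monte Carlo mean of N'_1 given N'_1 + Z'_1 >= a' (hypothetical helper)."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma_n ** 2 + sigma_z ** 2)
    # Choose the criterion a' so that P(N'_1 + Z'_1 >= a') equals p_aa.
    criterion = sigma * STD.inv_cdf(1.0 - p_aa)
    total, count = 0.0, 0
    for _ in range(trials):
        n1 = rng.gauss(0.0, sigma_n)   # first component of the external noise
        z1 = rng.gauss(0.0, sigma_z)   # first component of the internal noise
        if n1 + z1 >= criterion:       # observer responds 'A'
            total += n1
            count += 1
    return total / count


closed_form = expected_first_component(0.75, 1.0, 1.0)
monte_carlo = simulate_first_component(0.75, 1.0, 1.0)
print(closed_form, monte_carlo)
```

With equal internal and external noise variances and $p_{AA} = 0.75$, the two values should agree to within sampling error.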

$N'_1$ is the only nonzero component of the expected value of the entire noise field, $E[N' \mid AA]$, so expression (A3) also gives the magnitude of this expected value. Furthermore, because $N'_1$ is the only nonzero component, the expected value is proportional to the coordinate vector $(1, 0, \ldots, 0)$, and hence proportional to the observer’s template *T*; this is why the response classification method gives an estimate of the observer’s template.

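What is the expected value of *N*′ on trials where the signal is *A* and the observer responds *B*, i.e., $E[N' \mid AB]$? Again, the conditional mean of components $N'_2$ through $N'_m$ is zero, and now the conditional mean of the first component is $E[N'_1 \mid N'_1 + Z'_1 < a']$. This mean can be rewritten as $-E[-N'_1 \mid (-N'_1) + (-Z'_1) > -a']$, and we can use (F1) again to evaluate this expression:

$$E[N'_1 \mid AB] = -\frac{\sigma_N^2}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g\left(G^{-1}(p_{AB})\right)}{p_{AB}} \tag{A4}$$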

What is the variance of the noise field *N* on trials of type *AA* and *AB*? The observer’s response is independent of components $N'_2$ through $N'_m$ of the transformed noise field *N*′, so the conditional variance of these components is equal to their unconditional variance $\sigma_N^2$. In Appendix F, we give expressions (F1, F2) from which the conditional variance of $N'_1$ can be computed, and these expressions show that under typical experimental conditions (e.g., 75% correct and an internal-to-external noise ratio of 1.0) the variance of $N'_1$ is slightly less than $\sigma_N^2$.

*N* is a rotation of *N*′, so each pixel of *N* can be expressed as a weighted sum of the components of *N*′, i.e., $N_i = \sum_{j=1}^{m} (R^{-1})_{ij} N'_j$. When the stimulus contains many pixels (e.g., 10,000 pixels in a 100 × 100 stimulus), the variance of the single component $N'_1$ makes a negligible contribution to the variance of the pixels of *N*. Furthermore, the expression for the conditional variance of $N'_1$ is cumbersome, and requires us to know the observer’s internal-to-external noise ratio. Consequently, we will use the approximation that the variance of $N'_1$ is $\sigma_N^2$ on trials of type *AA* and *AB*, and that the variance of each pixel in *N* is

$$\mathrm{Var}[N_i \mid SR] \approx \sigma_N^2 \tag{A5}$$

For example, at 75% correct and an internal-to-external noise ratio of 1.0, on *AA* trials the true variance of $N'_1$ is $0.77\sigma_N^2$, whereas we approximate it as $\sigma_N^2$. Hence in an *m*-pixel noise field, we approximate a total noise field variance of $(m - 0.23)\sigma_N^2$ as $m\sigma_N^2$. Even in a small 16 × 16 pixel noise field, this is an error of less than 0.1%, which is negligible compared to other sources of error, such as our estimate of the observer’s response bias, which, as we show below, must also be known to calculate the optimal weighted sum. In any case, in situations where (A5) is an inadequate approximation, we can use the correct expression for the conditional variance of *N*′, which is immediately derivable from (F1) and (F2).
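The size of this approximation error can be reproduced with a short calculation. The sketch below does not use the paper's expressions (F1) and (F2) directly; it assumes instead the standard moments of a truncated normal distribution, together with the decomposition of $N'_1$ into a part proportional to $N'_1 + Z'_1$ and an independent residual. The function name is hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def conditional_variance_n1(p_aa, sigma_n, sigma_z):
    """Var[N'_1 | N'_1 + Z'_1 >= a'] for zero-mean Gaussian N'_1 and Z'_1."""
    var_s = sigma_n ** 2 + sigma_z ** 2     # variance of S = N'_1 + Z'_1
    z = STD.inv_cdf(1.0 - p_aa)             # normal deviate of the criterion
    lam = STD.pdf(z) / p_aa                 # mean of a standard normal truncated below at z
    var_s_truncated = var_s * (1.0 + z * lam - lam ** 2)
    # N'_1 = b * S + W, where W is independent of S.
    b = sigma_n ** 2 / var_s
    var_w = sigma_n ** 2 * sigma_z ** 2 / var_s
    return b ** 2 * var_s_truncated + var_w


variance_aa = conditional_variance_n1(0.75, 1.0, 1.0)
pixels = 16 * 16
# Relative error of approximating (m - 1 + variance_aa) * sigma_N^2 by m * sigma_N^2.
relative_error = (1.0 - variance_aa) / pixels
print(round(variance_aa, 2), relative_error)
```

Under these assumptions the conditional variance comes out near $0.77\sigma_N^2$, and the relative error for a 16 × 16 field stays below 0.1%.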

We can carry out the same analysis for trials on which stimulus *B* is presented, with analogous results. We need only introduce a coordinate frame *F*″ with its origin at *B* and its first coordinate vector parallel to *T*, and a decision variable $s'' = (N + Z) \cdot T$ with criterion $a'' = a - B \cdot T$. We then obtain expressions like (A3), (A4), and (A5), with stimulus *A* replaced by stimulus *B*. Hence the SNR of an external noise field *N* on a single trial of type *SR* (where we use *SR* to stand for trial types *AA*, *AB*, *BA*, and *BB*) is the ratio of an expression of the form (A3) or (A4) squared, and an expression of the form (A5):

$$\mathrm{SNR}_{SR} = \frac{\sigma_N^2}{\sigma_N^2 + \sigma_Z^2} \left( \frac{g\left(G^{-1}(p_{SR})\right)}{p_{SR}} \right)^2 \tag{A6}$$

In general, the four trial types have different probabilities $p_{SR}$, so noise fields from the four cells of the response matrix have different SNRs.

We have shown not only that the expected value of the noise field in each class of trials is proportional to the template *T*, but also that, in general, noise fields from different classes of trials have different SNRs, given by (A6). In Appendix E, we show that the SNR of a weighted sum of several noisy images $C_i$ with proportional means is maximized when $w_i = \mu_i / \sigma_i^2$, where $\mu_i$ is the magnitude of the expected value of $C_i$ and $\sigma_i^2$ is its pixel variance. According to the expressions for the expected value (A3, A4) and variance (A5) of the noise fields, this means that we should calculate the classification image by taking a weighted sum of noise fields across trials, with the weight $w_{SR}$ assigned to the noise fields in each cell *SR* of the response matrix equal to

$$w_{SR} = \pm \frac{1}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g\left(G^{-1}(p_{SR})\right)}{p_{SR}}$$

The sign is positive for trials in classes *AA* and *BA*, and negative for trials in classes *AB* and *BB*, as the expected values of noise fields on response-*A* and response-*B* trials are anti-parallel.

Factors in the weights $w_{SR}$ that are common to all noise fields can be dropped, as these factors simply scale the classification image without affecting its SNR. The external noise variance $\sigma_N^2$ is the same on all trials, and if the contrast energy of stimuli *A* and *B* is approximately the same, then the internal noise variance $\sigma_Z^2$ will also be approximately the same on all trials (see our discussion of fixed and proportional internal noise in the “General Discussion”), and we can drop the factor $1/\sqrt{\sigma_N^2 + \sigma_Z^2}$. Hence the weights reduce to $w_{SR} = \pm\, g(G^{-1}(p_{SR})) / p_{SR}$. We will denote the number of trials in response cell *SR* in a particular experiment as $\hat{n}_{SR}$ and estimate the response probabilities by $\hat{p}_{SR} = \hat{n}_{SR} / (n/2)$. If we substitute $\hat{p}_{SR}$ for $p_{SR}$ (we discuss this substitution later), and scale each weight by dividing it by the number of trials *n*/2 on which stimulus *A* or stimulus *B* was presented, the weights evaluate to

$$w_{SR} = \pm \frac{g\left(G^{-1}(\hat{p}_{SR})\right)}{\hat{n}_{SR}} \tag{A9}$$
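As an illustration of how such weights might be computed from a response matrix, the sketch below assumes the weight for each cell takes the form $\pm\, g(G^{-1}(\hat{p}_{SR})) / \hat{n}_{SR}$, which is what results from dividing $g(G^{-1}(\hat{p}_{SR})) / \hat{p}_{SR}$ by the *n*/2 trials per stimulus. The function name, the dictionary layout, and the example counts are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is g, inv_cdf is G^{-1}


def response_weights(counts, trials_per_stimulus):
    """Per-cell weights from trial counts.

    counts maps a two-letter cell name (stimulus, then response) to the
    number of trials in that cell, e.g. {"AA": 750, "AB": 250, ...}.
    Every cell must be non-empty and contain fewer than all trials, because
    the inverse normal CDF is undefined at probabilities 0 and 1.
    """
    weights = {}
    for cell, n_hat in counts.items():
        p_hat = n_hat / trials_per_stimulus
        sign = 1.0 if cell[1] == "A" else -1.0   # positive for response-A cells
        weights[cell] = sign * STD.pdf(STD.inv_cdf(p_hat)) / n_hat
    return weights


example_counts = {"AA": 750, "AB": 250, "BA": 250, "BB": 750}
example_weights = response_weights(example_counts, trials_per_stimulus=1000)
print(example_weights)
```

For the symmetric counts in this example, the numerators are equal in all four cells, so the weights differ only in sign and in the reciprocal of the cell count.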

*A* and the observer responded *B*.

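For an unbiased observer, $p_{AB} = p_{BA}$ and $p_{AA} = p_{BB}$, and from these equalities it follows (again using the substitution $p_{SR} = \hat{p}_{SR}$) that the numerator of (A9) is the same for all four classes of trials, and this factor can be dropped, leaving

$$w_{SR} = \pm \frac{1}{\hat{n}_{SR}} \tag{A11}$$

That is, the optimal weighted sum for calculating a classification image for an unbiased observer is derived by averaging the noise fields in each cell of the response matrix, adding the averages from response-*A* cells, and subtracting the averages from response-*B* cells:

$$C = \left(\bar{N}_{AA} + \bar{N}_{BA}\right) - \left(\bar{N}_{AB} + \bar{N}_{BB}\right) \tag{A12}$$

where $\bar{N}_{SR}$ is the average of the noise fields in cell *SR*.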
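The averaging rule described above (add the cell averages from response-*A* cells, subtract those from response-*B* cells) can be sketched in a few lines. This is an illustrative implementation only: noise fields are represented as flat lists of pixel values, and the function names and toy data are hypothetical.

```python
def mean_field(fields):
    """Pixel-wise average of a non-empty list of equal-length noise fields."""
    n = len(fields)
    return [sum(pixel_values) / n for pixel_values in zip(*fields)]


def classification_image(fields_by_cell):
    """Classification image for an unbiased observer.

    fields_by_cell maps "AA", "AB", "BA", "BB" (stimulus, then response)
    to the list of noise fields recorded in that cell.
    """
    means = {cell: mean_field(fields) for cell, fields in fields_by_cell.items()}
    return [
        aa + ba - ab - bb
        for aa, ba, ab, bb in zip(means["AA"], means["BA"], means["AB"], means["BB"])
    ]


# Toy two-pixel noise fields, chosen only to make the arithmetic easy to follow.
toy_fields = {
    "AA": [[1.0, 2.0], [3.0, 4.0]],    # average [2, 3]
    "BA": [[0.0, 1.0]],                # average [0, 1]
    "AB": [[1.0, 1.0]],                # average [1, 1]
    "BB": [[-1.0, 0.0], [1.0, 2.0]],   # average [0, 1]
}
toy_image = classification_image(toy_fields)
print(toy_image)  # [1.0, 2.0]
```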

The derivations above assume that we know the response probabilities $p_{SR}$. In practice, we can only count the number of trials $\hat{n}_{SR}$ in each cell and make estimates of the probabilities, $\hat{p}_{SR} = \hat{n}_{SR} / (n/2)$, from a limited total number of trials. This may introduce biases into our formulas for classification images. For example, although $\hat{n}_{SR}$ is an unbiased estimator of $n_{SR}$, $1/\hat{n}_{SR}$ is not an unbiased estimator of $1/n_{SR}$, so the weights calculated according to (A11) will be biased. However, the biases are very small: when we evaluate the expected value of (A11) numerically, we find that at a response probability of $p_{SR} = 0.75$, even with only 100 trials, the bias is less than 1%. The number of trials in a response classification experiment is typically very large (e.g., 10,000 trials per condition in Gold, Murray, Bennett, & Sekuler [2000]), so our estimates $\hat{n}_{SR}$ and $\hat{p}_{SR}$ are very accurate, and the biases in (A10) and (A15) are negligible.
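A numerical evaluation of this kind of bias is straightforward to sketch. The code below is a simplified stand-in for the authors' calculation: it assumes the cell count is binomially distributed over 100 trials with success probability 0.75, conditions on the cell being non-empty, and compares the expected reciprocal count with the reciprocal of the expected count. The function name is hypothetical.

```python
import math


def relative_bias_of_reciprocal(trials, p):
    """Relative bias of 1/n_hat as an estimator of 1/(trials * p).

    n_hat is assumed to be Binomial(trials, p), conditioned on n_hat > 0
    so that the reciprocal is defined.
    """
    expected_count = trials * p
    weighted_sum, mass = 0.0, 0.0
    for k in range(1, trials + 1):
        pmf = math.comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
        weighted_sum += pmf / k
        mass += pmf
    mean_reciprocal = weighted_sum / mass
    return mean_reciprocal * expected_count - 1.0


bias = relative_bias_of_reciprocal(100, 0.75)
print(bias)
```

Under these assumptions the relative bias is about 0.3%, consistent with the bound of 1% quoted in the text.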

The SNR of the classification image *C* is the sum of the SNRs of the individual noise fields (see Appendix E). The SNR of a single noise field on trial type *SR* is given by (A6). We can find the SNR obtained from all trials of type *SR* by multiplying (A6) by the expected number of trials of type *SR*, $(n/2)\,p_{SR}$, and we can find the SNR of the classification image *C* by summing these SNRs across all four types of trials. For an unbiased observer with proportion correct $p_C = p_{AA} = p_{BB}$,

$$\mathrm{SNR} = n \, \frac{\sigma_N^2}{\sigma_N^2 + \sigma_Z^2} \, \frac{g\left(G^{-1}(p_C)\right)^2}{p_C \,(1 - p_C)} \tag{A15}$$
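To see how the predicted SNR depends on the experimental parameters, the sum over the four cells can be written as a small function. This sketch assumes the single-trial SNR in each cell is $\frac{\sigma_N^2}{\sigma_N^2 + \sigma_Z^2}\,(g(G^{-1}(p_{SR}))/p_{SR})^2$ and that the observer is unbiased, so the sum collapses to a closed form in the proportion correct. The function name and the example parameter values are assumptions made for illustration.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is g, inv_cdf is G^{-1}


def classification_image_snr(n_trials, p_correct, sigma_n, sigma_z):
    """Predicted SNR of the classification image for an unbiased observer."""
    noise_factor = sigma_n ** 2 / (sigma_n ** 2 + sigma_z ** 2)
    density = STD.pdf(STD.inv_cdf(p_correct))
    return n_trials * noise_factor * density ** 2 / (p_correct * (1.0 - p_correct))


# 10,000 trials, equal internal and external noise variance.
snr_75 = classification_image_snr(10_000, 0.75, 1.0, 1.0)
snr_60 = classification_image_snr(10_000, 0.60, 1.0, 1.0)
snr_90 = classification_image_snr(10_000, 0.90, 1.0, 1.0)
print(snr_60, snr_75, snr_90)
```

With the noise powers held fixed, this expression decreases as the proportion correct moves away from 0.5, so lower performance levels yield a higher predicted SNR for the same number of trials.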

Expression (A15) gives the SNR obtained with the optimal weights $w_{SR}$, whereas we can only estimate the optimal weights from a limited amount of data. Hence (A15) overestimates the quality of the classification image calculated according to (A12). However, as we have pointed out, the number of trials in a response classification experiment is typically very large, and in this case, our estimates of the weights will be quite accurate, and the error in (A15) will be small.

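Suppose we wish to combine, in a weighted sum, the classification images $C_i$ obtained at several stimulus levels. In Appendix E, we show that the weighted sum has the highest SNR when $w_i = \mu_i / \sigma_i^2$. From expressions (A3), (A4), and (A5) for the mean and variance of individual noise fields, and the method (A12) of combining noise fields to calculate a classification image, it follows that the magnitude and variance of the classification images $C_i$ obtained at each stimulus level are

$$\mu_i = \frac{2\sigma_N^2}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g\left(G^{-1}(p_i)\right)}{p_i (1 - p_i)} \tag{B1}$$

$$\sigma_i^2 = \frac{4\sigma_N^2}{n_i \, p_i (1 - p_i)} \tag{B2}$$

where $n_i$ is the number of trials and $p_i$ the proportion correct at stimulus level *i*. The optimal weights $w_i$ are the ratio of (B1) and (B2):

$$w_i = \frac{n_i \, g\left(G^{-1}(p_i)\right)}{2\sqrt{\sigma_N^2 + \sigma_Z^2}} \tag{B3}$$

Eliminating scale factors common to the weights at all stimulus levels, we are left with

$$w_i = n_i \, g\left(G^{-1}(p_i)\right) \tag{B4}$$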

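That is, the optimal method is to calculate a classification image $C_i$ at each stimulus level using (A12), and to take the weighted sum $C = \sum_i n_i \, g(G^{-1}(p_i)) \, C_i$.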
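Combining classification images across stimulus levels might look as follows in code. This is a sketch under stated assumptions, not the authors' implementation: it assumes an unbiased observer and takes the weight at each level to be the number of trials times the normal density at the deviate of the proportion correct, which is what the ratio of magnitude to pixel variance reduces to under the approximations of Appendix A once common factors are dropped. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is g, inv_cdf is G^{-1}


def combine_levels(images, n_trials, p_correct):
    """Weighted sum of per-level classification images.

    images: list of flat pixel lists, one per stimulus level.
    n_trials, p_correct: number of trials and proportion correct per level.
    Returns the per-level weights and the combined image.
    """
    weights = [n * STD.pdf(STD.inv_cdf(p)) for n, p in zip(n_trials, p_correct)]
    n_pixels = len(images[0])
    combined = [
        sum(w * image[k] for w, image in zip(weights, images))
        for k in range(n_pixels)
    ]
    return weights, combined


level_images = [[1.0, 0.0], [0.0, 1.0]]   # toy two-pixel images
level_weights, combined_image = combine_levels(
    level_images, n_trials=[1000, 1000], p_correct=[0.6, 0.9]
)
print(level_weights, combined_image)
```

With equal trial counts, the level with performance closer to chance receives the larger weight, reflecting its higher SNR per trial.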

We have assumed that the internal noise power $\sigma_Z^2$ is the same at all signal contrasts. As we mentioned in the “General Discussion,” internal noise power actually grows with the signal contrast as well as with the external noise power, so this assumption is false. However, if a large part of the stimulus energy comes from external noise, as is often the case in response classification experiments because of the high power of the noise added to the signal, then the proportional noise will vary little with the signal contrast. (See our discussion of fixed and proportional noise in the “General Discussion.”) In any case, internal proportional noise is not yet understood well enough for us to anticipate how it will vary with the signal contrast in general (e.g., how the constant of proportionality depends on the shape of the signal), so we will use the rough approximation that the internal noise power does not vary with signal contrast when a signal is shown in strong external noise. In an experiment where one is in a position to measure the internal noise power at each signal contrast, the weights in (B3) can be used instead of those in (B4).

First, we consider trial pairs on which stimulus *A* is presented. We will adopt the coordinate frame *F*′ introduced in Appendix A. On a repeated pair of trials, the external noise *N*′ is the same on both trials, but the internal noise samples $Z^{1\prime}$ (first trial) and $Z^{2\prime}$ (second trial) are independent. As shown in Appendix A, the observer’s response is independent of noise components $N'_2$ through $N'_m$, so the conditional mean of these components is zero. The conditional mean of $N'_1$ on trial pairs where the observer responds *A* on both trials is $E[N'_1 \mid N'_1 + Z^{1\prime}_1 \ge a',\; N'_1 + Z^{2\prime}_1 \ge a']$. In Appendix F, we give an expression (F3) for conditional means of this form, and this expression gives the mean as a function of quantities that include $p_{AAA}$, the probability that the observer responds *A* on both trials of a trial pair on which the signal is *A*.

On trial pairs where the observer responds *A* on the first trial and *B* on the second trial, the mean of components $N'_2$ through $N'_m$ is zero, and the mean of $N'_1$ is $E[N'_1 \mid N'_1 + Z^{1\prime}_1 \ge a',\; N'_1 + Z^{2\prime}_1 < a']$, which can be evaluated using expression (F4). The same expression gives the mean on trial pairs where the observer responds *B* on the first trial and *A* on the second trial.

Replacing $p_{AAA}$ with $n_{AAA}$, $p_{AAB}$ with $n_{AAB}$, etc., and using the fact that for an unbiased observer *z* = *d*′/2, we find that for an unbiased observer, the optimal weighted sum for calculating a classification image is .

The SNR of the classification image *C* is the sum of the SNRs of the individual noise fields. The SNRs of the individual noise fields are the ratios of (C1), (C2), and (C3) squared with (C4), and summing these ratios times the expected number of trials in each category, we obtain . In this expression, $p_{CC}$ is the probability of two correct responses ($p_{CC} = p_{AAA} = p_{BBB}$), $p_{CI}$ is the probability of one correct response and one incorrect response (this probability covers two cells of the response matrix, *AAB* and *ABA*, as the order of the correct and incorrect responses is normally of no interest; hence $p_{CI} = 2p_{AAB} = 2p_{ABA} = 2p_{BAB} = 2p_{BBA}$), and $p_{II}$ is the probability of two incorrect responses ($p_{II} = p_{ABB} = p_{BAA}$).

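First, we consider trials on which stimulus *A* is presented. We will adopt the coordinate frame *F*′ introduced in Appendix A. On trials where the observer’s response is rating *i*, the decision variable *s*′ falls between criteria $a'_i$ and $a'_{i+1}$. The observer’s response is independent of noise components $N'_2$ through $N'_m$, so the mean of these components on trials where the observer responds *i* is zero. The mean of the first component, $E[N'_1 \mid a'_i \le N'_1 + Z'_1 < a'_{i+1}]$, can be evaluated using (F5) from Appendix F, and we find

$$E[N'_1 \mid Ai] = \frac{\sigma_N^2}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g(z'_{Ai}) - g(z'_{A,i+1})}{G(z'_{A,i+1}) - G(z'_{Ai})} \tag{D1}$$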

The quantities $z'_{Ai}$ are the normal deviates of the criteria $a'_i$ with respect to the distribution of the decision variable *s*′ on trials on which the stimulus is *A*. These normal deviates can be estimated from the observer’s response probabilities, $z'_{Ai} = G^{-1}(p_{Ai-})$, where $p_{Ai-}$ is the probability that the observer gives a rating less than *i* on a trial where stimulus *A* is presented.

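Next, we consider trials on which stimulus *B* is presented. We adopt the coordinate frame *F*″ introduced in Appendix A. Again, the observer’s response is independent of external noise components $N''_2$ through $N''_m$, so the mean of these components on trials where the observer responds *i* is zero. Expression (F5) shows that the mean of the first component is

$$E[N''_1 \mid Bi] = \frac{\sigma_N^2}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g(z''_{Bi}) - g(z''_{B,i+1})}{G(z''_{B,i+1}) - G(z''_{Bi})} \tag{D2}$$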

The quantities $z'_{Ai}$ are the normal deviates of the criteria $a'_i$ with respect to the distribution of the decision variable on trials where the signal is *A*, and the quantities $z''_{Bi}$ are the normal deviates of the criteria $a''_i$ with respect to the distribution of the decision variable on trials where the signal is *B*. The signal detection analysis of this task holds that the criteria are the same on signal-*A* and signal-*B* trials. The normal deviate of the mean on signal-*A* trials with respect to the distribution on signal-*B* trials is *d*′, and it follows that $z'_{Ai} = z''_{Bi} - d'$. We can form more accurate estimates of the observer’s criteria by averaging the estimates from signal-*A* and signal-*B* trials:

$$z_i = \tfrac{1}{2}\left(z'_{Ai} + z''_{Bi} - d'\right) \tag{D4}$$

These values $z_i$ are estimates of the normal deviates of the observer’s criteria with respect to the distribution of the decision variable *s* on signal-*A* trials; the normal deviates with respect to the distribution on signal-*B* trials are $z_i + d'$. We could form better estimates $z_i$ by combining the estimates $z'_{Ai}$ and $z''_{Bi} - d'$ in a weighted average that takes account of the standard errors associated with these quantities (Taylor, 1982), but we will not develop these details.

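The optimal weights for noise fields in cells *Ai* and *Bi* are

$$w_{Ai} = \frac{1}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g(z_i) - g(z_{i+1})}{G(z_{i+1}) - G(z_i)}, \qquad w_{Bi} = \frac{1}{\sqrt{\sigma_N^2 + \sigma_Z^2}} \, \frac{g(z_i + d') - g(z_{i+1} + d')}{G(z_{i+1} + d') - G(z_i + d')}$$

Each denominator term of the form $G(z_{i+1}) - G(z_i)$ is the probability that the decision variable *s* falls between criteria $a_i$ and $a_{i+1}$, i.e., the probability that a trial receives rating *i*, which is proportional to the number of trials that receive rating *i*. Using this fact, and dropping scale factors common to all weights, we find that the optimal weighted sum is

$$C = \sum_i \left[ \left(g(z_i) - g(z_{i+1})\right) \bar{N}_{Ai} + \left(g(z_i + d') - g(z_{i+1} + d')\right) \bar{N}_{Bi} \right]$$

where $\bar{N}_{Ai}$ and $\bar{N}_{Bi}$ are the averages of the noise fields in cells *Ai* and *Bi*, and the normal deviates $z_i$ are calculated from both signal-*A* trials and signal-*B* trials as in (D4).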
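The rating-based weights can be illustrated with a short sketch. It assumes that, after common scale factors are dropped, the weight applied to the average noise field in a rating cell is the difference between the standard normal density at the lower and upper criteria bounding that rating, with the criteria shifted by *d*′ on signal-*B* trials. The function names, the example criteria, and the value of *d*′ are hypothetical.

```python
import math


def density(z):
    """Standard normal density; evaluates to 0.0 at plus or minus infinity."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def rating_weights(criteria, d_prime):
    """Weights for the average noise field in each rating cell.

    criteria: interior criteria as normal deviates with respect to the
    signal-A distribution, in ascending order. The lowest and highest
    ratings are bounded by minus and plus infinity.
    Returns (weights for signal-A cells, weights for signal-B cells).
    """
    edges = [-math.inf] + list(criteria) + [math.inf]
    bounds = list(zip(edges, edges[1:]))
    weights_a = [density(lo) - density(hi) for lo, hi in bounds]
    weights_b = [density(lo + d_prime) - density(hi + d_prime) for lo, hi in bounds]
    return weights_a, weights_b


# Four ratings separated by three criteria, with d' = 1.
w_a, w_b = rating_weights([-1.0, 0.0, 1.0], d_prime=1.0)
print(w_a)
print(w_b)
```

The lowest rating receives a negative weight and the highest a positive weight, and the differences telescope so that the weights within each stimulus sum to zero.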

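The SNR of the classification image *C* is the sum of the SNRs of the individual images. The SNR of the individual noise fields is given by the ratio of (D1, D2) squared and (D5):

$$\mathrm{SNR}_{Ai} = \frac{\sigma_N^2}{\sigma_N^2 + \sigma_Z^2} \left( \frac{g(z_i) - g(z_{i+1})}{G(z_{i+1}) - G(z_i)} \right)^2 \tag{D9}$$

$$\mathrm{SNR}_{Bi} = \frac{\sigma_N^2}{\sigma_N^2 + \sigma_Z^2} \left( \frac{g(z_i + d') - g(z_{i+1} + d')}{G(z_{i+1} + d') - G(z_i + d')} \right)^2 \tag{D10}$$

The SNR of *C* is the sum of the SNRs of the noise fields in each cell of the response matrix, given by (D9) and (D10), times the expected number of noise fields in each cell:

$$\mathrm{SNR} = \frac{n}{2} \, \frac{\sigma_N^2}{\sigma_N^2 + \sigma_Z^2} \sum_i \left[ \frac{\left(g(z_i) - g(z_{i+1})\right)^2}{G(z_{i+1}) - G(z_i)} + \frac{\left(g(z_i + d') - g(z_{i+1} + d')\right)^2}{G(z_{i+1} + d') - G(z_i + d')} \right]$$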

Suppose we have *n* stochastic images $C_i$, all of whose means are proportional to an image *T*, but that have different SNRs. If we take a weighted sum to estimate *T* more precisely, what weights $w_i$ should we use to maximize the SNR of the weighted sum *C*? (We are interested only in obtaining an image *C* that is proportional to *T*, and we do not care about estimating the magnitude of *T*.)

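We define $\mu_i = \lVert E[C_i] \rVert$ as the magnitude of the expected value of $C_i$, and $\sigma_i^2$ as the variance of each pixel of $C_i$. Rescaling an image does not change its SNR, so we can rescale the images $C_i$ to have unit magnitude, $C'_i = C_i / \mu_i$, and consider weighted sums of $C'_i$. The unit-magnitude images $C'_i$ have variance $\sigma_i'^2 = \sigma_i^2 / \mu_i^2$. Similarly, we can require that the weights be chosen so that the magnitude of the sum *C*′ is one. The sum then takes the form

$$C' = \sum_{i=1}^{n-1} u_i C'_i + \left(1 - \sum_{i=1}^{n-1} u_i\right) C'_n$$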

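Taking the derivative of the pixel variance of *C*′ with respect to each weight $u_i$, and requiring that each derivative equal zero, we obtain a system of linear equations for the weights $u_i$ whose solution is $u_i = (1/\sigma_i'^2) \big/ \sum_{j=1}^{n} (1/\sigma_j'^2)$. It follows that the *n*th weight is $u_n = 1 - \sum_{i=1}^{n-1} u_i = (1/\sigma_n'^2) \big/ \sum_{j=1}^{n} (1/\sigma_j'^2)$. The SNR of the sum *C*′ is unaffected if we multiply each weight by $\sum_{j=1}^{n} (1/\sigma_j'^2)$, so the SNR of the weighted sum of images $C'_i$ is maximized with weights $u_i = 1/\sigma_i'^2 = \mu_i^2/\sigma_i^2$, and the SNR of the weighted sum of images $C_i$ is maximized with weights $w_i = u_i / \mu_i$:

$$w_i = \frac{\mu_i}{\sigma_i^2} \tag{E5}$$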
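A small numerical check makes the optimality of these weights concrete. The sketch below assumes independent images with parallel means, so that a weighted sum has mean magnitude $\sum_i w_i \mu_i$ and pixel variance $\sum_i w_i^2 \sigma_i^2$, and it takes the optimal weights to be $\mu_i / \sigma_i^2$. The function name and the example magnitudes and variances are hypothetical.

```python
def weighted_sum_snr(weights, magnitudes, variances):
    """SNR (squared mean magnitude over pixel variance) of a weighted sum.

    Assumes the images are independent and their means are parallel.
    """
    mean = sum(w * mu for w, mu in zip(weights, magnitudes))
    variance = sum(w ** 2 * v for w, v in zip(weights, variances))
    return mean ** 2 / variance


magnitudes = [1.0, 2.0, 0.5]    # magnitudes of the expected images
variances = [1.0, 4.0, 0.25]    # pixel variances of the images

optimal_weights = [mu / v for mu, v in zip(magnitudes, variances)]
snr_optimal = weighted_sum_snr(optimal_weights, magnitudes, variances)
snr_equal = weighted_sum_snr([1.0, 1.0, 1.0], magnitudes, variances)
sum_of_snrs = sum(mu ** 2 / v for mu, v in zip(magnitudes, variances))
print(snr_optimal, snr_equal, sum_of_snrs)
```

In this example each image has an individual SNR of 1, the optimally weighted sum reaches the sum of the individual SNRs, 3, and equal weighting falls short of it.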

We need only know the magnitudes $\mu_i$ and the variances $\sigma_i^2$ of a set of stochastic images to find the optimal weights for a weighted sum of the images. In Appendices A through D, in which we derive optimal expressions for calculating classification images, only the expected values $\mu_i$ of the noise fields depend on the noisy cross-correlator model (1). The variances $\sigma_i^2$ of the noise fields follow directly from the statistics of sampling error. Hence when we compare the predicted SNRs of each class of noise fields to the measured rSNRs, we are testing the model’s prediction of the expected value of each class of noise fields. If our predictions about the expected value of each class of noise fields are mistaken by a common scale factor *b*, the weights we use to calculate a classification image will nevertheless be correct: rather than weighting each noise field by $\mu_i / \sigma_i^2$, as in (E5), we will weight each noise field by $b\mu_i / \sigma_i^2$, and multiplying each weight by a common scale factor does not change the SNR of the weighted sum. Hence the resulting classification image will still have the maximum possible SNR.

In signal detection models, the decision variable is often the sum of a random variable *X* determined by the stimulus and a random variable *Y* determined by noise internal to the observer. It is sometimes useful to know the statistics of *X* conditional on the sum *X* + *Y* falling in a particular range. In this appendix, we state some results of this type, and we give a full proof of one such result. The proofs are simply evaluations of integrals, and we hope it is clear from the example given how the other integrals can be completed.

*Theorem.* Let and be independent Gaussian random variables, and let $a_1$ and $a_2$ be real numbers. Define and . That is, $z_1$ and $z_2$ are the normal deviates of $a_1$ and $a_2$, respectively, with respect to the distribution of $X + Y_1$ or $X + Y_2$. Then, . The function $g(x, \mu, \sigma)$ is the normal probability density function, and $G(x, \mu, \sigma)$ is the normal cumulative distribution function. When we omit the arguments *μ* and *σ*, they default to 0 and 1, respectively.

When $\mu_X \neq 0$, we can reduce the problem to the zero-mean case by defining and . Then, . This conditional mean can be evaluated using (F6), and we find .

*Journal of the Optical Society of America A*, 16, 647–653.