We consider three simple forced-choice visual tasks—detection, contrast discrimination, and identification—in Gaussian white noise. The three tasks are designed so that the difference signal in all three cases is the same difference-of-Gaussians (DOG) profile. The distribution of the image noise implies that the ideal observer uses the same DOG filter to perform all three tasks. But do human observers also use the same visual strategy to perform these tasks? We use classification image analysis to evaluate the visual strategies of human observers. We find significantly different subject classification images across the three tasks. The domain of greatest variability appears to be low spatial frequencies [<5 cycles per degree (cpd)]. In this range, we find frequency enhancement in the detection task, and frequency suppression and reversal in the contrast discrimination task. In the identification task, subject classification images agree reasonably well with the ideal observer filter. We evaluate the effect of nonlinear transducers and intrinsic spatial uncertainty to explain divergence from the ideal observer found in detection and contrast discrimination tasks.

We denote the signal-present (target) image on the $j$th trial by $\mathbf{g}_j^{+}$, and the signal-absent (alternative) image by $\mathbf{g}_j^{-}$. The trial index, $j$, runs from 1 to the number of trials, $N_T$. These images are defined by

$$
\mathbf{g}_j^{+} = \mathbf{b} + \mathbf{s} + \mathbf{n}_j^{+}, \qquad \mathbf{g}_j^{-} = \mathbf{b} + \mathbf{n}_j^{-}, \tag{1}
$$

where $\mathbf{b}$ is the task-dependent mean background intensity (including any common signal pedestal), $\mathbf{s}$ is the difference signal, and $\mathbf{n}_j^{+}$ and $\mathbf{n}_j^{-}$ are noise fields associated with each alternative. The profile of the difference signal is held constant across tasks, although the contrast of the signal was adjusted on the basis of pilot studies to achieve targeted levels of task performance. We use the method of constant stimuli, so the signal, $\mathbf{s}$, is unchanging throughout an experiment. Random number generators are used to create uncorrelated (white) Gaussian luminance noise with a pixel standard deviation of $\sigma_n$. The noise fields in the signal-present and signal-absent alternatives are independent of each other and independent across experimental trials.

Each trial is scored 1 for correctly identifying the target image ($\mathbf{g}_j^{+}$), and 0 for incorrectly identifying the alternative image ($\mathbf{g}_j^{-}$) as the target. We will refer to the trial score by the variable $o_j$ (indicating the outcome of trial $j$). The proportion of correct responses ($P_C$) is defined as the expected value of $o_j$, $P_C = E(o_j)$, where $E$ indicates the mathematical expectation of a random variable. In 2AFC experiments, we estimate this quantity with the sample average across trials,

$$
\hat{P}_C = \frac{1}{N_T} \sum_{j=1}^{N_T} o_j. \tag{2}
$$
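The stimulus construction and scoring described above can be sketched in a few lines. This is only an illustration under stated assumptions: the DOG widths, amplitude, image size, noise level, and the template-matching decision rule are hypothetical choices, not the experimental values.

```python
import numpy as np

def dog_profile(size, sigma_center, sigma_surround, amplitude=1.0):
    """Hypothetical 2-D difference-of-Gaussians signal on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:size - half, -half:size - half]
    r2 = x**2 + y**2
    center = np.exp(-r2 / (2.0 * sigma_center**2))
    surround = np.exp(-r2 / (2.0 * sigma_surround**2))
    return amplitude * (center - surround)

def simulate_2afc(signal, background, sigma_n, n_trials, rng):
    """Return per-trial scores o_j (1 = correct, 0 = incorrect)."""
    shape = (n_trials,) + signal.shape
    # Independent white Gaussian noise fields for the two alternatives.
    g_plus = background + signal + rng.normal(0.0, sigma_n, shape)   # target
    g_minus = background + rng.normal(0.0, sigma_n, shape)           # alternative
    # Assumed decision rule: choose the image with the larger
    # cross-correlation with the signal profile.
    resp_plus = np.tensordot(g_plus, signal, axes=([1, 2], [0, 1]))
    resp_minus = np.tensordot(g_minus, signal, axes=([1, 2], [0, 1]))
    return (resp_plus > resp_minus).astype(float)

rng = np.random.default_rng(0)
signal = dog_profile(size=32, sigma_center=2.0, sigma_surround=4.0, amplitude=1.5)
scores = simulate_2afc(signal, background=100.0, sigma_n=5.0, n_trials=2000, rng=rng)
p_c_hat = scores.mean()   # sample-average estimate of P_C
```

The background term contributes equally to both responses and therefore cancels in the comparison, which is why only the difference signal and the noise matter for the outcome.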

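[…] $d'$, by the formula (Green & Swets, 1966),

$$
d' = \sqrt{2}\,\Phi^{-1}(P_C), \tag{3}
$$

where $\Phi^{-1}$ is the inverse of the cumulative normal distribution function.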
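As a quick numerical illustration, the conversion between proportion correct and detectability can be sketched as below. It assumes the standard 2AFC relation $d' = \sqrt{2}\,\Phi^{-1}(P_C)$; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def dprime_from_pc(p_c):
    """Detectability index from 2AFC proportion correct (assumed standard relation)."""
    if not 0.0 < p_c < 1.0:
        raise ValueError("p_c must lie strictly between 0 and 1")
    return sqrt(2.0) * NormalDist().inv_cdf(p_c)

def pc_from_dprime(d_prime):
    """Inverse mapping: 2AFC proportion correct from the detectability index."""
    return NormalDist().cdf(d_prime / sqrt(2.0))

d_at_85 = dprime_from_pc(0.85)   # roughly 1.47
```

Under this relation, a proportion correct near 0.85 corresponds to a detectability index of roughly 1.5.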

[…] $\lambda$, as

$$
\lambda(\mathbf{g}) = \mathbf{w}^{T}\mathbf{g} + \epsilon, \tag{6}
$$

where $\mathbf{w}$ is a vector of spatial weights and $\epsilon$ is a stochastic internal noise component. The internal noise component is assumed to be a Gaussian-distributed random variable that is independent of $\mathbf{g}$, with a mean of 0 and standard deviation $\sigma_\epsilon$. The weighting vector, $\mathbf{w}$, governs how the spatial distribution of intensity in the image influences the observer's response. As such, it encodes the visual strategy used by the observer. Under this linear model, the detectability index for the images defined in Equation 1 is

$$
d' = \frac{\mathbf{w}^{T}\mathbf{s}}{\sqrt{\sigma_n^{2}\,\mathbf{w}^{T}\mathbf{w} + \sigma_\epsilon^{2}}}. \tag{7}
$$
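A small sketch may help make the linear-observer detectability concrete. It assumes the form $d' = \mathbf{w}^{T}\mathbf{s} / \sqrt{\sigma_n^{2}\mathbf{w}^{T}\mathbf{w} + \sigma_\epsilon^{2}}$, which follows from a cross-correlation response with additive Gaussian internal noise in white image noise; the signal and template values are hypothetical.

```python
import numpy as np
from math import sqrt

def linear_observer_dprime(w, s, sigma_n, sigma_eps=0.0):
    """d' of a linear template observer in white Gaussian image noise (assumed form)."""
    w = np.asarray(w, dtype=float).ravel()
    s = np.asarray(s, dtype=float).ravel()
    return float(w @ s) / sqrt(sigma_n**2 * float(w @ w) + sigma_eps**2)

s = np.array([0.0, 1.0, 2.0, 1.0, 0.0])          # hypothetical difference signal
d_ideal = linear_observer_dprime(s, s, sigma_n=2.0)               # template matched to signal
d_flat = linear_observer_dprime(np.ones(5), s, sigma_n=2.0)       # mismatched template
d_noisy = linear_observer_dprime(s, s, sigma_n=2.0, sigma_eps=3.0)  # internal noise added
```

With a matched template and no internal noise the expression reduces to $\lVert\mathbf{s}\rVert / \sigma_n$; a mismatched template or nonzero internal noise can only lower it.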

[…] $\mathbf{w} = \mathbf{s}$, and $\sigma_\epsilon = 0$.

[…] $\Delta\mathbf{n}_j = \mathbf{n}_j^{+} - \mathbf{n}_j^{-}$, and $N_T$ is the number of trials. For experiments with $P_C$ values near 0.85, the […] $o_j$ − $q_j$ is directly related to the template (Abbey & Eckstein, 2002a) by

[…] $d'$ is the detectability index defined in Equation 7. To estimate the classification image, we use the sample average across 2AFC trials instead of the expectation in Equation 9 to obtain
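The idea that trial outcomes combined with the noise fields recover the observer's template can be illustrated with a simulation. The estimator below is a generic outcome-weighted average of the difference noise fields; it is an assumption for illustration and is not claimed to be the exact estimator of Equation 10. The one-dimensional DOG template and all parameter values are hypothetical.

```python
import numpy as np

def estimate_classification_image(outcomes, delta_n):
    """Outcome-weighted average of difference noise fields (assumed estimator)."""
    outcomes = np.asarray(outcomes, dtype=float)
    weights = outcomes - outcomes.mean()      # center scores on the sample P_C
    return (weights[:, None] * delta_n).mean(axis=0)

rng = np.random.default_rng(1)
n_trials, n_pix, sigma_n = 4000, 16, 1.0
x = np.arange(n_pix) - n_pix / 2
template = np.exp(-x**2 / 4.0) - 0.5 * np.exp(-x**2 / 16.0)   # hypothetical 1-D DOG
signal = 1.5 * template / np.linalg.norm(template)            # d' of 1.5 when w = s

# Difference noise field for each trial: n_plus - n_minus.
delta_n = (rng.normal(0.0, sigma_n, (n_trials, n_pix))
           - rng.normal(0.0, sigma_n, (n_trials, n_pix)))

# Simulated linear observer with no internal noise: correct when the
# template response to the target exceeds the response to the alternative.
decision = (signal + delta_n) @ template
outcomes = (decision > 0).astype(float)

ci = estimate_classification_image(outcomes, delta_n)
similarity = np.corrcoef(ci, template)[0, 1]
```

For a linear observer the estimate is proportional to the template in expectation, so the correlation between `ci` and `template` approaches 1 as the number of trials grows.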

[…] $\Delta\mathbf{y}_j$ as the radial average of $\Delta\mathbf{q}_j$, and let $\Delta\mathbf{u}_j$ be the radial average of the DFT of $\Delta\mathbf{q}_j$. The spatial and spatial-frequency radial averages can be computed analogously to Equation 10 with

[…] $S_{\Delta u}$. The standard errors for each radial bin in Figure 3 are computed from the square root of the diagonal elements of these matrices after division by $N_T$. Because larger radial bins include more pixels, the standard errors decrease with distance from the center of the image. Various Hotelling $T^{2}$ hypothesis tests for classification images based on these sample statistics are described by Abbey and Eckstein (2002a). An alternative maximum-likelihood approach is described by Solomon (2002).
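Radial averaging itself is simple to sketch. The version below bins pixels into rings of integer radius around the central pixel; this binning rule is an assumption, since the exact bin definition is not given here.

```python
import numpy as np

def radial_average(image):
    """Average pixel values in integer-radius rings around the image center.

    Returns (profile, counts), where counts[k] is the number of pixels in ring k.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    cy, cx = rows // 2, cols // 2
    y, x = np.indices(image.shape)
    radius = np.rint(np.hypot(y - cy, x - cx)).astype(int)
    sums = np.bincount(radius.ravel(), weights=image.ravel())
    counts = np.bincount(radius.ravel())
    return sums / counts, counts

profile, counts = radial_average(np.ones((33, 33)))
```

The `counts` array shows why the standard errors shrink away from the center: the innermost bin holds a single pixel, while outer rings average over many.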

[…] $\mathbf{w} = \mathbf{s}$, defined in Equation 1) but have a significant internal noise component ($\sigma_\epsilon > 0$). We can scale this model by finding the internal noise variance such that the performance of the model is equal to the performance of each human observer. For a linear model with the Gaussian assumptions used here,

[…] $\sigma_\epsilon$ using the human observer $P_C$ values and the definition of $d'$ in Equation 7 for the ideal observer. Once we have obtained $\sigma_\epsilon$ in this manner, the magnitude of the expected template can be computed from Equation 9. The scaled ideal observer serves as a convenient reference for human observer data. If the human observer were simply equivalent to the ideal observer corrupted by internal noise, the resulting classification image should closely match this reference.

[…]$^{2}$. The monitor control board (Dome Imaging Systems, Waltham, MA) used photometer measurements to calibrate monitor driving voltages, leaving access to 256 gray levels (GL) on the linear scale. The mean background luminance of the stimuli was 31.3 cd/m$^{2}$ (100 GL). Viewing distance was approximately 1 meter. Display pixels were 0.3 mm (0.017 deg visual angle) on each side.

[…] $d'$ = 1.5) across observers in each task. The resulting target and alternative contrasts in each experiment are given in Table 1.

| Task | Target, contrast | Alternative, contrast |
|---|---|---|
| Detection | DOG, 7.96% | None, 0.0% |
| Discrimination | DOG, 69.93% | DOG, 60.00% |
| Identification | Gaussian, 19.26% (σ = 3.3 arcmin) | Gaussian, 12.32% (σ = 4.1 arcmin) |

[…]$^{-6}$ deg$^{2}$ across all experimental conditions. At this level, the noise in the images is well above threshold (see Figure 2). The influence of the noise is important because classification image analysis only works for tasks that are at least partially limited by some form of external noise in the image. In addition, the noise encompasses a fairly large number of luminance quantization levels in the display (for example, 29 quantization levels between the $\pm\sigma$ levels of the noise). Burgess (1985) has shown that the errors introduced by quantizing a signal to discrete intensity levels are well modeled psychophysically as an additional noise source with variance equal to the square of the quantization step divided by 12. For the noise contrast used here, quantization to discrete GL boosts the overall noise variance by less than 0.04% and can be neglected.
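The size of the quantization effect can be checked with a few lines of arithmetic. The sketch assumes a quantization step of one gray level and takes the noise standard deviation to be half of the 29 levels that span $\pm\sigma$.

```python
def quantization_variance(step=1.0):
    """Variance of the error from rounding to a uniform grid with the given step."""
    return step**2 / 12.0

levels_between_pm_sigma = 29                  # from the text: 29 levels span +/- sigma
sigma_gl = levels_between_pm_sigma / 2.0      # so sigma is about 14.5 gray levels
noise_variance = sigma_gl**2
relative_increase = quantization_variance(1.0) / noise_variance
```

Under these assumptions `relative_increase` is a little under 0.0004, that is, just below 0.04% of the external noise variance, in line with the figure quoted above.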

[…] $T^{2}$ $p$ values testing the ideal observer matched to each human observer's performance in each task. We performed these tests in both the spatial and spatial-frequency domains to capture a greater number of possible features for inferring the significance of differences. We use a subset consisting of the 12 bins closest to the origin but excluding the origin itself. In the spatial domain, the region used for inference includes bins centered on radial distances from 0.017° to 0.189° of visual angle. In the frequency domain, this region consists of bins centered on 0.9–10.0 cpd. These bins were chosen to capture the area where the most differences would be expected to occur, while excluding the origin because of its relatively high standard error. As can be seen in the table, in most cases the ideal observer with internal noise can be rejected as a model of the observed classification images. Also in Table 2C and D are $p$ values for differences between the classification images in different tasks for each observer. For all subjects, we find that the classification image for the discrimination task is significantly different from those for the detection and identification tasks. For comparisons between detection and identification, the plots are qualitatively more similar, and the significance of differences is mixed depending on subject and test (spatial or spatial frequency).

(A) Departure from ideal observer: Spatial domain *p* values

| Observer | Detection | Discrimination | Identification |
|---|---|---|---|
| DV | <.0001 | <.0001 | .0261 |
| CA | <.0001 | <.0001 | .0032 |
| CH | .0005 | <.0001 | <.0001 |

(B) Departure from ideal observer: Frequency domain *p* values

| Observer | Detection | Discrimination | Identification |
|---|---|---|---|
| DV | .0003 | <.0001 | .6972 |
| CA | <.0001 | <.0001 | .0003 |
| CH | .0002 | <.0001 | <.0001 |

(C) Differences between tasks: Spatial domain *p* values

| Observer | Det. vs. Disc. | Det. vs. Ident. | Disc. vs. Ident. |
|---|---|---|---|
| DV | <.0001 | .0053 | <.0001 |
| CA | <.0001 | .0004 | <.0001 |
| CH | <.0001 | .6999 | <.0001 |

(D) Differences between tasks: Frequency domain *p* values

| Observer | Det. vs. Disc. | Det. vs. Ident. | Disc. vs. Ident. |
|---|---|---|---|
| DV | <.0001 | .1080 | <.0001 |
| CA | <.0001 | .0187 | <.0001 |
| CH | <.0001 | .2469 | <.0001 |

[…] ($d'$) versus signal contrast (Eckstein, Ahumada, & Watson, 1997). This fact has been used to test for a linear decision variable by fitting a line to psychometric data, and then, assuming it fits reasonably well, looking for a $y$-intercept that is different from zero. Figure 7A shows psychometric data for the experiments reported in this work with line fits to average observer performance (solid lines). The intercepts are all fairly close to zero (none are significantly different), which is in agreement with the hypothesis of a linear decision variable.

[…] $\mathbf{n}^{+}$ instead of $\Delta\mathbf{n}$) to that derived from only the signal-absent images (Abbey & Eckstein, 2002a). Under the null hypothesis of a linear observer, these should not be significantly different. Figure 7B gives tables of $p$ values for testing this hypothesis. The analysis shows relatively little evidence for nonlinearity given $p$ values above .01 and the number of comparisons. A Bonferroni correction for multiple comparisons in each table would require a $p$ value less than .0012 for a group rejection rate of .01, which is not found in either the spatial or spatial-frequency data.

[…] $\sigma_n$ is the standard deviation of the luminance noise in the stimuli. Following the derivation of Murray et al. (2005), we find that the efficiency of a 2AFC classification image computed from Equation 10 is estimated by

[…] $\mathbf{s}_n$ is the ideal observer's classification image scaled to have a norm of 1, Δ […] $d'$ is the observer's detectability index estimated from

[…] $d'$ values as in Equation 4, is plotted along the $y$-axis, and the predicted efficiency computed from the classification image is used for the $x$-axis. Consistent with Murray et al., we find that efficiency estimates derived from the classification images are slightly less than those obtained from the ratio of $d'$ values, by a factor of 0.11 (they report 0.13).

[…] $T^{2}$ test ($df$ = 11, $p$ > .048) (Abbey & Eckstein, 2002a). These two observers are both significantly different from observer CH ($p$ < .0001), who uses a much broader range of spatial frequencies, extending above 10 cpd, and shows little or none of the negative weighting at the lowest spatial frequencies. This finding is similar to recent work by Meese, Hess, and Williams (2005) and Solomon (2002), which suggests individual differences in spatial summation for suprathreshold tasks. Also noteworthy is that the three observers had nearly identical levels of performance in this task (CA: 85.1%, DV: 85.9%, and CH: 86.1%). Classification image analysis is thus able to detect differences in the way observers perform tasks that are not observable from performance measures alone.

[…] $\mathbf{w} = \mathbf{s}$). The functional implementation of this transducer is to take the cross-correlation component of Equation 6 and apply a sigmoidal nonlinearity to it before adding the Gaussian internal noise component. For a sigmoidal response function, we choose the cumulative normal distribution, $\Phi$. This yields the nonlinear response function,

[…] $\gamma$ and $\beta$ are the gain and offset applied to the filter response before transduction, and $\epsilon$ is posttransduction internal noise. A plot of the transducer function and values of gain, offset, and standard deviation of the internal noise component are given in Figures 9A and B.
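One way to read this description is sketched below. The parameterization $\Phi(\gamma\,\mathbf{w}^{T}\mathbf{g} + \beta) + \epsilon$ is an assumption: the text specifies only that a gain and an offset are applied to the filter response before the cumulative normal and that Gaussian noise is added afterward. The function name and example values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def transduced_response(g, w, gain, offset, sigma_eps, rng):
    """Late sigmoidal transducer: Phi(gain * w.g + offset) plus Gaussian noise.

    The placement of gain and offset inside Phi is an assumed parameterization.
    """
    filter_out = float(np.dot(np.ravel(w), np.ravel(g)))       # cross-correlation
    transduced = NormalDist().cdf(gain * filter_out + offset)  # sigmoid in (0, 1)
    return transduced + rng.normal(0.0, sigma_eps)             # posttransduction noise

rng = np.random.default_rng(2)
w = np.array([0.5, 1.0, 0.5])
weak = transduced_response(np.array([0.0, 0.2, 0.0]), w, 2.0, -1.0, 0.0, rng)
strong = transduced_response(np.array([0.0, 2.0, 0.0]), w, 2.0, -1.0, 0.0, rng)
```

With the noise switched off, the response is bounded between 0 and 1 and increases monotonically with the filter output, which is the compressive behavior a sigmoidal transducer is meant to capture.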

[…] $P_C$ determined from 2000 psychophysical trials in each task. This good agreement is not surprising given that we have three observed performance levels to match and three free parameters in the model (gain, offset, and internal noise standard deviation). Nonetheless, Figure 9B shows that a sigmoidal transducer can explain the observed performance data with high accuracy.

[…] $T$, that acts on each point in an image, $g$. The posttransduction image, $y$, is defined by

$$
y_m = T(g_m) + \nu_m, \tag{17}
$$

where $g_m$ is the intensity of the image at location (pixel) $m$, $y_m$ is the posttransduction intensity at $m$, and $\nu_m$ is posttransduction (internal) noise at $m$. We will assume that the posttransduction noise is white with uniform standard deviation $\sigma_\nu$. We consider the ideal observer strategy after the image has undergone an early nonlinearity followed by noise as described in Equation 17. The Appendix describes how a first order Taylor series yields approximately optimal image weights,

[…] $T'(b_m + 0.5 s_m)$ is the derivative of the transducer function evaluated at the average of the target ($b_m + s_m$) and alternative ($b_m$) defined in Equation 1.

[…] $s_m$ to obtain a weight-to-signal ratio,

[…] $w_m$ is the opposite sign of $s_m$, we can rule out the template as optimal because there is no (real-valued) transducer function that can make the right side of Equation 19 negative. Also, if two different points, $m$ and $m'$, in an image happen to satisfy $b_m + 0.5 s_m = b_{m'} + 0.5 s_{m'}$, then we must have $w_m / s_m = w_{m'} / s_{m'}$, or else the weights are not optimal. This latter point can be applied to different tasks as well.
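The two consistency checks above can be illustrated numerically. The weight formula used here, $w_m = T'^{2} s_m / (T'^{2}\sigma_n^{2} + \sigma_\nu^{2})$ with $T'$ evaluated at $b_m + 0.5 s_m$, is derived for this sketch from a first-order expansion of the transducer followed by a matched filter in the posttransduction noise; it is an assumption and may differ from the paper's Equations 18 and 19 by a scale factor or in form. The transducer slope and pixel values are hypothetical.

```python
import numpy as np

def effective_weights(b, s, t_prime, sigma_n, sigma_nu):
    """Approximately optimal image weights behind an early pointwise transducer.

    Assumed form: slope^2 * s / (slope^2 * sigma_n^2 + sigma_nu^2), with the
    slope taken at the mean of target and alternative intensities.
    """
    b = np.asarray(b, dtype=float)
    s = np.asarray(s, dtype=float)
    slope = t_prime(b + 0.5 * s)
    return slope**2 * s / (slope**2 * sigma_n**2 + sigma_nu**2)

# Hypothetical transducer slope, steepest at an intensity of 100.
t_prime = lambda g: 1.0 / (1.0 + ((g - 100.0) / 10.0) ** 2)

b = np.array([100.0, 102.0, 100.0])
s = np.array([8.0, 4.0, -6.0])       # pixels 0 and 1 share the mean intensity 104
w = effective_weights(b, s, t_prime, sigma_n=5.0, sigma_nu=0.5)
ratios = w / s
```

In this form the weight-to-signal ratio is never negative, and pixels with the same mean intensity (here the first two) receive the same ratio, matching the two necessary conditions stated in the text.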

[…] $\mathbf{b}$ and signal $\mathbf{s}$ (as above), and a second task with background $\mathbf{u}$ and signal $\mathbf{x}$. We also have two sets of weights, $\mathbf{w}$ for the first task and $\mathbf{z}$ for the second task, and we would like to know if they are optimal under the same transducer function. If there is some contrast level in the images that is common to both, then $b_m + 0.5 s_m = u_{m'} + 0.5 x_{m'}$, for some $m$ and $m'$. If the same transducer applies to both tasks, then it must have the same slope at these points and hence we must have $w_m / s_m = z_{m'} / x_{m'}$. Failure to achieve this effectively rules out the possibility of optimal weights with any common transducer. Figure A1 in the Appendix gives an example of what the weight-to-signal ratios look like for a common early transducer.

[…] $T'$ (i.e., a steeper transducer function). The strong inhibitory region in the discrimination task implies a relatively steep transducer slope for image intensities that are slightly less than the average intensity of the background. However, we would then expect relatively strong inhibitory regions in the detection task as well, which are not observed in the data.

[…] $u$ identifies a spatially displaced DOG template, $\mathbf{w}_u$ ($\lVert\mathbf{w}_u\rVert = 1$). We presume that each filter has its own independent internal noise component, so the internal noise also requires the subscript $u$ in this model. This model has two free parameters: the range of uncertainty, which determines the number of filters, and the standard deviation of the internal noise in each filter.

*d*′). The most pronounced differences between the ideal observer and the spatial uncertainty model occur, as expected, in the detection task. While the uncertainty model clearly does not match the classification image of subject DV, it does display one qualitative similarity, namely, the shift of emphasis to lower spatial frequencies. It is also worth noting that the uncertainty model is a better qualitative match to observers CA and CH, whose spatial-frequency classification images are more strongly peaked in the detection task, as can be seen by comparison with Figure 8.

[…] $y_m$, we model the formation of an internal response as a weighted sum over the posttransduction image,

[…] $\mathbf{g}$. We will consider a Taylor series at each point $m$, expanded about the average intensity of the images at that point,

[…] $b_m + 0.5 s_m$ is the average intensity at location $m$, and $T'$ is the first derivative of $T$. Substituting Equation A2 into Equation A1 gives an approximate expression for the internal response. But Equation 5, that is, Score = Step[$\lambda(\mathbf{g}^{+}) - \lambda(\mathbf{g}^{-})$], shows that decisions in a 2AFC task are based on a difference in responses, and thus constant terms that appear in both signal-present and signal-absent images are irrelevant. Therefore, an equivalent linear weighting of image intensities is given by

[…] $\mathbf{w}$ and a scalar internal noise variable $\epsilon$. Equation A3 defines these as

[…] $q_m$.

[…] $\mathbf{g}^{+} = \mathbf{b} + \mathbf{s} + \mathbf{n}^{+}$ and $\mathbf{g}^{-} = \mathbf{b} + \mathbf{n}^{-}$, respectively. We can use these definitions in Equation A2 to obtain first order propagation through the transducer to $\mathbf{y}^{+}$ and $\mathbf{y}^{-}$,

[…] $\sigma_n$ is the standard deviation of the image noise. The resulting linear image weights are given using Equation A4 as