This study estimates the temporal dynamics of selective attention with classification images, a technique assessing observer information use by tracking how responses are correlated with external noise added to the stimulus. Three observers performed a yes/no discrimination of a Gaussian signal that could appear at one of eight locations (eccentricity = 4.6°). During the stimulus duration (300 ms), a peripheral cue indicated the potential signal location with 100% validity, and stimuli were presented in frames (37.5 ms/frame) of independently sampled Gaussian luminance image noise. Stimuli were presented either with or without a succeeding masking display (100 ms) of high-contrast image noise, with mask presence having little effect. The results from the classification images suggest that observers were able to use information at the cued location selectively (relative to the uncued locations), starting within the first (0–37.5 ms) or second (37.5–75 ms) frame. This suggests a selective attention effect earlier than those found in previous behavioral and event-related potential (ERP) studies, which generally have estimated the latency for selective attention effects to be 75–100 ms. We present a deconvolution method using the known temporal impulse response of early vision that indicates how the classification image results might relate to previous behavioral and ERP results. Applying the model to the classification images suggests that accounting for the known temporal dynamics could explain at least part of the difference in results between classification images and the previous studies.

*t* = 0) or before selection because of the temporal blurring of the visual information.

*g*(*t*) is denoted as a series of 8 frames (37.5 ms/frame), with each frame represented as a single value. The single value for each frame corresponds to the summary statistic used in this study, the integral of the stimulus (or classification image) over a given area (within two standard deviations of the center of the Gaussian signal). Because of the added external image noise, these single summary values vary randomly from frame to frame. For the observer judgments (upper part of Figure 3), it is assumed that a linear combination of stimulus weights with the stimulus (dot product) leads to a decision variable (*λ*), which is then compared to a criterion for a judgment of “yes” (*λ* > crit) or “no” (*λ* < crit). For the classification images, the same calculation of the decision variable and the same decision rule lead to a sorting of stimuli by response outcome (in this case, no-signal trials leading to “yes” responses, or false alarms). The classification image from the false alarms, or the average of the false alarm noise fields, is an estimate of the stimulus weights used to calculate the decision variable (*λ*).
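The logic of estimating stimulus weights from false alarms can be illustrated with a small simulation. This is a minimal sketch, not the study's actual analysis; the weight values, trial count, and criterion below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames = 8          # 37.5-ms frames, as in the experiment
n_trials = 20000      # simulated no-signal trials
true_w = np.array([0.2, 0.8, 1.0, 0.9, 0.6, 0.4, 0.2, 0.1])  # hypothetical weights

# One summary value per frame: the integral of the noise over the signal area
noise = rng.normal(0.0, 1.0, size=(n_trials, n_frames))

# Linear observer: the decision variable is the dot product of weights and stimulus
lam = noise @ true_w
crit = 1.0            # arbitrary "yes" criterion
false_alarms = noise[lam > crit]

# Classification image: the average of the false-alarm noise fields,
# which (up to a scale factor) estimates the weights behind the decision variable
ci = false_alarms.mean(axis=0)
```

For a linear observer in Gaussian noise, the conditional mean of the noise given a "yes" response is proportional to the weight vector, so `ci` recovers the shape of `true_w` up to scale.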

*g*(*t*)) is transformed into an internal response by convolving the stimulus with the impulse response function (*h*(*t*)). The decision variable (*λ*) arises from the dot product of the weight vector (**w**(*t*)) and the convolved response, which is then compared to a criterion for a judgment of “yes” (*λ* > crit) or “no” (*λ* < crit; not shown in Figure 4). In this case, however, the weights represent the effect of a selective attentional mechanism operating at the cued location. (For the uncued locations, it is assumed that this weighting function is zero throughout, if selection is optimal.) We wish to recover this weighting function at the cued location as our estimate of the time course of selective attention, given the constraint of an impulse response function. The calculation of the classification image depicted in the bottom of Figure 4 is the same as in Figure 3. The difference is that the classification image time series now represents the convolution of the impulse response function (*h*(*t*)) with the cue-selective weighting function (**w**(*t*)). Therefore, to recover **w**(*t*), we must deconvolve the classification images with the impulse response function (*h*(*t*)), as represented in Figure 5 (for details, see the Methods section).

**w**(*t*)) operates as a scalar upon the information available at the potential signal locations. Conceptually, it is specifically derived from a Bayesian ideal observer for a cueing task (Eckstein et al., 2002; Shimozaki, Eckstein, & Abbey, 2003), in which the optimal decision is determined by the scalar weighting of the likelihoods by the prior probabilities, or the cue validities. More in line with the potential variation in attentional weights in this study, a simple deviation from the ideal observer has been proposed in which the weights are free to vary (the weighted likelihood model; Eckstein et al., 2002; Shimozaki et al., 2003). There are other examples of visual attention modeled as a scalar weighting function. For example, Kinchla (1974, 1977) and Kinchla, Chen, and Evert (1995) proposed a model for cueing effects that is similar to the ideal observer (for a comparison, see Shimozaki et al., 2003), and Sperling and Weichselgartner (1995) proposed an attentional weighting function for various time-related measures, such as reaction times (Shulman et al., 1979), choice reaction times (Tsal, 1983), and threshold time durations (Lyon, 1987). The study by Eckstein et al. (2002) showed that both an ideal observer and a tuning model in which attention tunes a filter or template at the attended location (Spitzer, Desimone, & Moran, 1988; Yeshurun & Carrasco, 1998, 1999) may be behaviorally modeled as an attentional weighting function. Therefore, an attentional weighting function may be viewed as a relatively neutral and general description of selective attention.
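A minimal sketch of such a scalar weighting rule, assuming unit-variance Gaussian internal responses and a hypothetical *d*′; the criterion value is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

d_prime = 1.5                              # assumed detectability of the signal
n_loc = 8
validities = np.array([1.0] + [0.0] * 7)   # 100% valid cue at location 0

# One signal-present trial: unit-variance internal responses at the 8 locations
x = rng.normal(0.0, 1.0, n_loc)
x[0] += d_prime                            # signal appears at the cued location

# Likelihood ratio (signal vs. no signal) at each location
lr = np.exp(d_prime * x - d_prime**2 / 2)

# Ideal observer: scalar weights equal the cue validities (prior probabilities);
# the weighted likelihood model instead lets these weights vary freely
weights = validities
lam = weights @ lr                         # decision variable
respond_yes = bool(lam > 1.0)              # criterion for the yes/no judgment
```

With a 100%-valid cue the ideal weights are zero at the uncued locations; suboptimal selection corresponds to nonzero weights leaking onto those locations.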

^{2}). At each nonsignal location, a low-contrast version of the signal appeared (+11.7% peak contrast). This is often called a “pedestal,” and the task may also be described as a contrast discrimination (of the pedestal vs. the signal plus pedestal). The stimulus presentations were divided into eight continuous frames (37.5 ms/frame, or 26.7 Hz) of Gaussian-distributed white luminance noise added to the pixels of the eight potential signal locations (*SD* = 4.90 cd/m^{2}), with a new independent sample of noise added on each frame. Separate experiments were run with and without a masking display; when present, the mask was composed of high-contrast Gaussian noise (*SD* = 7.21 cd/m^{2}) and was presented for 100 ms after the stimulus display. The yes/no judgments were untimed, and observers were prompted to respond after the mask display (in the mask study) or after the last stimulus frame (in the no-mask study).

*d*′, the common measure of sensitivity. Human performance may be compared to an ideal observer that uses all possible information and perfectly integrates across the different noise samples. In Gaussian luminance noise, the image signal-to-noise ratio (SNR, which corresponds to the ideal observer *d*′ for a single frame) can be calculated as follows (Burgess, Wagner, Jennings, & Barlow, 1981):

SNR = (*E*)^{1/2}/*σ*_{n}, where *E* = **s**^{t}**s** is the signal energy, **s** is the signal in vector form, and *σ*_{n} is the standard deviation of the added noise.

*f*(8), the prediction of *d*′ for the ideal observer (IO) is given by (Green & Swets, 1966)
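As a numerical sketch of these ideal observer benchmarks; the patch size, signal profile, and contrast values below are illustrative assumptions, not the experimental parameters, so the resulting efficiency is not meant to reproduce the observed values.

```python
import numpy as np

# Illustrative stimulus parameters (assumptions, not the experimental values)
size = 64                # image patch, pixels per side
sigma_sig = 6.0          # SD of the Gaussian signal profile, pixels
peak = 5.0               # peak signal amplitude, cd/m^2
sigma_n = 4.90           # SD of the added luminance noise, cd/m^2

y, x = np.mgrid[:size, :size] - size // 2
s = peak * np.exp(-(x**2 + y**2) / (2 * sigma_sig**2))  # Gaussian signal

E = np.sum(s**2)                     # signal energy, E = s^t s
snr_frame = np.sqrt(E) / sigma_n     # single-frame ideal observer d'

# Perfect integration across the 8 independent noise frames
d_ideal = snr_frame * np.sqrt(8)

# Efficiency: squared ratio of human to ideal sensitivity
d_human = 1.5                        # typical observer d' in this task
efficiency = (d_human / d_ideal) ** 2
```

Summing the eight frames doubles nothing for the signal-to-noise ratio per frame but multiplies *d*′ by √8, since the signal adds linearly while the independent noise adds in quadrature.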

*T*^{2} statistic (Harris, 1985; see also Abbey & Eckstein, 2002; Eckstein et al., 2002, 2004; Shimozaki et al., 2005), which is the multivariate equivalent of the *t* statistic. The radially averaged classification images were compared against a null hypothesis of zero mean values and a variance given by the added external noise. The two-sample Hotelling *T*^{2} statistic is given by the following equation:

*T*^{2} = [*n*_{0}*n*_{1}/(*n*_{0} + *n*_{1})] (**v**_{1} − **v**_{0})^{t} **K**^{−1} (**v**_{1} − **v**_{0}),

where

- **v**_{0} = a vector containing the null hypothesis classification image (all zeroes);
- **v**_{1} = a vector containing the observed radial average classification image;
- *n*_{0}, *n*_{1} = the number of observations for each classification image (in this case, we assume that *n*_{0} = *n*_{1});
- **K**^{−1} = the inverse of the pooled covariance matrix for **v**_{0} and **v**_{1}.

*T*^{2} may be transformed into an *F* statistic (Equation 6) with *p* (the vector length of **v**_{0} and **v**_{1}) degrees of freedom in the numerator and *n*_{0} + *n*_{1} − *p* − 1 degrees of freedom in the denominator,

*F* = [(*n*_{0} + *n*_{1} − *p* − 1)/(*p*(*n*_{0} + *n*_{1} − 2))] *T*^{2}.
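These two steps (the *T*^{2} statistic and its *F* transform) can be sketched as follows; the radial-average vector, covariance matrix, and trial counts in the example are hypothetical.

```python
import numpy as np

def hotelling_t2(v1, K, n0, n1):
    """Two-sample Hotelling T^2 against a null classification image of zeroes,
    plus its F transform with p and n0 + n1 - p - 1 degrees of freedom."""
    v0 = np.zeros_like(v1)
    diff = v1 - v0
    t2 = (n0 * n1) / (n0 + n1) * diff @ np.linalg.inv(K) @ diff
    p = len(v1)                      # vector length (numerator df)
    f = (n0 + n1 - p - 1) / (p * (n0 + n1 - 2)) * t2
    return t2, f

# Hypothetical example: a 9-point radial average with unit external-noise covariance
rng = np.random.default_rng(2)
v1 = rng.normal(0.1, 0.05, 9)
K = np.eye(9)                        # pooled covariance (assumed known here)
t2, f_stat = hotelling_t2(v1, K, n0=3000, n1=3000)
```

The *p* value then follows from the *F* distribution with *p* and *n*_{0} + *n*_{1} − *p* − 1 degrees of freedom (e.g., `scipy.stats.f.sf(f_stat, 9, 5990)`).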

*i* and location = *l*, the raw classification image CI_{*i,l*}, *σ* = the standard deviation of the Gaussian signal, and pixel position (*x*, *y*) in the raw classification image,

*h*(*t*)) upon the measured area integrals of the classification image time series. As discussed earlier, the classification image time series is assumed to be the convolution of the impulse response function (*h*(*t*)) and the cue-selective weighting function **w**(*t*). As we were interested in the cue-selective weighting function (**w**(*t*)) as the effect of selective attention in this task, we wished to separate out the hypothesized effect of the impulse response function (*h*(*t*)) upon the classification image time series to recover **w**(*t*). This is illustrated in Figure 5 as a deconvolution of the measured classification image time series with the impulse response function.

*τ*, a parameter varying the time delay of the impulse response function, with larger values indicating longer delays and slower responses (see Figure 6). Therefore, in our fits, we chose to vary *τ* from 4 to 8 ms, covering the range of *τ* values that Watson used in his model fits (4.3–6.22 ms). Note that the *τ* values represent the time constants of a single leaky integrator, and that the impulse response functions are composed of cascades of several leaky integrators. Thus, the peak response times (approximately 40–72 ms) are considerably longer than 4–8 ms.

Let *g*(*t*) be the temporal stimulus integrated over the disk. Because this function will be constant over the duration of a frame, Δ*t*, we can write the stimulus as the sum of basis functions,

If *h*(*t*) is the temporal response of the visual system, then the internal response of the visual system is

*c*_{*i*}, the classification weight in frame *i*, then

*n*_{*i*} is the error (or noise) in the classification weight estimate. This equation shows a linear relation between the classification time series and the internal weighting that is mediated by the temporal response function *h*. We look at approaches to inverting this linear relationship to solve for the internal weighting function **w**(*t*). Because the internal weighting function is specified as a continuously defined function and we only have a finite number of measured classification weights, we will have to make some smoothness assumptions for the problem to be tractable.
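The forward (convolution) relation between the internal weighting function and the frame-by-frame classification weights can be sketched numerically. The gamma-shaped impulse response and sigmoidal weighting function below are illustrative assumptions, not the fitted forms.

```python
import numpy as np

dt_frame = 37.5        # frame duration, ms
d_tau = 1.0            # fine time step for h(t) and w(t), ms
t = np.arange(0.0, 300.0, d_tau)

# Assumed impulse response: a gamma-shaped cascade of leaky integrators
# (an illustrative stand-in for the Watson-model h(t))
tau, n = 6.0, 9
h = (t / tau) ** (n - 1) * np.exp(-t / tau)
h /= h.sum()

# Assumed internal weighting function w(t): selection ramping on near 50 ms
w = 1.0 / (1.0 + np.exp(-(t - 50.0) / 10.0))

# Internal response: the convolution of w(t) with h(t) on the fine grid
r = np.convolve(h, w)[: len(t)]

# Classification weight c_i for each frame: the response averaged over that frame
frame_of = (t // dt_frame).astype(int)
c = np.array([r[frame_of == i].mean() for i in range(8)])
```

Because *h*(*t*) smears *w*(*t*) forward in time, the simulated classification weights rise later and more gradually than the underlying weighting function, which is the motivation for the deconvolution.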

*t*_{*m*} = *m*Δ*τ*, where Δ*τ* is a time step that is presumed to be much smaller than Δ*t* in Equation 9. This allows us to rewrite Equation 13 as

**c** is an *N*-element vector of classification weights, **w** is an *M*-element vector of internal weights, and **n** is an *N*-element vector of classification weight errors. The nonsquare system matrix **A** is defined by Equation 14 as

(**n**), and it is underdetermined because Δ*τ* is presumed to be much smaller than Δ*t*, leading to more unknowns than equations. We approach both of these problems through regularized least squares (RLS), also known as ridge regression (Golub & Van Loan, 1989; Neumaier, 1998).

**w**, by the formula

**ŵ** = (**A**^{t}**Σ**^{−1}**A** + *β***R**)^{−1}**A**^{t}**Σ**^{−1}**c**,

where **Σ** is the variance–covariance matrix of errors in the classification weights, **R** is a regularizing matrix that stabilizes the inverse, and *β* > 0 is a user-defined parameter that adjusts the strength of the regularizing matrix. For this application, we use a minimum norm approach in which **R** is set to be the identity matrix. The parameter *β* is set so that the condition number of the inverse in Equation 17 is never greater than 100.
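A minimal numerical sketch of this RLS deconvolution, assuming an illustrative gamma-shaped impulse response, a hypothetical weighting function, and simulated classification weights (the actual analysis used the Watson impulse response and the measured data).

```python
import numpy as np

# Discretization: N = 8 classification weights (37.5-ms frames); w(t) lives on a
# finer grid with step d_tau, giving M >> N unknowns (underdetermined system)
dt_frame, d_tau = 37.5, 2.5
N = 8
M = int(N * dt_frame / d_tau)
t = np.arange(M) * d_tau

# Assumed impulse response h(t): gamma-shaped cascade (illustrative stand-in)
tau, n_stage = 6.0, 9
h = (t / tau) ** (n_stage - 1) * np.exp(-t / tau)
h /= h.sum()

# Convolution matrix H (H @ w = h * w), then system matrix A: row i averages
# the convolved response over frame i, so that c = A @ w + n
H = np.zeros((M, M))
for m in range(M):
    H[m, : m + 1] = h[m::-1]
frame_of = (t // dt_frame).astype(int)
A = np.vstack([H[frame_of == i].mean(axis=0) for i in range(N)])

# Simulated data from an assumed weighting function plus measurement noise
rng = np.random.default_rng(3)
w_true = 1.0 / (1.0 + np.exp(-(t - 50.0) / 10.0))
Sigma = 0.02**2 * np.eye(N)          # covariance of classification-weight errors
c = A @ w_true + rng.multivariate_normal(np.zeros(N), Sigma)

# RLS: w = (A^t Sigma^-1 A + beta R)^-1 A^t Sigma^-1 c, with R = I (minimum
# norm) and beta grown until the condition number of the inverse is <= 100
Sinv = np.linalg.inv(Sigma)
R = np.eye(M)
G = A.T @ Sinv @ A
beta = 1e-6
while np.linalg.cond(G + beta * R) > 100:
    beta *= 2.0
w_hat = np.linalg.solve(G + beta * R, A.T @ Sinv @ c)
```

Doubling *β* until the condition-number bound holds is one simple way to implement the rule; any search that finds the smallest acceptable *β* would serve, and smaller *β* values preserve more detail in **ŵ** at the cost of stability.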

*d*′s close to 1.5 and efficiencies of about 0.07–0.10. Figure 7 depicts the raw classification images across the eight frames of the stimulus duration (the classification movies) for the cued location for all observers. On average in the mask study, 1,325 false alarm trials and 3,175 correct rejection trials (average false alarm rate = 0.295) were used to construct the classification movie for each observer. For the no-mask study, these average numbers were 1,136 false alarm trials and 2,864 correct rejection trials (average false alarm rate = 0.284). Note that the classification images tend to mimic the signal profile, and the brightness of the center regions appears to increase and then decrease with time.

| Condition | *d*′ | Percent correct | Efficiency | False alarm rate |
|---|---|---|---|---|
| **Mask** | | | | |
| K.C. | 1.34 | 74.8 | 0.0770 | 0.281 |
| L.L. | 1.46 | 75.4 | 0.0917 | 0.308 |
| **No mask** | | | | |
| D.V. | 1.26 | 73.0 | 0.0683 | 0.313 |
| K.C. | 1.51 | 77.0 | 0.0980 | 0.255 |

*T*^{2} tests for significance, Bonferroni-corrected for the number of frames. The first significant (nonzero) cued-location classification image after stimulus onset occurred in the first time frame (0–37.5 ms), except for D.V., who had his first significant nonzero classification image in the second frame (37.5–75 ms). Also, as suggested by the raw classification images, for all observers the amplitudes of the classification images tended to decrease in the last frames.

**Mask study**

| Frame | K.C. *T*^{2} | K.C. *F* | K.C. *p* | L.L. *T*^{2} | L.L. *F* | L.L. *p* |
|---|---|---|---|---|---|---|
| 1 | 57.10 | 6.34 | <.0001* | 55.19 | 6.13 | <.0001* |
| 2 | 170.55 | 18.93 | <.0001* | 46.51 | 5.16 | <.0001* |
| 3 | 140.79 | 15.63 | <.0001* | 193.71 | 21.50 | <.0001* |
| 4 | 92.79 | 10.30 | <.0001* | 145.03 | 16.10 | <.0001* |
| 5 | 28.89 | 3.21 | .0007* | 40.77 | 4.53 | <.0001* |
| 6 | 27.93 | 3.10 | .0010* | 44.74 | 4.97 | <.0001* |
| 7 | 50.87 | 5.65 | <.0001* | 19.79 | 2.20 | .0195 |
| 8 | 6.90 | 0.77 | .6487 | 5.28 | 0.59 | .8097 |

**No mask study**

| Frame | D.V. *T*^{2} | D.V. *F* | D.V. *p* | K.C. *T*^{2} | K.C. *F* | K.C. *p* |
|---|---|---|---|---|---|---|
| 1 | 22.15 | 2.46 | .0085 | 31.30 | 3.48 | .0003* |
| 2 | 63.95 | 7.10 | <.0001* | 166.19 | 18.45 | <.0001* |
| 3 | 64.37 | 7.15 | <.0001* | 164.09 | 18.21 | <.0001* |
| 4 | 124.90 | 13.86 | <.0001* | 85.33 | 9.47 | <.0001* |
| 5 | 79.23 | 8.79 | <.0001* | 63.77 | 7.08 | <.0001* |
| 6 | 33.19 | 3.68 | .0001* | 13.29 | 1.48 | .1506 |
| 7 | 46.46 | 5.16 | <.0001* | 33.28 | 3.69 | .0001* |
| 8 | 22.56 | 2.50 | .0074 | 7.68 | 0.85 | .5671 |

\* Significant after Bonferroni correction for the number of frames.

*t* tests for significant (nonzero) area integrals (Bonferroni-adjusted for the number of frames). Figure 11 clearly demonstrates that the magnitudes of the time series rose to a peak (at 75–150 ms) and diminished by the end of the stimulus presentation (187.5–225 ms after stimulus onset). For all observers except D.V., the area integrals were significantly different from zero in the first frame, as in the radially averaged classification images (Figure 8 and Table 2). For D.V., the area integrals were not significant for the first three frames, similar to his results for the radial averages, which also indicated a longer delay for cue-selective information use. Both with and without the mask, K.C. tended to have peaks earlier in the stimulus duration (about 75 ms) than L.L. and D.V. (about 150 ms).

**Mask study**

| Frame | K.C. AI | K.C. SE | K.C. *t* | K.C. *p* | L.L. AI | L.L. SE | L.L. *t* | L.L. *p* |
|---|---|---|---|---|---|---|---|---|
| 1 | 41.47 | 8.52 | 4.87 | <.0001* | 47.97 | 8.50 | 5.64 | <.0001* |
| 2 | 87.16 | 8.57 | 10.17 | <.0001* | 32.47 | 8.50 | 3.82 | .0001* |
| 3 | 89.72 | 8.50 | 10.55 | <.0001* | 78.55 | 8.51 | 9.24 | <.0001* |
| 4 | 57.31 | 8.41 | 6.81 | <.0001* | 79.45 | 8.55 | 9.30 | <.0001* |
| 5 | 30.77 | 8.50 | 3.62 | .0003* | 31.13 | 8.45 | 3.68 | .0002* |
| 6 | 31.74 | 8.46 | 3.75 | .0002* | 51.90 | 8.46 | 6.14 | <.0001* |
| 7 | 4.65 | 8.41 | 0.55 | .5804 | 33.09 | 8.48 | 3.90 | .0001* |
| 8 | 1.59 | 8.48 | 0.19 | .8516 | −1.61 | 8.53 | −0.19 | .8504 |

**No mask study**

| Frame | D.V. AI | D.V. SE | D.V. *t* | D.V. *p* | K.C. AI | K.C. SE | K.C. *t* | K.C. *p* |
|---|---|---|---|---|---|---|---|---|
| 1 | 20.96 | 8.49 | 2.47 | .0135 | 41.88 | 8.23 | 5.09 | <.0001* |
| 2 | 20.65 | 8.52 | 2.42 | .0154 | 90.45 | 8.30 | 10.90 | <.0001* |
| 3 | 20.32 | 8.50 | 2.39 | .0169 | 97.12 | 8.27 | 11.75 | <.0001* |
| 4 | 72.11 | 8.52 | 8.46 | <.0001* | 53.55 | 8.20 | 6.53 | <.0001* |
| 5 | 33.29 | 8.54 | 3.80 | .0001* | 36.24 | 8.23 | 4.40 | <.0001* |
| 6 | 23.45 | 8.49 | 2.76 | .0057* | 14.13 | 8.21 | 1.72 | .0854 |
| 7 | 30.10 | 8.41 | 3.58 | .0003* | 19.40 | 8.17 | 2.37 | .0176 |
| 8 | 13.63 | 8.46 | 1.61 | .1072 | 10.32 | 8.24 | 1.25 | .2104 |

AI = area integral; SE = standard error. * Significant after Bonferroni correction for the number of frames.

**w**(*t*)) from the deconvolution of the time series of the area integrals of the classification images for the observers.

*τ* tested from 4 to 8 ms. As expected, the larger values of *τ* (longer delays and slower responses) led to larger shifts in time of the weighting functions from the area integrals. As seen in Panel A, the fits for *τ* = 4 ms led to systematic harmonic deviations from the classification images (“ringing”). This suggests that *τ* = 4 ms was an inappropriate delay to represent the current classification image data, possibly because *τ* = 4 ms was too short, or because of the noise associated with these particular data.

*τ* with the estimated errors; because of the ringing for *τ* = 4 ms, we chose to present *τ* = 5 ms instead. It can be seen that, after an initial delay, the weighting function for *τ* = 5 ms initially rises more quickly than the classification image area integrals, whereas the weighting function for *τ* = 8 ms rises at the same rate. From the standard errors of the area integrals and the estimated standard errors of the weighting functions, two-sample independent *t* tests were calculated for significant differences of the weighting functions from zero, Bonferroni-adjusted for the number of points in the fitted weighting function. This analysis found that the weighting functions differed significantly from zero from 9 to 17 ms for *τ* = 5 ms and from 13 to 28 ms for *τ* = 8 ms. Thus, it appears that the model with the impulse response function could account for at least part, but not all, of the differences between the results from the classification images and the results from previous studies.

*d*′ and efficiency; see Table 1) of D.V. compared to the other observers. However, the classification images do not seem indicative of the differences in performance between L.L. and K.C. (in her two studies).

*h*_{1} and *h*_{2}). Using Watson's notation,

- u(*t*) = unit step function;
- *n*_{1} = the number of individual leaky integrators for Stage 1;
- *τ* = time delay constant;
- *n*_{2} = the number of individual leaky integrators for Stage 2;
- *κτ* = time delay constant, with *κ* > 1.
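With these definitions, the two-stage impulse response can be sketched as follows. The stage sizes (*n*_{1} = 9, *n*_{2} = 10) and *κ* = 1.33 are representative values from Watson's model, and the Stage 2 weight ζ = 0.9 is an assumed parameter; the fitted values may differ.

```python
import math
import numpy as np

def stage(t, n, tau):
    """Cascade of n leaky integrators with time constant tau; u(t) is the unit
    step, implemented by zeroing the response for t < 0."""
    u = (t >= 0).astype(float)
    return u * (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * math.factorial(n - 1))

def impulse_response(t, n1=9, n2=10, tau=6.0, kappa=1.33, zeta=0.9):
    """Two-stage response h(t) = h1(t) - zeta*h2(t); Stage 2 uses constant kappa*tau."""
    return stage(t, n1, tau) - zeta * stage(t, n2, kappa * tau)

t = np.arange(0.0, 300.0, 0.5)   # ms
h = impulse_response(t)
t_peak = t[np.argmax(h)]         # peak response time, ms
```

Because each stage is a gamma-shaped cascade, the peak of Stage 1 alone falls near (*n*_{1} − 1)*τ*, which is why peak response times (tens of milliseconds) are much longer than the individual integrator time constants of 4–8 ms.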