The relationship between attention and visual masking was investigated in a cued detection task using a factorial masking manipulation. Stimuli were either unmasked, or were masked with simultaneous (integration) masks, or delayed (interruption) masks, or integration-interruption mask pairs. The cuing effects in detection sensitivity were smallest with unmasked stimuli, intermediate with single masks, and largest with integration-interruption pairs. Large cuing effects in RT were found in all stimulus conditions. The results are inconsistent with general mechanisms of contrast gain and response gain, which do not predict interactions with interruption masks. The data were modeled using the integrated system model of visual attention of P. L. Smith and R. Ratcliff (2009), which provides an account of both RT and accuracy. The model fits suggest the action of two independent attentional mechanisms: an early selection mechanism that enhances the perceptual representation of attended, noisy stimuli, and a late selection mechanism that increases the rate of information transfer to visual short-term memory. The results are consistent with a distributed, multi-locus system of attentional control.

*integration* and *interruption* masking (Kahneman, 1968). The terminology is of comparatively recent origin, but Breitmeyer (1984) attributes the ideas to Stigler in 1926. In integration masking, the target and masking stimulus fuse to form a perceptual composite whose signal-to-noise ratio is lower than that of the target in isolation. In interruption masking, the processing of the target is terminated prematurely by the presentation of a subsequent mask. Turvey (1973) used a clerk-customer analogy to characterize interruption masking: The time a store clerk can spend serving a customer will be truncated if another customer arrives while the first customer is still being served.

*mask-dependent cuing effect* in a series of studies comparing performance with masked and unmasked stimuli (Liu, Wolfgang, & Smith, 2009; Smith, Lee, Wolfgang, & Ratcliff, 2009; Smith, Ratcliff, & Wolfgang, 2004; Smith & Wolfgang, 2004, 2007; Smith, Wolfgang, & Sinclair, 2004). The integrated system model explains this dependency on backward masking by assuming that attention affects the efficiency with which stimulus information is transferred to visual short-term memory (VSTM) and that backward masks interrupt this process before it is complete. The integrated system model thus posits a theoretical link between cuing effects and interruption masking.

^{2}, uniform field. The mathematical form of the Gabor patches was as given by Graham (1989, p. 53). The sinusoid had a spatial frequency of 3.5 cpd, and the Gaussian envelope had a space constant (full width at half height) of 0.463°, giving a bandwidth of 0.80 octaves. Examples of the stimuli and the display configurations are shown in Figure 1. Because contrast thresholds for orthogonal discrimination are indistinguishable from those in yes-no detection (Thomas & Gille, 1979), researchers have treated the two tasks as equivalent for the purposes of drawing inferences about attention (Cameron, Tai, & Carrasco, 2002; Lee, Koch, & Braun, 1997). Consistent with this, we have found the same pattern of mask-dependent cuing effects with backward (interruption) masks in the two tasks (Smith, 2000a; Smith, Ratcliff, et al., 2004; Smith, Wolfgang, et al., 2004). For our purposes, the orthogonal discrimination task has two advantages over the yes-no task: It is relatively unbiased, minimizing criterion effects, and it yields very similar distributions of RT for the two responses. This simplified the task of fitting our mathematical models.

*α* (0 < *α* ≤ 360°) determined the position of the cue on the circumference of the circle on each trial. The two possible miscued locations were at *α* ± 120°. The miscued locations were equidistant from the cued location and therefore should have received equivalent processing resources. The cue consisted of four black right-angle markers identifying the corners of a 1.8° square centered on one of the three possible target locations. The cues were flashed for 60 ms at a cue-target SOA of 140 ms. A weakly predictive cuing manipulation was used (Gould et al., 2007; Smith, Ratcliff, et al., 2004): Stimuli were presented at the cued location on 50% of trials and at each of the possible uncued locations on 25% of trials. Masks were presented only at the stimulus location; the other possible stimulus locations were left blank.

*P*_{V}(*C*) and *P*_{H}(*C*), to *d*′ statistics using the formula

*z*[.] denotes the inverse Gaussian (z-score) transformation. The factor of

*d*′ measures from a discrimination task onto the same scale as those from a yes-no task (Wickens, 2002, p. 122). We then fitted Weibull functions of the form

*d*′ estimates as a function of contrast, *c*, for each observer, using the method described in 1. This method yields an approximate chi-square goodness-of-fit statistic, which we used to quantify model fit.
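The conversion from proportions correct to *d*′ can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the conventional form in which the z-transformed proportions correct for vertical and horizontal stimuli are summed and scaled by 1/√2, the scaling commonly used to place discrimination and yes-no measures on a common scale. The function name `dprime_discrimination` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def dprime_discrimination(p_vertical: float, p_horizontal: float) -> float:
    """Convert proportions correct for the two stimulus types into d'.

    Assumed form (not copied from the article):
        d' = (z[P_V(C)] + z[P_H(C)]) / sqrt(2),
    where z is the inverse of the standard normal distribution function.
    """
    z = NormalDist().inv_cdf  # inverse Gaussian (z-score) transformation
    return (z(p_vertical) + z(p_horizontal)) / sqrt(2)


if __name__ == "__main__":
    # Chance performance on both stimulus types gives d' = 0.
    print(dprime_discrimination(0.5, 0.5))
    print(round(dprime_discrimination(0.9, 0.9), 3))
```

Proportions of exactly 0 or 1 have no finite z-score, so `inv_cdf` raises an error for them; in practice such cells need a correction before conversion.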

Observer | Unmasked, Δχ^{2}(3) | Noise, Δχ^{2}(3) | Pattern, Δχ^{2}(3) | Both, Δχ^{2}(3) |
---|---|---|---|---|
R.E. | 0.46 | 8.04* | 1.75 | 32.18*** |
C.L. | 12.81** | 26.00*** | 35.94*** | 40.69*** |
J.P. | 3.29 | 8.26** | 79.57*** | 107.32*** |
S.C. | 2.92 | 11.99** | 21.89*** | 80.42*** |
Group | 4.76 | 13.56** | 34.79*** | 65.43*** |

*d*′ values in the bottom row of Figure 2 also show this effect.

Observer | Unmasked, χ^{2}(2) | Noise, χ^{2}(2) | Pattern, χ^{2}(2) | Compound, χ^{2}(2) |
---|---|---|---|---|
R.E. | 12.14** | 19.75*** | 30.40*** | 11.51*** |
C.L. | 19.41*** | 29.52*** | 40.85*** | 24.78*** |
J.P. | 34.87*** | 67.82*** | 57.74*** | 75.15*** |
S.C. | 23.86*** | 26.44*** | 72.94*** | 102.65*** |
Group | 22.57*** | 35.88** | 50.48*** | 53.52*** |

*r*(*c*), which depends on the contrast of the stimulus, *c*, and a temporal response function, *μ*(*t*), which depends on the response characteristics of the visual filter. The amplitude function is assumed to be a Naka-Rushton function of the form

*c*_{in} is a divisive inhibition term, which determines the horizontal position of the function on log-contrast axes (Boynton, 2005), and the constant *ρ* describes the nonlinearity of contrast transduction in the early visual system. (The inhibition term is often written as *c*_{in} = *c*_{0.5}^{ρ}, where *c*_{0.5} is the so-called semisaturation constant, that is, the value of contrast at which the function attains half its maximum value of 1.0. We prefer to write it in the form in Equation 4 because it decouples the effects of divisive inhibition from the effects of nonlinearity.) Equation 4 or some variant of it has been widely used to model the psychophysics and neurobiology of visual contrast sensitivity (Foley, 1994; Heeger, 1991; Kaplan, Lee, & Shapley, 1990). The temporal response function, *μ*(*t*), is defined as

*t*; *β*, *n*) is the output of a linear filter composed of *n* identical exponential stages,

*d* is the stimulus duration. The quantities *β*_{on} and *β*_{off} are filter time constants that determine the onset time (rise) and offset time (fall) of the filter response. The representation of Equation 5 generalizes the usual linear system model of the visual temporal response (e.g., Sperling & Weichselgartner, 1995) to allow the rise and fall times of the filter to be different. Equation 5 has low-pass filter characteristics; its effect is to transform a brief rectangular pulsed stimulus into a smooth time-varying function of the form shown in Figure 6.^{1} Further discussion of the properties of Equation 5 may be found in Smith and Ratcliff (2009).
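The relation between the divisive inhibition term and the semisaturation constant can be checked numerically. The sketch below assumes a transducer of the form r(c) = c^ρ / (c^ρ + c_in), which is what the verbal description implies (maximum of 1.0, with c_in = c_0.5^ρ); it is a reconstruction, not a copy of Equation 4, and the function names and parameter values are illustrative only.

```python
def naka_rushton(c: float, c_in: float, rho: float) -> float:
    """Assumed transducer: r(c) = c**rho / (c**rho + c_in), with maximum 1.0."""
    return c**rho / (c**rho + c_in)


def semisaturation(c_in: float, rho: float) -> float:
    """Contrast at which r(c) = 0.5, obtained by inverting c_in = c_half**rho."""
    return c_in ** (1.0 / rho)


if __name__ == "__main__":
    c_in, rho = 0.081, 1.985  # illustrative values only
    c_half = semisaturation(c_in, rho)
    # By construction the transducer output at c_half is one half.
    print(c_half, naka_rushton(c_half, c_in, rho))
```

Writing the inhibition term as c_in rather than c_0.5^ρ means that changing ρ alters the steepness of the function without also moving c_in, which is the decoupling referred to in the text.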

*β*_{on} > *β*_{off} in Equation 5; when they are backwardly masked, we assume that *β*_{off} > *β*_{on}. The resulting sensory response functions for masked and unmasked stimuli are shown in Figure 6b. We assume that masking occurs by interruption, represented in Equation 5 as multiplicative suppression. We assume that integration masks reduce the effective contrast of the stimulus and may also change the shape of the transducer function, *r*(*c*), in Equation 4. We discuss the possible effects of integration masks on sensory transduction in more detail subsequently.

*shunting* differential equation. The distinguishing feature of shunting equations is that the function describing the stimulus information enters into the equation multiplicatively rather than additively, as occurs in the more familiar linear-system model. This gives them useful properties for modeling VSTM processes, as we discuss below. Shunting equations have been used in models of neural computation (Grossberg, 1988; Wilson & Cowan, 1973), visual masking (Ogmen, Breitmeyer, & Melvin, 2003), and luminance discrimination (Sperling & Sondhi, 1968). Indeed, the Hodgkin-Huxley equations of neural conduction are of shunting type (Tuckwell, 1988). However, the application closest to ours is that of Busey and Loftus (1994) who, like us, use a shunting equation to model the time course of VSTM formation.

*ν*(*t*), in response to the information in the stimulus. To ensure that the VSTM trace does not saturate at long exposures, we assume the trace arises as the result of an opponent process, or excitatory-inhibitory coding scheme. In its simplest form, the VSTM growth equation can be written as

*r*(*c*)*μ*(*t*) is the excitatory coefficient and [1 − *r*(*c*)]*μ*(*t*) is the inhibitory coefficient. The equation has the property that, once the visual filter output has decayed to zero, that is, when *μ*(*t*) = 0, the derivative *dν*/*dt* also goes to zero and the trace stops changing. This property of shunting equations makes them natural models of the way in which a durable VSTM trace is computed from a transient perceptual event.

*γ*_{i} in Equation 7 is an attention gain parameter, which controls the rate of VSTM growth. Minimally, we assume that gain takes one value, *γ*_{A}, when stimuli are attended and another value, *γ*_{U}, when they are unattended, with *γ*_{A} > *γ*_{U}. The core assumption of the model is that attention affects the rate at which stimulus information is transferred to VSTM. When stimuli are attended, the rate of transfer is rapid; when they are unattended, transfer is slow. All of the effect of attention on performance in the model occurs via changes in the gain parameter. Smith (2000a) proposed that attention affects the rate of information transfer from early visual filters to later processing stages as an explanation of the mask-dependent cuing effect. Carrasco and McElree (2001) provided empirical support for this idea at about the same time. Smith and Wolfgang (2004) subsequently developed a quantitative model of this transfer, which is a precursor of the integrated system model. The principal difference between their model and the Smith and Ratcliff (2009) model is that the later model has an explicit VSTM stage between the visual filters and decision process. The VSTM stage was added to the model to allow it to predict the shapes of RT distributions with brief, masked stimuli. The role of the VSTM stage is to preserve the information in the stimulus in a durable form for the second or so needed to make a decision. Without such a stage, if the decision process were driven directly by the decaying outputs of early visual filters, the model would predict RT distributions to low contrast stimuli that are far more skewed than those that are found experimentally (Ratcliff & Rouder, 2000).

*ν*(∞), is proportional to *r*(*c*), the Naka-Rushton transduced stimulus contrast. The constant *θ* determines the rate at which evidence in the VSTM trace is accumulated by the decision process. It characterizes the effective information content of the stimulus and will in general depend on the similarity of the stimulus alternatives. Second, the growth to asymptote is exponential, with a rate that depends on the attention gain parameter, *γ*_{i}. The approach to asymptote depends on the area under the temporal response function, *μ*(*t*). As shown in Figure 6b, this area depends on whether or not the stimulus is backwardly masked. When stimuli are unmasked and gain is large, the final VSTM trace strength will closely approach *ν*(∞). When stimuli are masked and when gain is low, the final trace strength may be less than *ν*(∞). This property is central to the model's ability to predict the mask-dependent cuing effect, as we discuss in the following section.^{2}
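The properties of the VSTM trace described here follow from a short calculation. The sketch below assumes that the growth equation has the shunting form implied by the stated excitatory and inhibitory coefficients, with *θ* as the upper bound of the trace, gain *γ*_{i} multiplying the whole right-hand side, and a trace that starts at zero. It is a reconstruction from the surrounding definitions rather than a copy of the original displays.

```latex
\begin{align*}
\frac{d\nu}{dt}
  &= \gamma_i \Bigl\{ r(c)\,\mu(t)\,[\theta - \nu(t)]
     - [1 - r(c)]\,\mu(t)\,\nu(t) \Bigr\} \\
  &= \gamma_i\,\mu(t)\,[\theta\, r(c) - \nu(t)], \qquad \nu(0) = 0, \\[4pt]
\nu(t)
  &= \theta\, r(c) \left[ 1 - \exp\!\left( -\gamma_i \int_0^t \mu(s)\,ds \right) \right].
\end{align*}
```

On this reading, the trace stops changing whenever *μ*(*t*) = 0, and it approaches *θ r*(*c*) only when the product of the gain and the area under *μ*(*t*) is large. A mask that truncates *μ*(*t*) therefore matters most when gain is low.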

*X*(*t*). The growth of evidence in the decision process over time is described by a stochastic differential equation (SDE) of the form^{3}

*dX*(*t*), the random change in decision stage activation during a small time interval of width *dt*. The change in activation consists of two parts: a deterministic part and a stochastic part. The deterministic part, or drift, is equal to *ν*(*t*), the strength of the VSTM trace. The term *ν*(*t*)*dt* is the mean increase in trace strength during a small interval *dt*. The stochastic part, *σ*(*t*)*dW*(*t*), describes the moment-to-moment effects of noise. The process *dW*(*t*) is the differential of *W*(*t*), the Brownian motion, or Wiener diffusion, process with standard deviation *σ*(*t*).

*a*_{1} or a lower boundary at *a*_{2}. If the first boundary reached is *a*_{1}, a “vertical” response is made; if it is *a*_{2}, a “horizontal” response is made. The time of the first boundary crossing determines the decision component of RT. The trial-to-trial variability of the process allows the model to predict errors and distributions of RT. The model of decision-making in Figure 7 is like the diffusion model of Ratcliff (1978) except that the drift rate, *ν*(*t*), is time-dependent, because the strength of the VSTM trace changes over time. In Ratcliff's model, the drift rate remains constant for the duration of a trial. Mathematically, the assumption that the drift rate changes over time makes the model time-inhomogeneous, whereas Ratcliff's model is time-homogeneous. We also assume that the moment-to-moment noise entering the decision process increases in proportion to the drift to a fixed asymptotic value. This makes the diffusion term *σ*(*t*) time-inhomogeneous as well. Smith and Ratcliff (2009) discuss why this assumption is a necessary and appropriate one in this setting. Methods for deriving RT and accuracy predictions for time-inhomogeneous diffusion models were described by Smith (2000b).
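A two-boundary diffusion process with time-varying drift and noise can be illustrated by simulation. The sketch below uses the Euler-Maruyama scheme as a stand-in for illustration; it is not the method of Smith (2000b), and the drift and noise functions in the usage example are hypothetical shapes chosen only to grow toward an asymptote.

```python
import math
import random


def simulate_trial(drift, sigma, a_upper, a_lower, dt=0.001, t_max=5.0, rng=None):
    """Euler-Maruyama simulation of dX = drift(t) dt + sigma(t) dW, with X(0) = 0.

    Returns (response, decision_time). The response is "vertical" if the upper
    boundary is reached first, "horizontal" if the lower boundary is reached
    first, and None if neither boundary is reached before t_max.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < t_max:
        # Increment = deterministic drift + Gaussian noise scaled by sqrt(dt).
        noise = sigma(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += drift(t) * dt + noise
        t += dt
        if x >= a_upper:
            return "vertical", t
        if x <= a_lower:
            return "horizontal", t
    return None, t_max


if __name__ == "__main__":
    # Hypothetical trace: exponential growth to an asymptote of 0.3.
    nu = lambda t: 0.3 * (1.0 - math.exp(-40.0 * t))
    # Hypothetical noise term: grows with the trace to an asymptote of 0.1.
    sig = lambda t: 0.1 * (1.0 - math.exp(-40.0 * t))
    print(simulate_trial(nu, sig, 0.12, -0.12, rng=random.Random(1)))
```

Repeating the simulation many times gives approximate choice proportions and decision-time distributions for a given pair of drift and noise functions.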

*β*_{off}. The VSTM trace grows rapidly to an asymptote for cued stimuli and slowly to an asymptote for miscued stimuli, but the asymptotic trace strength for the two kinds of stimuli is the same, as shown in the upper left panel of Figure 8. Under these circumstances, the model predicts shorter RTs for cued than miscued stimuli but no differences in accuracy (sensitivity).^{4} When a backward mask interrupts the stimulus (large *β*_{off}), the situation is different. If the mask suppresses the stimulus before the VSTM trace formation process has run to completion, cued stimuli will have an advantage because of their higher rate of attention gain (larger value of *γ*). As a result, more of the VSTM trace will have formed before the mask suppresses the stimulus. Consequently, the asymptotic trace strength for cued stimuli will be greater than that for miscued stimuli, as the upper right panel of Figure 8 shows. The model predicts both shorter RTs and higher accuracy for cued, backwardly masked stimuli. Smith and Ratcliff (2009) showed that this model accurately describes the RT distributions and response accuracy for the masked and unmasked data reported by Smith, Ratcliff, et al. (2004).
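The mask dependence of the cuing effect can be illustrated with a small numerical sketch. It assumes that the final trace strength has the closed form θ·r(c)·[1 − exp(−γ·A)], where A is the area under the sensory response function; this form is an assumption consistent with exponential growth to an asymptote, and every numerical value below is hypothetical.

```python
import math


def final_trace(theta, r_c, gain, area):
    """Final VSTM trace strength under the assumed closed form
    nu = theta * r(c) * (1 - exp(-gain * area)),
    where `area` is the area under the sensory response function mu(t)."""
    return theta * r_c * (1.0 - math.exp(-gain * area))


def cuing_effect(theta, r_c, gain_attended, gain_unattended, area):
    """Difference in final trace strength between cued and miscued stimuli."""
    return (final_trace(theta, r_c, gain_attended, area)
            - final_trace(theta, r_c, gain_unattended, area))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the qualitative pattern.
    theta, r_c = 0.74, 0.5
    gain_a, gain_u = 90.0, 40.0
    # A large area stands for an unmasked stimulus, a small area for a
    # stimulus whose response is cut short by an interruption mask.
    for label, area in [("unmasked", 0.20), ("interruption mask", 0.02)]:
        print(label, cuing_effect(theta, r_c, gain_a, gain_u, area))
```

With a large area, both gains carry the trace essentially to asymptote and the difference is negligible; with a small area, the higher gain leaves a visibly stronger trace.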

*γ*_{A} and *γ*_{U}), whereas the effect of the interruption mask is to reduce stimulus persistence (a large value of *β*_{off}). Combining a reduced rate of VSTM transfer with reduced stimulus persistence leads to an enhanced cuing effect. We call this the *single-locus* hypothesis. This hypothesis attributes the cuing effects with both integration and interruption masks to a common process of VSTM transfer. Because the model assumes that the effect of miscuing or inattention is to slow the rate of VSTM transfer, it predicts shorter RTs to cued stimuli than to miscued stimuli, regardless of mask condition. That is, it predicts an unconditional Posner effect in RT, as shown in Figures 4 and 5. Qualitatively, both of these predictions are consistent with our experimental results. We report a detailed quantitative test of the single-locus hypothesis in the following section.

*G*^{2}), *p*_{ij} and *π*_{ij} are, respectively, the observed and predicted proportions in the bins bounded by the quantiles, and “log” is the natural logarithm. The inner summation over *j* extends over the 12 bins formed by each pair of joint distributions of correct responses and errors. (There are five quantiles per distribution, resulting in six bins per distribution, or 12 bins per distribution pair.) The outer summation over *i* extends over the five stimulus contrasts in each of the two cue conditions and four mask conditions (40 distribution pairs in all). The quantity *n*_{i} is the number of experimental trials in each condition. We set this to 252, the number of trials per data point per observer. This is consistent with our interpretation of the quantile-averaged distributions as the performance of an “average observer.” Because *G*^{2} computed on the joint distributions depends on the relative proportions of correct responses and errors, it characterizes goodness-of-fit to the distribution shapes and the choice probabilities simultaneously.
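The fit statistic can be sketched as a short function. The version below assumes the standard likelihood-ratio form, twice the trial-weighted sum over conditions of observed proportion times the log ratio of observed to predicted bin proportions; the function and argument names are hypothetical, and the two-bin inputs in the usage example are toy values rather than data from the study.

```python
import math


def g_squared(observed, predicted, n_trials):
    """Likelihood-ratio chi-square statistic (assumed standard form):
        G^2 = 2 * sum_i n_i * sum_j obs_ij * log(obs_ij / pred_ij).

    observed, predicted: one list of bin proportions per condition.
    n_trials: number of trials per condition.
    Bins with zero observed mass contribute nothing to the sum.
    """
    total = 0.0
    for obs_bins, pred_bins, n in zip(observed, predicted, n_trials):
        inner = sum(o * math.log(o / e)
                    for o, e in zip(obs_bins, pred_bins) if o > 0)
        total += n * inner
    return 2.0 * total


if __name__ == "__main__":
    # Toy example with one condition and two bins.
    print(g_squared([[0.6, 0.4]], [[0.5, 0.5]], [100]))
```

In the design described in the text, each condition would contribute 12 bin proportions (six for correct responses and six for errors), and the outer sum would run over the 40 distribution pairs. A predicted proportion of zero in a bin with observed mass makes the statistic undefined, so predictions are normally bounded away from zero.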

*γ*. Table 4 lists the parameters that were estimated in fitting the four models.

Model | G^{2} | df | BIC |
---|---|---|---|
Model 1 (no noise) | 394.5 | 421 | 556.5 |
Model 2 (single-locus) | 392.8 | 420 | 563.3 |
Model 3 (dual-locus) | 375.1 | 420 | 545.6 |
Model 4 (unconstrained) | 375.0 | 419 | 554.0 |

Parameter | Symbol | Model 1 | Model 2 | Model 3 | Model 4 |
---|---|---|---|---|---|
Sensory response function | | | | | |
Onset rate | β_{on} | 33.8 | 45.4 | 31.1 | 31.1 |
Offset rate (interruption) | β_{off,1}* | 150.0 | 150.0 | 150.0 | 150.0 |
Offset rate (no interruption) | β_{off,2} | 76.2 | 52.4 | 52.4 | 51.4 |
Number of filter stages* | n | 3 | 3 | 3 | 3 |
N-R inhibition (noiseless) | c_{in,1} | 0.073 | 0.082 | 0.081 | 0.081 |
N-R inhibition (noisy) | c_{in,2} | 0.483 | 0.432 | 0.405 | 0.404 |
N-R exponent (noiseless) | ρ_{1} | 2.015 | 1.976 | 1.985 | 1.981 |
N-R exponent (noisy) | ρ_{2} | 1.595 | 1.691 | 1.663 | 1.661 |
Attention/VSTM | | | | | |
VSTM asymptote | θ | 0.749 | 0.716 | 0.737 | 0.737 |
Gain (attended, noiseless) | γ_{A,1} | 31.5 | 57.3 | 90.0 | 89.8 |
Gain (unattended, noiseless) | γ_{U,1} | 21.2 | 24.5 | 39.7 | 39.5 |
Gain (attended, noisy) | γ_{A,2} | 31.5 | 51.5 | 90.0 | 90.0 |
Gain (unattended, noisy) | γ_{U,2} | 21.2 | 22.4 | 32.1 | 32.0 |
Decision process | | | | | |
Decision criterion | a | 0.112 | 0.119 | 0.120 | 0.121 |
Diffusion coefficient* | σ(∞) | 0.10 | 0.10 | 0.10 | 0.10 |
Drift variability | | | | | |
Unmasked, cued | η_{1} | 0.292 | 0.269 | 0.287 | 0.289 |
Unmasked, miscued | η_{2} | 0.325 | 0.296 | 0.295 | 0.297 |
Integration, cued | η_{3} | 0.254 | 0.256 | 0.262 | 0.264 |
Integration, miscued | η_{4} | 0.286 | 0.302 | 0.322 | 0.325 |
Interruption, cued | η_{5} | 0.307 | 0.277 | 0.300 | 0.300 |
Interruption, miscued | η_{6} | 0.459 | 0.405 | 0.408 | 0.410 |
Both, cued | η_{7} | 0.291 | 0.295 | 0.300 | 0.300 |
Both, miscued | η_{8} | 0.476 | 0.336 | 0.567 | 0.567 |
Postdecision processes | | | | | |
Mean postdecision time | T_{er} | 0.335 | 0.356 | 0.336 | 0.336 |
Postdecision time range* | s_{t} | 0.1 | 0.1 | 0.1 | 0.1 |

*γ*_{A}, *γ*_{U}}, one for attended and one for unattended stimuli. It also had a pair of sensory response function offset rate parameters, {*β*_{off,1}, *β*_{off,2}} (Equation 5), one for interruption-masked stimuli and one for stimuli without interruption masks. The model predicts a mask-dependent cuing effect like that shown in the top two panels of Figure 8. Mask-dependent cuing arises from an interaction between the differences in gain for attended and unattended stimuli and the differences in persistence of masked and unmasked stimuli, as discussed previously. However, the model has no mechanism for characterizing the attentional effects of noise (integration) masks, and so necessarily predicts identical cuing effects for noisy and noise-free stimuli. We use this model as a baseline model against which to compare the performance of the three other models.

*ϕ*, which determined the reduction in VSTM transfer rate with noisy stimuli. The attention gain in this model was parameterized as {*γ*_{A}, *γ*_{U}, *ϕγ*_{A}, *ϕγ*_{U}}. These four parameters describe the gain for attended noiseless, unattended noiseless, attended noisy, and unattended noisy stimuli, respectively. The lower panels of Figure 8 show an example of this parameterization with *γ*_{A} = 40, *γ*_{U} = 20, and *ϕ* = 0.5. Very similar predictions can be obtained by assuming instead that noise reduces the sensory onset rate parameter, *β*_{on}, in Equation 5. According to this assumption, noise changes the temporal response of the visual filter; specifically, it slows the rate at which the function *μ*(*t*) in Figure 6 grows to its maximum. The two versions of the model make equivalent predictions because of the cascaded structure of the integrated system model. Slowing the rate at which sensory information becomes available perceptually has the same effect as slowing the rate at which sensory information is transferred to VSTM. We have chosen to characterize this as a change in gain for simplicity, but some readers may prefer the sensory filter interpretation. Which interpretation is preferred is immaterial for the tests of our substantive hypotheses.

*γ*_{A}, *γ*_{U,1}, *γ*_{A}, *γ*_{U,2}}. This model assumed that gain was the same for attended noisy and noiseless stimuli, but differed for unattended stimuli. We refer to this model as the *dual-locus* model, because it implies some mechanism in addition to mask- and attention-dependent changes in the rate of VSTM transfer. Model 4 relaxed the assumptions of Model 3, and allowed the gain for attended noiseless, unattended noiseless, attended noisy, and unattended noisy stimuli all to be different. The set of gain parameters for this model is denoted {*γ*_{A,1}, *γ*_{U,1}, *γ*_{A,2}, *γ*_{U,2}}.
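The four gain parameterizations can be summarized in one helper. This is a minimal sketch of the constraints described in the text, not the fitting code; the function name, argument names, and dictionary keys are hypothetical.

```python
def gain_parameters(model, g_a, g_u, phi=1.0, g_a_noisy=None, g_u_noisy=None):
    """Return gains for (attended, unattended) x (noiseless, noisy) stimuli.

    model 1: noise has no effect on gain           (2 free gain parameters)
    model 2: noise scales both gains by phi        (3 free gain parameters)
    model 3: noise changes the unattended gain only (3 free gain parameters)
    model 4: all four gains are free               (4 free gain parameters)
    Keys follow the subscripts used in the text: 1 = noiseless, 2 = noisy.
    """
    if model == 1:
        noisy = (g_a, g_u)
    elif model == 2:
        noisy = (phi * g_a, phi * g_u)
    elif model == 3:
        noisy = (g_a, g_u_noisy)
    elif model == 4:
        noisy = (g_a_noisy, g_u_noisy)
    else:
        raise ValueError("model must be 1, 2, 3, or 4")
    return {"A,1": g_a, "U,1": g_u, "A,2": noisy[0], "U,2": noisy[1]}


if __name__ == "__main__":
    # Single-locus example using the illustrative values quoted for Figure 8.
    print(gain_parameters(2, 40.0, 20.0, phi=0.5))
```

The counts of free gain parameters (2, 3, 3, and 4) account for the differences in the total number of free parameters across the four models.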

*m* is the number of parameters estimated in fitting the model and *N* is the number of observations used in the calculation of *G*^{2}. The BIC is a penalized likelihood statistic, which penalizes a model in proportion to its number of free parameters. As with the widely used Akaike information criterion (AIC; e.g., Smith, 1998a), the preferred model is the one with the smallest BIC. The BIC differs from the AIC in that its penalty term depends on the sample size, which makes it less prone than the AIC to favor complex models as the sample size increases. The BIC values for each of the four models are shown in the right-hand column of Table 3. The degrees of freedom associated with each model are the number of degrees of freedom in the data minus the number of free parameters estimated in fitting the model. There are 440 degrees of freedom in the data (40 conditions × 11 degrees of freedom per distribution pair); the number of free parameters was 19 for Model 1, 20 for Models 2 and 3, and 21 for Model 4.
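The model comparison can be reproduced with a few lines of arithmetic. The sketch assumes the usual form BIC = *G*^{2} + *m* ln *N*. The value *N* = 5040 is an assumption, not a figure stated in the text: it was back-solved here because it reproduces the BIC values in Table 3 to one decimal place.

```python
import math


def bic(g2, n_params, n_obs):
    """Assumed form: BIC = G^2 + m * ln(N)."""
    return g2 + n_params * math.log(n_obs)


# (G^2, number of free parameters) for the four models, as given in the text.
MODELS = {
    "Model 1": (394.5, 19),
    "Model 2": (392.8, 20),
    "Model 3": (375.1, 20),
    "Model 4": (375.0, 21),
}

# Assumption: not stated in the text; chosen because it reproduces Table 3.
N_OBS = 5040


def best_model(models=MODELS, n_obs=N_OBS):
    """Return the name of the model with the smallest BIC."""
    return min(models, key=lambda name: bic(*models[name], n_obs))


if __name__ == "__main__":
    for name, (g2, m) in MODELS.items():
        print(name, round(bic(g2, m, N_OBS), 1))
    print("preferred:", best_model())
```

Models 2 and 3 have the same number of free parameters, so their BIC difference equals their *G*^{2} difference; Model 4 fits about as well as Model 3 but pays for one extra parameter.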

*β*_{on} and *β*_{off}, the number of cascaded stages in the filter, *n*, and the exponent and divisive inhibition term of the Naka-Rushton function in Equation 4, *ρ* and *c*_{in}. We assumed a single onset rate for the sensory response function in all conditions, but different offset rates depending on whether an interruption mask was used, as previously discussed. Because the predictions of cascade models are relatively insensitive to the number of stages in the filter (see Smith, 1995, for further discussion), we set *n* = 3 arbitrarily after verifying that the fits remained essentially unchanged with other choices. We allowed the Naka-Rushton exponent and divisive inhibition parameters to vary as a function of display noise (i.e., whether or not an integration mask was used). This reflects the different contrast dependencies in the psychometric functions in Figure 2. Following the work of Boynton (2005) and Lee et al. (1999), we also investigated models in which either the exponent or the divisive inhibition term or both varied as a function of attention in one or more conditions. We discuss these models later in the article.

*γ*, and a parameter, *θ*, in Equation 8, which describes the mapping of contrast into trace strength. The decision process is characterized by two main parameters: the decision criterion, *a*, and the diffusion coefficient, *σ*^{2}(*t*). The decision criterion determines the amount of evidence needed for a response, whereas the diffusion coefficient determines the variability of the sample paths in the accumulation process in Figure 7. We constrained the criteria for vertical and horizontal responses to be equal (*a*_{1} = −*a*_{2} in Figure 7) after inspection of the RT distributions and associated accuracy of the two responses. We also assumed observers use the same criteria for all stimulus types. This assumption is implied by our use of a design that randomized stimulus contrasts and mask conditions within blocks. Although it is plausible that observers use different criteria for different stimulus types in a blocked design, it is much less plausible that they change their criteria from trial to trial in response to changes in the stimulus. This was our reason for using a randomized rather than a blocked design, because it implies a more highly constrained mathematical model. One exception to this is that observers may use different criteria for attended and unattended stimuli. Indeed, some researchers have suggested that the Posner effect in detection RT may arise because observers set lower criteria at cued locations (Pashler, 1998; Sperling, 1984; Sperling & Dosher, 1986). We tested this possibility with our data and found that model fit was not improved with different criteria for cued and miscued stimuli. Smith and Ratcliff (2009) found the same result for the Smith, Ratcliff, et al. (2004) data and the Gould et al. (2007) data. Of course, this does not falsify criterion setting explanations in other contexts, especially in tasks in which accuracy is high and speed of responding is stressed; but in this task it does not appear that the Posner effect in RT was due to the use of different criteria at attended and unattended locations.

*η*. In Ratcliff's (1978) diffusion model, *η* describes the variance in drift between trials. This parameter allows the model to predict slow errors. In the integrated system model, *η* is identified with trial-to-trial variation in the encoded value of *r*(*c*), the Naka-Rushton transduced stimulus contrast in Equation 4. In tasks in which stimulus discriminability is low and accuracy is stressed, like ours, error RTs are typically longer than correct RTs. The model attributes this to the fact that the encoded stimulus information varies in quality randomly from trial to trial. On trials in which the encoded stimulus information is low, RTs are longer and errors are more likely. This joint dependence of RT and accuracy on the quality of the stimulus encoding results in slow errors. More discussion of this and other features of diffusion models can be found in Ratcliff and Smith (2004).

*ν*(*t*), which is time varying. We make an assumption analogous to that in Ratcliff's model, namely that *r*(*c*) is normally distributed within stimulus conditions, with variance *η*. Based on the results of Smith and Ratcliff (2009) and Smith, Ratcliff, et al. (2004), who found that *η* in model fits varied with both cue and mask condition, we allowed *η* to vary in each cell of the Cue × Mask design (8 values in all). In Ratcliff's model, the diffusion coefficient (which he denotes *s*) is treated as a fundamental scaling parameter of the model and is arbitrarily set to 0.1. By “fundamental scaling parameter” we mean that other model parameters, such as the drift, starting point, drift variability, and decision criteria, are identified only as multiples of *s*. We adopted the same convention and set *σ*(∞) = 0.1, where *σ*(∞) denotes the asymptotic value of the diffusion term in Equation 9.

*T*_{er} (time for encoding and responding). In our model, the sensory and VSTM models specify the encoding time, so *T*_{er} is better conceived of as post-decisional rather than non-decisional time, but we retain the *T*_{er} designation for consistency of usage. Following Ratcliff, we assume that *T*_{er} is rectangularly distributed with range *s*_{t}. Because the variability of *s*_{t} is assumed to be much smaller than the variability of the decision process, the particular form assumed for the distribution of *T*_{er} has no effect on the predicted distribution of RT (Ratcliff & Smith, 2004). For long RTs like those here, the value of *s*_{t} has no effect on the quality of the model fit. We therefore fixed *s*_{t} = 0.1 in all fits.

*G*^{2} statistics for the models are similar to one another, but there is nevertheless a clear pattern apparent in the table. In comparison with Model 1, which has no mechanism for predicting the effects of integration masking, the single-locus model (Model 2) produced essentially no improvement in fit. This is despite the fact that, qualitatively, it can predict the interaction between integration and interruption masks found empirically, as shown in Figure 8. The largest improvement in fit is found with Model 3. Like Model 2, this model had three free gain parameters, but it assumed that the magnitude of the miscuing effect was greater for integration-masked stimuli. Model 4, which allowed all four of the gain parameters to vary freely, produced no further improvement in fit over Model 3. Consistent with this, the BIC values lead to Model 3 being selected as the best of the four models. This is the model that provides the best combination of parsimony and fit. We call this model the dual-locus model because it reflects the pattern of performance that might be expected if integration masks interact with cues somewhere other than at the point of VSTM transfer, possibly at the level of the sensory response function in Figure 6.

*p*, the quantiles of the distribution of correct responses are plotted on the *y*-axis against *p* on the *x*-axis; the quantiles of the error distribution are similarly plotted against 1 − *p*. Each pair of distributions for each level of contrast is plotted in a similar way. In the resulting plot, distributions of correct responses appear (usually) to the right of the 0.5-point on the *x*-axis and the distributions of errors appear on the left. The two innermost points in each panel are the distributions for the most difficult stimulus (the lowest contrast) and the two outermost points are the distributions for the easiest stimulus (the highest contrast).

*x*-axis values (most pronounced in the Both, Miscued condition in the panel at the lower right of Figure 9). The relative speed of correct responses and errors is reflected in the left-right asymmetry of the plot. If RTs for correct responses and errors were the same, the plot would be symmetrical around its vertical midline. It is clear from the figure that error RTs are substantially longer than RTs for correct responses, as is typically found when discriminability is low and accuracy is stressed (Luce, 1986; Ratcliff & Smith, 2004). We have reported this pattern previously in other studies using versions of this paradigm (Gould et al., 2007; Smith, Ratcliff, et al., 2004). In the model, the relative speed of correct responses and errors is controlled by the drift variability parameter, *η*.

*β*_{off,1} = *β*_{off,2}, the fit markedly worsened: *G*^{2} = 501.7, BIC = 655.2.

*β*_{off}, to vary freely, as this seemed to better capture the individual differences among observers. As expected, the *G*^{2} statistics for the individual fits are substantially larger than those for the fits to the group data, because quantile averaging smooths some of the variability that is present in individual data (see Smith, Ratcliff, et al., 2004, for a similar comparison). Nevertheless, the model captures the main features of the cuing and masking effects for each observer. The important result in Table 5 is that the dual-locus model shows a systematic advantage for all observers. Our conclusions from fitting group data are thus also supported at the individual observer level.

Observer | Single Locus (Model 2) *G*^{2} | Single Locus (Model 2) BIC | Dual Locus (Model 3) *G*^{2} | Dual Locus (Model 3) BIC |
---|---|---|---|---|
R.E. | 1359.9 | 1538.9 | 1333.5 | 1512.6 |
C.L. | 814.6 | 993.7 | 791.0 | 970.0 |
J.P. | 949.6 | 1128.6 | 940.5 | 1119.6 |
S.C. | 893.1 | 1072.1 | 883.5 | 1062.5 |
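The comparison summarized in Table 5 can be made explicit by differencing the BIC values for each observer. The short Python sketch below does this using the tabled values only; the dictionary layout and the function name are introduced purely for illustration.

```python
# (G2, BIC) pairs copied from Table 5:
# first pair = single-locus model (Model 2), second pair = dual-locus model (Model 3).
table5 = {
    "R.E.": ((1359.9, 1538.9), (1333.5, 1512.6)),
    "C.L.": ((814.6, 993.7), (791.0, 970.0)),
    "J.P.": ((949.6, 1128.6), (940.5, 1119.6)),
    "S.C.": ((893.1, 1072.1), (883.5, 1062.5)),
}


def bic_advantage(table):
    """BIC(single locus) minus BIC(dual locus) for each observer.

    Smaller BIC is better, so positive differences favor the dual-locus model.
    """
    return {obs: round(single[1] - dual[1], 1) for obs, (single, dual) in table.items()}


for observer, delta in bic_advantage(table5).items():
    print(observer, delta)
```

The differences are positive for all four observers, which is the systematic advantage for the dual-locus model noted in the text.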

*γ*_{A} vs. *γ*_{U}) for attended and unattended stimuli to produce differences in the asymptotic VSTM trace strength for cued and miscued stimuli. When no interruption mask is used, this results in small-to-moderate differences in *ν*(*t*) for cued and miscued stimuli and a commensurate cuing effect in accuracy and RT. The RT effect includes a change in the leading edge of the RT distribution that, in the integrated system model, is an index of the rate of VSTM formation. When an interruption mask is also used, the suppression of the stimulus by the mask produces large differences in *ν*(*t*) for cued and miscued stimuli. The changes in the leading edge of the RT distribution should be fairly similar in the noise mask condition and the compound mask condition, because noise has the same effect on the rate of VSTM formation in the two conditions.

*r*(*c*)*μ*(*t*), is encoded in durable form in VSTM under the control of attention. The VSTM trace, *ν*(*t*), is the basis for the observer's decision and subjective report. Changes in the attention gain, *γ*, affect the quality of the trace, *ν*(*t*), but have no effect on the time course or amplitude of the sensory response, *r*(*c*)*μ*(*t*). To the extent that the model conceives of stimulus identification as a multi-stage process, involving initial stimulus encoding followed by VSTM selection and decision-making, with the effects of attention confined to the latter stages of the process, it can be conceived of as a form of late selection theory, with a particular, well-specified computational form. The single-locus model assumes that simultaneous noise masks and backward pattern masks both interact with attention at the point of VSTM selection. In contrast, the dual-locus model assumes that noise masks have their effect at some other point in the system (by implication, at the level of the sensory response, *r*(*c*)*μ*(*t*)). The dual-locus model thus admits the possibility of an additional, early selection mechanism.

*contrast gain* or a *response gain* mechanism (Boynton, 2005; Reynolds & Heeger, 2009). Contrast gain and response gain models seek to characterize the effects of attention on the early stages of the visual contrast response, often using a Naka-Rushton function like that assumed in the integrated system model (Equation 4). In contrast gain models, attention produces a uniform increase in contrast sensitivity. The contrast response function, or transducer function, for such models can be written as

*γ*_{i} is the attention gain. (Because we can define *γ*_{i}′ = *γ*_{i}^{ρ}, it is immaterial whether we write the gain term as (*γ*_{i}*c*)^{ρ} or *γ*_{i}′*c*^{ρ}, that is, whether we view the gain change as applied before or after the nonlinearity.) In response gain models, attention produces a uniform increase in the visual contrast response. The transducer function for response gain models can be written as

*x*-axis, whereas in response gain it results in an expansion of the function on the *y*-axis. In logarithmic coordinates, these effects appear as a leftward shift and an upward shift, respectively.
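The distinction between the two kinds of gain can be illustrated numerically. The Python sketch below assumes a generic Naka-Rushton transducer of the form *c*^{ρ}/(*c*^{ρ} + *c*_{in}^{ρ}); this form, the function names, and the parameter values are assumptions made for the example and are not the fitted functions or estimates reported here.

```python
def naka_rushton(c, rho=2.0, c_in=0.1):
    """Assumed Naka-Rushton transducer: c^rho / (c^rho + c_in^rho)."""
    return c ** rho / (c ** rho + c_in ** rho)


def contrast_gain(c, gain, rho=2.0, c_in=0.1):
    """Attention scales the effective contrast before the nonlinearity."""
    return naka_rushton(gain * c, rho, c_in)


def response_gain(c, gain, rho=2.0, c_in=0.1):
    """Attention scales the output of the nonlinearity."""
    return gain * naka_rushton(c, rho, c_in)


for c in (0.025, 0.05, 0.1, 0.2):
    print(c, naka_rushton(c), contrast_gain(c, 2.0), response_gain(c, 1.5))
```

Under these assumptions, the contrast gain response at contrast *c* equals the baseline response at contrast *γc*, so on a logarithmic contrast axis the whole function moves left by log *γ*. The response gain function is the baseline multiplied by *γ* at every contrast, which appears as an upward shift by log *γ* on a logarithmic response axis.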

*r*(*c*), of Equation 4 can be obtained as the asymptotic solution of the shunting equation

*μ*(*t*) is again the temporal response function of Equation 5. The asymptotic solution of this equation is

^{5}

*n*, acts as an additional source of inhibition and that the noise and stimulus are differentially affected by attention:

*n* = 0) the contrast response is the same for attended and unattended stimuli. When noise is present, the log contrast response function is shifted to the right by an amount that depends on the gain. Large gain attenuates the effects of noise and reduces the rightward shift; that is, attention acts as contrast gain.

*r*′(*c*). The primed notation indicates that noise acts at a stage of processing after the one involved in the computation of *r*(*c*) in Equation 4. The rate of growth in this stage is described by the shunting equation

*d*′ for the same data.

*c*_{in} to differ for cued and miscued noisy stimuli. For response gain, we allowed the amplitude of the Naka-Rushton contrast-response function, *r*(*c*), to differ. The resulting model fits are shown in Table 6. In these model fits, the sensory response function offset parameter, *β*_{off}, again varied as a function of whether an interruption mask was used. That is, these models assume that the cuing effect is a function jointly of early attentional modulation of the visual contrast response and late attentional dependencies in the rate of VSTM transfer. Unlike the previous Model 3 (the dual-locus model), the models in Table 6 assumed only two free attention parameters, *γ*_{A} and *γ*_{U}, whose values were the same for noisy and noise-free stimuli.

Observer | Contrast Gain *G*^{2} | Contrast Gain BIC | Response Gain *G*^{2} | Response Gain BIC |
---|---|---|---|---|
R.E. | 1335.2 | 1514.2 | 1327.6 | 1506.6 |
C.L. | 777.4 | 956.4 | 777.7 | 956.7 |
J.P. | 944.2 | 1123.2 | 943.7 | 1122.8 |
S.C. | 896.1 | 1066.6 | 869.6 | 1048.6 |
Group | 372.4 | 551.4 | 371.6 | 542.1 |

*channel enhancement* or of *channel selection*. In their model, channel enhancement is an increase in the quality of stimulus information at attended locations, whereas channel selection is an increase in the efficiency with which the available stimulus information is used to make a decision. These mechanisms have similarities with the two components of attention we have identified here. However, they identified the two mechanisms with the action of voluntary and reflexive orienting mechanisms, whereas we have identified them with the interaction with visual masks of different kinds.

*η*. We assume that the asymptotic VSTM strength, *ν*(∞), determines the mean of a normal distribution of drift values with standard deviation *η*. Variations in *η* determine the relative speed of correct responses and errors. When *η* = 0 the distributions of correct responses and errors are the same and the quantile-probability plot is left-right symmetrical. When *η* > 0 errors are slower than correct responses, as shown in Figure 9. Ratcliff, Philiastides, and Sajda (2009) have recently shown that such trial-to-trial variation in the quality of the stimulus encoding can be predicted from a component of the EEG response associated with decision making.
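The way drift variability produces slow errors can be illustrated with a deliberately simplified calculation. The Python sketch below is not the integrated system model: it assumes a constant-drift Wiener process with an unbiased starting point, for which the mean decision time and choice probability at a given drift are known in closed form, and it averages these over a normal distribution of drifts. The boundary separation `a`, the diffusion coefficient `s`, the drift values, and all function names are hypothetical choices made for the example.

```python
import math


def mean_decision_time(drift, a, s):
    """Mean first-passage time of a Wiener process started midway between
    boundaries 0 and a; at a fixed drift it is the same for both responses."""
    x = a * drift / (2.0 * s * s)
    ratio = 1.0 if abs(x) < 1e-8 else math.tanh(x) / x
    return a * a / (4.0 * s * s) * ratio


def p_correct(drift, a, s):
    """Probability of absorption at the upper (correct) boundary."""
    return 1.0 / (1.0 + math.exp(-a * drift / (s * s)))


def conditional_mean_times(nu, eta, a=0.12, s=0.1, n_grid=2001, width=6.0):
    """Mean decision times for correct responses and errors when the drift on
    each trial is drawn from Normal(nu, eta); simple grid integration."""
    if eta == 0.0:
        t = mean_decision_time(nu, a, s)
        return t, t
    num_c = den_c = num_e = den_e = 0.0
    for k in range(n_grid):
        drift = nu - width * eta + 2.0 * width * eta * k / (n_grid - 1)
        w = math.exp(-0.5 * ((drift - nu) / eta) ** 2)  # unnormalized density
        pc = p_correct(drift, a, s)
        t = mean_decision_time(drift, a, s)
        num_c += w * pc * t
        den_c += w * pc
        num_e += w * (1.0 - pc) * t
        den_e += w * (1.0 - pc)
    return num_c / den_c, num_e / den_e


print(conditional_mean_times(nu=0.2, eta=0.0))   # equal mean times
print(conditional_mean_times(nu=0.2, eta=0.15))  # errors slower than correct
```

In this simplified setting, errors are slower when `eta` is positive because they arise disproportionately on trials with drifts near zero, and those trials have the longest decision times. With `eta` set to zero the two conditional means coincide, which corresponds to the symmetrical quantile-probability plot described above.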

*η* to vary freely in each cell of the Cue × Mask design (8 values in all). We did this because we had no strong a priori views about how *η* should vary, but we thought it was likely that it would vary as a function of mask condition, because it seemed plausible that masks would affect encoding variability. The resulting estimates are shown in Table 4. There are two patterns evident in the estimated *η* values: encoding variability is increased by interruption masking and increased by miscuing. There is no strong evidence that it changes appreciably with integration masking, but the increase with interruption masking is as previously reported by Smith, Ratcliff, et al. (2004). The increase in encoding variability with miscuing suggests the presence of a trial-to-trial source of variability in attentional effects which is not represented explicitly in the integrated system model, but whose effects appear indirectly in the estimates of *η*. The finding is noteworthy in the light of the related finding by Prinzmetal, Amiri, Allen, and Edwards (1998) that one of the effects of inattention is to increase perceptual variability. They reported this in the context of judgments of phenomenal appearance, but our model fits suggest the presence of a similar attentional dependency in the information used to make psychophysical decisions.

*c*, we fitted Weibull functions, *F*(*c*), of the form of Equation 2 to the *d*′ values for each observer by minimizing the approximate chi-square statistic

*d*′(*c*) is the measured sensitivity in condition *i* and *F*_{i}(*c*) is the predicted sensitivity from Equation 2. The index of summation, *i*, runs over the five levels of stimulus contrast and the two types of attentional cue. The quantity in the denominator,

*n*_{H} and *n*_{V} are, respectively, the number of horizontal and vertical stimuli in condition *i* (set here to 126), and *z*(.) is the standard normal density function evaluated at the specified abscissa. The other quantities are as defined in the text. The factor of 2 in the denominator of Equation A2 is a reflection of the √2 in the denominator of the expression for *d*′ in Equation 1. To test for the presence of cuing effects we compared the fit of an unrestricted model {*α*_{A}, *β*_{A}, *γ*_{A}, *α*_{U}, *β*_{U}, *γ*_{U}}, in which separate Weibull functions were fitted to the psychometric functions for attended (cued) and unattended (miscued) conditions, to the fit of a restricted model, {*α*., *β*., *γ*.}, in which the same Weibull function was fitted to the two psychometric functions simultaneously.
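The logic of this nested comparison can be sketched as follows. The sketch rests on several assumptions that are not taken from the analysis reported here: a Weibull form *α*(1 − exp(−(*c*/*β*)^{γ})) standing in for Equation 2, variances supplied directly rather than computed from Equation A2, and a chi-square difference test on 3 degrees of freedom (the difference between six and three free parameters) as one common way of comparing the restricted and unrestricted fits. All function names and numerical values are hypothetical.

```python
import math


def weibull(c, alpha, beta, gamma):
    """Assumed Weibull psychometric function for sensitivity at contrast c."""
    return alpha * (1.0 - math.exp(-((c / beta) ** gamma)))


def chi_square(d_obs, d_pred, variances):
    """Sum over conditions of squared deviations divided by their variances."""
    return sum((o - p) ** 2 / v for o, p, v in zip(d_obs, d_pred, variances))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Hypothetical minimized chi-square values for the two nested models.
chi2_restricted = 31.4    # one Weibull function shared by cued and miscued data
chi2_unrestricted = 12.9  # separate Weibull functions for cued and miscued data
p_value = chi2_sf_df3(chi2_restricted - chi2_unrestricted)
print(p_value)
```

A small tail probability would indicate that fitting separate functions to the cued and miscued data improves the fit by more than the three extra parameters alone would be expected to, which is the sense in which the comparison tests for a cuing effect.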

^{1}The important feature of the function *μ*(*t*) in Equation 5 is less its low-pass filtering properties than the fact that it characterizes the way stimulus information persists after stimulus offset. Other assumptions are consistent with the cascaded process structure of the model, but do not seem to be needed in the present setting. For example, Smith (1995, 1998b) showed how to combine band-pass visual filters with diffusion process decision models as a way to model transient perceptual channels.

^{2}Smith and Ratcliff (2009) compared two attentional models: a gain model and an orienting model. The gain model, like the model described here, assumed that attention affects the rate at which stimulus information is transferred to VSTM. The orienting model assumed that an attention window, or gate, opens to admit stimuli to VSTM. When stimuli are unattended the opening of the attention window is delayed. The orienting model is an elaboration of an earlier model described by Smith, Ratcliff, et al. (2004), which was inspired by the model of Reeves and Sperling (1986). Smith and Ratcliff also considered two different decision models: the single diffusion process model described here and a model described by Ratcliff and Smith (2004), in which the evidence for each response is accumulated as a separate total, each of which is an independent diffusion process. In this latter model, decision-making is conceived of as a race between two diffusing evidence totals, with the response depending on which of the processes first reaches a criterion. The model is like the leaky, competing accumulator model of Usher and McClelland (2001), but is simpler to work with, because it does not assume competitive interaction between accumulators. Smith and Ratcliff showed that, in the context of the integrated system model, the gain and orienting attention models and the single- and dual-diffusion decision models differed in only minor ways. For this reason, we have chosen to use the attention gain model and the single-diffusion decision model to analyze our data. Had we used any of the other models to analyze our data, our conclusions would not have altered.

^{3}Stochastic differential equations are usually written in the differential form of Equation 9 rather than in the more familiar form involving derivatives. The highly irregular sample paths of diffusion processes mean that they do not possess derivatives in the usual sense, so quantities like *dX*/*dt* are not defined. Smith (2000b) discusses stochastic differential equations in the context of models of perceptual decision-making.

^{4}When stimuli are unmasked, the integrated system model predicts that cues affect RT, but not accuracy. The RT effect occurs because the mean VSTM trace grows more slowly for miscued than for cued stimuli. Because the diffusive noise in Equation 9 grows in proportion to mean VSTM trace strength, the slower growth of the VSTM trace with miscued stimuli does not produce an increased likelihood of the process crossing the wrong boundary and making an error while the VSTM trace is still forming. Had we assumed abrupt-onset diffusive noise (i.e., *σ*(*t*) = 0.1 for all *t* ≥ 0), the model would predict both longer RTs and reduced accuracy with unmasked stimuli.