Observers in a typical visual search experiment are required to find a target object that is randomly located among other distractor objects. In the course of their search, observers must coordinate two mental operations: locating the target within the search display and extracting the identity of objects so as to discriminate targets from distractors. Considerable theorizing has been devoted to the question of the temporal order of these two operations, with some (e.g., Treisman & Gelade,
1980) arguing for identity extraction prior to spatial localization, and others (e.g., Julesz,
1984) arguing for spatial localization prior to identity extraction (but see Green,
1991,
1992). However, here we address an even more fundamental question—one that has not been given a great deal of attention to date—namely, whether spatial selection and identity extraction are dissociable functions.
In our view, the question of the dissociability (or, equivalently, the separability) of these two mental operations has logical priority over the question of whether one of them is performed before the other. If the operations are separable, their relative ordering can then be explored as a secondary question. However, if they are inseparable because they rely on common cognitive resources, then the question of relative ordering is moot. In the sections that follow, we briefly review past theoretical and empirical contributions to this question, before turning to a series of three experiments that examine it in a new way.
Theories of visual search differ in the emphasis they place on the processes of spatial selection and identification. For example, the most influential framework for interpreting visual search has been
Feature Integration Theory (FIT; Treisman & Gelade,
1980), which holds that visual features such as color and shape are initially registered in separate topographically organized regions of the brain. In order to identify any particular conjunction of features as belonging to the same object, information from remote brain regions must be combined (the metaphor of attention as “glue” was used in early papers on FIT; the term “binding” is used in more recent papers). The integration of features requires a master map of spatial locations to which all feature maps have access. Moreover, feature integration is inherently a serial operation; it can be done for only one location (or object) at a time. According to FIT, visual search tasks are slow and effortful when feature integration must be performed for each item in the display until the target is found. Search becomes faster and easier when the target item can be identified on the basis of unique activity in a single feature map; no linking of different feature maps is required, and so the master map can be consulted directly.
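This seriality has a familiar quantitative signature (a schematic summary in our own notation, not a formula from the original FIT papers). If binding takes roughly t per inspected item and, on target-present trials, a serial self-terminating search examines on average half of the N display items, then mean response time grows approximately as

RT(present) ≈ RT0 + t(N + 1)/2 and RT(absent) ≈ RT0 + tN,

whereas a target that can be detected from unique activity in a single feature map predicts search functions that are nearly flat across N.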
Although Feature Integration Theory has undergone several modifications since its inception (Treisman,
1988; Treisman & Gormican,
1988; Treisman & Sato,
1990), it still proposes that the limiting factor on search efficiency is the feature integration process, not the step of spatially localizing the conjoined features in the search display. Note that the main theoretical alternatives to this theory also place more emphasis on identity extraction than on spatial localization. For example,
resemblance theory (Duncan & Humphreys,
1989) proposes that search efficiency is limited by similarity relations among the display items: both the similarity of targets to distractors and the similarity of distractors to one another. Wolfe's (
1994,
2006)
guided search theory also highlights inter-item relationships, though it does so through the complexity of the interactions among feature maps that are needed to define a target as distinct from the distractors.
In contrast to this emphasis on identity extraction, other theories of visual search have placed greater emphasis on the control of an attentional spotlight or zoom lens that enhances processing within a limited region of space (Eriksen & Yeh,
1985; Posner,
1980). The most comprehensive theory of this kind is
texton theory (Julesz,
1984; Sagi & Julesz,
1985a,
1985b), which holds that target identification occurs only after an initial stage of processing in which the visual image has been analyzed for spatially localized discontinuities in simple visual features. Discontinuity localization is said to be a parallel process, although its efficiency is still a function of the strength of the signal derived from the discontinuity at any given location. By contrast, registration of the features at a given location is a serial process, leading to the prediction that the localization of a spatial discontinuity in a display will invariably precede the identification of its featural properties.
We note that the possible relations between spatial selection and identification—addressed in the past by the functional theories specifically tailored to account for visual search—are cast in a new light when considered from the perspective of the neurologically inspired dual system theory (Goodale & Milner,
2004; Milner & Goodale,
1995; Ungerleider & Mishkin,
1982). In this framework, space-related information is processed along the dorsal (“Where/How”) pathway, whereas identity-related information is processed along the ventral (“What”) pathway. Studies of animals and of human patients with selective damage to these pathways support this distinction: damage to the ventral stream compromises object identification while preserving accurate visually guided actions to the same objects, whereas damage to the dorsal stream compromises actions to objects while preserving conscious perception of them.
Despite all these indications, however, the separability of spatial selection and identity extraction assumed by this theory is not easy to verify against experimental data from visual search. This is because, in visual search studies to date, the experimental factors that have been manipulated invariably influence spatial selection and identity processing concurrently. Consider, for example, the most frequently manipulated factor in visual search studies, set size (i.e., the total number of objects in a search array). Increases in set size impair spatial selectivity by increasing the number of potential target locations, but at the same time they impair identity extraction by decreasing the signal-to-noise ratio (Eckstein,
1998; Palmer,
1995). Thus, existing visual search studies are not well suited to addressing the separability of spatial selection and identification.
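The identity-extraction side of this confound can be illustrated with a simplified decision-noise model in the spirit of those analyses (a schematic sketch in our own notation, not a formula taken from either paper). Suppose the internal response at the target location is drawn from a normal distribution with mean d′ and unit variance, the response at each of the N − 1 distractor locations is drawn from a standard normal distribution, and the observer selects the location with the largest response. The probability of a correct selection is then

P(correct | N) = ∫ φ(x − d′) Φ(x)^(N−1) dx,

where φ and Φ denote the standard normal density and distribution functions. For fixed d′, this probability declines as N increases: the same manipulation that multiplies the number of candidate locations also dilutes the effective signal-to-noise ratio, so the spatial and identity contributions of set size cannot be separated.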
In the present study, we test for separability in visual search by combining two paradigms that have typically been used for other purposes in the study of human attention: exogenous spatial cueing and the attentional blink (AB). The use of exogenous spatial cues (e.g., a high-contrast dot displayed briefly at the expected location of an ensuing target) to direct attention to specific locations in a visual display, independently of the objects that appear in those locations, has an extensive history, extending back at least to Eriksen and Hoffman (
1972; also see Klein,
2004). The literature on the AB (Raymond, Shapiro, & Arnell,
1992) is equally extensive (for a review, see Dux & Marois,
2009). AB studies involve the detection or identification of targets presented among distractor items in rapid serial visual presentation (RSVP). The primary finding is an impairment in identification accuracy for a second target when it is presented less than about 500 ms after a first target. The AB is generally regarded as a high-level phenomenon that interferes with the process of identity extraction (Chun & Potter,
1995; Jolicœur & Dell'Acqua,
1998). Moreover, the AB has been shown not to interfere with the process of spatial selection (Ghorashi, Di Lollo, & Klein,
2007). It is worth emphasizing that in the present study these two paradigms serve primarily to influence the relative difficulty of spatial selection and identity extraction during visual search; the present work is not intended as yet another study of either spatial cueing or the attentional blink in its own right.
In combining these two paradigms in a visual search study, we rely on additive-factors logic (Sternberg,
1969). Within this framework, it is assumed that mental processing is carried out in a series of non-overlapping stages. If two factors influence independent stages of processing, they will have additive effects on the dependent measure. Conversely, when additivity is found, the underlying stages of processing can be inferred to be independent. If, on the other hand, the effects of the two factors interact, the inference is that at least one factor influences both stages, and the underlying stages of processing are interpreted as not being independent. Sternberg (
1969, p. 287) expressed the relationship between the additive effects of two factors and the idea of independence of processing stages as follows:
Suppose, for example, that we wish to test the following hypothesis, H1: stimulus encoding and response selection are accomplished by different stages, a and b. This can be tested only jointly with an additional hypothesis, H2: a particular factor, F, influences stage a and not b, and a particular factor, G, influences stage b and not a. If F and G are found to be additive, both hypotheses gain in strength. But the falsity of either H1 or H2 could produce a failure of additivity.
Our use of additive-factors logic therefore allows us to test three hypotheses: (a) that spatial cueing affects spatial selection during visual search, (b) that the AB influences identity extraction during visual search, and (c) that spatial selection and identity extraction are two independent stages of processing during a search task. The last hypothesis is clearly the most important for our purposes, but its interpretation depends on supporting evidence for the first two. If the effects of spatial cueing and the AB combine additively in their joint influence on the response measure, they can be regarded as affecting independent, non-overlapping stages of processing, and it can then be concluded that spatial selection and identity extraction are separable processes. If, on the other hand, the effect of cueing interacts with the AB, it can be concluded that spatial selection and identity extraction are not entirely independent but share at least some stages of processing. Thus, the combined use of spatial cueing and the AB, in the context of a visual search task, permits a psychophysical test of the separability of the two functions.
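In schematic terms (our notation, not Sternberg's), let Y_ij denote the dependent measure when the cueing factor is at level i and the AB factor (e.g., the temporal lag between targets) is at level j. Additivity corresponds to

E[Y_ij] = μ + α_i + β_j,

whereas a failure of additivity corresponds to

E[Y_ij] = μ + α_i + β_j + γ_ij, with γ_ij ≠ 0 for at least one combination of i and j;

the interaction term γ_ij is precisely what a Cueing × AB interaction tests.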