The apparent “pre-attentive” segmentation of objects that can influence the allocation of attention has typically been demonstrated with little more than 2 or 3 objects presented at one time (Duncan, 1984; Egly et al., 1994). This fact led us to assess whether classical demonstrations of object-based attention could be applicable to scene perception where we are presented with many potential objects in unpredictable locations. Indeed, if the phenomenon of object-based attention is to play any functional role in real-world scene perception, we reasoned that it would have to select upon a representation that is maintained in parallel across an entire visual scene for multiple potential objects.
In Experiments 1 and 2, we tested whether the within-vs.-between-object advantage found in the Egly et al. cuing paradigm would still manifest when participants were presented with up to 12 objects. Both experiments yielded a straightforward result: the within-vs.-between-object advantage did not interact with the number of objects presented. This result could be taken as evidence that up to 12 potential units of selection could be represented in parallel, without any apparent reduction in the effect. Whether this parallel set of representations should be regarded as objects is not a straightforward question (see Driver et al., 2001, for a critical discussion regarding whether the units that influence attention can be regarded as objects). Nevertheless, our results demonstrate that whatever representations are developed for 2 objects can be maintained in parallel for up to 12 objects, with apparently no reduction in the extent to which each unit (object) can influence attention. This, therefore, confirms a common assumption that whatever representation is developed in object-based paradigms with 2 or 3 objects can be maintained in parallel. Furthermore, it brings the results obtained with impoverished displays of only 2 or 3 objects a step closer to a demonstration that object-based effects like those reported by Egly et al. can play a functional role in real-world scene perception, where we are confronted with multiple objects.
As it stood, however, the interpretation derived from Experiments 1 and 2 could be questioned. One could argue that the objects did not, in fact, need to be maintained in parallel, because the presentation of the cue might enable participants to very rapidly parse just a small set of objects close to the cue. Experiment 3 rules out this explanation by demonstrating that if the cue was presented simultaneously with the outline rectangles, no within-vs.-between-object cuing advantage was observed. Indeed, even if the rectangles preceded the cue by 90 ms, there was still no object-based effect. This demonstrates that objects cannot be parsed rapidly or simultaneously upon presentation of the cue. Instead, the objects have to be processed for some time (in this context, between 90 and 300 ms) prior to the cue in order for them to be represented such that they can influence attention (consistent with prior demonstrations: Chen & Cave, 2008; Law & Abrams, 2002).
We should be clear that the focus of this article is the question of whether objects are maintained in parallel as potential units of selection. We cannot make any strong claims regarding the nature of the process by which these objects are extracted, a process that is in itself often assumed to occur in parallel (consistent with Davis & Driver, 1998). It is already clear that the process of extracting the objects that influence attention does not operate in a purely automatic or stimulus-dependent manner. Rather, one's previous experience with a set of stimuli can strongly influence how they are organized as objects (Chen & Cave, 2006; Watson & Kramer, 1999). Indeed, as reviewed in the Introduction section, it also appears that some degree of “distributed” attention across an entire scene is required before objects are parsed to the level at which they influence attention (Goldsmith & Yeari, 2003). Both these factors could lead one to question whether the process of extracting the object representations that influence attention can operate in parallel. On the other hand, one could argue from the rather long durations associated with measures of attentional “dwell time” (Duncan, Ward, & Shapiro, 1994) that it is unlikely that the object extraction occurring within 300 ms for up to 12 objects in Experiment 2 could reflect a serially allocated attentional process. Given the arguments on both sides, however, it would clearly be preferable to test directly whether the object representations are extracted in parallel by exploring the presentation time required to observe a within-vs.-between-object advantage for differing numbers of objects. We know that this time must be between 90 and 300 ms for 8 objects, but it could vary for differing numbers of objects. If the presentation time required to generate a within-vs.-between-object advantage differs with the number of rectangles, this would provide strong evidence that they are not extracted in parallel. Such a conclusion would not, however, be incompatible with the current claim that, after these objects are extracted, their representations can be maintained in parallel.
Of course, while the current experiment finds no reduction in the strength of the within-vs.-between-object advantage for up to 12 objects, one could still question whether there is an upper limit on the number of objects that can be represented in parallel. Yet even as this result stands, it provides an interesting contrast to other object-based paradigms, such as Multiple Object Tracking (MOT), which reveal a limit in the range of 4 or 5 objects. The manipulation of the number of objects in the current experiment is clearly different from that in the MOT paradigm, however: although the present research has highlighted that many objects can be represented simultaneously as potential units of selection, only one of those objects is selected. This is in clear contrast to MOT, where up to 4 or 5 objects can be selected/tracked. Thus, while the current paper shows that multiple objects can be maintained in parallel, one could ask how many of these objects could be simultaneously cued such that targets presented on one of them would still lead to the within-vs.-between-object advantage. Indeed, while the Egly et al. paradigm and MOT are often discussed under the common umbrella term “object-based attention,” a more direct comparison of the number of objects that can be selected in each paradigm could provide an important indication of whether or not these paradigms really reflect common underlying mechanisms. It is also pertinent to recall (as discussed in the Introduction section) that although it has already been demonstrated that something on the order of 4 objects can be tracked in parallel within the MOT paradigm, this limitation does not necessarily conflict with the current result. Rather, it highlights how our manipulation of the number of rectangles presented and the MOT manipulation of the number of objects to be tracked may tap different stages of representation and selection. More specifically, the limitations seen in MOT pertain to post-cue selection/tracking and, therefore, do not address the question of how many objects can be parsed and represented as potential units of attentional selection.
In summary, this research highlights some important questions for future work, regarding both the potentially parallel nature of the processes involved in extracting the object representations that can influence attention and how many objects can simultaneously be cued and still generate object-based effects in the Egly et al. paradigm (particularly in comparison to the limit of 4 or 5 objects seen in MOT). As it stands, the current research allows us to conclude that the within-vs.-between-object advantage in the Egly et al. paradigm reflects selection from a stage of representation that can maintain multiple units of attentional selection in parallel across a visual scene. Although such a representation has been implicitly assumed, demonstrating its role in this paradigm provides an important step toward showing that the object-based attention effects apparent with simplified displays can scale up to, and potentially play a functional role in, the allocation of processing resources in real-world scene perception.