Verstraten, Cavanagh, and Labianca (2000) reported that the upper temporal frequency limit of “attentive tracking” using an ambiguous apparent motion display was 4–8 Hz. They argued that object tracking is mediated by a higher visual process rather than by a first-order motion mechanism (e.g., Adelson & Bergen, 1985), since this limit is much lower than that of first-order motion detection (Burr & Ross, 1982; Lu & Sperling, 1995b).
Verstraten et al. (2000) and Benjamins, Hooge, van der Smagt, and Verstraten (2007) hypothesized that object tracking is realized by actively shifting visual attention from one object to another, and claimed that the 4- to 8-Hz limit they found is the upper temporal limit of this attentional shift. In contrast, Horowitz, Holcombe, Wolfe, Arsenio, and DiMase (2004), using tasks similar to those of Verstraten et al. (2000), found that shifts of voluntary attention between objects are quite slow (300–500 ms in stimulus onset asynchrony [SOA]), and suggested that object tracking is achieved not by attention-based processes but by a preattentive “object continuity process” such as indexing to target objects (see Pylyshyn, 1989; Pylyshyn & Storm, 1988).
These two accounts clearly contradict each other. Verstraten and his colleagues (Benjamins et al., 2007; Verstraten et al., 2000) argued that object tracking is based on a high-level, feature-based process, called the attention-based motion process, that selects and matches objects by using attention (Cavanagh, 1991, 1992). Horowitz et al. (2004), on the other hand, argued for a preattentive process. However, they did not clearly discuss the nature of this preattentive process, although they referred to an “object continuity” process based on FINST theory (Pylyshyn, 1989; Pylyshyn & Storm, 1988). It is possible that this preattentive process is, at least partially, mediated by ordinary (first- or second-order) motion mechanisms.
Apparent motion stimuli whose frames are each defined by a different attribute, such as motion, texture, binocular disparity, or contrast, are not detected by relatively low-level first-order or second-order motion mechanisms, which detect motion from simple stimuli defined by a single attribute. It is generally assumed that attention is involved in detecting such cross-attribute motion. For example, Lu and Sperling (1995a, 2001) proposed a mechanism in which voluntary attention selects a salient feature, each cross-attribute stimulus is thereby allocated on a salience map, and motion is then computed on that map. Verstraten et al. (2000) mentioned the possibility that attention-based processing such as a feature salience system is involved in object tracking. However, this conjecture is not supported by their own results, since the stimuli they used involved only luminance, which is supposedly processed by a first-order mechanism. Thus, it was not clear from their results whether attention-based, attribute-independent mechanisms were involved. First-order mechanisms cannot detect cross-attribute motion, and the existence of low-level, attribute-specific second-order mechanisms is not firmly established. However, several reports on the second-order reversed-phi phenomenon (Lu & Sperling, 1999; Maruya, Mugishima, & Sato, 2003; Mather & Murdoch, 1999) suggest the existence of a low-level second-order mechanism, and such low-level mechanisms could be attribute specific.
The main objective of the present study is to clarify the involvement in object tracking of two types of motion processing: relatively simple, low-level motion processing and higher, attention-based motion processing. To this end, in Experiment 1, we examined the temporal limits for object tracking by using stimuli defined by several different visual attributes. The experiment used both within- and cross-attribute object-tracking stimuli, that is, stimuli defined by a single attribute and stimuli defined by different attributes. If object tracking is mediated by a higher, attention-based mechanism, the temporal limits of object tracking should be about equal to those obtained by Verstraten et al. (2000) regardless of the stimulus type. However, if object tracking with within-attribute stimuli can be mediated by relatively lower motion processing, the results for within- and cross-attribute stimuli should dissociate. In this case, the temporal limits would be higher for within-attribute stimuli, about equal to those reported by Verstraten et al. (2000), and lower for cross-attribute stimuli. In two additional experiments, we measured temporal limits for simple, classical apparent motion perception using the same stimulus combinations as in Experiment 1 (Experiment 2), and for a pure attentional shift that did not involve motion components (Experiment 3). We compared these results with those from Experiment 1 to clarify the contributions of attentional and motion processing to object tracking.