Research Article  |   January 2003
Feature binding in object-file representations of multiple moving items
Author Affiliations
  • Jun Saiki
    PRESTO, JST, Kawaguchi, Japan
    Graduate School of Informatics, Kyoto University, Kyoto, Japan
Journal of Vision January 2003, Vol.3, 2. doi:10.1167/3.1.2
      © 2017 Association for Research in Vision and Ophthalmology.

Abstract

Maintenance of episodic representations by feature-location binding is important for visual cognition. It has been proposed that we can hold and update coherent episodic representations of up to four objects. This study investigated the dynamic maintenance of feature-location bindings with multiple objects. In a series of seven experiments, participants judged whether a sequence of rotating patterns of three or four colored disks contained any color switch between two disks. Color-switch detection was in general difficult, even when tracking of the objects’ motions was successful, suggesting that our ability for dynamic maintenance is limited. Performance improved when the interframe rotation angle became sufficiently small. Moreover, spatiotemporal predictability was necessary for this improvement, suggesting that the maintenance of multiple episodic representations is an interactive process between our prediction and sensory mechanisms.

Introduction
Objects have various perceptual features, such as color, texture, shape, and size, and they move and change their perceptual properties across time. Therefore, to perceive and understand visual scenes and events, we need to make correspondences of feature values to multiple objects, and keep track of these correspondences as the objects move. However, the underlying mechanisms of this process are largely unknown. 
It is obvious that we cannot store all possible combinations of these features and their spatiotemporal locations in long-term memory. Instead, what we appear to be doing is to form episodic representations of visual scenes and events temporarily, and to use these episodic representations in various cognitive tasks. There are few experimental studies on our ability to form, maintain, and transform episodic representations of a dynamic event with multiple objects. This study systematically investigated the process of maintenance and transformation of episodic representations. 
Episodic Representations in Visual Cognition
It has been proposed that object recognition requires not only the long-term representation of object categories (often called types), but also the representation of the object’s presence in a particular episode (often called tokens; see Kanwisher, 1987). Tokens are short-term episodic representations of objects, and spatiotemporal information is critical in their individuation (Kanwisher, 1987). Following this line of thought, I define episodic representation as mental representation whose featural information is bound to its spatiotemporal properties. Although the notion of episodic representation is closely related to the issue of feature binding, it specifically focuses on the binding of featural and spatiotemporal information. The binding of featural information with its spatiotemporal property is important for various visual cognition tasks. If you need to reach a target object among many distractors, you need to know the target location in addition to the target identity. Although one of the important properties of visual object recognition is locational invariance, locationally invariant recognition itself is usually insufficient for the action toward the object. In our dynamic world with multiple objects, formation of episodic representations is an indispensable ability for successful interaction. 
There are some theoretical proposals on episodic representation. Kahneman and Treisman (1984) proposed the notion of object files. Object files are temporary episodic representations of real-world objects that are separate from the representations stored in the long-term recognition network (Kahneman, Treisman, & Gibbs, 1992). Each object file contains information about a particular object in a scene, and is addressed by its location at a particular time (hereafter I call this spatiotemporal location), not by any feature or identifying label. An object file collects sensory information, updates it as the sensory situation changes, and may be discarded when the object disappears from view. Kahneman et al. (1992) assume that there is some limit to the number of object files to be stored concurrently, and some limit in the spatial/temporal gap that can be bridged. They examined the validity of these assumptions using a reviewing paradigm. In the reviewing paradigm, simple objects are initially shown with letters inside, move without letters, and stop with a letter that subjects are then asked to identify. The object-specific priming effect is considered as evidence for the object files. It was shown that people can store four object files concurrently, and bridge an interstimulus interval (ISI) of 590 ms. Pylyshyn (1989) proposed the FINST theory (also see Pylyshyn, 2001). FINST is a reference to a particular feature or feature cluster that keeps pointing to the same feature cluster as the cluster moves (Pylyshyn, 1989). One important property of FINST is that it does not encode any properties of the feature in question, but that it merely makes it possible to locate the feature in order to examine it further if needed. Thus, FINST can be considered a spatiotemporal index of feature clusters, which can be constructed preattentively, but there is a capacity limit to the number of FINSTs to be activated concurrently. 
Pylyshyn and Storm (1988) suggested that the capacity limit is four to five using a multiple object-tracking paradigm. 
Although these theoretical notions have various differences, they share important properties as episodic representations of the visual world. First, they are dynamic in the sense that they are updated as the world changes. Second, they can deal with multiple objects. Both object files and FINSTs assume that multiple episodic representations can be maintained concurrently. Third, they are dealing with the binding of featural and spatiotemporal information. With regard to this property, object files and FINSTs have some differences: object files are assumed to contain various featural information as it is available, whereas FINST simply enables us to refer to these features at an indexed location. However, both object files and FINSTs agree in that episodic representation is essential for the binding of feature and locational information. 
The three properties above (dynamic nature, multiplicity, and feature-location binding) are desirable for various visual cognition tasks. However, few empirical studies have investigated our ability to maintain and transform episodic representations by manipulating these properties. I will briefly review some of these studies, and then formulate the problems to be addressed in this work. 
Studies on Episodic Representations in Visual Cognition
In terms of the three desirable properties of episodic representations, multiple objects, dynamic nature, and feature-location binding, previous studies on episodic representations dealt with only some of these properties by fixing the others. Using a static but multidimensional display, Luck and Vogel (1997) showed that visual working memory can hold approximately four multidimensional objects. They used a change-detection paradigm devised by Phillips (1974), using multiple objects defined by multiple dimensions, such as color, orientation, and size. Participants’ change-detection performances were determined by the number of objects, not by the number of visual features. Luck and Vogel interpreted this finding as showing that a functional unit of visual working memory is the representation of perceptual objects where multidimensional information is integrated (but see Wheeler & Treisman, 2002 and Xu, 2002). This study dealt with a multiplicity of objects and feature-location binding by manipulating multidimensionality, but the dynamic nature of representations was not addressed. 
Pylyshyn and Storm (1988) devised a paradigm called multiple object tracking, and showed that people are able to mentally track four to five moving objects concurrently. Participants were presented with a set of 10 items (crosses or dots) randomly placed on the display, and asked to track some of them. Then the dots slowly moved in random directions for several seconds, followed by a test to discriminate whether a probed dot was one they were tracking or not. Participant performance was quite accurate when the number of dots to be tracked was under four or five, suggesting that the visual system can hold spatiotemporal information on four to five objects concurrently. This study investigated the dynamic nature and multiplicity of episodic representations, but feature-location binding was not directly examined because the objects had identical features; thus, feature-location binding was unnecessary to perform the task. 
As mentioned above, the experimental settings used in these studies did not explicitly manipulate multiplicity, dynamic nature, and feature-location binding simultaneously. Therefore, it is unclear whether the properties found in the previous studies apply to more complex situations with all three properties varying, or whether they are limited to some restricted situations. For example, Luck and Vogel’s (1997) finding on the capacity of visual working memory may be valid only in the static situation. Also, the number of objects concurrently tracked may be different if the objects are multidimensional. These questions are important for understanding the nature of episodic representations. If Luck and Vogel’s finding is applicable only to static situations, their theoretical claim that multiple object representations are formed and stored may be questioned. Rather, their findings may reflect the role of spatial locations in binding multiple features. 
To answer these questions, we need to set up a situation that satisfies multiplicity, dynamic nature, and feature-location binding simultaneously. Using such a situation, I investigated whether the findings of Luck and Vogel (1997) and Pylyshyn and Storm (1988) could be generalized. If the previous work reflects the maintenance and transformation of coherent objects in general situations, then we should expect a similar performance in a dynamic multidimensional situation. In contrast, if the previous findings cannot be generalized, or they are not mediated by coherent object representations, then the performance in the dynamic multidimensional situation should be substantially impaired. 
In the evaluation of participants’ performances in a dynamic multidimensional situation, it is important to eliminate the possibility that other factors affected the performance. To ensure this, I used a very simple stimulus set that eliminated spatial uncertainty and stimulus complexity. I eliminated spatial uncertainty by using a completely predictable moving pattern: regular rotation. In the multiple object-tracking paradigm, the stimuli have a certain amount of spatial uncertainty because of the unpredictable movement direction of each element. Thus, the difficulty in multiple object tracking can be attributable either to its dynamic nature or spatial uncertainty. 
To eliminate the possible effect of stimulus complexity, the number of to-be-remembered objects is equal to the number of presented objects, as in Luck and Vogel (1997). In the multiple object-tracking task, the number of presented objects is much higher than the number of to-be-tracked objects. Although this is suitable for the purpose of showing that participants’ performances were better than our expectations, it is a major problem if one tries to show that participants’ performances are lower than some theories predict, because the performance impairment can be attributed to the extra stimulus complexity. Finally, to make the stimuli simple in terms of multidimensionality, I used minimally multidimensional stimuli. Participants were required to store the combination of objects’ colors and spatiotemporal locations. One should note that this minimally multidimensional situation is sufficient to investigate feature-location bindings. 
Finally, and most importantly, we need to make sure that participants could make motion correspondences of the stimulus sequence, because feature-location binding presupposes successful motion correspondences. It is known that people can successfully track four rotating objects up to about 360°/s even though motion correspondences are ambiguous (Verstraten, Cavanagh, & Labianca, 2000), and the stimulus sequence I used here rotated much more slowly than that. Still, it is possible that the observed difficulty is due to failure in motion correspondences. Thus, I used various manipulations to improve motion correspondences and investigated how much these manipulations improved task performance compared with ambiguous conditions. 
General Method
To examine the maintenance and transformation of episodic representations of multiple objects, I created an irregularity detection task (see Figure 1, Movie 1, and Movie 2). Participants were shown sequences of 10 frames depicting a triangular pattern of three colored disks rotating by a certain angle per frame. The sequence was one of three types: a regular clockwise or counterclockwise rotation throughout, a rotation containing one frame in which the locations of two colors were switched (color-switch), or a rotation containing one frame in which an old color was replaced with a new one (color-replacement). Participants were required to judge whether a sequence was regular or irregular without identifying its type (color-switch or color-replacement). Notice that detection of a color-switch needs memory for the conjunction of each disk’s color and spatiotemporal location, whereas detection of a color-replacement does not. Thus, the performance for the color-switch condition is the critical measure of memory for the binding of color and spatiotemporal location in this paradigm. Note that, as shown in Figure 1, when a change occurs, the postchange frames go back to normal. The irregularity was confined to a single frame to prevent participants from using a strategy of memorizing the color order (most likely verbally) for the first few frames and comparing it with later frames. 
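The three sequence types described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the stimulus code used in the study: the function names, the slot-based representation (colors indexed by position in a pattern-centered frame, so that regular rotation leaves the assignment unchanged), and the default `change_frame` are all assumptions introduced here. The sketch also assumes three disks drawn from four colors, so that exactly one unused color is available for a replacement.

```python
import random

COLORS = ["red", "green", "blue", "yellow"]  # palette reported for Experiment 1


def make_sequence(kind, n_frames=10, n_disks=3, change_frame=4, rng=None):
    """Return a list of frames; frames[t][i] is the color at slot i in frame t.

    Slots are positions in a pattern-centered reference frame (an assumption
    of this sketch), so a regular rotation keeps every frame identical.
    kind is "regular", "switch", or "replacement"; change_frame is 0-based.
    """
    rng = rng or random.Random()
    base = rng.sample(COLORS, n_disks)
    frames = [list(base) for _ in range(n_frames)]
    if kind == "switch":
        # Two disks exchange colors in one frame only.
        i, j = rng.sample(range(n_disks), 2)
        frames[change_frame][i], frames[change_frame][j] = base[j], base[i]
    elif kind == "replacement":
        # One disk takes a color that is not otherwise in the pattern.
        i = rng.randrange(n_disks)
        unused = [c for c in COLORS if c not in base]
        frames[change_frame][i] = rng.choice(unused)
    elif kind != "regular":
        raise ValueError(f"unknown sequence type: {kind}")
    return frames


def same_color_set(frames):
    """True if every frame contains the same set of colors as the first."""
    return all(sorted(f) == sorted(frames[0]) for f in frames)
```

In this representation a color-switch sequence keeps the same set of colors in every frame (`same_color_set` returns `True`), whereas a color-replacement sequence does not. This mirrors the point that only color-switch detection requires memory for which color belongs to which location.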
Figure 1
 
Schematic illustration of the irregularity detection task. In this example, rotation direction is clockwise, and irregularity occurs in the second frame.
 
Movie 1
 
Demonstration of the irregularity detection task. The movie contains a no-change, a color-switch, and a color-replacement sequence. This is an example of an ambiguous 60-motion condition serving as the baseline.
 
Movie 2
 
Demonstration of the stimulus used in Experiment 2. The movie contains a color-switch sequence. This example is a moving 360-ms condition. The actual stimulus was smoother and faster than this example.
Procedure
At the beginning of the experiment, participants were given instructions with a diagram similar to Figure 1. The differences among the three sequence types and the response mappings were fully explained using the diagram. Participants then completed a block of six to eight practice trials to familiarize themselves with the procedure. Experimental trials were made up of blocks of 24 to 30 trials, and participants could take a rest between blocks. Throughout this study, the experimental conditions were randomly mixed from trial to trial. 
Data Analysis
The main dependent variables were hit rates for the color-replacement and color-switch trials. Because false alarm rates were extremely low throughout the study, and usually not different across experimental conditions, the results of statistical analysis involving false alarms are not reported. As a supplementary measure, however, d’ related to color-switch detection was estimated. Although a false alarm can be related to either a color-switch or a color-replacement detection, d’ is estimated by using the false alarm rates under the assumption that all false alarms are related to color-switch detection. Thus, the estimated d’ is somewhat underestimated, but given the extremely high hit rates for the color-replacement conditions throughout the series of experiments, the assumption is justifiable. 
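As an illustration of the supplementary measure, the following sketch computes d’ from a color-switch hit rate and a false alarm rate using the standard signal detection formula d’ = z(H) − z(F). The function name and the clipping of rates of exactly 0 or 1 (to 1/(2N) and 1 − 1/(2N)) are assumptions of this sketch; the text does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, n_signal, n_noise):
    """Estimate d' = z(H) - z(F).

    Rates of exactly 0 or 1 are clipped to 1/(2N) and 1 - 1/(2N) so that the
    inverse normal transform stays finite (a common convention, assumed here).
    """
    def clip(p, n):
        return min(max(p, 0.5 / n), 1.0 - 0.5 / n)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(clip(hit_rate, n_signal)) - z(clip(fa_rate, n_noise))
```

Because every false alarm is attributed to color-switch detection, the false alarm rate entered here is an upper bound on the switch-related false alarm rate, which is why the resulting d’ is, if anything, an underestimate.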
Part I: Factors Not Contributing to the Maintenance of Multiple Object Representations
A pilot experiment with the equilateral triangle pattern rotating 60° per frame showed that color-switch detection was difficult, while color-replacement detection was extremely easy. However, this particular stimulus setting has an obvious problem: motion perception in this setting is inherently ambiguous (see Movie 1). Thus, the difficulty in color-switch detection may not be due to color-location binding, but simply to failure in tracking pattern rotation. Part I used stimuli without such ambiguity in pattern motion, and investigated whether the difficulty in color-switch detection could be overcome simply by making pattern motion unambiguous. Experiments 1 and 2 disambiguated the motion correspondence by using bilateral triangles, and smooth and continuous motion, respectively. The ambiguous condition was used as a baseline to evaluate the effect of disambiguation of motion correspondences. 
Experiment 1: Disambiguating the Direction of Motion by Bilateral Triangles
Method
All stimuli were three colored disks with 1.6° visual angle diameters. Each disk was placed at a 2.0° visual angle from the central fixation. Four colors (red, green, blue, and yellow) were used, and the combinations of displayed colors were counterbalanced across trials. A frame with a violation of the regular rotation was inserted at the fourth to seventh frames equally often, and the disks whose colors were switched or replaced were unpredictable to the participants. The temporal schedule of the stimulus sequence was 360-ms frame duration and 520-ms stimulus onset asynchrony (SOA). In other words, each frame was presented for 360 ms followed by a 160-ms blank period. Participants were asked to fixate the central dot and to try to attend to the whole pattern throughout a trial. They judged whether a sequence contained any irregularity, and received no feedback on the correctness of their responses. There was no time pressure to make a response. There were 24 color-switch, 24 no-change, and 12 color-replacement trials for each condition of the main independent variables described below. The experiment included 180 trials. Participants were eight Kyoto University graduate and undergraduate students who had normal or corrected-to-normal vision. 
Experiment 1 investigated the effect of pattern configuration on irregularity detection. There were three pattern configuration conditions: equilateral triangle, acute isosceles triangle, and obtuse isosceles triangle (Figure 2a). The stimuli in the equilateral condition were identical to those in the pilot experiment (Figure 1), and served as a baseline. In the acute and obtuse isosceles conditions, the vertical angles of the triangular pattern were 30° and 90°, respectively. Because of the pattern configuration, correspondences across frames were much easier to make for sequences in the acute and obtuse isosceles conditions. 
Figure 2
 
a. Illustration of conditions in Experiment 1. b. Mean hit and false alarm rates in Experiment 1.
Results and discussion
An alpha level of .05 was used as the criterion for all statistical tests in this article. The change to the pattern configuration did not significantly improve the performance. Mean hit rates for color-replacement and color-switch trials and the mean false alarm rate are shown in Figure 2b. A 2 (irregularity type: replacement or switch) × 3 (pattern configuration) analysis of variance (ANOVA) showed a significant main effect of the irregularity type, F(1,7)=30.375, but the main effect of pattern configuration and the interaction were not significant, F(2,14)=3.00 and F(2,14)=0.25, respectively. Throughout this study, color-replacement detection was highly accurate, and not different across experimental conditions. Thus, I will not report the statistical analyses involving the color-replacement hit rates in greater detail. As for the effect of pattern configuration on the color-switch hit rate, a single-factor ANOVA showed no significant main effect, F(2,14)=1.30. Planned comparisons of the bilateral triangle conditions (acute and obtuse) with the equilateral condition also showed no significant difference, F(1,7)=1.06 and F(1,7)=0.27 for the acute and obtuse conditions, respectively. Analyses with d’ showed the same pattern of results. For the remainder of this work, the results with d’ will be reported only when they differ from those with hit rates. 
Overall, the difficulty in color-switch detection was unlikely to be solely due to tracking failure via the homogeneity of the stimulus configuration. If the use of a bilateral triangle eliminates tracking failure, the estimated improvement by elimination of tracking failure is around 10%, and the color-switch detection performance is still quite poor. 
Experiment 2: Local Motion Signals and Elimination of Abrupt Onsets and Offsets
Experiment 1 used a sequence of static images with blank periods. Thus, the lack of a local motion signal may make the task difficult. Clear configurational cues and spatiotemporal predictability alone may be insufficient to transform episodic representations or to track objects successfully. Some bottom-up sensory information consistent with this prediction may be necessary. Local motion signals can qualify as such information. Also, abrupt onset and offset of patterns may have disrupted the spatiotemporal integration of episodic representations or tracking of objects. Previous research suggests that abrupt onset and offset creates and discards an object representation, respectively (Yantis & Hillstrom, 1994; Yantis & Jonides, 1996). Recently, Scholl and Pylyshyn (1999) showed that multiple object-tracking performance was not impaired with the occlusion of objects, whereas it was impaired when the objects disappeared for the same amount of time between the abrupt onset and offset. In this experiment, I used a smoothly rotating pattern with an occluder, which makes the pattern visible and occluded alternately (Figure 3a). 
Figure 3
 
a. Illustration of conditions in Experiment 2. b. Mean hit and false alarm rates in Experiment 2.
Method
The triangular pattern was occluded by a gray figure. There were two independent variables in this experiment: pattern motion (moving and stationary) and visible duration. In the moving condition, the pattern smoothly moved with a velocity of 125°/s, by showing each colored pattern for 40 ms with a 5° interframe angular displacement (see Movie 2). In the attentive tracking literature with rotational motion stimuli, it has been shown that people can track four objects up to a speed of 360°/s (Verstraten et al., 2000). Therefore, the rotation speed used in this experiment appears to be well within the trackable range. In the stationary condition, the pattern was stationary at the middle position of the visible phase for the same exposure duration as the corresponding moving condition. The stationary condition had ambiguity in object correspondence, and served as a baseline as the equilateral condition did in Experiment 1. There were two visible durations: the 280-ms condition had a visible phase of 280 ms and an occluded phase of 200 ms, and the 360-ms condition had a visible phase of 360 ms and an occluded phase of 120 ms. Visible duration was manipulated by the shape of the occluder. The 360-ms and 280-ms conditions used occluders with openings of 20° and 10°, respectively (see Figure 3a). There were 144 experimental trials with 36 trials for each condition, composed of 12 color-switch, 12 color-replacement, and 12 no-change trials. Experiment 2 was written in MATLAB using Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). Participants were 10 Kyoto University undergraduate and graduate students who had normal or corrected-to-normal vision. 
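The timing parameters of the moving condition can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the per-cycle figure assumes that the pattern keeps rotating at the same rate while it is behind the occluder, which the text implies but does not state explicitly.

```python
def angular_velocity(step_deg, frame_ms):
    """Angular velocity in deg/s for a given per-frame step and frame duration."""
    return step_deg * 1000.0 / frame_ms


def rotation_per_cycle(step_deg, frame_ms, visible_ms, occluded_ms):
    """Rotation accumulated over one visible-plus-occluded cycle, in degrees.

    Assumes constant rotation during the occluded phase (an assumption of
    this sketch).
    """
    cycle_ms = visible_ms + occluded_ms
    return angular_velocity(step_deg, frame_ms) * cycle_ms / 1000.0


print(angular_velocity(5, 40))              # 125.0 deg/s
print(rotation_per_cycle(5, 40, 280, 200))  # 60.0 deg per 480-ms cycle
print(rotation_per_cycle(5, 40, 360, 120))  # 60.0 deg per 480-ms cycle
```

Under that assumption, both visible-duration conditions share a 480-ms cycle and advance the pattern by 60° per cycle, the same per-frame rotation as in the equilateral baseline of Experiment 1.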
Results and discussion
In general, neither pattern motion nor visible duration had significant effects on color-switch detection. Mean hit rates for color-replacement and color-switch trials, and the mean false alarm rate are shown in Figure 3b, and a 2 (pattern motion) × 2 (visible duration) repeated measures ANOVA with switch hit rates showed no significant main effects or interaction, F(1,9)=0.015, F(1,9)=2.09, and F(1,9)=0.57, for motion, duration, and their interaction, respectively. Again, planned comparison of the moving condition with the baseline stationary condition for each duration condition revealed no significant difference, F(1,9)=0.2 and F(1,9)=0.13 for 280-ms and 360-ms conditions, respectively. 
A local motion signal and the lack of abrupt onset and offset alone cannot eliminate the difficulty in color-switch detection: neither manipulation had any particular effect on irregularity detection. One should note that the lack of an effect of abrupt onset is inconsistent with Scholl and Pylyshyn (1999), suggesting that irregularity detection in this study and multiple object tracking may be tapping different aspects of episodic representations. It may be, as shown by Scholl and Pylyshyn, that the lack of abrupt onset and offset improves the trackability of objects, but that trackability itself is insufficient for successful color-switch detection. 
Summary of Part I
Color-switch detection in a dynamic multidimensional situation with multiple objects is in general difficult. Overall, the hit rates for switch detection did not show any significant improvement in the unambiguous motion conditions over the ambiguous motion conditions. First of all, the difficulty in color-switch detection is not due to a problem in perceiving the colors of moving objects, because color-replacement detection is almost perfect. Also, it is unlikely to be due to failure in tracking the objects’ rotation, because in Experiments 1 and 2, with apparently unambiguous rotational motion, the color-switch detection performance showed only a small, nonsignificant improvement. Disambiguation of objects’ motion by pattern configuration, smooth and continuous motion, and elimination of abrupt onset and offset is insufficient for successful color-switch detection. Although these results suggest that maintenance and transformation of color-location bindings is difficult, even if the tracking of objects’ locations is successful, one could still argue that the problems observed in Experiments 1 and 2 are due to tracking failure. Experiment 7 in Part III addresses this issue and provides more direct evidence against the tracking failure hypothesis. Before that, Part II reports factors facilitating color-switch detection. 
Part II: Factors Contributing to the Maintenance of Multiple Object Representations
Experiments 3–6 investigated the effect of rotation angle on the irregularity detection performance. Experiment 3 examined the effect of the interframe rotation angle. Experiment 4 examined whether the effect obtained in Experiment 3 was due to the amount of spatial displacement or the amount of angular displacement by enlarging the distances between the disks of a pattern. Experiment 5 further examined whether the effect obtained in the previous experiments was mediated by angular velocity or angular disparity by manipulating frame duration. Finally, Experiment 6 examined whether the spatiotemporal predictability of locations is necessary for the facilitatory effects of reduced rotation angle. 
Experiment 3: Effect of Interframe Rotation Angle
Method
The independent variable of Experiment 3 was interframe rotation angle: 60°, 45°, and 30° conditions (Figure 4a). Each condition had 24 color-switch, 24 no-change, and 12 color-replacement trials, and the total number of trials was 180. Participants were seven Nagoya University graduate and undergraduate students who had normal or corrected-to-normal vision. 
Results and discussion
Substantial improvement in detection performance was observed when the interframe rotation angle was reduced. Mean hit rates for color-replacement and color-switch trials and mean false alarm rate are shown in Figure 4b, and mean d’s are shown in Table 1. The color-switch hit rates increased monotonically as the interframe rotation angle decreased. A one-way ANOVA showed a significant effect of rotation angle, F(2,12)=9.93. Planned comparisons of the 45° and 30° conditions with the 60° condition (baseline) showed significant improvement, F(1,6)=6.62 and F(1,6)=16.04, respectively. 
Figure 4
 
a. Illustration of conditions in Experiment 3. b. Mean hit and false alarm rates in Experiment 3.
Table 1
 
Mean d′ values for each condition of Experiments 1–7
Experiment      Condition          d′
Experiment 1    Acute              2.21
                Equilateral*       1.76
                Obtuse             1.97
Experiment 2    Stationary/240*    1.29
                Stationary/360*    1.46
                Moving/240         1.56
                Moving/360         1.80
Experiment 3    60°*               1.74
                45°                2.72
                30°                3.10
Experiment 4    60°/small*         1.85
                30°/small          3.09
                30°/large          3.27
Experiment 5    360-ms             3.13
                240-ms             2.95
                80-ms              3.10
Experiment 6    30°/30°            2.08
                30°/60°            2.07
                60°/30°            1.77
                60°/60°*           1.73
Experiment 7    On-target/90°      2.74
                On-target/60°      2.71
                On-target/30°      3.54
                On-target/90°      1.85
                On-target/60°      2.32
                On-target/30°      2.55
 

Conditions with * are inherently ambiguous in motion correspondences and serve as baseline conditions.

Experiment 4: Rotation Angle or Spatial Displacement?
Method
In Experiment 4, a new condition with a pattern whose disks were located 4.0° from the central fixation (30°/large condition) was introduced (Figure 5a), in addition to the 30° and 60° conditions with smaller patterns (30°/small and 60°/small conditions, respectively). The rotation angle of the 30°/large condition matched that of the 30°/small condition, whereas its amount of spatial displacement was comparable to that of the 60°/small condition. Thus, if the rotation angle determines the irregularity detection performance, the performance in the 30°/large condition should be similar to that in the 30°/small condition; if spatial displacement determines it, the performance should instead resemble that in the 60°/small condition. The number of trials was the same as in Experiment 3. Participants were seven Nagoya University graduate and undergraduate students who had normal or corrected-to-normal vision. 
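The logic of this manipulation can be made concrete by computing how far each disk travels between frames. The sketch below is a worked check rather than part of the original method: the function names are hypothetical, the 2.0° eccentricity for the small patterns is taken from the Experiment 1 method, and the text does not say whether displacement was measured along the straight line (chord) or along the circular path (arc), so both are shown.

```python
import math


def chord_displacement(eccentricity_deg, rotation_deg):
    """Straight-line distance a disk moves between frames (deg of visual angle)."""
    return 2.0 * eccentricity_deg * math.sin(math.radians(rotation_deg) / 2.0)


def arc_displacement(eccentricity_deg, rotation_deg):
    """Distance along the circular path between frames (deg of visual angle)."""
    return eccentricity_deg * math.radians(rotation_deg)


conditions = {"30/small": (2.0, 30), "60/small": (2.0, 60), "30/large": (4.0, 30)}
for name, (ecc, rot) in conditions.items():
    print(name,
          round(chord_displacement(ecc, rot), 2),
          round(arc_displacement(ecc, rot), 2))
```

By either measure, the 30°/large condition displaces each disk by about 2.1° of visual angle per frame, roughly twice the displacement in the 30°/small condition (about 1.0°) and close to that in the 60°/small condition (about 2.0 to 2.1°).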
Figure 5
 
a. Illustration of conditions in Experiment 4. b. Mean hit and false alarm rates in Experiment 4.
Results and discussion
The effect of rotation angle is not due to the reduction of spatial displacement in the smaller rotation-angle conditions because the enlargement of the triangular pattern did not affect the detection performance at all. Mean hit rates for color-replacement and color-switch trials and mean false alarm rate are shown in Figure 5b, and mean d’s are shown in Table 1. The color-switch hit rates were higher both in the 30°/large and 30°/small conditions than in the 60°/small condition. A one-way ANOVA showed a significant effect of rotation angle, F(2,12)=16.73. Planned comparisons of the 30°/large and 30°/small conditions with the 60°/small condition (baseline) showed a significant improvement, F(1,6)=22.48 and F(1,6)=18.42, respectively. 
Experiment 5: Rotation Angle or Rotation Velocity?
Method
The interframe rotation angle and duration of blank period were fixed to 30° and 160 ms, respectively, and the exposure duration of each frame was varied (Figure 6a). The 360-ms condition was identical to the 30° condition in Experiment 3. The 240-ms condition had an angular velocity of 75°/s, which was close to that for the 45° condition in Experiment 3. The 80-ms condition had an angular velocity of 125°/s, corresponding to the 60° condition in Experiment 3. If the angular velocity determines the irregularity detection performance, the temporal schedule should have a similar effect to that found in Experiment 3, whereas no significant effect would be expected if the angular disparity was important. The number of trials was the same as in Experiment 3. Participants were seven Kyoto University graduate and undergraduate students who had normal or corrected-to-normal vision. 
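The angular velocities quoted above follow from dividing the interframe rotation angle by the frame period (exposure plus blank). A minimal sketch of that arithmetic is given below; the function name is hypothetical, and defining velocity over the full frame period is an assumption that reproduces the 75°/s and 125°/s figures in the text.

```python
def angular_velocity(rotation_deg: float, exposure_ms: float, blank_ms: float) -> float:
    """Mean angular velocity (deg/s), assuming one rotation step per frame period,
    where the period is the exposure duration plus the blank duration."""
    period_s = (exposure_ms + blank_ms) / 1000.0
    return rotation_deg / period_s


ROTATION_DEG = 30  # fixed interframe rotation angle in Experiment 5
BLANK_MS = 160     # fixed blank period in Experiment 5

for exposure_ms in (360, 240, 80):
    velocity = angular_velocity(ROTATION_DEG, exposure_ms, BLANK_MS)
    print(f"{exposure_ms}-ms exposure: {velocity:.1f} deg/s")
```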
Figure 6
 
a. Illustration of conditions in Experiment 5. b. Mean hit and false alarm rates in Experiment 5.
Results and discussion
Experiment 5 shows that the effect cannot be attributed to the level of angular velocity. Mean hit rates for color-replacement and color-switch trials and the mean false alarm rate are shown in Figure 6b, and mean d’s are shown in Table 1. The color-switch hit rates for all three conditions were high, and there was no clear difference among the conditions. A one-way ANOVA showed no significant effect of exposure duration, F(2,12)=0.15. Further analysis comparing data from Experiments 3 and 5 with a 2 (experiment) × 3 (angular velocity) ANOVA revealed a significant main effect of angular velocity, F(2,24)=5.78, and the interaction of experiment and angular velocity, F(2,12)=5.33. Inconsistent with the hypothesis that angular velocity determines color-switch detection, the effect of angular velocity was significantly smaller in Experiment 5 than in Experiment 3. Even with the same angular velocity, the 60° condition in Experiment 3 had a significantly lower color-switch hit rate than the 80-ms condition in Experiment 5, F(1,12)=7.56. Within the range that this experiment manipulated, the irregularity detection performance did not depend on angular velocity, but rather on the angular disparity between frames. 
Experiment 6: Necessity of Spatiotemporal Predictability
Experiments 3–5 have shown that smaller angular displacement between frames dramatically facilitates irregularity detection performance. When the angular disparity was 45° or smaller, the color-switch hit rates were significantly better than the baseline condition. Experiment 6 examined whether this facilitatory effect was mediated by predictability of future locations. Although Experiments 1 and 2 revealed that up to an interframe rotation angle of 60° complete predictability of future locations was insufficient to maintain the episodic representations of multiple objects, locational predictability may be necessary to enable participants to detect irregularity in smaller angular displacement conditions. In Experiment 6, interframe angular displacement was varied randomly between 30° and 60° within each trial, so participants could not predict the location of objects in the next frame. If locational predictability is necessary for irregularity detection, this manipulation should significantly disrupt performance. In contrast, if the irregularity detection with 30° rotation conditions was mediated by local bottom-up processing, performance should not be impaired. 
Figure 7
 
a. Illustration of conditions in Experiment 6. b. Mean hit and false alarm rates in Experiment 6.
Method
Unlike previous experiments, the interframe rotation angle was not fixed within each trial in this experiment. Each interframe rotation within a single trial was either 30° or 60°. The order of rotation angles was random except for the two critical intervals before and after the irregular event, where the combinations of rotation angles were defined as independent variables. Thus, the independent variables were the rotation angle in the interval before the irregular event (the “Before” angle; 60° or 30°) and the rotation angle in the interval after the irregular event (the “After” angle; 60° or 30°) (Figure 7a). In no-change trials, the critical intervals were set so that the temporal positions of these periods were matched to the color-replacement and color-switch conditions. As in the previous experiments, irregular events occurred between the fourth and seventh frames; thus, the earliest critical interval was set at the third interval (between the third and fourth frames) and the latest critical interval was set at the seventh interval (between the seventh and eighth frames). In Experiment 6, the exposure duration of each frame was 160 ms and the blank period was 360 ms. Each condition had 24 color-switch, 24 no-change, and 12 color-replacement trials. Thus, the total number of experimental trials was 240. Participants were seven Kyoto University undergraduate students who had normal or corrected-to-normal vision. 
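The construction of a rotation sequence described above can be sketched as follows. This is an illustrative reconstruction, not the original experimental code: the function name, the use of Python's `random` module, and the number of frames per trial (10 in the usage example) are assumptions. Interval i is taken to lie between frame i and frame i+1, so an irregular event at frame k has interval k-1 as its Before interval and interval k as its After interval.

```python
import random

ANGLES_DEG = (30, 60)  # the two possible interframe rotation angles


def make_rotation_sequence(n_frames, event_frame, before_deg, after_deg, rng):
    """Return the interframe rotation angles (deg) for one trial.

    Frames and intervals are numbered from 1; the irregular event occurs at
    event_frame (between the fourth and seventh frames, per the text).
    """
    if not 4 <= event_frame <= 7:
        raise ValueError("event_frame must be between 4 and 7")
    if n_frames < event_frame + 1:
        raise ValueError("trial too short to contain the After interval")
    # Non-critical intervals are drawn at random.
    angles = [rng.choice(ANGLES_DEG) for _ in range(n_frames - 1)]
    # Critical intervals are fixed by the experimental condition.
    angles[event_frame - 2] = before_deg  # interval event_frame - 1 (Before)
    angles[event_frame - 1] = after_deg   # interval event_frame (After)
    return angles


# Trial bookkeeping from the text: 2 Before angles x 2 After angles.
TRIALS_PER_CONDITION = {"color_switch": 24, "no_change": 24, "color_replacement": 12}
total_trials = len(ANGLES_DEG) ** 2 * sum(TRIALS_PER_CONDITION.values())

rng = random.Random(0)  # seeded only to make the example repeatable
sequence = make_rotation_sequence(
    n_frames=10, event_frame=4, before_deg=30, after_deg=60, rng=rng
)
print(sequence, total_trials)
```

With the event at the fourth frame, the third and fourth intervals carry the condition's angles, matching the earliest critical interval described in the text.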
Results and discussion
Overall, Experiment 6 showed that the locational predictability was necessary for the successful detection of irregularity. Mean hit rates for color-replacement and color-switch trials and the mean false alarm rate are shown in Figure 7b, and mean d’s are shown in Table 1. The color-switch hit rates showed no clear differences among the conditions. A 2 (Before angle) × 2 (After angle) ANOVA showed no significant main effects, F(1,6)=2.58 and F(1,6)=0.09, for the Before and After angles, respectively, and no interaction, F(1,6)=0.74. Although inspection of Figure 7b suggests that the Before angle has a weak effect such that 30° rotation tends to be easier than 60° rotation, the effect was not significant and apparently much smaller than those in Experiments 3 and 4. To further evaluate the effect of spatiotemporal predictability, 30°/30° and 60°/60° conditions in this experiment were compared with the corresponding conditions (30° and 60° conditions) in Experiment 3. A 2 (experiment) × 2 (rotation angle) ANOVA revealed a significant main effect of rotation angle, F(1,12)=24.22, and a significant interaction of experiment and rotation angle, F(1,12)=7.93. The significant interaction shows that the effect of rotation angle was significantly reduced in Experiment 6, compared with that in Experiment 3. Planned comparisons by linear contrast tests revealed that the effect of rotation angle was significant for Experiment 3, F(1,12)=29.93, but not significant for Experiment 6, F(1,12)=2.22. When the future locations of objects were uncertain, irregularity detection performance was significantly impaired even when the angular difference was 30°, suggesting that locational predictability is a necessary condition for successful color-switch detection. 
One may argue that the change in temporal schedule, particularly a reduction in the exposure duration, disrupted color-switch detection. However, it is highly unlikely that this is the sole reason for the impairment in this experiment, because in a previous experiment with predictable rotation (Experiment 5), the color-switch detection performance was not sensitive to the temporal schedule. 
Summary of Part II
Unlike Part I, a reduction of the interframe rotation angle substantially improved the color-switch detection performance. Within 45° interframe rotation, the hit rate for color switch was significantly better than the ambiguous baseline condition of 60° interframe rotation. Experiment 4 with larger patterns showed that this facilitatory effect was due to interframe rotation angle, not due to interframe spatial displacement. Experiment 5 with different exposure durations suggested that rotation angle, not angular velocity, determined the performance. Experiment 6 used a spatiotemporally unpredictable rotation sequence, and showed that spatiotemporal predictability is a necessary condition for improvement with smaller rotation angles. 
Part III: Evidence for Color-Switch Detection Difficulty Independent of Tracking Failure
Although I have tried to establish evidence against the tracking failure account for the difficulty in color-switch detection in Experiments 1 and 2, one may still argue for the tracking failure account by assuming that the use of smooth motion and nonequilateral triangular configurations did not actually help participants correctly track the objects’ rotation. Thus, we need more direct evidence that participants had difficulty in color-switch detection even when the tracking of objects was successful. In Experiment 7, smooth and continuous motion was again used to help participants’ tracking. A dual task setting was used to obtain color-switch detection performance conditionalized to the successful tracking of a cued object. Participants had to judge both the location of the tracked object and the presence of a color switch for each trial. If the difficulty in the color-switch detection reflected tracking failure, then color-switch detection should be much more accurate when the tracking judgment is correct than when it is incorrect. In contrast, if the difficulty in the color-switch detection was not due to tracking failure, then color-switch detection should still be difficult even when the tracking is successful. Another modification in Experiment 7 was the elimination of the color-replacement condition, because the presence of color replacement may have led some participants to ignore color-switch detection. 
Experiment 7: A Dual Task of Color-Switch Detection and Tracking
Method
Materials were similar to those in the moving 360-ms condition of Experiment 2, except for the following changes. First, the number of objects was four in this experiment, to make the tracking task more challenging. Second, to replicate the effect of rotation angle in Experiments 3–5, three rotation angle conditions were used. Third, the rotation direction was randomly varied across trials within each participant, to make the tracking task more challenging. Fourth, the color-replacement detection condition was eliminated. 
Participants were asked to track a precued target object and to judge the presence of a color switch simultaneously. A schematic illustration of the procedure is shown in Figure 8a. At the beginning of each trial, a beep was followed by a stimulus display that had four gray disks and an occluder. A randomly chosen tracking target was precued by flashing three times. Then 300 ms later, four disks changed colors from gray to four different colors and began rotating. The direction of pattern rotation was randomized across trials. The pattern smoothly and continuously rotated in the same way as in Experiment 2. The visible and occluded durations were both 360 ms, and the rotation speed of the pattern was manipulated by the relative motion of the pattern and the occluder. In the 90°/period condition, where the rotation speed was matched to the 60° condition in Experiment 3, the pattern alone rotated, and the occluder was stationary as in Experiment 2. However, in the 60°/period and 30°/period conditions, the rotation speeds of the patterns were two thirds and one third of the 90°/period condition, respectively, and unlike Experiment 2, the occluder rotated in the opposite direction with velocities to match the visible and occlusion durations across the three conditions. Therefore, whereas the rotation speed of the pattern was different across conditions, the visible and occluded durations were equal across conditions. The color switch occurred between a pair that contained the tracking target (on-target trials), or between a pair that did not contain the target (off-target trials). The numbers of on-target and off-target trials were equal. At the last visible period, the four disks appeared in gray, and stopped at the middle of the opening of the occluder. To prevent participants from predicting the stopping location of the tracking target by timing, the last visible period varied randomly between the 9th and 11th periods. 
Participants always judged the color switch first by pressing the 1 or 3 key as before. After they had made a color-switch judgment and the pattern rotation had stopped, an arrow cursor appeared at the middle of the occluder, and they were asked to click the location of the tracking target. There were six experimental conditions composed of two factors: rotation angle (90°, 60°, and 30°) and color-switch location (on-target and off-target). Each condition had 24 color-switch and 24 no-change trials, half of which had clockwise rotation. The total number of experimental trials was 288. Participants were instructed to try to be accurate for both tasks, and not to sacrifice one task for the other. No feedback on correctness was given for either task. Participants were eight Kyoto University graduate students who had normal or corrected-to-normal vision. 
Figure 8
 
a. Illustration of conditions in Experiment 7. b. Mean hit and false alarm rates in Experiment 7.
Results and discussion
Overall, Experiment 7 showed that the difficulty in color-switch detection is not due to tracking failure. First, not surprisingly, the tracking performance was highly accurate (M=0.962), and not affected by rotation angle. A 3 (rotation angle) × 2 (color switch, present or absent) × 2 (color-switch location, on- or off-target) ANOVA showed no significant main effects or interactions. This accurate target tracking is consistent with findings of attentive tracking (Verstraten et al., 2000) and suggests that within this range of rotation angle, attentive tracking performance is not disrupted by frequent occlusions. 
The performance of the color-switch detection task was analyzed on the condition of the successful tracking. The conditionalized hit rates and false alarm rates are shown in Figure 8b, and d’s are shown in Table 1. Because of the extremely high accuracy in the tracking task, conditionalized hit rates and false alarms were virtually identical to the unconditionalized counterparts, and the statistical analyses showed the same pattern of results. Thus, I report only the conditionalized data analyses. The conditionalized hit rates were analyzed by a 3 (rotation angle) × 2 (color-switch location: on- or off-target) ANOVA, showing significant main effects of rotation angle, F(2,14)=29.46, and color-switch location, F(1,7)=6.59. The interaction was not significant, F(1,7)=1.91. Unlike the tracking performance, the conditionalized hit rate decreased as the rotation angle increased, which is consistent with the results of Experiments 3–5. Moreover, the switch detection was more accurate when a switching item was on the tracking target than when it was off. This result may indicate that focused attention facilitates color-switch detection. More important, there was a significant effect of rotation angle even when the switching items contain the tracking target, F(2,14)=6.28, suggesting that successful tracking is not sufficient for color-switch detection. 
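The conditionalization described above amounts to computing hit and false alarm rates only over trials on which the tracking judgment was correct. The sketch below illustrates this under stated assumptions: the trial record fields and function names are hypothetical, the toy trials are invented for the example and are not data from this study, and d′ is computed with the standard equal-variance formula d′ = z(H) − z(FA), which may differ in detail from the procedure used for Table 1.

```python
from statistics import NormalDist


def conditional_rate(trials, switch_present):
    """Proportion of 'switch' responses among trials with correct tracking
    and the given switch status (hit rate if True, false alarm rate if False)."""
    kept = [
        t for t in trials
        if t["tracking_correct"] and t["switch_present"] == switch_present
    ]
    if not kept:
        raise ValueError("no trials satisfy the tracking condition")
    return sum(t["said_switch"] for t in kept) / len(kept)


def d_prime(hit_rate, fa_rate):
    """Equal-variance sensitivity index; rates of exactly 0 or 1 would need a
    correction first (inv_cdf is undefined there), which is not shown here."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Toy trials for illustration only (NOT data from the study).
toy_trials = [
    {"tracking_correct": True, "switch_present": True, "said_switch": True},
    {"tracking_correct": True, "switch_present": True, "said_switch": False},
    {"tracking_correct": False, "switch_present": True, "said_switch": True},
    {"tracking_correct": True, "switch_present": False, "said_switch": False},
    {"tracking_correct": True, "switch_present": False, "said_switch": True},
    {"tracking_correct": True, "switch_present": False, "said_switch": False},
    {"tracking_correct": True, "switch_present": False, "said_switch": False},
]

hit = conditional_rate(toy_trials, switch_present=True)
fa = conditional_rate(toy_trials, switch_present=False)
print(f"hit={hit:.2f}, fa={fa:.2f}, d'={d_prime(hit, fa):.2f}")
```

The trial with incorrect tracking is excluded before either rate is computed, so a tracking failure cannot contribute a hit or a miss to the conditionalized analysis.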
The results of Experiment 7 suggest that results in Experiments 1 and 2 mainly reflect the difficulty in maintaining color-location bindings in the 60° rotation conditions, not a failure in tracking. Indeed, the color-switch performance in the 90° rotation condition of this experiment was comparable to that in the corresponding condition in Experiment 2. Because Experiment 2 used even fewer objects, it is highly unlikely that the difficulty in Experiment 2 was due to tracking failure induced by frequent occlusions. 
One might argue that the extremely accurate tracking performance is somewhat deceptive, because the task asks participants to track only a single object. It might be the case that participants could track only the cued object, while the motion correspondence of the other three objects was completely lost. Although the data from this experiment cannot rule out this extreme possibility, even if it were the case, the tracking failure hypothesis cannot explain the difficulty in color-switch detection. Under the assumption that non-target objects are lost, the tracking failure hypothesis predicts that the on-target condition will show extremely accurate color-switch performance, while only the off-target condition will show errors. However, this prediction was not supported by the data. Color-switch detection suffered from the increase in rotation angle, regardless of the tracking target location. Somewhat surprisingly, there were a significant number of misses of color switch even when the color switch occurred on the tracking target. A more natural interpretation of the results of this experiment seems to be that the better detection of color switch in the on-target condition is due to the distribution of attention biased toward the tracking target. 
General Discussion
The properties of episodic representations of multiple objects appear to be quite different between static and dynamic situations. In a static situation, as in Luck and Vogel (1997), correct feature-location binding for up to four objects can be maintained. In contrast, in a dynamic setting as in this study, correct feature-location binding for three objects did not seem to be maintained to the degree that allows color-switch detection. A series of experiments revealed that even when the motion correspondences are unambiguous by the use of pattern configurations and continuous motion, and object tracking is successful as in Experiment 7, color-switch detection performance is difficult; there was no significant improvement compared with the situations where motion correspondences were inherently ambiguous. At the same time, it has been revealed that color-switch detection performance is critically dependent on the interframe rotation angle, and that a facilitatory effect occurred only when spatiotemporal predictability was satisfied. 
One important aspect of these results is that color-switch detection is difficult even when objects are easily trackable. Thus, it is unlikely that the difficulty is due to tracking failure. Clearly, spatial uncertainty and stimulus complexity alone cannot explain the results either. Participants’ top-down knowledge of predictable pattern rotation is not sufficient to detect an irregularity. Because the number of objects in the sequences was just three, stimulus complexity in terms of the number of stimuli was lower than in previous studies, such as Luck and Vogel (1997) and Pylyshyn and Storm (1988). There are some possible reasons for the difficulty in color-switch detection. First, the items in this study were somewhat closer to each other than in other studies. However, the between-item distances were not so dramatically different from other studies, so it is unlikely that this is the major reason. Second, the regular rotation repeatedly places the objects in the same position as the other objects had been a moment before. This factor may have played a significant role. The repeated placement of different colored disks on the same location may create substantial interference in color-location bindings, and a recent study by Wheeler and Treisman (2002) suggests that interference among items is a major factor impairing binding in visual short-term memory. Wheeler and Treisman found that in a change-detection task with static stimuli similar to those of Luck and Vogel (1997), there was a significant impairment in change detection of color switch (which they call the binding condition) when the probe was the whole display. The impairment disappeared when a single-item probe was used, suggesting that interference by the items in the whole-display probe is a major factor in the impairment. 
This interference account can explain the facilitation by smaller interframe rotation angle as well. The reduction of interframe rotation angle makes the distance between the current and previous position of one object smaller than the distance between the current position of one object and previous position of another object, which not only helps motion correspondences but also reduces the interference in the binding memory. In contrast, smooth, continuous motion (Experiments 2 and 7) helps only motion correspondences, and leaves the same interference in the binding memory as long as the interframe rotation angle is large. Furthermore, the disappearance of the facilitatory effect of smaller rotation angle with unpredictable rotation in Experiment 6 is consistent with the interference account, because spatial uncertainty in rotation is likely to increase interference. 
Although more systematic investigations are certainly necessary, the results of this study suggest that the problems with color-switch detection reflect the spatiotemporal interference in feature-location binding in visual working memory, not the encoding of the objects’ color and motion per se. Further study needs to investigate whether tightly bound object representations are formed in dynamic situations, but are distracted by spatiotemporal interference, or the formation of multiple object representations itself is restricted to static situations. Given recent challenges to Luck and Vogel’s (1997) findings (Wheeler & Treisman, 2002; Xu, 2002), the validity of the object-based account of visual working memory in general has also to be critically evaluated. In the course of these investigations, the irregularity detection task introduced in this study can play an important role. Further improvements in the experimental paradigm and systematic comparisons with other studies may reveal the mechanism of formation, maintenance, and transformation of episodic representations in visual cognition. In particular, we need to know what aspects of the current findings are specific to the particular display used in this study, and how well they can be generalized to other displays and experimental paradigms. 
Mechanism of Transformation of Episodic Representations
In this section, I consider the following two factors to discuss the difficulty of color-switch detection in a dynamic situation. The first factor is whether the difficulty resides in the spatiotemporal correspondence or in the feature-location binding. According to Kahneman et al. (1992), perceptual continuity is achieved by three suboperations: (1) a correspondence operation, (2) a reviewing operation, and (3) an impletion process. The correspondence operation determines which object in a display is an object recently perceived at a different location. The reviewing operation retrieves the characteristics of the previous object, not currently seen. The impletion process uses current and reviewed information to produce a percept of change or motion. Presumably, the correspondence operation does not depend on featural information, such as color and shape, as shown by the apparent motion literature (Kolers, 1972). Featural properties play essentially no role in apparent motion, unless the spatiotemporal parameters are perfectly balanced. However, the impletion process (given successful retrieval) involves the binding of featural information and spatiotemporal information by definition. The results of this study and some previous studies suggest that the difficulty in color-switch detection resides in the impletion process. Otherwise, it is difficult to explain why various manipulations for improving correspondence did not substantially improve participants’ performance. In particular, the elimination of abrupt onset and offset did not improve color-switch detection (Experiment 2), whereas Scholl and Pylyshyn (1999) showed that it had a substantial effect on tracking performance. Although the visual system is successful in making correspondence, it may fail in the impletion process. This impletion failure hypothesis can account for the inconsistency between this study and Pylyshyn and Storm (1988). Multiple object tracking can be performed by a correspondence operation alone, whereas the color-switch detection in this study required the impletion process. 
The second factor is whether predictive transformation of episodic representations has anything to do with the difficulty in color-switch detection. Unlike Kahneman et al. (1992) and Pylyshyn and Storm (1988), in which there was spatial uncertainty regarding the correspondence, participants in this study may have spontaneously transformed their episodic representations during the blank period, because the future locations were completely predictable except for Experiments 6 and 7. Given the substantial impairment of the color-switch detection in Experiment 6, predictive transformation of episodic representations during the blank period appears to be an important determinant of color switch detection performance. Therefore, overall, the underlying mechanism of color-switch detection seems to be the predictive spatiotemporal transformation of feature-location bindings. However, as mentioned above, the effect of predictability may be related to spatiotemporal interference, not to top-down spatiotemporal prediction per se, so the issue of the role of top-down prediction is somewhat unclear at this point. 
These arguments imply that multiple object tracking and irregularity detection tasks reflect distinct processes. The tracking task may reflect the correspondence operation, which is more automatic and bottom-up; thus, spatiotemporal predictability is not important, but local characteristics such as abrupt onset and offset have substantial effects on performance (Scholl & Pylyshyn, 1999). However, the irregularity detection task may reflect the predictive feature-location binding mechanism, which is less sensitive to local information, such as local motion and abrupt onsets, while the effect of top-down predictability may be essential. Further research directly investigating these issues is necessary to understand the mechanism of transformation of episodic representations. 
Relation to Object Files and FINSTs
Object files and FINSTs are two major theoretical notions previously proposed. Findings in this study revealed further spatiotemporal constraints on the maintenance of these episodic object tokens. Kahneman et al. (1992) argued that object files can survive some spatiotemporal gaps as in the case of occlusion and apparent motion, but their limits were not systematically investigated. This study revealed that there are some severe spatiotemporal limits for the survival of multiple object files: beyond 45° regular rotation, even three object files are difficult to maintain. This result is not quite consistent with the findings of Kahneman et al. that four object files can be maintained concurrently, and that object files can survive a spatiotemporal gap of 590-ms ISI. There are some possible reasons for this inconsistency, and an important one is the difference in experimental paradigms. Kahneman et al. used a review paradigm that is similar to the priming paradigm; thus, the object-specific preview benefit seems to reflect implicit aspects of the maintenance of object files. In contrast, the irregularity detection task in this study clearly investigated the explicit detection of color switch or replacement. Therefore, the inconsistency between this study and that of Kahneman et al. may reflect dissociation between the explicit and implicit nature of visual cognition. Another possibility is that the use of object linkers in Kahneman et al. may have facilitated the maintenance of object files. In their Experiment 5 with four objects, the object frames (squares) remained visible while target letters disappeared, so the spatiotemporal continuity of perceptual objects was maintained while their featural contents (with or without letters) underwent substantial changes during the objects’ motion. On the other hand, all experiments in this study had some substantial blank period during which no object information was presented on the display. 
Further studies using common stimuli and paradigms are necessary. 
The findings of this study can be accounted for by the FINST theory in that there is a distinct stage of spatial indexing which is preattentive, but capacity-limited. According to Pylyshyn (1989), spatial indexing and feature-location binding are separate mechanisms, and encoding visual features and binding them to locations require additional processing stages. The findings of this work can be considered as reflecting feature-location bindings, which are presumably more capacity-limited than the spatial indexing investigated by the multiple object-tracking paradigm. Kahneman et al. (1992) claimed that a FINST might be the initial phase of an object file before any features are attached to it, and according to this interpretation, the findings of this study suggest that in dynamic situations people are successful only in the initial phase of object file formation. 
Relation to Studies on Visual Working Memory
A recent study on visual working memory by Luck and Vogel (1997) claimed that functional units of visual working memory are perceptual objects, not features. The results of this study cast some doubt on their interpretation. Clearly, in a dynamic situation, even three perceptual objects cannot be concurrently maintained beyond some limited spatiotemporal gap. The findings of this study, Luck and Vogel, and Pylyshyn and Storm (1988) suggest certain characteristics of perceptual objects in visual cognition. First, the situation of Luck and Vogel’s experiments can be interpreted as a situation with maximum spatiotemporal predictability. According to this interpretation, the coherence of perceptual objects can be maintained only within limited situations, namely, within a limited amount of predictable spatiotemporal transformation. Because the range where multiple episodic representations are successfully maintained is much more limited than the range over which people can perceive objects’ continuity (e.g., apparent motion, and motion behind an occluder), it is unclear whether we can claim that the results of this study and Luck and Vogel reflect representations of perceptual objects in the usual sense. Alternatively, Luck and Vogel’s data may reflect processes qualitatively different from those in this study. Their data may reflect a temporary aggregate of features bound to particular locations, not bound to perceptual objects. It has been argued that location plays a privileged role in visual cognition (Treisman, 1988; Tsal & Lavie, 1993). Recently, some studies have cast doubt on Luck and Vogel’s data, suggesting the role of independent feature memories (Wheeler & Treisman, 2002; Xu, 2002). For example, Wheeler and Treisman (2002) failed to replicate the critical color-color conjunction condition of Luck and Vogel. 
The plausibility of these alternative interpretations depends on the consistency of the results of this study with smaller predictable transformations, with the results of Luck and Vogel. Further studies involving various spatiotemporal and featural transformations are necessary to resolve this issue. 
The difficulty in color-switch detection may be related to the functional dissociation between object and spatial working memory. Research in physiology and functional brain imaging suggests that spatial and object working memory systems reside in distinct brain regions (Smith & Jonides, 1997; Wilson, O’Scalaidhe, & Goldman-Rakic, 1993; but see Rao, Rainer, & Miller, 1997). The irregularity detection task may require integration of information stored in distinct brain regions. Whether the difficulty in maintaining multidimensional features is limited to situations where object and spatial working memory are to be integrated is beyond the scope of this study, and further study with various types of feature conjunctions is necessary. 
This work is consistent with some recent evidence that the system of visual cognition works with much less memory than we believe (Ballard, Hayhoe, Pook, & Rao, 1997; Horowitz & Wolfe, 1998; Rensink, O’Regan, & Clark, 1997). For example, a phenomenon called change blindness shows that people are surprisingly poor at noticing large changes to objects, photographs, and motion pictures from one instant to the next (Simons, 2000). Contrary to some suggestions that change blindness merely reflects inefficient visual search for temporal change in complex stimuli, this study shows that a similar impairment in visual cognition can occur with quite simple and regular stimuli. Although we are able to store multidimensional information in a static display (Luck & Vogel, 1997), and to track dynamic changes of multiple unidimensional objects (Pylyshyn & Storm, 1988), we can store only one or two multidimensional dynamic objects. The maintenance and transformation of episodic representations of multiple objects seem to involve a dynamic process, which is determined by featural and spatiotemporal properties and by locational predictability. 
Conclusion
Our ability to maintain episodic representations of multiple objects in a completely predictable dynamic situation is limited. This finding strongly suggests that previous findings obtained with static displays (Luck & Vogel, 1997) and a dynamic multiple object-tracking task (Pylyshyn & Storm, 1988) may not reflect the function of common high-level episodic representations, such as object files, where featural and spatial information is coherently bound together. Instead, previous findings are likely to be mediated by lower-level representations, such as FINSTs in the case of multiple object tracking, and location-based feature clusters in the case of Luck and Vogel (1997). As the FINST theory claims, spatiotemporal indexing and feature-location binding are separate mechanisms, and the latter requires substantial additional resources. To maintain feature-location binding of multiple objects, spatiotemporal continuity for successful object tracking is necessary but not sufficient. Regardless of the smoothness of the objects’ motion, the maintenance of feature-location binding of three objects is possible only when the rotation angle across the gap is smaller than 45°, suggesting that a reduction in spatiotemporal interference among feature-location bindings may be critical. In addition, spatiotemporal predictability may be necessary for successful maintenance of feature-location bindings. 
Acknowledgments
I thank Toshio Inui, Tram Neill, Jane Raymond, and three anonymous reviewers for helpful comments on earlier manuscripts. This work was supported by the Japanese Ministry of Education, Culture, Sports, Science and Technology Grants-in-Aid for Scientific Research (Nos. 11610075, 12551001, 13610084, and 14019053), the Research for the Future Program from the Japan Society for the Promotion of Science (JSPS-RFTF99P01401), and the Toyota High-Tech Research Grant Program. Commercial Relationships: None. 
References
Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20, 723–767.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 443–446.
Horowitz, T. S., & Wolfe, J. M. (1998). Visual search has no memory. Nature, 394, 575–577.
Kahneman, D., & Treisman, A. (1984). Changing views of attention and automaticity. In Parasuraman, R., & Davis, D. A. (Eds.), Varieties of attention (pp. 29–62). New York: Academic Press.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object specific integration of information. Cognitive Psychology, 24, 175–219.
Kanwisher, N. (1987). Repetition blindness: Type recognition without token individuation. Cognition, 27, 117–143.
Kolers, P. A. (1972). Aspects of motion perception. Elmsford, NY: Pergamon.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281.
Pelli, D. G. (1997). The Video Toolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Phillips, W. A. (1974). On the distinction between sensory storage and short-term visual memory. Perception and Psychophysics, 16, 283–290.
Pylyshyn, Z. W. (1989). The role of location indexes in spatial perception: A sketch of the FINST spatial-index model. Cognition, 32, 65–97.
Pylyshyn, Z. W. (2001). Visual indexes, preconceptual objects, and situated vision. Cognition, 80, 127–158.
Pylyshyn, Z. W., & Storm, R. (1988). Tracking multiple independent targets: Evidence for both serial and parallel stages. Spatial Vision, 3, 179–197.
Rao, S. C., Rainer, G., & Miller, E. K. (1997). Integration of what and where in the primate prefrontal cortex. Science, 276, 821–824.
Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8, 368–373.
Scholl, B. J., & Pylyshyn, Z. W. (1999). Tracking multiple items through occlusion: Clues to visual objecthood. Cognitive Psychology, 38, 259–290.
Simons, D. J. (2000). Current approaches to change blindness. Visual Cognition, 7, 1–15.
Smith, E. E., & Jonides, J. (1997). Working memory: A view from neuroimaging. Cognitive Psychology, 33, 5–42.
Treisman, A. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 40A, 201–237.
Tsal, Y., & Lavie, N. (1993). Location dominance in attending to color and shape. Journal of Experimental Psychology: Human Perception and Performance, 19, 131–139.
Verstraten, Y., Cavanagh, P., & Labianca, N. (2000). Limits of attentive tracking reveal temporal properties of attention. Vision Research, 40, 3651–3664.
Wheeler, M. E., & Treisman, A. (2002). Binding in short-term visual memory. Journal of Experimental Psychology: General, 131, 48–64.
Wilson, F. A. W., O’Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of object and spatial processing domains in primate prefrontal cortex. Science, 260, 1955–1958.
Xu, Y. (2002). Limitations of object-based feature encoding in visual short-term memory. Journal of Experimental Psychology: Human Perception and Performance, 28, 458–468.
Yantis, S., & Hillstrom, A. P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20, 95–107.
Yantis, S., & Jonides, J. (1996). Attentional capture by abrupt onsets: New perceptual objects or visual masking? Journal of Experimental Psychology: Human Perception and Performance, 22, 1505–1513.
Figure 1
 
Schematic illustration of the irregularity detection task. In this example, rotation direction is clockwise, and irregularity occurs in the second frame.
Figure 2
 
a. Illustration of conditions in Experiment 1. b. Mean hit and false alarm rates in Experiment 1.
Figure 3
 
a. Illustration of conditions in Experiment 2. b. Mean hit and false alarm rates in Experiment 2.
Figure 4
 
a. Illustration of conditions in Experiment 3. b. Mean hit and false alarm rates in Experiment 3.
Figure 5
 
a. Illustration of conditions in Experiment 4. b. Mean hit and false alarm rates in Experiment 4.
Figure 6
 
a. Illustration of conditions in Experiment 5. b. Mean hit and false alarm rates in Experiment 5.
Figure 7
 
a. Illustration of conditions in Experiment 6. b. Mean hit and false alarm rates in Experiment 6.
Figure 8
 
a. Illustration of conditions in Experiment 7. b. Mean hit and false alarm rates in Experiment 7.
Table 1
 
Mean d′ values for each condition of Experiments 1–7
Experiment      Condition          d′
Experiment 1    Acute              2.21
Experiment 1    Equilateral*       1.76
Experiment 1    Obtuse             1.97
Experiment 2    Stationary/240*    1.29
Experiment 2    Stationary/360*    1.46
Experiment 2    Moving/240         1.56
Experiment 2    Moving/360         1.80
Experiment 3    60°*               1.74
Experiment 3    45°                2.72
Experiment 3    30°                3.10
Experiment 4    60°/small*         1.85
Experiment 4    30°/small          3.09
Experiment 4    30°/large          3.27
Experiment 5    360-ms             3.13
Experiment 5    240-ms             2.95
Experiment 5    80-ms              3.10
Experiment 6    30°/30°            2.08
Experiment 6    30°/60°            2.07
Experiment 6    60°/30°            1.77
Experiment 6    60°/60°*           1.73
Experiment 7    On-target/90°      2.74
Experiment 7    On-target/60°      2.71
Experiment 7    On-target/30°      3.54
Experiment 7    On-target/90°      1.85
Experiment 7    On-target/60°      2.32
Experiment 7    On-target/30°      2.55
 

Conditions with * are inherently ambiguous in motion correspondences and serve as baseline conditions.
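
For readers who want to relate the hit and false alarm rates in the figures to the d′ values in Table 1, the following is a minimal sketch of the standard equal-variance signal detection formula, d′ = z(H) − z(F), where z is the inverse of the standard normal cumulative distribution. This section does not state how d′ was actually computed (for example, whether any correction was applied for rates of 0 or 1), so the function name `d_prime` and the example rates are assumptions for illustration only, not values from the experiments.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance Gaussian sensitivity index: d' = z(H) - z(F).

    Both rates must lie strictly between 0 and 1; rates of exactly
    0 or 1 would need a correction that is not specified here.
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for illustration, not taken from the paper.
example = d_prime(0.85, 0.15)
print(f"d' = {example:.2f}")
```

A higher hit rate paired with a lower false alarm rate yields a larger d′, and equal hit and false alarm rates yield d′ = 0, that is, no sensitivity to the color switch.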
