Most moving objects in the world are non-rigid, changing shape as they move. To disentangle shape changes from movements, computational models either fit shapes to combinations of basis shapes or motion trajectories to combinations of oscillations but are biologically unfeasible in their input requirements. Recent neural models parse shapes into stored examples, which are unlikely to exist for general shapes. We propose that extracting shape attributes, e.g., symmetry, facilitates veridical perception of non-rigid motion. In a new method, identical dots were moved in and out along invisible spokes, to simulate the rotation of dynamically and randomly distorting shapes. Discrimination of rotation direction measured as a function of non-rigidity was 90% as efficient as the optimal Bayesian rotation decoder and ruled out models based on combining the strongest local motions. Remarkably, for non-rigid symmetric shapes, observers outperformed the Bayesian model when perceived rotation could correspond only to rotation of global symmetry, i.e., when tracking of shape contours or local features was uninformative. That extracted symmetry can drive perceived motion suggests that shape attributes may provide links across the dorsal–ventral separation between motion and shape processing. Consequently, the perception of non-rigid object motion could be based on representations that highlight global shape attributes.

*θ* = 18°) and independently varying each dot's radial distance from the central fixation along its invisible spoke (Figure 1B). For each trial, the variations were drawn from a Gaussian random distribution with zero mean and standard deviation *α*, the *shape amplitude*. Further, on each frame in the trial, positional noise was applied to each dot's radial component, independently sampled from a random Gaussian distribution with zero mean and a pre-set standard deviation *δ*, the *dynamic jitter* magnitude for that trial (Figure 1B).
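The generative procedure just described can be sketched in code; the spoke count, base radius, rotation scheme, and function name below are illustrative assumptions, not the published implementation.

```python
import numpy as np

def make_trial(n_spokes=20, n_frames=10, base_radius=60.0,
               alpha=4.0, delta=5.0, rotation_step=1, seed=0):
    """One trial: n_spokes dots on invisible spokes (18 deg apart for
    n_spokes=20). Each spoke gets a fixed Gaussian radial offset
    (SD = alpha, the shape amplitude); on every frame, fresh Gaussian
    noise (SD = delta, the dynamic jitter) is added to each radius.
    Rotation is simulated by shifting the offsets around the spokes."""
    rng = np.random.default_rng(seed)
    angles = np.arange(n_spokes) * (2 * np.pi / n_spokes)
    shape = rng.normal(0.0, alpha, n_spokes)   # fixed for the trial
    frames = []
    for f in range(n_frames):
        jitter = rng.normal(0.0, delta, n_spokes)  # fresh each frame
        radii = base_radius + np.roll(shape, rotation_step * f) + jitter
        frames.append(np.stack([radii * np.cos(angles),
                                radii * np.sin(angles)], axis=1))
    return frames  # list of (n_spokes, 2) dot positions
```

Since dots only move along fixed spokes, rotation is conveyed purely by the radial profile advancing from spoke to spoke.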

*θ*, so the rotation of a circle centered at fixation would be invisible in this experiment. Given the ambiguous nature of the rotation, a dot belonging to a random shape on frame *i* could have been perceived as moving to either of the adjacent spokes on frame *i* + 1 or along the same spoke. As illustrated in Figure 1B, the distance from a dot at frame *i* to the dot on the same spoke at frame *i* + 1 was generally shorter than the distances to dots on adjacent spokes. A “nearest neighbor” rule is generally accepted as dominating the perceived path of apparent motion in cases where multiple locations compete for motion correspondence (e.g., Ullman, 1979). In addition, the shortest spatial excursion between two frames is also the slowest motion, which has been suggested as a governing principle in motion perception (Weiss et al., 2002). To test whether local correspondence or coherent global rotation dominates motion perception in different configurations, we measured observers' accuracy in determining the direction of rotation as a function of the standard deviation *δ* of the *dynamic jitter*. Since a moving form is sampled through apertures, this paradigm may seem similar to multi-slit viewing (Anstis, 2005; Kandil & Lappe, 2007; Nishida, 2004), but it differs in both intent and design: instead of using familiar shapes to study form recognition, we used unfamiliar shapes that deform during rotation to create competition between different rules of motion combination.
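A quick worked check of this correspondence geometry (the radius and excursion values are hypothetical, chosen to be representative of the stimuli):

```python
import math

def chord_to_adjacent_spoke(r, inter_spoke_deg=18.0):
    """Straight-line distance from a dot at radius r to a dot at the
    same radius on an adjacent spoke (the rotary correspondence)."""
    return 2.0 * r * math.sin(math.radians(inter_spoke_deg / 2.0))

r = 60.0                                 # hypothetical radius, min arc
rotary = chord_to_adjacent_spoke(r)      # about 18.8 min arc
radial = 4.0                             # typical same-spoke excursion (~ alpha)
# The same-spoke (radial) match is several times shorter than the
# rotary match, so a nearest-neighbor rule favors non-rotary motion.
```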

*Shape amplitudes* of *α* = 4 or 13 min arc were used to assign each dot a fixed radius for the entire trial. *Dynamic jitter* was calculated independently for each dot per frame, with *δ* set at 0, 2.5, 5, 10, 20, or 40 min arc for the trial. To vary the difficulty of the task, we used *presentation rates* of 3.5, 5.5, or 12.5 frames per second. Rotation speed was proportional to presentation rate, since all trials consisted of the same number of frames.

*δ* of the dynamic jitter, for the three presentation rates, respectively. Accuracy decreased monotonically with increasing jitter. High accuracy at low jitter shows that global rotation was detected easily, despite competing with shorter inward and outward local motions; the global percept is therefore not formed by combining the most salient local motions. Shapes with larger shape amplitudes (*α*) were significantly more resistant to dynamic jitter (*F*(1, 5) = 465.3, *p* < 0.0001), but not by a constant factor, as reflected in the significant interaction between shape amplitude and jitter (*F*(5, 5) = 9.9, *p* < 0.001). An accuracy of 75% can be used as the estimated threshold for radial jitter at each shape amplitude. Thresholds generally corresponded to values of dynamic jitter slightly greater than the shape amplitude, i.e., to the point where the frame-to-frame deformation of the trial shape was of the same order as the variations that distinguish the trial shape from a circle. This suggests that until the global rotation becomes incoherent, its percept dominates the shorter/slower local motions, which indicate local expansions or contractions but do not form a coherent percept. The difference between the two presentation times was not significant, but a small improvement for less jagged shapes at faster presentation times led to a significant interaction between presentation time and shape amplitude (*F*(2, 5) = 4.7, *p* < 0.05). Johansson (1975) wrote, “The eye tends to assume spatial invariance, or invariance of form, in conjunction with motion rather than variance of form without motion.” The results of this experiment provide limits to Johansson's principle.

We first tested a *Nearest Neighbor Model* based on closest spatiotemporal correspondence: each dot on frame *i* was matched to the nearest dot on frame *i* + 1. On each trial, a tally was kept of the number of clockwise, counterclockwise, and same-spoke matches, and the trial was classified as clockwise or counterclockwise if the majority of matches were in that direction. For the same stimuli as used in Experiment 1, the predicted percentage of correct classifications is plotted against dynamic jitter in Figure 2E, showing that this model could not detect the correct direction of rotation because its input was dominated by same-spoke motions.
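A minimal sketch of this model (treating increasing spoke index as clockwise, an arbitrary labeling):

```python
import numpy as np

def nearest_neighbor_direction(frames):
    """Nearest Neighbor Model sketch: match each dot on frame i to the
    nearest dot on frame i+1 and tally clockwise, counterclockwise,
    and same-spoke matches; the majority of cw/cc matches decides."""
    n = len(frames[0])
    votes = {"cw": 0, "cc": 0, "same": 0}
    for a, b in zip(frames[:-1], frames[1:]):
        for i in range(n):
            j = int(np.argmin(np.linalg.norm(b - a[i], axis=1)))
            step = (j - i) % n
            if step == 0:
                votes["same"] += 1
            elif step == 1:
                votes["cw"] += 1
            elif step == n - 1:
                votes["cc"] += 1
    direction = "cw" if votes["cw"] >= votes["cc"] else "cc"
    return direction, votes
```

For low-jitter stimuli of moderate shape amplitude, nearly every nearest match is a same-spoke match, so the directional tally carries almost no signal.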

*θ* between dots (Figure 3). The shortest/slowest local motions were all individually consistent with rotation, but in the direction opposite to the globally consistent rotation.

We then tested a *Nearest Rotational Neighbor Model*, which was identical to the first model except that radial (same-spoke) motions were ignored and dots were matched according to the shortest/slowest rotary motions. As would be expected, this model did better for the large-amplitude shapes in Experiment 1. However, it did not predict observers' accuracy for the low-amplitude shapes and failed completely on the critical test provided by Experiment 2 (Figure 2F).
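A matching sketch of this second model, which excludes same-spoke matches. Note that for a shape whose radius grows gradually around the contour, the shortest rotary matches point opposite to the true rotation almost everywhere, which is the failure mode Experiment 2 exposes.

```python
import numpy as np

def nearest_rotational_direction(frames):
    """Nearest Rotational Neighbor Model sketch: same-spoke (radial)
    matches are excluded; each dot votes for whichever adjacent spoke
    holds the closer dot on the next frame."""
    n = len(frames[0])
    cw = cc = 0
    for a, b in zip(frames[:-1], frames[1:]):
        for i in range(n):
            d_cw = np.linalg.norm(b[(i + 1) % n] - a[i])
            d_cc = np.linalg.norm(b[(i - 1) % n] - a[i])
            if d_cw < d_cc:
                cw += 1
            elif d_cc < d_cw:
                cc += 1
    return "cw" if cw >= cc else "cc"
```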

We therefore formulated a *Global Rotation Model* for the two distinct neural processes. *G_i* is the transition from frame *i* to frame *i* + 1, *d_cw^i* is the sum of squared errors for transition *G_i* after accounting for a clockwise rotation, and *d_cc^i* is the sum of squared errors for transition *G_i* after accounting for a counterclockwise rotation.

*P_i*(cw) and *P_i*(cc) are the prior probabilities for clockwise and counterclockwise rotations (based on the experimental design, priors were set equal to 0.5). The likelihoods for stimulus transition *G_i*, *P*(*G_i* ∣ *θ_cw*) and *P*(*G_i* ∣ *θ_cc*), were calculated for each rotation angle independently, where *d_θk^i* is the sum of squared errors for transition *G_i* after accounting for a rotation by an angle *θ_k*.
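The displayed likelihood equation is not reproduced in this text. A standard choice consistent with the definitions above (an assumption, with σ a free noise parameter and *R_i* our notation for the per-transition plausibility ratio) would be a Gaussian function of the matching error:

```latex
P(G_i \mid \theta_k) \propto \exp\!\left( -\frac{d^{\,i}_{\theta_k}}{2\sigma^2} \right),
\qquad
R_i = \frac{P_i(\mathrm{cw})\, P(G_i \mid \theta_{\mathrm{cw}})}{P_i(\mathrm{cc})\, P(G_i \mid \theta_{\mathrm{cc}})} .
```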

Rotations were counted as clockwise for 0 < *θ_k* < *π* and counterclockwise for −*π* < *θ_k* < 0. Assuming that judgments on each transition were independent of other transitions, the plausibility ratio for each trial was taken as the product of the ratios calculated for all transitions in that trial. The outcome of the trial was taken as clockwise if the trial ratio was larger than 1.0 and as counterclockwise otherwise.
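The full decision rule can be sketched as follows; the Gaussian likelihood and its noise parameter `sigma` are assumptions (the paper's displayed equations are not reproduced here), but the pooling and product-of-ratios steps follow the description above.

```python
import numpy as np

def global_rotation_direction(frames, sigma=5.0):
    """Global Rotation Model sketch: score every whole-spoke rotation
    of frame i onto frame i+1 by its sum of squared radial errors,
    turn scores into Gaussian likelihoods (sigma is an assumed noise
    parameter), pool clockwise (0 < theta < pi) against
    counterclockwise (-pi < theta < 0) angles, and multiply the
    per-transition plausibility ratios (equal priors cancel)."""
    n = len(frames[0])
    radii = [np.linalg.norm(f, axis=1) for f in frames]
    log_ratio = 0.0
    for r0, r1 in zip(radii[:-1], radii[1:]):
        like = np.array([np.exp(-np.sum((np.roll(r0, k) - r1) ** 2)
                                / (2.0 * sigma ** 2)) for k in range(n)])
        cw = like[1:n // 2].sum()          # rotations by 1 .. n/2-1 spokes
        cc = like[n // 2 + 1:].sum()       # rotations by -(n/2-1) .. -1
        log_ratio += np.log(cw + 1e-300) - np.log(cc + 1e-300)
    return "cw" if log_ratio > 0 else "cc"
```

Working in log space keeps the product of many small ratios numerically stable.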

The *Global Rotation Model* does as well as the human observers in both experiments, suggesting that the visual system could either use a rotation template or match shapes across rotations to accomplish the task. Note that the optimal model also performs with greater accuracy for the larger shape amplitude, reflecting the easier distinction between cc and cw shape matches as shapes depart more from the generating circle.

The *Global Rotation Model* provides optimal decoding of rotations at a computational level (Marr, 1982).

We also ran the *Global Rotation Model* for the stimuli of Experiment 1, but instead of considering the whole shape, we considered only 1, 2, …, or 20 consecutive dots, chosen randomly for each frame transition. The labels on the top of Figure 4 convert the number of dots considered into the percentage of available information used, which we take as a measure of equivalent efficiency for human observers. At the slowest speed we tested, human observers performed almost as well as the model that used 18 points, i.e., at 90% of the efficiency of the optimal decoder. This implies that the human visual system includes near-optimal processes for matching deforming shapes and/or for detecting rotation in the presence of strong distracting motions. The equivalent efficiency of human observers declined at the faster presentation rates, where the equivalent linear speeds were 3.8 and 8.6 dva/s. Since motion energy is extracted well at these speeds (Lu & Sperling, 1995; Zaidi & DeBonet, 2000), observer limitations at higher speeds may reflect the number of dots that can be used in shape or motion computations at the shorter stimulus durations.

*Base shapes* were generated from circles to be either symmetric or asymmetric, and the *dynamic jitter* was also either symmetric or asymmetric. Trials with symmetric base shapes and high levels of symmetric dynamic jitter consisted of a series of distinct symmetric shapes with a continuously turning axis (Figure 5, Movies 5–8).
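One way to generate such stimuli is to draw Gaussian offsets for half the spokes and mirror them about an axis; the parameterization below is an illustrative assumption, not the published method.

```python
import numpy as np

def symmetric_offsets(n_spokes=20, sd=4.0, axis=0, rng=None):
    """Draw Gaussian radial offsets and mirror them about the spoke
    indexed by `axis`, so the resulting shape (or jitter field) is
    bilaterally symmetric; usable per trial for symmetric base shapes
    or per frame for symmetric dynamic jitter."""
    if rng is None:
        rng = np.random.default_rng()
    off = rng.normal(0.0, sd, n_spokes)
    for i in range(n_spokes):
        j = (2 * axis - i) % n_spokes   # mirror image of spoke i
        if i < j:
            off[j] = off[i]
    return off
```

Advancing `axis` by one spoke per frame would produce a symmetric shape whose axis turns continuously, as in the symmetric-jitter trials.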

*F*(1, 5) = 130.5, *p* < 0.001) and an interaction between jitter symmetry and jitter amplitude (*F*(5, 5) = 33.6, *p* < 0.001). As shown in Figures 6D and 6H, the *Global Rotation Model* showed no significant difference between the symmetric and asymmetric stimuli of Experiment 3, demonstrating that observers' performance on symmetric shapes was not due to dot-based shape or motion correlations. Comparison of human and model performance shows that observers were capable of outdoing the optimal global model by extracting relevant shape attributes that are invisible to shape matching and rotation templates.

*α* = 4 and *δ* = 40, rotated at 3.5 Hz, in Experiment 3. As shown in Figure 6C, the greatest performance advantage for symmetry occurs in this condition. Two authors (AJ and QZ) reported the direction of the adapting stimulus and then judged the direction of rotation of dots forming a circle of the same average size as the adapting stimulus. The dots of the circle were rotated by half the inter-spoke angle at 3.5 Hz and hence were equally likely to be seen moving in the clockwise or counterclockwise direction, unless biased by motion adaptation. For the Symmetric shapes, AJ judged 30/30 of the adapting directions correctly, while QZ judged 26/30 correctly. More to the point, AJ judged the motion aftereffects to be in the direction opposite to the simulated motion on 26/30 trials, and QZ on 22/30 (chance performance can be rejected for both observers at *p* < 0.01). Both observers noted that, unlike judging the direction of the jittered symmetric shapes, judging the direction of the aftereffects seemed effortless. These aftereffects were not an artifact, because reliable aftereffects did not result from adaptation to Asymmetric shape + Asymmetric jitter of the same size at the same speed: percent correct for the Asymmetric adapting shapes was essentially at chance, 13/30 and 15/30, as were aftereffects reported in the direction opposite to the simulated motion, 13/30 and 16/30.
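The rejection of chance can be verified with an exact binomial tail (26/30 and 22/30 successes against p = 0.5):

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """Exact one-sided tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(binom_tail(26, 30))   # about 3.0e-05  (26/30 judgments)
print(binom_tail(22, 30))   # about 8.1e-03  (22/30), both below 0.01
```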

*SE* = 1.6) for jitter amplitude *δ* = 40 and 3.1 Hz (*SE* = 0.41) for *δ* = 20. Symmetry perception is possible in intervals as short as 50 ms under ideal conditions (Julesz, 1971) but can take considerably longer for complex stimuli (Cohen & Zaidi, 2007b) and non-vertical orientations (Barlow & Reeves, 1979). The presentation rate threshold for the larger *δ* translates into a frame duration threshold of 125 ms. This result suggests that observers need presentation durations compatible with symmetry extraction to detect rotation of the symmetry axis.