Body movements are recognized with speed and precision, even from strongly impoverished stimuli. While cortical structures involved in biological motion recognition have been identified, the nature of the underlying perceptual representation remains largely unknown. We show that visual representations of complex body movements are characterized by perceptual spaces with well-defined metric properties. By multidimensional scaling, we reconstructed from similarity judgments the perceptual space configurations of stimulus sets generated by motion morphing. These configurations resemble the true stimulus configurations in the space of morphing weights. In addition, we found an even higher similarity between the perceptual metrics and the metrics of a physical space that was defined by distance measures between joint trajectories, which compute spatial trajectory differences after time alignment using a robust error norm. These outcomes were independent of the experimental paradigm for the assessment of perceived similarity (pair comparison vs. delayed match-to-sample) and of the method of stimulus presentation (point-light stimuli vs. stick figures). Our findings suggest that the visual perception of body motion is veridical and closely reflects physical similarities between joint trajectories. This implies that representations of form and motion share fundamental properties and places constraints on the computational mechanisms that support the recognition of biological motion patterns.

*physical space,* and each locomotion pattern corresponds to a single point in this space. At the same time, these motion patterns elicit perceptual impressions, which might be characterized as points in a low-dimensional *perceptual space* that can be inferred by analysis of the viewer's perceptual judgments (Figure 1B). This perceptual representation of motion patterns is only indirectly linked to the parameters of the viewed movement (e.g., joint trajectories), because it is derived from fundamentally different signals (e.g., neural activity patterns in the higher visual cortex). This implies that the mapping from physical space into the perceptual space might be given by a rather complex transformation that would not be easy to model. The approach we adopt here circumvents this difficulty by analyzing the metric structure of the two spaces separately, and quantifying their similarity (see Edelman, 1999; Shepard & Chipman, 1970, for details).

- Does a readily interpretable, low-dimensional metric perceptual space for complex motion patterns exist, similar to shape spaces for static patterns?
- Is this space related in a systematic way to the physical movement space that is defined by the joint trajectories?
- Is the perceptual representation *veridical* in the sense that the perceptual space shares metric properties with the physical space?

*second-order isomorphism* (Edelman, 1999; Shepard & Chipman, 1970). This class of mappings provides a representation that is particularly useful for the classification and categorization of motion patterns based on their physical similarities.

*motion morphing* (Giese & Poggio, 2000), applying a method that generates new movements by linear combination of the trajectories of three prototypical gait patterns (walking, running, and marching). The stimulus sets consisted of 7 movements that formed simple, low-dimensional configurations in the space defined by the morphing weights. The perceived similarities of these patterns were assessed using two different experimental paradigms. Based on the perceived similarities of pairs of the motion patterns, the metrics of the *perceptual space* were reconstructed by multidimensional scaling (MDS). The recovered configurations in perceptual space were first compared to the original configuration in morphing weight space. The result of this comparison depends on details of the morphing algorithm, which determines the meaning of the morphing weights. In order to obtain a characterization of the actual *physical movement space* that is independent of the morphing method, we computed the distances between the joint trajectories of the presented stimuli and constructed their metric configurations in physical space by applying MDS. An entire family of physical distance measures was tested, with the goal of finding the physical metric that most closely approximates the configurations in the perceptual space.
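The reconstruction step can be illustrated with classical (Torgerson) MDS, which recovers a point configuration, up to rotation, reflection, and translation, from a matrix of pairwise distances. This is a generic textbook sketch in NumPy, not the authors' implementation; the 4-point toy configuration is invented for the check.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Recover a point configuration (up to rotation/reflection/translation)
    from a matrix of pairwise Euclidean distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalues first
    L = np.sqrt(np.clip(eigvals[order][:dims], 0, None))
    return eigvecs[:, order[:dims]] * L      # n x dims configuration

# Toy check: distances from a known 2D configuration are recovered exactly.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8], [0.2, 0.4]])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
X_rec = classical_mds(D)
D_rec = np.linalg.norm(X_rec[:, None] - X_rec[None, :], axis=-1)
assert np.allclose(D, D_rec, atol=1e-8)
```

Because the recovered configuration is only determined up to a similarity transformation, comparisons between configurations require a subsequent alignment step such as the Procrustes analysis described below.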

*w*_{1} + *w*_{2} + *w*_{3} = 1, thus defining a two-dimensional plane in weight space (Figure 2A).
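Morphing by linear combination of prototype trajectories, under the constraint that the weights sum to 1, can be sketched as follows. The sinusoidal "prototypes" are synthetic stand-ins for the actual walking, running, and marching trajectories, which are not part of this document.

```python
import numpy as np

# Synthetic stand-ins for the three prototype joint trajectories
# (walking, running, marching); shape: (time points, trajectory dims).
t = np.linspace(0, 1, 100)
prototypes = [np.stack([np.sin(2 * np.pi * f * t),
                        np.cos(2 * np.pi * f * t)], axis=1)
              for f in (1.0, 1.5, 2.0)]

def morph(weights, prototypes):
    """Linear combination of prototype trajectories; the weights are
    constrained to sum to 1, so they span a 2D plane in weight space."""
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.sum(), 1.0), "morphing weights must sum to 1"
    return sum(wi * p for wi, p in zip(w, prototypes))

# A morph halfway between the first two prototypes, with zero weight
# on the third:
x_morph = morph([0.5, 0.5, 0.0], prototypes)
```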

*irrespective of viewpoint*. With 4 different view angles, 378 unique stimulus pairs were generated. To balance the fraction of trials in which the pair showed the same action with different views against the trials with different actions, the same-action pairs were repeated five times, resulting in a total of 546 trials.

*w*_{1}, …, *w*_{3} as axes of a low-dimensional pattern space. Distances between trajectories are then characterized by the Euclidean distances between the corresponding weight vectors. However, this characterization of the physical space is unsatisfactory because the interpretation of the morphing weights depends in a complex manner on the set of morphed patterns and on details of the applied morphing algorithm. Morphing weights thus do not provide a characterization of physical similarity between joint trajectories that is independent of the morphing method. Potential differences between the metric structure of the perceptual space and that of the space of morphing weights could thus be explained either by the perceptual metrics being incompatible with the physical similarities of the stimuli, or by the inability of the morphing weights to capture the perceptually relevant physical characteristics of the movement. This problem arises in conjunction with all known technical algorithms for motion morphing because all of them apply heuristic methods for interpolating between trajectories in space–time rather than taking into account specific constraints derived from perception.

*x*_{2}(*t*) can be derived from the trajectory *x*_{1}(*t*) by *time warping,* i.e., by deformation of the time axis using a smooth monotonic warping function *τ*(*t*). Formally, this deformation can be written as *x*_{2}(*t*) = *x*_{1}(*τ*(*t*)). It is assumed that *τ*(0) = 0 and *τ*(*T*) = *T*, *T* being the total duration of the movements, which is assumed to be equal.
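The deformation can be illustrated numerically; the particular warping function below is an arbitrary smooth monotonic example satisfying the boundary conditions τ(0) = 0 and τ(T) = T, and the trajectory is a synthetic one-dimensional stand-in.

```python
import numpy as np

T = 1.0
t = np.linspace(0.0, T, 200)
x1 = np.sin(2 * np.pi * t)                 # example trajectory

# Smooth monotonic warp with tau(0) = 0 and tau(T) = T.
tau = t + 0.1 * np.sin(np.pi * t / T)
assert np.all(np.diff(tau) > 0)            # strictly monotonic
assert np.isclose(tau[0], 0.0) and np.isclose(tau[-1], T)

# Time-warped trajectory x2(t) = x1(tau(t)), evaluated by interpolation.
x2 = np.interp(tau, t, x1)
```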

*x*_{1}(*t*) one point on the second trajectory *x*_{2}(*t*) that corresponds to it. There are many other ways to define such a correspondence. For example, one might associate points of the two trajectories that are related to each other by shifts in space *and* time. Likewise, there are multiple ways in which timing differences and spatial differences can be combined into a single value of the distance measure.

*x*_{1}(*t*) and *x*_{2}(*t*) signify the joint (position) trajectories, which are multidimensional because they contain all major joints of the viewed moving figure, the tested distance measures were given by the expression

*d*(*x*_{1}, *x*_{2}) = (*D*_{12} + *D*_{21}) / 2

*D*_{ij} = [ (1/*T*) ∫_{0}^{*T*} ∣*x*_{i}(*t*) − *x*_{j}(*τ*(*t*))∣^{*q*} d*t* ]^{1/*q*} + *λ* [ (1/*T*) ∫_{0}^{*T*} ∣*τ*(*t*) − *t*∣^{*q*} d*t* ]^{1/*q*}

with *i*, *j* = 1, 2. The first equation ensures that the distance measure *d* is symmetric with respect to *x*_{1} and *x*_{2}.
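A sketch of such a distance measure: a q-norm time average of the spatial differences after time alignment, plus λ times a q-norm average of the timing differences, symmetrized by averaging the two directions. The discretization (uniform time samples, means instead of integrals) is an assumption of this sketch.

```python
import numpy as np

def D_ij(xi, xj, tau, t, lam=0.0, q=1.0):
    """One-directional distance: q-norm time average of the spatial
    differences after time alignment xj(tau(t)), plus lam times a q-norm
    average of the timing differences |tau(t) - t|.  Uniform sampling of
    t is assumed, so the time integrals become means."""
    xj_warp = np.stack([np.interp(tau, t, xj[:, k])
                        for k in range(xj.shape[1])], axis=1)
    spatial = np.mean(np.linalg.norm(xi - xj_warp, axis=1) ** q) ** (1.0 / q)
    timing = np.mean(np.abs(tau - t) ** q) ** (1.0 / q)
    return spatial + lam * timing

def distance(x1, x2, tau12, tau21, t, lam=0.0, q=1.0):
    """Symmetrized distance: averaging D_12 and D_21 makes d symmetric."""
    return 0.5 * (D_ij(x1, x2, tau12, t, lam, q)
                  + D_ij(x2, x1, tau21, t, lam, q))

# With tau(t) = t (no alignment), two trajectories separated by a constant
# offset have a distance equal to that offset for any q >= 1:
t = np.linspace(0.0, 1.0, 50)
x1 = np.zeros((50, 2))
x2 = x1.copy()
x2[:, 0] += 3.0
d = distance(x1, x2, t, t, t, lam=0.0, q=1.0)   # -> 3.0
```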

*T* is the duration of one gait cycle. The time warping function *τ*(*t*) was determined by dynamic time warping (Giese & Poggio, 2000; Rabiner & Juang, 1993), an algorithm that determines an optimal time alignment between the two trajectories. For the trajectories depicted in Figure 3A, this algorithm would compute the time shifts indicated by the dashed lines, resulting in a vanishing contribution of the first term in the expression for *D*_{ij} in Equation 4. Choosing instead *τ*(*t*) = *t* defines distance measures without time alignment. By choosing positive values for the parameter *λ,* the distance measure explicitly takes into account overall timing differences between the two trajectories. The choice *λ* = 0 corresponds to physical distances that, except for time warping, depend only on spatial differences between the trajectories. Usually, only a part of the differences between trajectories can be removed by time alignment (cf. Figure 3B). In this case, one obtains nonzero contributions from both terms of the expression for *D*_{ij}.
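Dynamic time warping can be sketched with the classic dynamic-programming formulation (in the spirit of Rabiner & Juang, 1993); this is a generic textbook version on scalar trajectories, not the authors' exact implementation. It returns a monotonic alignment path minimizing the summed pointwise distances.

```python
import numpy as np

def dtw_path(x1, x2):
    """Classic dynamic-programming DTW: returns index pairs (i, j) that
    monotonically align x1 with x2 while minimizing summed distances."""
    n, m = len(x1), len(x2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x1[i - 1] - x2[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack the optimal alignment path from the end to the start.
    path, (i, j) = [], (n, m)
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        i, j = [(i - 1, j - 1), (i - 1, j), (i, j - 1)][step]
    return path[::-1]

# Two phase-shifted sine waves: the alignment path deviates from the
# diagonal, analogous to the correspondence lines in Figure 3A.
t = np.linspace(0, 2 * np.pi, 50)
path = dtw_path(np.sin(t), np.sin(t + 0.5))
```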

*q* in Equation 4 makes it possible to specify different distance norms (Kreyszig, 1989). Setting *q* = 2 results in the common Euclidean norm. Large values of *q* correspond to distances that emphasize the influence of outliers. This is illustrated in Figure 4, which shows two hypothetical trajectories. The trajectory *x*_{1}(*t*) is just a linear trend. The other trajectory follows this trend with a positive spatial offset of 10. Around *t* = 40, however, this trajectory has some outliers, the biggest deviation appearing for *t* = 40. Without these outliers, the distance measure *D*_{12}(*q*) = 10, independent of the value of the parameter *q* (assuming *q* ≥ 1). In the presence of the outliers, for the Euclidean norm that corresponds to the choice *q* = 2, the computed distance *D*_{12}(2) = 12.16 is larger than the offset between the trajectories. For the choice *q* = 1, one obtains the value *D*_{12}(1) = 10.99, which is closer to the true offset. The reason is that the square term in the Euclidean norm amplifies the influence of outliers compared to the linear term for *q* = 1. Choosing very large values of *q* results in distance values that are mainly determined by the outliers, yielding a distance that is close to the largest distance between the trajectories determined time-point by time-point. For the given example, this maximum distance is given by ∣*x*_{2}(40) − *x*_{1}(40)∣ = 50. For example, for *q* = 100 one obtains a distance value *D*_{12}(100) = 47.74 that is close to this maximum.

*q* in Equation 4 thus controls how robust the constructed metric in physical space is against outliers. For the limit *q* → ∞ (infinity norm), the distance is determined by the time point(s) with the largest spatial difference between the trajectories over the entire time interval [0, *T*]. Small values *q* < 2 correspond to distance measures that average over the entire time course of the trajectory and are more robust against outliers than the Euclidean norm.
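The effect of q can be reproduced with a toy trajectory pair: a linear trend, a constant offset of 10, and a single outlier with pointwise difference 50. A single outlier stands in for the cluster of outliers in Figure 4, so the exact distance values differ from the ones quoted above.

```python
import numpy as np

def q_distance(x1, x2, q):
    """Time-averaged q-norm distance between two sampled trajectories."""
    diff = np.abs(x1 - x2)
    return np.mean(diff ** q) ** (1.0 / q)

t = np.arange(100)
x1 = 0.5 * t                    # linear trend
x2 = x1 + 10.0                  # constant offset of 10
x2_out = x2.copy()
x2_out[40] += 40.0              # a single outlier: |x2(40) - x1(40)| = 50

# Without outliers the distance equals the offset for every q >= 1:
assert np.isclose(q_distance(x1, x2, 1), 10.0)
assert np.isclose(q_distance(x1, x2, 2), 10.0)

# With the outlier, larger q weights the outlier more heavily, and very
# large q approaches the maximum pointwise difference (here 50):
d1 = q_distance(x1, x2_out, 1)      # stays close to the offset
d2 = q_distance(x1, x2_out, 2)      # pulled up by the outlier
d100 = q_distance(x1, x2_out, 100)  # close to the maximum of 50
assert d1 < d2 < d100 <= 50.0
```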

*λ* and *q*. By comparing the stimulus configurations in these physical spaces with the ones in perceptual space, we sought a physical distance that closely approximates the perceptual metric.

_{P,random}⟩, and standard deviations, SD(*d*_{P,random}), of the similarity measures were computed. Values of a *d*′ equivalent were computed according to the relationship

*d*′ = (⟨*d*_{P,random}⟩ − *d*_{P,data}) / SD(*d*_{P,random})

Statistical significance was assessed by a *t* test for this *d*′ equivalent.
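The chance baseline can be sketched by comparing a recovered configuration against many random configurations: compute the Procrustes distance to each random point set, then express the data value as a d′-like z-score (chance mean minus data value, over the chance SD). The uniform random sampling, the stand-in configurations, and the use of SciPy's `procrustes` disparity are assumptions of this sketch.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Stand-ins: a reference configuration (e.g., morph space) and a noisy
# recovered configuration of 7 points in 2D.
reference = rng.random((7, 2))
data = reference + 0.05 * rng.standard_normal((7, 2))

# Procrustes distance (disparity) of the data configuration to the reference.
_, _, d_data = procrustes(reference, data)

# Distances of 100 random configurations to the same reference.
d_random = np.array([procrustes(reference, rng.random((7, 2)))[2]
                     for _ in range(100)])

# d'-like score: how many SDs the data distance lies below the chance mean.
d_prime = (d_random.mean() - d_data) / d_random.std(ddof=1)
```

A large positive `d_prime` indicates that the recovered configuration is far more similar to the reference than random configurations are.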

*d*_{P} = 0.41; *d*′ equivalent 8.15; *p* < 0.001), cf. Table 1. All data were also analyzed using the coefficient of congruence (Borg & Groenen, 1997) as another measure for the similarities of the recovered configurations. The results of this analysis (not shown) are highly consistent with the results obtained for the Procrustes distance.

Perceptual vs. morph space

| | *d*_{P} data | *d*_{P} random | *d*′ equivalent | *t*_{99} | *p* |
| --- | --- | --- | --- | --- | --- |
| Experiment 1 | 0.41 | 1.1 ± 0.09 | 8.15 | 81.5 | <0.001 |
| Control experiment (L configuration) | 0.47 | 0.63 ± 0.01 | 1.62 | 16.2 | <0.001 |
| Experiment 2: 2D embedding space | 0.15 | 0.95 ± 0.31 | 2.61 | 26.1 | <0.001 |
| Experiment 2: 3D embedding space | 0.51 | 0.91 ± 0.04 | 9.74 | 97.3 | <0.001 |

*λ* and *q* in Equation 4. Figure 5B shows that the best agreement between the configurations in physical and perceptual space, measured by the Procrustes distance, was obtained for *q* = 1. This implies that the perceptual similarity of movements is characterized by an integration over time that is robust against outliers, rather than by an error measure that emphasizes the contribution of individual time points where the spatial distances between trajectories are large. For small values of *λ,* the alignment error levels off towards large values of *q*. This behavior might be explained by the fact that above a certain critical value of *q,* the distance measure is dominated by the maximum differences over the whole time course, so that further increases of *q* result only in minor additional changes.

*λ* and attains a minimum for *λ* = 0. This implies that physical distances containing an extra term that measures overall timing differences between the two trajectories resulted in worse fits of the perceptual metrics. The best approximation was achieved with distances that, apart from prior time alignment, depend only on spatial displacements between the trajectories time-point by time-point. This does not imply that timing information, in general, is irrelevant for the perceptual metric. It just shows that *overall timing changes* that affect all joints synchronously have only a moderate influence on the perceptual metric. Temporal variations that affect the relative timing between different joints cannot be modeled by simple time alignment (as defined by the function *τ*(*t*)) and are thus captured by spatial differences in the first term of the distance measure (Equation 4).

*τ*(*t*) ≡ *t*; dashed curve). The perceptual system thus seems to compensate efficiently for time warping between the trajectories. The open diamonds in Figure 5A indicate the reconstructed configuration in physical space for the optimal parameter values (*q* = 1, *λ* = 0) after Procrustes alignment with the perceptual configuration. The two configurations are extremely similar, as confirmed by the very small and highly significant Procrustes distance between them (*d*_{P} = 0.05; *d*′ equivalent 3.3; *p* < 0.001; cf. Table 2 for further details). The structure of the perceptual metrics can therefore be approximated very accurately by a physical metric that is based on the distances between joint trajectories.

Physical vs. perceptual space

| | *d*_{P} data | *d*_{P} random | *d*′ equivalent | *t*_{99} | *p* |
| --- | --- | --- | --- | --- | --- |
| Experiment 1 | 0.05 | 0.97 ± 0.28 | 3.3 | 32.9 | <0.001 |
| Control experiment (L configuration) | 0.35 | 1.03 ± 0.31 | 2.15 | 21.5 | <0.001 |
| Experiment 2: distance of 2D trajectories (2D embedding space) | 0.61 | 1.02 ± 0.31 | 1.3 | 13.0 | <0.001 |
| Experiment 2: distance of 3D trajectories (2D embedding space) | 0.23 | 1.05 ± 0.27 | 3.0 | 30.0 | <0.001 |
| Experiment 2: distance of 3D trajectories (3D embedding space) | 0.26 | 0.91 ± 0.06 | 11.7 | 117 | <0.001 |

*λ* = 0 and *q* = 1 (Figure 6B; Table 2). The inset in Figure 6B shows the comparison between the alignment errors for trajectory distances for *λ* = 0 with (solid line) and without time warping, i.e., for *τ*(*t*) ≡ *t* (dashed line). Best alignment was obtained for a physical distance with time warping and *q* = 1. This confirms the results of Experiment 1 for another configuration in morphing space.

*two-dimensional* perceptual space (assuming two embedding dimensions for the MDS procedure) is shown in Figure 7A. Crosses with identical color indicate different views of the same motion pattern (combination of morph weights). Different views of the same motion are clustered in perceptual space. This indicates that subjects effectively ignored the view angle in their similarity judgments. Open circles with matching colors indicate the centroids of the view clusters. The centroids define a configuration that is very similar to the triangular configuration in morphing space (open diamonds), as confirmed by a small and highly significant Procrustes distance between the two configurations (*d*_{P} = 0.15; *d*′ equivalent 2.6; *p* < 0.001), cf. Table 1. Thus, results very similar to those of Experiment 1 were obtained for stick figures and motion morphs that had been generated from three-dimensional trajectories. As in Experiment 1, the recovered configuration in perceptual space shows gradual deformations compared to the configuration in weight space.

*three-dimensional* configuration in perceptual space is shown in Figure 7B (red spheres), aligned with the configuration of the stimuli in a three-dimensional space (blue spheres) that is defined by the morphing weights (two independent dimensions) and the view angle (third dimension). Along the first two dimensions, the recovered configuration is very similar to the configuration in the space of morph weights. This supports the relevance of these two dimensions for the underlying perceptual representation. Along the third recovered dimension, which is aligned with the view angle dimension of the stimuli, the data do not show any systematic ordering. This explains why the Procrustes distance between the original and the recovered configuration (*d*_{P} = 0.51; *d*′ equivalent 9.74; *p* < 0.05) is larger than for the two-dimensional configurations. This result also confirms that subjects effectively ignored the viewpoint of the moving figure, as required by the task.

*q* = 1 and *λ* = 0. Figure 8A shows the recovered two-dimensional configuration in physical space using the same conventions as in Figure 7A. In contrast to the psychophysical data, points representing different view angles of the same motion are widely scattered in the recovered two-dimensional configuration. The centroids of the points belonging to the same actions are indicated by circles. The Procrustes distance between these centroids and the corresponding points of the configuration in perceptual space is large, although still significant (Table 2). This result indicates that distances between the two-dimensional joint trajectories are not suitable for reproducing the perceptual metrics of motion recognition in the presence of viewpoint changes.

*d*_{P} = 0.23; cf. Table 2). This indicates that the perceptual metric may be based on the three-dimensional distances between joint trajectories.

*q* = 1 and *λ* = 0), aligned with a three-dimensional space defined by the morphing weights and the view angle, is shown in Figure 8C. Similar to the corresponding configuration in perceptual space (Figure 7B), the configuration in physical space shows clustering of different views of the same movement along the dimensions that are aligned with the morphing weights. However, compared to Figure 8B, the recovered configuration shows a clear ordering along the third dimension, which is aligned with the view angle. This results in a very good alignment between the configurations in morphing space and physical space (*d*_{P} = 0.23; see Table 2). This good alignment along the view-angle dimension contrasts with the perceptual data (Figure 7B), where no such ordering was observed, causing significantly worse alignment of the two configurations (*d*_{P} = 0.51; see above).

*d*_{P} = 0.26; Table 2).

*veridical* in that they closely reflect the metric of movements in the physical world. This finding was insensitive to the details of movement generation (e.g., based on morphing two- or three-dimensional trajectories) and to the manner of visual stimulus presentation (stick figures vs. point-light stimuli). In addition, comparable results were obtained for two completely different experimental paradigms: comparison of pairs and delayed match-to-sample.

*λ* = 0), implying that the distance (except for time alignment) depends only on spatial differences; and Equation 5 involved a robust average of the differences over the entire time course (parameter *q* = 1). The high similarity between configurations in perceptual and physical space (for optimized parameter values) shows that perceptual representations of complex body movements faithfully reflect the physical similarities of joint movement trajectories, when measured with the right distance measure.

*superior temporal sulcus* patterns (Giese & Poggio, 2003; Peuskens, Vanrie, Verfaillie, & Orban, 2005). This view has been challenged by the alternative hypothesis that biological motion recognition exploits exclusively form information, local motion information being essentially irrelevant except for segmentation (Lange & Lappe, 2006). This alternative view seems difficult to reconcile with recent experiments demonstrating biological motion recognition from stimuli that prevent the extraction of form information from individual frames (Singer & Sheinberg, 2008), and with the fact that the most informative features for the detection of point-light walkers seem to coincide with dominant motion features, rather than with the most informative body shapes (Casile & Giese, 2005; Thurman & Grossman, 2008). However, it reiterates the importance of the question of how form and motion features influence the perceptual metric.

**X** be the matrix of positions of the recovered configuration and **Y** the matrix of the corresponding points in morphing space, where the coordinates of each point are given by the corresponding row of the matrix. The *orthonormal* Procrustes transformation determines a linear transformation, defined by the scaling factor *s* and the orthonormal matrix **T** with **T**′**T** = **I**, that minimizes the error function

*E*(*s*, **T**) = ∥**X** − *s***Y****T**∥_{F},  (A1)

where ∥·∥_{F} is the Frobenius norm. It can be shown (Borg & Groenen, 1997) that this optimum is given by the matrix **T** = **U****V**′. The last two matrices are defined by the singular value decomposition **Y**′**X** = **U****Σ****V**′, with the diagonal matrix **Σ** and two orthonormal matrices fulfilling **U**′**U** = **I** and **V**′**V** = **I**.

*γ* = (*s*_{1}/*s*_{2}) with the solution of an orthonormal Procrustes problem for fixed *γ*. This problem has the same form as Equation A1 and can be solved by singular value decomposition. The solution of this combined optimization problem defines a Procrustes transformation whose matrix is *orthogonal* but *not orthonormal*.