Biological motion contains information about the identity of an agent as well as about his or her actions, intentions, and emotions. The human visual system is highly sensitive to biological motion and capable of extracting socially relevant information from it. Here we investigate how such information is encoded in biological motion patterns and how it can be retrieved. A framework is developed that transforms biological motion into a representation allowing for analysis using linear methods from statistics and pattern recognition. Using gender classification as an example, simple classifiers are constructed and compared to psychophysical data from human observers. The analysis reveals that the dynamic part of the motion contains more information about gender than motion-mediated structural cues. The proposed framework can be used not only for analysis of biological motion but also to synthesize new motion patterns. A simple motion modeler is presented that can be used to visualize and exaggerate the differences between male and female walking patterns.

We define an *action* as a pattern of motion characteristics described by quantities of which one or more change discontinuously at transitions to other actions. Instances of the same action can be smoothly transformed into each other, with all intermediate states being valid representations of the particular action. The definition implies structural similarity between instances of the same action and, therefore, a means to define correspondence in space and time between two or more instances in a canonical and unambiguous way. Systematic differences between motion instances of an action are referred to as *styles*. Styles can correspond to emotions, personality, or biological attributes such as age or gender. According to the above definitions, the stylistic space of an action is expected to be continuous and therefore defines smooth transitions between all instances of an action. Warping between actions, in contrast, requires the definition of additional constraints in order to achieve unambiguous correspondence.

A posture is written as the vector

p = (*m1*_{x}, *m1*_{y}, *m1*_{z}, *m2*_{x}, …, *m15*_{z})^{T}

(we take the transpose because we regard p to be a column vector).
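The flattening of the 15 markers into this 45-dimensional posture vector can be sketched as follows; the array contents and marker layout here are illustrative placeholders, not real motion capture data:

```python
import numpy as np

# A posture is the set of 15 marker positions at one instant in time.
# Flattening them in the fixed order (m1_x, m1_y, m1_z, m2_x, ..., m15_z)
# yields the 45-dimensional column vector p described in the text.
markers = np.arange(45, dtype=float).reshape(15, 3)  # 15 markers x (x, y, z)

p = markers.reshape(-1)          # row-major flatten: m1_x, m1_y, m1_z, m2_x, ...
assert p.shape == (45,)
assert p[3] == markers[1, 0]     # entry 4 is m2_x, following the ordering above
```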

with p_{i} denoting the *i*th principal component and *c*_{i} denoting the respective score. The walk vector w_{j} encoding all these parameters provides a full representation of an individual’s walking pattern p_{j}(*t*).
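The posture decomposition just described (an average posture plus eigenpostures weighted by time-varying scores) can be sketched numerically. The synthetic posture series below is built from three sinusoidal components, standing in for real motion capture data:

```python
import numpy as np

# Sketch: decompose a walker's posture time series P (T x 45) into an
# average posture plus principal components ("eigenpostures") with
# time-varying scores, as in p(t) = p0 + sum_i c_i(t) * p_i.
rng = np.random.default_rng(0)
T = 200
t = np.linspace(0, 4 * np.pi, T)
basis = rng.standard_normal((3, 45))              # three fixed "posture directions"
scores_true = np.stack([np.sin(t), np.sin(t + 1.2), np.sin(2 * t + 0.4)], axis=1)
P = 5.0 + scores_true @ basis                     # T x 45 posture matrix

p0 = P.mean(axis=0)                               # average posture
U, S, Vt = np.linalg.svd(P - p0, full_matrices=False)
eigenpostures = Vt                                # rows are principal components p_i
c = (P - p0) @ Vt.T                               # scores c_i(t)

# A rank-3 reconstruction recovers the series essentially perfectly here,
# mirroring the finding that a few components suffice for walking data.
P_hat = p0 + c[:, :3] @ eigenpostures[:3]
assert np.allclose(P, P_hat)
```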

The average correlation between the mean postures p_{i,0} and p_{j,0} of two walkers *i* and *j* is 0.998. The corresponding numbers for the first four principal components are 0.95, 0.88, 0.85, and 0.73, respectively. These high correlations show that the components principally encode similar aspects of the walk across walkers while still representing the individual differences between them.

This representation allows us to treat the walk vector w_{j} of a walker *j* as a point in a linear space of the same dimension and, thus, to apply linear methods. Even though the dimensionality of this description is tremendously reduced compared to the original motion capture data, 229 is still a large number of variables for a concise and compact model. In particular, for the purpose of constructing linear classifiers that generalize reasonably to new walking samples, we have to reduce the dimensionality to a number considerably smaller than the number of items in the data set. To reduce redundancy within the set of 40 walkers that make up our database, we computed a PCA across walkers. In contrast to the similar computation on the level of the postures of a single walker, the problem arises that the entries of a walk vector w_{j} are not homogeneous. Whereas most of the entries encode positions (e.g., in millimeters), one entry encodes the fundamental frequency (e.g., in Hz) and three more account for the phases of the principal components (e.g., in degrees). PCA is very sensitive to relative scaling: its outcome would differ depending on whether the phases were given in radians or in degrees, or whether the positional measures were in millimeters or centimeters. We therefore whitened the data by dividing each entry by the standard deviation based on the 40 corresponding entries before subjecting the data to PCA:

W′ = diag(u)^{−1} (W − W_{0})

with W = (w_{1}, w_{2}, …, w_{40}). u is a vector containing the 229 standard deviations computed from the rows of W, and W′ is the resulting whitened data matrix.

W_{0} denotes a matrix with the average walker w_{0} in each of its 40 columns. The matrix V containing the eigenwalkers as column vectors v_{i} is obtained by pre-multiplying the matrix V′ containing the eigenvectors of the covariance matrix of W′ with diag(u), thereby multiplying each entry by the corresponding standard deviation of that element:

V = diag(u) V′
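A minimal numerical sketch of this whitening-and-rescaling step, with synthetic stand-ins for the 229 × 40 data matrix (the sizes follow the text; the data do not):

```python
import numpy as np

# Whitening: divide each of the 229 rows of W (one column per walker) by
# its standard deviation before PCA, then rescale the eigenvectors
# afterwards (V = diag(u) V'), so the mixed units (mm, Hz, deg) no longer
# dominate the decomposition.
rng = np.random.default_rng(1)
W = rng.standard_normal((229, 40)) * rng.uniform(0.1, 100.0, size=(229, 1))

w0 = W.mean(axis=1, keepdims=True)        # average walker
u = W.std(axis=1)                         # per-row standard deviations
W_prime = (W - w0) / u[:, None]           # whitened data matrix

# PCA of the whitened data via SVD: the left singular vectors of W' are
# the eigenvectors of its covariance matrix.
Uw, S, _ = np.linalg.svd(W_prime, full_matrices=False)
V_prime = Uw                              # eigenvectors of cov(W') as columns
V = u[:, None] * V_prime                  # eigenwalkers: V = diag(u) V'

assert V.shape == (229, 40)
assert np.allclose(W_prime.std(axis=1), 1.0)   # every row now has unit scale
```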

Each walker *j* can now be represented in the space spanned by the first *n* eigenwalkers V_{n} = (v_{1}, v_{2}, …, v_{n}) in terms of the respective score vector k_{j} = (*k*_{1j}, *k*_{2j}, …, *k*_{nj})^{T}. The dimensionality of this representation (i.e., the number of eigenwalkers used) can be treated flexibly depending on the particular requirements of the application. With increasing dimensionality, the representation becomes more accurate in terms of its reconstruction quality. On the other hand, a large ratio between the dimensionality and the number of items available for learning invariants becomes unfavourable for classification purposes.
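The score representation and its reconstruction can be sketched as follows. Because the rescaled eigenwalkers are no longer orthogonal, the sketch obtains the scores with a pseudo-inverse; this is an assumption about the projection, and the quantities below are synthetic stand-ins for w_{0} and V:

```python
import numpy as np

# Representing a walker by its scores on the first n eigenwalkers and
# reconstructing it: w ≈ w0 + V_n k.
rng = np.random.default_rng(2)
w0 = rng.standard_normal(229)
V = np.linalg.qr(rng.standard_normal((229, 40)))[0] * rng.uniform(1, 5, size=(229, 1))

n = 10
Vn = V[:, :n]
w = w0 + Vn @ rng.standard_normal(n)      # a walker lying in the n-dim subspace

k = np.linalg.pinv(Vn) @ (w - w0)         # score vector k_j via pseudo-inverse
w_hat = w0 + Vn @ k                       # reconstruction from n scores
assert np.allclose(w, w_hat)              # exact, since w lies in the subspace
```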

The average posture *p*_{0} can be regarded as encoding structural information, comprising both the lengths of the body’s segments and their average positions. The eigenpostures, in contrast, encode dynamic information. Using different sorts of input information, we tested (1) how the two classes separate and (2) how a linear classifier based on a linear discriminant function generalizes to new instances that have not been used for training.

x_{j} denotes a column vector with the data of a particular walker used as input for classification. Accordingly, X = (x_{1}, x_{2}, …, x_{m}) is the matrix containing the data set of *m* = 40 different walkers. x_{j} can stand for the whole walker representation (x_{j} = w_{j}) or only for parts of it, for instance, only the structural or only the dynamic part of the representation. The row vector r contains the expected output of the classifier. It has *m* entries, with *r*_{j} = 1 if walker *j* is a man and *r*_{j} = −1 if the walker is a woman.
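With X and r defined this way, a linear discriminant can be fitted by least squares. This is one standard way to obtain such a classifier and may differ in detail from the authors' exact fitting procedure; the two-class data below are synthetic:

```python
import numpy as np

# Least-squares linear discriminant on labels r_j = ±1: find c such that
# c (X - x0) ≈ r, then classify by the sign of the projection.
rng = np.random.default_rng(3)
X_men = rng.standard_normal((20, 5)) + 1.0
X_women = rng.standard_normal((20, 5)) - 1.0
X = np.vstack([X_men, X_women]).T          # columns are walkers, as in the text
r = np.concatenate([np.ones(20), -np.ones(20)])

x0 = X.mean(axis=1, keepdims=True)
c = r @ np.linalg.pinv(X - x0)             # discriminant: c (X - x0) ≈ r

predictions = np.sign(c @ (X - x0))
error_rate = np.mean(predictions != r)
assert error_rate < 0.2                    # well-separated synthetic classes
```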

X_{0} denotes a matrix with the average input data x_{0} in each column. The matrix V contains the principal components as column vectors v_{i}, and K denotes a matrix containing the scores, similar to the notation used above.

Component *i* no longer is the *i*th principal component but is the component with the *i*th highest weight in the discriminant function c. We then evaluated the ability of discriminant functions of increasing dimension *n* to separate male and female walks. A walker *j* was considered to be classified correctly if the projection of its data onto the discriminant function had the same sign as *r*_{j}; otherwise, *j* was considered to be misclassified. The dotted lines in Figure 4 depict the percentage of misclassifications as a function of *n*. If *n* is large, separation is perfect due to the mismatch between the number of items to be classified and the dimensionality of the space. Depending on the information provided for the classification, perfect separation is reached at dimensions between *n* = 4 and *n* = 14.

For the generalization test, each walker in turn was removed from the data set, a classifier was trained on the remaining 39 walkers, and the left-out walker was projected onto the resulting discriminant function using the first *n* components. Classification was considered to be correct if the projection had the expected sign. The same procedure was repeated with all 40 walkers. The results are plotted as a function of *n* in Figure 4 (solid lines). Typically, misclassification in the generalization test reaches a minimum when *n* is about the size needed to achieve perfect separation in the previous step. If the dimension of the classifier gets much higher, the error increases slightly due to overlearning.
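The leave-one-out generalization test can be sketched as follows, reusing the least-squares discriminant from above on synthetic two-class data (the error values are illustrative, not the paper's results):

```python
import numpy as np

# Leave-one-out: for each walker, fit the discriminant on the remaining
# m-1 walkers and classify the held-out one by the sign of its projection.
rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((20, 5)) + 1.0,
               rng.standard_normal((20, 5)) - 1.0]).T   # 5 x 40, columns = walkers
r = np.concatenate([np.ones(20), -np.ones(20)])

errors = 0
for j in range(X.shape[1]):
    keep = np.arange(X.shape[1]) != j
    x0 = X[:, keep].mean(axis=1, keepdims=True)
    c = r[keep] @ np.linalg.pinv(X[:, keep] - x0)       # train without walker j
    if np.sign(c @ (X[:, j:j + 1] - x0))[0] != r[j]:    # test on walker j
        errors += 1

loo_error = errors / X.shape[1]
assert 0.0 <= loo_error < 0.25
```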

We computed the size *s*_{j} of each walker *j* by finding the least-squares solution to the equation p_{j,0} = *s*_{j} p_{0}, with p_{j,0} being the average posture of walker *j* and p_{0} the overall average posture. Using just *s*_{j} as input for linear classification, only 5 walkers are misclassified, corresponding to an error rate of 12.5%.
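The scalar least-squares size estimate has a simple closed form, s_{j} = (p_{0} · p_{j,0}) / (p_{0} · p_{0}). A sketch with a synthetic "slightly larger" walker (the 1.07 scale factor and noise level are illustrative assumptions):

```python
import numpy as np

# One-parameter least squares: s_j minimizes ||p_j0 - s_j * p0||^2,
# which gives s_j = (p0 . p_j0) / (p0 . p0).
rng = np.random.default_rng(5)
p0 = rng.uniform(100, 1000, size=45)       # overall average posture (e.g., mm)
p_j0 = 1.07 * p0 + rng.normal(0, 1, 45)    # a walker about 7% larger, plus noise

s_j = p0 @ p_j0 / (p0 @ p0)
assert abs(s_j - 1.07) < 0.01              # recovers the size factor
```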

To normalize for size, for each walker *j* the average posture p_{j,0} as well as the four eigenpostures p_{j,1}, p_{j,2}, p_{j,3}, and p_{j,4} were divided by *s*_{j}.

The full representation contains structural information in terms of the average posture p_{j,0} and dynamic information in terms of the principal components p_{j,1}, p_{j,2}, p_{j,3}, and p_{j,4}, their respective phases, and the fundamental frequency. In order to evaluate the roles of structural and dynamic information, we submitted only the respective parts of the full representation to the classifier. Figure 4c shows the results obtained from training and testing the classifier with data that contain, for each walker, only the 45 entries of p_{j,0}. Performance in this case is not very good: 12 components are needed for complete separation, and the best generalization performance, with 11 misclassifications (27.5%), also requires 12 components.

If only the phases *ϕ*_{2}, *ϕ*_{3}, and *ϕ*_{4} are used as input for classification, best separation still produces 14 misclassifications (35%), and best generalization is obtained with 2 components and 15 misclassifications (37.5%).

For synthesis, a point w_{j} in the walking space has to be decomposed into its constituting components p_{j,0}, p_{j,1}, p_{j,2}, p_{j,3}, p_{j,4}, *ω*_{j}, *ϕ*_{j,2}, *ϕ*_{j,3}, and *ϕ*_{j,4}. The walk, explicitly described as a time series of postures, is then given by Equation 3.

We rendered walkers w_{c,α} corresponding to different points along this axis as point-light displays or stick-figure animations. Demonstration 1 allows you to visualize and to interactively manipulate a walker display by changing the value of *α*:

w_{c,α} = w_{0} + *α* V c

Here, w_{0} denotes the average walker and the matrix V contains the first few eigenwalkers, one in each column. As *α* changes from negative to positive values, the apparent gender of the walker changes. The dimensionality of the eigenwalker space used to compute the respective linear classifiers is *n* = 10. The value of *α* is scaled in terms of standard deviations (z-scores). A walker resulting from setting *α* = 6 or *α* = −6 is therefore an extrapolation into a region of the walker space far away from any real walker. Nevertheless, changing the value of *α* from negative to positive evokes a clear percept of a change in the gender of the walker.
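The gender-axis synthesis reduces to simple linear algebra: step along the discriminant direction c within the eigenwalker space, scaled by α. A sketch with synthetic stand-ins for w_{0}, V, and c:

```python
import numpy as np

# Walker synthesis along the gender axis: w(alpha) = w0 + alpha * V @ c,
# with alpha in z-score units. alpha = 0 is the average walker; |alpha| = 6
# extrapolates far beyond any real walker.
rng = np.random.default_rng(6)
w0 = rng.standard_normal(229)              # average walker (synthetic)
V = rng.standard_normal((229, 10))         # first n = 10 eigenwalkers (synthetic)
c = rng.standard_normal(10)
c /= np.linalg.norm(c)                     # unit gender axis in score space

def walker(alpha):
    """Walker representation alpha standard deviations along the gender axis."""
    return w0 + alpha * (V @ c)

assert np.allclose(walker(0.0), w0)                          # average walker
assert np.allclose(walker(3.0) - w0, -(walker(-3.0) - w0))   # mirrored around w0
```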

A second stimulus set (“structure-only”) was generated by combining each individual’s average posture p_{j,0} with averaged motion data. The latter was obtained by computing averaged eigenpostures p_{1}, p_{2}, p_{3}, and p_{4} as well as average values for the phases *ϕ*_{2}, *ϕ*_{3}, and *ϕ*_{4} and for the fundamental frequency *ω*. The components were then combined with the individual average postures according to Equation 3. These stimuli are therefore normalized with respect to dynamic information and contain only structural information to be used for gender classification. Finally, a third set (“dynamic-only”) was generated by replacing each individual’s average posture p_{j,0} with the average across all walkers’ postures, therefore normalizing for the structural information and providing only dynamic information.

Both main effects were significant (*p* < .001; INFO: F(2,42) = 29.3, *p* < .001). Performance is best, with error rates around 25%, when a walker is seen in frontal view, and declines gradually with increasing deviation from that viewing angle. The effects with respect to the information provided are such that depriving observers of diagnostic structural information hardly impairs performance, whereas depriving them of dynamic information results in a severe drop in performance. A Scheffé post-hoc test confirms that the difference between performance in the structure-only condition and the other two conditions is statistically reliable (*p* < .01).

The interaction between the two factors was also significant (*p* < .05), indicating that deprivation of dynamic information has a much stronger effect when the walker is shown in profile view than in frontal view. In the profile-view condition, performance drops from an error rate of 39% in the full-info condition all the way down to chance level (52% error rate) in the structure-only condition. In the frontal-view condition, the error rate increases from 24% in the full-info condition to 29% in the structure-only condition. The relatively small difference between full-info and structure-only performance with frontal-view stimuli is still statistically significant (paired *t* test: *n* = 8, *p* < .05).

*Z*_{j} is a measure of how well a walker *j* (represented in terms of the scores *k*_{i,j}) with gender *r*_{j} was classified by the linear classifier c. Table 1 lists the coefficients of a rank correlation between the three rank orders obtained from the psychophysical data and the rank orders obtained for classifiers corresponding to the data presented in Figures 4a–4d and 4i. *n* indicates the number of eigenwalkers used to construct the classifier; *n* was always chosen to yield optimal generalization performance (see “Linear Gender Classification” for details). Correlation coefficients larger than 0.373 are significant (*α* = 0.01).

Table 1. Rank correlations between classifier rankings and psychophysical rankings (columns: psychophysics conditions).

| Linear classifier | Full-info | Structure-only | Dynamic-only |
|---|---|---|---|
| Full-info plus size (n = 4) | 0.2820 | −0.0379 | 0.4473 |
| Full-info (n = 6) | 0.5158 | 0.1970 | 0.6525 |
| Structure-only (n = 12) | 0.5114 | 0.4595 | 0.3760 |
| Dynamic-only (n = 4) | 0.3484 | 0.0250 | 0.5602 |
| First, second, and third eigenposture (n = 24) | 0.3773 | 0.1008 | 0.5353 |
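A rank correlation of the kind used for Table 1 can be computed as the Pearson correlation of rank-transformed scores (Spearman's coefficient). A minimal sketch; the two score sequences below are illustrative, not the actual *Z*_{j} values:

```python
import numpy as np

# Spearman rank correlation: rank both sequences, then take the Pearson
# correlation of the ranks. (Valid for distinct values, as used here.)
def spearman(a, b):
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return (ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb))

classifier_scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])   # hypothetical Z_j
human_scores = np.array([0.8, 0.2, 0.4, 0.9, 0.1])        # hypothetical ratings
rho = spearman(classifier_scores, human_scores)
assert abs(rho - 0.8) < 1e-9
```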

If *ϕ*_{2} (i.e., the phase difference between the sine functions describing the temporal behavior of the first and the second eigenposture) were exactly 90 deg, and if the same were true for the difference between *ϕ*_{3} and *ϕ*_{4}, then the four-dimensional PCA decomposition would be similar to a second-order Fourier decomposition. Both decompositions are based on the same model:

p(*t*) = p_{0} + p_{1} sin(*ωt*) + p_{2} sin(*ωt* + *ϕ*_{2}) + p_{3} sin(2*ωt* + *ϕ*_{3}) + p_{4} sin(2*ωt* + *ϕ*_{4})
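The equivalence at a 90-deg phase difference can be checked numerically: a pair of sines offset by 90 deg is exactly a sine/cosine Fourier pair at the same frequency. The amplitudes below are illustrative:

```python
import numpy as np

# If phi2 = 90 deg, then p1 sin(wt) + p2 sin(wt + phi2) equals the
# first-harmonic Fourier pair a sin(wt) + b cos(wt), since
# sin(x + 90 deg) = cos(x). The same holds for the second harmonic.
t = np.linspace(0, 2 * np.pi, 500)
w = 1.0
p1, p2 = 2.0, 0.7                          # illustrative component amplitudes

pca_pair = p1 * np.sin(w * t) + p2 * np.sin(w * t + np.pi / 2)
fourier_pair = p1 * np.sin(w * t) + p2 * np.cos(w * t)
assert np.allclose(pca_pair, fourier_pair)
```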

Whereas PCA considers the eigenpostures p_{i} to be the basis and constrains them to be orthogonal, Fourier analysis considers the sine functions to be an orthogonal basis and therefore requires *ϕ*_{2} and also *ϕ*_{4} − *ϕ*_{3} to equal 90 deg. Both constraints can, in general, not be satisfied at the same time. It is therefore interesting that the temporal behavior of the orthogonal basis constituted by the first four eigenpostures approximates a Fourier decomposition to a very high degree. In fact, both *ϕ*_{2} and *ϕ*_{4} − *ϕ*_{3} assume values very close to 90 deg (*ϕ*_{2}: mean 91, SD 5.3; *ϕ*_{4} − *ϕ*_{3}: mean 91, SD 3.8).

