Abstract
Natural body movements arise in the form of temporal sequences of individual actions. To realize a visual analysis of these actions, the visual system must accomplish a temporal segmentation of such action sequences. Previous work has studied in detail the segmentation of sequences of piecewise-linear movements in the two-dimensional plane [1,2]. In our study, we compared statistical approaches for the segmentation of human full-body movement with human responses. Video sequences were generated from synthesized sequences of natural actions based on motion capture, using appropriate methods for motion blending. Human segmentation was assessed with an interactive adjustment paradigm in which participants indicated segmentation points by selecting the relevant frames. These psychophysical data were compared against different segmentation algorithms, which were based either (1) on the 3D joint trajectories that were used for the synthesis of the motion stimuli, or (2) on the two-dimensional optic flow computed from the videos; the latter computation exploited a physiologically inspired neural algorithm for optic flow estimation [3]. Simple segmentation methods, e.g. those based on discontinuities in path direction or speed, were compared with an optimal Bayesian action-segmentation approach from machine learning. This method is based on a generative classifier (naive Bayes or HMM); transitions between classes (types of actions) were modeled by resetting the class priors at the change points, and change-point configurations were modeled by Bayesian binning [4]. By optimization within a Bayesian framework, the number and lengths of the individual action segments were determined automatically. The performance of these different algorithmic methods was compared with human performance.
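The simplest trajectory-based methods place candidate segmentation points at discontinuities in speed, e.g. at pronounced speed minima of the joint trajectories. The following is a minimal illustrative sketch of this idea only; the function name and the threshold criterion are hypothetical and not the exact procedure used in the study:

```python
import numpy as np

def segment_by_speed_minima(traj, dt=1.0, threshold=0.5):
    """Propose change points at pronounced speed minima.

    traj: (T, D) array of joint coordinates over T frames.
    Returns frame indices that are local minima of the speed profile
    and lie below `threshold` times the mean speed (an illustrative
    criterion, not the one from the study).
    """
    vel = np.diff(traj, axis=0) / dt        # frame-to-frame velocity, (T-1, D)
    speed = np.linalg.norm(vel, axis=1)     # scalar speed per frame transition
    cutoff = threshold * speed.mean()
    return [
        t for t in range(1, len(speed) - 1)
        if speed[t] < speed[t - 1]          # strictly below previous frame
        and speed[t] <= speed[t + 1]        # not rising before the minimum
        and speed[t] < cutoff               # pronounced, not just noise
    ]
```

A direction-discontinuity variant would analogously threshold the angle between successive velocity vectors; the Bayesian binning approach [4] replaces such local heuristics with a global probabilistic model over change-point configurations.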
[1] Shipley et al., JOV, 4(8), 2004.
[2] Agam & Sekuler, JOV, 8(1), 2008.
[3] Bayerl & Neumann, IEEE PAMI 29(2), 2007.
[4] Endres et al., NIPS 20, 2008.
Funded by the EC FP7 project SEARISE, DFG, and Herman Lilly Schilling Foundation.