Human observers readily recognize emotions expressed in body movement. Their perceptual judgments are based on simple movement features, such as overall speed, but also on more intricate posture and dynamic cues. The systematic analysis of such features is complicated due to the difficulty of considering the large number of potentially relevant kinematic and dynamic parameters. To identify emotion-specific features we motion-captured the neutral and emotionally expressive (anger, happiness, sadness, fear) gaits of 25 individuals. Body posture was characterized by average flexion angles, and a low-dimensional parameterization of the spatio-temporal structure of joint trajectories was obtained by approximation with a nonlinear mixture model. Applying sparse regression, we extracted critical emotion-specific posture and movement features, which typically depended only on a small number of joints. The features we extracted from the motor behavior closely resembled features that were critical for the perception of emotion from gait, determined by a statistical analysis of classification and rating judgments of 21 observers presented with avatars animated with the recorded movements. The perceptual relevance of these features was further supported by another experiment showing that artificial walkers containing only the critical features induced high-level after-effects matching those induced by adaptation with natural emotional walkers.

*facial action coding system* that describes the production of distinct emotional expressions based on local components that were originally derived from patterns of muscle contraction (Ekman & Friesen, 1978). More recently, unsupervised-learning techniques, such as principal component analysis (PCA) or independent component analysis (ICA), have been applied to determine components of facial expressions for dimension reduction and to identify features that are critical for the recognition of faces (Bartlett, Movellan, & Sejnowski, 2002; Hancock, Burton, & Bruce, 1996; Turk & Pentland, 1991; Valentin, Abdi, Edelman, & O'Toole, 1997) and of facial emotion expression (Calder, Burton, Miller, Young, & Akamatsu, 2001). Last but not least, it has also been shown that dynamic cues contribute to the recognition of facial expressions (Bassili, 1978; O'Toole, Roark, & Abdi, 2002). For example, recognition performance is influenced by the speed at which an expression unfolds (Kamachi et al., 2001).

- First, applying machine learning methods, we extracted informative features from the joint-angle trajectories of emotional gaits that were recorded by motion capture from participants that expressed different emotions during walking.
- In a second step, we analyzed how the features that we had extracted from the motor behavior are related to features that determine the *perception* of emotional gaits. For this purpose, we conducted a perception experiment during which human observers had to classify and rate the emotional expressiveness of computer-generated characters that were animated with the recorded trajectories of emotional gaits. The perceptual judgments were then subjected to statistical analysis in order to identify the most important posture and dynamic features that influenced the perceptual judgments.
- Since we found a high degree of overlap between the informative features extracted from the motor behavior and those features that determine perceptual judgments, in a third experiment we exploited *high-level after-effects* to test whether the extracted feature set truly corresponded to critical features driving the perception of the individual emotions.

*instantaneous mixture model* of the form

$$x_i(t) = \sum_j \alpha_{ij}\, s_j(t) \qquad (1)$$

Here, the joint-angle trajectories $x_i(t)$ are approximated by linear superpositions of the source signals or basis components $s_j$, weighted by the mixing weights $\alpha_{ij}$. The same mixing model also underlies PCA, where the source signals are orthogonal rather than statistically independent (Cichocki & Amari, 2002; Jolliffe, 2002).

*anechoic mixing model* of the form

$$x_i(t) = \sum_j \alpha_{ij}\, s_j(t - \tau_{ij}) \qquad (2)$$

where the parameters $\tau_{ij}$ describe joint-specific time delays between source signals and joint angles. The introduction of such time delays has previously been shown to be beneficial for modeling electromyographically recorded patterns of muscle activation during coordinated leg movements (d'Avella & Bizzi, 2005). Previous work from our group shows that for different types of body movements, including gaits, the model described by Equation 2 results in more compact representations with fewer parameters than the model defined by Equation 1. Quantitative comparisons for gait data show, for example, that for the same level of accuracy PCA requires more than twice as many source terms as the proposed novel model (Omlor & Giese, 2007b), implying that the anechoic mixing model results in more compact representations than traditional approaches such as PCA or regular ICA. Our analysis was based on the hypothesis that more compact models, with fewer parameters, might also yield more interpretable learned emotion-specific features than models with many redundant parameters.
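The compactness argument can be illustrated with a toy example. The following is only a sketch under stated assumptions, not the blind source separation algorithm used in the study: it builds synthetic "joint angles" as scaled and delayed copies of a single sinusoidal source (Equation 2 with one source) and counts how many PCA components an instantaneous model (Equation 1) needs to describe the same data. The signal shape, the number of joints, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 6 "joints", one periodic source, joint-specific delays.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200, endpoint=False)   # one gait cycle, arbitrary units
n_joints = 6
alpha = rng.uniform(0.5, 2.0, size=n_joints)     # mixing weights alpha_i
tau = rng.uniform(0.0, 0.5, size=n_joints)       # time delays tau_i (fractions of a cycle)

def source(time):
    """Single source signal s(t): a sinusoid at the gait frequency."""
    return np.sin(2.0 * np.pi * time)

# Anechoic mixture with one source: x_i(t) = alpha_i * s(t - tau_i)
X = np.stack([a * source(t - d) for a, d in zip(alpha, tau)])

# Instantaneous model (PCA via SVD): count the components needed to explain the data.
singular_values = np.linalg.svd(X, compute_uv=False)
explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
n_pca = int(np.searchsorted(explained, 1.0 - 1e-10) + 1)

print("sources in anechoic model:", 1)
print("PCA components needed:    ", n_pca)
```

In this toy case a delayed sinusoid is a combination of a sine and a cosine, so the instantaneous model needs an extra component that the delay parameter absorbs in the anechoic model.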

$\alpha_{ij}$ and one time delay $\tau_{ij}$ per joint and source. This approach already resulted in the extraction of well-interpretable emotion-specific features from the movement trajectories. However, comparing across emotions, we noticed that the time delays estimated for a given joint were often largely independent of the emotion. This made it possible to further reduce the number of parameters in the model by constraining the delays for each individual joint to be equal for all different emotions. Mathematically, this constraint can be easily embedded in the original blind source separation algorithm (see 1 for details). The constraint improved the robustness of the results and the interpretability of the parameters. With the additional constraint the explained variance for a model with three sources was 92% (as opposed to 99% without this additional constraint). The final model contained three source functions, 51 time delays $\tau_{ij}$ (constrained to be equal over all emotions), and 51 mixing weights (one per joint and source, estimated separately for the different emotions).

*sparse regression*, for two purposes: to extract emotion-specific postural and dynamic features from gait trajectories, and to identify features that are critical for the perception of emotions from gait, as discussed in Experiment 2.

$Y$ (e.g. a rating of emotional expressiveness) as a linear function of the predictors $X$ (representing, for instance, different relevant postural or kinematic features), where the elements of the vector $\beta$ are the estimated regression coefficients, and where $\varepsilon$ is a noise vector:

$$Y = X\beta + \varepsilon$$

$l_2$-norm of a vector $u$ is defined as

$$\|u\|_2 = \sqrt{\sum_k u_k^2}$$

$\beta_k$ and unstable estimates of the individual parameters' values. Such regression models are usually difficult to interpret. Ideally, one would try to explain the data variance with a minimum number of free model parameters, corresponding to a solution where many of the regression parameters $\beta_k$ are zero. Such a sparse solution, which automatically selects the most important features, can be computed by forcing small terms to adopt the value zero, leaving only those predictors in the model that carry the highest proportion of the variance. It is well known that regression models can be sparsified by including an additional regularizing term (for example an $l_1$-norm regularizer) in the cost function (Equation 4). The corresponding error function is given by

$$\|Y - X\beta\|_2^2 + \lambda\, \|\beta\|_1 \qquad (6)$$

$\lambda \geq 0$ controls the degree of sparseness. This method is known in statistics as the ‘Lasso Method’ (Meinshausen, Rocha, & Yu, 2007; Tibshirani, 1996), where the $l_1$-norm is defined as

$$\|u\|_1 = \sum_k |u_k|$$

The parameter $\lambda$ specifies the degree to which small weights are penalized, determining the sparseness of the solution. For $\lambda = 0$, the algorithm coincides with normal least-squares regression. With increasing values of $\lambda$, less important contributions to the solution are progressively forced to zero, resulting in models with fewer and fewer active variables (Tibshirani, 1996).
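The effect of the sparseness parameter can be made concrete with a small numerical sketch. The code below is an illustration only: it minimizes a squared $l_2$ error plus $\lambda$ times the $l_1$-norm of the coefficients by cyclic coordinate descent, which is one common way to solve Lasso problems and is not necessarily the solver used in the study. The exact scaling of the error term, the synthetic data, and the function names `soft_threshold` and `lasso_coordinate_descent` are assumptions made for this example.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    return np.sign(value) * np.maximum(np.abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, Y, lam, n_sweeps=500):
    """Minimize ||Y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_sq = np.sum(X**2, axis=0)                 # x_k^T x_k for every column
    for _ in range(n_sweeps):
        for k in range(n_features):
            # Residual with the contribution of feature k removed.
            partial_residual = Y - X @ beta + X[:, k] * beta[k]
            rho = X[:, k] @ partial_residual
            # Threshold is lam / 2 because the squared error carries no factor 1/2.
            beta[k] = soft_threshold(rho, lam / 2.0) / col_sq[k]
    return beta

# Hypothetical data: 6 candidate features, only two of which influence Y.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 6))
true_beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0])
Y = X @ true_beta + 0.1 * rng.standard_normal(100)

beta_ls = lasso_coordinate_descent(X, Y, lam=0.0)       # ordinary least squares
beta_sparse = lasso_coordinate_descent(X, Y, lam=20.0)  # moderate sparsification
lam_max = 2.0 * np.max(np.abs(X.T @ Y))                 # beyond this, no feature is active
beta_empty = lasso_coordinate_descent(X, Y, lam=1.01 * lam_max)

for name, b in [("lam = 0", beta_ls), ("lam = 20", beta_sparse), ("lam > lam_max", beta_empty)]:
    print(name, "-> active features:", np.flatnonzero(b))
```

Running the solver over a range of $\lambda$ values and recording which coefficients remain non-zero gives the kind of feature ranking described in the text.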

$\lambda$, which determines the sparseness level of the model, and thus the number of active features, can be calculated by different statistical techniques. For our analysis, we applied generalized cross-validation (GCV) (Fu, 1998; Tibshirani, 1996), which minimizes a combined error measure that depends on the approximation error and on an estimate of the effective number of model parameters (details see 2). In the following $\lambda_{opt}$ signifies the determined optimal value of the sparseness parameter.

$a_j$ and $a_0$ that specify the movement and posture features for emotion $j$ and for neutral gait, respectively, so the emotion-specific feature changes are given by $Y_j = a_j - a_0$. With $\beta_j$ signifying the vector of the corresponding regression coefficients (Equation 4), which characterize the importance of the individual features for the changes of emotion $j$ compared to neutral walking, one can define the trivial regression problem $Y_j = X_j \beta_j + \varepsilon_j$; here, the non-square matrix $X_j$ contains only entries equal to one, so that the estimated $\beta_j$ without regularization terms correspond to the joint-specific means across trials of the entries of the $Y_j$ term.
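A short numerical sketch may help to see why this regression problem is called trivial. It assumes a design matrix with a single column of ones, a squared $l_2$ error without a factor of 1/2, and an $l_1$ penalty with weight $\lambda$; under these assumptions the sparse solution for each feature is simply the mean over trials, shrunk toward zero by $\lambda / (2n)$ for $n$ trials. The data and the helper `sparse_mean` are hypothetical.

```python
import numpy as np

# Hypothetical feature changes Y_j for one emotion: 20 trials x 4 joints.
rng = np.random.default_rng(2)
true_change = np.array([0.8, 0.0, -0.5, 0.02])
Y_j = true_change + 0.05 * rng.standard_normal((20, 4))
n_trials = Y_j.shape[0]

# Design matrix containing only ones: one column, one row per trial.
X_j = np.ones((n_trials, 1))

# Without regularization, least squares returns the mean over trials per joint.
beta_ls = np.linalg.lstsq(X_j, Y_j, rcond=None)[0].ravel()

def sparse_mean(Y, lam):
    """Closed-form minimizer of ||y - 1*b||_2^2 + lam*|b| for every column of Y."""
    mean = Y.mean(axis=0)
    shrink = lam / (2.0 * Y.shape[0])
    return np.sign(mean) * np.maximum(np.abs(mean) - shrink, 0.0)

beta_sparse = sparse_mean(Y_j, lam=4.0)   # shrink = 0.1: small changes drop out
print("least squares:", np.round(beta_ls, 3))
print("sparse:       ", np.round(beta_sparse, 3))
```

Features whose mean change relative to neutral is smaller than the shrinkage amount are set to zero, so only the large emotion-specific changes stay active.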

$\beta_j$, $\varepsilon_j$, $X_j$, and $Y_j$ in matrices, i.e., $B = [\beta_1, \ldots, \beta_4]^T$, $X = [X_1, \ldots, X_4]^T$, $Y = [Y_1, \ldots, Y_4]^T$, and $E = [\varepsilon_1, \ldots, \varepsilon_4]^T$, one can approximate the emotion-specific changes of all features across emotions by the regression problem $Y = XB + E$. For this matrix regression problem an error function equivalent to the one in Equation 6 can be defined, just replacing the vector norms by matrix norms: the $l_2$-norm is replaced by the (matrix) Frobenius norm, and the $l_1$-norm by the sum of the absolute values of all matrix elements. After sparsification, non-zero coefficients in the matrix $B$ specify the important features (for individual joints and emotions) that are necessary to approximate the emotion-specific changes compared to neutral gait. Again the optimal sparsification parameter $\lambda$ can be determined by GCV.

$\beta_k$ from the sparse regression analysis as a color-coded plot. Since the weights for the right and left body side were usually very similar, we collapsed the results over both sides of the body for each individual joint. The figure shows a clear pattern of emotion-specific posture changes (defined by the average joint angles) relative to neutral walking. The sparsification parameter for this analysis was chosen according to GCV (see Methods), defining the set of significant features. The most prominent findings were the strongly reduced head angle (corresponding to increased head inclination) for sad walking, and increases of the elbow angles for fearful and angry walking.

$F_{3,296}$ ranging from 13.7 to 64.2, all $p < 0.001$), and significant differences between neutral and emotional gait for several joints by applying a $t$-test ($t_{74} > 2.95$, uncorrected $p < 0.006$). The means and standard errors underlying this analysis are presented in a conventional bar diagram in Figure 2B. A post-hoc Scheffé test revealed significant similarities between the different emotions. The most prominent posture features in this analysis coincided with the ones extracted by sparse regression. In total, ten features were significantly different from neutral walking, out of which 90% matched features derived from psychophysical data. The GLM analysis missed 29% of the features found in previous psychophysical studies.

*perception* of emotion from gait.

$r = 0.86$; $p < 0.001$) between weight differences and joint-angle amplitudes computed over the entire data set.

$s_1$, which explained the largest amount of variance in the trajectories in terms of a color-coded plot. This plot immediately reveals the well-known result that the emotions happiness and anger were associated with increased joint amplitudes (indicated in red), while sadness and fear rather tended to be associated with a reduction in joint-angle amplitudes (indicated in blue) compared to normal walking, seemingly consistent with the intuition that more energetic emotions were characterized by ‘larger movements’, while ‘reduced movements’ were typical for sadness and fear. For expressions of fear we also observed reduced linear weights for knee movement, likely caused by a slinking gait adopted by the actors when expressing this emotion.

$s_1$ and $s_2$, for the paired shoulder, elbow, hip and knee joint as dependent variables. The means and standard errors for the weight differences of the first source compared to normal walking are shown as conventional bar plots in Figure 3B. We found emotion-specific effects in all joints and for both source functions (all $F_{3,296} > 6.35$, $p < 0.001$; for $s_1$ only: all $F_{3,296} > 16.98$, $p < 0.001$) except for the left and right knee on $s_1$ ($F_{3,296} < 2.5$, $p > 0.063$). Homogeneous subsets determined post hoc using the Scheffé test revealed (obvious) commonalities according to emotion activation: especially for the arm joints similar weight changes relative to neutral occurred during expressions of anger and happiness on the one hand, and of sadness and fear on the other (see also Figure 3A). Comparing the weight changes for the first source against neutral walking with conventional tests, we identified 14 significant features, all of which were also reported in previous perception studies. In addition, the GLM analysis detected significant changes in the knee movement of anger and sadness expressions, which have not been described in previous studies.

$s_2$, which explains the second largest amount of variance in the data. In this analysis step, the coefficients from the sparse regression were increased for the movement of the left shoulder and elbow joint during expressions of anger and happiness, and decreased for knee movement during expressions of fear and sadness (data not shown). Since this source oscillates with double gait frequency, and since the corresponding weights can be considered as a measure for the high-frequency components in the joint-angle trajectories, these results are again consistent with the intuition of larger, and potentially less smooth movements during happy and angry walks, and with a reduction of amplitude in fearful and sad walking.

$F_{1,60} > 6.5$, $p < 0.05$), as marked in Figure 4, confirming the existence of emotion-specific dynamic features that are independent of changes in overall gait speed.

$F_{1,60} > 5.4$, $p < 0.05$). Since posture is not generally strongly affected by gait speed, the results of this analysis were not very different from those shown in Figure 2B. As for the comparison with neutral walks not speed-matched to emotional gaits, significantly increased head inclination was observed during expression of sadness, and angry and fearful gaits were characterized by increased elbow flexion. For fear, upper-arm retraction and knee flexion were increased, consistent with widespread postural tension.

*perceiving* emotional body expressions. Another important observation in our analysis was that for the automatic extraction of meaningful features it was critical to approximate the trajectories with a highly compact model that minimizes the number of redundant parameters.

*perception* of emotional gait.

$\chi^2$ 1800, d.f. = 3, $p < 0.001$). It shows that the lay actors were able to produce emotional expressions that were easily recognized, at rates comparable to rates found in many previous studies, some of which were based on the movements of professional actors (Atkinson et al., 2007; Grèzes, Pichon, & de Gelder, 2007; Wallbott, 1998).

| | Anger | Happiness | Fear | Sadness |
|---|---|---|---|---|
| Anger | 70.3 ± 21.4 | 15.6 ± 11.3 | 3.2 ± 5.2 | 1.0 ± 1.4 |
| Happiness | 23.2 ± 19.2 | 75.1 ± 23.0 | 1.9 ± 4.1 | 1.2 ± 1.4 |
| Fear | 4.7 ± 8.4 | 6.6 ± 8.6 | 77.1 ± 14.1 | 8.0 ± 5.5 |
| Sadness | 1.8 ± 3.1 | 2.7 ± 1.5 | 17.9 ± 5.7 | 89.8 ± 5.7 |

$t_{63} = 1.46$, two-tailed $p = 0.15$). We also found a highly significant influence of actor gender on the recognition of fear expressions ($\chi^2 = 201.05$, d.f. = 3, $p < 0.001$): expressed by female actors, fear was correctly recognized at just over 90%, whereas males' fear expressions were only recognized in 60.5% of trials. Conversely, males' expressions of sadness were recognized more often than females' were (93% vs. 87.3%), again highly significant ($\chi^2 = 15.15$, d.f. = 3, $p = 0.005$). However, there was no significant difference in the recognition rates of gaits executed by individuals with or without experience in lay-theatre groups (all $t_{74} < 1.1$, $p > 0.27$).

*SEM* for anger: 1.82 ± 0.22 m/s; for fear: 0.83 ± 0.31 m/s). This effect was reflected in a highly significant main effect of the factor Emotion (levels: angry, happy, sad and fearful) on average speed in a repeated-measures ANOVA ($F_{3,39} = 242.84$, $p < 0.001$). As further statistical validation of the speed matching between neutral and emotional gaits on a trial-by-trial basis for each actor, we performed a two-way repeated-measures ANOVA with the factors Trial (velocity-matched neutral gait vs. emotional gait) and Emotion (angry, happy, sad and fearful), finding no significant influence of Trial ($F_{1,13} = 0.14$, $p = 0.71$) and no significant interaction ($F_{3,39} = 1.42$, $p = 0.25$).

$\chi^2 > 38$, d.f. = 3, $p < 0.001$).

| | Anger | Happiness | Fear | Sadness |
|---|---|---|---|---|
| Anger | 48.8 ± 13.8 | 20.7 ± 12.2 | 2.5 ± 6.3 | 3.7 ± 7.8 |
| Happiness | 39.3 ± 10.3 | 42.6 ± 15.8 | 8.3 ± 10.9 | 8.3 ± 13.4 |
| Fear | 7.0 ± 2.5 | 19.0 ± 5.4 | 28.1 ± 9.4 | 33.9 ± 10.7 |
| Sadness | 5.0 ± 3.6 | 17.8 ± 6.1 | 61.2 ± 11.4 | 54.1 ± 14.1 |

$F_{1,17} = 88.1$, $p < 0.001$).

*neutral* as both stimulus and response category (neutral gaits at normal speed), with five observers (three female and two male, mean age 26 years 3 months). The results of this experiment are shown in Table 3: as for four-choice classification, observers gave highly consistent responses for all five stimulus types (neutral, happy, sad, angry and fearful). The modal response was always the emotion that the actor was attempting to express. For fear and sadness, classification performance was hardly affected by including the neutral condition; there were only very few confusions between neutral and these two affects. However, there was a tendency for angry and happy gaits to be confused with neutral, and vice versa, especially for happy gait, where the second most frequent classification is in fact neutral. Neutral gait itself was classified as neutral in more than 70% of trials, demonstrating that there are specifically emotional aspects in emotionally expressive gait that differ from neutral.

| | Anger | Happiness | Neutral | Fear | Sadness |
|---|---|---|---|---|---|
| Anger | 76.0 ± 2.8 | 14.9 ± 4.2 | 8.5 ± 2.4 | 1.9 ± 1.8 | 0.5 ± 0.7 |
| Happiness | 15.5 ± 3.2 | 65.1 ± 6.5 | 12.3 ± 3.5 | 2.9 ± 3.8 | 1.9 ± 1.8 |
| Neutral | 5.3 ± 4.9 | 18.4 ± 6.2 | 71.5 ± 3.1 | 5.1 ± 3.5 | 3.5 ± 2.2 |
| Fear | 1.6 ± 1.7 | 1.1 ± 0.6 | 4.0 ± 1.9 | 80.0 ± 10.0 | 2.1 ± 0.7 |
| Sadness | 1.6 ± 1.5 | 0.5 ± 0.7 | 3.7 ± 2.9 | 10.1 ± 5.4 | 92.0 ± 3.1 |

*SD* gait velocity 1.82 ± 0.22 m/s for anger; 1.31 ± 0.36 m/s for happiness). Negative values were obtained for fearful and sad gaits (fear: 0.83 ± 0.31 m/s; sadness: 0.68 ± 0.21 m/s). The other two discriminant functions accounted only for a small amount of variance and were therefore not considered for further analysis.

$Y$ is given by the expressiveness ratings, and the predictors $X$ are given by the posture features (average joint angles over one gait cycle). In order to determine the relative importance of the different features for predicting expressiveness, we estimated the regression coefficients $\beta$ by sparse regression, minimizing the error function defined by Equation 6 for different values of the sparseness parameter $\lambda$, where the case $\lambda = 0$ corresponds to a standard linear regression without sparsification. For increasing values of the sparseness parameter the resulting model contains fewer and fewer active features, i.e. features for which the corresponding regression weight $\beta_k$ is different from zero. Such regression models have reduced complexity at the cost of less accurate approximation of the data, and only the most important features will still be active for large values of the sparseness parameter. Thus, sparse regression provides an elegant way of defining a rank ordering for the importance of the different features.

$r > 0.52$, $p < 0.001$). During stimulus presentation the avatar's (anatomically) left side was always shown facing the observer, making the left side of the body more visible than the right. In addition, we have previously demonstrated an emotional-expressiveness advantage for the movement of the left side of the body (Roether et al., 2008). For these reasons, for bilateral features, we constrained the feature analysis to the left joints.

$\lambda$). Red and blue indicate positive and negative values of the coefficients $\beta_k$, respectively. As expected, without sparsification (sparseness parameter $\lambda = 0$) the models typically contain all features with often small non-zero weights, which makes an interpretation of the importance of such features rather difficult. Increasing the sparseness parameter $\lambda$ resulted in models with fewer and fewer active features (non-zero regression coefficients), providing a ranking of models with different numbers of features.

$r_{41} = 0.76$, $p < 0.001$) and happiness ($r_{41} = 0.36$, $p = 0.002$), were rated as more intense the higher the gait velocity. For expressions of fear and sadness, gait velocity was inversely related to expressiveness ratings, significant for sadness only ($r_{67} = -0.59$, $p < 0.001$). A non-significant correlation between gait velocity and expressiveness for fear ($r_{52} = -0.19$, $p = 0.19$) fits the dominance of postural over kinematic cues for fear perception. This result parallels the strong influence of speed on emotion classification that we also found in the discriminant analysis.

*natural adaptors* we used the happy and sad walk (one cycle) of this actor. In order to minimize the influence of low-level motion adaptation we rendered all stimuli (adaptation and test stimuli) to have the same gait-cycle duration as the neutral prototype.

*artificial adaptors* were based on the neutral gait of the same person. To this pattern we added the two largest postural and kinematic changes for sad and happy walking as they had been extracted in Experiment 1. For generating the artificial sad-gait stimulus, we approximated the trajectories of neutral walking by Equation 2, and then modified the weights by adding the population average of the weight difference between sad and neutral walking for the shoulder and the elbow joints. These two joints had shown the maximum differences between sad and neutral walking (shoulder joints: −0.67, elbow joints: −0.79; opposite joints were treated symmetrically). Likewise, for the joints with the largest posture changes between sad and neutral walking, we added the population average of the differences between the posture angles between sad and neutral walking (−18.9 deg for the head, and −16.6 deg for the elbow joints). Correspondingly, the artificial happy gait was generated by adding the weight changes between happy and neutral walking to the weights of the shoulder and elbow (shoulder: +0.42 and elbow: +0.61), the two joints showing the largest emotion-specific change relative to neutral walking. In this case, elbow and head showed the largest changes of the posture angles compared to neutral walking, and we added 2.5 deg to the elbow flexion angle and 6.3 deg to the head inclination. As an example, the artificial happy adaptor stimulus is shown in Movie 4.

*no-adaptation* block, each presentation of the test stimulus was preceded by presentation of one of four adapting stimuli per block (Natural Happy, Artificial Happy, Natural Sad, or Artificial Sad) for 8 s, followed immediately by a noise mask presented for 260 ms. The mask comprised 49 darker gray dots on the uniform gray background, moving along a planar projection of the trajectories of human arm movements. Each dot moved about a randomly chosen position, and with random phase. Fully extended, the mask had an approximate size of 5 × 9.5 degrees of visual angle. Following the mask, the test stimulus was presented for a maximum of 2 s, followed by a gray screen with a response prompt; stimulus presentation was interrupted immediately after the subject's response.

*no-adaptation* block was always the first, followed by four blocks in random order, including two artificial-adaptor (happy or sad) and two natural-adaptor (happy or sad) blocks.

*no-adaptation* blocks; the statistical significance of this effect was confirmed by separate repeated-measures ANOVAs for both the happy ($F_{2,14} = 5.64$, $p = 0.016$) and the sad adaptor ($F_{2,14} = 12.60$, $p = 0.009$) with the three-level factor Adaptor (levels: no-adaptor, artificial-adaptor and natural-adaptor). Crucially, the shifts induced by both artificial adaptors were significantly different from baseline: presenting the artificial sad adaptor shifted the AP to the right (mean ± *SEM* 0.62 ± 0.043 compared to 0.57 ± 0.040 for no adaptation; $t_7 = -2.28$, $p = 0.029$), while presenting the artificial happy adaptor shifted it to the left (0.48 ± 0.041, $t_7 = 2.95$, one-tailed $p = 0.011$).

$F_{1,7} = 0.063$, $p = 0.45$). There was thus no significant difference between the high-level after-effects induced by natural and artificial adapting stimuli.

*neutral* as a stimulus and response category. The vast majority of neutral gaits were classified as neutral, which shows that there are characteristic differences between neutral body movements and emotionally expressive body movements that observers can use to distinguish between them. This is an important result in the light of the finding that observers in a forced-choice situation even attribute emotional states to simple, static geometric shapes (Pavlova, Sokolov, & Sokolov, 2005) with above-chance consistency. However, a conclusive answer to the question of the relationship between expressive body movements and emotions would require monitoring the emotional experience of the actors more closely, perhaps by parallel assessment of psychophysiological measures (Cacioppo, Berntson, Larsen, Poehlmann, & Ito, 2000). In the absence of such data, one might also consider subjective mood ratings as a possible method for assessing the affective changes that were experienced by the actors. However, we chose not to collect such subjective ratings, since we feared that this additional introspective step might disturb the immediacy of the actor's emotional experience. Besides, subjective reports of mood states are subject to strong demand effects, actors being inclined to report stronger effects, presumably in order to conform with the experimenter's intentions (Westermann et al., 1996).

$x_i(t)$ were thus approximated by linear superpositions of the statistically independent source signals (basis functions) $s_j(t)$, weighted by the mixing weights $\alpha_{ij}$ (Equation A1). As described above, the model incorporates phase differences between different limbs by allowing for time delays $\tau_{ij}$ between source signals and angle trajectories:

$$x_i(t) = \sum_j \alpha_{ij}\, s_j(t - \tau_{ij}) \qquad (A1)$$

- Solving Equation A2, by applying source separation methods with additional positivity constraint, such as non-negative PCA (Oja & Plumbley, 2003), positive ICA (Hojen-Sorensen, Winther, & Hansen, 2002) or non-negative matrix factorization (NMF) (Lee & Seung, 1999). This is justified by the fact that the only difference between Equation A2 and the standard instantaneous mixing model of standard PCA or ICA is the fact that all variables are non-negative.
- Solving Equation A3 numerically, given the results of the preceding step. The solution provides the unknown delays $\tau_{ij}$ and the phases of the Fourier transforms of the source signals $\arg(Fs_j)$. To separate these two variables, we estimate $\tau_{ij}$ in a separate step which is then iterated with the solution of Equation A3.

$x_2(t)$ is a scaled and time-shifted copy of the signal $x_1(t)$, such that $x_2(t) = \alpha x_1(t - \tau)$, the following relationship in the Fourier domain holds ($\bar{z}$ denoting the complex conjugate of $z$):

$$Fx_1(\omega) \cdot \overline{Fx_2(\omega)} = \alpha\, |Fx_1(\omega)|^2\, e^{2\pi i \omega \tau} \qquad (A4)$$

It follows that $\arg\left(Fx_1(\omega) \cdot \overline{Fx_2(\omega)}\right) = 2\pi\omega\tau$, which has to hold for all frequencies. The delay can thus be estimated by linear regression, concatenating the equations for a set of different frequencies, $\tau$ specifying the slope of the regression line. Equation A4 shows how the complex phase of the cross-spectrum is connected with the unknown delay $\tau_{ij}$.

$x_1$ and $x_2$ are influenced by Gaussian additive noise, it can be shown that the delay can be estimated by linear regression using the equation

$$\arg\left(Fx_1(\omega) \cdot \overline{Fx_2(\omega)}\right) = 2\pi\omega\tau + \varepsilon(\omega) \qquad (A5)$$

where $\varepsilon(\omega)$ is a composite noise term. Under appropriate assumptions, the estimated slope $2\pi\tau$ of this regression line is the best unbiased linear estimator (Chan, Hattin, & Plant, 1978).
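The delay regression can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the estimation code used in the study: the test signal is a hypothetical periodic sum of five harmonics, the delay is applied as a circular shift, the sampling step `dt` and noise level are arbitrary, and the regression line is forced through the origin. The discrete Fourier transform from NumPy follows the sign convention under which the phase of $Fx_1 \cdot \overline{Fx_2}$ grows as $2\pi\omega\tau$ for a delayed $x_2$.

```python
import numpy as np

# Hypothetical signals: x2 is a scaled, delayed copy of x1 plus a little noise.
rng = np.random.default_rng(3)
n_samples, dt = 512, 0.01                     # sampling step dt in seconds (assumed)
t = np.arange(n_samples) * dt
duration = n_samples * dt
harmonics = np.arange(1, 6)                   # signal energy in the 5 lowest frequency bins
phases = rng.uniform(0.0, 2.0 * np.pi, size=harmonics.size)
x1 = sum(np.sin(2.0 * np.pi * h * t / duration + p) for h, p in zip(harmonics, phases))

shift, alpha = 10, 0.7                        # true delay of 10 samples = 0.1 s
tau_true = shift * dt
x2 = alpha * np.roll(x1, shift)               # x2(t) = alpha * x1(t - tau), periodic signal
x1_noisy = x1 + 0.01 * rng.standard_normal(n_samples)
x2_noisy = x2 + 0.01 * rng.standard_normal(n_samples)

# Cross-spectrum F x1 * conj(F x2); its phase should equal 2*pi*omega*tau.
F1 = np.fft.rfft(x1_noisy)
F2 = np.fft.rfft(x2_noisy)
freqs = np.fft.rfftfreq(n_samples, d=dt)      # frequencies omega in Hz
cross_phase = np.angle(F1 * np.conj(F2))

# Regress phase on frequency through the origin, using only bins that carry signal.
omega = freqs[harmonics]
phi = cross_phase[harmonics]
slope = (omega @ phi) / (omega @ omega)       # least-squares estimate of 2*pi*tau
tau_est = slope / (2.0 * np.pi)

print(f"true delay: {tau_true:.3f} s, estimated delay: {tau_est:.3f} s")
```

Only low-frequency bins are used here so that the phase stays within $(-\pi, \pi)$; for larger delays or higher frequencies the phase would need to be unwrapped before fitting the line.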

$\tau_{ij} = \tau_{kj}$ if $i, k$ specify the same joint and source, but different emotions). This constraint resulted in a higher interpretability of the mixing weights. Assuming we want to estimate a common delay from the time shifts between a reference signal $x_0(t)$ and the signals $x_l(t)$, $1 \leq l \leq L$, we can concatenate all regression equations belonging to the same joint into the vector relationship

where $c$ contains the values of the cross spectrum for the different signals, and where $u$ is a vector of ones. Concatenating these equations over different values of the frequency $\omega$ results in a regression problem from which the joint delay can be estimated in the same way as from Equation A5.

$\lambda$ in Equation 6 is a free parameter of our analysis method. Large values of this parameter result in highly compact models with few features, but limited approximation quality, while small values lead to better fitting models with more features. One might ask if there is an optimal value for the choice of this parameter, which results in an optimized trade-off between prediction error and model complexity.

*GCV* error is given by:

$p(\lambda)$ signifies the number of active parameters of the model and $n$ is the number of variables (dimensionality of $\beta$). It can be shown that the number of active parameters is given by the relationship

$W^{-1}$ being the generalized inverse of the matrix $W = \mathrm{diag}(2|\beta_j|)$ and $n_0$ signifying the number of zero entries in the vector of regression coefficients (i.e. $\beta_j = 0$). This number is determined after solving the constrained regression problem described in Equation 6 for all values of the sparseness parameter $\lambda$. An optimal estimate for the sparseness parameter $\lambda_{opt}$ can thus be determined by solving the minimization problem

$$\lambda_{opt} = \arg\min_{\lambda}\, \mathrm{GCV}(\lambda)$$