**Visual categorization is the brain computation that reduces high-dimensional information in the visual environment into a smaller set of meaningful categories. An important problem in visual neuroscience is to identify the visual information that the brain must represent and then use to categorize visual inputs. Here we introduce a new mathematical formalism—termed space-by-time manifold decomposition—that describes this information as a low-dimensional manifold separable in space and time. We use this decomposition to characterize the representations used by observers to categorize the six classic facial expressions of emotion (happy, surprise, fear, disgust, anger, and sad). By means of a Generative Face Grammar, we presented random dynamic facial movements on each experimental trial and used subjective human perception to identify the facial movements that correlate with each emotion category. When the random movements projected onto the categorization manifold region corresponding to one of the emotion categories, observers categorized the stimulus accordingly; otherwise they selected “other.” Using this information, we determined both the Action Unit and temporal components whose linear combinations lead to reliable categorization of each emotion. In a validation experiment, we confirmed the psychological validity of the resulting space-by-time manifold representation. Finally, we demonstrated the importance of temporal sequencing for accurate emotion categorization and identified the temporal dynamics of Action Unit components that cause typical confusions between specific emotions (e.g., fear and surprise) as well as those resolving these confusions.**

*SD* = 1.71 years) with normal or corrected-to-normal vision and minimal exposure to and engagement with non-Western cultures (De Leersnyder, Mesquita, & Kim, 2011), as assessed by questionnaire (see Supplementary Material: Observer Questionnaire). All observers gave written informed consent and received 6 pounds per hour. The Glasgow University College of Science and Engineering Ethics Committee provided ethical approval.

*n* = 5,

*P* = 0.6, minimum = 1, maximum = 6, median = 3). In this illustrative example trial, the GFG selected Upper Lid Raiser (AU5) color-coded in red, Nose Wrinkler (AU9) color-coded in green, and Upper Lip Raiser (AU10) color-coded in blue. For each AU separately, the GFG selected random values for each of six temporal parameters (onset latency, acceleration, peak amplitude, peak latency, deceleration, and offset latency—see labels illustrating the red temporal-activation curve) from a uniform distribution. The GFG then combined the random dynamic AUs to create a random photorealistic facial animation displayed on a unique same-race face identity generated using standard procedures (Yu et al., 2012). We rendered all facial animations using 3ds Max. We refer the reader to the Supplementary Material: Stimulus Parameter Sampling for more details on the parameter-sampling procedures.

**Figure 1**


*space-by-time manifold*—a dimensionality-reduction algorithm based on NMF (Lee & Seung, 1999). The space-by-time manifold represents all facial movements (described on each trial by an *S* AUs × *T* temporal parameters matrix; here *T* = 6, *S* = 42) using a set of nonnegative spatial (AU) components and a set of nonnegative temporal components (describing the temporal profile of each AU activation). To approximate each single-trial facial-expression stimulus, the AU and temporal components are linearly combined using scalar activation coefficients. Formally, the AU activity **M**_n with dimensions (*T* × *S*) recorded during one trial *n* is factorized as follows (Delis et al., 2014, 2015):

**M**_n ≈ **W**_tem **H**_n **W**_spa

where **W**_tem is a (*T* × *P*) matrix whose columns are the temporal components, **W**_spa is an (*L* × *S*) matrix whose rows are the AU components, and **H**_n is a (*P* × *L*) matrix containing the coefficients that combine each of the *P* temporal components with each of the *L* spatial ones.

**M** (*T* × *S* × *N*), where *N* is the number of trials (here *N* = 2,400). We used this matrix as input to the space-by-time decomposition algorithm (Delis et al., 2014)—a MATLAB implementation is available online at https://sites.google.com/site/ioannisdeliswebpage/software/sNM3F.zip—to extract the AU and temporal components of facial movement that subsume each observer's emotion categorizations. Each AU component represents a specific conjunction of AUs, each temporal component represents a temporal profile of activations, and the linear combinations of AU and temporal components recode each emotion category in the manifold.

*N* = 2,400 × 60 in this case) and applied the same decomposition. We quantified similarity between each single-observer decomposition and the decomposition of the pooled data using the correlation coefficients between pairs of components. We found high similarity between the components of the single-observer decomposition and those of the pooled single-observer data (average correlation across components and observers was 0.97 ± 0.01 for the temporal components and 0.83 ± 0.03 for the AU components), thereby lending support to their consistency.

**H**_n of the space-by-time manifold decomposition as inputs to an LDA to predict the emotion the observers categorized on each trial, using a leave-one-out cross-validation procedure (Duda et al., 2001; Quian Quiroga & Panzeri, 2009). After evaluating categorization accuracy with *P* = 1 temporal and *L* = 1 spatial component, we iteratively added components and computed the categorization power of the resulting decompositions for the six emotion categories. Adding temporal and AU components translates into more temporal bursts and more groups of AUs carrying emotion-categorization information, respectively. We stopped adding components when categorization performance (percent correct classification) stopped increasing significantly (*p* < 0.001). From this procedure, the chosen set of *L* spatial and *P* temporal components is the smallest decomposition that carries the highest categorization power (>70% on average) of the six emotion categories (Delis et al., 2013a, 2013b).

(w_1 and w_2), where w_1 consists of a low activation of Upper Lid Raiser (AU5) and a high activation of Jaw Drop (AU26), and vice versa for w_2 (see bars). Gray spheres reflect individual trials (600 in total) represented by two coefficients (c_1 and c_2) that linearly combine the two components. On half of the trials, we impose correlations (ρ = 0.7) between c_1 and c_2 to represent AU synergies. NMF is shown to correctly recover the simulated dimensions, whereas PCA starts from the dimension explaining the most variance and adds one orthogonal dimension, and ICA looks for statistically independent dimensions. As a result, PCA and ICA identify components with negative values for one of the two simulated AUs, which is inconsistent with their functional role as representations of AU activations.

**Figure 2**


*P* × *L* coefficients of the space-by-time decomposition.

*P* × *L*–dimensional space (Duda et al., 2001). We input each activation coefficient of the space-by-time decomposition to LDA and computed the discrimination power (percent correct decoding) carried by each combination of AU and temporal components.

*P* temporal components, resulting in *P* × *L* parameters on an *L*-dimensional space. LDA determined the linear boundaries that split the *L*-dimensional space into subspaces corresponding to each emotion. We found a discrimination performance higher than 75% for the resulting subspaces (as computed by LDA) for all pairs of emotions. We show in Figure 2B an illustrative example of the application of LDA to simulated fear and surprise trials on the space defined by the NMF components. Here LDA determines a categorization boundary that discriminates fear from surprise.

**Figure 3**


*p* < 0.001). This result indicates that the identified manifold effectively captures the informative dimensions of the facial-expression signals and suggests that identifying a compact yet highly informative representation is crucial for the reliable categorization of the data. We then analyzed how these different patterns of activation contribute to the discrimination of specific pairs of emotions.

**Figure 4**


**Figure 5**


**Figure 6**


**Figure 7**


*p* < 0.05, permutation test) than the decoding accuracy obtained with both components (66% ± 1%), but only by a relatively small amount.

*SD* = 2.4 years) using the same inclusion and exclusion criteria as in Experiment 1.

*SD* = 4.1 years), each captured using standard procedures (for details, see Yu et al., 2012), resulting in a total of 600 facial animations.

*F*(5, 108) = 10.91, *p* < 0.01, and dynamics, *F*(1, 108) = 12.63, *p* < 0.0001, with a statistically significant interaction, *F*(5, 108) = 3.62, *p* < 0.01.

*p*s < 0.05) for happy (94% ± 2%), surprise (89% ± 4%), disgust (93% ± 1%), and sad (93% ± 1%) than anger (72% ± 5%).

**Figure 8**


*t* tests, all *p*s < 0.05) than the inverted sequence (higher diagonal values in Figure 8A than in Figure 8B)—happy: 94% ± 2% versus 84% ± 4%, *t*(18) = 2.36; surprise: 89% ± 4% versus 77% ± 4%, *t*(18) = 2.23; fear: 80% ± 3% versus 57% ± 8%, *t*(18) = 2.74; and disgust: 93% ± 1% versus 88% ± 2%, *t*(18) = 2.15. This suggests that temporal sequencing of AU components is important for the recognition of these emotions. Anger is the only emotion that showed slightly but not significantly higher hit rates for the inverted temporal dynamics than the original dynamics: 78% ± 4% versus 72% ± 5% (paired *t* test, *p* = 0.37), *t*(18) = −0.92.

*t* test, *p* < 0.05), *t*(9) = 2.18—and angry facial expressions were more often (paired *t* test, *p* < 0.05) matched with disgust labels than vice versa: 70% ± 8% versus 54% ± 8%, *t*(9) = 2.31. Interestingly, when the temporal dynamics of the facial animations were inverted, the number of these confusions decreased significantly for surprise false alarms—39% ± 8% versus 75% ± 5% (paired *t* test, *p* < 0.0001), *t*(9) = 5.9 (X in Figure 8A, B)—and disgust false alarms: 47% ± 8% versus 70% ± 8% (paired *t* test, *p* < 0.01), *t*(9) = 3.1 (+ in Figure 8A, B). It did not decrease significantly for fear—49% ± 7% versus 59% ± 9% (paired *t* test, *p* = 0.43), *t*(9) = 0.83 (O in Figure 8A, B)—or anger: 50% ± 9% versus 54% ± 8% (paired *t* test, *p* = 0.69), *t*(9) = 0.41 (* in Figure 8A, B).

*F*(5, 11976) = 9.78, *p* < 0.0001, dynamics, *F*(1, 11976) = 3.91, *p* < 0.05, and response, *F*(1, 11976) = 14.47, *p* < 0.0001, with a significant interaction between emotion and dynamics, *F*(5, 11976) = 5.54, *p* < 0.0001.

**Table 1**

**Figure 9**


*t* tests, all *p*s < 0.05) response times for hit trials for surprise: 1.11 ± 0.01 s versus 1.37 ± 0.02 s on average, *t*(816) = −11.97; fear: 1.21 ± 0.02 s versus 1.28 ± 0.02 s on average, *t*(676) = −2.46; disgust: 1.07 ± 0.01 s versus 1.20 ± 0.02 s on average, *t*(891) = −5.99; and anger: 1.23 ± 0.02 s versus 1.29 ± 0.02 s on average, *t*(740) = −2.10 (green stars in Figure 9). It had no effect on happy or sad hit trials, suggesting that correct matching of surprise, fear, disgust, and anger depends on the temporal sequencing of AU activations, whereas the stimulus dynamics do not affect correct matching of happy or sad. This result confirms previous findings showing that happy and sad are reliably categorized early on, whereas the other four emotions are discriminated by means of AU signals transmitted later in time (Jack et al., 2014).

**Figure 10**


*t* test, *p* < 0.05), *t*(154) = 2.16 (pink star in Figure 10)—suggesting a reliance on the late activation of Upper Lid Raiser (AU5). Conversely, when the surprise label was shown followed by a happy facial expression, correct rejection relied on Lip Corner Puller–Outer Brow Raiser (AU12-2), because its early presentation (in the inverted dynamics) shortened response time: 1.14 ± 0.03 s versus 1.23 ± 0.04 s on average (paired *t* test, *p* < 0.0001), *t*(177) = −6.24 (pink star in Figure 10). Thus, Upper Lid Raiser (AU5) and Lip Corner Puller–Outer Brow Raiser (AU12-2) provide diagnostic information to discriminate happy and surprise (Figure 10A).

*t* test, *p* < 0.05), *t*(22) = 2.20 (orange star in Figure 10). Conversely, when the anger label was shown followed by a surprise facial expression, false alarms occurred due to Upper Lid Raiser (AU5): 1.13 ± 0.09 s for original dynamics versus 1.43 ± 0.10 s for inverted dynamics on average (paired *t* test, *p* < 0.05), *t*(368) = −2.16 (orange star in Figure 10). Thus, activations of Lip Funneler–Upper Lip Raiser Left (AU22-10L) and Upper Lid Raiser (AU5) caused confusions between surprise and anger. An early activation of Upper Lid Raiser (AU5) for surprise and a later one for anger resolves these confusions; likewise, an early Lip Funneler–Upper Lip Raiser Left (AU22-10L) for anger and a later one for surprise resolves confusions (Figure 10B).

*t* tests, all *p*s < 0.01)—fear false alarms: 1.09 ± 0.04 s versus 1.41 ± 0.04 s on average, *t*(105) = −5.10; disgust false alarms: 1.20 ± 0.04 s versus 1.36 ± 0.06 s on average, *t*(113) = −2.64; and anger false alarms: 1.00 ± 0.03 s versus 1.30 ± 0.08 s on average, *t*(100) = −3.69. Taken together with the categorization results presented earlier, these results corroborate that surprise/fear and disgust/anger confusions are caused by the sharing of AU components (Figure 10C, D).

*Frontiers in Computational Neuroscience*, 7, 43.

*Frontiers in Computational Neuroscience*, 7, 51.

*Brain Research Reviews*, 57(1), 125–133.

*Nature Neuroscience*, 6(3), 300–308.

De Leersnyder, J., Mesquita, B., & Kim, H. S. (2011). *Personality and Social Psychology Bulletin*, 37, 451–463.

Delis, I., Berret, B., Pozzo, T., & Panzeri, S. (2013). *Frontiers in Computational Neuroscience*, 7, 54.

Delis, I., Berret, B., Pozzo, T., & Panzeri, S. (2013). *Frontiers in Computational Neuroscience*, 7, 8.

Delis, I., Panzeri, S., Pozzo, T., & Berret, B. (2014). *Journal of Neurophysiology*, 111(3), 675–693.

Delis, I., Panzeri, S., Pozzo, T., & Berret, B. (2015). Task-discriminative space-by-time factorization of muscle activity. *Frontiers in Human Neuroscience*, 9, 399.

Duda, R. O., Hart, P. E., & Stork, D. G. (2001). *Pattern classification*. New York: Wiley.

Ekman, P. (2003). *Emotions revealed*. New York: Henry Holt and Co.

*Facial Action Coding System investigators guide* (2nd ed.). Salt Lake City, UT: Research Nexus.

*Journal of Nonverbal Behavior*, 34(1), 27–42.

Hall, E. T. (1966). *The hidden dimension*. Garden City, NY: Doubleday.

*Current Biology*, 23(21), 2169–2175.

*Medicinski Glasnik*, 3, 58–62.

*Current Biology*, 19(18), 1543–1548.

Jack, R. E., Garrod, O. G. B., & Schyns, P. G. (2014). *Current Biology*, 24(2), 187–192.

*Proceedings of the National Academy of Sciences, USA*, 109(19), 7241–7244.

*Current Biology*, 25(14), R621–R634.

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. *Nature*, 401(6755), 788–791.

*Journal of Machine Learning Research*, 13, 1589–1608.

*Motivation and Emotion*, 13(2), 143–157.

*NeuroReport*, 16(2), 133–136.

Quian Quiroga, R., & Panzeri, S. (2009). *Nature Reviews Neuroscience*, 10(3), 173–185.

*Cognition and Emotion*, 28(7), 1214–1222.

*Behavioral and Brain Sciences*, 21(1), 1–54.

*Science*, 290, 2268–2269.

*Psychological Science*, 20(10), 1202–1208.

*Journal of Neurophysiology*, 95(4), 2199–2212.

*Nature Neuroscience*, 2(2), 162–167.

*Nature Neuroscience*, 5(7), 682–687.

Yu, H., Garrod, O. G. B., & Schyns, P. G. (2012). *Computers & Graphics*, 36(3), 152–162.