Article  |   March 2013
Wait, are you sad or angry? Large exposure time differences required for the categorization of facial expressions of emotion
Journal of Vision March 2013, Vol.13, 13. doi:10.1167/13.4.13
Shichuan Du, Aleix M. Martinez; Wait, are you sad or angry? Large exposure time differences required for the categorization of facial expressions of emotion. Journal of Vision 2013;13(4):13. doi: 10.1167/13.4.13.

Abstract
Facial expressions of emotion are essential components of human behavior, yet little is known about the hierarchical organization of their cognitive analysis. We study the minimum exposure time needed to successfully classify the six classical facial expressions of emotion (joy, surprise, sadness, anger, disgust, fear) plus neutral as seen at different image resolutions (240 × 160 to 15 × 10 pixels). Our results suggest a consistent hierarchical analysis of these facial expressions regardless of the resolution of the stimuli. Happiness and surprise can be recognized after very short exposure times (10–20 ms), even at low resolutions. Fear and anger are recognized the slowest (100–250 ms), even in high-resolution images, suggesting a later computation. Sadness and disgust are recognized in between (70–200 ms). The minimum exposure time required for successful classification of each facial expression correlates with the ability of a human subject to identify it correctly at low resolutions. These results suggest a fast, early computation of expressions represented mostly by low spatial frequencies or global configural cues and a later, slower process for those categories requiring a more fine-grained analysis of the image. We also demonstrate that those expressions that are mostly visible in higher-resolution images are not recognized as accurately. We summarize implications for current computational models.

Introduction
The minimum exposure time for the successful classification of an image within a category set is of great value for understanding the hierarchical computations of the semantic analysis of an image. Research in object recognition, for example, shows that very short exposure times, in the order of 20 ms, are sufficient to identify the existence of an object class (Thorpe, Fize, & Marlot, 1996), with other object recognition tasks requiring up to 500 ms (Grill-Spector, Kushnir, Hendler, & Malach, 2000). In a scene with multiple objects, results suggest that global information is processed first (Antes, Penland, & Metzger, 1981; Biederman, 1981); after short exposure times (100 ms), subjects make recognition decisions based more on the global (context) information than on local detections. Greene and Oliva (2009) show that global properties (e.g., open or closed environments) of the scene are processed with significantly less exposure time than basic-level categorizations (e.g., mountain or beach), suggesting yet another level in the hierarchical semantic analysis of the image. Although these results provide invaluable information on the hierarchical structure of the computations of object and scene analysis, such studies are almost nonexistent in the study of facial expressions of emotion (Batty & Taylor, 2003; Kirouac & Dore, 1984). 
Since at least Aristotle (1936), philosophers and scientists have argued about the relevance of facial expressions of emotion in cognition, social interactions, and evolution (Izard, 2009; Russell, 1994). Computational models for the analysis of pictures of facial expressions of emotion abound (Martinez, 2011; Martinez & Du, 2012). An early model, sometimes called the limbic hypothesis (Maclean, 1949), proposed that computations over images of facial expressions of emotion occur in the hypothalamus and limbic system (LeDoux, 1995; Maclean, 1952; Papez, 1937). This model may predict that each facial expression of emotion requires similar exposure times, with small differences reflecting the hierarchical analysis within these brain structures. 
Another possibility is that, even when the areas processing different categories of facial expressions of emotion are the same, the pathways leading to them (while analyzing each emotion category) differ. Vuilleumier, Armony, Driver, and Dolan (2003) and Smith and Schyns (2009) suggest some emotions are mostly processed using the low spatial frequencies of their images, whereas others are mostly based on their high spatial frequencies. Computations of these low spatial frequencies involve the faster dorsal pathway connecting the retina to subcortical regions, whereas those involving higher spatial frequencies are computed by the much slower ventral cortical pathway (Vuilleumier et al., 2003). This model would predict that facial expressions of emotion that can be successfully recognized in low-resolution images (which include low spatial frequencies) will require significantly shorter exposure times than those facial expressions that can be recognized only in higher-resolution images (which have higher spatial frequencies; Smith & Schyns, 2009). In previous work (Du & Martinez, 2011), we have shown that happiness, surprise, and neutral are reliably recognized at high and low resolutions, whereas sadness, anger, fear, and disgust not only are identified less accurately, but their classification accuracies are also more affected by the resolution loss. Thus, a similar pattern would be predicted for the exposure times according to the model defined in this paragraph. 
Other models propose that different brain regions (especially the superior temporal sulcus, amygdala, insula, basal ganglia, and orbitofrontal cortex) house the computations of distinct categories of facial expression of emotion (Allison, Puce, & McCarthy, 2000; Calder, Lawrence, & Young, 2001; Haxby, Hoffman, & Gobbini, 2000; Phan, Wager, Taylor, & Liberzon, 2002; Phillips et al., 1997; Sprengelmeyer, Rausch, Eysel, & Przuntek, 1998). These results support the categorical model (Ekman, 1992; Izard, 2009), which includes a discrete set of emotion categories, each with its own consistent and differential cortical activation pattern. A major claim is the role of the amygdala in fear (Adolphs, 2002; Calder et al., 2001; Davis, 1992; Rotshtein et al., 2010), although some studies have also found amygdala activation in the recognition of happiness (Killgore & Yurgelun-Todd, 2004), and others have argued the amygdala is involved in attention and decision making rather than in the processing of specific emotion categories (Adolphs, 2008). It is thus generally unclear if these patterns are unique to each emotion and which brain regions are involved with computations of which facial expression of emotion (Batty & Taylor, 2003). With such limited specificity, these models can only predict that facial expressions that are computed in visual areas located earlier in the visual processing hierarchy (Riesenhuber & Poggio, 2000) require a shorter exposure time. In addition, if each emotion category studied here requires approximately the same exposure time for its recognition by every subject, it would suggest that these categories are basic elements with similar neural representations in all of us (Ekman, 1992; Izard, 1992). This view is usually known as the universality hypothesis of emotions. 
In the present paper, we show that there is a correlation between the image resolution at which facial expressions can be successfully recognized and the exposure time required for their correct identification. This result supports the view that computations of categories that can be successfully detected using lower spatial frequencies use a faster pathway or occupy early regions of the hierarchical structure of the analysis of the image. Specifically, we show that the categories that can be recognized fastest (10–20 ms) and at lower resolutions are happiness and surprise (along with neutral), whereas anger and fear are in the slower group (100–300 ms) and cannot be correctly identified at lower resolutions. What is not explained by current models is our inability to accurately recognize facial expressions of emotion that are mostly visible in high-resolution images. It is unclear why expressions that can be recognized fastest and in low-resolution images (e.g., happiness and surprise) should also be recognized more reliably than those requiring additional time and higher resolutions (e.g., fear and anger). One possibility is the seemingly better ability of the human visual system to analyze configural cues, such as second-order relations (i.e., intrafacial feature distances; Martinez & Du, 2012). We discuss the implications of our results on current models. 
Experiment 1: Time thresholds
Materials and methods
Subjects
Forty-two human subjects (23 women; mean age = 21.8 years, SD = 3.3) with normal or corrected-to-normal vision were drawn from the population of students and staff at The Ohio State University and received a small payment for their participation. Subjects had not participated in any related studies before. They were seated in front of a personal computer with a 21-inch CRT monitor with a refresh rate of 100 Hz. The viewing distance was 50 cm. 
Stimuli
A total of 840 face images were used to generate the stimuli. The images were from Du, Tao, and Martinez (2013). Each emotion category (happy, sad, fearful, angry, surprised, disgusted) as well as neutral was depicted in 120 images, each expressed by a different person. The expressions of emotion in this database include the prototypical muscle activations (Ekman & Friesen, 1976), and an independent analysis shows that subjects perceive them as reported in Du and Martinez (2011) for other standard databases (additional details of this analysis are in the Appendix). 
The selected photos were manually cropped. The cropped images were then converted to gray scale, and contrast was equalized. The resulting images were downsized to 240 × 160 pixels. We will refer to the images in this set as belonging to resolution 1. Subsequent sets were constructed by downsizing the previous one by 1/2. This procedure yielded the following additional sets: 120 × 80 (called resolution 1/2), 60 × 40 (resolution 1/4), 30 × 20 (resolution 1/8), and 15 × 10 pixels (resolution 1/16). A box kernel was used in downsizing to smooth the image to prevent adding high frequencies in the process. The box kernel takes the average pixel value within a 4 × 4 neighborhood. To provide common visual angles of 8° vertically and 5.3° horizontally, all five sizes were scaled back to 250 × 166 pixels using bilinear interpolation, which preserves most of the spatial frequency components (Figure 1). The function for resizing was the Matlab® function imresize. Only one image from each column in Figure 1 was presented in the experiment because viewing the higher-resolution image would provide information for judging its lower-resolution version. Therefore, each stimulus was unique in terms of identity × emotion × resolution. Each condition (i.e., emotion category at each resolution) was tested with 24 stimuli. 
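The downsizing procedure can be sketched as follows. This is an illustrative simplification (the study used the Matlab® imresize function): it averages 2 × 2 blocks per halving step rather than the 4 × 4 box neighborhood described above, and the final bilinear upsampling to 250 × 166 pixels is omitted.

```python
import numpy as np

def box_downsample(img, factor=2):
    """Downsample by averaging non-overlapping factor x factor blocks
    (a box kernel), which suppresses high spatial frequencies before
    subsampling."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Build the five-resolution pyramid used in the experiment:
# 240 x 160 down to 15 x 10 pixels, halving each time.
img = np.random.rand(240, 160)  # stand-in for a cropped grayscale face
pyramid = [img]
for _ in range(4):
    pyramid.append(box_downsample(pyramid[-1]))
```

Each list entry corresponds to one of the resolutions 1, 1/2, 1/4, 1/8, and 1/16.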
Figure 1
 
Facial expressions from left to right: happiness, sadness, fear, anger, surprise, disgust, and neutral. Resolutions from top to bottom: 1 (240 × 160 pixels), 1/2 (120 × 80 pixels), 1/4 (60 × 40 pixels), 1/8 (30 × 20 pixels), and 1/16 (15 × 10 pixels).
Design and procedure
The QUEST method (Watson & Pelli, 1983) was used to find the minimum exposure time required to recognize facial expressions of emotion. This approach is generally considered effective in estimating perceptual thresholds and has previously been used in studies of face perception (Roesch, Sander, Mumenthaler, Kerzel, & Scherer, 2010). 
We use Ti to denote the minimum time threshold required to successfully detect emotion category i. QUEST is a staircase approach that adaptively increases the exposure time after an incorrect response and decreases it after a correct one, so as to maintain the recognition rate around a predetermined percentage p. In this study, p was set to 57%, the midpoint between chance (14%) and perfect classification (100%). 
In our use of QUEST, the estimation of the Ti's was based on four main assumptions (Watson & Pelli, 1983): (a) The shape of the psychometric function P_Ti(t_j) is identical under all conditions, where P_Ti(t_j) specifies the probability of a correct response at exposure time t_j for the jth stimulus. (b) P_Ti(t_j) does not change from the testing of one stimulus to the next. (c) The testing of each stimulus is statistically independent of the testing of the others. And (d) the subject's psychometric function follows a Weibull distribution. 
Thirty-five QUEST processes were run in parallel (7 facial expressions × 5 resolutions). Each QUEST process was based on a set of 24 images. Image order was randomized across all QUEST processes. The minimal exposure time for the stimulus was 10 ms. The maximal exposure time was set to be 500 ms for resolution 1 through 1/4 and 750 ms for resolution 1/8 and 1/16. The β value, which affects the slope of the psychometric function in QUEST, was set to 0.9. The guess rate in QUEST was set to chance level, that is, 14%. The mental lapse rate was set to 0.05. 
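The staircase described above can be sketched as a minimal Bayesian procedure in the spirit of QUEST (Watson & Pelli, 1983). This is a simplified illustration, not the implementation used in the study: the Weibull parameterization, the threshold grid, and the class name are assumptions, while β = 0.9, the 1/7 guess rate, and the 0.05 lapse rate are taken from the text.

```python
import numpy as np

def weibull(t, T, beta=0.9, gamma=1/7, delta=0.05):
    """P(correct) at exposure time t (ms) given threshold T: a Weibull
    psychometric function with guess rate gamma (7AFC chance) and
    lapse rate delta."""
    p = 1 - (1 - gamma) * np.exp(-(t / T) ** beta)
    return (1 - delta) * p + delta * gamma

class QuestStaircase:
    """Keeps a posterior over candidate thresholds; each response
    reweights the posterior, and the next trial is placed at its mode."""

    def __init__(self, t_min=10.0, t_max=500.0, n=200):
        self.ts = np.geomspace(t_min, t_max, n)  # candidate thresholds (ms)
        self.log_post = np.zeros(n)              # flat prior over log T

    def next_exposure(self):
        return float(self.ts[np.argmax(self.log_post)])

    def update(self, t, correct):
        # Bayesian update: multiply the posterior by the likelihood of
        # the observed response at exposure time t.
        p = weibull(t, self.ts)
        self.log_post += np.log(p if correct else 1 - p)

    def estimate(self):
        return self.next_exposure()  # posterior-mode threshold estimate
```

In the experiment, 35 such processes (7 expressions × 5 resolutions) would run interleaved, one per condition, with stimulus order randomized across processes.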
Figure 2 illustrates a typical stimulus timeline. First, a white fixation cross on a black background was shown for 500 ms. The stimulus was then shown for tj ms, where tj was determined by the QUEST procedure as described above. A random noise mask followed for 500 ms. A 7-alternative forced-choice (7AFC) paradigm was used, in which subjects were asked to select one of the seven facial expression labels (categories). After the subject's response, the screen went blank for 500 ms before the process started again. 
Figure 2
 
Stimulus timeline of Experiment 1. A white fixation cross on a black background is shown for 500 ms. The stimulus is shown for x ms, where x is determined by QUEST, followed by a random noise mask for 500 ms. A 7AFC paradigm is used. After the subject's response, the screen goes blank for 500 ms, and the process is repeated.
There was a short introductory session before the actual test. Subjects were shown face images with their corresponding emotion labels representing the seven facial expressions. They also completed a short practice session with 14 stimuli in various resolutions. The images used in the practice session were not used in the actual test. The entire experiment lasted about 55 min. Subjects were given breaks every 10 min. 
Results
The estimated thresholds (Ti) are in Table 1 and Figure 3a. The two-sample Welch's t test (with unequal sample sizes and unequal variances) was used for all statistical analyses. Statistical differences between resolutions within the same emotion category are marked with an asterisk in the figure. Empty entries in Table 1 (specified with a dash) indicate subjects were unable to reach 57% classification accuracy or the estimated threshold was beyond the maximal exposure time for that category and resolution. We kept estimates below the minimal exposure time of 10 ms because the true threshold must then lie within 0–10 ms, given that 10 ms was more than sufficient. In contrast, an estimate beyond the maximal exposure time may carry a large error, or even deviate unboundedly from the true threshold, when the expected recognition accuracy is not achievable at any exposure time. Statistical differences among emotions at the same resolution are shown in Figure 3b. 
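The treatment of out-of-range estimates and the Welch's t test can be sketched as follows (a hypothetical illustration: valid_thresholds and the simulated per-subject thresholds are not from the paper).

```python
import numpy as np
from scipy.stats import ttest_ind

def valid_thresholds(estimates, t_max):
    """Keep estimates at or below t_max. Estimates under the 10 ms floor
    are kept (the true threshold then lies in 0-10 ms), but estimates
    beyond t_max may deviate arbitrarily and are discarded."""
    est = np.asarray(estimates, float)
    return est[est <= t_max]

# Simulated per-subject thresholds (ms) for two hypothetical conditions.
rng = np.random.default_rng(0)
happy = rng.normal(12, 4, 42)     # fast, consistent condition
anger = rng.normal(122, 106, 42)  # slow, variable condition

# Welch's t test: unequal variances (and possibly unequal sample sizes).
t_stat, p_value = ttest_ind(happy, anger, equal_var=False)
```

The same call, condition by condition, would produce the pairwise comparisons reported in the figures.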
Figure 3
 
Average thresholds of Experiment 1. (a) Exposure time thresholds. The asterisk indicates statistical difference (p ≤ 0.05) between the labeled point and the one to its left. (b) Statistical differences (p ≤ 0.05) among emotions at the same resolution. The two-sample Welch's t test was used for all statistical tests. In (b), the results of a pairwise t test were grouped by brackets.
Table 1
 
Average estimated thresholds of Experiment 1 in milliseconds. Notes: Standard deviations are in parentheses. A dash indicates most subjects did not have a valid estimated threshold.
                  Neutral    Happiness  Surprise   Sadness    Disgust    Fear       Anger
Resolution 1      9 (10)     12 (4)     13 (4)     68 (73)    77 (50)    101 (100)  122 (106)
Resolution 1/2    10 (12)    13 (4)     16 (10)    75 (53)    83 (46)    122 (105)  203 (161)
Resolution 1/4    10 (14)    17 (5)     16 (7)     187 (101)  186 (126)  175 (106)  173 (130)
Resolution 1/8    52 (103)   27 (8)     32 (36)    310 (201)  449 (272)  -          251 (215)
Resolution 1/16   51 (115)   106 (58)   167 (204)  -          -          -          -
The results show that subjects recognized happiness and surprise with very brief exposure times but required much longer exposures for sadness, disgust, fear, and anger. The estimated thresholds for neutral were even lower than those for happiness, partly because some subjects showed a strong bias to select neutral, especially when the resolution of the image was low. The statistical difference between happiness and surprise and the rest of the emotion categories was significant across all resolutions (p < 0.009; Figure 3b). There was no consistent statistical difference across resolutions among disgust, sadness, fear, and anger, although they differ in how their thresholds change with resolution (Figure 3). Happiness, surprise, and neutral were the only expressions successfully detected at the lowest of the resolutions (15 × 10 pixels). Sadness and fear were not recognizable even at resolutions of 30 × 20 pixels. 
Figure 3b shows three distinct groups of facial expressions: (a) neutral, (b) happy and surprise, and (c) sadness, disgust, fear, and anger. These are sorted from fastest to slowest according to their Ti. At resolution 1/2, we observed additional differences within Group c: Sadness and disgust are computed faster than fear and anger. 
Another important point to study is the response consistency among subjects. This is illustrated in Figure 4, where we plot the distribution of the thresholds (Ti) of all subjects. This figure is read as follows. The red line in each box specifies the median Ti over all subjects. The edges of the box correspond to the 25th and 75th percentiles of the distribution, and the whiskers extend to the maximum and minimum Ti over all subjects, excluding outliers. Outliers, plotted as red crosses, are points outside 99.3% of the data coverage (assuming normally distributed data). 
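These boxplot conventions correspond to the standard 1.5 × IQR whisker rule, which covers roughly 99.3% of normally distributed data. A sketch of how the plotted statistics are computed (the function name and sample data are illustrative):

```python
import numpy as np

def box_stats(x):
    """Median, quartiles, whiskers, and outliers following the usual
    boxplot convention: whiskers reach the most extreme points within
    1.5 x IQR of the box; points beyond are outliers."""
    x = np.sort(np.asarray(x, float))
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = x[(x >= lo) & (x <= hi)]
    return {"median": med, "q1": q1, "q3": q3,
            "whiskers": (inliers.min(), inliers.max()),
            "outliers": x[(x < lo) | (x > hi)]}

# Example: one extreme subject threshold stands out as an outlier.
stats = box_stats(list(range(1, 10)) + [100])
```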
Figure 4
 
Thresholds across subjects of Experiment 1. The red line in the box is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. Outliers are plotted as red crosses.
From Figure 4, we see there is minimal variability in the responses of subjects for happiness and neutral, even at the lowest of resolutions. For surprise, there is only a small variability at the lowest resolution (15 × 10 pixels), demonstrating that this category is recognized robustly and quickly by all subjects at all other resolutions and is, hence, almost as readily recognized as happiness. These results favor the universality view of these two basic emotion categories. However, the same is not true for the other categories of facial expressions of emotion. 
As mentioned earlier, the universality hypothesis predicts that all subjects should respond similarly for each emotion category. In our results, this is the case only for surprise and happiness. We note that fear and anger had the largest time differences among subjects. These observations were supported by two-sample F tests of the difference in variance. There was a consistent difference between the three expressions neutral, surprise, and happiness and the other four expressions anger, sadness, fear, and disgust across all resolutions (p < 0.0044). Neutral and happiness were consistently different (p < 0.0001). Neutral differed from surprise except at resolution 1/2 (p < 0.0034), and happiness differed from surprise except at resolutions 1 and 1/4 (p < 0.0002). Sadness did not differ from disgust at any resolution and differed from anger and fear only at resolution 1/2 (p < 0.0037). Disgust differed from anger and fear only at resolutions 1 and 1/2 (p < 0.005). Jack, Garrod, Yu, Caldara, and Schyns (2012) report similar results for surprise and fear. 
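The variance comparison can be sketched as a two-sided, two-sample F test (an illustration; var_f_test and the simulated data are not from the paper):

```python
import numpy as np
from scipy.stats import f

def var_f_test(a, b):
    """Two-sided two-sample F test for equality of variances."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    F = a.var(ddof=1) / b.var(ddof=1)
    dfa, dfb = len(a) - 1, len(b) - 1
    # Two-sided p-value: double the smaller tail probability.
    p = 2 * min(f.cdf(F, dfa, dfb), f.sf(F, dfa, dfb))
    return F, p

# Simulated thresholds: a consistent category vs. a highly variable one.
rng = np.random.default_rng(1)
consistent = rng.normal(13, 4, 42)
variable = rng.normal(122, 106, 42)
F_stat, p_value = var_f_test(consistent, variable)
```

A significant p-value here indicates the two categories differ in how consistent subjects' thresholds are, not in their mean thresholds.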
Finally, the spatial frequencies contained in each resolution were estimated. Images were decomposed into four octaves, from 83 to 5 cycles/face width. The correlations between these high-to-low frequency bands and the images at each resolution were then calculated. This analysis showed a direct correlation between image size and spatial frequency content, and between these and subject responses (r = 0.6). This result demonstrates a significant correlation between time thresholds, image resolution, and image spatial frequencies. 
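One way to carry out such an octave decomposition is with FFT band-pass masks (a sketch; the paper does not specify its filtering method, and frequencies here are expressed in cycles/image rather than cycles/face-width):

```python
import numpy as np

def octave_bands(img, n_bands=4, top_cpf=83.0):
    """Split an image into n_bands octave-wide spatial-frequency bands,
    from top_cpf downward (83 ... ~5 cycles), using radial FFT masks."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h)) * h   # vertical cycles/image
    fx = np.fft.fftshift(np.fft.fftfreq(w)) * w   # horizontal cycles/image
    r = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))  # radial frequency
    bands, hi = [], top_cpf
    for _ in range(n_bands):
        lo = hi / 2  # one octave below
        mask = (r >= lo) & (r < hi)
        bands.append(np.real(np.fft.ifft2(np.fft.ifftshift(F * mask))))
        hi = lo
    return bands
```

Correlating each band with the images at each resolution, and those correlations with the behavioral thresholds, would then yield the kind of relation reported above.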
Experiment 2: Image matching
It is not known to what extent the recognition of emotion relies on processing low-level image features. An open mouth in surprise, for example, forms a salient image contrast that facilitates categorization. We wish to determine to what extent semantic categorization makes use of low-level image features and how semantic categorization differs from image matching. In other words, what part of the results in Experiment 1 is due to the cognitive structure we wish to study, and how much is due to low-level image analysis? As other results suggest (Martinez, 2003), we hypothesize that low-level image features facilitate the detection of happiness and surprise but that overall (low-level) image matching will lead to a different ordering and clustering of the studied categories than the results of Experiment 1, suggesting the results of Experiment 1 are mostly due to the semantic analysis of their categories. 
Materials and methods
Subjects
Twenty human subjects (11 women; mean age = 23.2 years, SD = 5.4) with normal or corrected-to-normal vision were drawn from the population of students and staff at The Ohio State University and received a small payment for their participation. Subjects had not participated in any related studies before. The laboratory setup was the same as in Experiment 1. The viewing distance was 50 cm. 
Stimuli
One hundred eighty photos were chosen from our database described earlier. For each individual, four emotive expressions and the neutral expression were selected. The four emotive expressions were chosen for each individual such that there were 24 different images per emotion per resolution, and each condition had a distinct set of stimuli. Images were processed in the same way as described in Experiment 1. 
Design and procedure
Figure 5 shows the timeline for this experiment. A white fixation cross was shown first for 500 ms. Then, three face images were shown together, one on top and two at the bottom. The top image was placed 20 pixels above the center of the screen, whereas the bottom images were placed 20 pixels below and 20 pixels to the left and right of the center. The top face always expressed one of the six emotions of Experiment 1, and the same image was shown below, randomly placed on either the left or right, along with a neutral-expression face. All three images were from the same individual and had the same resolution. The presentation time for the stimuli was calculated by the QUEST program. A random noise mask was then presented for 500 ms, followed by the choices “left” and “right.” Subjects were asked to press the left/right arrow key on the keyboard to indicate which bottom image was the same as the top one. A blank screen followed for 500 ms. This process was repeated until the experiment concluded. 
Figure 5
 
Stimulus timeline of Experiment 2. A white fixation cross on a black background is shown for 500 ms. The stimulus is shown for x ms, where x is determined by QUEST, followed by a random noise mask for 500 ms. A 2AFC paradigm is used. After the subject's response, the screen goes blank for 500 ms, and the process is repeated. In a test trial, the top face is always emotive, whereas in a foil trial, the top face is always neutral.
As in Experiment 1, each emotion and resolution had its own independent QUEST process. Neutral was not used as a single category because of the nature of the experiment. The initial exposure time was set to 300 ms for all conditions. The β value was 0.9, and the guess rate was chance level, that is, 50%. The performance threshold p was set to 75%. The maximal exposure times were the same as in Experiment 1. The mental lapse rate was set to 0.05. 
Subjects were instructed to match the top image. They were naive to the design of the experiment. Subjects completed a short practice session of 6 trials in various resolutions prior to the test. The images used in the practice session were not used in the actual test. It is possible for subjects to notice that the top face is always emotive, and as a result, the task would be reduced to just picking the emotive bottom face without looking at the top face. To prevent this, we added 40 foil trials in which the top image displayed a neutral expression and subjects had to reject the emotive face. These foil trials contained all resolutions and various emotions, and the exposure time was uniformly 300 ms to ensure accuracy. They were spread out randomly in the experiment. The entire experiment lasted about 30 min. Subjects were given a break every 8 min. 
Results
The estimated thresholds (Ti) are shown in Table 2 and Figure 6a. Statistical differences between resolutions within the same emotion category are marked with an asterisk in the figure. Empty entries in Table 2 (specified with a dash) indicate subjects were unable to reach 75% accuracy or the estimated threshold was beyond the maximal exposure time for that category and resolution. Statistical differences among emotions at the same resolution are shown in Figure 6b. 
Figure 6
 
Average thresholds of Experiment 2. (a) Exposure time thresholds. The asterisk indicates statistical difference (p ≤ 0.05) between the labeled point and the one to its left. (b) Statistical differences (p ≤ 0.05) among emotions at the same resolution. The two-sample Welch's t test was used for all statistical tests. In (b), the results of the pairwise t test were grouped by brackets.
Table 2
 
Average estimated thresholds of Experiment 2 in milliseconds. Notes: Standard deviations are in parentheses. A dash indicates most subjects did not have a valid estimated threshold.
                  Happiness  Surprise   Sadness    Disgust    Fear       Anger
Resolution 1      9 (78)     85 (116)   249 (97)   271 (128)  184 (126)  236 (75)
Resolution 1/2    105 (94)   60 (76)    355 (111)  190 (117)  212 (94)   297 (99)
Resolution 1/4    165 (112)  97 (87)    281 (125)  305 (117)  236 (145)  221 (125)
Resolution 1/8    232 (167)  106 (101)  373 (145)  305 (128)  354 (174)  396 (142)
Resolution 1/16   564 (153)  288 (198)  -          -          -          -
The results show that subjects matched surprise expressions with the least exposure time, comparable to happiness at resolutions 1 and 1/2. The distinction between these two categories became clear from resolution 1/4 onward (p < 0.04). In comparison, the other expressions (fear, disgust, sadness, and anger) required more exposure time (p < 0.05; Figure 6b). The only significant statistical difference among these four expressions occurred at resolution 1/2, where fear and disgust had lower thresholds than sadness and anger (p < 0.03). Surprise and happiness were the only expressions successfully matched at the lowest of the resolutions (15 × 10 pixels), where the exposure time increased significantly compared with higher resolutions (p < 0.001; Figure 6a). 
Based on these results, we can sort the matching task into four groups: (a) surprise, (b) happiness, (c) fear and disgust, and (d) anger and sadness. This clustering is very different from that observed in Experiment 1, suggesting distinct processes are involved in matching and semantic categorization. As predicted, the only similarity between the results is that surprise and happiness are detected more readily than the other emotions. However, even here, surprise is detected more easily than happiness, whereas the opposite is true in semantic categorization. This becomes clearer at the lower resolutions. As the resolution decreases, the amount of information (and, hence, of discriminant features) in the image diminishes. During image matching, low-level image features (e.g., those created by the wide-open mouth and eyes in surprise) are used. In semantic classification, the categorical features of happiness are equally or better suited. The same holds for anger and sadness. Although these categories are matched less readily than fear and disgust, this is not the case for their semantic categorization. In fact, in Experiment 1, sadness and disgust were more readily categorized at resolution 1/2, which corresponds to a different clustering from the one observed in this second experiment. 
Discussion
Little is still known about the hierarchical semantic analysis of facial expressions of emotion. There is evidence for a categorical distribution of resources (Martinez, 2011), but it is unclear which categories are computed first. The present work suggests a hierarchical structure based on the diagnostic information used to identify each category. Facial expressions of emotion that can be successfully recognized in low-resolution images (and thus employ mostly low spatial frequencies) are computed earlier in the hierarchy. Categories that can be detected only in high-resolution images (and hence require high spatial frequencies or fine-grained information) are computed at a later time. This hierarchical analysis is not the same as that used in a simple matching task such as that of Experiment 2. Here, it is important to note that the 7AFC task used in the present work is challenging and that smaller differences would be observed in easier tasks. Nevertheless, in preliminary work (Martinez & Du, 2010), we observed similar error patterns and the same hierarchical structure of the semantic analysis of facial expressions of emotion in 3AFC and 4AFC tasks. 
An alternative interpretation of these results is that the computational (cognitive) resources required to identify each category could be quite different from (and potentially more or less taxing than) those used to recognize the others. 
The results summarized in the preceding paragraphs are inconsistent with current computational models of the recognition of facial expressions of emotion. An example is found in studies of fear. Fear is perhaps the most studied emotion (Vuilleumier, 2005). The amygdala has been identified as a potential brain area where computations of this emotion take place (Adolphs, 2002, 2008; Calder et al., 2001; Morris, Öhman, & Dolan, 1999; Tsuchiya, Moradi, Felsen, Yamazaki, & Adolphs, 2009). In the context of evolution, fear is typically considered primal for survival. In this context, the recognition of the facial expression of fear has been assumed to involve early detection (Pessoa, Japee, & Ungerleider, 2005; Whalen et al., 2004), even when observed at a distance (or in low-resolution images), with some imaging studies favoring this view (Vuilleumier et al., 2003). The results of the present article suggest otherwise, showing that fear can be recognized only in high-resolution images (or when high spatial frequencies are present; Smith & Schyns, 2009) and that its computations are either relegated to a later stage in the hierarchical semantic analysis or require more computationally taxing analyses than those used to identify other emotions. It is worth noting that some previous studies were concerned with perceptual detection of negative emotive images. Whalen et al. (2004), for example, used backward-masked happy and fearful eyes (black-and-white drawings with clearly identifiable sclera) and neutral faces and showed a higher, faster amygdala activation for fearful eyes. This effect could be related to the role of the amygdala in attention (e.g., perceptual change; Adolphs, 2008). In contrast, our results are concerned with the semantic categorization of six prototypical facial expressions of emotion. 
One possibility is that the dorsal pathway defined earlier biases attention or is involved in early decision making (Morris et al., 1999), making the perceptual change readily visible but not its semantic categorization. 
Similarly, anger is one of the emotion categories that require a longer exposure time, although it can be recognized at lower resolutions than fear. Facial expressions of anger have been shown to be salient in a crowd (Fox et al., 2000), and, as with fear, it has been argued that the recognition of anger is essential to avoid or survive fighting (Ekman, 1999). However, the results reported above, along with others (Du & Martinez, 2011; Smith & Schyns, 2009), demonstrate that anger is not reliably detected from facial expressions at a distance. Even in high-resolution images, anger is typically mistaken for sadness and disgust, which are not good indicators for survival. It is more likely that body position plays a major role in the detection of anger and fear, especially from a distance. Some results have shown that when there is a contradictory signal between the body and face, the emotional analysis of the body position can subsume that of the face (Aviezer et al., 2008; Meeren, van Heijnsbergen, & de Gelder, 2005). 
Disgust, arguably the most artificial and culturally bound of the emotions studied here (Rozin, Haidt, & McCauley, 2008), has also been classified as essential for survival, for example, for germ avoidance. Once again, though, the results of the present article and those of several others (Du & Martinez, 2011; Smith & Schyns, 2009) form a different picture, in which its facial expression is poorly recognized even in high-resolution images. We note that the facial expression of disgust (like that of anger) was recognized at resolutions of 30 × 20 pixels, but the exposure time required to reach 57% accuracy was 376 ms. As with the categories discussed above, these results suggest a later computation of disgust or a more computationally taxing analysis. Facial expressions of sadness require a similar exposure time to those of disgust (Figure 4), and their computations also seem to be relegated to a later time in the hierarchical analysis of facial expressions of emotion. 
In contrast, happiness and surprise were recognized much faster than the negative emotions discussed above. These facial expressions are generally recognized using low spatial frequencies (Smith & Schyns, 2009) and more reliably than the others (Du & Martinez, 2011). Fear is often confused with surprise in normal subjects (Du & Martinez, 2011), but not vice versa. This one-way confusion is also observed in patients whose fear recognition is impaired because of a brain lesion (Calder, 1996). Returning to the potential survival role of early detection of fear, we note that fear may not be directly expressed using a facial expression of fear. One hypothesis is that subjects express surprise before displaying a facial expression of fear or the compound emotion of fearfully surprised (Martinez & Du, 2012). In this context, recognition of surprise may be more primal than recognition of fear itself. This argument is in line with the involvement of amygdaloid dopamine in regulating surprise in fear (Iordanova, 2010). 
Batty and Taylor (2003) reported later N170 latencies for negative emotions (i.e., fear, anger, disgust, and sadness) than for surprise and happiness. These results are consistent with those reported in the present article. One possibility is that subcortical computations (e.g., involving a more dorsal pathway) are combined with computations from the ventral pathway before a decision is made (Adolphs, 2002; Morris et al., 1999; Tsuchiya et al., 2009). The need to combine these two computations may justify the significantly longer exposure times required for successful classification of negative emotions but does not explain why they are classified less accurately and cannot be detected in low-resolution images. Sugase, Yamane, Ueno, and Kawano (1999) identified neurons in macaque monkeys that respond at two distinct latencies. The earlier of these responses was tuned to global features of the face, whereas the later one responded to more fine-grained information. If similar neurons were found in the human brain, this could justify why facial expressions that can be recognized using the global information available in low-resolution images are recognized faster than those requiring a more fine-tuned analysis. It would still be unclear, though, why expressions requiring a finer analysis are recognized more poorly than those that mostly use global information. Another possibility is that the human visual system has a very good, specialized processing of configural cues for face analysis and a less accurate processing of local shape (Martinez & Du, 2012). This is because configural cues can be readily and robustly extracted from both high- and low-resolution images, whereas detailed shape changes are visible only in higher-resolution images (Du & Martinez, 2011). A good example is the ease with which we can detect smiles at any resolution. 
Similarly, van Rijsbergen and Schyns (2009) and Schyns, Petro, and Smith (2009) suggest that facial expressions of emotion may require a three-stage categorization process. In the first stage, local features are detected, followed by a global analysis and a final local, fine-grained examination. Combined with the conclusions of Batty and Taylor (2003) summarized in the preceding paragraph, the results of the present study suggest that some categories of emotion require a shorter analysis of their facial expressions than others. Although some can be recognized using low spatial frequencies, others cannot be correctly classified without higher spatial frequencies. It is possible that some emotion categories involve the activation of more brain regions than others and that this is related to the difficulty of interpreting their class. 
In summary, the results reported in the present study suggest that diverse computational processes are involved in the recognition of different emotion categories. Facial expressions of emotion that can be recognized in low spatial frequencies are readily and robustly categorized. Those that need high spatial frequencies for categorization involve more taxing computations and are more prone to errors. One possibility is that the pathways leading to common emotion brain areas differ from category to category (Smith & Schyns, 2009; Vuilleumier et al., 2003). Another possibility is that different areas are used to analyze each category semantically (Martinez & Du, 2012). In addition, it is also possible that different areas are reached using different pathways. 
Acknowledgments
We thank the reviewers for constructive comments on an earlier version of this article. This research was supported in part by the National Institutes of Health under Grants R01-EY-020834 and R21-DC-011081. 
Commercial relationships: none. 
Corresponding author: Aleix M. Martinez. 
Email: aleix@ece.osu.edu. 
Address: The Ohio State University, Columbus, OH. 
References
Adolphs R. (2002). Neural systems for recognizing emotion. Current Opinion in Neurobiology, 12, 169–177.
Adolphs R. (2008). Fear, faces, and the human amygdala. Current Opinion in Neurobiology, 18, 166–172.
Allison T. Puce A. McCarthy G. (2000). Social perception from visual cues: Role of the STS region. Trends in Cognitive Sciences, 4, 267–278.
Antes J. R. Penland J. G. Metzger R. L. (1981). Processing global information in briefly presented pictures. Psychological Research, 43, 277–292.
Aristotle. (1936). Aristotle: Minor works. Cambridge, MA: Harvard University Press.
Aviezer H. Hassin R. R. Ryan J. Grady C. Susskind J. Anderson A. (2008). Angry, disgusted, or afraid? Studies on the malleability of emotion perception. Psychological Science, 19, 724–732.
Batty M. Taylor M. J. (2003). Early processing of the six basic facial emotional expressions. Cognitive Brain Research, 17, 613–620.
Biederman I. (1981). On the semantics of a glance at a scene. In Kubovy M. Pomerantz J. R. (Eds.), Perceptual organization (pp. 213–263). Hillsdale, NJ: Lawrence Erlbaum.
Calder A. J. (1996). Facial emotion recognition after bilateral amygdala damage: Differentially severe impairment of fear. Cognitive Neuropsychology, 13, 699–745.
Calder A. J. Lawrence A. D. Young A. W. (2001). Neuropsychology of fear and loathing. Nature Reviews Neuroscience, 2, 352–363.
Davis M. (1992). The role of the amygdala in fear and anxiety. Annual Review of Neuroscience, 15, 353–375.
Du S. Martinez A. M. (2011). The resolution of facial expressions of emotion. Journal of Vision, 11 (13): 24, 1–13, http://www.journalofvision.org/content/11/13/24, doi:10.1167/11.13.24.
Du S. Tao Y. Martinez A. M. (2013). Compound facial expressions of emotion: Database and baseline analysis. Manuscript submitted for publication.
Ekman P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.
Ekman P. (1999). Facial expressions. In Dalgleish T. Power T. (Eds.), The handbook of cognition and emotion (pp. 301–320). Sussex, UK: John Wiley & Sons, Ltd.
Ekman P. Friesen W. V. (1976). Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press.
Ekman P. Friesen W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.
Fox E. Lester V. Russo R. Bowles R. J. Pichler A. Dutton K. (2000). Facial expressions of emotion: Are angry faces detected more efficiently? Cognition & Emotion, 14, 61–92.
Greene M. R. Oliva A. (2009). The briefest of glances: The time course of natural scene understanding. Psychological Science, 20, 464–472.
Grill-Spector K. Kushnir T. Hendler T. Malach R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3, 837–843.
Haxby J. V. Hoffman E. A. Gobbini M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4, 223–233.
Iordanova M. D. (2010). Dopamine transmission in the amygdala modulates surprise in an aversive blocking paradigm. Behavioral Neuroscience, 124, 780–788.
Izard C. E. (1992). Basic emotions, relations among emotions, and emotion-cognition relations. Psychological Review, 99, 561–565.
Izard C. E. (2009). Emotion theory and research: Highlights, unanswered questions, and emerging issues. Annual Review of Psychology, 60, 1–25.
Jack R. E. Garrod O. G. B. Yu H. Caldara R. Schyns P. G. (2012). Facial expressions of emotion are not culturally universal. Proceedings of the National Academy of Sciences, 109, 7241–7244.
Killgore W. D. S. Yurgelun-Todd D. A. (2004). Activation of the amygdala and anterior cingulate during nonconscious processing of sad versus happy faces. NeuroImage, 21, 1215–1223.
Kirouac G. Dore F. Y. (1984). Judgment of facial expressions of emotion as a function of exposure time. Perceptual and Motor Skills, 59, 147–150.
LeDoux J. E. (1995). Emotion: Clues from the brain. Annual Review of Psychology, 46, 209–235.
Maclean P. D. (1949). Psychosomatic disease and the “visceral brain,” recent developments bearing on the Papez theory of emotion. Psychosomatic Medicine, 11, 338–353.
Maclean P. D. (1952). Some psychiatric implications of physiological studies on frontotemporal portion of limbic system (visceral brain). Electroencephalography & Clinical Neurophysiology, 4, 407–418.
Martinez A. M. (2003). Matching expression variant faces. Vision Research, 43, 1047–1060.
Martinez A. M. (2011). Deciphering the face. IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 7–12.
Martinez A. M. Du S. (2010). How fast can we recognize facial expressions of emotion? Journal of Vision, 10 (7): 607, http://www.journalofvision.org/content/10/7/607, doi:10.1167/10.7.607.
Martinez A. M. Du S. (2012). A model of the perception of facial expressions of emotion by humans: Research overview and perspectives. Journal of Machine Learning Research, 13, 1589–1608.
Meeren H. K. M. van Heijnsbergen C. C. R. J. de Gelder B. (2005). Rapid perceptual integration of facial expression and emotional body language. Proceedings of the National Academy of Sciences of the United States of America, 102, 16518–16523.
Morris J. S. Öhman A. Dolan R. J. (1999). A subcortical pathway to the right amygdala mediating “unseen” fear. Proceedings of the National Academy of Sciences, 96, 1680–1685.
Papez J. W. (1937). A proposed mechanism of emotion. Archives of Neurology and Psychiatry, 38, 725–743.
Pessoa L. Japee S. Ungerleider L. G. (2005). Visual awareness and the detection of fearful faces. Emotion, 5, 243–247.
Phan K. L. Wager T. Taylor S. F. Liberzon I. (2002). Functional neuroanatomy of emotion: A meta-analysis of emotion activation studies in PET and fMRI. NeuroImage, 16, 331–348.
Phillips M. L. Young A. W. Senior C. Brammer M. Andrew C. Calder A. J. (1997). A specific neural substrate for perceiving facial expressions of disgust. Nature, 389, 495–498.
Riesenhuber M. Poggio T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199–1204.
Roesch E. B. Sander D. Mumenthaler C. Kerzel D. Scherer K. R. (2010). Psychophysics of emotion: The QUEST for emotional attention. Journal of Vision, 10 (3): 4, 1–9, http://www.journalofvision.org/content/10/3/4, doi:10.1167/10.3.4.
Rotshtein P. Richardson M. P. Winston J. S. Kiebel S. J. Vuilleumier P. Eimer M. (2010). Amygdala damage affects event-related potentials for fearful faces at specific time windows. Human Brain Mapping, 31, 1089–1105.
Rozin P. Haidt J. McCauley C. R. (2008). Disgust. In Lewis M. Haviland-Jones J. M. Barrett L. F. (Eds.), Handbook of emotions (3rd ed., pp. 757–776). New York: Guilford Press.
Russell J. A. (1994). Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin, 115, 102–141.
Schyns P. G. Petro L. S. Smith M. L. (2009). Transmission of facial expressions of emotion co-evolved with their efficient decoding in the brain: Behavioral and brain evidence. PLoS ONE, 4, e5625.
Smith F. W. Schyns P. G. (2009). Smile through your fear and sadness: Transmitting and identifying facial expression signals over a range of viewing distances. Psychological Science, 20, 1202–1208.
Sprengelmeyer R. Rausch M. Eysel U. T. Przuntek H. (1998). Neural structures associated with recognition of facial expressions of basic emotions. Proceedings of the Royal Society B: Biological Sciences, 265, 1927–1931.
Sugase Y. Yamane S. Ueno S. Kawano K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400, 869–873.
Thorpe S. Fize D. Marlot C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
Tsuchiya N. Moradi F. Felsen C. Yamazaki M. Adolphs R. (2009). Intact rapid detection of fearful faces in the absence of the amygdala. Nature Neuroscience, 12, 1224–1225.
van Rijsbergen N. J. Schyns P. G. (2009). Dynamics of trimming the content of face representations for categorization in the brain. PLoS Computational Biology, 5, e1000561.
Vuilleumier P. (2005). How brains beware: Neural mechanisms of emotional attention. Trends in Cognitive Sciences, 9, 585–594.
Vuilleumier P. Armony J. L. Driver J. Dolan R. J. (2003). Distinct spatial frequency sensitivities for processing faces and emotional expressions. Nature Neuroscience, 6, 624–631.
Watson A. B. Pelli D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113–120.
Whalen P. J. Kagan J. Cook R. G. Davis F. C. Kim H. Polis S. (2004). Human amygdala responsivity to masked fearful eye whites. Science, 306, 2061.
Appendix: Facial Action Coding System coding of our database
The database of facial expressions of emotion used in the present article is described in detail in Du, Tao, and Martinez (2013). Here, we summarize the Facial Action Coding System coding of Ekman and Friesen (1978) for the images used in the experiments detailed above. In particular, we specify the facial muscle activations, called Action Units (AUs), and the intensities at which they are used. These are given in Tables A1–A6. The AUs marked with an asterisk in these tables conform to the prototypical ones given by Ekman and Friesen (1978). 
The intensity of activation of each AU is scored using a five-grade scale (Ekman & Friesen, 1976): a (showing a trace), b (slightly), c (markedly), d (severely), and e (extremely). To compute the average activation (and standard deviation) for the images in our experiments, we equate these five intensity levels with the integers 1 (a), 2 (b), 3 (c), 4 (d), and 5 (e). The results are given in Tables A1–A6. The percentages under the intensity levels in the tables are relative to the total number of people who used that AU. 
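Given this a–e → 1–5 mapping, the tabulated averages and standard deviations can be approximately reproduced from the per-level percentages. A minimal sketch (function names are ours; the AU 12 proportions are the rounded values from Table A1, so the results match the table only approximately):

```python
import math

# FACS intensity letters mapped to integers, per the text: a=1 ... e=5.
LEVELS = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

def intensity_stats(freqs):
    """Weighted mean and standard deviation of AU intensity.

    freqs: dict mapping intensity letter -> proportion of images scored at
    that level (proportions should sum to ~1; table percentages are rounded).
    """
    mean = sum(LEVELS[k] * p for k, p in freqs.items())
    var = sum(p * (LEVELS[k] - mean) ** 2 for k, p in freqs.items())
    return mean, math.sqrt(var)

# AU 12 in Table A1: 12% at c, 86% at d, 3% at e.
mean, sd = intensity_stats({"c": 0.12, "d": 0.86, "e": 0.03})
```

With these rounded proportions the sketch returns a mean close to the tabulated 4 and a standard deviation close to the tabulated .37.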
Table A1
 
Facial Action Coding System coding of the images displaying a happy expression. Notes: “% of images” is the percentage of images where the AU is used. Under each intensity level (a–e), the percentage refers to how often this intensity occurs. Average and standard deviation (stdev) are computed by assigning numerical values of 1–5 for each of the intensities a–e.
AU % of images Average intensity Intensity stdev a b c d e
12* 100% 4 .37 0 0 12% 86% 3%
25* 100% 4 .16 0 0 2% 98% 1%
6* 52% 1.3 .68 74% 21% 2% 3% 0
Table A2
 
AU analysis of sadness.
AU % of images Average intensity Intensity stdev a b c d e
4* 97% 2.9 .64 3% 19% 66% 13% 0
15* 81% 2.4 .9 16% 34% 38% 11% 0
1* 58% 2.2 .84 24% 34% 39% 3% 0
11* 45% 1.9 .59 22% 65% 13% 0 0
25* 38% 2.7 .47 0 31% 69% 0 0
17* 32% 2.4 .76 11% 39% 45% 5% 0
6* 31% 1.5 .5 46% 54% 0 0 0
20 16% 1.9 .74 26% 63% 5% 5% 0
43 13% 2.6 .96 0 69% 13% 13% 6%
16 2% 2.5 .71 0 0 50% 50% 0
Table A3
 
AU analysis of fear.
AU % of images Average intensity Intensity stdev a b c d e
1* 93% 2.4 .65 8% 42% 49% 1% 0
4* 88% 2.7 .63 5% 28% 63% 4% 0
5* 87% 2.9 .93 5% 35% 30% 30% 1
20* 83% 2.4 .87 17% 30% 44% 8% 0
25* 75% 2.6 .57 3% 32% 63% 1% 0
2* 69% 2.4 .61 5% 51% 43% 1% 0
25+26* 24% 2.3 .55 0 69% 28% 3% 0
16 3% 1.3 .5 75% 25% 0 0 0
15 3% 2.5 1 25% 0 75% 0 0
14 2% 2 1.4 50% 0 50% 0 0
17 2% 1 0 100% 0 0 0 0
11 2% 1.5 .7 50% 50% 0 0 0
Table A4
 
AU analysis of anger.
AU % of images Average intensity Intensity stdev a b c d e
4* 100% 3.2 .78 2% 17% 44% 37% 1%
24* 90% 2.3 .82 17% 44% 32% 6% 0
17* 68% 2.4 .91 17% 34% 37% 12% 0
7* 57% 1.9 .77 37% 43% 19% 1% 0
18 18% 1.3 .66 76% 14% 10% 0 0
11 13% 1.9 .70 27% 53% 20% 0 0
14 11% 1.9 .28 8% 92% 0 0 0
10* 5% 1.5 .55 50% 50% 0 0 0
5* 4% 1.2 .45 80% 20% 0 0 0
Table A5
 
AU analysis of surprise.
AU % of images Average intensity Intensity stdev a b c d e
1* 100% 2.9 .43 0 15% 80% 4% 0
2* 100% 2.9 .45 0 18% 78% 4% 0
25+26* 93% 3.1 .66 0 17% 55% 28% 0
5* 53% 1.8 .87 46% 35% 14% 5% 0
27* 7% 1 0 100% 0 0 0 0
25* 7% 4 0 0 0 0 100% 0
Table A6
 
AU analysis of disgust.
AU % of images Average intensity Intensity stdev a b c d e
10* 98% 3.1 .62 2% 9% 66% 23% 0
9* 79% 3 .84 5% 19% 45% 31% 0
17* 68% 2.8 .83 8% 22% 52% 17% 0
4 52% 2.3 .57 5% 56% 39% 0 0
24 28% 2 .59 15% 67% 18% 0 0
18 22% 1.3 .56 69% 27% 4% 0 0
25* 12% 2.2 .7 14% 50% 36% 0 0
6 4% 1.6 .89 60% 20% 20% 0 0
Figure 1
 
Facial expressions from left to right: happiness, sadness, fear, anger, surprise, disgust, and neutral. Resolutions from top to bottom: 1 (240 × 160 pixels), 1/2 (120 × 80 pixels), 1/4 (60 × 40 pixels), 1/8 (30 × 20 pixels), and 1/16 (15 × 10 pixels).
Figure 2
 
Stimulus timeline of Experiment 1. A white fixation cross on a black background is shown for 500 ms. The stimulus is shown for x ms, where x is determined by QUEST, followed by a random noise mask for 500 ms. A 7AFC paradigm is used. After the subject's response, the screen goes blank for 500 ms, and the process is repeated.
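The exposure time x above is chosen by QUEST (Watson & Pelli, 1983), a Bayesian adaptive procedure that places each trial at the current threshold estimate. The following is a minimal sketch of that idea under simplifying assumptions (a Weibull psychometric function with an illustrative slope and lapse rate, a grid posterior, and a simulated observer; this is not the study's QUEST implementation):

```python
import math
import random

def p_correct(duration, threshold, slope=3.5, guess=1/7, lapse=0.01):
    """Weibull psychometric function: P(correct) vs. stimulus duration.
    guess = 1/7 for the 7AFC task; slope and lapse are illustrative."""
    x = max(duration, 1e-9) / threshold
    return guess + (1 - guess - lapse) * (1 - math.exp(-x ** slope))

def quest(trials=200, true_threshold=80.0, seed=0):
    """Grid-posterior QUEST-style procedure against a simulated observer."""
    rng = random.Random(seed)
    grid = list(range(5, 501, 5))        # candidate thresholds (ms)
    log_post = [0.0] * len(grid)         # flat prior, log domain
    estimate = 250.0                     # first trial's exposure time
    for _ in range(trials):
        # Simulated observer responds according to the same model.
        correct = rng.random() < p_correct(estimate, true_threshold)
        # Bayes update of the posterior over candidate thresholds.
        for i, th in enumerate(grid):
            p = p_correct(estimate, th)
            log_post[i] += math.log(p if correct else 1 - p)
        # Next trial is placed at the posterior mean.
        m = max(log_post)
        post = [math.exp(l - m) for l in log_post]
        z = sum(post)
        estimate = sum(t * p for t, p in zip(grid, post)) / z
    return estimate
```

Because each trial is placed where it is most informative, the estimate homes in on the observer's threshold in far fewer trials than a fixed set of exposure times would need.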
Figure 3
 
Average thresholds of Experiment 1. (a) Exposure time thresholds. The asterisk indicates statistical difference (p ≤ 0.05) between the labeled point and the one to its left. (b) Statistical differences (p ≤ 0.05) among emotions at the same resolution. The two-sample Welch's t test was used for all statistical tests. In (b), the results of a pairwise t test were grouped by brackets.
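The pairwise comparisons in this figure use the two-sample Welch's t test, which does not assume equal variances across groups. A minimal stdlib sketch of the statistic and its Welch-Satterthwaite degrees of freedom (the threshold samples are invented for illustration, not the study's data; computing a p-value would additionally require the t distribution's CDF, e.g., from scipy.stats):

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with (possibly) unequal variances."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)        # sample variances (n-1 denominator)
    se2 = vx / nx + vy / ny                  # squared standard error of the difference
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

# Illustrative exposure-time thresholds (ms) for two emotions.
happy = [10, 12, 14, 11, 13]
anger = [90, 150, 110, 200, 130]
t, df = welch_t(happy, anger)
```

The unequal-variance correction matters here because, as Tables 1 and 2 show, threshold variability differs widely across emotions and resolutions.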
Figure 4
 
Thresholds across subjects of Experiment 1. The red line in the box is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. Outliers are plotted as red crosses.
Figure 5
 
Stimulus timeline of Experiment 2. A white fixation cross on a black background is shown for 500 ms. The stimulus is shown for x ms, where x is determined by QUEST, followed by a random noise mask for 500 ms. A 2AFC paradigm is used. After the subject's response, the screen goes blank for 500 ms, and the process is repeated. In a test trial, the top face is always emotive, whereas in a foil trial, the top face is always neutral.
Figure 6
 
Average thresholds of Experiment 2. (a) Exposure time thresholds. The asterisk indicates statistical difference (p ≤ 0.05) between the labeled point and the one to its left. (b) Statistical differences (p ≤ 0.05) among emotions at the same resolution. The two-sample Welch's t test was used for all statistical tests. In (b), the results of the pairwise t test were grouped by brackets.
Table 1
 
Average estimated thresholds of Experiment 1 in milliseconds. Notes: Standard deviations are in parentheses. A dash indicates that most subjects did not have a valid estimated threshold.
Neutral Happiness Surprise Sadness Disgust Fear Anger
Resolution 1 9 (10) 12 (4) 13 (4) 68 (73) 77 (50) 101 (100) 122 (106)
Resolution 1/2 10 (12) 13 (4) 16 (10) 75 (53) 83 (46) 122 (105) 203 (161)
Resolution 1/4 10 (14) 17 (5) 16 (7) 187 (101) 186 (126) 175 (106) 173 (130)
Resolution 1/8 52 (103) 27 (8) 32 (36) 310 (201) 449 (272) 251 (215)
Resolution 1/16 51 (115) 106 (58) 167 (204)
Table 2
 
Average estimated thresholds of Experiment 2 in milliseconds. Notes: Standard deviations are in parentheses. A dash indicates that most subjects did not have a valid estimated threshold.
Happiness Surprise Sadness Disgust Fear Anger
Resolution 1 9 (78) 85 (116) 249 (97) 271 (128) 184 (126) 236 (75)
Resolution 1/2 105 (94) 60 (76) 355 (111) 190 (117) 212 (94) 297 (99)
Resolution 1/4 165 (112) 97 (87) 281 (125) 305 (117) 236 (145) 221 (125)
Resolution 1/8 232 (167) 106 (101) 373 (145) 305 (128) 354 (174) 396 (142)
Resolution 1/16 564 (153) 288 (198)