Abstract
Emotion recognition is a core perceptual ability critical for social functioning. It is widely assumed that facial expressions predominantly determine perceived emotion, and that context only coarsely modulates or disambiguates the interpretation of faces. Using a novel method, ‘Inferential Emotion Tracking’, we isolated and quantified the contribution of visual context versus face and body information in dynamic emotion recognition. Independent groups of observers tracked, in real time, the valence and arousal of a chosen character in a silent Hollywood movie clip under 3 different conditions: 1) fully-informed, where everything in the video was visible; 2) character-only, where the face and body of the target character were visible but the visual context (everything except the target) was masked and invisible; and 3) context-only, where the face and body of the target were masked and invisible but the context was visible. Using regression models, we found that context-only affect ratings contributed a significant amount of unique variance (14.6%) to fully-informed ratings, comparable to that contributed by character-only ratings (20.5%; p > 0.05). Importantly, we replicated the essential contribution of context with a different set of movie clips in which no other character is present (context: 14.4%, character: 20.5%), and with a further set of non-Hollywood videos (e.g. home videos, documentaries; context: 23.2%, character: 17.8%). A meta-analysis over all 34 videos shows that context contributed as much as the face and body (p > 0.05). Strikingly, temporal cross-correlation analyses (with millisecond precision) revealed that emotion information from context is available just as fast as emotion information from faces. Furthermore, when observers tracked the emotion of the target using discrete emotion categories, the context still contributed significantly (12%), although less than the character did (35%; p < 0.01).
Our results demonstrate that emotion recognition is, at its heart, an issue of context as much as it is about faces.
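
The abstract does not spell out the regression models behind the unique-variance figures. One common way to obtain such a quantity is to compare nested least-squares fits: the unique variance of a predictor is the drop in R² when it is removed from the model that contains both predictors. The sketch below illustrates that idea only. The function names (`r_squared`, `unique_variance`) and the synthetic ratings are assumptions made for illustration, not the authors' code or data.

```python
import numpy as np

def r_squared(predictors, target):
    """R^2 of an ordinary least-squares fit that includes an intercept."""
    X = np.column_stack([np.ones(len(target))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    centered = target - target.mean()
    return 1.0 - (residual @ residual) / (centered @ centered)

def unique_variance(context, character, full):
    """Unique R^2 of each predictor, obtained by comparing nested models.

    Hypothetical helper: unique share = R^2(both) - R^2(other predictor alone).
    """
    r2_both = r_squared([context, character], full)
    unique_context = r2_both - r_squared([character], full)
    unique_character = r2_both - r_squared([context], full)
    return unique_context, unique_character, r2_both

# Synthetic stand-ins for the three rating time series (NOT the study's data).
rng = np.random.default_rng(0)
n = 2000
context = rng.standard_normal(n)
character = 0.5 * context + rng.standard_normal(n)  # partly shared signal
full = 0.6 * context + 0.6 * character + 0.5 * rng.standard_normal(n)

u_ctx, u_chr, r2 = unique_variance(context, character, full)
print(f"unique context: {u_ctx:.3f}, unique character: {u_chr:.3f}, total R^2: {r2:.3f}")
```

In this synthetic example the two predictors are positively correlated, so the two unique shares sum to less than the total R²; the remainder is variance the predictors explain jointly.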