Abstract
When inferring others' emotions, humans often use contextual information (e.g., other people and scene information) in addition to facial expressions. Previous research has found that observers can accurately infer and track others' emotions using only contextual information (Inferential Emotion Tracking). However, it is unknown whether the use of contextual information leads to accurate emotion recognition even when faces are available. Here, we measured human observers' gaze while they continuously inferred, in real time, the emotion (valence and arousal) of a character in a video. Observers completed either the context-only condition, in which the target character was blurred out while the context (background scene) remained visible, or the ground-truth condition, in which no visual information was blurred out. We quantified between-subject agreement in gaze as the pairwise Pearson correlation between observers' gaze. Intersubject gaze correlations were lower in the context-only condition than in the ground-truth condition (t(51) = 7.37, p < 0.001), which may be an unsurprising consequence of the reduced information and increased difficulty in the context-only condition. Interestingly, inferential emotion-tracking performance in the context-only condition was significantly correlated with context-only intersubject gaze correlations (Spearman r = 0.535, p < 0.01), whereas performance in the ground-truth condition was not significantly correlated with ground-truth intersubject gaze correlations (Spearman r = 0.317, p = 0.243). This suggests that knowing where to look in the context leads to more accurate inferences about others' emotions. Additionally, observers in the ground-truth condition whose gaze correlated highly with that of observers in the context-only condition had higher emotion-tracking performance in the ground-truth condition (p = 0.0072, permutation test).
Our approach provides a means of identifying emotionally relevant information in the context and suggests that individuals who know where emotional clues are in the context are more accurate at emotion recognition, even when faces are available.
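
As an illustration of the agreement measure described above, the following is a minimal sketch of an intersubject gaze correlation based on pairwise Pearson correlations. It is not the authors' analysis code. The gaze layout (one x, y position per video frame), the averaging of the x and y correlations, the per-observer mean across pairs, and the function name `intersubject_gaze_correlation` are all assumptions made for this example, and the data are synthetic.

```python
import numpy as np
from itertools import combinations


def intersubject_gaze_correlation(gaze):
    """Mean pairwise Pearson correlation of gaze for each observer.

    gaze: array of shape (n_observers, n_frames, 2) holding an (x, y)
    gaze position per frame (an assumed layout, not from the abstract).
    Returns an array of shape (n_observers,).
    """
    n_observers = gaze.shape[0]
    pair_r = np.full((n_observers, n_observers), np.nan)
    for i, j in combinations(range(n_observers), 2):
        # Pearson r computed separately for horizontal and vertical gaze.
        r_x = np.corrcoef(gaze[i, :, 0], gaze[j, :, 0])[0, 1]
        r_y = np.corrcoef(gaze[i, :, 1], gaze[j, :, 1])[0, 1]
        # Averaging the two axes is an assumption of this sketch.
        pair_r[i, j] = pair_r[j, i] = (r_x + r_y) / 2.0
    # The diagonal stays NaN, so each observer's mean skips the self-pair.
    return np.nanmean(pair_r, axis=1)


# Synthetic demo: four observers follow a shared gaze path with noise,
# and a fifth observer looks at unrelated locations.
rng = np.random.default_rng(0)
shared_path = rng.standard_normal((200, 2))
followers = shared_path + 0.5 * rng.standard_normal((4, 200, 2))
outlier = rng.standard_normal((1, 200, 2))
gaze = np.concatenate([followers, outlier], axis=0)

isc = intersubject_gaze_correlation(gaze)
print(np.round(isc, 2))
```

In this synthetic example, the four observers who share a gaze path obtain higher values than the unrelated observer. Per-observer values of this kind could then be rank-correlated with emotion-tracking performance, as in the Spearman correlations reported above.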