Abstract
Numerous studies have found that, although people can rapidly extract the average emotion from a sequence of faces through ensemble perception, they have poor representations of the individual faces in the sequence. This might be because the presentation rate of the sequence exceeds the temporal capacity for processing individual faces. In this study, we manipulated the presentation duration of individual faces in a rapid serial visual presentation (RSVP) paradigm to investigate the transition from attention (to individual faces) to ensemble coding (of the entire sequence). Participants (N = 31) were instructed to fixate on a central cross and view an RSVP stream of faces morphed from sad to angry. Each face appeared on screen for 167 ms, 333 ms, or 500 ms, and the entire stream lasted 2 s. After a 1 s interval, participants responded using the method of adjustment. In the “target” condition, participants attended to and reported the emotion of the target face within the stream. In the “average” condition, they reported the average face of the stream. Multiple linear regression was used to predict the response face from the target face and the average face. In the “target” condition, as presentation duration increased, participants reported the attended target face more accurately (slope b = 0.21, p < 0.001), and the average emotion contributed less to their responses (b = −0.20, p < 0.001). In the “average” condition, however, although responses were significantly driven by the ensemble emotion (b = 1.29, p < 0.001), longer presentation durations did not alter performance. A control condition with inverted faces confirmed that these effects were not purely the result of low-level feature processing. These results suggest that attending to a target with sufficient processing time improves perception of that individual face and suppresses non-target faces, so that responses rely less on ensemble perception.
Our study sheds light on the mechanisms of attention and ensemble perception.
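
The regression described above, in which the response face is predicted from the target face and the average face, can be illustrated with a minimal sketch. It is not the authors' analysis code. The function name `fit_weights`, the 0–100 morph scale, the simulated weights, and the noise level are all assumptions made for illustration, and the trials are synthetic rather than data from the study. For simplicity the target and average values are drawn independently here, whereas in a real stream the target contributes to the average.

```python
import numpy as np


def fit_weights(target, average, response):
    """Ordinary least squares fit of
    response = intercept + b_target * target + b_average * average.
    """
    # Design matrix: a column of ones for the intercept plus the two predictors.
    X = np.column_stack([np.ones_like(target), target, average])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return {"intercept": coef[0], "target": coef[1], "average": coef[2]}


# Synthetic trials only (NOT the study's data). Values are positions on a
# hypothetical 0-100 sad-to-angry morph continuum.
rng = np.random.default_rng(0)
n_trials = 2000
target = rng.uniform(0, 100, n_trials)
average = rng.uniform(0, 100, n_trials)

# Hypothetical observer who weights the target 0.6 and the average 0.4,
# with Gaussian response noise (all three numbers are assumptions).
true_target, true_average = 0.6, 0.4
response = (true_target * target
            + true_average * average
            + rng.normal(0, 5, n_trials))

weights = fit_weights(target, average, response)
print(weights)
```

In this framing, a larger fitted `target` weight means responses track the individual face more closely, and a larger `average` weight means they are pulled toward the ensemble. Fitting the model separately for each presentation duration, or adding duration as an interaction term, would be one way to examine how the two weights trade off.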