Abstract
People can rapidly extract the average emotion from a sequence of faces through ensemble coding, yet they retain poor representations of the individual faces in that sequence. One possible reason is that the presentation rate exceeds our temporal capacity for processing individual faces. In this study, we manipulated the presentation duration of individual faces in a rapid serial visual presentation (RSVP) paradigm to investigate the transition from perception of individual faces to ensemble coding of the entire sequence. Participants (N = 31) were instructed to fixate on a central cross and view faces, morphed from sad to angry, in an RSVP stream lasting 2 s. Each face appeared on screen for 49.8 ms (20 Hz), 99.6 ms (10 Hz), or 166 ms (6 Hz). After a 1 s interval, participants responded using a method of adjustment. In the “target” condition, participants attended to and reported the emotion of the target face within the stream. In the “average” condition, they reported the ensemble face of the stream. In the target condition, participants still relied on ensemble perception (Greenhouse-Geisser corrected F(1.99, 59.66) = 12.43, p < 0.001), but their reports of the target face became more accurate as the presentation duration of individual faces increased (Greenhouse-Geisser corrected F(1.67, 50.09) = 44.95, p < 0.001). In the average condition, participants accurately reported the ensemble face regardless of presentation duration (Greenhouse-Geisser corrected F(1.85, 55.45) = 2.72, p = 0.074). Thus, the results suggest that attending to the target improved individual face perception when each face in the sequence was given sufficient processing time, whereas ensemble perception was not affected by increased processing time. These findings shed light on the mechanisms of individual item and ensemble perception.