Abstract
We encounter sets of similar objects on a regular basis: a field of corn, a bin of apples, a parking lot of cars. When we perceive these sets, the visual system tends to favor a statistical summary of the group as a whole, while information about individual items is often unavailable. This ensemble representation occurs not only for low-level features but also for high-level objects. For example, observers perceive the mean emotion or identity in a set of simultaneously presented faces. Here we investigated the time course of high-level ensemble representation of faces. Observers were shown sequentially presented faces of varying emotions. The number of faces, the duration per face, and the inter-stimulus interval were manipulated. Observers were able to extract a mean emotion from the sequentially presented faces even at the shortest durations tested (presentation rates over 12 Hz) and even with the largest set sizes (20 faces). The results suggest that ensemble coding of high-level objects occurs rapidly, with a brief minimum temporal integration time.