Several studies have found that observers represent the average expression or identity of a crowd, often as precisely as they can recognize the expression of any one individual (Haberman & Whitney, 2007, 2009). This crowd expression perception operates over both space (e.g., a static crowd of faces) and time: observers are sensitive to crowd expression and identity when faces are presented in a temporal sequence, as happens when we shift our gaze across a crowd, or when a single face dynamically changes expression or speaks (Haberman, Harp, & Whitney, 2009; Post, Haberman, Iwaki, & Whitney, 2012). Humans are therefore sensitive to the summary statistical information in groups of faces; such groups are represented quickly and efficiently as an ensemble (for a review, see Alvarez, 2011; Whitney, Haberman, & Sweeny, 2013). The advantages of ensemble coding are clear: averaging many local estimates improves precision and accuracy in estimates of global properties or textures (for example, ensemble motion, texture, size; Albrecht & Scholl, 2010; Alvarez & Oliva, 2008; Ariely, 2001; Chong & Treisman, 2003, 2005; Morgan & Glennerster, 1991; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001; Watamaniuk & Duchon, 1992). Ensemble face representation may be a kind of facial texture coding, where the system can mitigate the effect of local (individual face) noise by averaging over the crowd (Leib et al.,
2012). Further, as mentioned previously, crowds of faces convey significant information that is available only at the level of the crowd. Processing the emotional expression of a whole crowd of people tells us something more significant, more accurate, and more reliable about the state of the world than any single face can. Encoding ensemble faces (crowds) allows humans to visually represent social cues at the level of the group.
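
The precision benefit of averaging described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the cited studies: each face contributes an independent estimate of the crowd's expression, corrupted by Gaussian noise of equal variance, on an arbitrary scale. The function name `simulate_ensemble_error` and all parameter values are hypothetical. Under these assumptions, the spread of the averaged estimate should shrink roughly in proportion to one over the square root of the number of faces.

```python
import random
import statistics


def simulate_ensemble_error(n_faces, noise_sd=1.0, n_trials=2000, seed=0):
    """Return the SD of the error made when averaging n_faces noisy estimates.

    Assumption (illustrative only): each face yields an independent,
    Gaussian-noise-corrupted estimate of the same underlying crowd value.
    """
    rng = random.Random(seed)
    true_value = 0.0  # hypothetical true crowd expression, arbitrary units
    errors = []
    for _ in range(n_trials):
        # One noisy local estimate per face in the crowd
        estimates = [rng.gauss(true_value, noise_sd) for _ in range(n_faces)]
        # The ensemble estimate is the mean of the local estimates
        errors.append(statistics.fmean(estimates) - true_value)
    return statistics.pstdev(errors)


single_face_sd = simulate_ensemble_error(n_faces=1)
crowd_sd = simulate_ensemble_error(n_faces=16)
print(f"error SD, 1 face:   {single_face_sd:.3f}")
print(f"error SD, 16 faces: {crowd_sd:.3f}")
```

With 16 faces the error spread should come out near one quarter of the single-face value. Real observers may not weight every face equally or sample the whole crowd, so this idealized case should be read as an upper bound on the benefit of averaging, not as a model of the reported data.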