To overcome the severe processing bottleneck imposed by the limited capacity of attention and working memory, the visual system has to compress the large and often redundant volume of visual information it receives from the environment. One strategy for achieving such a compressed representation is “to discriminate and to reproduce the statistical moments” related to features and objects in the visual field (Whitney & Leib, 2018, p. 8). This information is often referred to as ensemble summary statistics (Alvarez, 2011). The first statistical moment, which the visual system can compute easily, is the mean. Observers can extract the average (or central tendency) across a wide range of features, from low-level dimensions such as orientation (Alvarez & Oliva,
2009; Dakin & Watt,
1997; Parkes, Lund, Angelucci, Solomon, & Morgan,
2001), hue (de Gardelle & Summerfield,
2011; Maule & Franklin,
2015), brightness (Bauer,
2009), speed (Watamaniuk & Duchon,
1992), spatial position (Alvarez & Oliva,
2008), and size (Ariely,
2001; Chong & Treisman,
2003), to high-level dimensions such as emotional expression, gender, facial identity, gaze direction, and head rotation of a crowd (Florey, Clifford, Dakin, & Mareschal,
2016; Haberman & Whitney,
2007; Sweeny & Whitney,
2014). The second statistical moment is the
variance (or diversity or range), which can also be computed for low-level (Dakin & Watt,
1997; Morgan, Chubb, & Solomon,
2008; Norman, Heywood, & Kentridge,
2015; Solomon, Morgan, & Chubb,
2011; Suárez-Pinilla, Seth, & Roseboom,
2018) and high-level features (Haberman, Lee, & Whitney,
2015). Another important and well-studied ensemble statistic is numerosity (or sample size, in conventional statistical terms), which reflects the ability to coarsely estimate the number of objects (Burr & Ross,
2008; Chong & Evans,
2011; Halberda, Sires, & Feigenson,
2006). Ensemble perception is not restricted to the visual modality: Some studies have shown that people can represent averages in the auditory modality (Albrecht, Scholl, & Chun,
2012; McDermott, Schemitsch, & Simoncelli,
2013; Piazza, Sweeny, Wessel, Silver, & Whitney,
2013). Susceptibility to adaptation aftereffects (Burr & Ross,
2008; Corbett, Wurnitsch, Schwartz, & Whitney,
2012; Norman et al.,
2015), the speed of ensemble information extraction (as fast as 50–200 ms; Chong & Treisman,
2003; Whiting & Oriet,
2011), and the possibility of computing ensemble summary statistics with limited or no conscious access to individual objects (Alvarez & Oliva,
2008; Ariely,
2001; Corbett & Oriet,
2011; Parkes et al.,
2001) all support the idea that these statistics can be directly encoded by the visual system along with basic perceptual properties. This idea appears to be consistent with neurophysiological studies in animals (Treue, Hol, & Rauber,
2000), healthy subjects (Cant & Xu,
2012), and individuals with vision conditions (Leib et al.,
2012).