The human visual system has been shown to be sensitive to various types of statistical properties embedded in real-world inputs. Not only is it adept at detecting first-order relationships between local sensors, such as contrasts in luminance and orientation (Cataliotti & Gilchrist, 1995; Nothdurft, 1991), but it is also sensitive to correlations among sensor responses, such as the co-occurrence of local edges in natural images (Geisler, Perry, Super, & Gallogly, 2001) and correlations across orientations, positions, and scales in textures (Portilla & Simoncelli, 2000). Recent studies suggest another type of statistical perception of visual input: ensemble perception, the ability to extract the average value of a feature dimension across multiple objects and to use it as a compact representation of a complex environment (Alvarez, 2011). Such statistical averaging has been demonstrated in low-level feature domains, including size (Ariely, 2001; Chong & Treisman, 2003), orientation (Parkes, Lund, Angelucci, Solomon, & Morgan, 2001), brightness (Bauer, 2009), and location (Alvarez & Oliva, 2008). More recently, evidence has shown that observers also form accurate ensemble representations of complex high-level features, such as facial expression (Haberman & Whitney, 2007, 2009), gender (Haberman & Whitney, 2007), and identity (J. de Fockert & Wolfenstein, 2009; J. W. de Fockert & Gautrey, 2013; Neumann, Schweinberger, & Burton, 2013). Importantly, ensemble representations can be formed without high-fidelity representations of the individual objects in the group (Ariely, 2001; Haberman & Whitney, 2009) and with minimal demands on focal attention (Alvarez & Oliva, 2008).