Abstract
There has been a recent surge in the study of ensemble coding, the phenomenon in which the visual system represents a set of similar objects using summary statistics (Alvarez & Oliva, 2008; Ariely, 2001; Chong & Treisman, 2003; Haberman & Whitney, 2007, 2008). These studies have revealed that humans are sensitive to scene statistics across multiple levels of visual analysis, including high-level objects such as faces. However, the limits of ensemble coding remain undefined. Here we explore this issue using the method of adjustment, in which observers adjust a test face to match the mean expression of a set of faces. In Experiment 1, observers viewed, for 250 ms, sets of 12 faces that contained 2 emotional outliers (i.e., expressions that deviated substantially from the mean expression of the other 10 faces). When adjusting the subsequent test face to the mean expression, all observers preferentially adjusted to the local mean of the set (mean expression with outliers excluded) rather than the global mean (mean expression with outliers included). In Experiment 2, we modeled the performance of an observer who implemented a ‘sampling’ strategy; that is, an observer who picked N faces from the set and perfectly averaged them. The sampling hypothesis was first put forth by Myczek and Simons (2008) as a potentially more parsimonious explanation for ensemble coding. However, our sets contained outliers, whose presence would impair the performance of a sampling model. The results show that a sampling model could not explain human performance. Thus, observers effectively filtered the outliers, extracting the local mean from the set in only 250 ms.
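As an illustration of the sampling strategy described for Experiment 2, the following is a minimal sketch of an observer who picks N faces at random and averages them perfectly. It is not the authors' model: the one-dimensional expression scale, the values of `local_mean`, `spread`, and `outlier_offset`, and the function names are all hypothetical, chosen only to mirror the 10-plus-2 set structure from Experiment 1.

```python
import random
import statistics


def make_set(rng, local_mean=20.0, spread=5.0, outlier_offset=40.0,
             n_regular=10, n_outliers=2):
    """Build one hypothetical set: 10 faces near a local mean plus 2 outliers.

    Expressions are coded as positions on an arbitrary 1-D scale
    (an assumption made for this sketch).
    """
    regular = [local_mean + rng.uniform(-spread, spread) for _ in range(n_regular)]
    outliers = [local_mean + outlier_offset for _ in range(n_outliers)]
    return regular, outliers


def sampling_estimate(faces, n, rng):
    """Pick n faces without replacement and average them perfectly."""
    return statistics.fmean(rng.sample(faces, n))


def simulate(n, trials=10_000, seed=0):
    """Return the sampler's mean absolute error relative to
    (local mean, global mean) over many simulated trials."""
    rng = random.Random(seed)
    err_local = 0.0
    err_global = 0.0
    for _ in range(trials):
        regular, outliers = make_set(rng)
        faces = regular + outliers
        estimate = sampling_estimate(faces, n, rng)
        err_local += abs(estimate - statistics.fmean(regular))   # outliers excluded
        err_global += abs(estimate - statistics.fmean(faces))    # outliers included
    return err_local / trials, err_global / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        local_err, global_err = simulate(n)
        print(f"N={n:2d}  error vs local mean={local_err:.2f}  "
              f"error vs global mean={global_err:.2f}")
```

Under these assumptions, a sampler that averages all 12 faces reproduces the global mean exactly, while smaller samples sometimes include an outlier and sometimes do not. Comparing the two error columns across N shows how a pure sampling observer would be expected to differ from one who filters the outliers.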