The currently prevailing interpretation of crowding is in terms of some sort of spatial pooling of visual information (Balas, Nakano, & Rosenholtz,
2009; Freeman, Chakravarthi, & Pelli,
2012; Freeman & Simoncelli,
2011; Levi,
2008; Parkes, Lund, Angelucci, Solomon, & Morgan,
2001; Pelli, Palomares, & Majaj,
2004; Van den Berg, Roerdink, & Cornelissen,
2010; Wilkinson, Wilson, & Ellemberg,
1997). In particular, it has been proposed that the peripheral representations in visual cortex provide only local statistics of the visual image (Balas et al.,
2009; Freeman & Simoncelli,
2011; Parkes et al.,
2001; Van den Berg et al.,
2010). However, a recent study suggests that not all elements of the peripheral descriptor are captured in local statistics (Wallis, Bethge, & Wichmann,
2016). Because this statistical pooling cannot be avoided, it has been termed mandatory pooling or compulsory averaging (Parkes et al.,
2001). In this view, crowding arises from a two-stage process: individual features are first detected independently and are then spatially pooled or integrated in some fashion at a subsequent stage of processing. Only this pooled information is available to later recognition processes (but see, e.g., Chaney, Fischer, & Whitney,
2014; Herzog, Sayim, Chicherov, & Manassi,
2015). Spatial pooling thus occurs at two stages: in addition to the reduced spatial acuity (itself a pooling across space, in the sense of a low-pass filtering of the luminance function by optical and neural means), there is an additional spatial pooling of elementary features (for example, the outputs of V1 units) at a higher cortical processing stage.
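The two-stage account can be illustrated with a minimal sketch. This is not an implementation of any published model; the orientation values and the choice of the arithmetic mean as the pooling statistic are assumptions made purely for illustration (compulsory averaging in the spirit of Parkes et al., 2001):

```python
import statistics

# Stage 1 (assumed): independent feature detection.
# Each element's orientation (in degrees) is encoded separately.
target = 10.0                    # hypothetical target orientation
flankers = [-20.0, 30.0, -5.0]   # hypothetical flanker orientations
features = [target] + flankers   # all features inside the pooling region

# Stage 2 (assumed): compulsory averaging.
# Only pooled summary statistics reach later recognition processes,
# so the target's individual orientation is no longer recoverable.
pooled_mean = statistics.mean(features)
pooled_sd = statistics.stdev(features)

print(pooled_mean)  # the "perceived" orientation under compulsory averaging
```

The key point the sketch makes is that after Stage 2 only `pooled_mean` (and possibly other summary statistics such as `pooled_sd`) is available downstream; the identity of the target feature is lost in the pooled representation.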