Abstract
Recent literature in visual perception has described ensemble coding, a summarization process by which the statistical properties of a scene are extracted. Debate persists in this literature over the temporal properties and informational content underlying the formation of ensemble representations. Are summary representations created immediately, or do they emerge over time? Does all information from a scene contribute to an ensemble representation, or is a subset of information used to make inferences about the full scene? The current work aimed to clarify these questions. To do so, two experiments were conducted using briefly presented displays of 16 oriented arrows. In both experiments, separate blocks tasked subjects with either reporting the average orientation of all 16 arrows or providing a whole report of all arrow orientations they remembered. The first experiment matched the stimulus display characteristics between the two tasks to determine whether the average of the subset of items reported in the whole-report task predicts the subject’s reported ensemble average more accurately than the average of all items in the display (global pooling). The second experiment manipulated the time available for consolidating information into visual short-term memory (VSTM) by varying the exposure duration of the stimulus frame before a backward pattern mask was displayed. The results provide evidence that ensemble representations of average orientation emerge over time from a subset of the items in the display, rather than from all items. These findings argue against a pre-attentive global pooling mechanism of ensemble representation formation and instead support a view in which ensemble representations are generated “late” in the limited VSTM store.