Abstract
We encounter crowds of faces all the time in our daily lives. Previous work has demonstrated that we are surprisingly sensitive to high-level summary statistical information, such as average expression (Haberman & Whitney, 2007), and this high-level summary encoding has been shown in both space and time (Haberman & Whitney, 2009; Albrecht & Scholl, 2010). In the real world, faces are often randomly oriented. However, previous work on ensemble or summary statistical perception has not clarified whether these percepts can be formed from viewpoint-invariant object representations. If summary statistical perception operates over viewpoint-invariant 3D representations of objects, this would broaden the applicability and usefulness of ensemble coding throughout natural scenes, including faces in a crowd. Here, we presented temporal sequences of faces. The number of faces in each sequence (crowd) varied from 2 to 18, and each individual face was viewed for 47 ms. The sequences contained leftward-oriented faces, and participants were asked to report the mean identity using an adjustable, forward-oriented test face. Our results indicate that participants achieved a veridical ensemble code even when required to view the faces in one orientation and respond in a different orientation. We varied set size as a control to measure how much information was integrated from each set of faces. Using this control, we found that participants integrated, on average, four or more faces from the stimulus set. Thus, alternative explanations, such as single-face subsampling, cannot adequately explain participants' results. This pattern of performance suggests that an ensemble percept is not strictly image-based, but depends on object-centered representations that can be successfully utilized under conditions of viewpoint invariance.
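The set-size control rests on a simple subsampling logic: an observer who averages only k of the N presented faces will, on average, land farther from the true set mean than one who integrates more items, and that error shrinks as k grows. The sketch below illustrates this logic in Python; the circular identity continuum, the within-set spread, and the error metric are illustrative assumptions for the sketch, not the stimulus parameters or analysis used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subsampler(k, set_size, n_trials=10_000, face_spread=20.0):
    """Simulate an observer who averages k randomly chosen faces from a
    set of `set_size` faces and reports that average as the set mean.
    Identities are placed on a circular morph continuum (0-360 deg) --
    an assumption for illustration only. Returns mean absolute error
    (in deg) of the report relative to the empirical set mean."""
    errors = []
    for _ in range(n_trials):
        center = rng.uniform(0, 360)
        # set members scattered around a random center identity
        faces = center + rng.normal(0, face_spread, set_size)
        set_mean = faces.mean()
        # observer integrates only k of the presented faces
        sample = rng.choice(faces, size=min(k, set_size), replace=False)
        response = sample.mean()
        # circular error relative to the empirical set mean
        err = (response - set_mean + 180) % 360 - 180
        errors.append(abs(err))
    return np.mean(errors)

for set_size in (2, 6, 12, 18):
    e1 = simulate_subsampler(1, set_size)   # single-face subsampling
    e4 = simulate_subsampler(4, set_size)   # integrating ~4 faces
    print(f"set size {set_size:2d}: 1-face error = {e1:5.1f} deg, "
          f"4-face error = {e4:5.1f} deg")
```

Comparing observed error across set sizes against curves like these is one way an effective number of integrated faces can be estimated; a single-face subsampler predicts substantially larger errors than those consistent with integrating four or more faces.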
Meeting abstract presented at VSS 2013