We demonstrated perceptual averaging of emotional valence using pairs of faces that were presented simultaneously and briefly, leaving observers insufficient time to focus attention systematically on a single face (a condition necessary for neural averaging to occur in high-level ventral visual areas). Specifically, we demonstrated two complementary manifestations of perceptual averaging: (1) perceptual spreading, in which the valence-neutral (surprise) face appeared to take on the emotional valence of the accompanying happy or angry face, and (2) perceptual reduction, in which the happy face appeared less happy and the angry face appeared less angry when accompanied by a valence-neutral or opposite-valence face. Together, perceptual spreading and reduction indicate that the perceived expressions of both the valence-neutral and the emotional faces shifted toward the average expression of each pair.
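The two manifestations can be stated compactly as a shift toward the pair mean. This is an illustrative formalization only, not a model we fitted: the numerical valence scale ($-1$ for angry, $0$ for surprise, $+1$ for happy) and the shift weight $w$ are hypothetical. If the perceived valence $\hat v_A$ of face $A$ moves a fraction $w$ of the way from its actual valence $v_A$ toward the mean $\bar v$ of the pair, then spreading and reduction both follow:

$$
\bar v = \frac{v_A + v_B}{2}, \qquad \hat v_A = v_A + w\,(\bar v - v_A), \qquad 0 < w \le 1,
$$

$$
\begin{aligned}
\text{spreading (surprise paired with happy):}\quad & \hat v_{\text{surprise}} = 0 + w\left(\tfrac{1}{2} - 0\right) = \tfrac{w}{2} > 0,\\
\text{reduction (happy paired with surprise):}\quad & \hat v_{\text{happy}} = 1 + w\left(\tfrac{1}{2} - 1\right) = 1 - \tfrac{w}{2} < 1,\\
\text{reduction (happy paired with angry):}\quad & \hat v_{\text{happy}} = 1 + w\,(0 - 1) = 1 - w < 1.
\end{aligned}
$$

The sketch merely restates the qualitative pattern (the neutral face gains valence, the emotional face loses it); the value of $w$ is left unspecified.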
Crucially, this perceptual averaging occurred only when the two faces were presented within the same visual hemifield. Furthermore, averaging occurred even though the two faces were not in close proximity but were separated by 7° of visual angle, indicating a long-range interaction within each visual hemifield. These results are consistent with prior evidence of neural averaging in high-level ventral visual processing. High-level ventral visual neurons that encode complex visual features such as facial expressions have large, primarily contralateral receptive fields. When these neurons are activated by multiple stimuli in the absence of attention focused on any one stimulus, their firing rates approximate the average of the rates evoked by each stimulus presented alone (e.g., Chelazzi et al., 1998; Kastner et al., 2001; Miller et al., 1993; Sato, 1989; Zoccolan et al., 2005). Our finding of within-hemifield perceptual averaging of facial expressions can thus be explained by within-receptive-field averaging in high-level ventral visual neurons whose receptive fields are large and primarily contralateral.
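The within-receptive-field account can be illustrated with a toy simulation. It is a minimal sketch under explicit assumptions, not the model used in this study: a hypothetical high-level neuron's receptive field is taken to cover the entire contralateral hemifield and to ignore exact position within it; a neuron's response to a face is equated with that face's valence on an arbitrary scale (-1 angry, 0 surprise, +1 happy); and, without focused attention, the response to several faces is the plain mean of the responses to each face alone. The face positions and the names `Face`, `pooled_response`, and `encoded_valence` are illustrative inventions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Face:
    x_deg: float    # horizontal position (deg); negative = left visual hemifield
    y_deg: float    # vertical position (deg)
    valence: float  # hypothetical scale: -1 angry, 0 surprise (neutral), +1 happy


def hemifield(face: Face) -> str:
    return "left" if face.x_deg < 0 else "right"


def pooled_response(faces: Sequence[Face], field: str) -> Optional[float]:
    """Assumed response of a neuron whose receptive field spans the whole
    `field` hemifield: the mean of its responses to each face inside the
    field (no attention focused on either face). Exact positions inside the
    field are ignored, so small location jitter has no effect."""
    inside = [f.valence for f in faces if hemifield(f) == field]
    return sum(inside) / len(inside) if inside else None


def encoded_valence(target: Face, display: Sequence[Face]) -> float:
    """Valence signalled for `target` by the neurons whose field contains it."""
    response = pooled_response(display, hemifield(target))
    assert response is not None  # the target always lies in its own hemifield
    return response


happy, surprise = 1.0, 0.0
# Hypothetical layouts: same hemifield, 7 deg apart vertically ...
within = [Face(-5.0, 3.5, happy), Face(-5.0, -3.5, surprise)]
# ... versus opposite hemifields, slightly closer together (6.8 deg).
between = [Face(-3.4, 0.0, happy), Face(3.4, 0.0, surprise)]

print([encoded_valence(f, within) for f in within])    # [0.5, 0.5]
print([encoded_valence(f, between) for f in between])  # [1.0, 0.0]
```

In this idealization the averaging is complete, whereas observers' percepts shifted only partway toward the pair average. The point of the sketch is narrower: restricting pooling to one hemifield yields averaging for a within-hemifield pair even 7° apart, and none for a between-hemifield pair even at a slightly shorter separation.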
Nevertheless, one might argue for an alternative explanation based on differences in attentional resources between and within hemifields. For example, there is evidence that each cerebral hemisphere has a relatively independent resource for visual spatial attention and short-term memory, most clearly in split-brain individuals but also, to a degree that depends on the behavioral task, in normal individuals (e.g., Alvarez & Cavanagh, 2005; Delvenne, 2005; Duncan, Bundesen, Olson, Humphreys, Chavda, & Shibuya, 1999; Luck, Hillyard, Mangun, & Gazzaniga, 1989). It is thus possible that in the within-hemifield condition the two faces competed for a more limited attentional resource within one cerebral hemisphere, whereas in the between-hemifield condition each face was processed by a relatively separate attentional resource in the contralateral hemisphere. Although our results do not definitively rule out this attention-resource hypothesis, we favor the neural-averaging hypothesis because it naturally predicts perceptual averaging, whereas the attention-resource hypothesis must postulate an additional mechanism by which competition for a limited attentional resource produces perceptual averaging (rather than, for example, perceptual degradation).
Other explanations of our results, based on post-perceptual processing, attentional strategies, or spatial interactions, are unlikely. Post-perceptual effects such as arousal, emotional biasing, or semantic interactions cannot explain the results because they should have operated to a similar extent in both the within-hemifield and between-hemifield conditions.
Nor can the results be explained by a top-down attentional strategy. Observers could not have adjusted their spatial distribution of attention before stimulus presentation because the to-be-rated (i.e., post-cued) face appeared unpredictably in any of the four quadrants on each trial. In Experiment 4 in particular, where the within- and between-hemifield conditions were randomly intermixed, observers could not have anticipated the spatial configuration (vertical or horizontal) of each face pair. Thus, neither a pre-stimulus distribution of spatial attention nor anticipation of the spatial configuration could account for our perceptual-averaging results. It is also unlikely that observers adopted a post-stimulus attentional strategy. Each face pair was presented only briefly (100 ms, followed by a backward mask) at an unpredictable location, so observers are unlikely to have had sufficient time to identify the expressions of the two peripherally flashed faces and then complete a shift of attention to the type of face that would be post-cued. Even if observers were able to do this to a limited degree in Experiments 1, 2, and 4, focusing attention on the to-be-rated face would only have reduced neural averaging (e.g., Chelazzi et al., 1998), and it is unclear how the act of shifting attention itself could have caused perceptual averaging. Furthermore, in Experiment 3 the to-be-rated face was completely unpredictable until the post-cue on each trial, yet we still obtained evidence of perceptual averaging. Thus, neither a pre-stimulus distribution of spatial attention nor a post-stimulus shift of attention could account for our perceptual-averaging effects.
The results are also unlikely to be attributable to retinotopic or spatial interactions. For example, the inter-face distance was slightly shorter in the between-hemifield condition than in the within-hemifield condition, so interactions that strengthen with proximity would have predicted more averaging between hemifields; instead, we obtained evidence of perceptual averaging only in the within-hemifield condition. Furthermore, perceptual spreading occurred whether the face locations were aligned (Experiment 1) or randomly jittered (Experiment 4), indicating that perceptual averaging is mediated by mechanisms that are relatively insensitive to random shifts in location. This is consistent with our hypothesis that within-hemifield perceptual averaging reflects within-receptive-field averaging by high-level ventral visual neurons whose responses are primarily confined to the contralateral visual hemifield but are otherwise largely insensitive to small changes in stimulus location.
Because neural averaging occurs throughout the ventral visual pathway whenever multiple stimuli fall within a single receptive field and no particular stimulus is selectively attended (e.g., Chelazzi et al., 1998; Kastner et al., 2001; Miller et al., 1993; Reynolds, Chelazzi, & Desimone, 1999; Sato, 1989; Zoccolan et al., 2005), perceptual averaging may plausibly be a ubiquitous phenomenon affecting the perception of simple as well as complex visual features. Indeed, Parkes, Lund, Angelucci, Solomon, and Morgan (2001) demonstrated short-range (0.47° of visual angle) perceptual averaging of local orientation, consistent with the small receptive fields of low-level visual areas. Because receptive fields become larger in higher-level ventral visual areas, which tend to process more complex visual features (see Suzuki, 2005, and Tanaka, 1996, for reviews), the spatial extent of perceptual averaging is likely to be larger for more complex features (at least 7° for facial expressions, based on our current results) than for simpler features. Furthermore, the tuning of high-level ventral visual neurons to complex features tends to develop in response to behavioral demands (e.g., Kobatake, Wang, & Tanaka, 1998; Logothetis, Pauls, & Poggio, 1995; Sigala & Logothetis, 2002; Young & Yamane, 1992). Thus, long-range averaging of complex features in fleeting glances might be advantageous because it allows people to rapidly perceive the gist of behaviorally relevant information before localizing individual objects.