Abstract
To successfully communicate about and navigate a perceptually chaotic world, we must extract not only the identities of people, but also the roles they play in events, i.e. who did what to whom: boy-hitting-girl is very different from girl-hitting-boy. We routinely categorize Agents (i.e. the actor) and Patients (i.e. the one acted upon) from visual input, but do we automatically encode such roles even when attention is otherwise occupied? To investigate this question, we employed a "switching cost" paradigm. In several experiments, participants viewed a continuous sequence of two-person event scenes and had to rapidly identify the side on which a target actor appeared in each scene (the male or female actor, or the red- or blue-shirted actor). Critically, although role was orthogonal to gender and shirt color, and was never explicitly mentioned, participants responded more slowly when the target's role switched from trial to trial (e.g., when the male went from being Patient to Agent). Despite its small absolute magnitude, this role switch cost was both significant and robust (all p's < 0.001, Cohen's d's > 0.86), with a majority of subjects and items demonstrating the effect. In an additional experiment, we probed the level of representation at which the role switch cost operates. We ran the same paradigm as before, but edited the images so that the actors always faced opposite directions ("mirror-flipped"): actor poses were thus preserved, but their interaction was eliminated. Here the switch cost was significantly smaller, and additional "active posture" saliency effects emerged, indicating that the role switch cost in our previous experiments cannot be fully explained by mere pose differences associated with Agents and Patients. Taken together, our experiments demonstrate that the human visual system automatically extracts the structure of an event, i.e. who did what to whom, even when attention is directed toward other visual features.
Meeting abstract presented at VSS 2017