Abstract
The classic animation experiment by Heider and Simmel (1944) demonstrated our strong tendency to perceive and remember interactions of simple geometric shapes in the form of a narrative. In their animation, three simple shapes move around on the screen. Observers almost inevitably interpret them as rational agents with intentions, desires, and beliefs ("That mean green square!"). Much subsequent work on dynamic scenes has identified basic visual properties that can make shapes seem animate. Here, we investigate the limits on our ability to use narrative to understand an animated scene. We created 30-second Heider-style cartoons containing 3 to 9 items whose trajectories were generated by a simple set of rules (e.g., red squares are biased to move toward green circles). For each set size, four distinct cartoons were designed, each based on a different combination of these rules. In the first stage of the experiment, we asked ten Amazon Mechanical Turk participants to write short narratives for each cartoon. These narratives were scored for accuracy by three lab assistants. For each cartoon, the five highest-scoring narratives were used in the next stage of the experiment. A new group of participants (N = 48) was shown a cartoon and then presented with a narrative: either one written for that specific cartoon or one written for a different cartoon with the same objects. Participants judged how well the description fit the cartoon on a scale from 1 (clearly does not fit) to 5 (clearly fits). ROC curves generated from the rating scale data show good performance with three objects (d' = 1.55) but poor performance for larger set sizes (all d' < 0.8). Apparently, our "Heider capacity" falls off dramatically after 3 objects, suggesting a limit related to visual working memory and/or motion tracking. Such limits may impact interpretation and recall of real-world dynamic scenes.
Meeting abstract presented at VSS 2016
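
The abstract does not say how d' was derived from the rating-scale ROC curves. As a rough illustration only, the sketch below shows one common route: treat each rating as a decision criterion, build the empirical ROC from cumulative hit and false-alarm rates, take the area under the curve, and convert it to a sensitivity index under an equal-variance Gaussian assumption. The function names, the trapezoidal area estimate, and the sample ratings are all hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def roc_points(match_ratings, mismatch_ratings, scale=(1, 2, 3, 4, 5)):
    """Build empirical ROC points from rating-scale data.

    match_ratings: ratings given when the narrative was written for the cartoon.
    mismatch_ratings: ratings given when it was written for a different cartoon.
    Each rating level is used as a criterion, from strictest to most lenient.
    """
    points = [(0.0, 0.0)]
    for criterion in sorted(scale, reverse=True):
        hit_rate = sum(r >= criterion for r in match_ratings) / len(match_ratings)
        fa_rate = sum(r >= criterion for r in mismatch_ratings) / len(mismatch_ratings)
        points.append((fa_rate, hit_rate))
    return points  # the most lenient criterion yields (1.0, 1.0)


def area_under_roc(points):
    """Trapezoidal area under the ROC points (an illustrative choice)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def d_prime_from_area(area):
    """Convert ROC area to d', assuming equal-variance Gaussian distributions.

    Note: inv_cdf raises an error when the area is exactly 0 or 1.
    """
    return 2 ** 0.5 * NormalDist().inv_cdf(area)


# Hypothetical ratings, invented purely for illustration.
match = [5, 5, 4, 4, 3]
mismatch = [1, 2, 2, 3, 3]
area = area_under_roc(roc_points(match, mismatch))
print(area, d_prime_from_area(area))
```

An area of 0.5 corresponds to chance performance (d' = 0), and larger areas map to larger d' values; a published analysis might instead fit a Gaussian model to the ROC points rather than use the trapezoidal estimate.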