Abstract
The world contains not only objects and features (e.g., glass vases, wooden tables), but also relations holding between them (e.g., glass vases *supported* by wooden tables). A growing body of work suggests that such sophisticated relations are rapidly and automatically extracted in visual processing. This raises a question: How does the visual system combine these elements when constructing relational representations? Here we test the intriguing possibility that there is a canonical temporal “order” in which the mind builds visual relations, depending on a given object’s role in the relation. In particular, we take inspiration from psycholinguistics in hypothesizing that “reference” objects (e.g., tables or desks), rather than “figure” objects (e.g., vases or laptops), serve as the scaffold for building visual relational representations. Participants were shown scenes depicting canonical visual relations (e.g., a vase on a table), and had to evaluate whether a description of the scene (“The vase is on the table”, “The table is supporting the vase”, etc.) was correct or incorrect. Crucially, on some trials, the reference object (here, the table) appeared shortly before the figure object (here, the vase), or vice versa. We observed a “reference-object advantage”: participants were faster to correctly evaluate relational descriptions when the reference object appeared before the figure object. This effect could not be explained by object location alone, as both object types were presented equally often at the critical locations. Furthermore, this effect was observed regardless of the order of elements in the scene descriptions (e.g., table before vase in “The table is supporting the vase”). We suggest that the mind employs a sequential routine for building relational representations from a visual scene, in ways that respect the role that each element plays in the relation.