Abstract
An intriguing proposal in recent literature is that vision is compositional: Just as individual words combine into larger linguistic structures (as when “vase,” “table,” and “on” compose into the phrase “the vase on the table”), many visual representations contain discrete constituents that combine in systematic ways (as when we perceive a vase on a table in terms of the vase, the table, and the relation physical-support). This raises a question: What principles guide the compositional process? In particular, how are such representations composed in time? Here we explore the psychophysics of scene composition, using spatial relations as a case study. Inspired by insights from psycholinguistics, we test the hypothesis that the mind builds relational representations in a canonical order, such that ‘reference’ objects (those that are large, stable, and/or exert physical ‘control’; e.g., tables) take precedence over ‘figure’ objects (e.g., vases resting atop them) in forming relational representations. In Experiment 1, participants performed a ‘manual construction’ task, positioning items to compose scenes from sentences (e.g., “the vase is on the table”). As hypothesized, participants placed reference-objects first (e.g., table, then vase). Next, we explored whether this pattern arises in visual processing itself. In Experiment 2, participants were faster to recognize a target scene specified by a sentence when the reference-object (table) appeared before the figure-object (vase) than vice versa. Notably, this pattern arose regardless of word order (reference- or figure-first) and generalized to different objects and relations. Follow-ups showed that this effect emerges rapidly (within 100 ms; Experiment 3), persists in a purely visual task (Experiment 4), and cannot be explained by size or shape differences between objects (Experiment 5).
Our findings reveal psychophysical principles underlying visual compositionality: the mind builds relational representations in a canonical order, respecting each element’s role in the relation.