Alon Hafri, Barbara Landau, Michael F Bonner, Chaz Firestone; When a phone in a basket looks like a knife in a cup: Perception and abstraction of visual-spatial relations between objects. Journal of Vision 2019;19(10):160a. doi: https://doi.org/10.1167/19.10.160a.
© ARVO (1962-2015); The Authors (2016-present)
Our minds effortlessly recognize the objects and environments that make up the scenes around us. Yet scene understanding relies on much richer information, including the relationships between objects, such as which objects may be in, on, above, below, behind, or in front of one another. Such spatial relations are the basis for especially sophisticated inferences about the current and future physical state of a scene (“What will fall if I bump this table?” “What will come along if I grab this cup?”). Are such distinctions made by the visual system itself? Here, we ask whether spatial relations are extracted at a level abstract enough that particular instances of these relations might be confused with one another. Inspired by the observation that certain spatial distinctions show wide agreement across the world’s languages, we focus on two cross-linguistically “core” categories: Containment (“in”) and Support (“on”). Subjects viewed streams of natural photographs that illustrated relations of either containment (e.g., phone in basket; knife in cup) or support (e.g., spoon on jar; tray on box). They were asked to press one key when a specific target image appeared (e.g., a phone in a basket) and another key for all other images. Although accuracy was quite high, subjects false-alarmed more often for images that matched the target’s spatial-relational category than for those that did not, and they were also slower to reject images from the target’s spatial-relational category. Put differently: when searching for a knife in a cup, the mind is more likely to mistake a phone in a basket for the target than a spoon on a jar. We suggest that the visual system automatically encodes a scene’s spatial composition, and that it does so in a surprisingly broad way that abstracts over the particular content of any one instance of such relations.