September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2019
When a phone in a basket looks like a knife in a cup: Perception and abstraction of visual-spatial relations between objects
Author Affiliations & Notes
  • Alon Hafri
    Johns Hopkins University
  • Barbara Landau
    Johns Hopkins University
  • Michael F Bonner
    Johns Hopkins University
  • Chaz Firestone
    Johns Hopkins University
Journal of Vision September 2019, Vol.19, 160a. doi:
      © ARVO (1962-2015); The Authors (2016-present)

Our minds effortlessly recognize the objects and environments that make up the scenes around us. Yet scene understanding relies on much richer information, including the relationships between objects—such as which objects may be in, on, above, below, behind, or in front of one another. Such spatial relations are the basis for especially sophisticated inferences about the current and future physical state of a scene (“What will fall if I bump this table?” “What will come with if I grab this cup?”). Are such distinctions made by the visual system itself? Here, we ask whether spatial relations are extracted at a sufficiently abstract level such that particular instances of these relations might be confused for one another. Inspired by the observation that certain spatial distinctions show wide agreement across the world’s languages, we focus on two cross-linguistically “core” categories—Containment (“in”) and Support (“on”). Subjects viewed streams of natural photographs that illustrated relations of either containment (e.g., phone in basket; knife in cup) or support (e.g., spoon on jar; tray on box). They were asked to press one key when a specific target image appeared (e.g., a phone in a basket) and another key for all other images. Although accuracy was quite high, subjects false-alarmed more often for images that matched the target’s spatial-relational category than for those that did not, and they were also slower to reject images from the target’s spatial-relational category. Put differently: When searching for a knife in a cup, the mind is more likely to confuse these objects with a phone in a basket than with a spoon on a jar. We suggest that the visual system automatically encodes a scene’s spatial composition, and it does so in a surprisingly broad way that abstracts over the particular content of any one instance of such relations.

