Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2024
Time-resolved brain activation patterns reveal hierarchical representations of scene grammar when viewing isolated objects
Author Affiliations & Notes
  • Aylin Kallmayer
    Goethe University Frankfurt, Scene Grammar Lab, Germany
  • Melissa Vo
    Goethe University Frankfurt, Scene Grammar Lab, Germany
  • Footnotes
    Acknowledgements  This work was supported by SFB/TRR 26 135 project C7 to Melissa L.-H. Võ and the Hessisches Ministerium für Wissenschaft und Kunst (HMWK; project ‘The Adaptive Mind’) and the Main-Campus-Doctus stipend awarded by the Stiftung Polytechnische Gesellschaft to Aylin Kallmayer.
Journal of Vision September 2024, Vol. 24, 655. doi: https://doi.org/10.1167/jov.24.10.655
Abstract

At its core, vision is the transformation of sensory input into meaningful representations. Understanding the structure of such representational spaces is crucial for understanding efficient visual processing. Evidence suggests that the visual system encodes statistical relationships between objects and their semantic contexts. Recently, however, a more fine-grained framework of hierarchical relations has been formulated (“scene grammar”), according to which scene understanding is driven by real-world object-to-object co-occurrence statistics. More specifically, clusters of frequently co-occurring objects form phrases wherein larger, stationary objects (e.g., sink) anchor predictions towards smaller objects (e.g., toothbrush). Still, we know little about the mechanisms and temporal dynamics of these anchored predictions and whether the processing of individual objects already activates representational spaces characterized by phrasal structures. In the present EEG study, we aimed to quantify shared representations between objects from the same versus a different phrase within the same scene using an MVPA cross-decoding scheme paired with computational modelling to probe the format of these shared representations. We presented objects from four different phrases spanning two scenes (kitchen and bathroom) one at a time, in isolation. Classifiers trained on anchor objects generalized to local objects of the same phrase and vice versa, but, crucially, not to objects from the same scene but a different phrase. This provides the first evidence that phrase-specific object representations are elicited by the perception of individual objects. Computational modelling revealed that high-level semantic features quantified from ResNet50 successfully predicted the classifiers’ generalization matrix, suggesting that the observed generalization is driven by late-stage recurrent processes rather than by low-level visual similarity between the objects. Overall, we provide novel insights into the temporal dynamics of encoded object co-occurrence statistics, which seem to reflect a more fine-grained hierarchical structure than previously assumed. Finally, this also provides a mechanistic account of the hierarchical predictions observed in efficient attention guidance through real-world scenes.
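
The cross-decoding logic described above can be illustrated with a short sketch: a classifier is trained to discriminate phrase identity on epochs in which anchor objects were shown, tested on epochs in which local objects were shown (and vice versa), and the resulting generalization pattern is then compared against similarities computed from high-level deep-network features. This is a minimal sketch under assumed conditions; the simulated data, array shapes, classifier choice (a per-time-point linear SVM), and all variable names are hypothetical and do not represent the authors' actual pipeline.

```python
# Minimal sketch of a time-resolved MVPA cross-decoding analysis.
# All data are simulated; shapes, labels, and the classifier are assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from scipy.stats import spearmanr

# Assumed EEG data: epochs x channels x time points. Labels code which of the
# four phrases (2 scenes x 2 phrases) each isolated object belongs to, and
# whether it is an anchor (e.g., sink) or a local object (e.g., toothbrush).
n_epochs, n_channels, n_times = 800, 64, 100
rng = np.random.default_rng(0)
X = rng.standard_normal((n_epochs, n_channels, n_times))
phrase = rng.integers(0, 4, n_epochs)                 # phrase identity
is_anchor = rng.integers(0, 2, n_epochs).astype(bool)  # anchor vs. local object

def cross_decode(X, y, train_mask, test_mask):
    """Train a classifier per time point on one object role, test on the other."""
    scores = np.empty(X.shape[-1])
    for t in range(X.shape[-1]):
        clf = make_pipeline(StandardScaler(), LinearSVC())
        clf.fit(X[train_mask, :, t], y[train_mask])
        scores[t] = clf.score(X[test_mask, :, t], y[test_mask])
    return scores

# Train on anchor objects, generalize to local objects, and vice versa.
anchor_to_local = cross_decode(X, phrase, is_anchor, ~is_anchor)
local_to_anchor = cross_decode(X, phrase, ~is_anchor, is_anchor)

# Model-comparison sketch: correlate an (here placeholder) empirical
# generalization matrix with pairwise similarities of high-level features
# extracted from a deep network such as ResNet50 for the same object images.
empirical_generalization = rng.random((4, 4))      # placeholder values
deep_features = rng.standard_normal((4, 2048))     # assumed pooled-layer features
model_similarity = np.corrcoef(deep_features)      # 4 x 4 feature similarity
iu = np.triu_indices(4, k=1)
rho, p = spearmanr(empirical_generalization[iu], model_similarity[iu])

print(f"Anchor-to-local peak accuracy: {anchor_to_local.max():.2f}")
print(f"Model fit (Spearman rho): {rho:.2f}, p = {p:.3f}")
```

Fitting and testing a separate classifier at each time point is what makes such an analysis time-resolved; in a real pipeline the empirical generalization matrix would come from the cross-decoding accuracies themselves and the model similarities from network features computed on the presented object images, rather than from the placeholders used here.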
