December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract | December 2022
What makes a scene? Investigating generated scene information at different visual processing stages.
Author Affiliations & Notes
  • Aylin Kallmayer
    Goethe-University Frankfurt, Germany
  • Melissa L.-H. Võ
    Goethe-University Frankfurt, Germany
  • Footnotes
    Acknowledgements  This work was supported by SFB/TRR 26 135 project C7 to Melissa L.-H. Võ and the Hessisches Ministerium für Wissenschaft und Kunst (HMWK; project ‘The Adaptive Mind’) and the Main-Campus-Doctus stipend awarded by the Stiftung Polytechnische Gesellschaft.
Journal of Vision December 2022, Vol.22, 3944. doi:

Aylin Kallmayer, Melissa L.-H. Võ; What makes a scene? Investigating generated scene information at different visual processing stages. Journal of Vision 2022;22(14):3944.


      © ARVO (1962-2015); The Authors (2016-present)


In recent years, deep-learning research has produced a range of neural networks that can classify and generate images from a variety of stimulus domains. At the same time, advances in cross-domain analysis methods (e.g., representational similarity analysis, RSA) have allowed us to make inferences about the human visual system based on the representational spaces of these networks. State-of-the-art generative networks provide a useful testbed for probing and manipulating representations at different stages of scene generation, which could help us better understand “what makes a scene?”. To investigate how well a generative network (PROGGAN) captures real-world scene gist, and whether the information content of generated scenes is rich enough to trick human observers, we conducted an online experiment. Participants had to distinguish real from generated images of scenes at two presentation times (50 vs. 500 ms) and rate the degree of realism of each generated scene. We found that for short presentation times participants performed only slightly above chance, while for longer presentation times sensitivity (d′) increased significantly and response bias became more conservative. Interestingly, realism ratings correlated with the false alarm rate of generated images in both conditions. Combining RSA and generalized linear modelling, we found that late-layer DNN representational geometries were significant predictors of the representational structure emerging from false alarm rates in the 50 ms condition. Our results imply that, during a first glimpse, high-level information in particular contained in generated scenes triggers processes in the visual system that are similar to those triggered by real-world scenes, but that over time generated scene information is perceived as less realistic.
Based on these explicit and implicit measures of realism, we can use network dissection on the generator at different representational stages to better understand the content and structure of real-world scene representations in the human visual system.
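
For readers unfamiliar with the signal detection measures reported above, the following is a minimal sketch of how sensitivity (d′) and response bias (criterion c) can be computed from response counts. It is not the authors' analysis code. It assumes that a "real" response to a real image counts as a hit and a "real" response to a generated image counts as a false alarm, and it applies a log-linear correction that the abstract does not specify. The function name `sdt_measures` and all counts are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from raw response counts.

    Assumption (not stated in the abstract): "real" responses to real
    images are hits, "real" responses to generated images are false alarms.
    """
    # Log-linear correction (an assumption): add 0.5 to each numerator and
    # 1 to each denominator so that rates of exactly 0 or 1 cannot occur.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF

    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias; > 0 is conservative
    return d_prime, criterion


# Hypothetical counts for one observer in each presentation-time condition.
d_short, c_short = sdt_measures(hits=55, misses=45,
                                false_alarms=48, correct_rejections=52)
d_long, c_long = sdt_measures(hits=70, misses=30,
                              false_alarms=20, correct_rejections=80)

print(f"50 ms:  d' = {d_short:.2f}, c = {c_short:.2f}")
print(f"500 ms: d' = {d_long:.2f}, c = {c_long:.2f}")
```

With these made-up counts, d′ is larger and the criterion is more positive in the 500 ms condition, which mirrors the direction of the reported effect (higher sensitivity and a more conservative bias) without reproducing any actual data.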

