Abstract
In recent years, deep-learning research has produced a range of neural networks that can classify and generate images from a variety of stimulus domains. At the same time, advances in cross-domain analysis methods (e.g., representational similarity analysis, RSA) have allowed us to draw inferences about the human visual system from the representational spaces of these networks. State-of-the-art generative networks provide a useful testbed for probing and manipulating representations at different stages of scene generation, which could help us answer the question “what makes a scene?” To investigate how well a generative network (PROGGAN) captures real-world scene gist, and whether the information content of generated scenes is rich enough to fool human observers, we conducted an online experiment. Participants had to distinguish real from generated scene images at two presentation times (50 vs. 500 ms) and rate the degree of realism of each generated scene. For short presentation times, participants performed only slightly above chance, whereas for longer presentation times sensitivity (d′) increased significantly and response bias became more conservative. Interestingly, realism ratings correlated with the false alarm rate of generated images in both conditions. Combining RSA and generalized linear modelling, we found that late-layer DNN representational geometries were significant predictors of the representational structure derived from false alarm rates in the 50 ms condition. Our results imply that, during a first glimpse, especially the high-level information contained in generated scenes triggers processes in the visual system similar to those triggered by real-world scenes, but that over time generated scene information is perceived as less realistic.
These explicit and implicit measures of realism provide a basis for applying network dissection to the generator at different representational stages, in order to better understand the content and structure of real-world scene representations in the human visual system.
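The sensitivity (d′) and response-bias measures referred to above come from standard signal detection theory: d′ = z(H) − z(F) and c = −(z(H) + z(F)) / 2, where H is the hit rate, F the false alarm rate and z the inverse standard normal CDF. The sketch below shows how they can be computed from response counts. It assumes that real scenes are the signal trials (a "real" response to a real scene is a hit, a "real" response to a generated scene is a false alarm) and applies a log-linear correction so that rates of 0 or 1 do not break the inverse CDF. The function name `sdt_measures` and the choice of correction are illustrative assumptions, not necessarily the analysis used in this study.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) from raw response counts.

    Assumed coding (hypothetical, for illustration):
      signal trials = real scenes; hit = "real" response to a real scene
      noise trials  = generated scenes; false alarm = "real" response to a generated scene
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution

    # Assumption: log-linear correction (add 0.5 to each count and 1 to each
    # total) keeps both rates strictly between 0 and 1, where z is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    # Sensitivity: separation between signal and noise distributions in SD units.
    d_prime = z(hit_rate) - z(fa_rate)
    # Criterion c: positive values mean a conservative bias (fewer "real" responses).
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical counts for one participant in one presentation-time condition.
    d, c = sdt_measures(hits=20, misses=20, false_alarms=5, correct_rejections=35)
    print(f"d' = {d:.2f}, c = {c:.2f}")
```

In this parameterization, the shift toward a more conservative bias reported for the 500 ms condition would appear as a larger positive c, while the near-chance performance at 50 ms would appear as a d′ close to zero.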