Abstract
Our world is full of diverse types of visual information, yet visual attention and memory experiments focus almost exclusively on attention to, and memory for, basic visual features and discrete objects, in part because our understanding of how people represent scene surface information cognitively and neurally is very limited. Recent work (e.g., Lescroart & Gallant, 2019) has made headway in quantifying such surface representations, finding 3D surface information in scene-selective cortex, yet that work was limited to artificially generated images for which ground-truth 3D information exists. Here, we use deep neural networks (DNNs; Zamir et al., 2018) to estimate ground-truth distance and surface-direction information based only on RGB stimulus images in the publicly available BOLD5000 fMRI dataset. This procedure yielded artifact-free distance and surface-direction estimates for 978 of the 1,000 scene images in the BOLD5000 stimulus set. Using an encoding model similar to that of Lescroart & Gallant (2019), we found significant predictions of voxel responses in scene-selective cortex (occipital place area and parahippocampal place area: 3/3 participants significant; retrosplenial complex: 2-3/3 participants significant depending on hemisphere). These results lay the foundation for investigating scene surface processing in more naturalistic environments and tasks, a critical step towards understanding visual and cognitive processes in the real world.
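The voxelwise encoding-model approach summarized above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the feature dimensionality, the ridge penalty, the train/test split, and the simulated data are all illustrative assumptions standing in for the real DNN-derived distance and surface-direction features and measured BOLD responses.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    # Closed-form ridge regression: w = (X'X + alpha*I)^(-1) X'y,
    # solved jointly for all voxels (columns of y).
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 200, 12, 5

# Stand-ins: per-image surface features (e.g., distance / surface-direction
# summaries) and simulated voxel responses generated from known weights.
X = rng.standard_normal((n_images, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.1 * rng.standard_normal((n_images, n_voxels))

# Fit on held-in images, predict held-out images, and score each voxel
# by the correlation between predicted and observed responses.
X_tr, X_te, Y_tr, Y_te = X[:150], X[150:], Y[:150], Y[150:]
W_hat = fit_ridge(X_tr, Y_tr, alpha=1.0)
Y_pred = X_te @ W_hat
r = [np.corrcoef(Y_te[:, v], Y_pred[:, v])[0, 1] for v in range(n_voxels)]
```

In practice the per-voxel prediction correlations would be tested against a permutation or noise-ceiling baseline to declare significance, as in the participant-level results reported above.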