Abstract
Estimating the size of an indoor space involves examining the visual boundaries that limit its spatial extent and the reverberation cues produced by sounds reflecting from interior surfaces. Here we used fMRI to examine how the brain processes the geometric size of an indoor scene when different types of sensory cues are presented individually or together. Specifically, we asked whether the size of space is represented in a modality-specific way or in a more general, integrative way that combines multimodal cues. In a block-design study, we presented images or sounds depicting small or large indoor spaces. Visual stimuli consisted of real-world pictures of empty spaces that were either small, like a closet, or large, like a warehouse. Auditory stimuli consisted of sounds recorded in an anechoic chamber and convolved with reverberation that differed between the small and large spaces. Using a multi-voxel pattern classifier, we asked whether the two sizes of space could be accurately classified from visual, auditory, and combined visual-auditory cues. We found that higher-level, scene-specific regions (OPA, PPA) showed above-chance classification of spatial size when visual cues were presented, but not when auditory cues were presented without any visual information. Conversely, several regions in the transverse temporal gyrus showed above-chance classification of the spatial size of scenes from auditory cues, but not from visual cues alone. Interestingly, several areas in the temporal and parietal lobes, including the Superior Temporal Gyrus (STG) and the Angular Gyrus (AG), represented the spatial size of scenes regardless of the type of sensory cue. Furthermore, these regions also showed high classification accuracy when both visual and auditory cues were presented concurrently. These results suggest that the STG and AG may contain a multimodal representation of the size of space and may play a role in integrating multisensory cues to spatial size.
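The abstract does not specify the decoding pipeline in detail. As a rough illustration of the multi-voxel pattern classification described above, the minimal sketch below assumes a linear SVM trained on region-of-interest voxel patterns with leave-one-run-out cross-validation; the variable names (X, y, runs), the design sizes, and the choice of classifier are illustrative assumptions rather than the authors' actual analysis.

```python
# Minimal sketch of ROI-based MVPA decoding of spatial size (small vs. large).
# Assumes a linear SVM and leave-one-run-out cross-validation; X, y, and runs
# are placeholders standing in for real per-block activity patterns, condition
# labels, and run indices.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_blocks, n_voxels, n_runs = 48, 200, 8            # hypothetical design
X = rng.standard_normal((n_blocks, n_voxels))       # one ROI pattern per block
y = np.tile([0, 1], n_blocks // 2)                  # 0 = small space, 1 = large space
runs = np.repeat(np.arange(n_runs), n_blocks // n_runs)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
acc = cross_val_score(clf, X, y, groups=runs, cv=LeaveOneGroupOut())
print(f"Mean decoding accuracy: {acc.mean():.2f} (chance = 0.50)")
```

In practice, the input patterns would come from per-block response estimates within each ROI (e.g., OPA, PPA, STG, AG), and decoding accuracy would be compared against chance across participants.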