Abstract
Theories of rapid scene categorization have emphasised the importance of computationally inexpensive image cues such as texture (Renninger & Malik, 2004) and spectral structure (Oliva & Torralba, 2001). Although binocular disparities provide reliable depth information, stereopsis is typically thought to be sluggish (Tyler, 1991; Kane, Guan & Banks, 2014; Valsecchi, Caziot, Backus, & Gegenfurtner, 2013), and therefore unhelpful during early scene processing. Recent work, however, suggests that stereopsis improves object recognition following brief (33 msec) presentations (Caziot & Backus, 2015). We investigated whether disparity information facilitates the categorization of briefly presented real-world scenes. Subjects viewed greyscale or colour images from the Southampton-York Natural Scenes dataset (Adams et al., 2016), presented monoscopically (same image to both eyes), stereoscopically, or in reversed stereo (inverted disparity-defined depth). Images were presented for 13.3, 26.6, 53.2, or 106.5 msec, with backward masking. Subjects identified the semantic category (e.g., beach / road / farm) or the 3D spatial structure category (e.g., open / closed off / navigable route) of each scene, in addition to the viewing condition (mono / stereo / stereo-reversed). Disparity information facilitated semantic categorization, but only for greyscale images, reflecting redundancy across disparity and colour segmentation cues. Strikingly, the stereoscopic advantage emerged for 13.3 msec presentations, suggesting that disparity cues are encoded rapidly in complex scenes. Colour also facilitated semantic categorization, but only for presentations of 26.6 msec or longer. Disparity and colour both improved 3D spatial structure categorization. Interestingly, however, observers were only able to distinguish the viewing condition of images presented for 53.2 msec or longer – much later than the onset of the disparity advantage for categorization.
Our findings contradict previous claims that stereoscopic depth cues are extracted from real-world scenes after monocular cues (e.g., Valsecchi et al., 2013), and suggest that stereopsis improves the recognition of briefly presented scenes.