Human layout perception relies partly on inter-ocular stereo fusion, but stereo cues are effective only at short range, and humans easily perceive depth when they are absent, such as when looking at photographs. One hypothesis is that layout perception results from a combination of local cues extracted from parts of a two-dimensional image that are then resolved into a consistent overall three-dimensional scene structure. Occlusion and the relative position of features with respect to the ground plane permit the segmentation of images into “figure” and “background” regions. Blur information can convey the coarse layout of the scene (Schyns & Oliva,
1994), which, in turn, provides information about the probable location of the surface in depth, and even the size of the observed space (Held & Banks,
2008). Many researchers have observed that human-perceived layout and depth information is not always accurate (Howe & Purves,
2005; Watt, Akeley, Ernst, & Banks,
2005). Surface angles are often underestimated (Girshick, Burge, Erlikhman, & Banks,
2008), and slants of hills are often overestimated (Creem-Regehr, Gooch, Sahm, & Thompson,
2004; Proffitt, Bhalla, Gossweiler, & Midgett,
1995). Distances to objects can be misperceived when a relatively wide expense of the ground surface is not visible (Wu, Ooi, & He,
2004), or when the field of view is too narrow (Fortenbaugh, Hicks, Hao, & Turano,
2007). Although most studies have found systematic and consistent biases in how humans estimate object distances and surface orientations, a systematic evaluation of how consistently humans perceive scene layout is missing from the literature.