A key tenet of the spatial envelope model is that humans compute an intermediate representation of 3D spatial structure, which is in turn used to infer semantic category during early visual processing (
Greene & Oliva, 2006,
2009a,
2009b,
2010;
Oliva, 2005;
Oliva & Torralba, 2001,
2006;
Torralba & Oliva, 2002). In support of this model, previous work has shown that scene structure is extracted from natural scenes before semantic categories are accessed (
Greene & Oliva, 2009a), and that humans use spatial structure cues to inform judgements of semantic category (
Greene & Oliva, 2009b,
2010). Although we did not manipulate presentation duration directly, in
Experiment 2 we did find that a classifier trained to predict semantic category from 3D spatial structure category performed reasonably well: 57.36% accuracy on all images, and 68.33% accuracy on the typical category exemplars alone. However, in
Experiment 3, we also found that human-rated spatial envelope properties are poor predictors of semantic categories (45.76% correct). The reduced discriminative power of spatial envelope properties (compared with our spatial structure categories and with other classification results; e.g.,
Greene & Oliva, 2009b) may be due to the taxonomical structure of our empirically derived semantic category system. Perhaps the impressive performance of previously reported spatial envelope-driven semantic classification (e.g.,
Greene & Oliva, 2009b) is produced, in part, by the selection of semantic categories that are discriminable based on spatial envelope profiles (i.e., spectral signatures; see
Oliva & Torralba, 2001). It is also possible that this result arises from an idiosyncratic set of SYNS categories. Future research should examine whether empirically derived category systems from other datasets also produce a weak association between spatial envelope properties and semantic content. Alternatively, a simpler explanation may apply: prior studies testing semantic classification from spatial envelope properties have used up to seven properties (
Greene & Oliva, 2006,
2009b), whereas we used only three (those used by
Ross & Oliva, 2010). Classification performance would likely improve if we included additional properties such as “navigability” and “temperature” (
Greene & Oliva, 2006,
2009b).