Human and nonhuman primates are remarkably fast at visual object categorization (Rosch, Mervis, Gray, Johnson, & Boyes-Braem,
1976; Perrett, Hietanen, Oram, Benson, & Rolls,
1992; Thorpe, Fize, & Marlot,
1996; Hung, Kreiman, Poggio, & DiCarlo,
2005; Grill-Spector & Kanwisher,
2005; Cauchoix, Crouzet, Fize, & Serre,
2016). Given the number of processing stages involved and the speed at which object categorization occurs, this visual process is believed to be predominantly feed-forward (Riesenhuber & Poggio,
1999; Serre, Oliva, & Poggio,
2007). That is, visual information is processed hierarchically at increasing levels of abstraction beginning with edge extraction in the primary visual cortex (V1, e.g., Hubel & Wiesel,
1959) to the processing of intermediate visual features (e.g., by area V4; Gallant, Braun, & Van Essen,
1993; Yue, Pourladian, Tootell, & Ungerleider,
2014) to the processing of more complex visual features (Tanaka,
1996) and/or object categories (e.g., Tsao, Freiwald, Tootell, & Livingstone,
2006; Kriegeskorte et al.,
2008; Bell, Hadj-Bouziane, Frihauf, Tootell, & Ungerleider,
2009; Bi, Wang, & Caramazza,
2016) in the inferior temporal (IT) cortex, where different regions respond selectively to different object categories (Haxby et al.,
2001; Kanwisher,
2010). More recently, however, it has been proposed that basic object categorization, such as distinguishing between animate and inanimate objects, may not depend exclusively on processing by regions in the IT cortex; instead, processing by intermediate visual areas, which compute intermediate object features, might be sufficient for this distinction (Perrinet & Bednar,
2015; Long, Konkle, Cohen, & Alvarez,
2016; Long et al.,
2017). This hypothesis advocates that animate and inanimate object categories differ substantially in their intermediate visual features, and these differences are sufficient for animate/inanimate categorization. For instance, animate objects are more curved compared to inanimate objects (Levin et al.,
2001; Perrinet & Bednar,
2015; Long et al.,
2017). As such, the amount of curvilinear information present in the visual input may carry adequate information for human and nonhuman primates to distinguish between animate and inanimate objects with minimal IT cortex involvement. Evidence supporting this hypothesis comes from a recent study (Long et al.,
2017) demonstrating that participants could visually classify animate and inanimate objects significantly above chance based on the amount of curvilinear and rectilinear information present in the images: The authors generated synthetic stimuli (termed
texforms) from images of animals (animate objects) and man-made objects (inanimate objects) using a texture synthesis algorithm described in detail in Freeman and Simoncelli (2011). This algorithm extracts a set of statistical image descriptors across spatial scales and orientations, including mean luminance, contrast, skewness, and kurtosis, from spatially constrained windows throughout a target image. It then iteratively adjusts the pixel values of a Gaussian noise image with the same spatial dimensions as the original, using a variant of gradient descent, until the noise image matches the original's statistical descriptors within the same spatially constrained windows. By pooling and synthesizing image statistics within separate but spatially constrained windows, the resulting stimuli preserve only the coarse form of the target object while maintaining its intermediate features (e.g., curvilinear, rectilinear, and some texture information). These texform images could not be recognized as the original object but carried sufficient information to be classified above chance as animate or inanimate.
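To make the statistic-matching idea concrete, the following is a minimal toy sketch in NumPy, not the Freeman and Simoncelli (2011) model itself: it matches only each window's first two moments (mean and contrast) with a closed-form affine remapping, whereas the actual model also matches skewness, kurtosis, and cross-scale/orientation correlations via gradient descent on a multi-scale decomposition. The function names (`window_stats`, `synthesize_toy`) and the window/iteration parameters are illustrative assumptions.

```python
import numpy as np

def window_stats(img, win):
    """Mean, std, skewness, and kurtosis in each non-overlapping win x win window."""
    h, w = img.shape
    stats = []
    for r in range(0, h - win + 1, win):
        for c in range(0, w - win + 1, win):
            p = img[r:r + win, c:c + win].ravel()
            m, s = p.mean(), p.std()
            z = (p - m) / (s + 1e-12)           # standardized pixels
            stats.append((m, s, (z**3).mean(), (z**4).mean()))
    return np.array(stats)

def synthesize_toy(target, win=8, iters=3, seed=0):
    """Toy sketch: push a Gaussian noise image toward the target's windowed
    first- and second-moment statistics. (The real model additionally matches
    higher moments and cross-scale correlations by gradient descent.)"""
    rng = np.random.default_rng(seed)
    noise = rng.normal(target.mean(), target.std(), target.shape)
    h, w = target.shape
    for _ in range(iters):
        for r in range(0, h - win + 1, win):
            for c in range(0, w - win + 1, win):
                t = target[r:r + win, c:c + win]
                n = noise[r:r + win, c:c + win]
                # affine remap matches this window's mean and contrast exactly
                n = (n - n.mean()) / (n.std() + 1e-12)
                noise[r:r + win, c:c + win] = n * t.std() + t.mean()
    return noise
```

Because each window's statistics are matched locally rather than globally, the output preserves the coarse spatial layout of luminance and contrast while scrambling the pixel arrangement that would identify the object, which is the intuition behind texforms.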