The global shape of an object embedded in a scene is a well-known cue for categorization. In the current work, we operationally define “shape” as the closed contours possessed by meaningful, familiar objects in the real world. These contours are formed through the long-range association of local features, so under our definition shape is necessarily global. Our visual system filters the visual input and reconstructs the world from the building blocks obtained by different filters (the hierarchical approach). Since Hubel and Wiesel (1959, 1968) discovered neuronal selectivity for oriented bars and edges in the primary visual cortex, vision researchers have invested considerable effort in modeling the stages through which filtered features are assembled into highly complex object representations (Biederman, 1987; Marr, 1982). The theory of recognition-by-components (RBC) is an excellent example of such a hierarchical model: RBC encodes objects as clusters of three-dimensional parts, called geons, together with their spatial arrangement. Note, however, that extracting geons from complex natural scenes requires contour integration and figure–ground segmentation, and a number of models of these processes rely on recurrent processing provided by feedback signals from higher to lower visual areas (e.g., Bullier, Hupé, James, & Girard, 2001, for figure–ground segmentation; Grossberg & Mingolla, 1985, for contour integration). Such recurrent processing may be too time-consuming to complete within a single feed-forward sweep (Roelfsema, Lamme, Spekreijse, & Bosch, 2002; Serre, Oliva, & Poggio, 2007). Although contour integration and figure–ground segmentation can, in theory, be achieved without recurrent processing (May & Hess, 2008; Rosenholtz, Twarog, Schinkel-Bielefeld, & Wattenberg, 2009; Supèr, Romeo, & Keil, 2010), neurophysiological studies support the involvement of recurrence (Hupé et al., 1998; Scholte, Jolij, Fahrenfort, & Lamme, 2008; Supèr & Lamme, 2007). The finding that image parsing based on perceptual grouping in natural scenes is actually slower than animal/vehicle categorization (Korjoukov et al., 2012) further challenges the involvement of such hierarchical processing in categorization.