Whether two objects belong to the same or different categories is intimately related to the generative processes that brought them into being. We hypothesized that in many cases, shape-altering transformations leave highly distinctive statistical signatures in object shapes, which the visual system could exploit for categorizing novel objects. Our findings demonstrate that observers are indeed excellent at identifying objects that share common generative processes. In
Experiment 1, participants performed far above chance at classifying unfamiliar three-dimensional objects based on the transformations that had been applied to them. They could do this from the very first trials onward, indicating that they solve this task with complex novel objects by relying on a previously established shape feature space. This is in line with previous research on simple objects (e.g., Smith, Jones, Landau, Gershkoff-Stowe, & Samuelson,
2002), characters (Lake et al.,
2013) and words (Jern & Kemp,
2013; Kemp & Jern,
2009; Xu & Tenenbaum,
2007), demonstrating a general capability of humans to make categorizations based on single exemplars. It does not follow from our results that participants explicitly infer the generative processes that shaped the objects. Instead, we suggest that the participants' ability to classify objects is a result of their ability to discriminate stimuli using the statistical shape features that are associated with different generative processes.
Experiment 2 demonstrated that sensitivity to the statistical signatures of transformations is distinct from Euclidean shape similarity. Identical stimuli yielded opposite response patterns across tasks: Although apparent motion was driven by the distances between local contour features across shapes, inferences about causality (2-AFC task) were based on texture-like statistical shape features.
Experiment 3 revealed that participants can voluntarily separate and respond to these different aspects of the shape. When asked about generative processes, they grouped objects together based on statistical features; when asked about the extent of overlap, they grouped objects based on raw pixel similarities.
Together, these findings demonstrate that shape provides rich cues for categorizations via the statistical traces left in objects by processes in their past. The key insight is that the visual system represents object shape using a very large number of distinct perceptual features, which define a high-dimensional “shape space” (e.g., DiCarlo & Cox,
2007; Leeds, Pyles, & Tarr,
2014). We speculate that objects created by similar generative processes tend to lie much closer together in the shape space than objects created by different generative processes (
Figure 5). Thus, the visual system could use a simple heuristic based on distances between items to determine whether two unfamiliar objects belong to the same class. The distance threshold for the same/different decision could be learnt from the distributions of items within familiar categories (i.e., a prior on the typical distances between items within categories).
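This distance heuristic can be made concrete with a minimal simulation (our own illustrative sketch, not the authors' model). Each object is a point in a hypothetical 50-dimensional feature space; objects generated by the same process cluster around a common center, and the same/different threshold is learnt from the distances observed within familiar categories:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shape space: objects from one generative process cluster together.
dim = 50
center_a, center_b = rng.normal(size=dim), rng.normal(size=dim)
class_a = center_a + rng.normal(scale=0.3, size=(20, dim))  # familiar category A
class_b = center_b + rng.normal(scale=0.3, size=(20, dim))  # familiar category B

# Learn a threshold: the largest distance observed within a familiar category
# (a stand-in for a prior on typical within-category distances).
within = [np.linalg.norm(x - y)
          for i, x in enumerate(class_a) for y in class_a[i + 1:]]
threshold = max(within)

def same_category(x, y):
    """Heuristic: 'same class' iff shape-space distance falls below the threshold."""
    return np.linalg.norm(x - y) <= threshold
```

In this toy space, two novel objects produced by the same process fall below the learnt threshold, whereas objects produced by different processes far exceed it, so the heuristic succeeds without any explicit inference about the processes themselves.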
In a similar vein, Jern and Kemp (
2013) demonstrated, with simple arrow stimuli varying in length, color and saturation, that sampling from the observed probability distributions of object features over exemplars and categories enables the creation of new exemplars for these categories. Salakhutdinov, Tenenbaum, and Torralba (
2012) proposed a hierarchical probabilistic model that transfers acquired knowledge from previously learned categories to a novel category. The model can to some extent infer novel classes from single category examples: priors derived from previous knowledge about “super categories” are used to estimate prototypes and features of the new class. We speculate that for richly structured shapes—like the ones we used—simple discriminative models in a high-dimensional shape space may be sufficient to account for putative “one-shot” learning.
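As an illustration of what such a simple discriminative account might look like (a sketch of our own, not the Salakhutdinov et al. model), a single stored exemplar per class already supports classification of novel items via a nearest-exemplar rule, provided the classes are well separated in the high-dimensional space:

```python
import numpy as np

rng = np.random.default_rng(1)

def one_shot_classify(exemplars, query):
    """Assign the query to the class of the nearest stored exemplar (1-NN)."""
    dists = [np.linalg.norm(query - e) for e in exemplars]
    return int(np.argmin(dists))

# Toy 50-dimensional shape space with three generative processes,
# represented by a single exemplar each ("one-shot" learning).
dim = 50
prototypes = rng.normal(size=(3, dim))
exemplars = prototypes + rng.normal(scale=0.3, size=(3, dim))  # one example per class

# A novel object produced by the third process:
novel = prototypes[2] + rng.normal(scale=0.3, size=dim)
```

Because within-class scatter is small relative to between-class distances here, the 1-NN rule assigns the novel object to the correct class; the hard problem, as noted above, is choosing features for which real generative processes behave this way.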
Crucial to this approach, however, is the choice of features that define the perceptual shape space: Similar generative processes should lead to similar locations in shape space. Yet similar generative processes do not necessarily create shapes that are similar to one another on a point-by-point basis. For example, the objects in
Experiment 1 have multiple limbs with a variety of positions and curvatures, which do not line up with one another across different instances within each class. Nevertheless, the “twisting” transformation leads to distinctive spiral-shaped features on the limbs, which the visual system could use for identifying objects that have been subjected to the same transformation. We argue that at least some of the features that define shape space are the result of “mid-level” perceptual organization computations that describe relationships between multiple distant locations of the object (cf. Jozwik et al.,
2016; van Assen, Barla, & Fleming,
2018). Thus, texture-like representations of statistical shape features likely play an important role in inferences related to causal history.
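The dissociation between point-by-point similarity and texture-like shape statistics can be illustrated with a minimal example (our construction; the turning-angle histogram is merely one stand-in for the statistical features the visual system might compute). Two bumpy contours whose bumps do not line up point-by-point still share a distribution of local curvatures, whereas a smooth contour, like one produced by a different generative process, does not:

```python
import numpy as np

def bumpy_contour(n=400, bumps=6, amp=0.15, phase=0.0):
    """Closed contour: a circle modulated by sinusoidal 'limbs'."""
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    r = 1.0 + amp * np.sin(bumps * t + phase)
    return np.stack([r * np.cos(t), r * np.sin(t)], axis=1)

def turning_histogram(pts, bins=np.linspace(-0.2, 0.2, 21)):
    """Texture-like statistic: distribution of local turning angles."""
    seg = np.roll(pts, -1, axis=0) - pts          # edge vectors of the polygon
    ang = np.unwrap(np.arctan2(seg[:, 1], seg[:, 0]))
    turn = np.diff(ang)                            # local turning at each vertex
    hist, _ = np.histogram(turn, bins=bins)
    return hist / hist.sum()

a = bumpy_contour(phase=0.0)
b = bumpy_contour(phase=1.3)   # bumps shifted: poor point-by-point alignment with a
c = bumpy_contour(amp=0.0)     # smooth circle: a different "generative process"

def stat_dist(x, y):
    """L1 distance between turning-angle histograms."""
    return np.abs(turning_histogram(x) - turning_histogram(y)).sum()
```

Despite the poor pointwise alignment of `a` and `b`, their turning-angle statistics nearly coincide, while the smooth contour `c` is far away in this statistical sense, mirroring how twisted limbs can be grouped together even when they do not correspond point-by-point.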
Previous studies help clarify the details of this process. Op de Beeck, Torfs, and Wagemans (
2008) measured perceived similarity for a set of objects in which they independently varied overall shape envelope (e.g., “square” and “vertical” objects) and local shape features (e.g., “spiky” and “smooth” objects). They found that participants judge similarity based on both factors; that is, they perceived objects as more similar if they shared either overall shape, local shape features, or both. Neural activation as measured by functional magnetic resonance imaging was correlated with these similarity ratings. Specifically, similarity in overall shape was linked to activation in retinotopic areas lower in the visual hierarchy (V1, V2, V3, V4v), whereas similarity in local shape features was linked to activation in nonretinotopic area lateral occipital cortex (LOC) higher in the visual hierarchy. This is also supported by work from Kubilius, Bracci, and Op de Beeck (
2016) showing that deep computational models predict object similarities in human perception, with lower levels of the networks mainly representing similarities in overall shape and higher levels representing similarities in local shape features (Elder,
2018). These findings suggest that classification based on causal history is mediated by the analysis of local shape features in higher-level shape representations.
This is in line with a layered view of shape perception in which objects consist of multiple shape properties at varying degrees of abstraction (Green,
2015), for example, parallel representations of overall shape and local shape features, which are retrieved depending on the task at hand (as in our overlap vs. transformation instructions in
Experiment 3) and potentially can be computed very fast (Epshtein, Lifshitz, & Ullman,
2008). This view provides a means to unify transformational and generative approaches to object categorization. Transformational approaches (e.g., Bedford & Mansson,
2010; Hahn, Chater, & Richardson,
2003; Hahn, Close, & Graf,
2009; Imai,
1977) argue that similarity judgments depend on transformational distance between objects (i.e., overall shape similarity). Generative approaches (e.g., Kemp et al.,
2005) emphasize inferences about common generative processes (i.e., similarity in statistical shape features). Our results show that there are multiple levels of shape representation that the visual system draws on depending on the task and available information.
Evidence from child development research suggests that the ability to access multiple levels of shape representation is available relatively early in life. When 4-year-old children were presented with one standard object of a particular overall shape and texture, they grouped a novel object based on shape rather than texture; however, when presented with two standard objects of different shapes but similar textures, they grouped a novel object based on texture rather than shape (Graham, Namy, Gentner, & Meagher,
2010). This selection of features is also affected by semantic cues: for example, 3-year-old children grouped objects based on superordinate characteristics (e.g., animals vs. food) when objects were labeled with nouns (e.g., “momos”), but they grouped objects based on subordinate characteristics (e.g., relying on color or texture; red grapes vs. green grapes) when objects were labeled with adjectives (e.g., “mom-ish” ones; Waxman,
1990).
To conclude, we found that observers can categorize complex novel objects by perceiving the signatures of generative processes, even from the very first experimental trials onward. These findings also have implications for the perception of causal history and shape similarity judgments. Specifically, the underlying shape representations likely aid inferences about the causal history of objects (e.g., Leyton,
1989; Pinna,
2010; Pinna & Deiana,
2015; Spröte & Fleming,
2013), and objects that share features produced by the same generative process should appear more similar compared with other shapes (e.g., Ons & Wagemans,
2012; Op de Beeck et al.,
2008). Finally, these visual processes could potentially facilitate other shape and material perception tasks, including (a) identifying the physical properties of objects (Paulun, Schmidt, van Assen, & Fleming,
2017; Schmidt, Paulun, van Assen, & Fleming,
2017; van Assen et al.,
2018), (b) making predictions about what other members of the same category might look like (i.e., mental imagery of “plausible variants”), (c) motor affordances, and (d) predicting future states of moving and interacting objects.