Object recognition is one of the main functions of vision. At present, we have some general understanding of the computations necessary for this task and how they might be implemented in the human brain. The widely accepted “standard model” of biological pattern recognition (Hubel & Wiesel, 1965; Fukushima, 1980; Riesenhuber & Poggio, 1999) is essentially a set of hierarchically organized feature detectors, tuned to increasingly complex features and with increasing extent of spatial pooling across the levels. Still, there are many unanswered questions of interest to researchers in both biological and computer vision (e.g., Mutch & Lowe, 2008; Jarrett, Kavukcuoglu, Ranzato, & LeCun, 2009; DiCarlo, Zoccolan, & Rust, 2012). We have only vague ideas about the actual number of levels, the set of features at each level, and the rules for combining lower-level features into higher-level ones. Opinions differ on the role of interactions within the levels and of top-down connections, and it has been argued that the feed-forward model may be fundamentally wrong (e.g., Mumford, 1992; Rao & Ballard, 1999).
Visual crowding is a deterioration of object recognition in the visual periphery caused by other objects nearby in the visual field (e.g., Bouma, 1970). Usually, crowding has little effect on detecting the presence of simple visual features, but it seems to heavily impair both the perception of their relative positions and their combination into recognizable objects (Levi, Hariharan, & Klein, 2002; Pelli, Palomares, & Majaj, 2004). Therefore, crowding is closely related to object recognition, and the crowding paradigm might help us understand pattern recognition in human vision (Levi, 2008; Pelli & Tillman, 2008).
Still, this possibility has rarely been the main motive of crowding studies. Usually, these studies attempt to reveal new regularities of crowding, test theories of crowding, or build better models of this phenomenon. Quantitative models of crowding frequently use simple feature discrimination as the observer's task (e.g., Parkes, Lund, Angelucci, Solomon, & Morgan, 2001; van den Berg, Roerdink, & Cornelissen, 2010). It is not clear whether the proposed mechanisms are relevant for object recognition. Of course, many studies of crowding provide some useful information on object recognition. However, the supposedly central problem of combining features into objects deserves more focused attention.
There are several recent studies on the integration of small line segments (or Gabor elements) into longer contours in cluttered images (May & Hess, 2007; Chakravarthi & Pelli, 2011). While this task also involves a kind of feature integration, it is clearly different from combining features in typical object recognition. Although the stimuli of these studies (“snakes” and “ladders”) may be related to certain objects, the main issue in these studies is the saliency of two kinds of contours, not the recognition of objects. Similarly, the integration of color, orientation, and spatial frequency studied by Põder and Wagemans (2007) can hardly be regarded as true object recognition, which should combine features with their relative positions.
An article by Dakin, Cass, Greenwood, and Bex (2010) seems to be a better example of combining object recognition and crowding research. In that study, subjects had to identify the orientation of a rotated T in the presence of a nearby irrelevant T-like object. The authors analyzed the distributions of incorrect answers as dependent on the features of the flanking object. They found that the nature of interactions was mostly determined by the configuration formed by the target and flanker, irrespective of its absolute orientation; i.e., the observed effects were predominantly object-centered rather than viewer-centered. They observed twice as many ±90° as 180° target rotations among incorrect responses. There were many more errors with flankers above or below (end flankers) compared to left or right (side flankers) of an upright T target, and the end flankers induced errors that resembled the flanker more often. These regularities were accounted for by probabilistic weighted averaging of feature positions within the objects. However, the main idea of their study was to see how much the supposed low-level feature interactions could explain crowding effects with these simple objects, and their model was not intended to simulate generic object recognition.
The goal of the present study is to obtain a better understanding of the main problem of object recognition—combining features into objects. I chose a minimal stimulus for that purpose: an object composed of two “features” and one “free-floating feature” that can be mistakenly integrated with those from the object.
My experiment was similar to that of Dakin et al. (2010). A target (tumbling T) was presented together with a single bar (either vertical or horizontal) in a random position around the target. Observers identified the orientation of the target. Response distributions were analyzed and modeled as a function of the position and orientation of the flanker. I found that such a simple crowding object has very strong and regular effects on target identification, and these effects may signify important constraints on object recognition mechanisms.
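As a rough illustration of what analyzing response distributions can involve, the following Python sketch tabulates responses by their rotation relative to the target, separately for each flanker orientation. The degree coding of the four T orientations, the layout of a trial record, and the function names are assumptions made for this example only; they are not taken from the analysis actually used in the study.

```python
from collections import Counter

# Assumed coding of the four orientations of a tumbling T, in degrees.
ORIENTATIONS = (0, 90, 180, 270)


def rotation_error(target_deg, response_deg):
    """Rotation of the response relative to the target.

    Returns one of -90, 0, 90, or 180 (0 means a correct response).
    """
    diff = (response_deg - target_deg) % 360
    # Map 270 degrees onto -90 so that +/-90 errors are symmetric.
    return diff - 360 if diff == 270 else diff


def error_distribution(trials):
    """Count rotation errors separately for each flanker orientation.

    `trials` is an iterable of (target_deg, response_deg, flanker) tuples,
    where `flanker` is a hypothetical label such as "vertical" or
    "horizontal".
    """
    counts = {}
    for target_deg, response_deg, flanker in trials:
        error = rotation_error(target_deg, response_deg)
        counts.setdefault(flanker, Counter())[error] += 1
    return counts
```

Comparing the counts for ±90° and 180° within such a table is one simple way to check regularities like the one reported by Dakin et al. (2010); a fuller analysis would also condition on flanker position, which this sketch omits.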