Abstract
Humans can recognize objects quickly and accurately despite tremendous variations in their appearance. This amazing ability of rapid categorization has motivated several models of natural vision and object recognition. The conclusion of these models is rather far-reaching: humans achieve rapid categorization in a way similar to these models. Since understanding the computations underlying rapid categorization is important for achieving natural vision, we have re-examined several of these models. In particular, we trained the models and tested them with scenes in which the objects to be categorized were replaced with uniform ellipses. We found that the models categorized most of the scenes with ellipses as having the objects. Therefore, these models do not categorize objects but rather the contexts in which the objects are imbedded and thus provide little clue on how humans achieve rapid categorization. Here, we propose a statistical object recognition model based on a large set of hierarchically organized structures of natural objects. First, a large set of hierarchical object structures are obtained from natural objects. At each level of the hierarchy are a set of object structures, each of which consists of a combination of independent components of natural objects. Each object/category is then represented by a subset of these hierarchical structures and the natural variations of the object/category by a probability distribution of the underlying structures. Object recognition/categorization is performed as statistical inference. We tested this model on several large datasets and found that the model achieves a great performance on object recognition/categorization both in isolation and in natural contexts.