Abstract
Unlike current computational models of object recognition, when humans recognize an object image they obtain not only the identity of the object, but also the identities and locations of a set of its internal features. We describe a computational model that combines object recognition with a process that produces a full internal interpretation of the object's structure. The model is based on both psychophysical and computational considerations and testing. For training and testing we used a set of closely similar image pairs, as judged by a trained classifier, in which one member of each pair contains a class example and the other contains a non-class image. The differences between these images were used to infer the class structures that observers identify in class images. The model includes a set of primitive features and certain relations defined over them. The primitives are divided into three types: 2-D (regions), 1-D (contours), and 0-D (points). The relations include unary relations (properties), binary relations, and more global relations among three or more primitives. Examples include local intensity extrema, parallelism and continuity between two contours, containment of a point feature in a region, and configurations of several contours, regions, or points. The extraction of these relations employs both bottom-up and top-down mechanisms, inspired by findings in biological vision and implemented with state-of-the-art learning and computer vision tools. To interpret the internal structure of a novel image, our scheme first extracts a set of image measurements for candidate primitives of types point, contour, and region, and then assigns candidates to primitives according to their compatibility with the set of learned relations. This assignment constitutes the final interpretation of the object structure. We applied the scheme to object images that are difficult for current models to recognize, in order to test the capacity of the interpretation scheme to validate and improve recognition.
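The candidate-to-primitive assignment step can be illustrated with a minimal sketch. This is not the authors' implementation: the primitive slots, the single containment relation, and the brute-force search are all hypothetical simplifications, shown only to make the compatibility-scoring idea concrete.

```python
# Illustrative sketch (not the paper's implementation): learned primitive
# slots, a hypothetical containment relation, and a brute-force search for
# the assignment of image candidates to slots that satisfies the most
# learned relations.

# Learned model: typed primitive slots (0-D point, 1-D contour, 2-D region).
PRIMITIVES = [("p0", "point"), ("c0", "contour"), ("r0", "region")]

def inside(point, region):
    """Binary relation: point lies inside an axis-aligned rectangle."""
    (x, y), (x0, y0, x1, y1) = point, region
    return x0 <= x <= x1 and y0 <= y <= y1

# Relations over slots: (slot names, predicate). One hypothetical example.
RELATIONS = [(("p0", "r0"), inside)]

def score(assignment):
    """Count how many learned relations the assignment satisfies."""
    return sum(pred(*(assignment[s] for s in slots))
               for slots, pred in RELATIONS)

def interpret(candidates):
    """Assign candidates to primitive slots by relation compatibility.

    candidates: dict mapping type name -> list of measurements of that type.
    Returns (best assignment {slot: candidate}, its relation score).
    """
    def assignments(slots, partial):
        # Enumerate one type-matched candidate per remaining slot.
        if not slots:
            yield dict(partial)
            return
        (name, typ), rest = slots[0], slots[1:]
        for cand in candidates.get(typ, []):
            yield from assignments(rest, partial + [(name, cand)])

    best, best_score = None, -1
    for a in assignments(PRIMITIVES, []):
        s = score(a)
        if s > best_score:
            best, best_score = a, s
    return best, best_score
```

For example, with candidates `{"point": [(5, 5), (50, 50)], "contour": [[(0, 0), (1, 1)]], "region": [(0, 0, 10, 10)]}`, the search assigns the point (5, 5) to slot `p0` because it is the only point candidate contained in the region candidate. In the actual model the relation set is learned from the near-miss image pairs and the search is far more constrained than this exhaustive enumeration.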
We show experimental results on several challenging categories.
Meeting abstract presented at VSS 2014