Abstract
Experiments in scene perception have shown that the human visual system makes extensive use of contextual information for facilitating object detection and recognition. However, the issue of how to formally model such contextual influences is still largely open. In this work, we introduce a perceptually motivated model of context-based object recognition. Most modeling attempts so far have defined the ‘context’ of an object in a scene in terms of other previously recognized objects within the scene. The drawback of this conceptualization is that it renders the complexity of context analysis to be at par with the problem of individual object recognition. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. We adopt this approach in our model and represent context information holistically in terms of the global spatial layout of spectral components. We use a probabilistic framework for encoding the relationships between context and object properties. The model is designed to learn these relationships from a set of example images. Post-training, the model serves as an effective scheme for context-driven focus of attention and scale-selection on real-world scenes. Given a novel image, the scheme is able to indicate where in the image a particular object is likely to be found and at what scale. We shall present results of testing a computer implementation of this scheme to detect people at multiple scales in a variety of real images and shall also show how they compare against human performance.