Abstract
We present a novel computational approach to visual saliency detection in dynamic natural scenes based on shape-centered image features. Mid-level features, such as medial features, have been recognized as important entities in both human object recognition and computational vision systems [Tarr & Buelthoff 1998, Kimia 2003]. [Kienzle et al 2009] have shown that image-driven gaze predictors can be learned from fixations recorded during free viewing of static natural images, and that this learning results in center-surround receptive fields. Method: Our shape-centered vision framework provides a measure of visual saliency and is learning-free. It is based on estimating the singularities of long-range gradient vector flow (GVF) fields, which were originally developed for the alignment of image contours [Xu & Prince 1998]. The GVF uses an optimization scheme to guarantee that gradients are preserved at contours while the flow field remains smooth. These properties are similar to filling-in processes in the human brain. Our method reveals the properties of medial-feature shape transforms and provides a mechanism to detect shape-specific information, local scale, and temporal change of scale in clutter. For each image, the approach generates a graph that encodes shape across a scale-space. Results: We have made medial-feature transforms workable in cluttered environments and have demonstrated their temporal stability, thus providing a mechanism to track shape over time. The approach can be used to model eye-tracking data in dynamic scenes. A fast implementation will provide a useful tool for predicting shape-specific saliency at interactive frame rates.
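As an illustration of the GVF step described above, the following is a minimal sketch in pure Python, not the implementation used in this work. It iterates the explicit diffusion scheme of [Xu & Prince 1998] on a toy edge map and then uses the divergence of the normalized flow direction as a simple stand-in for singularity estimation. The function names, the parameters mu and iters, the toy square edge map, and the divergence proxy are all assumptions made for this example.

```python
# Sketch only: explicit GVF iteration (after Xu & Prince 1998) in pure Python.
# mu, iters, the toy edge map, and the divergence proxy are illustrative assumptions.

def _zeros(h, w):
    return [[0.0] * w for _ in range(h)]

def gradient(f):
    """Central differences with replicated borders; returns (df/dx, df/dy)."""
    h, w = len(f), len(f[0])
    fx, fy = _zeros(h, w), _zeros(h, w)
    for y in range(h):
        for x in range(w):
            fx[y][x] = 0.5 * (f[y][min(x + 1, w - 1)] - f[y][max(x - 1, 0)])
            fy[y][x] = 0.5 * (f[min(y + 1, h - 1)][x] - f[max(y - 1, 0)][x])
    return fx, fy

def laplacian(a):
    """4-neighbour Laplacian with replicated borders."""
    h, w = len(a), len(a[0])
    out = _zeros(h, w)
    for y in range(h):
        for x in range(w):
            out[y][x] = (a[max(y - 1, 0)][x] + a[min(y + 1, h - 1)][x]
                         + a[y][max(x - 1, 0)] + a[y][min(x + 1, w - 1)]
                         - 4.0 * a[y][x])
    return out

def gvf(edge_map, mu=0.15, iters=200):
    """Diffuse the edge-map gradient into a smooth flow field (u, v).

    edge_map values are expected in [0, 1]. mu is kept well below 0.25,
    the stability bound of the pure diffusion step for unit grid spacing
    and unit time step.
    """
    fx, fy = gradient(edge_map)
    h, w = len(edge_map), len(edge_map[0])
    # Data-term weight: large near contours, zero in homogeneous regions.
    b = [[fx[y][x] ** 2 + fy[y][x] ** 2 for x in range(w)] for y in range(h)]
    u = [row[:] for row in fx]
    v = [row[:] for row in fy]
    for _ in range(iters):
        lu, lv = laplacian(u), laplacian(v)
        for y in range(h):
            for x in range(w):
                # Smoothness term plus a pull back toward the original gradient.
                u[y][x] += mu * lu[y][x] - b[y][x] * (u[y][x] - fx[y][x])
                v[y][x] += mu * lv[y][x] - b[y][x] * (v[y][x] - fy[y][x])
    return u, v

def divergence_of_direction(u, v, eps=1e-12):
    """Divergence of the unit-length flow direction (assumed singularity proxy)."""
    h, w = len(u), len(u[0])
    nx, ny = _zeros(h, w), _zeros(h, w)
    for y in range(h):
        for x in range(w):
            mag = (u[y][x] ** 2 + v[y][x] ** 2) ** 0.5 + eps
            nx[y][x], ny[y][x] = u[y][x] / mag, v[y][x] / mag
    dnx_dx, _ = gradient(nx)
    _, dny_dy = gradient(ny)
    return [[dnx_dx[y][x] + dny_dy[y][x] for x in range(w)] for y in range(h)]

def square_edge_map(size=21, lo=4, hi=16):
    """Toy edge map: a one-pixel-wide square outline on an empty image."""
    f = _zeros(size, size)
    for i in range(lo, hi + 1):
        f[lo][i] = f[hi][i] = f[i][lo] = f[i][hi] = 1.0
    return f

if __name__ == "__main__":
    u, v = gvf(square_edge_map())
    div = divergence_of_direction(u, v)
    print("divergence at the center of the square:", div[10][10])
```

On the toy square, the flow inside the outline points toward the nearest contour, so the direction field diverges at the center. Positive divergence therefore serves here as a rough marker of medial points; a scale-space graph as described in the abstract would require further steps that this sketch does not attempt.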
This work was supported by the EU project BACS FP6-IST-027140 and the Deutsche Forschungsgemeinschaft (DFG) Perceptual Graphics project PAK 38.