Abstract
How do humans deploy attention to rapidly recognize a scene from scenic information across multiple scales? How can neural models capture this biological competence to achieve state-of-the-art scene classification? The ARTSCENE neural system models how scenic evidence can be efficiently accumulated to classify natural scene images. ARTSCENE embodies a coarse-to-fine Texture Size Ranking Principle whereby spatial attention processes multiple scales of scenic information, ranging from global gist to local texture properties. The model can incrementally learn and predict scene identity from gist information alone, and can improve performance through selective attention to scenic textures of progressively smaller size. ARTSCENE discriminates four landscape scene categories (coast, forest, mountain, and countryside) with up to 91.58% correct on a test set, outperforms alternative models in the literature that use biologically implausible computations, and outperforms component systems that use either gist or texture information alone. Model simulations further show that adjacent textures form higher-order features that are likewise informative for scene recognition. The model also generalizes to include other scenic predictors, such as coherent objects.
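The coarse-to-fine evidence accumulation described above can be illustrated with a minimal sketch. This is not the published ARTSCENE implementation; the function, its feature values, and the weights are hypothetical toy constructs. It shows only the control flow implied by the Texture Size Ranking Principle: form a prediction from global gist, then update it with texture evidence attended in order of decreasing texture size.

```python
CATEGORIES = ["coast", "forest", "mountain", "countryside"]

def classify_coarse_to_fine(gist_scores, texture_votes):
    """Accumulate scenic evidence coarse-to-fine (illustrative sketch).

    gist_scores   : dict mapping category -> evidence from global gist
    texture_votes : list of (texture_size, dict category -> evidence)
    Returns the running best-category prediction after each stage,
    starting with the gist-only prediction.
    """
    evidence = dict(gist_scores)
    predictions = [max(evidence, key=evidence.get)]  # gist-only guess
    # Attend to textures from largest to smallest size.
    for _, votes in sorted(texture_votes, key=lambda sv: -sv[0]):
        for category, value in votes.items():
            evidence[category] = evidence.get(category, 0.0) + value
        predictions.append(max(evidence, key=evidence.get))
    return predictions

# Toy example: gist weakly favors "coast", but texture evidence at two
# scales (sizes 120 and 50, in arbitrary units) shifts the decision.
gist = {"coast": 0.4, "forest": 0.2, "mountain": 0.3, "countryside": 0.1}
textures = [(50, {"mountain": 0.3, "coast": 0.1}),
            (120, {"mountain": 0.2, "forest": 0.1})]
print(classify_coarse_to_fine(gist, textures))
# -> ['coast', 'mountain', 'mountain']
```

The sketch makes explicit that a gist-only prediction is available immediately, and that each additional (smaller) texture scale can only refine, never delay, the running decision.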
Both authors were supported in part by the National Science Foundation (NSF SBE-0354378) and the Office of Naval Research (ONR N00014-01-1-0624).