Abstract
A biologically inspired framework for perception is proposed and implemented to guide the systematic development of machine vision algorithms and methods. At its core is a hierarchical Bayesian inference system. Hypotheses about objects in a visual scene are generated “bottom-up” from sensor data; these hypotheses are then refined and validated “top-down” as complex objects, hypothesized at higher levels, impose new feature and location priors on their component parts at lower levels. An important new contribution is the systematic use of bottom-up saliency maps to narrow the space of hypotheses, which makes an efficient implementation of the framework possible. In addition, the system is allowed to hallucinate top-down (manufacture its own data) at low levels given high-level hypotheses, in order to overcome missing data, ambiguity, and noise. The implemented system is tested on images of real scenes containing simple 2D objects against various backgrounds. It correctly recognizes the objects in 98.71% of 621 video frames, compared to 38.00% for SIFT.