Abstract
Formal arguments establish that the complexity of visual search prohibits extensive analysis of all visual content in parallel. It follows that selecting important content from the enormous pool of incoming sensory input is a critical component of animal vision; in theory as well as in practice, this remains an open problem.
The history of this problem has seen many definitions of what constitutes important visual content. This work posits a model termed Attention by Information Maximization (AIM), derived from first principles and firmly rooted in Information Theory. The proposal generalizes prior work (Bruce and Tsotsos, NIPS 2005), with the focus in this effort on how the model addresses classic psychophysics results.
The AIM model is derived from a single principle: attention seeks to select the visual content that is most informative in a formal sense. Although previous information-theoretic models exist, we demonstrate that AIM forms a more natural definition and offer examples where existing efforts based on similar principles fail. We additionally argue that the model subsumes previous efforts based on analytic or heuristic definitions. The relation of the model to primate neural circuitry is also demonstrated.
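As an illustration of what "most informative in a formal sense" can mean, the following is a minimal sketch that scores each location by its Shannon self-information, -log2 p(x), so that rarer feature values receive higher saliency. It is not the AIM implementation: the single feature channel, the histogram density estimate, the bin count, and the name `self_information_map` are all assumptions made for this example.

```python
import numpy as np

def self_information_map(features, bins=16):
    """Hypothetical helper: per-location self-information (in bits) of one
    feature channel, with p(x) estimated from a histogram over the image.
    The histogram estimator is an assumption of this sketch."""
    features = np.asarray(features, dtype=float)
    # Estimate the likelihood of each feature value from the image itself.
    hist, edges = np.histogram(features, bins=bins)
    p = hist / hist.sum()
    # Map every location to the histogram bin that contains its value.
    idx = np.clip(np.digitize(features, edges[1:-1]), 0, bins - 1)
    # Rare values (low p) carry more information, hence higher saliency.
    return -np.log2(p[idx])

# Toy "pop-out" display: one odd element among 63 identical ones.
display = np.zeros((8, 8))
display[3, 4] = 1.0
saliency = self_information_map(display)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)
print(peak, saliency[peak])
```

In this toy display the single odd element has estimated probability 1/64 and therefore a self-information of 6 bits, while the 63 identical elements score close to zero, so the odd element is the saliency peak.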
AIM is evaluated against a variety of classic visual search paradigms, revealing its efficacy in explaining an unprecedented range of effects, including pop-out, search efficiency, distractor heterogeneity, target and distractor familiarity, and visual search asymmetries, among others. The model is described with sufficient specificity to operate on real images and is shown to have a greater capacity to predict human gaze patterns than existing efforts. The generality of the definition allows the saliency of arbitrary ensembles of neurons to be considered. Examples derived from neurons coding for spatiotemporal content and complex stimuli are presented, in addition to saliency based on simple V1-type cells.