Abstract
Traditionally, vision has been seen to comprise low-level, mid-level, and high-level processes. To make computational tasks explicit and highlight the central role of visual attention, I propose to view vision as comprising three stages: encoding, selection, and decoding. These three stages do not correspond to low-level, mid-level, and high-level vision, although the encoding stage dominates in low-level vision. At encoding, the representation of visual information is transformed for some purpose; e.g., photoreceptor signals are transformed to retinal ganglion cell responses, such that a maximum amount of input information is sent to the brain while limiting the cost in neural resources (e.g., information bandwidth at the optic nerve). At selection, a tiny fraction of visual information is admitted into the attentional bottleneck for detailed processing. For example, a spatial location is selected for scrutiny through a saccade towards it, while information at other locations is downplayed or deleted. Selection can be exogenous, when a saliency map created in the primary visual cortex (V1) guides selection through its monosynaptic projection to the superior colliculus (SC), which executes saccades. Selection can also be endogenous, when, e.g., knowledge of the location of our book guides our gaze, controlled largely by frontal and parietal brain areas, which also project to the SC. At decoding, properties of visual scenes, e.g., the identity and movement of an object, are inferred from a combination of the information admitted to the attentional bottleneck and internal knowledge or expectation of the visual environment; these inferences become our perception and aid our movements. To a first approximation, the encoding, selection, and decoding stages are assumed to occur consecutively along the visual pathway. Hence, exogenous selection by V1 suggests that decoding and endogenous selection are carried out by extrastriate cortices. However, feedback between stages, especially between selection and decoding, is expected to enable iterations of these processes to improve vision.
Meeting abstract presented at VSS 2014