Abstract
How does the brain maintain stable fusion of 3D scenes when the eyes move? Every eye movement causes each retinal position to process a different set of scenic features, so the brain must binocularly fuse new combinations of features at each position after every movement. Despite these breaks in retinotopic fusion caused by each movement, previously fused representations of a scene in depth appear stable. This is illustrated by moving the eyes after fusing binocular or monocular ("Magic Eye") stereograms. A neural model proposes how the brain does this by unifying concepts about how multiple cortical areas in the What and Where cortical streams interact to carry out 3D boundary and surface perception, spatial attention, invariant object category learning, predictive eye movements, and learned coordinate transformations. Data from single-neuron studies and from psychophysical studies of covert visual attention shifts prior to eye movements (Cavanagh et al., 2010; Duhamel, Colby, and Goldberg, 1992; Gottlieb and Snyder, 2010; Melcher, 2007; Rolfs et al., 2011) are explained. The model clarifies how perceptual, attentional, and cognitive interactions among multiple brain regions (e.g., LGN, V1, V2, V3A, V4, MT, MST, PPC, LIP, ITp, ITa, SC) may accomplish predictive remapping as part of the process whereby view-invariant object categories are learned. This model builds upon earlier neural models of 3D vision and figure-ground separation (e.g., Grossberg, 1994; Grossberg and Yazdanbakhsh, 2005) and of the learning of invariant object categories as the eyes freely scan a scene (Cao, Grossberg, and Markowitz, 2011; Fazl, Grossberg, and Mingolla, 2009). A key process concerns how an object’s surface representation generates a form-fitting distribution of spatial attention, or attentional shroud, in parietal cortex that helps to maintain the stability of multiple perceptual and cognitive processes as the eyes freely scan a scene.
Meeting abstract presented at VSS 2012
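
The abstract itself presents no equations or algorithms, but the core idea of predictive remapping can be conveyed with a minimal sketch. The Python/NumPy snippet below (purely illustrative, not the authors' model; function names such as `predictive_remap` and all parameters are hypothetical) shifts a retinotopic attention map by the inverse of an impending saccade vector, so that the attended world location remains covered when the eye lands.

```python
# Minimal sketch of predictive remapping of a retinotopic attention map
# ("shroud"). Illustrative only; not part of the VSS 2012 abstract or model.
import numpy as np

def gaussian_shroud(shape, center, sigma=5.0):
    """A simple attention map: a Gaussian centered on the attended location."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

def predictive_remap(shroud, saccade_vector):
    """Shift the retinotopic map opposite to the impending saccade so the
    attended world location is still covered after the eye lands."""
    dy, dx = saccade_vector
    return np.roll(np.roll(shroud, -dy, axis=0), -dx, axis=1)

if __name__ == "__main__":
    shroud = gaussian_shroud((64, 64), center=(32, 20))
    # A 12-pixel rightward saccade moves the attended object's retinal
    # position 12 pixels leftward (from column 20 to column 8); the remapped
    # shroud already peaks there before the eye moves.
    remapped = predictive_remap(shroud, saccade_vector=(0, 12))
    print(np.unravel_index(np.argmax(remapped), remapped.shape))  # -> (32, 8)
```

In the model summarized above, this kind of anticipatory shift is attributed to learned coordinate transformations among parietal and related areas; the snippet only illustrates the input-output behavior, not the learning or the neural circuitry.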