Abstract
When learning a view-invariant object category, the brain does not erroneously bind together views of all the objects that the eyes scan in a scene. The ARTSCAN model predicts how spatial and object attention in the What and Where cortical streams cooperate to selectively learn multiple views of an attended object. The model predicts that spatial attention employs an "attentional shroud" that is derived from an object's surface representation (Tyler & Kontsevich, 1995) and that persists during active scanning of the object. The shroud in the Where stream modulates view-invariant object category learning in the What stream. Surface representations compete for spatial attention to select a winning shroud. When the eyes move off an object, its shroud collapses, triggering a reset signal that stops learning of that object's category in the What stream before a new shroud forms in the Where stream and a new object category is selected. The new shroud enables the multiple view categories of the newly attended object to be bound together into a new object category, while top-down expectations that realize object attention within the What stream stabilize object category learning. The model learns a letter database with 96% accuracy. It also simulates reaction-time (RT) data about object-based attention: RTs are faster when responding to the non-cued end of an attended object than to a location outside the object, and attention engages a new object more slowly when it must first be disengaged from another object (Brown et al., 2005).
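For readers who want the attention cycle in algorithmic form, the following is a minimal conceptual sketch of the shroud-gated learning loop described above, not the model's published equations; every name in it (Shroud, artscan_cycle, the fixation encoding) is a hypothetical illustration.

```python
"""Minimal sketch (assumed, not the published ARTSCAN equations) of how a
surface-based attentional shroud gates view-invariant category learning."""

from dataclasses import dataclass


@dataclass
class Shroud:
    """Spatial attention derived from one object's surface representation."""
    surface_id: int


def artscan_cycle(fixations):
    """Bind view categories into object categories, gated by shrouds.

    `fixations` is a list of (surface_id, view) pairs: which object's
    surface each eye fixation lands on, and the view sampled there.
    """
    object_categories = []   # learned view-invariant object categories
    current_views = []       # view categories under the active shroud
    shroud = None            # no shroud before the first fixation

    for surface_id, view in fixations:
        if shroud is None or shroud.surface_id != surface_id:
            # Eyes move off the object: the shroud collapses, and a reset
            # signal closes the current object category in the What stream
            # before a new winning shroud forms in the Where stream.
            if current_views:
                object_categories.append(tuple(current_views))
            current_views = []
            shroud = Shroud(surface_id)
        # While the shroud persists, each new view category is bound to
        # the same, still-active object category.
        current_views.append(view)

    if current_views:
        object_categories.append(tuple(current_views))
    return object_categories


# Example: three fixations on object 0, then two on object 1, yield two
# object categories, each binding only that object's views.
print(artscan_cycle([(0, "A-left"), (0, "A-right"), (0, "A-top"),
                     (1, "B-left"), (1, "B-right")]))
# [('A-left', 'A-right', 'A-top'), ('B-left', 'B-right')]
```

The key design point the sketch illustrates is that views are never pooled across objects: only views collected under one unbroken shroud are bound into a single category, and the reset between shrouds is what prevents incorrect cross-object binding.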
Supported in part by grants from the National Science Foundation (NSF) and the Office of Naval Research (ONR).