Kenneth Hayworth, Mark Lescroart, Irving Biederman; The neural representation of spatial relationships by anatomical binding. Journal of Vision 2010;10(7):968. doi: 10.1167/10.7.968.
© 2017 Association for Research in Vision and Ophthalmology.
Visual spatial relations can be signaled implicitly by cells sensitive to conjunctions of features in particular arrangements. However, such hardwired circuits are insufficient to explain our ability to visually understand spatial relations, e.g., top-of. A neural binding mechanism (Malsburg, 1999) is required that can represent two (or more) objects simultaneously while dynamically binding relational roles to each. Time has been suggested as the binding medium, either through serial attentional fixations (Treisman, 1996) or synchronous firing (Hummel & Biederman, 1992). However, using time is problematic for several reasons, not the least of which is that such representations are no longer simple vectors of neural firing but would require circuitry for decoding and storage beyond traditional associative memory models. An alternative is that the visual system uses “anatomical binding,” in which one set of neurons encodes the features of object #1 while a separate set encodes those of object #2. A series of fMRI experiments designed to test predictions of these various models provides evidence for anatomical binding in a manner consistent with Object Files/FINST theory (Kahneman et al., 1992; Pylyshyn, 1989). Based on these results, we propose a Multiple Slots Multiple Spotlights model: connections within the ventral stream hierarchy are segregated among several semi-independent sets of neurons, creating, in essence, multiple parallel feature hierarchies, each having its own focus of attention and tracking circuitry (FINST) and its own feature list output (Object File). When viewing a brief presentation of a single object, all ventral stream cells would respond to its features (agreeing with existing single-unit and speed-of-recognition results). However, when viewing multi-object scenes (or multi-part objects) under extended processing times (>100 ms), different spotlights could be allocated to different objects (or parts), producing a final neural representation that explicitly binds feature information with relational roles.
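
As a rough illustration of why separate neuron sets can carry relational roles in a plain firing-rate vector, the toy sketch below contrasts a single pooled population with a two-slot code. It is not the authors' model or analysis: the feature vocabulary, the two-slot layout, and the assignment of slot 1 to the "top" role are all assumptions made for the example.

```python
# Toy illustration only: the feature names, the number of slots, and the
# mapping of slots to relational roles are hypothetical, not from the abstract.
FEATURES = ["cone", "brick"]  # hypothetical feature vocabulary


def one_hot(feature):
    """Firing-rate vector with one unit per feature in FEATURES."""
    return [1.0 if f == feature else 0.0 for f in FEATURES]


def pooled_code(top, bottom):
    """One shared population: feature activity is summed, so roles are lost."""
    return [a + b for a, b in zip(one_hot(top), one_hot(bottom))]


def slot_code(top, bottom):
    """Two separate populations: slot 1 units carry the 'top' object and
    slot 2 units carry the 'bottom' object (assumed role assignment)."""
    return one_hot(top) + one_hot(bottom)  # list concatenation, not addition


scene_a = ("cone", "brick")  # cone on top of brick
scene_b = ("brick", "cone")  # brick on top of cone

# The pooled code cannot tell the two scenes apart; the slot code can.
print(pooled_code(*scene_a) == pooled_code(*scene_b))
print(slot_code(*scene_a) == slot_code(*scene_b))
```

In this sketch the slot code remains an ordinary fixed-length vector, which is the property the abstract contrasts with time-based binding schemes that would need extra decoding and storage circuitry.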