Abstract
The visual system can detect the motion direction of partially occluded objects. During this process, motion signals are assigned an appropriate depth order. For example, a car moving behind a fence is perceived as moving in the background, while the fence is perceived as a static foreground object. When both the occluder and the occluded object are moving, their motion signals are likewise separated in depth. Single-cell responses in area MT correlate with the perceived 3D configuration of moving objects (Bradley and Andersen, 1998) and show disparity tuning in addition to direction selectivity (Palanca and DeAngelis, 2003). A laminar cortical model of form and motion processing predicts how a depth-dependent motion signal can be computed in MT, based on an interaction of V2 form projections with motion inputs to MT from magno-dominated cells in layer 4B of V1. The model simulates the shape-dependent percepts of motion behind occluders (Lorenceau and Alais, 2001), coherent and incoherent motion in the chopsticks illusion, and the capture of gelatinous-ellipse motion by satellites moving in the same depth plane (Weiss and Adelson, 2000). Because these V2-to-MT projections are continuously tuned in depth, they clarify motion capture across depths, competition between feature-tracking signals located at different depths, and the differential effects on motion capture and motion induction under transparent-motion conditions (Murakami, 1999). These cross-stream interactions are predicted to share circuitry with top-down intra-stream attentional signals. Supported in part by the NSF, ONR, and the NGA.
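A minimal sketch of the core computational idea may help fix intuitions. This is not the authors' model, which uses laminar cortical circuits; it only illustrates, under assumed discrete depth planes and illustrative numbers, how a V2-like form-depth signal could gate MT-like direction signals so that motion competition is confined to the depth plane that form evidence assigns. All names and parameters below are hypothetical.

```python
import numpy as np

N_DIRS = 8    # motion-direction channels (illustrative)
N_DEPTHS = 3  # near-to-far depth planes (illustrative; the model's tuning is continuous)

rng = np.random.default_rng(0)

# MT-like input: direction-selective motion evidence at one location,
# arriving from V1 layer 4B and not yet assigned a depth.
motion_input = rng.random(N_DIRS)                 # shape: (directions,)

# V2-like form signal: strength of form evidence (e.g., occluding contours)
# assigning this location to each depth plane; normalized, illustrative values.
form_depth = np.array([0.7, 0.2, 0.1])

# V2-to-MT gating: replicate the motion evidence across depth planes,
# weighted by the form-depth signal, yielding a (direction, depth) field.
mt_3d = motion_input[:, None] * form_depth[None, :]

def soft_wta(x, gain=8.0):
    """Soft winner-take-all competition among directions (illustrative)."""
    e = np.exp(gain * (x - x.max()))
    return e / e.sum()

# Within-depth "capture": directions compete inside each depth plane,
# while planes remain independent, so feature-tracking signals at
# different depths do not suppress one another.
captured = np.stack([soft_wta(mt_3d[:, d]) for d in range(N_DEPTHS)], axis=1)

print("dominant direction index per depth plane:", captured.argmax(axis=0))
```

Confining the winner-take-all competition to within each depth plane is one way to express the abstract's claim that motion capture and competition between feature-tracking signals depend on the depth assignment supplied by form.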