There is ample evidence in the literature over the last two decades that the perception of motion is mediated by several different types of mechanisms that operate in parallel. Some aspects of motion require only a first stage analysis (usually referred to as first order or Fourier motion) performed by directional filters (Adelson & Bergen, 1985; Burr, 1983; Burr & Ross, 1987; Watson & Ahumada, 1985). Others require a second order analysis (Badcock & Derrington, 1985; Cavanagh & Mather, 1989; Chubb & Sperling, 1988; Derrington & Badcock, 1985), usually implemented as an intrinsic nonlinearity applied at the input stage (Chubb & Sperling, 1988; Lu & Sperling, 1995, 2001). It is also widely accepted that if the power of the stimulus is not homogeneously distributed, the prevailing velocity will determine the saliency of perception (Chubb & Sperling, 1988; Georgeson & Scott-Samuel, 1999; Lu & Sperling, 1995; Zaidi & DeBonet, 2000).

The stimuli used here are balanced in power, and therefore the outputs of linear front-stage mechanisms will be balanced. Even second order mechanisms based on motion energy could not explain the prevailing direction of motion and its dependence on phase (Adelson & Bergen, 1985; Adelson & Movshon, 1982; Movshon, Adelson, Gizzi, & Newsome, 1985).
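For concreteness, a minimal sketch of such a front-end energy computation might read as follows. This is an illustrative form only, not the implementation of any of the cited models; the tuning parameters sf, tf, and sigma, and the restriction to a single analysis location, are arbitrary choices.

    import numpy as np

    def directional_energy(movie, sf=0.1, tf=0.1, sigma=8.0):
        # movie: 2-D array (time, space). A quadrature pair of space-time
        # oriented Gabor filters gives a phase-invariant energy response.
        t, x = np.indices(movie.shape)
        t = t - movie.shape[0] / 2.0
        x = x - movie.shape[1] / 2.0
        env = np.exp(-(x**2 + t**2) / (2.0 * sigma**2))
        arg = 2.0 * np.pi * (sf * x - tf * t)        # tuned to rightward drift
        even = np.sum(movie * env * np.cos(arg))     # even-phase filter output
        odd = np.sum(movie * env * np.sin(arg))      # odd-phase filter output
        return even**2 + odd**2                      # squared modulus = energy

For the power-balanced stimuli used here, the opponent output of such a linear front end, directional_energy(movie, tf=0.1) - directional_energy(movie, tf=-0.1), is expected to be close to zero by construction.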
Second order mechanisms that locally compare the outputs of detectors tuned to different velocities would be insensitive to the phase parameter, although some of these algorithms are very sophisticated and successfully simulate many visual motion illusions (Heeger, 1987; Weiss, Simoncelli, & Adelson, 2002; Yuille & Grzywacz, 1988) and some aspects of transparent motion perception (Qian et al., 1994a). In particular, an energy mechanism that locally measures the difference between opposite directions (Qian, Andersen, & Adelson, 1994b) can predict the transparency between simple random dot fields, but not the dependence of transparency on phase congruency, as shown by the red and green curves of Figure 10.
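Schematically, such a locally opponent comparison could take the following form. This is again an assumed sketch, loosely after Qian, Andersen, and Adelson (1994b), reusing directional_energy from above; the window size and filter tunings are placeholders.

    def local_opponent_energy(movie, window=16):
        # Right-minus-left energy within each spatial window; motion
        # transparency would be signalled when opponent responses of both
        # signs coexist across windows.
        outs = []
        for x0 in range(0, movie.shape[1] - window + 1, window):
            patch = movie[:, x0:x0 + window]
            right = directional_energy(patch, tf=0.1)
            left = directional_energy(patch, tf=-0.1)
            outs.append(right - left)
        return np.array(outs)

Because each energy term discards local phase, every quantity computed at this stage is blind to the congruency phase of the harmonics, which is why such a mechanism cannot reproduce the curves of Figure 10.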
To simulate the data, it is necessary that a second stage mechanism, oriented in space and time, follows the nonlinearity imposed by the front-end mechanism. This suggests that to simulate nonlinear perceptual phenomena like those illustrated here (and, more generally, the motion of contrast-modulated stimuli), the spatial nonlinearity must precede the spatiotemporal correlation stage (in agreement with recent motion perception models; Benton, Johnston, McOwan, & Victor, 2001; Chubb & Sperling, 1988; Lu & Sperling, 2001; Solomon & Sperling, 1994; Turano & Pantle, 1989; Wilson et al., 1992).

In this respect, the proposed model bears several similarities to the initial models proposed by Chubb and Sperling (1988) and Lu and Sperling (1995), and to the natural extension of this model in which motion is generated by features that belong to different domains, like texture, color, and depth (Lu & Sperling, 2001, 2002). In all these models, it is the neuronal salience associated with the feature that is tracked over time. The standard model for luminance stimuli first performs a full-wave rectification after appropriate spatiotemporal separable filtering (called texture grabbing) and then applies the standard Reichardt (1961) model to derive velocity (van Santen & Sperling, 1985).
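A schematic rendering of this rectify-then-correlate architecture is given below. It is a sketch of the general pipeline rather than the authors' implementation; the bandpass kernel, the space-time shifts dx and dt, and the test stimulus are placeholder choices.

    import numpy as np

    def texture_grab(frame, kernel=np.array([-0.25, 0.5, -0.25])):
        # Bandpass spatial filtering: the "texture grabbing" stage.
        return np.convolve(frame, kernel, mode="same")

    def rectified_reichardt(movie, dx=1, dt=1):
        # movie: 2-D array (time, space). Nonlinearity first, correlation second.
        s = np.abs(np.array([texture_grab(f) for f in movie]))  # full-wave rectification
        a = s[:-dt, :-dx] * s[dt:, dx:]   # correlates s(t, x) with s(t+dt, x+dx)
        b = s[:-dt, dx:] * s[dt:, :-dx]   # mirror subunit, opposite direction
        return (a - b).mean()             # > 0 signals rightward drift

    # A rightward-drifting contrast envelope on a static noise carrier
    # should, on average, yield a positive output.
    rng = np.random.default_rng(0)
    carrier = rng.standard_normal(256)
    xs = np.arange(256)
    movie = np.array([carrier * (1.0 + np.cos(2 * np.pi * (xs - 2 * t) / 32.0))
                      for t in range(64)])
    print(rectified_reichardt(movie))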
The computation of the local energy stage can perform the same function as texture grabbing. To distinguish which of the two models more closely simulates the neuronal mechanisms, specific tests need to be devised. However, the fact that the perceived motion did not vary with the global phase of the harmonics favors the local energy alternative. The full-wave rectification (modulus) performed by the texture grabbing stage (Chubb & Sperling, 1988) would produce quite different outputs depending on the global phase of the stimuli. Global phase changes induce dramatic changes in the luminance profiles and in the Michelson contrast of the present stimuli, by as much as a factor of 2, and a full-wave rectification would be highly sensitive to these variations.
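This last point can be checked numerically. The snippet below is a minimal demonstration under assumed stimulus parameters (four harmonics with 1/k amplitudes and an arbitrary mean luminance and modulation depth): adding the same global phase theta to every harmonic reshapes the luminance profile and changes its Michelson contrast, whereas the local energy envelope, computed as the modulus of the analytic signal, is unchanged.

    import numpy as np
    from scipy.signal import hilbert

    x = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
    for theta in (0.0, np.pi / 2, np.pi):
        # The same phase offset on every harmonic reshapes the waveform
        # without translating it.
        profile = sum(np.cos(k * x + theta) / k for k in (1, 2, 3, 4))
        L = 1.0 + 0.1 * profile                    # luminance, kept positive
        michelson = (L.max() - L.min()) / (L.max() + L.min())
        energy = np.abs(hilbert(profile))          # local energy envelope
        print(f"theta={theta:.2f}  Michelson={michelson:.3f}  "
              f"max energy={energy.max():.3f}")

The energy envelope is invariant because a common phase offset multiplies the analytic signal by e^(i*theta) without changing its modulus, consistent with the phase invariance of the percept reported here.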