Here we report a new motion illusion in which the prevailing motion direction is strongly influenced by the relative phase of the harmonic components of the stimulus. The basic stimulus is the sum of three sinusoidal contrast-reversing gratings: the first, the third, and the fifth harmonics of two square wave gratings that drift in opposite directions. The phase of one of the fifth harmonics was kept constant at 180 deg, whereas the phase of the other fifth harmonic was varied over the range 0–150 deg. For each phase value of the variable fifth harmonic, the motion was strongly biased toward the direction of that harmonic, which is the direction with the stronger phase congruency among the three harmonics. The strength of the prevailing motion was assessed by measuring motion direction discrimination thresholds while varying the contrast of the plaid pattern formed by the third and fifth harmonics. Results show that the contrast of the high harmonics had to be increased by more than a factor of 10 to achieve a balance of motion for phase differences greater than 60 deg between the two fifth harmonics. We also measured the dependence on the absolute phase of the harmonic components and found that it is not an important parameter, which excludes the possibility that local luminance cues mediate the effect.

A feature-tracking model based on previous work is proposed to simulate the data. The model computes a local energy function from a pair of space-time separable front-stage filters and applies a battery of directional second-stage mechanisms. It simulates quantitatively both the dependence of the illusion on phase congruency and the insensitivity to overall phase. Other energy models based on directional filters fail to simulate the phase congruency dependence.

- How are different moving components of objects grouped together in the visual scene?
- Are they grouped according to their spatial frequency content or is image segmentation based on grouping elements with similar velocity?
- Is segmentation achieved at an early stage of visual analysis or does it require a priori knowledge about the physical world and a top-down modulation from higher cognitive processes?

^{2} and luminance nonlinearities were corrected separately for each of the three guns of the monitor to balance the chromaticity over the whole range of luminance.

$w_0$ and $k_0$ are the temporal and spatial frequencies of the fundamental, equal to 2 Hz and 0.5 c/deg, respectively. $C$ is the contrast of the fundamental and $A$ is the ratio of the contrast of the high frequency plaid pattern to the contrast of the fundamentals.

$\varphi$–$\pi$. The shift by $\pi$ of one of the fifth harmonics was necessary to decrease the saliency of motion in that direction so that it could be balanced by increasing the contrast of the high frequency plaid pattern in the motion discrimination task (see detailed explanation in the Experimental results section).
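As a rough illustration of the stimulus described here, the following Python sketch sums the first, third, and fifth harmonics of two square waves drifting in opposite directions. Equation 1 is not reproduced in this text, so the exact form is an assumption: sine phase, harmonic amplitudes of 1/n as in a square wave, phase $\varphi$ added to the rightward fifth harmonic, and $\pi$ added to the leftward one. The function and argument names are hypothetical.

```python
import math

W0 = 2.0          # temporal frequency of the fundamental (Hz), from the text
K0 = 0.5          # spatial frequency of the fundamental (c/deg), from the text
SPEED = W0 / K0   # drift speed shared by all harmonics (deg/s)


def luminance_modulation(x, t, C=0.1, A=0.5, phi=0.0):
    """Contrast modulation at position x (deg) and time t (s).

    Assumed form (not the paper's Equation 1): each drifting component is
    sin(n * theta + phase) with amplitude 1/n. phi is in radians and is
    applied to the rightward fifth harmonic; the leftward fifth harmonic
    is shifted by pi.
    """
    right = 2 * math.pi * (K0 * x - W0 * t)   # phase of the rightward fundamental
    left = 2 * math.pi * (K0 * x + W0 * t)    # phase of the leftward fundamental
    fundamentals = C * (math.sin(right) + math.sin(left))
    thirds = (C * A / 3) * (math.sin(3 * right) + math.sin(3 * left))
    fifths = (C * A / 5) * (math.sin(5 * right + phi)
                            + math.sin(5 * left + math.pi))
    return fundamentals + thirds + fifths
```

With `A = 0` the sketch reduces to a counterphase-flickering grating, `2 * C * sin(2*pi*K0*x) * cos(2*pi*W0*t)`, which matches the flicker percept reported when the higher harmonics are absent. Under the assumed 1/n amplitudes, `C = 0.1` and `A = 1` give about 0.033 and 0.02 for each third and fifth harmonic, close to the maximum contrasts quoted in the Experimental results.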

$\psi$ represents the absolute phase of the stimulus. Again, one fifth harmonic is phase shifted by $\pi$ (see detailed explanation in the Experimental results section).

$A$ in Equations 1 and 2, and boxes in Figure 2). This is an evaluation of the transition point between perception of flicker and perception of directional motion.

$\varphi$, first experiment) and of absolute ($\psi$, second experiment) phase. In particular, for any fixed value of phase, we measured the contrast of the higher frequency components (parameter $A$ in Equations 1 and 2; see box in Figure 2) required to perceive leftward and rightward motion with equal probability. The contrast of each fundamental (parameter $C$ in Equations 1 and 2) was held constant, at 0.1 in the first experiment and at 0.05, 0.1, or 0.3 in the second experiment (see Figure 2). The contrast $A$ was varied adaptively with the QUEST staircase algorithm (Watson & Pelli, 1983). Subjects reported the perceived direction of motion, in a single-interval 2AFC procedure with feedback, by pressing a CB1 button (Cambridge Research Systems). The correct motion direction was assigned arbitrarily to the direction of the fifth harmonic with variable phase. For every condition tested, data were obtained from more than five QUEST staircases, each comprising more than 40 trials.

$x_i$.

$T$; $\beta$, representing the slope of the psychometric curves, was set equal to 1.75. This value was assessed by using a two-parameter fit of the psychometric curves and by taking the average value across conditions and subjects. The variation of $\beta$ across conditions was not statistically significant. $T$ was defined as the contrast corresponding to 0.75 of the fitted curve. Threshold measurements were repeated for different values of phase shift ranging from 0 to $5\pi/6$. A two-interval 2AFC procedure was used to measure the contrast detection threshold of the higher frequency components, using the same analysis and fitting procedure as for the motion discrimination threshold, with $\beta$ set to 1.75.
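The fitting step can be sketched in Python as follows. The psychometric function itself (Equation 3) is not reproduced in this text, so the Weibull-type form below is an assumption, chosen only because it equals 0.5 at zero contrast and 0.75 at $x = T$ with the slope fixed at 1.75. The grid-search maximum-likelihood fit and all function names are likewise illustrative, not the authors' procedure.

```python
import math

BETA = 1.75  # slope of the psychometric curve, fixed as in the text


def p_correct(x, T, beta=BETA):
    """Assumed 2AFC psychometric function: 0.5 at x = 0 and 0.75 at x = T."""
    return 1.0 - 0.5 * math.exp(-math.log(2.0) * (x / T) ** beta)


def fit_threshold(contrasts, n_correct, n_trials, beta=BETA,
                  t_min=1e-3, t_max=1.0, n_grid=400):
    """Maximum-likelihood threshold by grid search over log-spaced candidates."""
    best_T, best_ll = None, -math.inf
    for i in range(n_grid):
        T = t_min * (t_max / t_min) ** (i / (n_grid - 1))
        ll = 0.0
        for x, k, n in zip(contrasts, n_correct, n_trials):
            # Clip probabilities so that log() never receives 0.
            p = min(max(p_correct(x, T, beta), 1e-9), 1.0 - 1e-9)
            ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
        if ll > best_ll:
            best_T, best_ll = T, ll
    return best_T
```

A call such as `fit_threshold([0.01, 0.02, 0.04], [31, 38, 47], [50, 50, 50])` (made-up counts) returns the candidate threshold that best explains the proportion of correct responses at each contrast. Sensitivity, as plotted in the figures, is the inverse of that threshold.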

$\varphi$ = 0). When there is no phase shift (Figure 3A), two clear pairs of edges oriented at ±45 deg are visible and have similar contrast: In this case, the perceived direction of motion is ambiguous (see Movie 4). When the shift is 180 deg (Figure 3B), the pair of edges along 45 deg is weaker and less defined, because the phase shift makes the luminance profile along this velocity approximate a more triangular waveform. The stimulus in Figure 3B elicits a clear motion perception bias in the direction opposite to the phase manipulation (see Movie 5).

$\varphi$, we vary the contrast of the third and the fifth harmonics to discriminate a global direction of motion. An example psychometric function is shown in Figure 4 for $\varphi$ = 0 deg (black curve). By increasing the contrast of the higher harmonics (parameter $A$ in Equation 1; see box in Figure 2), perception changes smoothly from flicker to transparent motion. When the contrast of the higher harmonics is close to 0, the stimulus comprises only two sinusoidal gratings of the same amplitude, spatial frequency, and speed but opposite direction (components in the rectangle in Figure 2 are set to zero), leading to perception of flicker and to chance performance. Conversely, when the contrast of the higher harmonics is set to maximum (0.03 for each third harmonic and 0.02 for each fifth harmonic), both directions of motion are perceived. This condition would still produce chance performance in a motion discrimination task. However, the phase shift of $\pi$ of one of the fifth components dampens the saliency of motion in the direction of this harmonic component (see Movie 5), so the transition between transparency and flicker can be assessed by measuring direction discrimination of motion. The balance for the example in Figure 4 is obtained for $CA$ ≈ 0.018 for $\varphi$ = 0 deg (black psychometric curve) (Equation 1). However, the balance is achieved for $CA$ ≈ 0.04 when the phase $\varphi$ = 120 deg (red psychometric curve; Movie 6). To perceive a prevailing direction of motion, the subjects have to increase the contrast of the higher harmonic components by more than a factor of two when the phases of the two fifth components are more similar. In the following, we refer to the salient direction of motion as “feature motion.” The justification for this term is given in the model section.

$\varphi$ in Equation 1). The ordinate plots the inverse of the product of the parameters $C$ and $A$ in Equation 1. Quantitative results confirm the previous qualitative observations. An increase in the phase shift of the fifth harmonic ($\varphi$) decreases sensitivity to feature motion in the direction of the component with phase $\varphi$. When $\varphi$ is zero, a bias in motion direction is perceived at twice the detection threshold of the higher components (dashed curves of Figure 5). When $\varphi$ = 130 deg, more than a log unit of suprathreshold contrast is needed to achieve a preference in motion direction, showing a strong dependence of motion upon the phase congruency between the various components. These results indicate that grouping of components with the same velocity is enhanced when the phases of the harmonics are similar.

$\pi/2$. This means that sensitivity to feature motion is particularly low for phase shifts over $\pi/2$.

$\psi$ in Equation 2). Changing the absolute phase dramatically changes the spatiotemporal luminance profile of the stimuli. For example, the prevailing edges of Figure 3A are transformed into lines when $\psi$ = 90 deg. However, motion perception shows almost no dependence on absolute phase. All subjects reported that when the phase offset was varied, the prevailing motion was attributed to different kinds of features, such as moving square wave gratings or sawtooth gratings. Nevertheless, as soon as the higher harmonics were perceived, the perception of flicker broke down into transparent motion. We measured the effect for three base values of the contrast of the fundamental harmonic. At higher contrasts, the sensitivity seems lower for phase offsets around 90 deg, suggesting a preference for edges over lines. However, the effect is very small, about a factor of 1.4.

$E(x,t)$ in any point of the image is computed by convolving the image $I(x,t)$ with pairs of band-pass spatial linear filters in quadrature phase, $F_e(x,t)$ and $F_o(x,t)$ (Figure 7A, Equations 5 and 6). Local energy is then computed by summing the squares of the outputs of the convolution with each filter (Figure 7B, Equation 7). The local energy function is particularly sensitive to phase congruency: When the phases of the harmonics are most similar, all the energy becomes concentrated in high peaks; for low phase congruency, the peaks become smoother and less defined. The local maxima correspond to the locations of salient features. In the present algorithm (Figure 7), we did not segment the image by marking the features; instead, we use the local energy stage to transform the input into a function whose intensity is proportionally related to the salience of the spatial structure.
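The sensitivity of local energy to phase congruency can be illustrated with a one-dimensional Python sketch. It is not an implementation of Equations 5 to 7, which are not reproduced in this text: the sketch assumes a purely spatial periodic profile, an ideal quadrature pair (the even filter passes the profile unchanged, the odd filter returns its Hilbert transform), and unit gain at every harmonic. Function and argument names are hypothetical.

```python
import cmath
import math


def local_energy_profile(phases, amplitudes=(1.0, 1 / 3, 1 / 5),
                         harmonics=(1, 3, 5), n_samples=360):
    """Local energy of the profile sum_n a_n * sin(n * x + phase_n).

    Assumes an ideal quadrature pair with unit gain: the even output is the
    profile itself and the odd output is its Hilbert transform, so the
    energy is the squared modulus of the analytic signal.
    """
    energy = []
    for i in range(n_samples):
        x = 2 * math.pi * i / n_samples
        z = sum(a * cmath.exp(1j * (n * x + p))
                for a, n, p in zip(amplitudes, harmonics, phases))
        even = z.imag    # sum of the sine terms
        odd = -z.real    # Hilbert transform of sin is -cos
        energy.append(even ** 2 + odd ** 2)
    return energy


# Fifth harmonic in phase with the others versus shifted by 150 deg.
congruent = local_energy_profile((0.0, 0.0, 0.0))
shifted = local_energy_profile((0.0, 0.0, math.radians(150)))
print(max(congruent), max(shifted))
```

In this toy case the congruent profile reaches a higher energy peak than the shifted one, which is the qualitative behavior the text attributes to the local energy function.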

$M_{left}$ and $M_{right}$ are the outputs of the convolution of the energy with filters ($H_{left}$ and $H_{right}$ in Equations 8 and 9) tuned to velocities of ±4 deg/s (45 and −45 deg orientation in Figure 7C).
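A directional second-stage filter of this kind can be sketched in Python as follows. The text describes these filters elsewhere as differences of Gaussians oriented along the two velocities, with space constants of 0.13 and 0.17 deg and a length of 1 deg. Equations 8 and 9 are not reproduced here, so the unit-area normalization of the two Gaussians, the function names, and the toy energy ridge used as input are all assumptions made for illustration.

```python
import math

SPEED = 4.0  # deg/s, stimulus speed given in the text


def gaussian(u, sigma):
    """Unit-area Gaussian (this normalization is an assumption)."""
    return math.exp(-u * u / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def second_stage_filter(x, t, direction, sigma_e=0.13, sigma_i=0.17, sigma_l=1.0):
    """Difference of Gaussians across a space-time ridge, Gaussian along it.

    x is in deg, t in s; direction is +1 (rightward) or -1 (leftward).
    Time is rescaled by SPEED so that +/-4 deg/s maps onto +/-45 deg.
    """
    tp = SPEED * t
    across = (x - direction * tp) / math.sqrt(2)
    along = (x + direction * tp) / math.sqrt(2)
    dog = gaussian(across, sigma_e) - gaussian(across, sigma_i)
    return dog * math.exp(-along * along / (2 * sigma_l * sigma_l))


def filter_response(energy, direction, half_width=3.0, step=0.05):
    """Inner product of the filter with an energy map sampled on a square grid."""
    n = int(round(half_width / step))
    total = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, t = i * step, j * step / SPEED
            total += second_stage_filter(x, t, direction) * energy(x, t)
    return total * step * step


def rightward_ridge(x, t, width=0.1):
    """Toy energy ridge drifting rightward at SPEED (hypothetical input)."""
    return math.exp(-((x - SPEED * t) ** 2) / (2 * width * width))


m_right = filter_response(rightward_ridge, +1)
m_left = filter_response(rightward_ridge, -1)
```

For this toy ridge, `m_right` is clearly positive while `m_left` is close to zero, because a balanced difference of Gaussians integrates to nearly nothing across a ridge that runs orthogonal to its preferred orientation.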

$O(\varphi,K)$ can be thought of as a measure of motion energy contrast and is obtained by integrating the output of the second stage filters over a full period of the stimulus. It is important to note that the ratio $O(\varphi,K)$ is not determined by the output of directional energy units, which is equivalent to applying a normalization to the output of motion opponent units, as previously used to simulate successfully the directional thresholds of two drifting gratings (Georgeson & Scott-Samuel, 1999). We will show that the second stage filtering is essential to simulate the psychophysical results (see Figure 10).

$I(x,t)$ with a generic even filter oriented along zero velocity, $F_e(x,t)$, is given by:

$a_1$ and $a_2$ that represent the gain of the third and of the fifth harmonics with respect to the fundamentals, respectively.

$\varphi$ is 0 deg for the left image and 150 deg for the right image (note that only half of a period is represented). When $\varphi$ is equal to 0 deg, energy peaks are more uniformly distributed along the negative diagonal (direction from the back to the front corner in the image) than when $\varphi$ is equal to 150 deg. If the symmetries in space-time of the energy ridges were to give the prevailing motion direction, we would predict a clear sensation of motion along this direction when $\varphi$ is equal to 0 deg, but not when $\varphi$ is equal to 150 deg. This qualitative prediction is confirmed by the experimental data of Figure 4. When the contrast of the higher harmonics is equal to 0.6, the stimulus at phase 0 is perceived to drift along the direction of the fifth harmonic that is in phase with the fundamental (corresponding to −45 deg in the images of Figure 7), whereas no net motion was perceived for phase 150 deg.

$O(\varphi,K)$ as a function of the contrast of the high harmonics. Figure 8B shows an example of a simulation for stimuli with phase equal to 0 and 150 deg and for spatiotemporal profiles of the second stage filters given by a difference of Gaussian distributions oriented along

$\sigma_e$ and $\sigma_i$, the space constants of the excitatory and inhibitory filters, are equal to 0.13 and 0.17 deg, respectively. The standard deviation along the orthogonal direction, $\sigma_l$, is always kept constant at 1 deg, which at the stimulus speed of 4 deg/s corresponds to 0.25 s (the dependency of the model on the exact shape of the second stage filter is shown in Figure 9).

$O$ on the contrast of the high frequencies is not linear, given that an expansive nonlinearity is applied in the computation of the local energy function and a divisive normalizing term is used to estimate motion contrast (Equation 4). To simulate the psychophysical data, we imposed an arbitrary threshold on the motion energy contrast, chosen to provide the best fit for the data corresponding to a phase shift of 90 deg (corresponding to a motion contrast of 0.1). For all the contrast curves at the various phase shifts, we evaluated the contrast of the higher harmonics corresponding to the same motion energy contrast threshold. Simulations are reported in Figure 9 for various choices of the free parameters. Figure 9A shows the simulation results calculated for each value of phase shift together with the experimental data (black circles). In this case, the second stage filters given by Equations 8 and 9 were used. The different symbols refer to different shapes of the front stage filters obtained by varying the relative gain of the various harmonics (parameters $a_1$ and $a_2$ in Equations 5 and 6). The factors correspond to filters with a spatial bandwidth of 2.7 to 3.9 octaves and a temporal bandwidth of 2.6 to 3.6 octaves, when the values at the three frequencies are fitted with a parabola in log units of spatial frequency (Morrone & Burr, 1988) and the temporal filter is considered very broad and flat in the temporal frequency domain. The model fits the data very well and is robust to the shape of the front filters. It also predicts the fast decrease in sensitivity observed at $5\pi/6$, which is a peculiar characteristic of the perception of these stimuli.
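The read-out step, finding the harmonic contrast at which a simulated motion-contrast curve reaches the fixed criterion, can be sketched in Python as below. The linear interpolation, the function name, and the toy curve in the usage note are assumptions for illustration; the text does not state how the crossing point was computed.

```python
def contrast_at_criterion(contrasts, motion_contrast, criterion=0.1):
    """Contrast at which an increasing motion-contrast curve reaches criterion.

    Uses linear interpolation between the two samples that bracket the
    criterion. Returns None if the curve never reaches it.
    """
    points = list(zip(contrasts, motion_contrast))
    if points and points[0][1] >= criterion:
        return points[0][0]
    for (k0, o0), (k1, o1) in zip(points, points[1:]):
        if o0 < criterion <= o1:
            return k0 + (criterion - o0) * (k1 - k0) / (o1 - o0)
    return None
```

For a made-up expansive curve sampled at contrasts `[0.1, 0.2, 0.4, 0.8]` with motion contrasts `[0.01, 0.04, 0.16, 0.64]`, the criterion of 0.1 falls between the second and third samples. Repeating this read-out for the curve at each phase shift, and taking the inverse, would give a simulated sensitivity per phase.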

$\sigma$ = 0.13 and 0.07 deg); with excitatory and inhibitory regions (Figure 9B, squares and triangles); for two different sizes of the excitatory central Gaussian ($\sigma$ = 0.13 and 0.17 deg); and for two different sizes of the inhibitory central Gaussian ($\sigma$ = 0.17 and 0.3 deg). The length of the mask (given by $\sigma_l$ in Equations 8 and 9) was always kept constant. The results show that the second stage filters that best simulate the data are those with the smaller excitatory center. The presence of an inhibitory surround does not substantially change the pattern of results. The smaller filters are those that most closely match the size of the ridges of local energy.

$\varphi$ that was equal to 0 deg, Figure 10B when the stimulus had a $\varphi$ that was equal to 150 deg. We measured the contrast of the harmonics required to reach threshold under different manipulations of the two energy functions. The results are shown in Figure 10C. The scaling factors of the various curves are different and were chosen arbitrarily to give the same value at a phase of 90 deg, for easy comparison between models. The contrast sensitivity did not vary with phase when the overall integral of the energy function was considered separately for the filters tuned to positive and negative velocities. This is a simple consequence of Parseval's theorem, given that the overall power of the output of the two filters is equal. However, the integral of the positive (orange symbols) and negative (green symbols) values of the difference between the outputs of the two directional filters also did not vary with phase. This indicates that an opponent stage mechanism is not sufficient per se to explain the dependency on phase. All of the above models would always predict perception of both directions of motion with no bias (remember that the scaling factor is arbitrary).
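The Parseval argument can be written out for a single, idealized linear directional filter. The notation below is not the paper's and is introduced only for this sketch: $g_n$ is the filter gain at harmonic $n$, $c_n$ and $\theta_n$ are the amplitude and phase of that harmonic in the stimulus, and $u = 2\pi(k_0 x - w_0 t)$ is the phase of the fundamental drifting in the filter's preferred direction.

```latex
r(u) = \sum_{n \in \{1,3,5\}} g_n c_n \sin(n u + \theta_n),
\qquad
\frac{1}{2\pi} \int_0^{2\pi} r(u)^2 \, du
  = \frac{1}{2} \sum_{n \in \{1,3,5\}} (g_n c_n)^2 .
```

The cross terms between different harmonics integrate to zero over a full period, so the phases $\theta_n$ drop out of the total power. Under these assumptions, changing the phase of a fifth harmonic redistributes the output over space and time without changing its integral, which is why a read-out based on overall power alone cannot depend on $\varphi$.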

$E_{left} + E_{right}$) is quite constant over space, and a model that considers either the maximum activity or the overall activity for the two directions also fails to simulate the phase dependency effect.