Open Access
Article  |   May 2024
Feature-invariant processing of spatial segregation based on temporal asynchrony
Yen-Ju Chen, Zitang Sun, Shin'ya Nishida
Journal of Vision May 2024, Vol. 24, 15. https://doi.org/10.1167/jov.24.5.15
Abstract

Temporal asynchrony is a cue for the perceptual segregation of spatial regions. Past research found attribute invariance of this phenomenon: Asynchrony induces perceptual segmentation regardless of the changing attribute type, and it does so even when the asynchrony occurs between different attributes. To test the generality of this finding and obtain insights into the underlying computational mechanism, we compared segmentation performance for changes in luminance, color, motion direction, and their combinations. Our task was to detect the target quadrant in which a periodic alternation in an attribute was phase-delayed compared to the remaining quadrants. When stimulus elements made a square-wave attribute change, target detection was not clearly attribute invariant, being more difficult for motion direction changes than for luminance or color changes and nearly impossible for the combination of motion direction and luminance or color. We suspected that a waveform mismatch might cause the anomalous behavior of motion direction, since a square-wave change in motion direction is a triangular-wave change in spatial phase (i.e., a second-order change in the direction of the spatial phase change). In agreement with this idea, we found that segregation performance was strongly affected by the waveform type (square wave, triangular wave, or their combination) and that, when this factor was controlled, performance was nearly, though not perfectly, invariant against attribute type. The results are discussed in relation to a model in which different visual attributes share a common asynchrony-based segmentation mechanism. 

Introduction
The human visual system uses the spatiotemporal structure of visual signals for visual segmentation and grouping. For example, the Gestalt principle of common fate states that objects moving at the same speed and direction tend to be grouped (Wertheimer, 1923). Furthermore, several studies suggest that visual attributes with similar temporal dynamics are grouped perceptually across spatial locations. Alais, Blake, and Lee (1998) demonstrated that synchronous alternation of the luminance contrast of four Gabor elements facilitated the integration of local Gabor motions into a global motion. Sekuler and Bennett (2001) made a checkerboard pattern composed of numerous cells, each sinusoidally modulated in luminance, and found that target cells with temporally phase-shifted luminance modulation were perceptually segregated. Accumulating research has shown that temporal synchrony (i.e., simultaneous alternation of visual attributes) causes a sense of grouping, whereas asynchrony causes a sense of segregation. 
These phenomena are observed for a wide range of visual attributes, such as luminance, spatial frequency (Guttman, Gilroy, & Blake, 2005), orientation (Lee & Blake, 1999), and motion direction (Kandil & Fahle, 2004; Lee & Blake, 1999; Usher & Donnelly, 1998). Furthermore, Guttman et al. (2005) tested mixed-attribute conditions in which asynchrony was produced between two attributes selected from orientation, spatial frequency, spatial phase, and contrast. The results showed that participants could segregate an area from the background by temporal asynchrony, even when asynchrony was produced between different attributes. This suggests that spatial structure can emerge from a temporal structure abstracted from dynamic changes in any attribute (Guttman et al., 2005). The attribute-invariant segmentation suggests that the computational mechanism of the asynchrony-based segregation may consist of the following stages (Figure 1): (1) separate attribute-specific channels that encode stimulus changes in a variety of attributes, (2) integration of attribute-specific signals into attribute-invariant signals, and (3) a universal process that compares attribute-invariant temporal signals to detect asynchrony across areas. 
Figure 1.
 
A possible computational mechanism that can explain attribute-invariant segmentation based on temporal asynchrony. First, the separated channels (with specific impulse response functions) encode the attribute changes at each location, then feed the outputs to the second stage, where attribute-invariant temporal signals are generated and compared across space. The spatial segmentation is produced when the comparison process detects asynchrony.
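To make the scheme concrete, the following MATLAB sketch walks through the three stages for two areas whose attributes alternate 90° out of phase. It is our illustration, not a fitted model: the one-frame temporal difference stands in for unspecified channel impulse responses, and all parameter values are example choices.

```matlab
% Toy sketch of the three stages in Figure 1: attribute-specific encoding of
% changes, an attribute-invariant change signal, and a comparison process
% that flags asynchrony between areas.
F  = 120;  xi = 3;                       % sampling rate and alternation rate (Hz)
t  = (0:F-1) / F;
lumA = sign(sin(2*pi*xi*t));             % luminance alternation in area A
colB = sign(sin(2*pi*xi*t - pi/2));      % color alternation in area B, delayed 90 deg

% Stage 1: attribute-specific channels; a one-frame temporal difference stands
% in for each channel's impulse response (an assumption for illustration).
% Stage 2: full-wave rectification yields an attribute-invariant change signal.
chgA = abs([0, diff(lumA)]);
chgB = abs([0, diff(colB)]);

% Stage 3: a common comparison process; a low zero-lag correlation between the
% attribute-invariant signals indicates asynchrony between the two areas.
cc = corrcoef(chgA, chgB);
fprintf('zero-lag correlation of change signals: %.2f\n', cc(1,2));
```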
This study aims to test the generality of the findings by Guttman et al. (2005) and obtain insights into the underlying computational mechanism of asynchrony-based segregation. Since Guttman et al. (2005) only used changes in achromatic stationary attributes for mixed-attribute tests, we used color changes and motion direction changes in addition to luminance changes. The visual temporal response is known to be slower for color changes (Burr & Morrone, 1993; Mullen, Thompson, & Hess, 2010) and motion changes (Werkhoven, Snippe, & Alexander, 1992) compared to luminance changes. In addition, physically synchronous changes in color and motion direction can be seen as asynchronous (Moutoussis & Zeki, 1997). One factor suggested to produce this perceptual asynchrony is a difference in modulation waveform between color and motion direction (Nishida & Johnston, 2002). These differences may affect asynchrony-based spatial segregation. 
To examine segmentation performance systematically based on temporal asynchrony involving luminance, color, motion direction, and their combinations, we used a square stimulus divided into four quadrants, each containing multiple patch elements. The elements changed one of the attributes at regular intervals, with the elements in one quadrant being phase delayed by 90° relative to the rest. By changing the temporal frequency of attribute changes and evaluating the ability of the visual system to discern the target from nontarget quadrants at each temporal frequency, we derived the temporal frequency tuning of each stimulus condition. The similarity in the tuning characteristics would indicate the degree of affinity of the underlying process across different attributes, while the dissimilarity in the tuning characteristics would suggest where the differences may exist in the processing stream. 
In what follows, we report two experiments. Experiment A compared asynchronous-target detection performance among luminance, color, motion direction, and their combinations, using a square wave as the stimulus modulation waveform. The results showed that motion direction behaves differently from luminance and color. Experiment B introduced triangular-wave modulation in addition to square-wave modulation, applying the two waveforms to luminance, color, spatial phase, and their combinations. The motivation for Experiment B was to test the hypothesis that the differences among the attributes found in Experiment A might be due to the difference in waveform. Since Experiment A is a subset of Experiment B, we collected the data for the two experiments together to make the results of all the conditions directly comparable. We developed this data collection plan on the basis of the results of preliminary experiments. 
Experiment A—visual segmentation on luminance, color, and motion direction
Method
Apparatus
The stimuli were displayed on a VIEWPixx/3D LCD monitor (VPixx Technologies, Saint-Bruno-de-Montarville, Canada) with a 1,920 × 1,080 pixel resolution and a 120-Hz refresh rate. The lowest, mean, and highest display luminances were 1.8, 48.4, and 96.7 cd/m², respectively. The linear luminance output was calibrated for each channel using an i1Pro colorimeter (VPixx Technologies). Each pixel subtended 1.3 arcmin at a viewing distance of 70 cm. The participant was seated in a dark room with a chinrest stabilizing the head. The experiments were programmed using the Psychophysics Toolbox (Brainard, 1997) in MATLAB release 2022a (MathWorks, Natick, MA, USA). The study protocol followed the ethical standards of the Declaration of Helsinki except for preregistration and was approved by the Ethics Committee of Kyoto University (KUIS-EAR-2020-003). 
Participants
The experiment included two of the authors and two naive participants (four males, mean age 24.67 years). The participants declared that they had normal or corrected-to-normal vision. All the participants were informed of the nature of the research and provided informed consent before the experiment started. Participants received monetary compensation after the experiment finished. Before the main experiment, a revised version of the minimum motion method (Anstis & Cavanagh, 1983; Cavanagh, MacLeod, & Anstis, 1987) was used to measure the equiluminant ratio (in terms of the system intensity value) between the red (R) and green (G) channels for each participant. During the procedure, participants viewed a Gabor drifting rightward in which the sinusoidal modulations of the R and G channels were 180° out of phase. Participants adjusted the ratio between the channels to minimize the sense of motion. The mean R/G ratio was 0.58. 
Stimuli
In each trial, a square stimulus was displayed for 1,000 ms. The stimulus was divided into four quadrants, each composed of 8 × 8 tightly tiled 0.5° × 0.5° patches. The patches alternated in luminance, color, or motion direction. Patches in the same quadrant made changes in the same attribute and with the same timing. The changes in one of the quadrants were delayed relative to the remaining three quadrants. We postulated that this target quadrant would be segregated perceptually. 
When the luminance was modulated, the luminance profile of the patch was a Gaussian blob as follows (except under the cross-attribute conditions paired with motion direction; see below):  
\begin{eqnarray} && L\left( {x,y,t;\sigma } \right) \nonumber \\ && \quad = {\rm{\ }}{L_{mean}} \cdot \left( {1 + c\left( {t;\epsilon ,\xi ,{A_L}, {C_0}} \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}}} \right) \quad \end{eqnarray}
(1)
 
\begin{eqnarray}c\left( {t;{\rm{\ }}\epsilon ,\xi ,{A_L}, {C_0}} \right) = \pm {{\rm{A}}_{\rm{L}}} \cdot Sqr\left( {t;\epsilon, \xi } \right) + {{\rm{C}}_0}\quad \end{eqnarray}
(2)
 
\begin{eqnarray}Sqr\left( {t;\epsilon ,\xi } \right) = SGN\left( {sin\left( {2\pi \xi t - \epsilon } \right)} \right)\quad \end{eqnarray}
(3)
 
\begin{eqnarray} && SGN\left( {sin\left( {2\pi \xi t - \epsilon } \right)} \right) \nonumber \\ &&\quad = \left\{ {\begin{array}{@{}r@{\quad}l@{}} 1, & sin\left( {2\pi \xi t - \epsilon } \right) \ge 0\\ - 1, & sin\left( {2\pi \xi t - \epsilon } \right) < 0 \end{array}} \right. \quad \end{eqnarray}
(4)
where x, y correspond to the spatial location; t represents the time in seconds; σ determines the width of the Gaussian profile and was set to 6 arcmin; Lmean is the mean luminance; c(t) is the temporal contrast modulation function, where ξ is the temporal frequency (TF) in hertz; ε is the temporal phase shift, set to \(\frac{1}{{4\xi }}\) (equivalent to a 90° phase delay) for the delayed quadrant; AL determines the maximum modulation amplitude, set to 0.3; and C0 is the mean contrast, a constant drawn from the uniform distribution [–0.7, 0.7]. This term randomly shifts the waveform upward or downward so that the modulations tile the entire value range, excluding additional luminance cues. Sqr(t) is the square-wave function, obtained by applying the sign function (denoted SGN(f(t))) to a sinusoid. The sign function maps zero and all positive real numbers to 1 and all negative real numbers to −1. 
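For illustration, a minimal MATLAB sketch of Equations 1 through 4 follows. The parameter values (temporal frequency, phase, and patch sampling) are our example choices, not the exact values from the experiment code, and only one modulation polarity is shown.

```matlab
% Minimal sketch of Eqs. 1-4: square-wave contrast modulation of a Gaussian blob.
F     = 120;                  % monitor refresh rate (Hz)
t     = (0:F-1) / F;          % 1-s clip, one sample per frame
xi    = 2;                    % temporal frequency (Hz)
eps0  = 0;                    % temporal phase; pi/2 (90 deg delay) for the target quadrant
A_L   = 0.3;                  % maximum modulation amplitude
C0    = 0.7 * (2*rand - 1);   % mean contrast, uniform on [-0.7, 0.7]
Lmean = 48.4;                 % mean display luminance (cd/m^2)

s   = sin(2*pi*xi*t - eps0);
Sqr = (s >= 0) - (s < 0);     % Eqs. 3-4: maps sin >= 0 to +1 and sin < 0 to -1
c   = A_L * Sqr + C0;         % Eq. 2 (positive polarity; the sign is randomized per patch)

sigma  = 6/60;                                    % Gaussian width: 6 arcmin in deg
[x, y] = meshgrid(linspace(-0.25, 0.25, 23));     % one 0.5 x 0.5 deg patch
blob   = exp(-(x.^2 + y.^2) / (2*sigma^2));
frame1 = Lmean * (1 + c(1) * blob);               % Eq. 1 evaluated at the first frame
```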
When the color was modulated, the luminance profile of the patch was also a Gaussian blob as follows (except under the cross-attribute conditions paired with motion direction; see below):  
\begin{eqnarray} \left\{ \begin{array}{@{}l@{}} {\rm{R}}\left( {x,y,t;\sigma } \right) = {L_{mean}} \cdot \left( {1 + c\left( {t;\epsilon ,\xi ,{A_c}, {R_0}} \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}}} \right)\\ {\rm{G}}\left( {x,y,t;\sigma } \right) = - {L_{mean}} \cdot \left( {1 + {L_{iso}} \cdot c\left( {t;\epsilon ,\xi ,{A_c}, {G_0}} \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}}} \right)\\ {\rm{\ B}}\left( {x,y,t} \right) = 0{\rm{\ \ \ }} \end{array} \right.\quad \end{eqnarray}
(5)
where R, G, and B represent the three color channels and Liso is the isoluminant index between R and G; Ac determines the maximum modulation amplitude and its sign for each temporal profile, and its value, either 0.5 or −0.5, was randomly assigned to each patch; and R0 and G0 are the mean chromatic contrast, a constant drawn from the uniform distribution [–0.5, 0.5] and shared between the R and G channels (R0 = G0). c(t; ε, ξ, Ac, R0) and c(t; ε, ξ, Ac, G0) are the same function as Equation 2, the only differences being that AL is replaced by Ac and C0 is replaced by R0 and G0, respectively. 
When the motion direction was modulated, the luminance profile of the patch was a Gabor function:  
\begin{eqnarray} && L\left( {x,y,t;\sigma ,\omega ,\theta ,\phi } \right) \nonumber \\ && \quad = {L_{mean}}\left( {1 + \cos \left( {2\pi \omega r - \phi \left( {t;\epsilon ,\xi ,{A_M},\ {\phi _0}} \right)} \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}}} \right)\quad \end{eqnarray}
(6)
 
\begin{eqnarray}r = {\rm{\ }}ycos\left( \theta \right) + xsin\left( \theta \right)\quad \end{eqnarray}
(7)
 
\begin{eqnarray} && \phi \left( {t;\epsilon ,{\rm{\ }}\xi ,{A_M},\ {\phi _0}} \right) \nonumber \\ && \quad = \pm {A_M} \cdot \left[ {2\left| {2\left( {t\xi - \left\lfloor {t\xi + \frac{1}{2}} \right\rfloor } \right)} \right| - 1} \right] + {\phi _0}\quad \end{eqnarray}
(8)
where ω is the spatial frequency, set to one cycle per degree; θ is the spatial orientation, randomly drawn from the uniform distribution over [−π, π]; ϕ(t) is the temporal modulation function for the spatial phase; and \(\lfloor {t\xi + \frac{1}{2}} \rfloor \) is the floor function, which outputs the largest integer not greater than its input. AM is the magnitude of the phase shift per second, either 90° or −90°, and ϕ0 is the mean phase, a constant drawn from the uniform distribution over [–90°, 90°]. In the actual code, we implemented the last function using an equivalent integral form as follows:  
\begin{eqnarray} && \phi \left( {t;\epsilon ,{\rm{\ }}\xi ,{A_M},\ {\phi _0}} \right) \nonumber \\ && \quad = \pm {{\rm{A}}_{\rm{M}}} \cdot \left[ {\frac{{4\xi }}{\mathbb{F}}\mathop \sum \limits_0^t Sqr\left( {t;\epsilon ,\xi } \right) - 1} \right]{\rm{\ }} + {\phi _0}\quad \end{eqnarray}
(9)
 
\(\mathbb{F}\) is the sampling rate of the monitor, which is 120 Hz. With this integral form, we eliminate the artifacts caused by down-sampling a continuous curve. 
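A minimal MATLAB sketch of this computation follows; the parameter values are illustrative. The frame-by-frame accumulation of the square wave (Equation 9) reproduces the triangular phase profile of the floor form (Equation 8) up to a one-frame discretization offset.

```matlab
% Sketch of Eq. 9: the triangular phase profile accumulated from a square wave.
F    = 120;                        % monitor sampling rate (Hz)
t    = (0:F-1) / F;                % 1-s clip, one sample per frame
xi   = 2;                          % temporal frequency (Hz)
eps0 = 0;                          % temporal phase shift
A_M  = 90;                         % phase-shift magnitude (deg)
phi0 = 180*rand - 90;              % mean phase, uniform on [-90, 90] deg

s    = sin(2*pi*xi*t - eps0);
Sqr  = (s >= 0) - (s < 0);                          % Eq. 3
phi  = A_M * ((4*xi/F)*cumsum(Sqr) - 1) + phi0;     % Eq. 9 (positive polarity)

phiFloor = A_M * (2*abs(2*(t*xi - floor(t*xi + 1/2))) - 1) + phi0;   % Eq. 8
plot(t, phi, t, phiFloor);         % the two triangular profiles nearly coincide
xlabel('Time (s)'); ylabel('Spatial phase (deg)');
```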
When the luminance or color was paired with motion direction in cross-attribute conditions, we used the Gabor function:  
\begin{eqnarray} && L\left( {x,y,t;\sigma ,\omega ,\theta ,\phi } \right) \nonumber \\ && \quad = {L_{mean}} \cdot \Big( 1 + c\left( {t;\epsilon ,\xi ,{A_L}, {C_0}} \right) \nonumber \\ && \qquad \cdot \cos \left( {2\pi \omega r - \phi } \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}} \Big)\quad \end{eqnarray}
(10)
and the RGB profile in the color condition:  
\begin{eqnarray} \begin{array}{@{}l@{}} \left\{ \begin{array}{@{}l@{}} R\left( {x,y,t;\sigma } \right) = {L_{mean}} \cdot \Big( 1 + c\left( {t;\epsilon ,\xi ,{A_C}, {R_0}} \right) \\ \qquad\qquad \qquad\,\, \cdot \cos \left( {2\pi \omega r - \phi } \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}} \Big)\\ G\left( {x,y,t;\sigma } \right) = - {L_{mean}} \cdot \Big( 1 + {L_{iso}} \cdot c\left( {t;\epsilon ,\xi ,{A_C}, {G_0}} \right) \\ \qquad\qquad \qquad\,\, \cdot \cos \left( {2\pi \omega r - \phi } \right) \cdot {e^{ - \frac{{{x^2} + {y^2}}}{{2{\sigma ^2}}}}} \Big)\\ B\left( {x,y,t} \right) = 0{\rm{\ \ }} \end{array} \right.\\ \end{array} \nonumber \\ \end{eqnarray}
(11)
where ϕ was fixed at a value randomly drawn from [− π,  π]. c(t; ε, ξ, AL, C0) is the same function as in Equation 2; c(t; ε, ξ, AC, R0) and c(t; ε, ξ, AC, G0) are the same as in Equation 5. 
All the stimulus profiles were remapped to an 8-bit scale before being presented. Figure 2 shows sample frames (upper panel) and the modulation waveforms (lower panel). 
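As an illustration of Equations 6, 7, and 10, the following MATLAB sketch renders one frame of a contrast-modulated Gabor and remaps it to 8 bits. The normalization by a mean level of 0.5 and all parameter values are our example choices, not the experiment code.

```matlab
% Illustrative rendering of one frame of a luminance-modulated Gabor (Eqs. 6-7, 10).
omega = 1;                         % spatial frequency, 1 c/deg
theta = 2*pi*rand - pi;            % orientation, uniform on [-pi, pi]
phi   = 2*pi*rand - pi;            % fixed spatial phase (cross-attribute case)
sigma = 6/60;                      % Gaussian envelope width (deg)
c     = 0.3;                       % momentary contrast from c(t) of Eq. 2

[x, y] = meshgrid(linspace(-0.25, 0.25, 23));    % one 0.5 x 0.5 deg patch
r      = y*cos(theta) + x*sin(theta);            % Eq. 7
env    = exp(-(x.^2 + y.^2) / (2*sigma^2));
L      = 0.5 * (1 + c * cos(2*pi*omega*r - phi) .* env);  % Eq. 10, mean level 0.5

img8 = uint8(round(255 * L));      % remap to the 8-bit display range
image(img8); colormap(gray(256)); axis image;
```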
Figure 2.
 
Stimuli for Experiment A. (A) Stimulus snapshots for within-attribute luminance (left), cross-attribute luminance and color (center), and cross-attribute luminance and motion direction (right). (B) The temporal modulation waveforms for the delayed area D (red) and the nondelayed area N (blue). The modulation was a square wave and was temporally delayed for area D relative to area N. The polarity of modulation (solid or dashed line) was chosen randomly for each element in both areas. The red and blue planes indicate the frames in which the pattern was updated. R is the refresh rate of the monitor. The projection attached to the x-axis shows the waveforms in two dimensions.
Design and procedure
A method of constant stimuli with a four-alternative-forced-choice task was used. In each trial, participants viewed a 1-second stimulus clip and reported which quadrant was most visually segregated from the others by pressing the “Q,” “W,” “A,” or “S” keys on the keyboard for left upper, right upper, left lower, or right lower quadrant, respectively. The location of the delayed quadrant was randomized across the trials. No fixation point was given. Three attributes were used: luminance (Lum), color (Col), and motion direction (Dir). There were two different attribute presentation conditions: within-attribute and cross-attribute. Only one of the attributes was modulated in the within-attribute condition, whereas two of the three attributes were modulated in the cross-attribute condition. In the latter case, one attribute changed in two diagonal quadrants, while another attribute changed in the remaining two quadrants. The attribute changing in the target quadrant was chosen randomly. There were six stimulus conditions in total: three within-attribute conditions and three cross-attribute conditions (Lum–Col, Lum–Dir, and Col–Dir). 
The response was measured at 11 temporal frequencies ranging from 0 to 10 Hz in 1 Hz steps. The stimulus condition was fixed in a session, while the temporal frequency conditions were mixed. Each participant was required to complete all the conditions. The session order was randomized for each participant. There were 30 repetitions for each combination of stimulus condition, temporal frequency, and participant. The six stimulus conditions for Experiment A were collected together with the 15 stimulus conditions for Experiment B. In total, 21 (stimulus conditions) × 11 (temporal frequencies) × 30 (trials) = 6,930 trials were run by each participant. It took about 3 to 4 hours to finish Experiments A and B. 
Results
We computed the proportion of correct responses (i.e., responses choosing the delayed quadrant) to evaluate segregation performance. The grand average of all participants' data was used to inspect the general performance. A bootstrap procedure was applied at each temporal frequency to estimate the confidence interval. In each repetition, we resampled the four participants' proportions correct with replacement and calculated the mean of the resampled values. The process was repeated 100,000 times, and the confidence interval was defined as the range between the 2.5% and 97.5% quantiles of the repetitions. 
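A minimal MATLAB sketch of this bootstrap follows; the proportions correct below are made-up example values, standing in for the four participants' data at one temporal frequency.

```matlab
% Bootstrap CI: resample participants' proportions correct with replacement
% and take the 2.5% / 97.5% quantiles of the resampled means.
pc   = [0.95 0.88 0.92 0.85];   % example proportions correct, one per participant
nRep = 100000;

means = zeros(nRep, 1);
for k = 1:nRep
    idx      = randi(numel(pc), 1, numel(pc));  % resample participants with replacement
    means(k) = mean(pc(idx));
end
means = sort(means);
ci = [means(ceil(0.025*nRep)), means(ceil(0.975*nRep))];  % 2.5% and 97.5% quantiles
fprintf('95%% bootstrap CI: [%.3f, %.3f]\n', ci(1), ci(2));
```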
For conditions including only color and luminance (within-attribute luminance, within-attribute color, and cross-attribute luminance and color; the first column of Figure 3), the grand average showed high performance for all conditions unless the temporal frequency was very high. The segmentation accuracy was perfect at some temporal frequencies for the two within-attribute conditions. The cross-attribute condition also reached 100% accuracy for some observers. There were minor differences in temporal tuning among the three conditions. The frequency range at which the performance was significantly higher than the guess rate (25%) was broader for luminance (1–7 Hz) than for color (1–5 Hz) or for their combination (1–5 Hz). 
Figure 3.
 
Proportion correct as a function of the temporal frequency for each attribute pair. The first row shows the grand-average curves over four participants; the second to fifth rows show individual curves separately. Different attribute pairs are shown with distinct colors and symbols. The red line with diamonds corresponds to the cross-attribute condition, while the other colored lines are within-attribute conditions: The gray line with squares represents luminance, the blue line with circles represents color, and the yellow line with triangles represents motion direction. Luminance is abbreviated as Lum/L, color as Col/C, and motion direction as Dir/D. The error bar for each point on the grand-average curves represents the centralized 95% confidence interval estimated using the bootstrapping procedure, while the error bar for each participant is the centralized 95% interval of the binomial distribution. The black dashed line indicates the guess rate (0.25).
By contrast, the segmentation accuracy was low for conditions including motion direction (the second and third columns of Figure 3). Unlike for luminance and color, the maximum target detection accuracy was only around 60% for within-attribute motion direction. Additionally, the performance obtained under cross-attribute conditions rarely surpassed the guess rate significantly, indicating an inability to accomplish the task. 
Interim discussion
Experiment A measured the frequency tuning characteristics for visual segmentation across luminance, color, and motion direction to investigate whether the performance was invariant regardless of the attribute. In the within-attribute condition, the result indicated that participants could segregate the delayed quadrant regardless of the attribute, consistent with previous studies (Alais et al., 1998; Sekuler & Bennett, 2001; Usher & Donnelly, 1998). However, the performance was significantly worse for motion direction than for luminance or color. Furthermore, in the cross-attribute condition, the segmentation task was possible with the luminance–color pair but nearly impossible for the two pairs that included motion direction. The heterogeneity in segmentation performance among different stimulus conditions indicates a limitation of the simple attribute invariance hypothesis and a unique characteristic of motion direction. 
One possible factor explaining the poor performance in conditions involving motion direction is the slow temporal response of motion processing. For example, the sensitivity to detect motion acceleration suggests that the temporal integration window of the velocity signal is at least 100 ms (Werkhoven et al., 1992). However, this factor does not explain why we found very poor performance for cross-attribute conditions including motion direction even at low temporal frequencies. In addition, Maruya, Holcombe, and Nishida (2013) found that the upper limit for detecting the synchrony of a pair of motion direction alternations could be high enough (> 10 Hz) to cover the frequency range we tested. Given these findings, the observed poor performance in conditions involving motion direction should not be ascribed solely to the low temporal resolution of motion processing. 
One may wonder why the performance of the within-attribute motion condition was not as good as Maruya et al. (2013) reported. That study showed high temporal limits for within-motion synchrony judgments only when the compared motion pair had a common component vector. The temporal limit dropped to ∼3 Hz when no common component existed. This drop is likely to reflect the temporal characteristics of the motion change comparison stage rather than those of the initial change detection stage. In our stimulus configuration, the elements' motion directions were random, so they did not have a common vector component. Lacking common components may be one reason the performance of the within-attribute motion condition was not good in the present study. By somehow increasing the spatial coherence of local motion directions, we might be able to improve the performance of asynchrony-based segregation and find a different temporal characteristic. This is an issue left for further study. In what follows, we will focus on the results found with our specific motion stimuli in which elements had random orientations and directions and made 180° motion reversals. 
Cross-attribute conditions, including motion direction (luminance vs. motion, or color vs. motion), showed very poor segregation performance. Regarding this finding, we hypothesized that the mechanism underlying a color motion asynchrony (Moutoussis & Zeki, 1997) may also play some role in synchrony-based grouping and asynchrony-based segregation. In this cross-attribute timing illusion, physically synchronous alternations in color and motion direction are perceived as asynchronous, with motion direction changes being apparently delayed relative to color changes. This apparent delay could impair synchrony-based grouping between motion changes and color (or luminance) changes. One explanation for the color–motion asynchrony effect is waveform mismatching (Nishida & Johnston, 2002; for a review of their time marker theory, Fujisaki, Kitazawa, & Nishida, 2012). The modulation of color signals showed instantaneous changes, maintaining a value for a brief period and then swiftly shifting to another value. By contrast, the modulation of motion direction gradually shifted the spatial phase in one direction for a period, followed by a gradual shift in the opposite direction. Forced matching between a first-order change (a change of a nondynamic attribute that can be defined at a single measurement in time, such as a change in luminance or color) and a second-order change (a change in the direction of change, such as a reversal of motion direction or luminance change direction) would result in mismatching in time. Following this hypothesis, an effect similar to color–motion asynchrony was observed for a variety of attribute pairs between first-order change (square-wave attribute change) and second-order change (triangular-wave attribute change) (Nishida & Johnston, 2002). 
We suspected that the dissimilar tuning curve of the motion direction and the other attributes in Experiment A may also be due to the different alternation patterns of the signals. Therefore, Experiment B investigated the impact of the waveform type. 
Experiment B—visual segmentation on different waveforms
To compare the waveform effect among the attributes systematically, we applied square-wave modulation (making constant interval first-order changes) and triangular-wave modulation (making constant interval second-order changes) to the three attributes: luminance, color, and spatial phase. Note that application of a triangular-wave modulation to the spatial phase shift was equivalent to Experiment A’s motion direction change condition. If the waveform induced different segmentation abilities, we would expect three results from Experiment B. First, when using square-wave modulation of the spatial phase, the tuning curve should be similar to that for luminance and color. Second, when using triangular-wave modulation for luminance and color, the tuning curve should be similar to that for motion direction. Third, when mixing square-wave modulation and triangular-wave modulation, the tuning curve should be similar to the luminance–motion direction or the color–motion direction pair. We hypothesized that if asynchrony-based segmentation is attribute invariant, we would find similar tuning curves regardless of the attribute and the combination after equalizing the waveform among luminance, color, and spatial phase. 
Method
The experiment included two authors and two naive participants (four males, mean age 24.67 years), identical to Experiment A. The method was identical to that of Experiment A, except that two temporal modulation functions were introduced: square wave (Sqr(t)) and triangular wave (Tri(t)). The two functions with aligned changing points were defined as follows (a demonstration of the waveform is shown in Figure 4):  
\begin{eqnarray}Sqr\left( {t;\epsilon ,\xi } \right) = SGN\left( {sin\left( {2\pi \xi t - \epsilon } \right)} \right)\quad \end{eqnarray}
(12)
 
\begin{eqnarray}Tri\left( {t;\epsilon ,{\rm{\ }}\xi } \right) = \frac{{4\xi }}{\mathbb{F}}\mathop \sum \limits_0^t Sqr\left( {t;\epsilon ,\xi } \right) - 1\quad \end{eqnarray}
(13)
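The following MATLAB sketch generates the two aligned waveforms of Equations 12 and 13 under illustrative parameters. Because the slope of Tri(t) is a scaled copy of Sqr(t) itself, a value change of the square wave and a slope reversal of the triangular wave fall on the same frame by construction.

```matlab
% Aligned square and triangular waves (Eqs. 12-13).
F    = 120;  xi = 2;  eps0 = 0;           % sampling rate, frequency, phase (illustrative)
t    = (0:F-1) / F;
s    = sin(2*pi*xi*t - eps0);
Sqr  = (s >= 0) - (s < 0);                % Eq. 12
Tri  = (4*xi/F) * cumsum(Sqr) - 1;        % Eq. 13

max(abs(diff(Tri) * F/(4*xi) - Sqr(2:end)))   % returns 0: Tri's slope tracks Sqr
plot(t, Sqr, t, Tri); legend('Sqr', 'Tri'); xlabel('Time (s)');
```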
 
Figure 4.
 
Experiment B used square and triangular waves. (A) Six examples of time series for one element in the stimuli (30 frames each). The top three rows follow the square-wave modulation, and the bottom three follow the triangular-wave modulation. The corresponding attributes are luminance, color, and spatial phase from top to bottom. (B) The temporal modulation waveforms for square (red) and triangular (blue) waves. The square and triangular waves are aligned so that the update timing of the square wave coincides with the slope reversal timing of the triangular wave. The red and blue planes indicate this timing. The polarity of modulation (solid line or dashed line) was chosen randomly for each element in the stimuli. R is the refresh rate of the monitor. The projection attached to the x-axis shows the waveforms in two dimensions.
There were six different attribute conditions: luminance (Lum), color (Col), spatial phase (Pha), and their three combinations. For each, three waveform configurations were introduced: all square wave, all triangular wave, and mixed waves. In the mixed-wave condition, the quadrants on the same diagonal followed identical temporal profiles, whereas those on the other diagonal followed the other temporal profile. For example, the upper-left and lower-right quadrants followed a square wave, while the lower-left and upper-right quadrants followed a triangular wave. The square and triangular waves were considered synchronous when the timing of the change in the value of the square wave and that of the slope of the triangular wave matched in time. 
In total, there were 21 stimulus conditions: 9 within-attribute conditions (3 attribute conditions × 3 waveform conditions) and 12 cross-attribute conditions (3 attribute conditions × 4 waveform conditions). Cross-attribute conditions had one more mixed wave condition than within-attribute conditions because each attribute could be a square or triangular wave. For example, in the Lum–Col conditions, a square wave was given to Lum, while a triangular wave to Col, or vice versa. 
Since six stimulus conditions were common to Experiment A, we collected the data for the whole experiment together. The results for the six common conditions were based on the same data. 
Results
We will use L, C, and P to represent luminance, color, and spatial phase, respectively; 1 represents a square wave (first-order change), and 2 represents a triangular wave (second-order change). The top two rows of Figure 5 compare the performances for the two within-attribute conditions and the corresponding cross-attribute condition separately for each attribute combination. The individual participants' curves for Figure 5 are shown in the Appendix as Figures A1–A4. 
Figure 5.
 
Proportion correct as a function of the temporal frequency for each attribute and waveform pair. Rows 1 and 2 compare the performance among within- and cross-attribute conditions under the all-square-wave and all-triangular-wave results, respectively, with distinct colors and symbols representing attribute pairs. Luminance is abbreviated as Lum/L, color as Col/C, and phase as Pha/P; 1 stands for square-wave modulation, and 2 stands for triangular-wave modulation. Rows 3 and 4 compare performance among waveform pairs under within-attribute and cross-attribute conditions, respectively. Error bars represent the 95% confidence interval, and the black dashed line indicates the guess rate (0.25).
All square-wave conditions shown in the first row were identical to those in Figure 3, except that spatial phase changes replaced motion direction changes. All within-attribute conditions resulted in a nearly perfect performance (proportion correct close to 100%) when the frequency was 1–5 Hz for luminance, 1–3 Hz for color, and 1–5 Hz for spatial phase. All cross-attribute conditions (L1–C1, L1–P1, and C1–P1) also showed good performance at lower temporal frequencies, but the effective frequency range shrank: 1–5 Hz for L1–C1 and L1–P1, and 1–4 Hz for C1–P1. In general, the pattern of results was more invariant against attribute variation compared with the results of Experiment A, especially for cross-attribute conditions including the spatial phase. However, they were not perfectly attribute invariant. For within-attribute conditions, the upper temporal limit was higher for spatial phase and luminance than for color. For cross-attribute conditions, the performance was generally worse than for the within-attribute conditions. Among the three conditions, the performance was slightly worse for L1–P1 and C1–P1 than for L1–C1. 
When all the waveforms were switched to triangular waves (the second row of Figure 5), the segmentation task became much more difficult. The performances for the within-attribute conditions (L2–L2 and C2–C2) were far from perfect even at the best temporal frequency, comparable to the performance of the P2–P2 condition (identical to the motion direction change condition in Experiment A). The performance drop is also reflected in the shrinkage of the effective frequency range: 2–6 Hz for luminance, 1–3 Hz for color, and 1–6 Hz for spatial phase. The performances for the cross-attribute conditions (L2–C2, L2–P2, and C2–P2) were even poorer. The effective frequency range was reduced to 2–3 Hz for L2–C2, 3–5 Hz for L2–P2, and 1–4 Hz for C2–P2. In general, the tuning curves were not very different among the attribute conditions, as Experiment A showed, although they were not perfectly attribute invariant, as we also found for the square-wave conditions. 
The effects of the waveform can be seen more clearly in the bottom two rows of Figure 5. In the within-attribute condition (the third row), participants consistently showed the best segmentation performance with square waves and worse performance with triangular waves. The accuracy fell below the guess rate when the square and triangular waves were mixed. A similar result was obtained for the cross-attribute conditions (the fourth row), except that the overall performance was worse than the within-attribute conditions, and the tendency of the accuracy to fall below the guess rate for the mixed-wave conditions was unclear. 
Although the participants could not detect the physically delayed target quadrant in the mixed-wave conditions, they often made the inverse response, that is, they chose the quadrant diagonally opposite the target (Figure 6); the detailed values for the mixed-wave conditions are shown in Table 1, and the individual participants' curves for Figure 6 are shown in the Appendix as Figures A5–A8. Although we defined the correct choice as the quadrant with a physical asynchrony in stimulus change relative to the other three quadrants regardless of other stimulus manipulations, the asynchronous quadrant may not be the one that appeared most segregated from the others in cross-attribute/order conditions. In cross-attribute/order conditions, let us assume two quadrants are A and the others are B, that is, {A, A, B, B}. If a temporal delay is given to one of the A quadrants, the quadrant set can be described as {A+, A–, B–, B–}. Since B– is the majority, the question is whether the observer chooses A+ or A–. The response for A+ is the correct response, while that for A– is the inverse response. A minimal MATLAB sketch of this response classification follows; the quadrant indexing and diagonal pairing are our illustrative convention, not the experiment code. 
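```matlab
% Classify a response as correct, inverse, or other. Quadrants are indexed
% 1..4 with diagonal pairs (1,4) and (2,3) -- an illustrative convention.
target   = 2;                          % index of the delayed (target) quadrant
diagOf   = [4 3 2 1];                  % diagonal partner of each quadrant
response = 3;                          % observer's choice on this trial

if response == target
    label = 'correct';
elseif response == diagOf(target)
    label = 'inverse';                 % quadrant diagonally opposite the target
else
    label = 'other';
end
disp(label)
```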
Figure 6.
 
The proportion of inverse responses as a function of the temporal frequency of each attribute and waveform pair.
Table 1.
 
Proportion of choosing the target or inverse response in mixed-wave, within-attribute conditions. Note: Bold values are significantly higher than the guess rate.
As shown in Figure 6, the proportion of inverse responses was below the guess rate for the conditions in which participants could make correct choices toward the target. However, when the waveforms were not matched (i.e., mixed wave), the proportion of inverse responses rose above the guess rate. As with the correct responses, the proportion of inverse responses also decreased under cross-attribute conditions. The frequency range for the inverse responses was similar to that for the correct responses. For within-attribute conditions, the range was 1–8 Hz for luminance, 1–3 Hz for color, and 1–8 Hz for spatial phase. For cross-attribute conditions, the range was not very clear. 
General discussion
The study explored whether visual segmentation based on temporal asynchrony is attribute invariant by comparing the performance across different attributes and combinations thereof. To achieve this, we measured the proportion of correct responses for detecting an asynchronous target as a function of the temporal frequency of stimulus changes. With the assumption of attribute invariance, the frequency tuning curve should be similar across different attributes. 
In Experiment A, similar frequency tuning curves (though with minor differences) were observed for luminance, color, and their combination but not for conditions including motion direction. Experiment B revealed that the tuning curve differences between motion direction and the other attributes could be ascribed to the waveform difference rather than the attribute difference: The alternation of square-wave motion direction can be considered a triangular-wave alternation of the spatial phase. When the waveform type was matched, the tuning curves for luminance, color, and spatial phase became highly comparable. Our results support the hypothesis that perceptual segregation based on stimulus asynchrony occurs universally regardless of the attribute that carries the change signals (Guttman et al., 2005), which might be subserved by a mechanism like that shown in Figure 1; however, equating the waveform type and the order of stimulus change is critical to making performance comparable across different attributes. 
Effects of waveform type or the order of changes
Experiment A showed that the target detection performance was worse for motion direction changes than for luminance and color changes. Experiment B showed that the target detection performance was generally worse for second-order changes produced by triangular-wave modulations than for first-order changes produced by square-wave modulations. Regardless of the attribute and the within- or cross-attribute condition, the performance was best when there was asynchrony between first-order changes, worse when there was asynchrony between second-order changes, and worst (close to the chance level) when there was asynchrony between first- and second-order changes. 
The poor performance when mixing the first- and second-order changes can be explained by waveform (dis)similarity. If the visual system detects (a)synchrony based on a cross-correlation–like computation on sensory signals that preserve the effect of this phase shift, timing-aligned square and triangular waves will be perceived as asynchronous. As shown in Figure 7, the correlation peaks not when the point of first-order change (the abrupt transition of a square wave) and the point of second-order change (the inflection point of a triangular wave) are synchronized but when they are about 90° out of phase. This is because of a 90° phase shift of the fundamental frequency between square and triangular waves. Our finding that the inverse response increased for cross-waveform conditions supports the cross-correlation idea. In the cross-waveform conditions, the inverse response was the response of choosing the quadrant diagonally opposite the target. The inverse-response quadrant had the same waveform as the target while having a 90° phase shift from the target; with a square-wave modulation, the target's fundamental-frequency phase was closer to that of the remaining nontarget quadrants. Note that two opposite phase modulations (e.g., 0° and 180°) were mixed in each region, so the distinction between –90° and +90° phase shifts does not matter much in the current argument. Our findings, therefore, are consistent with the hypothesis that the visual system compares the time courses of relatively raw sensory signals by computing cross-area (dis)similarity through a cross-correlation–like computation, although considerable ambiguity remains in the details of the algorithm. On the other hand, these findings are inconsistent with the hypothesis that the visual system extracts universal temporal features (messengers) from stimulus changes of any order and then compares their timings across space to group or segment areas (Guttman et al., 2005). 
Figure 7.
 
Pearson's correlation coefficient as the function of the phase shift of a triangular wave relative to a square wave. The correlation between square and triangular waves is zero when the phase shift between the changing points is zero (i.e., the changing timings are aligned). In contrast, the correlation peaks when the phase shift between the changing points is 90° (misaligned) since the fundamental frequencies are aligned.
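The computation behind Figure 7 can be sketched in MATLAB as follows (our reconstruction; the sampling rate and frequency are illustrative). Because the polarity of modulation was randomized across elements, only the magnitude of the correlation is plotted here.

```matlab
% Pearson's correlation between a square wave and a triangular wave whose
% changing points are phase-shifted. The correlation is zero at 0 deg shift
% (changing points aligned) and its magnitude peaks near +/-90 deg, where the
% fundamental frequencies align.
F  = 1200;  xi = 2;                  % fine sampling for a smooth curve
t  = (0:F-1) / F;
sqw = sign(sin(2*pi*xi*t));          % reference square wave

shifts = -180:180;                   % phase shift of the triangular wave (deg)
rho    = zeros(size(shifts));
for k = 1:numel(shifts)
    eps0   = shifts(k) * pi/180;
    s      = sign(sin(2*pi*xi*t - eps0));
    tri    = cumsum(s);                      % triangular wave (unnormalized)
    cc     = corrcoef(sqw, tri);
    rho(k) = cc(1,2);
end
plot(shifts, abs(rho));
xlabel('Phase shift of changing points (deg)'); ylabel('|Correlation|');
```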
Note that the cross-correlation computation does not contradict previous findings on visual segmentation induced by timing misalignment (Kandil & Fahle, 2001; Sekuler & Bennett, 2001; for a review, Blake & Lee, 2005). The greater the timing misalignment (or temporal phase shift) between matched waveforms becomes, the lower the waveform similarity (e.g., cross-correlation) will be. Segmentation based on timing misalignment and segmentation based on temporal waveform dissimilarity therefore predict similar results. 
Nevertheless, simple cross-correlation computation does not explain why the asynchrony between second-order changes is less effective than between first-order changes. Even if the algorithm of computing temporal asynchrony adopted in the signal comparison stage (i.e., the second stage in Figure 1) is shared across all the attributes and waveforms, they can affect the resolution of the temporal asynchrony detection by changing the temporal characteristics of the signals encoded in the first stage and to be compared in the second stage. In addition, some mechanisms included in the comparison stage, such as temporal integration of the sensory signal, might make the asynchrony between triangular waves less salient than between square waves. In the current study, we cannot specify the precise source of the waveform effect, and we are currently developing detailed computational models of the segmentation process. 
Attribute-invariant process
Despite the general tendency of attribute invariance, some of our findings were incompatible with strict attribute invariance. First, there are minor differences among attributes even under the matched waveform conditions. For instance, we found lower performance at high temporal frequencies for color than for luminance. This difference may reflect how the early visual mechanisms respond to different attributes. Psychophysical research shows that the visual impulse response is more sluggish for color than for luminance (Burr & Morrone, 1993), which is at least qualitatively supported by physiological findings (Mullen et al., 2010). Moreover, various stimulus factors, including contrast and spatial frequencies, influence the temporal response of the visual system (Watson, 1986). The variation of the tuning characteristics may at least partially be ascribed to the difference in the temporal characteristics (e.g., impulse response) of local attribute alternation detectors, the first stage in Figure 1. Further investigation is needed to determine the extent to which changes in the impulse response function contribute to the observed variation in temporal tuning functions associated with asynchrony-based segregation. 
The second finding incompatible with strict attribute invariance is that the performance was slightly worse for cross-attribute conditions than for within-attribute conditions, regardless of the attribute pairs. Explanation of this difference obviously requires a process beyond local attribute signal encoding. Some loss of performance for the cross-attribute may be ascribed to the mismatch in temporal response between different attributes to be compared in the second stage in Figure 1. We may also need to consider additional noise factors associated with comparing and integrating different attributes across space (Meese, 2010). 
Concerning the processing mechanism, our findings are consistent with the hypothesis that asynchrony detection for a variety of attribute combinations is processed by a single common comparison mechanism (Figure 1). Critically, we found that tuning functions for different attributes were similar in shape as long as we matched the waveform. We also found that the tuning functions of the cross-attribute conditions were similar in shape to those of the corresponding within-attribute conditions and that the cross-attribute performance was never superior to the corresponding within-attribute conditions. Although a few deviations from strict attribute invariance were observed in the current data, they do not contradict the common-mechanism hypothesis, since they can be attributed to the tuning characteristics of the early stages or to a mismatch of the compared signals. 
However, functional attribute invariance in psychophysics does not necessarily imply a common underlying neural process. Our results are also consistent with an alternative hypothesis that the visual system has multiple comparison mechanisms specialized for specific attributes or for within- and cross-attributes, and all of them share similar processing characteristics. Further exploration is needed using both modeling and experimentation to draw a clear conclusion about the underlying mechanism. 
Related studies
Although studies have proposed that the timing of motion direction alternations can serve as an effective cue for asynchrony-based segmentation (Kandil & Fahle, 2004; Lee & Blake, 1999; Usher & Donnelly, 1998), at least under the conditions we tested, a direction reversal of continuous motion was found to be less effective than a luminance or color change as a cue for within-attribute asynchrony-based segregation. On the other hand, we demonstrated that an abrupt spatial phase shift, which induces an apparent motion, is as effective as a luminance or color change. 
Our position differs from that of Farid and Adelson (2001), who claimed that it is not the motion direction reversal per se but rather the transient luminance contrast change produced by the direction reversal that acts as the temporal cue for segregation based on direction change asynchrony. Following their explanation, we would expect segmentation to be easy when the motion direction change (P2) was paired with the luminance change (L1), since it would effectively be asynchrony detection between luminance changes. However, we found that the performance of the L1–P2 condition was low, close to the chance level. We therefore maintain that changes in spatial phase or motion direction can themselves be an effective cue for visual segmentation. 
Lee and Blake (1999) proposed that differences in temporal structure between areas, rather than simple temporal asynchrony, are a strong segmentation cue. They showed that even when the alternation timing was occasionally aligned in a random sequence, participants still could perform the visual segmentation if the correlation of the two alternation timing series was low enough (see also Blake & Lee, 2005; Guttman, Gilroy & Blake, 2007). These studies used random temporal alternations, whereas we used regular alternations. Motion direction changes at random intervals seem to be an effective segmentation cue. Why? One possibility is that differences in random temporal structure and asynchrony between repetitive changes drive different segmentation mechanisms, as Lee and Blake (1999) proposed. An alternative possibility is that they drive the same asynchrony-sensitive mechanism, but long intervals between each motion direction alternation occasionally appearing in the sequence may improve segmentation performance for random structure stimuli. Stated another way, since segregation is generally easier at low-temporal-frequency alternations, and a random sequence contains low-temporal-frequency modulation components, segmentation might be easier for random sequences than for rapid repetitive alternations. These issues should be investigated further. 
Potential implementation of the attribute-invariant computation in neurophysiology
Our conceptual framework for the underlying processing assumes a set of independent channels for luminance, color, and motion direction, potentially featuring significantly different temporal tunings for detecting alternations at the first stage. However, the underlying neural structures for detecting these attributes cannot be well specified, and they might exhibit some interaction and overlap. For instance, neurophysiological studies have highlighted that the tuning characteristics for chromatic and achromatic stimuli may not be markedly different in the early stages of visual information processing, such as the LGN or V1 (D'Souza, Auer, Strasburger, Frahm, & Lee, 2011; Mullen et al., 2010; for a book chapter, Kremers, Baraas, & Marshall, 2016), compared to the differences indicated by psychophysical performance. There is also evidence that neurons in V4 may show direction selectivity, suggesting overlapping mechanisms for color and motion (Tolias, Keliris, Smirnakis, & Logothetis, 2005). Notably, the activity of a set of neurons does not necessarily explain perceptual experience. As Jiang, Zhou, and He (2007) demonstrated, a color patch flickering at a sufficiently high frequency can be perceived as static, yet the response in V1 to V4 persists, indicating a dissociation between subjective experience and neural responses in the early cortex. Hence, we cannot conclude which set of neurons corresponds to the assumed first stage. The actual neural architecture may be considerably more complex than we currently assume. 
Regarding the asynchrony comparator in the second stage, there is also no consensus about which cortical area is responsible for determining temporal relationships, and the relevant processing might be distributed over the cortex. It has been suggested that damage to the inferior parietal lobe (IPL) in the right hemisphere impairs out-of-phase detection (Battelli, Cavanagh, Martini, & Barton, 2003), while the insula is activated during audiovisual asynchrony detection (Bushara, Grafman, & Hallett, 2001). 
We do not necessarily assume a direct correspondence between the underlying neural architecture and the model shown in Figure 1. However, even though the channels and architectures may be highly complicated and interactive, we expect the algorithm of neural signal processing to be consistent, particularly in extracting an amodal signal for temporal relationship comparison and asynchrony detection. Recent magnetic resonance imaging (MRI) studies, such as those by Groen et al. (2022) and Zhou, Benson, Kay, and Winawer (2018), reveal analogous linear–nonlinear structures representing temporal information from V1 to V3. Additionally, correlation calculation is frequently considered a candidate algorithm for multisensory asynchrony detection, as observed in studies predicting electroencephalogram or MRI data (Parise & Ernst, 2016; Pesnot-Lerousseau, Parise, Ernst, & Van Wassenhove, 2022). Recent studies also point out that spatiotemporal correlation can serve as the computation by which the system conducts spatial pooling or integration (Mehrani & Tsotsos, 2023; Sun, Chen, Yang, & Nishida, 2023a; Sun, Chen, Yang, & Nishida, 2023b). 
Conclusions
Existing studies partially support an attribute-invariant process that relies primarily on temporal information for visual segmentation. In our experiments, we assessed segmentation performance across a more diverse set of attributes and demonstrated the prominence of an attribute-invariant computation when waveform matching is ensured. The observed impact of waveform challenges the previously posited hypothesis that the simple timing of changes, irrespective of temporal order and attribute, can be compared directly across space, and it prompts a reevaluation of that hypothesis. A deeper exploration of how simultaneity is determined across space is essential to uncover the mechanism underlying spatiotemporal visual segmentation. 
Acknowledgments
Supported by MEXT/JSPS KAKENHI (Japan) Grants 20H00603 and 20H05605 to SN, and by JST, the Establishment of University Fellowships Towards the Creation of Science Technology Innovation, Grant No. JPMJFS212, to YJC and ZTS. 
Resource availability: All resources, including the experiment code, demo videos, raw data, high-resolution figures, and the bootstrapping results, are available on the Open Science Framework: https://osf.io/ye4ag/?view_only=8b700fda62254778887b7eac90a864cb
Commercial relationships: none. 
Corresponding author: Yen-Ju Chen. 
Email: chen.yenju.44z@st.kyoto-u.ac.jp. 
Address: Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan. 
References
Alais, D., Blake, R., & Lee, S.-H. (1998). Visual features that vary together over time group together over space. Nature Neuroscience, 1(2), 160–164, https://doi.org/10.1038/414.
Anstis, S., & Cavanagh, P. (1983). A minimum motion technique for judging equiluminance. Toronto: York University, http://wexler.free.fr/library/files/anstis%20(1983)%20a%20minimum%20motion%20technique%20for%20judging%20equiluminance.pdf.
Battelli, L., Cavanagh, P., Martini, P., & Barton, J. J. S. (2003). Bilateral deficits of transient visual attention in right parietal patients. Brain, 126(10), 2164–2174, https://doi.org/10.1093/brain/awg221.
Blake, R., & Lee, S.-H. (2005). The role of temporal structure in human vision. Behavioral and Cognitive Neuroscience Reviews, 4(1), 21–42, https://doi.org/10.1177/1534582305276839.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436, https://doi.org/10.1163/156856897X00357.
Burr, D. C., & Morrone, M. C. (1993). Impulse-response functions for chromatic and achromatic stimuli. Journal of the Optical Society of America A, 10(8), 1706, https://doi.org/10.1364/JOSAA.10.001706.
Bushara, K. O., Grafman, J., & Hallett, M. (2001). Neural correlates of auditory–visual stimulus onset asynchrony detection. Journal of Neuroscience, 21(1), 300–304, https://doi.org/10.1523/JNEUROSCI.21-01-00300.2001.
Cavanagh, P., MacLeod, D. I. A., & Anstis, S. M. (1987). Equiluminance: Spatial and temporal factors and the contribution of blue-sensitive cones. Journal of the Optical Society of America A, 4(8), 1428, https://doi.org/10.1364/JOSAA.4.001428.
D'Souza, D. V., Auer, T., Strasburger, H., Frahm, J., & Lee, B. B. (2011). Temporal frequency and chromatic processing in humans: An fMRI study of the cortical visual areas. Journal of Vision, 11(8), 8, https://doi.org/10.1167/11.8.8.
Farid, H., & Adelson, E. H. (2001). Synchrony does not promote grouping in temporally structured displays. Nature Neuroscience, 4(9), 875–876, https://doi.org/10.1038/nn0901-875.
Fujisaki, W., Kitazawa, S., & Nishida, S. (2012). Multisensory timing. In Stein, B. E. (Ed.), The new handbook of multisensory processing (pp. 301–318). Cambridge, MA: MIT Press, https://doi.org/10.7551/mitpress/8466.003.0026.
Groen, I. I. A., Piantoni, G., Montenegro, S., Flinker, A., Devore, S., Devinsky, O., ... Winawer, J. (2022). Temporal dynamics of neural responses in human visual cortex. The Journal of Neuroscience, 42(40), 7562–7580, https://doi.org/10.1523/JNEUROSCI.1812-21.2022.
Guttman, S. E., Gilroy, L. A., & Blake, R. (2005). Mixed messengers, unified message: Spatial grouping from temporal structure. Vision Research, 45(8), 1021–1030, https://doi.org/10.1016/j.visres.2004.10.014.
Guttman, S. E., Gilroy, L. A., & Blake, R. (2007). Spatial grouping in human vision: Temporal structure trumps temporal synchrony. Vision Research, 47(2), 219–230, https://doi.org/10.1016/j.visres.2006.09.012.
Jiang, Y., Zhou, K., & He, S. (2007). Human visual cortex responds to invisible chromatic flicker. Nature Neuroscience, 10, 657–662, https://doi.org/10.1038/nn1879.
Kandil, F. I., & Fahle, M. (2001). Purely temporal figure–ground segregation. European Journal of Neuroscience, 13(10), 2004–2008.
Kandil, F. I., & Fahle, M. (2004). Figure–ground segregation can rely on differences in motion direction. Vision Research, 44(27), 3177–3182, https://doi.org/10.1016/j.visres.2004.07.027.
Kremers, J., Baraas, R. C., & Marshall, N. J. (Eds.). (2016). Human color vision. Cham, Switzerland: Springer International Publishing, https://doi.org/10.1007/978-3-319-44978-4.
Lee, S.-H., & Blake, R. (1999). Visual form created solely from temporal structure. Science, 284(5417), 1165–1168, https://doi.org/10.1126/science.284.5417.1165.
Maruya, K., Holcombe, A. O., & Nishida, S. (2013). Rapid encoding of relationships between spatially remote motion signals. Journal of Vision, 13(2), 4, https://doi.org/10.1167/13.2.4.
Meese, T. S. (2010). Spatially extensive summation of contrast energy is revealed by contrast detection of micro-pattern textures. Journal of Vision, 10(8), 14, https://doi.org/10.1167/10.8.14.
Mehrani, P., & Tsotsos, J. K. (2023). Self-attention in vision transformers performs perceptual grouping, not attention. http://arxiv.org/abs/2303.01542.
Moutoussis, K., & Zeki, S. (1997). A direct demonstration of perceptual asynchrony in vision. Proceedings of the Royal Society of London. Series B: Biological Sciences, 264(1380), 393–399, https://doi.org/10.1098/rspb.1997.0056.
Mullen, K. T., Thompson, B., & Hess, R. F. (2010). Responses of the human visual cortex and LGN to achromatic and chromatic temporal modulations: An fMRI study. Journal of Vision, 10(13), 13, https://doi.org/10.1167/10.13.13.
Nishida, S., & Johnston, A. (2002). Marker correspondence, not processing latency, determines temporal binding of visual attributes. Current Biology, 12(5), 359–368, https://doi.org/10.1016/S0960-9822(02)00698-X.
Parise, C. V., & Ernst, M. O. (2016). Correlation detection as a general mechanism for multisensory integration. Nature Communications, 7(1), 11543, https://doi.org/10.1038/ncomms11543.
Pesnot-Lerousseau, J., Parise, C., Ernst, M., & Van Wassenhove, V. (2022). Multisensory correlation computations in the human brain identified by a time-resolved encoding model. Nature Communications, 13, 2489, https://doi.org/10.1038/s41467-022-29687-6.
Sekuler, A. B., & Bennett, P. J. (2001). Generalized common fate: Grouping by common luminance changes. Psychological Science, 12(6), 437–444, https://doi.org/10.1111/1467-9280.00382.
Sun, Z., Chen, Y.-J., Yang, Y.-H., & Nishida, S. (2023a). Modeling of human motion perception mechanism: A simulation based on deep neural network and attention transformer. Journal of Vision, 23(9), 4894, https://doi.org/10.1167/jov.23.9.4894.
Sun, Z., Chen, Y.-J., Yang, Y.-H., & Nishida, S. (2023b, November 2). Modeling human visual motion processing with trainable motion energy sensing and a self-attention network. Paper presented at the 37th Conference on Neural Information Processing Systems, https://openreview.net/forum?id=tRKimbAk5D.
Tolias, A. S., Keliris, G. A., Smirnakis, S. M., & Logothetis, N. K. (2005). Neurons in macaque area V4 acquire directional tuning after adaptation to motion stimuli. Nature Neuroscience, 8(5), 591–593, https://doi.org/10.1038/nn1446.
Usher, M., & Donnelly, N. (1998). Visual synchrony affects binding and segmentation in perception. Nature, 394(6689), 179–182, https://doi.org/10.1038/28166.
Watson, A. B. (1986). Temporal sensitivity. Handbook of Perception and Human Performance, 1(6), 1–43.
Werkhoven, P., Snippe, H. P., & Alexander, T. (1992). Visual processing of optic acceleration. Vision Research, 32(12), 2313–2329, https://doi.org/10.1016/0042-6989(92)90095-Z.
Wertheimer, M. (1923). Laws of organization in perceptual forms. First published as Untersuchungen zur Lehre von der Gestalt II. Psychologische Forschung, 4, 301–350.
Zhou, J., Benson, N. C., Kay, K. N., & Winawer, J. (2018). Compressive temporal summation in human visual cortex. The Journal of Neuroscience, 38(3), 691–709, https://doi.org/10.1523/JNEUROSCI.1724-17.2017.
Appendix
Figure A1. Individual participants' curves for proportion correct as functions of temporal frequency for within- and cross-attribute conditions with square-wave modulation.
Figure A2. Individual participants' curves for proportion correct as functions of temporal frequency for within- and cross-attribute conditions with triangular-wave modulation.
Figure A3. Individual participants' curves for proportion correct as functions of temporal frequency for the same-attribute conditions with various waveform modulations.
Figure A4. Individual participants' curves for proportion correct as functions of temporal frequency for the cross-attribute conditions with various waveform modulations.
Figure A5. Individual participants' curves for the proportion of inverse responses as functions of temporal frequency for within- and cross-attribute conditions with square-wave modulation.
Figure A6. Individual participants' curves for the proportion of inverse responses as functions of temporal frequency for within- and cross-attribute conditions with triangular-wave modulation.
Figure A7. Individual participants' curves for the proportion of inverse responses as functions of temporal frequency for the same-attribute conditions with various waveform modulations.
Figure A8. Individual participants' curves for the proportion of inverse responses as functions of temporal frequency for the cross-attribute conditions with various waveforms.