Another point of discussion is that correlations between binocular rivalry variants were considerably smaller in Experiment 2 than in Experiment 1, even when considering the same pair of variants (i.e., the small dot stimulus condition and the small grating stimulus condition). While we are surprised by the size of this difference, there may be a methodological explanation. In Experiment 1, where all conditions involved two percepts that differed in color, we could give observers general instructions (“report which color you see”) once at the beginning of the experiment and then randomly interleave trials of all conditions, with each condition occurring multiple times across short and randomly placed trials. In Experiment 2 we instead opted to present trials for a given condition in a block-wise fashion, because instructions necessarily differed between conditions (some percepts were distinguished by motion direction, others by orientation, others by color). Previous research has indicated that an observer's percept durations can gradually change over the course of an experiment (Mamassian & Goucher, 2005; Van Ee, 2005; Suzuki & Grabowecky, 2007), and this suggests a way in which the difference in design might have caused the observed differences in correlation strength. In particular, a gradual drift in average percept duration over the course of an experiment session would affect all conditions similarly in the quasi-random design of Experiment 1, but it would affect different conditions differently in the blocked design of Experiment 2, potentially explaining why between-condition correlations were lower in the latter case.

To test the viability of such an explanation, we examined correlations in percept duration between pairs of trials that both used the same stimulus. In both of our experiments each stimulus was presented in six individual trials, and for this analysis we rank-ordered these trials chronologically from one to six for each combination of experiment, observer and stimulus. This allowed us to quantify the across-observer correlation in average percept duration between the two members of a pair of trials that used the same stimulus, and to examine whether this correlation depended on how far apart in time the two trials occurred during an experiment session. If gradual changes do cause reduced correlations when comparing data collected farther apart in time, then we expect correlations to be higher for pairs of trials that followed each other more closely in the chronological sequence. The analysis confirmed this expectation: For each of the eight stimuli used across the two experiments, between-trial correlations became smaller as the difference in rank number between the two trials being compared went up. Specifically, the sign of this relationship was numerically negative in all cases (−0.59 < r < −0.15), and significantly so in two cases (p < 0.05, for Experiment 1's large dot stimulus and Experiment 1's small grating stimulus). One might, furthermore, suspect this negative dependence to be stronger for Experiment 1, where a given difference in rank number corresponds to a relatively larger separation in time (because trials that used a different stimulus could intervene). Further analyses provide tentative support for this suspicion: When combining data from all four stimuli used in Experiment 1, the slope of the relationship between correlation coefficient and rank number difference was −0.028, and this correlation was highly significant (p = 0.006). When combining across the four stimuli in Experiment 2 in the same fashion, the slope was numerically less negative (−0.020) and the statistical significance of the correlation was marginal (p = 0.06). A formal comparison between the two experiments was not possible, however, because the two experiments used different stimuli.

In sum, these analyses confirm that the time interval that separates two periods in which percept durations are measured during an experiment session impacts the degree of correlation between the two periods' data, supporting the idea that the comparatively modest between-condition correlations observed in Experiment 2 may be explained by that experiment's use of a blocked design. Aside from supporting this explanation, this analysis qualifies the general understanding (Pettigrew & Miller, 1998; Shannon, Patrick, Jiang, Bernat, & He, 2011; Katyal, He, He, & Engel, 2019) that test–retest reliability of bistable perception dominance durations is high. The test–retest reliabilities are, apparently, affected by the separation in time (within an experiment session) between test and retest.
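
The logic of the rank-difference analysis described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the reported results: it assumes Pearson correlations across observers and an ordinary least-squares line relating correlation coefficient to rank number difference, neither of which is specified in the text. All function names are hypothetical, and the simulated data (a per-observer random-walk drift in percept duration) are synthetic and stand in for one stimulus with six chronologically ranked trials.

```python
import itertools

import numpy as np


def pairwise_trial_correlations(durations):
    """Across-observer correlation for every pair of same-stimulus trials.

    durations: array of shape (n_observers, n_trials); column k holds each
    observer's average percept duration in the trial with chronological rank k.
    Returns the rank difference and the Pearson r (assumed) for each pair.
    """
    durations = np.asarray(durations, dtype=float)
    n_trials = durations.shape[1]
    rank_diffs, corrs = [], []
    for i, j in itertools.combinations(range(n_trials), 2):
        r = np.corrcoef(durations[:, i], durations[:, j])[0, 1]
        rank_diffs.append(j - i)
        corrs.append(r)
    return np.array(rank_diffs), np.array(corrs)


def lag_dependence(durations):
    """Slope and correlation of between-trial r against rank difference."""
    rank_diffs, corrs = pairwise_trial_correlations(durations)
    # Least-squares line: corrs ~ slope * rank_diff + intercept (assumed model).
    slope, _intercept = np.polyfit(rank_diffs, corrs, 1)
    r_lag = np.corrcoef(rank_diffs, corrs)[0, 1]
    return slope, r_lag


def simulate_drifting_observers(n_observers=200, n_trials=6,
                                drift_sd=0.3, noise_sd=0.2, seed=0):
    """Synthetic data: stable observer baseline + random-walk drift + noise.

    The drift model and all parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    baseline = rng.normal(0.0, 1.0, size=(n_observers, 1))
    steps = rng.normal(0.0, drift_sd, size=(n_observers, n_trials))
    drift = np.cumsum(steps, axis=1)  # accumulates over successive trials
    noise = rng.normal(0.0, noise_sd, size=(n_observers, n_trials))
    return baseline + drift + noise


if __name__ == "__main__":
    data = simulate_drifting_observers()
    slope, r_lag = lag_dependence(data)
    print(f"slope = {slope:.3f}, r = {r_lag:.2f}")
```

With six trials per stimulus there are 15 trial pairs, with rank differences from one to five. Under the simulated drift, pairs that are farther apart share less of the accumulated drift, so the fitted slope is expected to be negative, which is the qualitative pattern described in the text.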