Our goal was to investigate whether the smoothness with which emotion is perceived in stimuli matches the smoothness of natural emotion statistics found in the world. To address this, we measured the similarity between the autocorrelation in the perception of emotion in frames presented in random order and the autocorrelation in the natural emotion statistics found in the frames of natural movies when reorganized into proper movie order. Observers viewed shuffled static frames from films and rated each frame using a 2D valence-arousal grid. The ratings of the frames could be organized in two principled ways. In one, the frames were organized as experienced by the participant in the experiment (e.g., abscissa of
Figure 2a). In a second, the frames were re-organized into the sequence of images as intended in the original film from which the frames were drawn (e.g., abscissa of
Figure 2b).
Figure 2a is an example of a single participant's ratings; it reflects the sequence in which the observer experienced and reported perceived emotion as they completed each trial. We call this the “percept-based” sequence because it reflects a sequence of perceptual reports.
Figure 2b, on the other hand, shows the same participant's ratings reorganized for a single film with the frames sequenced into the order the frames would have appeared in the film. We call this the “movie-based” sequence because it reflects the sequence of the natural emotion statistics depicted in the film. The star marker at the first data point on the abscissa of
Figures 2a and
2b is identical and indicates the same rating for the same image, such that “Trial 1” in the perceptual rating is the same as “Frame 1” in the film-based rating example. The difference in apparent variability between
Figures 2a and
2b arises because only a very small subset of the data is shown in the figures, and only the very first data point is actually shared between the graphs in this example.
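The two orderings can be made concrete with a small sketch. Assuming each observer's ratings are stored per trial alongside the original film frame index of each image (hypothetical names and layout; the authors' actual data structures are not specified), reindexing yields the percept-based and movie-based sequences:

```python
import numpy as np

def percept_and_movie_sequences(ratings, trial_frame_ids):
    """Return the same ratings in two orders.

    ratings        : shape (n_trials,), e.g. valence per trial, in the
                     order the observer experienced them (percept-based).
    trial_frame_ids: shape (n_trials,), the original film frame index of
                     the image shown on each trial.
    """
    ratings = np.asarray(ratings, dtype=float)
    movie_order = np.argsort(trial_frame_ids)   # restore film order
    return ratings, ratings[movie_order]

# Toy example: five trials showing frames in shuffled order.
vals = np.array([0.2, -0.5, 0.9, 0.1, -0.3])
frames = np.array([3, 0, 4, 1, 2])
percept_based, movie_based = percept_and_movie_sequences(vals, frames)
```

Both sequences contain identical values; only the ordering differs, which is why the first data point in Figures 2a and 2b can be the same rating of the same image.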
To confirm that participants successfully completed the task, we plotted the distribution of subject errors, in polar angle, across all subjects by subtracting each participant's rating in each trial from their corresponding leave-one-out consensus (
Figure 2c). The resulting error distribution appears relatively normally distributed. A two-sample Kolmogorov-Smirnov test was run on the two distributions and revealed that they were significantly different (
p < 0.01).
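A minimal sketch of the leave-one-out error computation, assuming each rating is reduced to its polar angle on the valence-arousal grid (function and variable names here are illustrative, not the authors' code):

```python
import numpy as np

def leave_one_out_errors(angles):
    """Polar-angle error of each subject's rating against the circular
    mean of all *other* subjects' ratings of the same frame.

    angles : shape (n_subjects,), each subject's rating of one frame as a
             polar angle (radians) on the valence-arousal grid.
    Returns signed errors in degrees, wrapped to [-180, 180).
    """
    angles = np.asarray(angles, dtype=float)
    errors = np.empty(len(angles))
    for i in range(len(angles)):
        others = np.delete(angles, i)
        # Circular mean of the remaining subjects = leave-one-out consensus.
        consensus = np.arctan2(np.sin(others).mean(), np.cos(others).mean())
        diff = np.degrees(angles[i] - consensus)
        errors[i] = (diff + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    return errors

# Toy example: three subjects rating one frame at 10, 20, and 30 degrees.
errs = leave_one_out_errors(np.radians([10.0, 20.0, 30.0]))
```

Pooling these signed errors across all trials and subjects gives the distribution shown in Figure 2c.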
The percept-based sequence (
Figure 2a) and the movie-based sequence (
Figure 2b) are two different time series of the same data, and an autocorrelation can be calculated separately for each of them. The movie-based sequence results in a movie-based autocorrelation function (
Figure 3a), which reflects the statistics of the information present in the film and is a proxy metric for the natural emotion statistics of the world. The percept-based sequence from
Figure 2a results in a percept-based autocorrelation (
Figure 3b), which reflects the autocorrelations in perceived emotion of frames presented in random order. A permuted null distribution of autocorrelation functions was also computed for each condition by shuffling the trials (gray lines and ribbons). We then calculated the time constant with which the empirical autocorrelation decayed to the permuted null. The movie-based autocorrelation remained significant for lags up to 11 seconds (
M = 10.845, 95% confidence interval [CI] = 9, 13) (
Figure 3a). For the percept-based autocorrelation in participant ratings, the autocorrelation remained significant for up to 13 seconds (
M = 12.9, 95% CI = 12.3, 13.6) (
Figure 3b). These results suggest that the percept-based and movie-based autocorrelations decayed toward the null distributions with fairly similar time-courses.
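The decay-to-null analysis can be sketched as follows, assuming one rating per second so that lags in samples map directly to seconds (a simplifying assumption; the permutation count and envelope percentile here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def autocorr(x, max_lag):
    """Normalized autocorrelation of x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def decay_lag(x, max_lag=20, n_perm=1000, q=97.5):
    """First lag at which the empirical autocorrelation falls inside the
    permuted-null envelope (trial order shuffled), i.e. where it is no
    longer distinguishable from chance."""
    emp = autocorr(x, max_lag)
    null = np.array([autocorr(rng.permutation(x), max_lag)
                     for _ in range(n_perm)])
    upper = np.percentile(null, q, axis=0)
    below = np.nonzero(emp <= upper)[0]
    return (below[0] + 1) if below.size else max_lag   # lag in samples

# A smooth series (random walk) stays above the null for many lags.
smooth = np.cumsum(rng.standard_normal(500))
```

The same routine applied to the percept-based and movie-based sequences yields the two decay times compared in the text.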
We further investigated the similarity between the movie-based and percept-based autocorrelations by fitting decaying exponential curves to the autocorrelations. To compare the movie-based and percept-based autocorrelations, we used the autocorrelation coefficient at timepoints 4, 8, 12, and 16 seconds. We bootstrapped the autocorrelation for each rating and calculated the SSE between the exponential curve for the movie-based and the percept-based autocorrelations (
Figure 3c) with 5000 iterations. The SSE quantifies the similarity between the two curves; that is, the consistency between movie and percept-based autocorrelation functions. We also computed the SSE between the bootstrapped movie-based and permuted null curves, as a baseline.
Figure 3d shows the distribution of SSE scores between the movie-based and percept-based fitted exponential curves (purple) compared to the distribution of SSE scores between the movie-based and permuted null fitted exponential curves (gray histogram;
Figure 3d). The distribution of SSE was significantly different (
p = 0.017, permutation test) indicating that the movie-based and percept-based autocorrelations were more similar to each other than a permuted null distribution would predict.
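A sketch of the curve-fit-and-SSE comparison, probing the fitted curves at the same four lags; the log-linear fit below is a stand-in for whatever nonlinear fitting routine was actually used (an assumption), and it presumes positive coefficients at the probed lags:

```python
import numpy as np

LAGS = np.array([4.0, 8.0, 12.0, 16.0])   # seconds, as in the analysis

def fit_exponential(acf_vals):
    """Fit a decaying exponential a*exp(-t/tau) to the autocorrelation at
    the four probed lags via a log-linear least-squares fit."""
    y = np.clip(np.asarray(acf_vals, dtype=float), 1e-6, None)
    slope, intercept = np.polyfit(LAGS, np.log(y), 1)
    return np.exp(intercept), -1.0 / slope   # (a, tau)

def curve_sse(params_a, params_b, t=LAGS):
    """SSE between two fitted exponential curves evaluated at the lags."""
    (a1, t1), (a2, t2) = params_a, params_b
    return float(np.sum((a1 * np.exp(-t / t1) - a2 * np.exp(-t / t2)) ** 2))

# Perfectly exponential data are recovered exactly, and identical
# curves give zero SSE.
true_acf = 0.8 * np.exp(-LAGS / 12.0)
params = fit_exponential(true_acf)
```

In the actual analysis this SSE is computed between bootstrapped movie-based and percept-based fits (and, as a baseline, between movie-based and permuted-null fits) on each of the 5000 iterations.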
The similarity in the autocorrelation functions seems clear, but how do we know that this is not coincidental? If the autocorrelation in our perception of emotion attempts to match the autocorrelation observed in the natural emotion statistics found in the world, then we should find that observers' own perceptual autocorrelations also match their own movie-based autocorrelation. The perceptual autocorrelation for each observer was based on the specific frames they rated. To calculate the movie-based autocorrelation for each observer, we imputed any missing frames in the observers' data using the consensus ratings, and then reorganized the frames into the proper movie sequence. We calculated the consensus rating for any given frame by averaging the valence or arousal rating across all observers who rated the frame. We then compared each individual observer's movie-based autocorrelation to their own perceptual autocorrelation to investigate whether these two autocorrelations were similar at the individual observer level. For each observer, we fitted decaying exponential curves to their movie-based and percept-based autocorrelations (
Figure 4a). We then bootstrapped (5000 iterations) their movie-based and percept-based autocorrelations and calculated the SSE between the exponential curves fitted on the movie-based and perceptual autocorrelation for the five points that corresponded to the time lag of 0, 4, 8, 12, and 16 seconds (
Figure 4b). Finally, we calculated the average SSE for each observer's movie-percept SSE and their movie-permuted SSE and computed the difference between these two values (
Figure 4c). This quantifies whether the movie-percept autocorrelation functions were more similar than expected from a permuted null distribution. The majority of individual observers had movie-percept autocorrelation functions that were more similar than expected by chance (χ
2[1,
N = 155] = 11.93,
p < 0.001;
Figure 4c). These results suggest that the movie-based and percept-based autocorrelations were more similar to each other, across observers, than a permuted null distribution would predict, and they provide evidence that our group-averaged finding (
Figure 3d) is not coincidental and can be generalized to individuals (
Figure 4c).
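The per-observer imputation step might look like the following sketch, assuming ratings are stored per film frame with NaN marking frames the observer never saw (illustrative names and layout):

```python
import numpy as np

def observer_movie_series(obs_ratings, consensus_ratings):
    """Build one observer's movie-based series for a film: frames the
    observer rated keep their own rating; frames they never saw are
    imputed with the across-observer consensus rating. Both inputs are
    assumed to already be in film (frame) order.

    obs_ratings       : shape (n_frames,), NaN where this observer has
                        no rating for the frame.
    consensus_ratings : shape (n_frames,), mean valence (or arousal)
                        across all observers who rated each frame.
    """
    obs = np.asarray(obs_ratings, dtype=float).copy()
    missing = np.isnan(obs)
    obs[missing] = np.asarray(consensus_ratings, dtype=float)[missing]
    return obs

# Toy example: the observer rated frames 0 and 2; frames 1 and 3 are
# filled in from the consensus.
series = observer_movie_series([0.5, np.nan, -0.2, np.nan],
                               [0.4, 0.1, -0.3, 0.6])
```

The completed series is then fed into the same autocorrelation, exponential-fit, and SSE machinery used at the group level, once per observer.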
The autocorrelation results for the natural movies (
Figure 3a) reflect information about natural emotion statistics in the movies themselves. The rated images were presented in a random, unrelated order. Any cognitive or response biases that observers may have brought to the task (including any kind of central tendency effects, response biases, anchoring, and any other kinds of perceptual, decisional, attentional, or cognitive biases) would be reflected in the null permuted distributions (
Figure 3a, gray-shaded region).
There are, however, valid concerns about the autocorrelation in the judgments themselves. For example, how do we know that the percept-based autocorrelation (
Figure 3b) is actually due to cognitive or perceptual processes and not just response perseveration, hysteresis, lapsing, sluggish motor control, or some other confound? In fact, a range of artifacts can produce apparent autocorrelations in response data, and these are typically treated as nuisances and controlled in various ways (
Hollingworth, 1910;
Valdez, Ziefle, & Sedlmair, 2018). Our hypothesis, however, is that the autocorrelations in the judgments reflect, at least in part, the visual process of serial dependence.
How can we be sure that there is serial dependence in the perceptual judgments of observers? A key characteristic of serial dependence is that it is feature-tuned: only sequential things that are similar generate serial dependence (
Cicchini, Mikellidou, & Burr, 2017;
Cicchini et al., 2018;
Fischer & Whitney, 2014;
Liberman et al., 2014;
Magnussen & Greenlee, 1999;
Manassi et al., 2019). In contrast, a response hysteresis, sluggish motor response, perseveration on a single response, or other confound would not display any tuning to the sequential similarity between the random and unpredictable stimuli. To evaluate whether there is serial dependence, we measured the error of observers' responses in polar angle and calculated serial dependence using the standard approach of fitting a derivative of von Mises to the errors as a function of the difference in sequential stimuli (
Fischer & Whitney, 2014). Response error γ was computed as the distance in degrees between an observer's response and the leave-one-out consensus response (mean angle across all other subjects). The response error was compared to the difference between the consensus for the previous stimulus and the consensus for the current trial, in degrees. We pooled all subjects' data into one super-subject and then fitted a derivative of von Mises curve to the data. We quantified serial dependence as the half amplitude of the von Mises curve (
Figure 5a). We bootstrapped the data 5000 times and defined the mean bootstrapped half-amplitude as the response pull towards the previous stimuli (
Figures 5a,
5b). We computed a permutation test with the bootstrapped distributions for each n-back trial. A strong positive half-amplitude was observed in the 1-back trial, indicating that participants' current report was influenced by their judgment in the previous trial (
p < 0.001, permuted null) and the 2-back trial (
p < 0.001, permuted null). The half-amplitude for the 3-back trial was not significant (
p = 0.0528, permuted null). The average response time across participants was 3.96 ± 2.9 seconds, suggesting that affect ratings were attracted toward similar stimuli seen up to 12 seconds ago. Additionally, the emotion tuning (
Figure 5a) and temporal tuning (
Figure 5b) were similar to the time constant of 12 seconds found for the movie-based autocorrelations (
Figure 3a).
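The tuning analysis can be sketched with a simplified derivative-of-von-Mises shape fitted by grid search; the exact parameterization and fitting routine used in the actual analysis may differ, and the names and parameter ranges below are assumptions:

```python
import numpy as np

def dvm(x_deg, a, k):
    """Simplified derivative-of-von-Mises shape: an odd, feature-tuned
    curve whose peak moves toward zero as concentration k grows."""
    x = np.radians(x_deg)
    return a * np.sin(x) * np.exp(k * (np.cos(x) - 1.0))

def fit_half_amplitude(x_deg, err_deg):
    """Coarse grid-search least-squares fit of the DvM; returns the
    half-amplitude (the fitted curve's peak), the conventional
    serial-dependence measure."""
    best = (np.inf, 0.0, 1.0)
    for a in np.linspace(-30.0, 30.0, 121):
        for k in np.linspace(0.1, 5.0, 50):
            sse = np.sum((err_deg - dvm(x_deg, a, k)) ** 2)
            if sse < best[0]:
                best = (sse, a, k)
    _, a, k = best
    xx = np.linspace(-180.0, 180.0, 721)
    return float(np.max(dvm(xx, a, k)))

# Synthetic observer whose errors are pulled toward the previous
# stimulus, plus response noise.
rng = np.random.default_rng(1)
x = rng.uniform(-180.0, 180.0, 400)
y = dvm(x, 8.0, 1.5) + rng.normal(0.0, 1.0, 400)
half_amp = fit_half_amplitude(x, y)
```

Because the curve is tuned (it falls back to zero for large stimulus differences), a pure motor or perseveration confound, which is untuned, would not produce this shape.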
We further investigated the strength of the 1-back serial dependence by conducting a linear regression analysis on the response error as a function of the difference between the consensus response in the current trial and the previous trial. We fit a linear regression model on the data from peak-to-trough (from −90 to +90 on the
x-axis;
Figure 5c) and performed a permutation test with the bootstrapped distributions for each n-back trial. The slope for the 1-back trial was 0.039 ± 0.006, and 0.018 ± 0.006 for the 2-back trial, which indicates that subjects had a ∼4% pull toward the previous stimuli and ∼2% pull toward stimuli 2 trials back (1-back,
p < 0.001, 2-back,
p = 0.0216, permuted null) (
Figure 5d). The slope for the 3-back trial was not significant (
p = 0.216, permuted null).
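The peak-to-trough regression can be sketched as below; the ±90° window follows the text, while the synthetic data and names are illustrative:

```python
import numpy as np

def peak_to_trough_slope(delta_deg, err_deg, window=90.0):
    """Slope of response error vs. the previous-minus-current stimulus
    difference, fitted only over the central peak-to-trough range
    (|delta| <= window). A slope of ~0.04 reads as a ~4% pull toward
    the previous stimulus."""
    delta = np.asarray(delta_deg, dtype=float)
    err = np.asarray(err_deg, dtype=float)
    keep = np.abs(delta) <= window
    slope, _intercept = np.polyfit(delta[keep], err[keep], 1)
    return float(slope)

# Synthetic 4% attraction inside the window, plus response noise.
rng = np.random.default_rng(2)
d = rng.uniform(-180.0, 180.0, 1000)
e = 0.04 * np.clip(d, -90.0, 90.0) + rng.normal(0.0, 0.5, 1000)
slope = peak_to_trough_slope(d, e)
```

Restricting the fit to the central ±90° avoids diluting the slope with the far tails, where an attractive tuned effect has already returned to zero.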