Useful data visualizations leverage the visual system's natural ability to process and summarize both simple and complex information. Here, we tested whether design recommendations made for pairwise comparisons generalize to the detection of trends. We created two types of graphs, line graphs and stripplots, from identical datasets that simulated temperature change over time. The datasets varied in the type of trend (linear or exponential). Human observers performed a trend detection task in which they judged whether the trend in temperature over time was increasing or decreasing. Participants were more sensitive to trend direction with line graphs than with stripplots. Participants also showed a systematic bias to respond that the trend was increasing for line graphs, although this bias decreased as sensitivity increased. Despite the sensitivity advantage for line graphs, more than half of the participants found the stripplots more appealing and liked them more than the line graphs. In conclusion, our results indicate that, for trend detection, depicting data with position (line graphs) leads to better performance than depicting data with color (stripplots). Yet the stripplots were preferred over the line graphs, suggesting a tradeoff between the aesthetic design of a graph and the precision with which it communicates information.

*#ShowYourStripes* (see Figure 1). The stripes correspond to warming stripes, or stripplots, that depict differences in temperature across nearly two centuries. Red corresponds to a higher relative temperature within the time frame, and blue corresponds to a lower relative temperature. The plots depict trends at the global, continent, country, or state level. The plots have been downloaded nearly one million times, have graced the cover of the *Economist*, and have been applied to personal items from neckties and shirts to car paint (source: Wikipedia). The plots are aesthetically pleasing, but are they good at depicting trends in climate data?

*ensemble perception*, has been most frequently researched with respect to the mean of a group. For example, observers might see an array of circles and estimate their mean color, size, or position (Whitney & Yamanashi Leib, 2018). Research on ensemble perception has focused on whether observers can extract the mean across a variety of stimuli, from low-level visual features such as color and orientation to high-level features such as facial expression.

*n* = 22) and the Psychology Research Participant Pool at Colorado State University (*n* = 35).

*rnorm* function in R with a mean of 0 and a standard deviation (*SD*) of 8, 12, or 16. There were also three levels of cyclical noise. One level had no cyclical noise, meaning all the noise was random. The other two levels were a short and a long sine wave. This noise was calculated as five times the sine of the product of time (the vector from 1 to 100) and 0.4 or 0.08 for the short and long waves, respectively. An example of the different components that were combined to create a final dataset is shown in Figure 2, and examples of different combinations are shown in Figure 3. There was an additional trend type for which both the slope and the exponent were 0; any trends apparent in these data were due to spurious, unintended patterns in the noise. In hindsight, these stimuli should not have been included in the experiment, and data from these trials were excluded from the analyses. The seven trend types × three levels of random noise × three levels of cyclical noise × three repetitions of each produced 189 unique datasets.
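The generative recipe described above (trend plus Gaussian noise plus optional sine-wave noise over 100 time points) can be sketched as follows. This is a minimal Python analogue of the R procedure, not the authors' code: the `make_dataset` helper is hypothetical, and the `slope * t**exponent` form of the trend (which yields a flat series when slope and exponent are 0) is an assumption for illustration.

```python
import numpy as np

def make_dataset(slope, exponent, noise_sd, cycle_freq, seed=None):
    """Simulate one temperature series: trend + random noise + cyclical noise.

    Assumed trend form: slope * t**exponent over time points t = 1..100.
    noise_sd is the SD of the Gaussian noise (8, 12, or 16 in the study);
    cycle_freq is 0 (no cyclical noise), 0.4 (short wave), or 0.08 (long wave).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, 101)                            # time vector from 1 to 100
    trend = slope * t ** exponent                    # linear or exponential trend
    random_noise = rng.normal(0, noise_sd, t.size)   # analogue of R's rnorm
    # Five times the sine of time multiplied by 0.4 (short) or 0.08 (long);
    # a frequency of 0 yields sin(0) = 0, i.e. no cyclical component.
    cyclical_noise = 5 * np.sin(t * cycle_freq)
    return trend + random_noise + cyclical_noise
```

With the noise terms set to zero, the function returns the bare trend, which is convenient for checking the trend component in isolation.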

*r* < .20) because of noise and random sampling and was removed from the analysis.

*z* = 3.30, *p* < .001, estimate = 0.73, *SE* = 0.22. Participants were more accurate with the line graphs than with the stripplots. However, this accuracy varied depending on other aspects of the graphs, as revealed by significant interactions. All two-way interactions were significant,

*p*s < .03. The three-way interaction was also significant, *z* = −4.03, *p* < .001. For stripplots, the difference in accuracy between increasing and decreasing trends varied as a function of the exponent, *z* = −5.00, *p* < .001, estimate = −0.37, *SE* = 0.07 (see Figure 4). In contrast, for the line graphs, the difference in accuracy between increasing and decreasing trends did not vary significantly as the exponent increased, *z* = 1.15, *p* = .25, estimate = 0.11, *SE* = 0.09.

*d′* and *c*, which measure sensitivity and bias, respectively.

Sensitivity was measured using *d′*, which was calculated as the z-score of the hit rate minus the z-score of the false alarm rate. Bias was measured using *c*, which was calculated as −1 times the sum of the z-scores of the hit and false alarm rates, divided by 2. Negative *c* scores indicate a bias to respond that the trend is increasing, and positive *c* scores indicate a bias to respond that the trend is decreasing. Both measures were calculated for each participant for each graph type and each trend type. For the signal detection analysis, trend direction was collapsed within each category so that both hits and false alarms could be calculated.
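The *d′* and *c* formulas above are standard and can be computed directly from the hit and false alarm rates. A minimal sketch in Python (the paper's analyses were run in R); the helper name `dprime_and_c` is hypothetical, and in practice rates of exactly 0 or 1 are usually adjusted (e.g., by 1/(2*N*)) before taking z-scores, a step omitted here.

```python
from scipy.stats import norm

def dprime_and_c(hit_rate, fa_rate):
    """Signal detection measures of sensitivity (d') and bias (c).

    d' = z(H) - z(FA);  c = -(z(H) + z(FA)) / 2.
    Here, negative c indicates a bias to respond "increasing" and
    positive c a bias to respond "decreasing".
    """
    z_hit = norm.ppf(hit_rate)   # inverse normal CDF of the hit rate
    z_fa = norm.ppf(fa_rate)     # inverse normal CDF of the false alarm rate
    d_prime = z_hit - z_fa
    c = -(z_hit + z_fa) / 2
    return d_prime, c
```

For example, equal hit and correct rejection rates (H = .8, FA = .2) give an unbiased observer (c = 0) with positive sensitivity.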

*d′* as the dependent variable. The fixed effects were trend type (linear, exponential-2, and exponential-4, coded as 0, 1, and 2, respectively), graph type (coded as −.5 and .5 for stripplots and line graphs, respectively), and their interaction, all within-subjects factors. Random effects included per-participant intercepts and slopes for graph type; random slopes for trend type were omitted because they made the model singular.
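In lme4-style notation, the model described here would be written `dprime ~ trend_type * graph_type + (1 + graph_type | participant)`. The sketch below fits the same specification with Python's statsmodels on simulated data; the effect sizes, variable names, and number of participants are illustrative assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate d' scores with a line-graph advantage that grows with the exponent
# (effect sizes are arbitrary, chosen only to illustrate the model structure).
rng = np.random.default_rng(0)
rows = []
for subj in range(30):
    subj_intercept = rng.normal(0, 0.3)
    subj_graph_slope = rng.normal(0, 0.2)    # per-participant graph-type slope
    for trend in (0, 1, 2):                  # linear, exponential-2, exponential-4
        for graph in (-0.5, 0.5):            # stripplot vs. line graph
            dprime = (1.0 + subj_intercept
                      + (0.75 + subj_graph_slope) * graph
                      + 0.17 * trend * graph
                      + rng.normal(0, 0.3))
            rows.append(dict(subject=subj, trend=trend, graph=graph, dprime=dprime))
df = pd.DataFrame(rows)

# Fixed effects for trend type, graph type, and their interaction;
# random intercepts and graph-type slopes for each participant.
model = smf.mixedlm("dprime ~ trend * graph", df, groups="subject", re_formula="~graph")
fit = model.fit()
print(fit.params)
```

The `re_formula="~graph"` argument mirrors the random intercepts-and-slopes structure; dropping it to intercepts only (`re_formula` omitted) is the fallback if the richer model is singular.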

*t* = 3.53, *p* < .001, estimate = 0.75, *SE* = 0.21. Participants were more sensitive to the direction of the trend (higher *d′* values) when viewing line graphs compared with stripplots (see Figure 5). This advantage for the line graphs over the stripplots increased as the exponent of the trend increased, *t* = 2.71, *p* = .007, estimate = 0.17, *SE* = 0.06. A multiverse analysis (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016) revealed a similar pattern regardless of outlier exclusion (see Supplementary materials at https://osf.io/x372y/).

*halves1–2*). We also calculated the difference in means between the first and last quarters (*quarters1–4*) and between the first and last tenths (*tenths1–10*). In addition, we calculated the difference in means between the third and fourth quarters (*quarters3–4*). If participants only attended to the first and last items, the tenths1–10 should best predict participants' responses. If participants only attended to the right half of the graphs, the quarters3–4 should best predict their responses.
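These predictors are simple differences between means over slices of the 100-point series. A minimal Python sketch, assuming each predictor is computed as the later segment's mean minus the earlier segment's mean (the sign convention is not stated in the text, so a positive value is taken to indicate an increase):

```python
import numpy as np

def segment_differences(series):
    """Difference-in-means predictors over a time series (100 points here).

    Each value is mean(later segment) - mean(earlier segment), so positive
    values indicate an apparent increase over the compared segments.
    """
    s = np.asarray(series, dtype=float)
    n = s.size
    return {
        "halves1-2":   s[n // 2:].mean() - s[:n // 2].mean(),           # last vs. first half
        "quarters1-4": s[3 * n // 4:].mean() - s[:n // 4].mean(),       # last vs. first quarter
        "tenths1-10":  s[9 * n // 10:].mean() - s[:n // 10].mean(),     # last vs. first tenth
        "quarters3-4": s[3 * n // 4:].mean() - s[n // 2:3 * n // 4].mean(),  # 4th vs. 3rd quarter
    }
```

On a strictly increasing series, the first-versus-last-tenth difference is the largest, since the compared segments are farthest apart.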

*d′* scores for each participant for each graph type for each level of random noise. We analyzed the *d′* scores with a linear mixed model. The fixed effects were level of random noise (coded as 0, 1, and 2 for low, medium, and high), graph type, and their interaction. The main effect of random noise was significant, *t* = −4.75, *p* < .001, estimate = −0.43, *SE* = 0.09. As random noise increased, *d′* decreased. The main effect of graph type was significant, *t* = −6.10, *p* < .001, estimate = −1.01, *SE* = 0.17. Participants were more sensitive to trend direction with line graphs than stripplots. The interaction between random noise and graph type was not significant, *t* = 0.69, *p* = .49, estimate = 0.09, *SE* = 0.13. Thus, the impact of random noise on sensitivity was similar for both the line graphs and the stripplots (see Figure 7). That sensitivity was similarly impacted by random noise in both conditions is consistent with the idea of a common mechanism underlying the processing of both graph types.

*d′* scores for each participant for each graph type and computed a correlation across the two graph types (see Figure 8). With all participants, the correlation was significant, *r* = .54, *df* = 55, *p* < .001. Excluding participants with a difference score greater than 3 (see triangles in Figure 8), the correlation was even greater, *r* = .83, *df* = 48, *p* < .001. These values are consistent with previous work on individual differences in the precision of ensemble perception (Haberman et al., 2015). That work showed moderate-to-high correlations between the precision of perceiving the means of ensembles whose elements varied across low-level features, including orientation and color, but no correlation between low-level and high-level features such as perceiving mean facial expression. This was taken as evidence for common mechanisms underlying ensemble processing of low-level features. The current data replicate and extend this finding to a different kind of ensemble process, namely extracting the trend in the data.

*c* scores, with a linear mixed model. The fixed effects were graph type, exponent, and their interaction. We included random effects for participants, including intercepts and slopes for graph type. The main effect of graph type was significant, *t* = −5.40, *p* < .001, estimate = −0.33, *SE* = 0.06. However, this effect was modulated by trend exponent, *t* = 4.77, *p* < .001, estimate = 0.14, *SE* = 0.03 (see Figure 9).

When the trend was linear (exponent = 1), participants viewing line graphs were biased to respond that the trend was increasing, as revealed by negative *c* values, *t* = −4.57, *p* < .001, estimate = −0.20, *SE* = .04. When viewing stripplots, they were biased to respond that the trend was decreasing, as revealed by positive *c* values, *t* = 2.71, *p* = .008, estimate = 0.13, *SE* = 0.05. This difference was significant, *p* < .001. For the exponential-2 graphs, there was a similar bias to respond increasing for the line graphs, *p* < .001, but no bias for the stripplots, *p* = .53. This difference in bias between the two conditions was significant, *p* = .009. For the exponential-4 graphs, the difference between the two graph types was not significant, *p* = .40. When viewing the line graphs, the bias to respond that the trend was increasing was approximately half that found with the other trend types, *t* = −2.26, *p* = .026, estimate = −0.10, *SE* = 0.05.

*d′* and *c* scores. As expected, *d′* scores were low (*M* = 0.86, *SD* = 0.65). We expected low *d′* scores because the correlations were low (*r*s < 0.20), and prior research has shown poor detection of low correlations (Rensink, 2017). For both the line graphs and the stripplots, there was a significant bias to respond that the trend was increasing, *p*s < .001 (line graph mean = −0.37, *SE* = 0.06; stripplot mean = −0.25, *SE* = 0.06). Thus, there seems to be a general bias to respond that the graphs show temperature increasing over time. For line graphs, the bias lessened as the exponent increased. For stripplots, the bias reversed for the linear graphs and then lessened as the exponent increased.

*t* = 1.59, *p* = .119, *df* = 54.86, 95% CI [−0.05, 0.39], although the results showed a slight pattern in that direction. Nevertheless, the fact that the preference was not overwhelmingly in favor of the line graphs shows a dissociation between visual sensitivity (for which the line graph was the clear winner) and preference (for which it was not).

*#ShowYourStripes*, graced the cover of the *Economist*, and was used to decorate various items from ties to cars, the line graphs are unlikely to be as sensationalized. The current data show the tradeoff: using the more aesthetically pleasing stripplots comes at the cost of an 18% reduction in sensitivity.

*Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI Conference,* 2014, 551–560, https://doi.org/10.1145/2556288.2557200.

*Psychological Science,* 12(2), 157–162, https://doi.org/10.1111/1467-9280.00327.

*Science,* 229(4716), 828–833, https://doi.org/10.1126/science.229.4716.828.

*Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,* 1095–1104, https://doi.org/10.1145/2207676.2208556.

*Current Directions in Psychological Science,* 16(5), 250–254.

*Journal of Neurophysiology,* 106(3), 1389–1398.

*Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,* 3237–3246.

*Journal of Experimental Psychology: General,* 144(2), 432–446.

*Psychonomic Bulletin & Review,* 18(5), 855.

*Perception & Psychophysics,* 46(4), 365–374.

*Detection Theory: A User's Guide* (Second Edition). New York: Psychology Press.

*Journal of Vision,* 15(4), 6, https://doi.org/10.1167/15.4.6.

*Emotion,* 9(6), 898.

*Color Research & Application,* 39(6), 630–635.

*Nature Neuroscience,* 4(7), 739.

*Journal of Vision,* 15(12), 890–890, https://doi.org/10.1167/15.12.890.

*Journal of Vision,* 12(9), 433.

*Psychonomic Bulletin & Review,* 24(3), 776–797.

*Perspectives on Psychological Science,* 11(5), 702–712, https://doi.org/10.1177/1745691616658637.

*Developmental Science,* 18(4), 556–568.

*Journal of Vision,* 16(5), 11, https://doi.org/10.1167/16.5.11.

*Psychological Review,* 61(6), 401–409.

*R: A language and environment for statistical computing*. Vienna: R Foundation for Statistical Computing.

*Journal of Experimental Psychology: General,* 149, 1311–1332.

*Annual Review of Psychology,* 69, 105–129.

*Elementary Signal Detection Theory*. Oxford: Oxford University Press.