We assembled a stimulus set containing 12 visual art images, 12 nature photographs, and 24 song excerpts, taken from a variety of Internet sources and selected to satisfy several criteria. First, we aimed to include stimuli that would be judged as very beautiful, so that as many participants as possible would experience intense beauty during the survey. Second, for each stimulus type (art images, nature photos, and music), we aimed to include a balanced proportion of stimuli that participants would consider very happy, very sad, or neither. Third, we wanted all stimuli to be unfamiliar to most participants, so that familiarity would not introduce a confound. Fourth, because the survey involved two blocks of 24 trials each, one before and one after a mood induction phase, we wanted the sets of stimuli shown in the two blocks to be as similar as possible in the styles and perceived emotions they represented. This strategy would help ensure that stimulus assignment caused no consequential differences in these main measures before versus after mood induction.
To achieve this goal, we first compiled a set of several hundred images and songs, which we piloted by asking a separate group of 15 participants to rate them on perceived beauty, liking, happiness, sadness, and familiarity. These participants were recruited in the same way as those who took the main survey (see the Participants section). We compiled this initial large set of stimuli in the following way. To capture visual art that would appeal to a broad range of people, we selected artworks in various popular traditional styles, including Dutch florals and Impressionist and Romantic landscapes. We took nature photos with a broad range of subjects and color palettes from the open-source image site Unsplash (https://unsplash.com/). To select music stimuli with the potential to elicit an intense beauty response in most or all participants, we first referenced the Billboard Artist 100 list (https://www.billboard.com/charts/artist-100/), and we then searched Dubolt (https://dubolt.com/) for lesser-known artists with sounds similar to those on the chart. Dubolt is a website, partnered with Spotify, that generates playlists based on artists or songs. Users can tailor playlists according to a variety of musical features, such as popularity, energy, tempo, and mood; these features enabled us to search for songs fitting our criteria. To avoid a familiarity bias, we selected from these playlists the songs we expected to be the least familiar of these popular options (songs with lower play counts on Spotify, such as those released by emerging artists recommended on popular artist pages). Neither the image set nor the song set was selected to be representative of all genres and geographies. We limited visual artwork to predominantly Western representational works, and we limited music to pop (defined broadly), in an attempt to eliminate geography and genre as confounds. We did not want within-participant variance in beauty ratings to come from genre; we wanted it to come from the emotion factors of interest to us.
Once songs were selected, we downloaded and trimmed them to 20 seconds each, choosing excerpts we felt were emotionally evocative (often a musical hook or a portion of the chorus) and that had a drop in loudness at the start and end of the clip, to avoid abrupt onsets and offsets that might unnecessarily lower beauty and liking ratings. We then trimmed each 20-second clip down to 2 seconds to create an additional set of 24 clips, again selecting from each 20-second clip the 2-second portion we found especially emotionally evocative.
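The trimming step above can be sketched in code. This is a minimal illustration, not the authors' actual tooling (the paper does not name one, nor does it say whether the loudness drops were applied as fades or occurred naturally in the chosen excerpts); here audio is represented as a list of float samples at a fixed sample rate, and short linear fades supply the drop in loudness at each end.

```python
# Hypothetical sketch of excerpt trimming: cut a clip of a given duration
# from a longer signal and apply short linear fades at both ends so the
# clip neither starts nor stops abruptly.

def excerpt(samples, rate, start_s, dur_s, fade_s=0.5):
    """Cut a dur_s-second clip starting at start_s, with linear fades."""
    start = int(start_s * rate)
    clip = samples[start:start + int(dur_s * rate)]  # copy of the slice
    n_fade = int(fade_s * rate)
    for i in range(min(n_fade, len(clip))):
        gain = i / n_fade        # ramps 0 -> 1 over the fade region
        clip[i] *= gain          # fade-in
        clip[-1 - i] *= gain     # mirrored fade-out
    return clip

# Example: a 20 s excerpt from a 3-minute "song", then a 2 s excerpt
# taken from within that 20 s clip (all values here are placeholders).
rate = 44_100
song = [1.0] * (180 * rate)      # constant-amplitude stand-in signal
clip20 = excerpt(song, rate, start_s=60, dur_s=20)
clip2 = excerpt(clip20, rate, start_s=5, dur_s=2, fade_s=0.1)
```

In practice any audio editor performs the equivalent operation; the sketch only makes the start/duration/fade parameters of each excerpt explicit.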
During presentation of the image stimuli, the screen background was white and the images were horizontally centered and sized to a height of 600 pixels (with width varying by aspect ratio) for consistency across the set.
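As a small worked example of this sizing rule, the displayed width follows directly from the fixed 600-pixel height and each image's native aspect ratio (the dimensions below are hypothetical, not those of the actual stimuli):

```python
# Scale an image to a fixed display height, preserving aspect ratio.

def display_size(native_w, native_h, target_h=600):
    """Return (display_width, display_height) for a target_h-pixel-tall image."""
    scale = target_h / native_h
    return round(native_w * scale), target_h

# e.g. a 1200x800 landscape photo displays at 900x600,
# and a 1000x1600 portrait artwork at 375x600.
```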
To satisfy our fourth criterion (that the stimuli shown before and after the mood induction would be as similar as possible in the styles and perceived emotions represented), all 48 stimuli were selected in pairs that were similar in style and subject and that expressed similar emotions according to the pilot data. Image pairs also shared the same orientation (vertical vs. horizontal). Each participant saw one stimulus from each pair in block 1 and the other in block 2, so that the range of style–emotion combinations was the same across both blocks for every participant. Stimuli within each pair were pseudorandomly assigned to blocks in the following way: each participant was randomly assigned one of six versions of the survey, which differed only in which member of each pair appeared in each block; no block ever contained both stimuli from a given pair.
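The assignment scheme above can be sketched as follows. This is an assumed implementation (the paper specifies only the constraints, not the procedure), with hypothetical stimulus IDs: 24 pairs, six fixed survey versions, and in every version exactly one member of each pair in block 1 and the other in block 2.

```python
# Hypothetical sketch of the paired block-assignment scheme: generate six
# survey versions such that, within each version, each of the 24 stimulus
# pairs contributes exactly one member to block 1 and one to block 2.
import random

pairs = [(f"stim{i}a", f"stim{i}b") for i in range(24)]  # placeholder IDs

def make_versions(pairs, n_versions=6, seed=0):
    rng = random.Random(seed)  # fixed seed: the six versions are the same for everyone
    versions = []
    for _ in range(n_versions):
        block1, block2 = [], []
        for a, b in pairs:
            # Flip a coin to decide which pair member appears in block 1.
            first, second = (a, b) if rng.random() < 0.5 else (b, a)
            block1.append(first)
            block2.append(second)
        versions.append((block1, block2))
    return versions

versions = make_versions(pairs)
# Each participant is then randomly assigned one of the six versions.
```

By construction, no block can contain both members of a pair, and every block holds 24 stimuli spanning the full range of style–emotion combinations.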
After collecting pilot data from 15 participants on this large set of stimulus candidates, we selected our final 12 visual art images, 12 nature photographs, and 24 song excerpts (each with 20-second and 2-second versions) to fit the criteria outlined above. All stimuli and summary statistics of their ratings can be found at https://osf.io/e8uaq/. Nature photographs are shown in Figure 1.