Participants
Data were collected from 122 undergraduate students at Murdoch University, Perth, Australia. Of these, 15 were excluded for incomplete data (N = 12) or failing to engage with the task (N = 3), leaving a final sample of 107. The final sample included 84 females and 23 males aged 17 to 50 (M = 22.89, SD = 6.47). Participants were reimbursed for their time with either course credit (N = 101) or a $20 payment (N = 6). All participants reported normal or corrected-to-normal hearing and provided informed consent. The experiment was approved by the Murdoch University Human Research Ethics Committee (2023/050) and was performed in accordance with the NHMRC National Statement on Ethical Conduct in Human Research (2023).
Music and Noise Stimuli
The auditory stimuli comprised four samples of traffic noise and three musical pieces. The musical stimuli were a subset of those used in a prior study21, specifically Ravel’s orchestral rendition of Claude Debussy’s ‘Tarantelle Styrienne’, conducted by L. Slatkin, Orchestre National de Lyon, 2016; ‘My Favourite Things’ performed by The John Coltrane Quartet, 1961; and Bach’s ‘O Haupt voll Blut und Wunden’, conducted by J. E. Gardiner, Monteverdi Choir, English Baroque Soloists, 1989. Participants heard only the first 105 seconds of each musical piece. These clips had previously been loudness-normalised to the common value of −23 ± 5 × 10⁻⁷ LUFS, as per European Broadcasting Union Recommendation R 12869.
Traffic noise was selected here as a pervasive environmental noise pollutant likely to be heard in the background of a therapy session44,45. The specific noise clip used was the low-diversity city noise stimulus from Stobbe et al.70. In that study, these traffic noise soundscapes had no effects on lower order cognition (accuracy and speed on a digit-span recall task and a continuous recognition n-back task) but did have effects on self-reported depressiveness, suggesting an impact on higher order cognitive processes. The stimulus features horns, engines, and a constant subtle traffic flow, creating high noise signal variability and frequent deviant sounds (engine revving, vehicle horns, brakes, etc.), which have been found to contribute to noise disruptiveness50–52. The six-minute clip was split into four separate clips of 105 seconds. This meant that, although participants heard the same disruptive noise type throughout all noise conditions in the experiment, they did not hear the exact same exemplar more than once. Each 105-second clip was loudness normalised with the pyloudnorm Python library and had a brief (5-millisecond) fade-out applied to avoid clipping.
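The normalisation and fade-out steps above can be sketched in Python. The study used the pyloudnorm library for LUFS measurement; the sketch below is a minimal stand-in that shows only the underlying arithmetic (a dB-to-linear gain towards a −23 LUFS target, and a 5-millisecond linear fade-out). The function names and the linear fade shape are illustrative assumptions, not the study's actual code.

```python
import numpy as np

def apply_fadeout(signal, sample_rate, fade_ms=5.0):
    # Apply a brief linear fade-out to the end of a mono signal
    # (illustrative; the actual fade shape used in the study is not specified).
    n_fade = int(sample_rate * fade_ms / 1000.0)
    faded = signal.astype(float).copy()
    faded[-n_fade:] *= np.linspace(1.0, 0.0, n_fade)
    return faded

def loudness_gain(measured_lufs, target_lufs=-23.0):
    # Linear amplitude gain that shifts a measured loudness to the target.
    # Loudness differences in dB map to amplitude via 10^(dB / 20).
    return 10.0 ** ((target_lufs - measured_lufs) / 20.0)
```

In practice, `measured_lufs` would come from pyloudnorm's integrated loudness meter rather than being supplied by hand.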
In the combined noise-music conditions, the noise was set to be 15 dB softer than the music. The music was louder than the noise to mimic traffic noise heard in the background. Pilot work revealed that this specific SNR of 15 dB avoided potential auditory masking of the music stimuli while still being disruptive (other noise types disrupt lower-order cognitive tasks at SNRs between −5 and +15 dB)71,72. The combined noise-music clips were also loudness normalised to the same LUFS value.
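Mixing noise 15 dB below the music can be expressed as scaling the noise so its RMS level sits 15 dB under the music's RMS before summing. A minimal sketch (the function name and RMS-based level definition are assumptions; the study does not specify how levels were measured):

```python
import numpy as np

def mix_at_snr(music, noise, snr_db=15.0):
    # Scale the noise so its RMS sits snr_db below the music's RMS, then mix.
    # A 15 dB level difference corresponds to an amplitude ratio of 10^(-15/20).
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    scale = (rms(music) / rms(noise)) * 10.0 ** (-snr_db / 20.0)
    return music + scale * noise
```

The combined signal would then be re-normalised to −23 LUFS, as described above, so that overall loudness is matched across conditions.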
Procedure
Participants completed the experiment individually, either online on their own device or in-person at the Murdoch University Music Cognition Lab. As the experiment was a within-subject manipulation, this hybrid approach did not affect the design validity.
Participants completed the experiment via the web-based platform Pavlovia. Participants viewed the visual inducer previously used in Herff et al.21, a 15-second video of the opening sequence of the video game ‘Journey’ (with written permission of Jenova Chen, CEO of ThatGameCompany, https://thatgamecompany.com, see Fig. 4). The video features a figure ascending a small hill (Fig. 4a). At the top of the hill, a vague landmark (an illuminated mountain) appears in the far distance (Fig. 4b). This visual inducer offered a clear start and direction for the guided mental imagery task. After viewing the inducer, participants were instructed to close their eyes and imagine the figure continuing to walk towards the landmark. A gong sound was played, signalling the start of the mental imagery task. After 90 seconds, a second gong sound instructed participants to open their eyes and stop the mental imagery task. In each trial, from the beginning of the video until the end of the imagined period, participants heard either the disruptive noise condition, one of the music conditions, one of the mixed music and noise conditions, or silence. After each imagined period, participants were asked to indicate how vivid their imagery was (on a scale from 0 = Not very clear to 100 = Very clear) and to describe their mental imagery in as much detail as possible (in a free-text response format). To avoid introducing bias, no formal definition of ‘imagination’ was provided; participants were left to interpret the instructions freely.
Participants repeated the above process for each condition for a total of eight trials. Each participant heard, in a random order, one silent trial, one randomly assigned noise sample, three musical pieces, and the same three musical pieces with added noise. In the combined music and noise conditions, assignment of the four noise samples to the three musical clips was also randomised to ensure effects were not specific to a particular noise/music combination. Figure 5 shows spectrograms and energy spectra for each of the possible conditions. The final stimuli can be accessed on the Open Science Framework (https://osf.io/pmvb9/). After the last mental imagery trial, participants completed two self-report scales as detailed below. The experimental session from start to finish took around one hour to complete, depending on the detail of participants’ descriptions.
Measures
To determine participants’ musical background, we administered the Goldsmiths Musical Sophistication Index (Gold-MSI)73, a well-validated, widely used measure of musical training developed using a large sample of English-speaking adult participants from the general population. Only the musical training subscale (seven items, M score = 3, SD = 1.3) was used in the analysis. There were no hypotheses for musical training effects; instead, this subscale was included to control for potential effects of formal training, as identified in prior studies74.
The short form of the Depression Anxiety Stress Scale (DASS-21)75 was also administered to investigate the effects of mood symptoms on mental imagery across multiple experiments76. The DASS-21 has demonstrated good reliability and validity in assessing depression, anxiety, and stress in non-clinical populations, based on large samples representative of the general adult population77. The 21-item version of the measure also has several advantages over the full-length version, being shorter with a cleaner factor structure and smaller inter-factor correlations78. The DASS results were not included in this analysis as they are part of a larger study and will be reported elsewhere.
Statistical analysis
Imagined sentiment was assessed by applying the Natural Language Toolkit (NLTK)79 and the Valence Aware Dictionary and sEntiment Reasoner (VADER) model80 to the detailed descriptions of mental imagery provided in the free-format responses. VADER works by mapping lexical features to emotional valence and intensity on a continuum ranging from negative to positive emotionality. Sentiment scores were given a numerical value, with higher values indicating more positive sentiment. Descriptions were also content-analysed for the inclusion of traffic-related terminology by the first author, who was blind to the respective experimental conditions whilst performing the annotation.
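The lexicon-mapping principle behind VADER can be illustrated with a deliberately tiny sketch. This is not VADER itself (the study used the full VADER lexicon and its heuristics via NLTK); the toy lexicon, its valence values, and the function name below are invented for illustration only.

```python
# Illustrative toy lexicon mapping words to valence values
# (the real analysis used the full VADER lexicon, not these entries).
TOY_LEXICON = {"beautiful": 2.9, "calm": 1.3, "dark": -1.0, "frightening": -2.2}

def toy_sentiment(text):
    # Mean valence of lexicon words found in the text; 0.0 if none match.
    # Higher values indicate more positive sentiment, mirroring the convention
    # described in the text.
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

VADER additionally handles negation, intensifiers, punctuation, and capitalisation, which a plain lexicon average like this one does not.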
The statistical approach closely follows that used in prior work deploying the same paradigm21,26,76. We used Bayesian mixed-effects models to predict our variables of interest (vividness; sentiment; time and distance travelled; whether the descriptions of imagined content contained mentions of traffic), whilst controlling for crossed random effects of participant and trial number81. We also included musical expertise as a predictor; however, we did not observe strong evidence that it influenced the models’ predictions. We implemented all models in R82 using the brms package83. Similar to prior work in auditory perception15,76,84–89, all continuous variables were standardised (M = 0, SD = 1), and all models were provided with a weakly informative prior (a t-distribution with a mean of 0, a standard deviation of 1, and 3 degrees of freedom)90. Because participants varied dramatically in their reported imagined time and distance travelled, we natural-log-scaled participants’ responses before performing participant-wise normalisation (M = 0, SD = 1).
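The log-scaling and participant-wise normalisation step can be sketched as follows. The models themselves were fitted in R with brms; this Python sketch (function name assumed, strictly positive responses assumed, since zero or negative values have no natural log) shows only the pre-processing applied to each participant's time and distance responses.

```python
import numpy as np

def participant_normalise(values):
    # Natural-log-scale one participant's responses, then z-score them
    # so that each participant's values have M = 0 and SD = 1.
    # Assumes all responses are strictly positive.
    logged = np.log(np.asarray(values, dtype=float))
    return (logged - logged.mean()) / logged.std()
```

Applying this per participant removes between-participant differences in scale (e.g. one person reporting metres and another kilometres) while preserving the within-participant ordering of responses.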
We ran all models with 1,000 warm-up and 3,000 total iterations on four chains. All models converged (all R-hats = 1.00). To evaluate the evidence, we performed hypothesis tests and report the model coefficients (β) relevant to the specific hypotheses, the estimated error of these coefficients (EEβ), as well as the evidence ratio in favour of a given hypothesis (Oddsβ). For convenience, we denote effects that can be considered ‘significant’ under an alpha level of 5% with * (i.e., evidence ratio ≥ 19)91.
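The evidence ratio reported above is the posterior odds in favour of a directional hypothesis, which is why the ≥ 19 threshold corresponds to a 5% alpha level: a posterior probability of 0.95 gives odds of 0.95/0.05 = 19. A minimal sketch of how such a ratio is computed from posterior draws (the function name is illustrative; brms computes this via its hypothesis-testing facilities):

```python
def evidence_ratio(samples):
    # Posterior odds that a coefficient is positive, estimated from
    # posterior draws: P(beta > 0) / P(beta <= 0).
    n_pos = sum(1 for s in samples if s > 0)
    n_nonpos = len(samples) - n_pos
    return n_pos / n_nonpos if n_nonpos else float("inf")
```

With 95% of the posterior mass above zero, this returns exactly 19, the threshold marked with * in the results.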
Data availability statement
The datasets, analytical scripts, and fitted models from the current study are available in the OSF repository at https://osf.io/pmvb9/.