Participants
Based on a G*power analysis indicating that 112 participants are sufficient to detect a large effect size (f = 0.4) for physiological data, we recruited 113 undergraduate and graduate students aged 18-28 through convenience sampling. To ensure the validity of the stress-related experiment, we excluded two participants with severe depression and two with severe anxiety based on the simplified Chinese version of the Depression-Anxiety-Stress Scale (DASS-21). Due to equipment issues and invalid responses, five more participants were excluded, resulting in a final sample of 104 participants. All participants were right-handed and had no cognitive or hearing impairments, psychiatric illnesses, or recent use of psychotropic drugs. The final sample included 54 males (51.9%) and 50 females (48.1%), with a mean age of 24.04 years (SD = 2.259, range: 19-29 years).
Materials
Restorative Sounds
We created the restorative sounds in two steps. First, we sampled 171 undergraduate students to evaluate the valence and arousal of 167 sounds from the International Affective Digitized Sounds-2 (IADS-2). We selected natural sounds with high valence and low arousal, which could evoke calm emotions and stress recovery, to use as the sound materials. The chosen sounds included rain, brook, cicada, village pond, robin birdsong, and cow.
Next, we synthesized these selected sounds with different combination modes to create two types of restorative sounds. In the single sound looping mode, we looped a single stimulus for a set duration, creating a rain sound loop, a birdsong loop and a brook sound loop. In the mixed sound environment mode, we set 1-2 sounds as the foreground and 1-2 sounds as the background. We distinguished between foreground and background sounds by using different soundtracks, volumes (with the foreground sounds being louder), and reverb levels (with the background sounds having more reverb). We present various combinations of foreground and background sounds over time, incorporating all the natural sound stimuli to create a dynamically changing sound environment. See appendix 2 for all audio materials.
Measures and Instruments
Depression-Anxiety-Stress Scale (DASS-21): We used the Simplified Chinese version of the DASS-21 to assess participants' depression, anxiety, and stress levels over the past week. This scale consists of 21 items, each scored on a 4-point scale, with 0 being completely inconsistent and 3 being always consistent. The threshold score was 11 for major depression, 8 for severe anxiety, and 13 for severe stress. The overall Cronbach's α coefficient of the scale was 0.891, indicating good reliability and validity (Gong et al., 2010). This scale was used to screen out participants with abnormal psychological conditions.
Self-Report Stress: Participants rated their current stress level on a five-point scale, ranging from not stressed at all to very stressed.
Biopac and Acknowledge: We used a multichannel physiograph (Biopac) and Acknowledge 5.0 software to collect physiological indicators, including electrocardiograph (ECG), respiratory metrics, and skin conductance levels, sampled at 2000 Hz.
Experiment Coding and Presentation: The experiment was coded using Python, and presented by PsychoPy.
Procedure
Participants and Experimental Design
Participants were randomly assigned to one of three groups: Group 1 (single sound looping), Group 2 (mixed sound environment), and Group 3 (control, no sound). Each group had a similar sample size with a 1:1 male-to-female ratio.
The experiment was conducted in a soundproof laboratory, with participants wearing soundproof headphones. Each participant sat approximately 70 cm away from the computer, signed an informed consent form, and then began the experiment.
Experimental Stages:
The experiment consisted of four stages.
Baseline Stage: Participants sat still for two minutes while their physiological data were continuously recorded to establish baseline measurements. Then they reported their subjective stress level to establish their baseline stress level.
Stress Induction: Next, to induce stress, participants completed an adapted version of the Montreal Imaging Stress Test (ad-MIST) tailored for the Chinese population (see Appendix 1.1). The ad-MIST included a five-minute training stage followed by a five-minute test stage. Physiological data and subjective stress levels were collected during and after the test stage.
Rest Stage: After the stress induction, participants rested for 10 minutes. They were informed that the test would resume following the rest period. This procedure was designed to simulate a real work environment, where rest period typically follows work sessions. During this stage, Group 1 listened to a single sound loop. To control the impact of the sound materials, some participants listened to a rain sound loop, some listened to a brook sound loop, and others listened to a birdsong loop. Group 2 listened to a mixed sound environment, and Group 3 had no sound. All sounds were presented at approximately 60 decibels. Participants focused on the sound while staring at a "+" sign on the screen. Physiological data and subjective stress levels were recorded during and after this stage.
Debriefing and Interviews
Participants were initially told the experiment was a math ability test to mask the study's true purpose and prevent biased responses. Afterward, the true purpose was explained, and participants were assured their performance on the MIST task was not indicative of their math ability. This debriefing aimed to prevent psychological distress. Finally, interviews were conducted to gather participants' perceptions of stress and their experiences during the rest stage.
The full experimental process is outlined in Appendix 1.8.
Data Analysis
We used Python 3.11 for the preprocessing and feature extraction of physiological data, and SPSS 18.0 for preliminary analysis. For modeling and visualization of the physiological indicators, we utilized R 4.4.1.
Preprocessing and Feature Extraction
ECG data were filtered using a Butterworth bandpass filter to remove noise below 1 Hz and above 20 Hz and R-waves were detected with the NeuroKit2 package. Using these intervals and the algorithm proposed by Valenza et al. (2018), we computed the heart rate, sympathetic activity Index (SAI), and parasympathetic activity Index (PAI) for the baseline, MIST test and rest stages, respectively. The rest stage was divided into 20 segments of 30 seconds each to analyze stress recovery over time. The choice of 30-second intervals was based on the oscillation period of heart rate. The vascular tension-blood pressure baroreflex causes significant fluctuations in heart rate every 12-15 seconds, with an oscillation period of 24-30 seconds (Lehrer et al., 2020). By averaging heart rate, SAI and PAI for every 30s, we aimed to balance the fluctuations caused by this oscillation, providing a more accurate reflection of the stress recovery trend. Respiratory metrics and skin conductance level (SCL) were pre-processed similarly, though we did not calculate the average respiration rate for every 30 seconds due to missing data.
Finally, we eliminated outliers using a ±3 standard deviation criterion and addressed missing values by applying mean interpolation.
Formal Analysis
Baseline Level Differences: A one-way ANOVA was used to compare baseline physiological stress indicators and subjective stress levels among the three groups.
Effectiveness of Stress Induction: To assess the effectiveness of stress induction, we conducted repeated measures t-tests to compare stress indicators during the test stage against baseline measures. Changes in each indicator (e.g., MIST test heart rate - baseline heart rate) were used as dependent variables in an independent samples ANOVA to assess differences in stress induction levels among the three groups.
Effectiveness of Stress Recovery: To assess the effectiveness of stress recovery, repeated measures t-tests were conducted to evaluate differences in physiological indicators and subjective stress levels between the MIST test stage and the rest stage for each group. Then, changes between the rest stage and the MIST test stage were calculated (e.g., rest stage heart rate - MIST test heart rate) and used as dependent variables in an independent samples ANOVA to assess differences in stress recovery levels among the three groups.
Generalized Additive Models (GAM): To investigate the time course of stress recovery in three groups, we defined time points as the first independent variable, with the beginning of the rest stage marked as t0. The rest stage was divided into 20 segments, each lasting 30 seconds, and each segment represented a time point from t1 to t20. The second independent variable was group assignment: Group 1 (single sound looping), Group 2 (mixed sound environment), Group 3 (control, no sound stimuli).
For each physiological indicator (Heart Rate, SAI, PAI, and SCL), we calculated the change between each time point and t0 (e.g., t1 heart rate – t0 heart rate) to represent the amount of stress recovery at each 30-second interval. These changes served as the dependent variables.
Then, we constructed a Generalized Additive Model (GAM) using R to fit the data to a smooth curve, allowing us to capture the non-linear patterns in the time course of stress recovery and test the effects of each independent variable. The model function was defined as:
g(yij) = β0 groupj + s(timei) + s(timei * groupj) + εij
Here, yij represents the amount of stress recovery for each physiological indicator, with j indicating the group assignment of participants and i representing time points from t0 to t20. The term timei * groupj denotes the interaction between time and group. The smoothing function s () captures the non-linear effects of the variables.
Additionally, we tested the basic assumptions of the model. First, a curve fitting test was performed to determine if the model followed a curve. A significant effective degrees of freedom (edf) value greater than 1 indicates that the model is indeed a curve. Next, we conducted a node test to ensure the appropriateness of the number of nodes used in the smoothing function; a k-index greater than 1 signified a reasonable number of nodes. Normality of the residuals was tested using Q-Q plots, and autocorrelation was checked. If autocorrelation was detected, we adjusted the model by establishing a Generalized Additive Mixed Model (GAMM), incorporating individual subjects as a random effect to account for this issue.
Finally, we used R to visualize the model, and identify curve inflection points and extreme value points. providing a clearer understanding of the stress recovery time course for each group.