Participants
In the sampling phase, we recruited participants through web advertisements. Using a web-based survey, we asked them to complete two questionnaires assessing how frequently they experienced chills and tears while listening to music. The prevalence of intense emotional responses to music was assessed from the answers to the Barcelona Music Reward Questionnaire (BMRQ) (70) and the Aesthetic Experience Scale in Music (AES-M) (12). In experiment 1, forty-three participants who at least sometimes experienced chills and tears were recruited. Four participants were removed because they reported no chills or tears responses during the experiment, and one participant was removed because of abnormal autonomic nervous system activity. In experiment 2, twelve participants were recruited; one was removed for misunderstanding the online emotion ratings (many button presses under 1 s). The final sample was 38 participants in experiment 1 (14 women; age: M = 21.8, SD = 1.4; BMRQ: M = 77.6, SD = 10.0, Max = 96, Min = 54; AES-M: M = 50.9, SD = 10.9, Max = 74, Min = 20) and 11 in experiment 2 (2 women; age: M = 21.9, SD = 1.6; BMRQ: M = 83.3, SD = 10.4, Max = 99, Min = 69; AES-M: M = 67.8, SD = 17.1, Max = 96, Min = 41). From these questionnaire scores, we confirmed that the sample included no individuals with strong musical anhedonia (10). For experiment 1, we verified that the sample size was adequate using a learning curve analysis (see Supplemental Figure 2). Participants were instructed to abstain from alcoholic beverages the night before the experiment and to avoid coffee and cigarettes for three hours before the experiment. The study was approved by the human ethics research committee of the National Institute of Information and Communications Technology (NICT), and written consent was obtained from all participants. Participants were compensated for their participation in the study.
Experimental procedure
Before the experiment, participants were instructed to select 4 songs: 2 strongly pleasurable chill-evoking and 2 tear-evoking songs (Supplemental sheet). To keep the musical style uniform across participants, they were asked to select stimuli from pop/rock genres. In experiment 1, the experimenter additionally selected 4 popular songs from the previous year's hit chart in Japan (Supplemental Table 1). Like the self-selected songs, these most commonly featured vocals, guitar, bass, and drums. In experiment 2, to generalize the machine learning model across experimental paradigms, the experimenter-selected songs were drawn from the self-selected music of other participants (Supplemental sheet).
In the fMRI experiment, participants first completed a training session to familiarize themselves with the procedure. They then performed two fMRI sessions. Each session consisted of one run in which participants listened to 2 self-selected and 2 experimenter-selected songs. The presentation order was pseudo-counterbalanced, alternating between self-selected and experimenter-selected songs to maximize uncertainty about the upcoming song across trials (see Supplemental Figure 1). Each trial started with a 5 s instruction, followed by a musical excerpt with a maximum duration of 270 s in experiment 1 (M = 257.5 s, SD = 10.6 s) or 360 s in experiment 2 (M = 281.1 s, SD = 12.6 s). Participants indicated their emotional responses in real time while listening by pressing one of four buttons on an MRI-compatible response pad (1 = neutral, 2 = pleasure, 3(4) = chill, 4(3) = tear). Chills were defined as “goosebumps” or “shivers down the spine,” and tears as “weeping” or “a lump in the throat” (4, 5, 38, 43). The positions of the chill and tear buttons were counterbalanced across participants to avoid an ordering bias for these responses. Participants were instructed to hold down a button for as long as they experienced the corresponding emotion. Button signals were recorded at 100 Hz. Note that excerpt duration was not correlated with the duration of any of the four emotional responses (ps > .45). At the end of each excerpt, participants rated sixteen items (from 1 = not at all to 4 = very strong) describing what they felt in response to the excerpt, with 5 s allowed per item. After the experiment, participants rated their familiarity (from 1 = unfamiliar to 4 = very familiar) with the experimenter-selected music.
To measure RSFC immediately before music listening, participants relaxed after each song for 50 s in experiment 1 and 40 s in experiment 2. The roughly 80 s occupied by the sixteen rating items was intended to reduce carry-over effects of music listening on the RSFC. To avoid residual effects of the rating task, a further 10 s was excluded from the start of each task-rest period.
Quantifying online behavioral measurement
The onset time and duration were recorded for each of the four emotional responses. Button presses shorter than one second were excluded from analysis to avoid accidental presses. Consecutive presses of the same button were merged, concatenating their durations: because participants were asked to keep one button pressed throughout music listening, we assumed that repeated presses of the same button reflected a single sustained response. Finally, the durations of each of the four button responses were summed over each song.
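As an illustration, the press-cleaning logic can be sketched as follows (a minimal Python example with hypothetical variable names, not the study's code):

```python
import numpy as np

def quantify_presses(onsets, durations, buttons, min_dur=1.0):
    """Summed duration of each response (1=neutral, 2=pleasure, 3=chill,
    4=tear) from button logs sampled at 100 Hz. Illustrative sketch."""
    onsets, durations, buttons = map(np.asarray, (onsets, durations, buttons))
    # Discard presses shorter than 1 s (assumed accidental).
    keep = durations >= min_dur
    durations, buttons = durations[keep], buttons[keep]
    # Merge consecutive presses of the same button into one sustained response.
    merged = []  # list of [button, total_duration]
    for dur, btn in zip(durations, buttons):
        if merged and merged[-1][0] == btn:
            merged[-1][1] += dur          # concatenate durations
        else:
            merged.append([btn, dur])
    # Sum durations per response category over the whole song.
    totals = {b: 0.0 for b in (1, 2, 3, 4)}
    for btn, dur in merged:
        totals[btn] += dur
    return totals
```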
Brain image and physiological acquisition
In experiment 1, data were acquired using a 3T Siemens Trio scanner equipped with a 32-channel head coil. We recorded two experimental runs corresponding to the two blocks of trials in the main experimental task. Functional volumes were acquired using a T2*-weighted gradient-echo EPI sequence (62 interleaved slices, gap: 0.3 mm, voxel size: 3 × 3 × 3 mm, matrix size: 64 × 64, FOV: 192 mm, TE: 30 ms, TR: 2000 ms, flip angle: 75°, multiband factor: 2). Additionally, a high-resolution anatomical volume was acquired during the rest period in the middle of the experimental session using a T1-weighted sequence (208 slices, no gap, voxel size: 1 × 1 × 1 mm, matrix size: 256 × 256, FOV: 256 mm, TE: 3.37 ms, TR: 1900 ms, flip angle: 9°), which served as an anatomical reference for the functional scans.
In experiment 2, neuroimaging data were acquired on a 3T Siemens Prisma scanner with a 64-channel head coil. Both functional and anatomical volumes were acquired with the same parameters as in experiment 1. Before the experiment, a 10 min resting-state fMRI session (600 volumes) was acquired with the following T2*-weighted functional imaging parameters: 72 interleaved slices, gap: 0.16 mm, voxel size: 2 × 2 × 2 mm, matrix size: 100 × 100, FOV: 200 mm, TE: 30 ms, TR: 1000 ms, flip angle: 60°, multiband factor: 6. These acquisition parameters had been chosen for a different purpose.
Concurrently with functional imaging, physiological recordings were acquired using an MR-compatible BIOPAC MP150 polygraph (BIOPAC Systems Inc., USA) in experiments 1 and 2. Cardiovascular, respiratory, and electrodermal signals were sampled at 1000 Hz. Respiratory activity was assessed by a strain-gauge transducer incorporated in a belt tied around the chest. To obtain a cardiac signal from a photoplethysmogram, an optical pulse sensor was attached to the proximal phalanx of the little finger of the participant's left hand. Skin conductance was measured continuously with Ag/AgCl electrodes placed on the index and middle fingers of the left hand.
Brain image and physiological data pre-processing
Functional MRI data were pre-processed using Statistical Parametric Mapping software (SPM12; http://www.fil.ion.ucl.ac.uk/spm). Distortion correction was applied using field maps. Functional images were realigned to the mean image of the series, slice-time corrected, motion corrected, co-registered to the structural image, normalized to MNI space, and spatially smoothed with a 6 mm FWHM Gaussian kernel. Afterward, using the CONN toolbox (https://web.conn-toolbox.org/home), common nuisance variables were regressed out, including the subject-specific white matter and cerebrospinal fluid signals (five PCA parameters each, CompCor) and 12 movement regressors (six head motion parameters and their first-order temporal derivatives). Physiological noise regressors were also included as nuisance variables: the Tapas PhysIO Toolbox (71) was used to compute HR and RR regressors, including the cardiac response function, respiration response function, and RETROICOR, at the same temporal resolution as the fMRI time series. Because a previous study indicated a relationship between resting physiological arousal and musical chills (72), removing physiological noise is important for isolating the link between resting brain state and musical emotion. Linear trends were removed, and band-pass filtering (0.008–0.09 Hz) was applied to the time series of each voxel to reduce the effects of low-frequency drifts and high-frequency physiological noise.
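For illustration, the final temporal filtering step can be approximated in SciPy; this zero-phase Butterworth filter is a stand-in for the CONN implementation, and the filter order is an assumption:

```python
from scipy.signal import butter, filtfilt

def bandpass(ts, tr, low=0.008, high=0.09, order=2):
    """Zero-phase band-pass filter applied along time (axis 0) of a
    (n_timepoints, n_voxels) array; a sketch, not the CONN code."""
    nyq = 0.5 / tr                                 # Nyquist frequency (Hz)
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, ts, axis=0)              # forward-backward filtering
```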
RSFC feature extraction
ROIs (regions of interest) were defined using a functional brain atlas derived from resting-state fMRI data based on InfoMap and a winner-take-all parcellation algorithm (41, 42). The atlas includes 288 spherical ROIs (8 mm diameter) spanning the whole brain, including the cerebellum and brainstem, each assigned to one of 13 functional network communities (see Figure 2A). For each participant, a time course was computed for each ROI by averaging the BOLD signal of its constituent voxels at each time point. Functional connectivity between each pair of ROIs was then calculated as the partial (Pearson) correlation between their time courses, yielding one correlation matrix per music stimulus per participant. Partial rather than simple correlations were used to rule out indirect connectivity effects (73). Fisher's r-to-z transformation was then applied to improve the normality of the correlation coefficients, resulting in 288 × 288 symmetric connectivity matrices representing the set of connections in each RSFC profile immediately before music listening. The upper triangle of these matrices was used as the feature space for machine learning-based predictive modelling. Note that four RSFC trials, each from a different participant, were removed because the Tapas PhysIO Toolbox could not compute the physiological denoising variables for them.
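A minimal sketch of this feature construction follows; the Ledoit-Wolf shrinkage used to stabilize the precision matrix over short rest windows is our assumption, as the paper does not name the estimator:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def fisher_z_pcorr(ts):
    """Fisher-z partial correlation matrix from (n_timepoints, n_rois)
    BOLD time courses. The shrinkage estimator is an assumption."""
    prec = LedoitWolf().fit(ts).precision_        # regularized inverse covariance
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)                # partial r from precision matrix
    np.fill_diagonal(pcorr, 1.0)
    return np.arctanh(np.clip(pcorr, -0.999, 0.999))  # Fisher r-to-z

def rsfc_features(ts):
    """Vectorized upper triangle: the feature space for predictive modelling."""
    z = fisher_z_pcorr(ts)
    return z[np.triu_indices_from(z, k=1)]
```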
For predictive modelling, we used network-level seed-based functional connectivity, in which each network comprises several ROIs, to examine our auditory-reward network hypothesis and other possibilities involving the auditory network. Each of the 12 auditory-seeded network pairs served as a feature set (e.g., Auditory-Reward, Auditory-Default mode): we calculated the partial correlation matrix between the 12 auditory ROIs (6 per hemisphere) and the ROIs of the other network (e.g., Reward: 8 ROIs, Default mode: 65 ROIs). We also examined whole-brain functional connectivity, in line with distributed models of the emotional brain (74, 75). In addition, we exploratorily investigated the 66 other possible combinations of two networks (e.g., Visual-Reward, Salience-FrontoParietal).
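Continuing the sketch above, a network-pair feature set such as Auditory-Reward can be extracted from the corresponding block of the matrix; the ROI index positions below are placeholders, not the atlas's actual assignments:

```python
import numpy as np

# Hypothetical index arrays into the 288-ROI atlas (the true assignments
# come from the parcellation's community labels).
aud_idx = np.arange(12)           # 12 auditory ROIs (placeholder positions)
rew_idx = np.arange(12, 20)       # 8 reward ROIs (placeholder positions)

roi_ts = ts[:, np.concatenate([aud_idx, rew_idx])]  # ts as in the step above
z = fisher_z_pcorr(roi_ts)                          # 20 x 20 Fisher-z matrix
auditory_reward_features = z[:12, 12:].ravel()      # 12 x 8 = 96 connections
```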
Predictive modelling of emotional responses using RSFC and statistical analysis
We used LASSO regression to decode the duration of each emotional response (neutral, pleasure, chill, tear) from the RSFC immediately before music listening. The analysis was conducted using scikit-learn 1.0.2 in Python 3.8.13. First, the RSFC features were standardized. Next, a LASSO decoder was trained, with the L1 regularization parameter λ tuned to control shrinkage and overfitting. After averaging the eight RSFCs per participant, nested leave-one-participant-out cross-validation (LOPOCV) was applied, with the outer LOPOCV estimating the model's generalizability and the inner LOPOCV determining the optimal λ. In the outer LOPOCV, 37 participants were used as the training set and the remaining participant as the testing set. Using the optimal λ (see next paragraph), a model was trained on all subjects in the training set and then used to predict the outcome of the held-out subject. This procedure was repeated for all 38 participants.
Within each loop of the outer LOPOCV, inner LOPOCVs were applied to determine the optimal λ. The 37-participant training set of each outer loop was further partitioned in the same manner: for a given λ in {0.001, 0.01, 0.1, 1, 10, 100, 1000}, 36 participants were used to train the model and the remaining participant to test it. This was repeated 37 times so that each participant served once as the testing dataset, resulting in 37 inner LOPOCV loops. For each λ value, the accuracy between actual and predicted outcomes was calculated for each inner loop and averaged across the 37 loops. This mean accuracy was defined as the inner prediction accuracy, and the λ with the highest inner prediction accuracy was selected as the optimal λ for the outer LOPOCV.
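A condensed sketch of this nested scheme is given below; X and y stand for the participant-level feature matrix and emotion durations (assumed names), and scoring the inner loop by the correlation of pooled inner-loop predictions is one reading of "inner prediction accuracy":

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lambdas = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
loo = LeaveOneOut()
y_pred = np.zeros_like(y, dtype=float)

for train, test in loo.split(X):          # outer LOPOCV (one held-out subject)
    best_lam, best_acc = None, -np.inf
    for lam in lambdas:                   # inner LOPOCV over the 37 train subjects
        model = make_pipeline(StandardScaler(), Lasso(alpha=lam, max_iter=10000))
        inner_pred = cross_val_predict(model, X[train], y[train], cv=loo)
        acc = np.corrcoef(inner_pred, y[train])[0, 1]
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    model = make_pipeline(StandardScaler(), Lasso(alpha=best_lam, max_iter=10000))
    model.fit(X[train], y[train])
    y_pred[test] = model.predict(X[test])

decoding_accuracy = np.corrcoef(y_pred, y)[0, 1]  # outer prediction accuracy
```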
Parametric and non-parametric statistical tests were used to evaluate the decoding results for the duration of musical emotion. To evaluate whether prediction performance was significantly better than chance, we performed a correlation test. Furthermore, to reduce computational time, we conducted a permutation test only for significantly correlated variables. For this test, the outer LOPOCV was repeated 10,000 times, each time permuting the duration of musical emotion across the training data without replacement. We calculated the p-value as the probability of obtaining a mean correlation in the null distribution equal to or higher than the true correlation. FDR correction (Benjamini–Hochberg) was applied to correct for multiple comparisons.
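A sketch of the permutation scheme, where run_lopocv is a hypothetical wrapper that re-runs the outer cross-validation loop above and returns cross-validated predictions (the FDR step could then use, e.g., statsmodels' multipletests with method="fdr_bh"):

```python
import numpy as np

def permutation_p(X, y, run_lopocv, true_r, n_perm=10000, seed=0):
    """One-sided permutation p-value for LOPOCV decoding accuracy:
    training targets are shuffled without replacement on each iteration
    and predictions are scored against the true labels. Sketch only."""
    rng = np.random.default_rng(seed)
    null_r = np.empty(n_perm)
    for i in range(n_perm):
        preds = run_lopocv(X, rng.permutation(y))  # permute training labels
        null_r[i] = np.corrcoef(preds, y)[0, 1]    # score against true labels
    # proportion of null correlations >= the observed correlation
    return (np.sum(null_r >= true_r) + 1) / (n_perm + 1)
```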
Trial-by-trial predictive modelling using the three periods of the auditory-reward network
To complement the main analysis, we investigated whether trial-by-trial RSFC can predict the subjective duration of chills. We attempted to predict the duration of musical chills from the preceding auditory-reward RSFC, as in the LOPOCV analysis. We also attempted to predict the ongoing duration of chills from the auditory-reward FC during music listening, because task FC may predict behavior better than RSFC (76, 77). In addition, we tried to predict the duration of chills from the following auditory-reward RSFC: the chills experienced during the preceding music listening may influence the subsequent auditory-reward RSFC, in which case the auditory-reward RSFC should predict the chills experience backward in time.
As with the LOPOCV-LASSO model, we constructed a nested leave-one-trial-out cross-validation (LOTOCV)-LASSO model. First, we removed two outlier trials above three standard deviations, leaving 298 trials for analysis; the analysis yielded the same results when outliers were retained. In the inner loop, 296 trials were used to train the model and the remaining trial to test it in order to determine the optimal λ. This was repeated 297 times so that each trial served once as the testing dataset, resulting in 297 inner LOTOCV loops. For each λ value, the accuracy between actual and predicted outcomes was calculated for each inner loop and averaged across the 297 loops. In the outer LOTOCV, 297 trials were used as the training set and the remaining trial as the testing set. Using the optimal λ, the model was trained on all trials in the training set and used to predict the outcome of the held-out trial, repeating for all 298 trials. Note that because the post-listening RSFC was unavailable for the last trial of each of the two sessions, 222 trials were used for the LOTOCV-LASSO model predicting the backward chills experience. To test the significance of the prediction of the chills experience, we conducted a correlation test and a 1,000-iteration permutation test on the actual and predicted chills durations.
Validation of RSFC predictive modelling
The prediction results of the auditory-reward network for both the LOPOCV- and LOTOCV-LASSO models were further validated with a range of RSFC time windows, shortened in steps of one TR. Because prior evidence suggests that 22.5 s of functional connectivity can distinguish distinct cognitive tasks (22), we set RSFC time windows ranging from 20 to 40 s (20, 22, …, 40 s). Furthermore, using the minimum significant 26 s time window (see Figure 2F-left), a sliding window analysis was performed over successive 26 s RSFC windows (24 s overlap) from 0 to 40 s (0 to 26, 2 to 28, …, 14 to 40 s). We conducted a correlation test on the actual and predicted chills durations.
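Continuing the sketch above (rsfc_features), the window segmentation for experiment 1 (TR = 2 s) might look like this:

```python
import numpy as np

TR = 2.0  # experiment 1

def sliding_window_features(ts, win=26.0, step=2.0, total=40.0):
    """RSFC features for successive 26 s windows with 24 s overlap
    (0-26, 2-28, ..., 14-40 s); ts is the (volumes x ROIs) rest period."""
    n_win = int((total - win) / step) + 1          # 8 windows for 40 s
    feats = []
    for k in range(n_win):
        start = int(k * step / TR)                 # window start in volumes
        stop = int((k * step + win) / TR)          # window end in volumes
        feats.append(rsfc_features(ts[start:stop]))
    return np.stack(feats)                         # (n_windows, n_features)
```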
Physiological and neural intensity of the emotional chills
To investigate whether RSFC can predict not only the subjective duration but also the physiological arousal and neural activation during musical chills, we quantified the intensity of physiological and neural responses while participants experienced chills.
HR and RR were quantified after band-pass filtering the raw signals (HR: low-pass 35 Hz, high-pass 1 Hz; RR: low-pass 1 Hz, high-pass 0.05 Hz). HR was calculated as the inverse of the instantaneous inter-beat intervals (in ms) from the photoplethysmogram, using a peak detection algorithm implemented in SciPy 1.9.0 to detect successive pulse peaks. Careful visual inspection of the cardiac signal ensured that the automatic peak detection procedure had performed correctly. We then applied cubic spline interpolation (SciPy 1.9.0) between successive HR values to obtain 10 Hz time-series HR data. RR was likewise calculated with peak detection and cubic spline interpolation. SCR data were analyzed using Ledalab (Version 3.4.9, MATLAB). Data were low-pass filtered (1 Hz) and downsampled to 10 Hz. Continuous decomposition analysis (CDA) was performed to extract phasic SCRs from the electrodermal activity (50); CDA yields the phasic driver underlying the skin conductance data as a continuous measure of SCR that is robust to common artifacts. We examined only SCR responses above 0.05 μS, which indicate an unambiguous increase in skin conductance (72). SCR scores were log-transformed to correct the skewed distribution (50).
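A minimal SciPy sketch of the HR pipeline follows; the peak detection thresholds are illustrative assumptions rather than the values used in the study:

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import CubicSpline

def instantaneous_hr(ppg, fs=1000.0, out_fs=10.0):
    """Instantaneous heart rate from a band-passed photoplethysmogram:
    detect pulse peaks, invert inter-beat intervals, and interpolate to
    a 10 Hz time series with a cubic spline. Sketch only."""
    peaks, _ = find_peaks(ppg, distance=int(0.4 * fs), prominence=0.3)
    t_peaks = peaks / fs                     # peak times (s)
    ibi = np.diff(t_peaks)                   # inter-beat intervals (s)
    hr = 60.0 / ibi                          # beats per minute
    t_hr = t_peaks[1:]                       # assign each HR to the later beat
    spline = CubicSpline(t_hr, hr)
    t_out = np.arange(t_hr[0], t_hr[-1], 1.0 / out_fs)
    return t_out, spline(t_out)
```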
We calculated HR, RR, and SCR scores during the chills experience after synchronizing these signals with the subjective responses. For every subjective chill experience, the average physiological response in the 1 s window before onset served as the baseline. Scores exceeding three standard deviations from the mean were removed as outliers. The sum of each physiological index while experiencing chills was taken as the score for each of the three physiological activities.
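A sketch of the per-episode baseline correction and summation, with hypothetical variable names (the 3 SD outlier screening is applied separately across episodes):

```python
import numpy as np

def chill_response(signal, t, onsets, offsets):
    """Baseline-corrected physiological response summed over the chill
    episodes of one song. signal/t: 10 Hz trace and its time axis (s);
    onsets/offsets: chill button press and release times (s)."""
    total = 0.0
    for on, off in zip(onsets, offsets):
        baseline = signal[(t >= on - 1.0) & (t < on)].mean()  # 1 s pre-onset
        total += (signal[(t >= on) & (t <= off)] - baseline).sum()
    return total
```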
Neural responses during the chills experience were quantified from the BOLD signal of each of 140 ROIs from AAL3.1 (45), excluding cerebellar regions. Pre-processing was the same as for the RSFC data except for filtering: the BOLD time series were high-pass filtered with a cut-off of 0.008 Hz (125 s), and no low-pass filter was applied, to avoid removing task-related neural activity. Percent signal change was calculated relative to the two volumes one before and one after the chills-onset volume, to correct for differences in starting baseline. The time courses were shifted by 4 s to account for the hemodynamic delay of the fMRI signal. Data deviating from the average by more than three standard deviations were removed for quality assurance (53). The sum of percent signal change during the chill response was taken as the score for each of the 140 neural activities. We conducted correlation and 10,000-iteration permutation tests on the actual and predicted physiological and neural activities. FDR correction was applied to correct for multiple comparisons.
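Because the description leaves some details open, the following sketch encodes one reading of the procedure; the 4 s shift equals 2 volumes at TR = 2 s, and all names are illustrative:

```python
import numpy as np

def chill_psc(bold, onset_vol, offset_vol, tr=2.0, hrf_delay=4.0):
    """Summed percent signal change for one ROI over one chill episode.
    The time course is shifted by 4 s for the hemodynamic delay, and the
    baseline is the mean of the volumes one before and one after the
    onset volume (our reading of the text)."""
    shift = int(hrf_delay / tr)
    shifted = bold[shift:]                       # account for HRF delay
    baseline = (shifted[onset_vol - 1] + shifted[onset_vol + 1]) / 2.0
    seg = shifted[onset_vol:offset_vol + 1]      # episode volumes
    return (100.0 * (seg - baseline) / baseline).sum()
```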
Asymmetry analysis of the auditory-reward RSFC
We tested for rightward asymmetry of emotional responses (7, 9, 10, 25) by performing machine learning predictions of the subjective duration, SCR intensity, right-NAcc activity, and left-insula activity of chills from the right versus the left hemisphere ROIs, using the 40 s auditory-reward RSFC. For each hemisphere, we used the 45 connections of a 10 × 10 partial correlation matrix (six auditory and four reward ROIs) to build the LOPOCV-LASSO model. To test the hypothesis that the right but not the left hemisphere relates to neuropsychological responses to chills, we conducted correlation and 10,000-iteration permutation tests on the actual and predicted chills experience. FDR correction was applied to correct for multiple comparisons. Furthermore, to test the difference in prediction accuracy (Pearson's r) between the right and left hemispheres, we performed 10,000 bootstrap tests (78).
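A sketch of the bootstrap comparison follows, resampling participants with replacement; treating the participant as the resampling unit is our assumption (see (78) for the reference procedure):

```python
import numpy as np

def bootstrap_r_diff(y, pred_right, pred_left, n_boot=10000, seed=0):
    """Bootstrap distribution and two-sided p-value for the difference in
    prediction accuracy (Pearson's r) between right- and left-hemisphere
    models. y: actual scores; pred_*: cross-validated predictions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)               # resample with replacement
        r_r = np.corrcoef(y[idx], pred_right[idx])[0, 1]
        r_l = np.corrcoef(y[idx], pred_left[idx])[0, 1]
        diffs[i] = r_r - r_l
    # two-sided p from the proportion of bootstrap differences crossing zero
    p = 2 * min(np.mean(diffs <= 0), np.mean(diffs >= 0))
    return diffs, p
```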
Generalization of predictive model
In experiment 2, participants were presented with 8 songs (4 self-selected and 4 experimenter-selected). The target was the mean duration of subjective chills responses across the 8 songs for each participant. The auditory-reward RSFC immediately before music listening (task rest) was calculated for each song, and the within-subject RSFCs for the 8 songs were averaged, yielding one RSFC per subject. The RSFC duration was varied between 6 and 30 s. By entering the auditory-reward RSFC scores into the machine learning model from experiment 1, we obtained predicted scores for the subjective, physiological, and neural aspects of the chills experience at each RSFC duration. Furthermore, we calculated the auditory-reward RSFC from the conventional 10 min resting state (intrinsic rest) to examine whether the intrinsic brain network differs from the RSFC immediately before music listening. To equate conditions between the two types of RSFC, we sampled 8 short BOLD signal segments from the 10 min resting state; this random sampling was repeated 10,000 times to avoid biased sampling. The auditory-reward RSFC scores averaged across the 10,000 samplings were entered into the machine learning model from experiment 1 to obtain predicted scores for the intrinsic resting brain state. Since prediction accuracy summed over subjective duration, SCR, right-NAcc, and left-insula measures was highest for the 18 s auditory-reward RSFC (see Supplemental Figure 7), we used 18 s for both task and intrinsic rest. Prediction accuracy for task and intrinsic rest was each evaluated with a one-sided correlation test. FDR correction was applied to correct for multiple comparisons.
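A sketch of the repeated random sampling, assuming the rest run has already been restricted to the auditory-reward ROIs and that 18 s corresponds to 18 volumes at the TR = 1 s of the resting-state sequence (rsfc_features as above):

```python
import numpy as np

def intrinsic_rsfc(rest_ts, win_vols=18, n_windows=8, n_samples=10000, seed=0):
    """Auditory-reward RSFC features averaged over repeated random draws
    of 8 short windows from the 10 min resting-state run, matching the 8
    task-rest periods. rest_ts: (n_volumes, n_rois). Sketch only."""
    rng = np.random.default_rng(seed)
    n_vol = rest_ts.shape[0]
    acc = 0.0
    for _ in range(n_samples):
        starts = rng.integers(0, n_vol - win_vols, n_windows)  # 8 random windows
        feats = [rsfc_features(rest_ts[s:s + win_vols]) for s in starts]
        acc += np.mean(feats, axis=0)            # average across the 8 windows
    return acc / n_samples                       # average across samplings
```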
In addition, as a complement to the machine learning approach, we performed a univariate analysis of the difference in the 18 s auditory-reward RSFC scores between task and intrinsic rest. Using each of the eight auditory-reward network pairs for each participant, we fitted a linear mixed model, with significance assessed using Satterthwaite's approximation, in lme4 (79).
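The model was fitted in R with lme4; as a rough Python analogue (statsmodels reports Wald tests rather than Satterthwaite-approximated degrees of freedom), assuming a hypothetical long-format DataFrame df with one row per participant, network pair, and rest type:

```python
import statsmodels.formula.api as smf

# Assumed columns: "rsfc" (18 s auditory-reward Fisher-z score),
# "rest_type" ("task" vs. "intrinsic"), and "subject" (participant ID);
# random intercepts are grouped by participant.
model = smf.mixedlm("rsfc ~ rest_type", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```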