Participants
Twenty-five healthy full-term newborns (mean gestational age 39.24 ± 7.82 weeks, recording age 7.27 ± 11.40 days after birth, 15 males/10 females, head circumference 34.08 ± 1.43 cm, birth weight 3020.20 ± 324.11 g, Apgar score at five minutes 9.79 ± 0.66) were recruited at the maternity ward of the Hospital Clínic Barcelona (Spain). Of this initial sample, 2 newborns were excluded because movement artifacts left insufficient analyzable data. The remaining 23 newborns were included in the final analysis. A single session was recorded per subject.
Infants had been assessed by board-certified neonatologists and diagnosed as healthy term newborns with no major congenital abnormalities or illness since birth. Newborns under medication and/or with congenital malformations, chromosomal abnormalities, hypoxic-ischemic encephalopathy, intraventricular hemorrhage greater than grade 2 or any other type of brain damage, congenital heart disease, or siblings with autism spectrum disorder or another neurodevelopmental disorder were excluded from this study.
Ethical Considerations
The study was conducted following institutional research ethics guidelines and the Declaration of Helsinki. Formal ethical approval was granted by the Local Ethical Committee, Hospital Clínic Barcelona (Ref: NeuroCry/HCB/2021/0843). The consent form documented the study's aims, nature, and data acquisition procedures. Anonymization and data confidentiality were maintained throughout the study. All parents agreed and signed the informed consent form prior to participation. In addition, signed informed consent was obtained from the family to publish the newborn's face in this manuscript as an open access publication.
Procedure
Data collection was performed during the standard routine of newborn nursing (before and after feeding, during some medical procedures, etc.); as such, one session was conducted with each neonate. Synchronized EEG, NIRS, audio, and video recordings were acquired for each newborn, who lay comfortably in a cot in the hospital maternity ward. Continuous single sessions lasting from 20 to 120 minutes were recorded in a paradigm where the newborn could be calm-awake or crying. Within this paradigm, different distress levels were defined as changes in the newborn's status generated by uncomfortable scenarios (i.e., fussiness, stress, pain, etc.), yielding the following conditions: resting, cry, and distress.
To ensure proper cross-referencing among the different data sources, all devices were synchronized via timestamps before each session. In addition, markers were introduced into every signal type. Figure 1 shows the experimental design and overall analysis pipeline.
After extracting features from the neurophysiological and audio signals and analyzing COMFORT scale scores from facial expressions and body movements, we conducted statistical analyses relating the extracted features to the different distress levels defined from the cry sequences.
Audio Analysis Pipeline
Data acquisition. Newborn crying emissions were recorded with a portable high-quality field recorder (ZOOM H1N™) equipped with a unidirectional microphone positioned at a fixed distance (30 cm) from the infant's mouth, and were stored on a multimedia laptop as two-channel .WAV audio tracks with sampling rate Fs = 48 kHz and 24-bit resolution. Cries were never induced for the purpose of the study, as spontaneous vocalizations are part of normal infant behavior. Several audio recordings were registered during each session in order to include various crying episodes, with a suitable amount of time both before and after each cry episode. During the recordings, environmental noises, including human speech and noises from medical machinery, were also captured. Thus, our dataset resembles real-world samples.
Data processing. Segmentation. All audio recordings were manually segmented into cry episodes (CEs: the periods during which the infant cries in each audio recording, delimited by silence periods). CEs were then manually segmented into cry units (CUs: individual cry patterns within a CE separated by an expiration phase). Visual spectrographic analysis was carried out using iZotope RX 7 Audio Editor™. The classification of CEs and CUs was done manually, considering segments with high spectral content and intensity over time as distress cries and those with lower spectral content as normal cries39 (see Fig. 1a). Both the segmentation and the qualitative assessment of every CE and CU were carefully reviewed by at least two cry-signal experts. Cries without unanimous agreement among the experts were excluded from further analyses. Afterwards, the three different distress levels were acoustically identified in every CE:
- resting: no CEs; pause or resting periods with silent audio recordings, during which the newborn is not crying but in an awake/alert state.
- cry: CEs composed of CUs with lower spectral content and milder acoustic intensity.
- distress: more acoustically intense CEs composed of CUs with high spectral content.
Feature extraction.
Cepstrum Analysis. To verify the objectivity of the qualitative labeling, machine learning algorithms were used as an automatic approach to validate the manual audio segmentation. For that purpose, two different approaches were executed. The first uses traditional machine learning, based on a recent study11 of a similar infant cry classification (pain vs. non-pain) that achieved 90.7% accuracy using Random Forest40, which is why this algorithm was considered our baseline. As input features, the first thirteen Mel Frequency Cepstral Coefficients (MFCCs) of every CU were computed using Librosa, a Python 3 package for audio analysis. MFCCs are widely used in acoustic research and have proven to be a reliable feature for classifying audio signals41.
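For illustration, a minimal Python sketch of this baseline is shown below. It is not the study's exact code: `cu_paths` and `cu_labels` are hypothetical placeholders for the segmented cry-unit files and their manual labels, and the Random Forest hyperparameters are assumptions.

```python
# Minimal sketch, not the study's exact code: 13 MFCCs per cry unit with
# Librosa, classified with a Random Forest baseline.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(wav_path, sr=48000, n_mfcc=13):
    """Load one cry unit and summarize its first 13 MFCCs over time."""
    y, _ = librosa.load(wav_path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (13, n_frames)
    return mfcc.mean(axis=1)                                # 13-dim vector per CU

# cu_paths / cu_labels: hypothetical lists of segmented CU files and their
# manual labels (0 = cry, 1 = distress).
X = np.array([mfcc_features(p) for p in cu_paths])
y = np.array(cu_labels)

# 80% training / 20% validation, as described in the text.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y)
clf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
print("validation accuracy:", clf.score(X_va, y_va))
```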
The second approach computes a spectrogram from each CU, which was used as input to a Deep Learning (DL) algorithm. This DL method was employed to validate the manual binary classification of the cry and distress conditions. In this case, we used a Convolutional Neural Network (CNN)42 consisting of 2-dimensional convolutional layers and dense layers. To avoid overfitting, pooling layers were also used, together with batch normalization layers to optimize training. In both approaches, 80% of the samples were used for training and 20% for validation.
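A small Keras sketch of such a CNN is given below, assuming spectrogram inputs resized to 128 × 128; the exact number and size of layers used in the study are not specified, so the architecture here is only indicative.

```python
# Indicative 2-D CNN for binary cry/distress classification from
# spectrograms; layer sizes are assumptions, not the study's architecture.
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1)):
    model = models.Sequential([
        layers.Input(shape=input_shape),        # one spectrogram per CU
        layers.Conv2D(16, 3, activation="relu"),
        layers.BatchNormalization(),            # stabilizes/optimizes training
        layers.MaxPooling2D(),                  # pooling to limit overfitting
        layers.Conv2D(32, 3, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # cry vs. distress
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_cnn()
# model.fit(train_specs, train_labels, validation_split=0.2)  # 80/20 split
```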
Time Analysis. Within CEs, the actual vocalizations (cryCE) are not continuous but are punctuated by inspirations and spontaneous pause or silence periods (unvoicedCE). The frequency of each within CEs was assessed by quantifying their total duration and their percentage of the full CE. Specifically, the duration of every CU and of the unvoiced window between CUs within every CE was computed for every cry pattern. Hence, the following variables were studied for full cry episodes: total duration (in seconds) and percentage of the unvoiced part between CUs per CE, and duration (in seconds) and percentage of crying per CE and of the unvoiced part between CEs. Episodes labeled with the same condition and separated by less than 5 seconds were not included in the study as separate episodes, since they should be considered together as a unique episode43.
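The following sketch illustrates how these timing variables and the 5-second merging rule could be computed, assuming each CE is represented by its onset/offset times and a list of CU (onset, offset) intervals in seconds; these data structures are illustrative assumptions.

```python
# Sketch of the CE timing variables; each CE is assumed to carry its
# (ce_onset, ce_offset) bounds and a list of CU (onset, offset) intervals.
def ce_timing(cu_intervals, ce_onset, ce_offset):
    ce_dur = ce_offset - ce_onset
    cry_dur = sum(off - on for on, off in cu_intervals)  # voiced (cryCE) part
    unvoiced_dur = ce_dur - cry_dur                      # pauses between CUs
    return {
        "ce_duration_s": ce_dur,
        "cry_duration_s": cry_dur,
        "cry_pct": 100 * cry_dur / ce_dur,
        "unvoiced_duration_s": unvoiced_dur,
        "unvoiced_pct": 100 * unvoiced_dur / ce_dur,
    }

def merge_close_episodes(episodes, min_gap=5.0):
    """Treat same-condition episodes separated by < 5 s as one episode.
    `episodes` is an onset-sorted list of (onset, offset, condition)."""
    merged = [episodes[0]]
    for onset, offset, cond in episodes[1:]:
        last_on, last_off, last_cond = merged[-1]
        if cond == last_cond and onset - last_off < min_gap:
            merged[-1] = (last_on, offset, cond)  # extend previous episode
        else:
            merged.append((onset, offset, cond))
    return merged
```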
Frequency Analysis. Several software tools currently estimate F0 and resonance frequencies, but all of them were initially developed for adult voices. Since the adult and infant vocal tracts differ in shape, these tools should be used with caution5. Digital signal processing and frequency analysis of each CU were conducted with the Praat software44, as it is the most commonly used tool. Default values were changed according to the infant cry literature: a band-pass filter between 200–1200 Hz was selected. Audio recordings were collected with a sampling rate of 48,000 Hz, and the signal was low-pass filtered at 10,000 Hz45. The main frequency features computed include F0 and its descriptive statistics (maximum, minimum, mean, standard deviation), the resonance frequencies of the vocal tract (F1, F2, F3), and the percentages of high pitch (F0 ≥ 800 Hz)46 and hyperphonation (F0 ≥ 1000 Hz)47 in each CU. Other voice quality parameters related to the phonation of the vocalization were also included: local jitter (Jitter: micro-variations of F0 measured as pitch period length deviations), local shimmer (Shimmer: amplitude deviations between pitch periods), and harmonic-to-noise ratio (HNR: quantifies the amount of additive noise in the voice signal)48. These perturbation measures are widely used in clinical settings49.
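These measures could be reproduced in Python through parselmouth, a Python interface to Praat; the sketch below assumes this route rather than the Praat GUI used in the study, with the pitch floor/ceiling set to the 200–1200 Hz range noted above and Praat's standard arguments for the jitter and shimmer calls. The file path is a placeholder.

```python
# Hedged parselmouth sketch of the Praat-based measures for one cry unit.
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("cry_unit.wav")           # one segmented CU (assumed path)
pitch = snd.to_pitch(pitch_floor=200, pitch_ceiling=1200)
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                                   # drop unvoiced frames

stats = {"f0_mean": f0.mean(), "f0_sd": f0.std(),
         "f0_min": f0.min(), "f0_max": f0.max(),
         "high_pitch_pct": 100 * np.mean(f0 >= 800),
         "hyperphonation_pct": 100 * np.mean(f0 >= 1000)}

# Resonance frequencies (F1-F3), sampled here at the CU midpoint.
formants = snd.to_formant_burg()
t_mid = snd.duration / 2
stats["F1"], stats["F2"], stats["F3"] = (
    formants.get_value_at_time(i, t_mid) for i in (1, 2, 3))

# Perturbation measures with Praat's standard default arguments.
point_process = call(snd, "To PointProcess (periodic, cc)", 200, 1200)
stats["jitter_local"] = call(point_process, "Get jitter (local)",
                             0, 0, 0.0001, 0.02, 1.3)
stats["shimmer_local"] = call([snd, point_process], "Get shimmer (local)",
                              0, 0, 0.0001, 0.02, 1.3, 1.6)
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 200, 0.1, 1.0)
stats["hnr_db"] = call(harmonicity, "Get mean", 0, 0)
```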
EEG Pipeline
Data acquisition. Neurophysiological data were acquired using an ANT Nëo Monitor eego™ (ANT Neuro, Germany; CE mark MDD 93/42/EEC, CE class IIa, FDA 510(k) in the USA) with 8 EEG and 2 aEEG channels mounted in an elastic cap (waveguard™ original, Germany) with high-quality Ag/AgCl sensors. These non-invasive, gel-based electrodes are fixed to the cap and present a very low profile, which makes the cap comfortable for the newborn (e.g., avoiding excessive rubbing and pressure on the scalp). The electrodes were placed according to the extended 10–20 positioning system (channels F3, F4, C3, C4, T7, T8, P3, P4) and were later re-referenced offline to the average reference. Sensor impedance was kept below 10 kΩ, and EEG data were acquired at a sampling rate of 512 Hz. All recordings were done by research assistants/clinicians with EEG acquisition experience.
Data processing. The dataset was analyzed offline using Matlab R2022a with the Brainstorm Toolbox50. A band-pass filter between 1–45 Hz was applied to the EEG data to remove power line contamination and low-frequency artifacts. EEG data were examined by careful visual inspection to detect ocular, muscle, and jump artifacts, confirmed by an EEG expert (SP). We did not use an automatic algorithm because most available methods (e.g., ICA) are developed for adult brain signals acquired in controlled environments, where artifacts are generally easy to detect and correct. In our case, due to the nature of the data acquisition, newborns were crying, sometimes irritably, during the recording; the movements and artifacts generated in that situation are not easy to detect or correct, rendering automatic methods ineffective (see supplementary material).
After that, bad channels were manually identified and interpolated using spherical splines51. A maximum of 1 channel was interpolated; if more channels were found to be bad, the whole trial was rejected from the analysis. The remaining artifact-free data were segmented into four-second epochs52 according to the audio/distress segmentation criteria described above, yielding the following conditions: resting, cry, and distress.
EEG data analysis was performed for the following classical frequency bands: delta (δ: 1–4 Hz), theta (θ: 4–8 Hz), and alpha (α: 8–12 Hz). Higher frequencies, from the beta to the gamma range, were not included in the analysis to avoid contamination by muscle activity.
Additionally, the power spectrum of each EEG sensor was computed using Welch's periodogram method53, taking the 4-s segments tapered with a Hanning window and 50% overlap. For each sensor, relative power was calculated by normalizing the power at each frequency by the total power over the 1–45 Hz range.
To quantify relative power changes across conditions with respect to the resting state, the total relative power of the analyzed frequency bands was taken as 100%, and the percentage of relative power for each frequency band was calculated for each sensor and all conditions.
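A minimal Python sketch of this computation (the study itself used Matlab/Brainstorm) is shown below, assuming `data` is an (n_channels, n_samples) array of artifact-free EEG for one condition, sampled at 512 Hz.

```python
# Sketch of the Welch relative-power computation for one condition.
import numpy as np
from scipy.signal import welch

fs = 512
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12)}

# Welch periodogram: 4 s Hanning-tapered windows with 50% overlap.
freqs, psd = welch(data, fs=fs, window="hann",
                   nperseg=4 * fs, noverlap=2 * fs, axis=-1)

# Relative power: normalize by total power over the 1-45 Hz range.
broadband = (freqs >= 1) & (freqs <= 45)
total = psd[:, broadband].sum(axis=-1, keepdims=True)

rel_power = {}
for name, (lo, hi) in bands.items():
    idx = (freqs >= lo) & (freqs < hi)
    rel_power[name] = psd[:, idx].sum(axis=-1) / total[:, 0]  # per channel

# Percentage of each band relative to the summed band power (taken as 100%).
band_sum = sum(rel_power.values())
band_pct = {name: 100 * p / band_sum for name, p in rel_power.items()}
```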
NIRS Pipeline
Data acquisition. NIRS data were collected along with the audio and EEG acquisitions, hence sharing the same conditions and timestamps as the other recorded signals. The Root O3™ (Masimo, USA; CE mark G1 092076 0013 Rev. 00) was the equipment selected for NIRS data acquisition. This device uses NIRS forehead sensors to measure regional hemoglobin oxygen saturation (rSO2), i.e., the central oxygenation level. Functional arterial hemoglobin oxygen saturation (SpO2), i.e., the peripheral oxygenation level, and pulse rate (PR-bpm), i.e., the heart rate signal, were continuously and non-invasively monitored with a fingertip sensor on the newborn.
Data processing. rSO2, SpO2, and PR-bpm data were sampled every 2 seconds and saved by the device. These variables were later exported and analyzed offline in Python 3. NIRS data with a standard deviation lower than 0.5 were not considered in the analysis, to eliminate errors from the data acquisition process. Also, the interquartile range (1.5*IQR) method was used to remove outliers. The remaining non-rejected data were segmented into normal cry, distress, and resting episodes based on the timestamps obtained from the audio segmentation criteria explained in the audio signal processing section. The 15 seconds preceding and following each segment were discarded9. In addition, lower-bound thresholds were applied to the corresponding CE intervals, removing SpO2 values lower than 8054, rSO2 values lower than 5055, or PR-bpm values lower than 7056, to eliminate noise and errors derived from the newborn's movements.
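The sketch below illustrates these cleaning steps with pandas; the DataFrame layout and column names ("rSO2", "SpO2", "PR") are assumptions rather than the study's actual export format.

```python
# Illustrative NIRS cleaning: reject near-constant recordings, remove
# 1.5*IQR outliers, then apply the physiological lower bounds.
import pandas as pd

def clean_nirs(df):
    # Reject near-constant recordings (std < 0.5): likely acquisition errors.
    if df[["rSO2", "SpO2", "PR"]].std().min() < 0.5:
        return None

    # Remove outliers with the 1.5 * IQR rule, per variable.
    for col in ["rSO2", "SpO2", "PR"]:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df = df[df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Lower bounds: drop SpO2 < 80, rSO2 < 50, PR < 70 bpm.
    return df[(df["SpO2"] >= 80) & (df["rSO2"] >= 50) & (df["PR"] >= 70)]
```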
Facial Expression & Body Movement Analysis
Nowadays, neonatologists use common tools to measure distress levels in newborns from a qualitative perspective, especially by assessing crying, facial expressions, and body movements. Among them, the COMFORT scale allows assessing distress states, sedation, and pain in nonverbal pediatric patients, with cry characteristics being part of the assessment57,58. The COMFORT scale has been adapted to Spanish and shown to be a valid and reliable tool (Cronbach's alpha coefficient of 0.785 for newborns) to assess comfort in a group of children admitted to a Spanish Intensive Care Unit59,60. The COMFORT scale was used here to qualitatively evaluate the video recordings of facial expressions and body movements during each session and to identify the levels of distress.
Data acquisition and processing. A high-quality video recording of the newborn was acquired during each session, ensuring registration of facial expressions and body movements following a standardized protocol. Afterwards, two experts (AL, IAP) reviewed the videos and assessed the newborns individually according to the COMFORT scale for each cry episode. In case of disagreement between the experts, a third reviewer (AP) was asked to provide an evaluation. The assessment comprises six sections: alertness, agitation, crying, body movements, muscular tone, and facial tension. Each section is rated from 1 (calm infant) to 6 (stressed infant), and the total distress score of each CE ranges from 6 to 30, with larger scores indicating higher arousal.
Statistical Analysis
Statistical analysis was performed using Matlab R2022a, GraphPad Prism 8, and SPSS 22. We conducted statistical comparisons among all three conditions (resting, cry, and distress) and for each pairwise condition for the audio, EEG, and NIRS signals and the COMFORT scale.
The Shapiro-Wilk test was applied to the feature arrays and confirmed that the data were not normally distributed. In addition, due to the nature of the data collection, which consisted of spontaneous cry recordings during the newborns' daily routine, the segments of the three conditions (resting, cry, and distress) were not balanced. We therefore randomly selected a representative number of segments for each signal feature (audio, EEG, NIRS), as described in the Results section below.
Processed audio and NIRS data were compared with an ANOVA and Tukey-Kramer tests for post hoc comparisons, together with a bootstrapping procedure repeated 10,000 times to account for non-normality. EEG and COMFORT scale data were assessed with a Mann-Whitney U-test for pairwise comparisons and a Kruskal-Wallis test when all 3 conditions were compared. For the EEG pairwise comparisons, the Holm-Bonferroni correction method was applied, while for the 3-condition comparisons Dunn's test was selected.
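As an illustration, these nonparametric tests could be run in Python with SciPy as sketched below; `rest`, `cry`, and `distress` are assumed 1-D arrays holding one feature per condition, and the Holm-Bonferroni step-down is written out explicitly.

```python
# Sketch of the nonparametric comparisons for one feature.
import numpy as np
from scipy import stats

# Three-condition comparison: Kruskal-Wallis.
h, p_kw = stats.kruskal(rest, cry, distress)

# Pairwise Mann-Whitney U-tests.
pairs = [("rest-cry", rest, cry), ("rest-distress", rest, distress),
         ("cry-distress", cry, distress)]
pvals = [stats.mannwhitneyu(a, b).pvalue for _, a, b in pairs]

# Holm-Bonferroni step-down: sort p-values ascending, multiply by
# decreasing factors, and enforce monotonicity of the adjusted values.
order = np.argsort(pvals)
adjusted = np.empty(len(pvals))
running_max = 0.0
for rank, i in enumerate(order):
    adj = min((len(pvals) - rank) * pvals[i], 1.0)
    running_max = max(running_max, adj)
    adjusted[i] = running_max
```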
For an integrative approach, the EEG/NIRS features and COMFORT scale results were correlated with the acoustic features using the Spearman (Rho) correlation coefficient. Additionally, the Kendall Coefficient of Concordance (W)61 was calculated to assess the level of agreement between the audio features and the neurophysiological and behavioral data for the cry and distress conditions. We used Cohen's interpretation guideline62, where 0.3 ≤ W < 0.5 and W ≥ 0.5 correspond to moderate and strong agreement effects, respectively.
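A sketch of these two measures is given below; `audio_feat` and `eeg_feat` are assumed paired 1-D feature arrays, `ratings` is an assumed (m_measures, n_episodes) array, and the Kendall W formula shown omits the tie correction.

```python
# Sketch of the Spearman correlation and Kendall's W (no tie correction).
import numpy as np
from scipy.stats import rankdata, spearmanr

rho, p = spearmanr(audio_feat, eeg_feat)  # Spearman correlation per feature pair

def kendalls_w(ratings):
    m, n = ratings.shape                              # m measures, n episodes
    ranks = np.apply_along_axis(rankdata, 1, ratings) # rank within each measure
    rank_sums = ranks.sum(axis=0)                     # total rank per episode
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Cohen's guideline: 0.3 <= W < 0.5 moderate, W >= 0.5 strong agreement.
```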