Predictive Model of Sleep Disorders in Pregnant Women Using Machine Learning and SHAP Analysis

doi:10.21203/rs.3.rs-5255925/v1

Background

Sleep disorders in pregnant women are common and can adversely affect maternal and infant health. We aimed to develop a reliable machine learning (ML) model for early prediction of sleep disorders during pregnancy to inform interventions.

Methods

We analyzed data from 1,681 pregnant women in western China. Logistic regression and LASSO regression were used to identify key predictors of sleep disorders. Eight ML algorithms were compared, with LightGBM selected for its superior predictive performance. SHAP analysis was employed to interpret the model and assess the impact of risk factors.

Results

Seven significant predictors were identified: age, morning sickness, pregnancy intention, pre-pregnancy health, underlying diseases, anxiety, and depression. LightGBM demonstrated the best overall performance, with an AUC of 0.687, accuracy of 0.670, and specificity of 0.764. SHAP values indicated that higher values of these factors were associated with a higher predicted risk of sleep disorders.

Conclusion

Our LightGBM model, with its high accuracy and interpretability, can effectively predict sleep disorders in pregnant women, potentially aiding in the development of targeted interventions to improve maternal and infant health.

maternal sleep disorders

prediction

LASSO

machine learning

SHAP

Women, especially first-time mothers, undergo significant physical and psychological changes as pregnancy advances. Sleep quality is further affected by physiologic changes in neuroendocrine hormones, pregnancy-related physical discomforts, and the growth of the fetus^[1]. As a result, sleep disturbances are prevalent throughout pregnancy, and many women experience substantial sleep disruption and insufficient rest^[2]. In studies that estimated prevalence using the Pittsburgh Sleep Quality Index (PSQI), with a score of 5 or higher indicating a potential sleep disorder, 29–76% of pregnant women were affected^{[3, 4]}. The prevalence of poor sleep quality among pregnant women varies between countries: it was reported to be 34.14% in China^[5], 58.7% in the United States^[6], 43.1% among Singaporean women^[7], and 24.4% in Taiwan^[8].

Numerous studies indicate that pregnant women, as compared to their non-pregnant counterparts, exhibit lower sleep efficiency, experience more frequent and prolonged episodes of night-waking, and spend a greater proportion of their sleep in light stages while getting less deep sleep and rapid eye movement (REM) sleep^{[2, 4]}. Poor sleep quality during pregnancy is linked to adverse health and obstetric outcomes. It has been shown to contribute to conditions such as obesity, diabetes, heart disease, hypertension, mood disorders, and a weakened immune system, potentially increasing the risk of morbidity and mortality^[8–10]. Disruptions in sleep and circadian rhythms are common during pregnancy, and poor sleep quality is correlated with a range of negative outcomes, including an increased likelihood of cesarean section, prolonged labor, preterm birth, low birth weight, and fetal growth restriction^[11–14].

Various socio-demographic, physical, and psychological factors influence sleep quality. Several independent factors have been identified that can affect the likelihood of experiencing poor sleep quality, such as age^[4], parity^[6], and pre-pregnancy body mass index (BMI)^[15]. Additionally, psychological conditions, including anxiety and depression, are known to be correlated with sleep disturbances^[16]. However, these factors alone do not provide a precise method for calculating the risk probability of poor sleep quality, making early prediction very challenging. Therefore, identifying the factors that influence sleep quality and developing an early predictive model is an important basis for effective health interventions to improve sleep quality in various populations. This is particularly crucial for enhancing the health outcomes of mothers and their children.

Currently, the majority of studies on sleep issues during pregnancy rely on traditional statistical analysis methods^{[17, 18, 5]}. In contrast, there is a relative scarcity of research using machine learning (ML) approaches to address this specific concern. In recent years, the medical field has increasingly employed machine learning for various applications^[19]. ML techniques have demonstrated their efficacy in improving the accuracy of predictive models^[20].

This study aimed to identify potential risk factors associated with sleep problems in pregnant women, focusing on discovering independent variables that significantly impact sleep quality during gestation and on establishing predictive relationships. We constructed eight ML models to forecast sleep issues and subsequently compared their performance. Additionally, we employed SHapley Additive Explanations (SHAP)^[21] for ML visualization to elucidate the contributions of these risk factors to the development of maternal sleep disorders during pregnancy. Our research will help identify women at high risk of sleep problems early on, enabling timely intervention for optimal prenatal care.

Data Source and Study Population

In this study, 1681 pregnant women were recruited during their initial prenatal visit at the General Hospital of Ningxia Medical University, spanning from February 2022 to April 2023. Upon visiting the hospital's obstetric clinic for antenatal checkups, all participants were administered a comprehensive set of questionnaires. The standardized questionnaires were used to gather data on sociodemographic factors, behavioral habits, pregnancy conditions, and clinical information. The survey encompassed general demographic characteristics, the psychological state of the participants during pregnancy, and their sleep quality. All participants provided written informed consent, and the study protocol was approved by the Ethics Committee of Ningxia Medical University.

The inclusion criteria for the study were as follows: (1) age 18 years or older; (2) at least 8 weeks pregnant at the time of enrollment; (3) complete participant sleep information available; (4) no history of psychiatric illness and the ability to understand and complete questionnaires. The exclusion criteria were: (1) a diagnosed sleep disorder; (2) acute or critical illness or severe pregnancy complications; (3) severe infections during pregnancy or fetal malformations; (4) missing information on sleep or other critical variables. Initially, a total of 1846 women were surveyed. Ultimately, the study encompassed 1681 women who completed questionnaires assessing sleep quality, stress, and depressive status during pregnancy.

Sleep Quality Assessment

Sleep quality was evaluated using the Pittsburgh Sleep Quality Index (PSQI), a validated instrument for measuring sleep quality in Chinese pregnant women^{[22, 23]}. The PSQI comprises a 19-item self-rated questionnaire that gauges sleep quality over the preceding month. Scores on the PSQI range from 0 to 21, with higher scores signifying poorer sleep quality. Following previous studies, sleep disorders were defined as a sum score of ≥ 5^[22].

Feature Data Collection for Pregnancy Sleep Disorders

The levels of anxiety and depression were evaluated using the Pregnancy-Related Anxiety Questionnaire (PRAQ) and the Edinburgh Postnatal Depression Scale (EPDS). When the PRAQ overall score was ≥ 24, women were classified as experiencing pregnancy-related anxiety^[24]. The cutoff point of EPDS standardized score ≥ 9 reflects depressive symptomatology^[25].
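
The two cutoffs above can be expressed as a simple labeling rule. The following is a minimal sketch, assuming integer total scores; the function name `classify_mood` and the dictionary keys are illustrative and do not come from the study.

```python
# Cutoffs as stated in the text; everything else here is an illustrative assumption.
PRAQ_CUTOFF = 24  # PRAQ overall score >= 24 -> pregnancy-related anxiety
EPDS_CUTOFF = 9   # EPDS score >= 9 -> depressive symptomatology


def classify_mood(praq_score: int, epds_score: int) -> dict:
    """Return binary anxiety/depression flags for one participant."""
    anxiety = praq_score >= PRAQ_CUTOFF
    depression = epds_score >= EPDS_CUTOFF
    return {
        "anxiety": anxiety,
        "depression": depression,
        "both": anxiety and depression,  # mirrors the "Anxiety & Depression" row
    }
```

For example, a participant with a PRAQ score of 24 and an EPDS score of 8 would be flagged for anxiety only.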

Data on sociodemographics, pregnancy conditions, and clinical information were collected from pregnant women using standardized questionnaires. The questionnaires included participants’ age, education level, occupation, residence, pre-pregnancy BMI, gestational weeks for current pregnancy, personality traits, participant's only-child status, number of embryos, number of pregnancies, number of abortions, history of labor induction, method of conception, severity of morning sickness, pregnancy intention, mother's preference for fetus's sex, family's preference for fetus's sex, physical health before pregnancy, and underlying diseases.

To ensure data integrity, participants with more than 30% missing values in their data were excluded from the study. For indicators with missing values below 30%, the missing data were imputed using the median value of each respective category.
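
The exclusion and imputation rule can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes each participant is a dictionary with `None` marking a missing value, and it interprets "median value of each respective category" as the per-variable median among retained participants. The function name `drop_and_impute` is hypothetical.

```python
from statistics import median

MAX_MISSING_FRACTION = 0.30  # participants above this share of missing values are excluded


def drop_and_impute(records):
    """Drop participants with >30% missing values, then median-impute the rest.

    records: list of dicts sharing the same keys; None marks a missing value.
    Returns new dicts; the input is not modified.
    """
    kept = [
        dict(r) for r in records
        if sum(v is None for v in r.values()) / len(r) <= MAX_MISSING_FRACTION
    ]
    columns = list(kept[0].keys()) if kept else []
    for col in columns:
        observed = [r[col] for r in kept if r[col] is not None]
        if not observed:
            continue  # nothing to impute from; leave the gaps as they are
        fill = median(observed)
        for r in kept:
            if r[col] is None:
                r[col] = fill
    return kept
```

With four variables per participant, a record missing one value (25%) is retained and imputed, while a record missing two values (50%) is excluded.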

Feature Selection for Pregnancy Sleep Disorders

Following univariate regression analysis to identify significant predictors, these predictors were subsequently incorporated into a multivariate regression model. To further refine the feature selection, the Least Absolute Shrinkage and Selection Operator (LASSO) regression was applied^[26]. By integrating logistic regression with LASSO, we synergistically identified the most relevant features. This combined approach effectively harnessed the strengths of both methods to select a subset of features from the original dataset that are most predictively associated with pregnancy sleep disorders, which were then utilized for model training.
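
For reference, a common formulation of LASSO-penalized logistic regression is given below. The text does not state the exact objective or how the penalty strength was chosen, so the notation ($n$ participants, $p$ candidate features, binary outcome $y_i$, feature vector $x_i$, penalty $\lambda$) is an assumption used for illustration.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \left[
          y_i \left( \beta_0 + x_i^{\top} \beta \right)
          - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right)
        \right]
      + \lambda \sum_{j=1}^{p} \left| \beta_j \right|
    \right\}
```

Larger values of $\lambda$ shrink more coefficients exactly to zero, and the features with nonzero coefficients at the chosen $\lambda$ are the ones retained.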

The Development of ML Models

A 5-fold cross-validation method was employed to partition the dataset, with four folds serving as the training set for model development and one fold reserved as the test set to assess the model's predictive performance. Eight ML models were utilized to identify sleep disorders in pregnant women, including Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Naive Bayes (NB), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), and Artificial Neural Networks (ANNs). To enhance the performance of each ML algorithm, hyperparameters were fine-tuned using a grid search approach.
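
The partitioning step can be sketched in a few lines. This is a minimal illustration of a shuffled 5-fold split, not the authors' implementation; the function name `k_fold_indices`, the seed, and the absence of stratification are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th shuffled index gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

With 1,681 participants, four of the five folds contain 336 women and one contains 337, so holding out a 336-woman fold leaves 1,345 for training.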

The Evaluation of ML Models

The model evaluation metrics encompassed the area under the receiver operating characteristic curve (AUC), accuracy, precision, sensitivity, specificity, and the F1 score. Following an assessment of the performance of each ML model, the algorithm that demonstrated the best overall performance in identifying sleep disorders during pregnancy was selected. The model was then interpreted using SHAP (SHapley Additive Explanations)^[21], where the SHAP values were calculated to ascertain the significance of the features.
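
Of these metrics, the AUC is the least obvious to compute by hand. A minimal sketch is shown below, using the rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` is illustrative, and the study itself reports using the pROC package in R.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.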

Statistical Analysis

Continuous variables were characterized using the mean ± standard deviation (SD), while categorical variables were expressed as frequencies and percentages, n (%). Comparisons between variables were made using the Student’s t-test for continuous data and the Rao-Scott Pearson χ2 test for categorical data. Univariable and multivariable analyses were performed using logistic regression. Significant predictors identified in the univariable analysis were subsequently included in the multivariable logistic regression model. To mitigate the risk of overfitting, LASSO regression was applied to reduce the influence of multicollinearity and to enhance the model's robustness. The performance of the ML models was evaluated using metrics such as the area under the receiver operating characteristic curve (AUC), accuracy, precision, sensitivity, specificity, and the F1 score. All statistical analyses were conducted using R software, version 4.4.1, with packages including ggplot2, lattice, caret, tidyverse, skimr, pROC, randomForest, and xgboost. Statistical significance was set at a two-tailed p-value of 0.05 or less.

The process of participant selection and the study design are depicted in Fig. 1. The screening data and demographic characteristics of the 1681 pregnant women were stratified by the presence or absence of sleep disorders and are detailed in Table 1. A total of 618 individuals (36.8%) were categorized as having sleep disorders and were included in the analysis, while 1063 (63.2%) served as controls. Among the pregnant women, sleep disorders were significantly associated with factors such as age, gestational weeks for current pregnancy, number of abortions, severity of morning sickness, pregnancy intention, physical health before pregnancy, underlying diseases, anxiety, depression, and the combined effect of anxiety and depression (P < 0.05).

Table 1

Basic characteristics of pregnant women by sleep disorders.
Variables		N	No sleep disturbance (PSQI ≤ 5)	Sleep disorder (PSQI>5)	X²	p
Total		1681	1063(63.2%)	618(36.8%)
Age(years)					7.489	0.024
	<30	670	450(42.3%)	220(35.6%)
	30–34	670	404(38.0%)	266(43.0%)
	≥ 35	341	209(19.7%)	132(21.4%)
Education level					3.083	0.379
	Middle School and below	214	135(12.7%)	79(12.8%)
	High School/Junior College	256	165(15.5%)	91(14.7%)
	College / Bachelor	1118	712(67.0%)	406(65.7%)
	Postgraduate and above	93	51(4.8%)	42(6.8%)
Occupation					2.168	0.538
	Unemployed	424	272(25.6%)	152(24.6%)
	Farmers / Workers	53	37(3.5%)	16(2.6%)
	Enterprise	619	380(35.7%)	239(38.7%)
	Others	585	374(35.2%)	211(34.1%)
Residence					3.719	0.054
	Urban	1432	892(83.9%)	540(87.4%)
	Rural	249	171(16.1%)	78(12.6%)
Pre-pregnancy BMI					1.889	0.389
	< 18.5	213	137(12.9%)	76(12.3%)
	18.5–23.9	1059	657(61.8%)	402(65.0%)
	>24	409	269(25.3%)	140(22.7%)
Gestational weeks for current pregnancy					41.055	0.000
	<14	93	58(5.5%)	35(5.7%)
	14–27	1029	710(66.8%)	319(51.6%)
	≥ 28	559	295(27.8%)	264(42.7%)
Personality traits					2.194	0.334
	Introverted	190	113(10.6%)	77(12.5%)
	Moderate	1151	726(68.3%)	425(68.8%)
	Extrovert	340	224(21.1%)	116(18.8%)
Only child status					1.429	0.232
	Yes	223	133(12.5%)	90(14.6%)
	No	1458	930(87.5%)	528(85.4%)
Number of embryos					0.078	0.780
	1	1653	1046(98.4%)	607(98.2%)
	≥ 2	28	17(1.6%)	11(1.8%)
Number of pregnancies					0.127	0.722
	1	654	417(39.2%)	237(38.3%)
	≥ 2	1027	646(60.8%)	381(61.7%)
Number of abortions					4.478	0.034
	0	1080	703(66.1%)	377(61.0%)
	≥ 1	601	360(33.9%)	241(39.0%)
History of labor induction					0.175	0.676
	0	1527	968(91.1%)	559(90.5%)
	≥ 1	154	95(8.9%)	59(9.5%)
Method of conception					1.142	0.285
	Spontaneous conception	1565	995(93.6%)	570(92.2%)
	Artificial pregnancy assistance	116	68(6.4%)	48(7.8%)
Severity of morning sickness					43.831	0.000
	Mild	663	460(43.3%)	203(32.8%)
	Moderate	661	429(40.4%)	232(37.5%)
	Severe	357	174(16.4%)	183(29.6%)
Pregnancy intention					8.383	0.015
	Planned Pregnancy	640	391(36.8%)	249(40.3%)
	Let nature take its course	776	518(48.7%)	258(41.7%)
	Accidental Pregnancy	265	154(14.5%)	111(18.0%)
Mother's preference for fetus's sex					4.911	0.086
	Male	107	57(5.4%)	50(8.1%)
	Female	202	128(12.0%)	74(12.0%)
	Both	1372	878(82.6%)	494(79.9%)
Family's preference for the fetus's sex					3.292	0.193
	Male	117	65(6.1%)	52(8.4%)
	Female	157	102(9.6%)	55(8.9%)
	Both	1407	896(84.3%)	511(82.7%)
Physical health before pregnancy					40.027	0.000
	Good	1106	758(71.3%)	348(56.3%)
	Fair	554	296(27.8%)	258(41.7%)
	Poor	21	9(0.8%)	12(1.9%)
Underlying disease					16.357	0.000
	No	1540	996(93.7%)	544(88.0%)
	Yes	141	67(6.3%)	74(12.0%)
Anxiety					60.304	0.000
	PRAQ<24	1144	795(74.8%)	349(56.5%)
	PRAQ ≥ 24	537	268(25.2%)	269(43.5%)
Depression					72.675	0.000
	EPDS<9	1144	802(75.4%)	342(55.3%)
	EPDS ≥ 9	537	261(24.6%)	276(44.7%)
Anxiety & Depression					71.270	0.000
	No	1410	953(89.7%)	457(73.9%)
	Yes	271	110(10.3%)	161(26.1%)
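
The χ² values in Table 1 can be checked directly from the cell counts. The sketch below computes the plain Pearson statistic, without the Rao-Scott adjustment named in the Statistical Analysis section, for the underlying disease rows; the function name `pearson_chi2` is illustrative.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Rows: underlying disease No / Yes; columns: no sleep disturbance / sleep disorder.
underlying_disease = [[996, 544], [67, 74]]
chi2, dof = pearson_chi2(underlying_disease)
```

For these counts the statistic comes out at about 16.36 with one degree of freedom, in line with the tabulated value of 16.357.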

Feature Screening

In the univariate analysis, 11 variables demonstrated statistical significance. These potential risk factors were then included in the multivariable logistic regression model. After conducting the multivariable logistic regression analysis, seven predictors were identified as being significantly associated with the risk of sleep disorders during pregnancy: age, severity of morning sickness, pregnancy intention, physical health before pregnancy, underlying disease, anxiety, and depression (Table 2).

Table 2

Univariate and multivariate logistic regression analyses for sleep disorders.
Variables		Univariate analyses			Multivariate analyses
Variables		OR	95% CI	P	OR	95% CI	P
Age (years)
	<30	1.00 (Reference)			1.00 (Reference)
	30–34	1.35	1.08 ~ 1.68	0.009	1.39	1.09 ~ 1.77	0.007
	≥ 35	1.29	0.99 ~ 1.69	0.064	1.34	1.00 ~ 1.81	0.051
Education level
	Middle School and below	1.00 (Reference)
	High School/Junior College	0.94	0.65 ~ 1.37	0.758
	College / Bachelor	0.97	0.72 ~ 1.32	0.867
	Postgraduate and above	1.41	0.86 ~ 2.31	0.175
Occupation
	Unemployed	1.00 (Reference)
	Farmers / Workers	0.77	0.42 ~ 1.44	0.417
	Enterprise	1.13	0.87 ~ 1.45	0.366
	Others	1.01	0.78 ~ 1.31	0.943
Residence
	Urban	1.00 (Reference)
	Rural	0.75	0.56 ~ 1.01	0.054
Pre-pregnancy BMI
	< 18.5	1.00 (Reference)
	18.5–23.9	1.10	0.81 ~ 1.50	0.531
	>24	0.94	0.66 ~ 1.33	0.718
Gestational weeks for current pregnancy
	<14	1.00 (Reference)
	14–27	0.74	0.48 ~ 1.16	0.189
	≥ 28	1.48	0.94 ~ 2.33	0.087
Personality traits
	Introverted	1.00 (Reference)
	Moderate	0.86	0.63 ~ 1.18	0.342
	Extrovert	0.76	0.53 ~ 1.10	0.142
Only child status
	Yes	1.00 (Reference)
	No	0.84	0.63 ~ 1.12	0.232
Number of embryos
	1	1.00 (Reference)
	≥ 2	1.12	0.52 ~ 2.40	0.780
Number of pregnancies
	1	1.00 (Reference)
	≥ 2	1.04	0.85 ~ 1.27	0.722
Number of abortions
	0	1.00 (Reference)			1.00 (Reference)
	≥ 1	1.25	1.02 ~ 1.53	0.034	1.07	0.86 ~ 1.34	0.546
History of labor induction
	0	1.00 (Reference)
	≥ 1	1.08	0.76 ~ 1.51	0.676
Method of conception
	Spontaneous conception	1.00 (Reference)
	Artificial pregnancy assistance	1.23	0.84 ~ 1.81	0.286
Severity of morning sickness
	Mild	1.00 (Reference)
	Moderate	1.23	0.97 ~ 1.54	0.083	1.15	0.90 ~ 1.46	0.273
	Severe	2.38	1.83 ~ 3.11	< .001	2.08	1.57 ~ 2.75	< .001
Pregnancy intention
	Planned Pregnancy	1.00 (Reference)
	Let nature take its course	0.78	0.63 ~ 0.97	0.027	0.77	0.61 ~ 0.97	0.029
	Accidental Pregnancy	1.13	0.85 ~ 1.51	0.405	0.86	0.63 ~ 1.18	0.356
Mother's preference for the fetus's sex
	Male	1.00 (Reference)
	Female	0.66	0.41 ~ 1.06	0.086	0.73	0.44 ~ 1.21	0.220
	Both	0.64	0.43 ~ 0.95	0.028	0.78	0.51 ~ 1.20	0.263
Family's preference for the fetus's sex
	Male	1.00 (Reference)
	Female	0.67	0.41 ~ 1.10	0.115
	Both	0.71	0.49 ~ 1.04	0.081
Physical health before pregnancy
	Good	1.00 (Reference)
	Fair	1.90	1.54 ~ 2.34	< .001	1.55	1.23 ~ 1.95	< .001
	Poor	2.90	1.21 ~ 6.96	0.017	1.48	0.59 ~ 3.70	0.398
Underlying disease
	No	1.00 (Reference)
	Yes	2.02	1.43 ~ 2.86	< .001	1.74	1.20 ~ 2.53	0.004
Anxiety
	PRAQ<24	1.00 (Reference)
	PRAQ ≥ 24	2.29	1.85 ~ 2.82	< .001	1.76	1.31 ~ 2.37	< .001
Depression
	EPDS<9	1.00 (Reference)
	EPDS ≥ 9	2.48	2.01 ~ 3.06	< .001	2.02	1.50 ~ 2.70	< .001
Anxiety & Depression
	No	1.00 (Reference)
	Yes	3.05	2.34 ~ 3.99	< .001	1.05	0.66 ~ 1.66	0.838
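
For a binary predictor, the univariate odds ratio and its Wald 95% confidence interval can be recovered from the Table 1 counts. The sketch below does this for underlying disease; the function name `odds_ratio_wald_ci` and the use of z = 1.96 are illustrative assumptions, not the authors' code.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Underlying disease (exposed = Yes) against sleep disorder, counts from Table 1.
odds_ratio, lower, upper = odds_ratio_wald_ci(a=74, b=67, c=544, d=996)
```

The result rounds to an odds ratio of 2.02 with an interval of 1.43 to 2.86, matching the univariate row for underlying disease in Table 2.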

According to the LASSO regression, the optimal feature number was 10 (Fig. 2), including age, education level, residence, gestational weeks for current pregnancy, the severity of morning sickness, pregnancy intention, physical health before pregnancy, underlying disease, anxiety, and depression.

Ultimately, based on the outcomes of the logistic regression and LASSO analyses, seven variables were selected for building the predictive models due to their significant associations with the risk of sleep disorders during pregnancy. These variables include age, severity of morning sickness, pregnancy intention, physical health before pregnancy, underlying disease, anxiety, and depression.
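
The text does not spell out how the two selection results were combined. One reading that reproduces the final seven variables is to keep the features that are both significant in the multivariable model and retained by LASSO, sketched below with illustrative variable names.

```python
# Assumption: the final set is the overlap of the two selection results.
multivariable_significant = {
    "age", "morning_sickness", "pregnancy_intention", "pre_pregnancy_health",
    "underlying_disease", "anxiety", "depression",
}
lasso_selected = {
    "age", "education_level", "residence", "gestational_weeks",
    "morning_sickness", "pregnancy_intention", "pre_pregnancy_health",
    "underlying_disease", "anxiety", "depression",
}

final_features = multivariable_significant & lasso_selected
dropped_by_regression = lasso_selected - multivariable_significant
```

Under this reading, education level, residence, and gestational weeks are retained by LASSO but dropped because they were not significant in the multivariable model.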

Comparison of Model Performance of ML

The training set (n = 1345) comprised 494 positive samples and 851 negative samples, while the test set (n = 336) included 124 positive samples and 212 negative samples. Using the seven selected variables, eight models were developed on the training set: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Naive Bayes (NB), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), and Artificial Neural Networks (ANNs). Model performance was evaluated using ROC-AUC, accuracy, precision, sensitivity (recall), specificity, and F1 score (Fig. 3). The Naive Bayes model achieved the highest AUC on the test set (0.690), closely followed by LightGBM with an AUC of 0.687 (Fig. 4). Considering all performance indicators together, the LightGBM model showed the best overall discrimination, with an accuracy of 67.0%, precision of 55.8%, sensitivity (recall) of 50.8%, specificity of 76.4%, and F1 score of 53.2% (Figs. 3 and 5). Consequently, the LightGBM model was chosen for further prediction and analysis.
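
The reported LightGBM metrics are mutually consistent with a single confusion matrix on the test set. The counts below are not reported in the text; they are inferred here from the stated percentages and the test-set composition (124 positive, 212 negative), and should be read as an assumption.

```python
# Inferred, not reported: counts consistent with the stated test-set metrics.
tp, fn = 63, 61    # 124 women with sleep disorders
tn, fp = 162, 50   # 212 women without sleep disorders


def classification_metrics(tp, fn, tn, fp):
    """Standard metrics derived from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


metrics = classification_metrics(tp, fn, tn, fp)
```

Rounded to three decimals, these counts give the accuracy, precision, sensitivity, specificity, and F1 score stated above, and they show that sensitivity and recall are the same quantity.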

SHAP Interpretation and Feature Importance

The SHAP method was utilized to interpret the output of the final model by calculating the contribution of each variable to the prediction. It offers two types of insights: global and local. The global explanation depicted in SHAP summary plots describes the overall functionality of the model at the feature level. In contrast, local explanations, visualized through waterfall plots, detail the impact of each feature on individual predictions, providing clarity on the model's decision process for specific cases.

The SHAP feature importance for the LightGBM model is depicted in Fig. 6A, where features are ranked by their mean absolute SHAP values, reflecting their relative impact on the model's predictions. According to the SHAP values, depression was the most influential feature for detecting sleep disorders in the LightGBM model. Figure 6B further shows the direction of each factor's effect, with yellow dots signifying higher feature values and purple dots indicating lower values. In the SHAP summary plot, higher feature values for depression, anxiety, severe morning sickness, maternal age, pregnancy intention, pre-pregnancy physical health, and the presence of underlying diseases were associated with higher predicted risk.
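
The ranking by mean absolute SHAP value can be sketched as follows. The SHAP values and feature names in the example are made-up placeholders for illustration only and are not taken from the study; the function name `rank_by_mean_abs_shap` is hypothetical.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across individuals.

    shap_rows: one list of per-feature SHAP values per individual.
    """
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for three individuals and three generic features.
example_rows = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, -0.05],
    [0.50, -0.30, 0.10],
]
ranking = rank_by_mean_abs_shap(example_rows, ["x1", "x2", "x3"])
```

Because absolute values are averaged, a feature that pushes predictions strongly in either direction ranks high even if its signed contributions cancel out.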

Explanation of the machine learning model at the individual level

The SHAP method analyzes individual characteristics to forecast sleep disorder risk in pregnant women (Fig. 7). The baseline value, E[f(x)] = −0.6, is the model's average output across all pregnant women. Each waterfall plot starts from this baseline expected value at the bottom, and each row shows whether a feature pushes the model's output up (red) or down (blue). The value f(x) is the model output for the individual, equal to the baseline plus the sum of that individual's SHAP values. Figure 7A demonstrates an accurate prediction of no sleep disorder for a pregnant woman, attributed to her younger age, moderate morning sickness, and absence of anxiety symptoms. Conversely, Fig. 7B correctly identifies a sleep disorder, due to severe morning sickness, depressive symptoms, and poor health before pregnancy. In these cases, the LightGBM model distinguishes between the presence and absence of sleep disorders and indicates the risk based on individual circumstances.
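
The additive structure behind a waterfall plot can be illustrated with exact Shapley values on a toy model. Everything in this sketch is an assumption made for illustration: the toy scoring function and its coefficients are invented (only the intercept of −0.6 echoes the baseline in Fig. 7), and absent features are replaced by a single reference point rather than averaged over a background dataset as SHAP tooling typically does.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values of f at x relative to one background point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the individual's value, the rest the background value.
        z = [x[i] if i in subset else background[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_score(z):
    """Made-up score with one interaction term; not the fitted LightGBM model."""
    depression, anxiety, severe_sickness = z
    return (-0.6 + 0.7 * depression + 0.5 * anxiety
            + 0.6 * severe_sickness + 0.2 * depression * anxiety)


background = [0, 0, 0]   # reference individual with no risk factors
x = [1, 1, 0]            # depression and anxiety present, no severe sickness
baseline = toy_score(background)
phi = shapley_values(toy_score, x, background)
```

Here the baseline plus the sum of the three contributions equals the score for the individual, and the interaction term is split evenly between depression and anxiety, which is the bookkeeping a waterfall plot displays row by row.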

Sleep disorders are common in pregnant women and affect both maternal and fetal health. Studies suggest that 29–76% of pregnant women face varying degrees of sleep problems, and our research indicates a prevalence of 36.8% for sleep disorders in pregnant women. We evaluated eight ML methods for prediction, with the LightGBM model showing the best overall performance. Furthermore, the SHAP method was used to explain the LightGBM model and determine the degree of influence of each feature on sleep disorders. Our results show that depression is the most important risk factor for sleep disorders during pregnancy. In addition, anxiety, severe early pregnancy reactions, poor physical health before pregnancy, age, pregnancy intention, and the presence of underlying diseases also have an adverse effect on sleep during pregnancy.

Our predictive model offers a novel approach to understanding sleep disturbances in pregnant women. LightGBM is a gradient-boosting framework known for its speed and efficiency, and it has shown high performance in processing large-scale datasets^[27]. These attributes have made it increasingly popular among researchers for developing risk prediction models^[28]. In previous studies, LightGBM has demonstrated superior performance in predicting the severity of obstructive sleep apnea syndrome and in the detection of sleep apnea^{[29, 30]}.

The LightGBM model, using SHAP values to rank feature importance, identified depression as the most significant predictor of sleep disorders during pregnancy. Elevated depression scores were linked to an increased risk of sleep disturbances. This highlights the need to address mental health in prenatal care to reduce sleep issues in pregnant women. Our findings align with previous research showing that depression exacerbates sleep disruption, particularly frequent nighttime awakenings^[31]. Hormonal changes during pregnancy may interact with mood disorders, affecting sleep^[32]. Anxiety, another key psychological factor, also contributes to sleep problems. The relationship between depression, anxiety, and sleep is intricate and bidirectional^[33]. Emotional states marked by anxiety or depression can increase arousal and negative thoughts, adversely affecting sleep quality. Poor sleep, in turn, can exacerbate these emotional states, including anxiety, depression, and stress, and may disrupt emotional regulation, leading to negative mood the next day^[34]. While the mechanisms of these interactions are not fully understood, they likely involve neurophysiological processes.

Our findings indicate that the severity of morning sickness, maternal age, and pregnancy intention were associated with higher predicted risk in the model. Morning sickness, nocturnal urinary frequency, and physical discomforts like heartburn and fatigue can significantly impair sleep^[5]. These symptoms may degrade sleep through hormonal fluctuations, physical discomfort, and psychological stress. Advanced maternal age is linked to increased sleep disturbances, with a meta-analysis of 11,002 subjects revealing higher Pittsburgh Sleep Quality Index (PSQI) scores and prevalence of sleep issues (P < 0.05)^[4]. Younger women (under 28 years) tend to have longer sleep durations and later sleep times compared to their older counterparts^[8], possibly due to age-related changes in melatonin secretion affecting sleep consolidation^[35]. Pregnancy intention also plays a significant role in the psychological well-being and lifestyle habits of pregnant women, influencing sleep quality^[36]. These factors underscore the complex interplay of physiological, psychological, and lifestyle elements during pregnancy on sleep. Recognizing these influences is crucial for developing effective interventions to enhance sleep quality in pregnant women.

Our study possesses several strengths. Firstly, the use of 5-fold cross-validation for model development supports robustness by reducing bias and variance and by using the data for both training and testing. Secondly, the model incorporates seven easily accessible, cost-effective, and highly compliant predictive factors, making it suitable for large-scale screening^[16]. Thirdly, our approach integrates traditional logistic regression with machine learning algorithms. Machine learning is particularly proficient in identifying complex patterns in large-scale datasets; however, it may involve high computational intensity and be less transparent in terms of interpretability. Conversely, logistic regression, being more straightforward and interpretable, requires less data but assumes linear relationships and may not fully capture intricate interactions^[37]. Furthermore, we employed SHAP to visualize machine learning model outputs, enhancing interpretability by clarifying the importance and influence of each feature on sleep disorder predictions.

However, our study also has some limitations. First, the reliance on self-reported questionnaires to determine maternal sleep disorders during pregnancy may introduce subjective biases. Second, as a single-center study with only internal validation, the generalizability of our model's results is relatively limited. Therefore, future multicenter studies and external validations are necessary to confirm the model's applicability and reliability across broader populations. Additionally, while the AUC value of our LightGBM model (AUC = 0.687) is not exceptionally high, it remains within an acceptable range. There is potential for future improvement in the performance of such models by exploring new algorithms.

Sleep disorders are common among pregnant women and can negatively affect both maternal and infant health outcomes. Identifying high-risk individuals early is essential for the implementation of preventive strategies. In this study, we developed machine learning (ML) models to predict the risk of developing sleep disorders and to explore associated risk factors. The LightGBM ML model, known for its interpretability, demonstrated superior accuracy, efficiency, and robustness in predicting sleep disorders compared to other models.

AUC: Area Under the Curve
EPDS: Edinburgh Postnatal Depression Scale
ML: Machine Learning
PRAQ: Pregnancy-Related Anxiety Questionnaire
PSQI: Pittsburgh Sleep Quality Index
REM: Rapid Eye Movement
ROC: Receiver Operating Characteristic
SVM: Support Vector Machine
XGBoost: Extreme Gradient Boosting

Ethics approval and consent to participate

The study was approved by the Ethics Committee of Ningxia Medical University (No. 20222084). Informed consent was signed by all participants at the time of inclusion. All methods were carried out according to relevant guidelines and regulations.

Consent for publication
Not applicable.

Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding

This work was supported by the National Natural Science Foundation of China (grant number 82260647) and the Ningxia Natural Science Foundation (grant number 2023AAC03208).

Author Contribution

CL and YL contributed equally to the article. The two authors designed the study and drafted the manuscript together. SG originated and conceptualized the study and supervised its implementation. LW, CW, ZY, and HW acquired the data. HL and YL interpreted the data and performed statistical analysis. YD and DY searched the literature. All authors have contributed significantly and agree with the manuscript's content.

Acknowledgments

The authors thank all the participants for their time and efforts.

Availability of data and material

The datasets that support the conclusions of this study are all available in the article.

SANAPO L. Maternal sleep disordered breathing and offspring growth outcome: A systematic review and meta-analysis[J/OL]. Sleep Med Rev. 2024. 10.1016/j.smrv.2023.101868.
MEERS JM. Sleep During Pregnancy[J/OL]. Curr Psychiatry Rep. 2022;24(8):353–7. 10.1007/s11920-022-01343-2.
MINDELL JA, COOK R A, NIKOLOVSKI J. Sleep patterns and sleep disturbances across pregnancy[J/OL]. Sleep Med. 2015;16(4):483–8. 10.1016/j.sleep.2014.12.006.
SEDOV I D, CAMERON E E, MADIGAN S, et al. Sleep quality during pregnancy: A meta-analysis[J/OL]. Sleep Med Rev. 2018;38:168–76. 10.1016/j.smrv.2017.06.005.
DU M, LIU J, HAN N, et al. Maternal sleep quality during early pregnancy, risk factors and its impact on pregnancy outcomes: a prospective cohort study[J/OL]. Sleep Med. 2021;79:11–8. 10.1016/j.sleep.2020.12.040.
CHRISTIAN L M, CARROLL J E, PORTER K, et al. Sleep quality across pregnancy and postpartum: effects of parity and race[J/OL]. Sleep Health. 2019;5(4):327–34. 10.1016/j.sleh.2019.03.005.
GELAYE B, ADDAE G. Poor sleep quality, antepartum depression and suicidal ideation among pregnant women[J/OL]. J Affect Disord. 2017;209:195–200. 10.1016/j.jad.2016.11.020.
TSAI S Y, LEE P L, LIN J W, et al. Cross-sectional and longitudinal associations between sleep and health-related quality of life in pregnant women: A prospective observational study[J/OL]. Int J Nurs Stud. 2016;56:45–53. 10.1016/j.ijnurstu.2016.01.001.
CAI S, TAN S, GLUCKMAN P D, et al. Sleep Quality and Nocturnal Sleep Duration in Pregnancy and Risk of Gestational Diabetes Mellitus[J/OL]. Sleep. 2017;40(2). 10.1093/sleep/zsw058. https://academic.oup.com/sleep/article/doi/10.1093/sleep/zsw058/2662319. [2024-07-09].
CONLON R P K, WANG B. Demographic, Pregnancy-Related, and Health-Related Factors in Association with Changes in Sleep Among Pregnant Women with Overweight or Obesity[J/OL]. Int J Behav Med. 2021;28(2):200–6. 10.1007/s12529-020-09887-4.
SHARKEY K M, PEARLSTEIN T B, CARSKADON M A. Circadian phase shifts and mood across the perinatal period in women with a history of major depressive disorder: A preliminary communication[J/OL]. J Affect Disord. 2013;150(3):1103–8. 10.1016/j.jad.2013.04.046.
STACEY T. Association between maternal sleep practices and risk of late stillbirth: a case-control study[J].
LIU H, LI H, LI C, et al. Associations between Maternal Sleep Quality Throughout Pregnancy and Newborn Birth Weight[J/OL]. 10.1080/15402002.2019.1702551.
NAGHI I, KEYPOUR F, AHARI S B, et al. Sleep disturbance in late pregnancy and type and duration of labour[J/OL]. J Obstet Gynaecol. 2011;31(6):489–91. 10.3109/01443615.2011.579196.
GUINHOUYA B C, BISSON M. Body Weight Status and Sleep Disturbances During Pregnancy: Does Adherence to Gestational Weight Gain Guidelines Matter?[J/OL]. J Women’s Health. 2019;28(4):535–43. 10.1089/jwh.2017.6892.
QIU C, GELAYE B, ZHONG Q Y, et al. Construct validity and factor structure of the Pittsburgh Sleep Quality Index among pregnant women in a Pacific-Northwest cohort[J/OL]. Sleep Breath. 2016;20(1):293–301. 10.1007/s11325-016-1313-4.
LORET DE MOLA C, CARPENA M X, DIAS I M, et al. Sleep and its association with depressive and anxiety symptoms during the last weeks of pregnancy: A population-based study[J/OL]. Sleep Health. 2023;9(4):482–8. 10.1016/j.sleh.2023.05.003.
KING C E, WILKERSON A, NEWMAN R, et al. Sleep, Anxiety, and Vitamin D Status and Risk for Peripartum Depression[J/OL]. Reproductive Sci. 2022;29(6):1851–8. 10.1007/s43032-022-00922-1.
MO Y K, HAHN M W, SMITH M L. Applications of machine learning in phylogenetics[J/OL]. Mol Phylogenet Evol. 2024;196:108066. 10.1016/j.ympev.2024.108066.
GREENER J G, KANDATHIL S M, MOFFAT L, et al. A guide to machine learning for biologists[J/OL]. Nat Rev Mol Cell Biol. 2022;23(1):40–55. 10.1038/s41580-021-00407-0.
CHEN H, LUNDBERG S M, LEE S I. Explaining a series of models by propagating Shapley values[J/OL]. Nat Commun. 2022;13(1):4512. 10.1038/s41467-022-31384-3.
BUYSSE D J, REYNOLDS C F, MONK T H, et al. The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research[J/OL]. Psychiatry Res. 1989;28(2):193–213. 10.1016/0165-1781(89)90047-4.
ZHANG H, LI Y, ZHAO X, et al. The association between PSQI score and hypertension in a Chinese rural population: the Henan Rural Cohort Study[J/OL]. Sleep Med. 2019;58:27–34. 10.1016/j.sleep.2019.03.001.
JIXING ZHOU, SHANSHAN ZHANG, YUZHU TENG, et al. Maternal pregnancy-related anxiety and children’s physical growth: the Ma’anshan birth cohort study[J/OL]. BMC Pregnancy Childbirth. 2023;23(1):384. 10.1186/s12884-023-05711-5.
ZHAO Y, KANE I, WANG J, et al. Combined use of the postpartum depression screening scale (PDSS) and Edinburgh postnatal depression scale (EPDS) to identify antenatal depression among Chinese pregnant women with obstetric complications[J/OL]. Psychiatry Res. 2015;226(1):113–9. 10.1016/j.psychres.2014.12.016.
XIE Q Y, WANG M W, HU Z Y, et al. Screening the Influence of Biomarkers for Metabolic Syndrome in Occupational Population Based on the Lasso Algorithm[J/OL]. Front Public Health. 2021;9:743731. 10.3389/fpubh.2021.743731.
KE G, MENG Q, FINLEY T, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree[J].
LIAO H, ZHANG X, ZHAO C, et al. LightGBM: an efficient and accurate method for predicting pregnancy diseases[J/OL]. J Obstet Gynaecol. 2022;42(4):620–9. 10.1080/01443615.2021.1945006.
HAN H, OH J. Application of various machine learning techniques to predict obstructive sleep apnea syndrome severity[J/OL]. Sci Rep. 2023;13(1):6379. 10.1038/s41598-023-33170-7.
XIONG X, WANG A, HE J, et al. Application of LightGBM hybrid model based on TPE algorithm optimization in sleep apnea detection[J/OL]. Front Neurosci. 2024;18:1324933. 10.3389/fnins.2024.1324933.
RUIZ-ROBLEDILLO N, CANÁRIO C, DIAS C C, et al. Sleep during the third trimester of pregnancy: the role of depression and anxiety[J/OL]. Psychol Health Med. 2015;20(8):927–32. 10.1080/13548506.2015.1017508.
QIU C, GELAYE B, FIDA N, et al. Short sleep duration, complaints of vital exhaustion and perceived stress are prevalent among pregnant women with mood and anxiety disorders[J/OL]. BMC Pregnancy Childbirth. 2012;12(1):104. 10.1186/1471-2393-12-104.
SHI C, WANG S, TANG Q, et al. Cross-lagged relationship between anxiety, depression, and sleep disturbance among college students during and after collective isolation[J/OL]. Front Public Health. 2022;10:1038862. 10.3389/fpubh.2022.1038862.
YOO J, SLAVISH D. Daily reactivity to stress and sleep disturbances: unique risk factors for insomnia[J/OL]. Sleep. 2023;46(2):zsac256. 10.1093/sleep/zsac256.
RUSSEL J. REITER, DUN XIAN TAN, AHMET KORKMAZ, et al. Melatonin and stable circadian rhythms optimize maternal, placental and fetal physiology[J/OL]. Hum Reprod Update. 2014;20(2):293–307. 10.1093/humupd/dmt054.
LAGADEC N, STEINECKER M, KAPASSI A, et al. Factors influencing the quality of life of pregnant women: a systematic review[J/OL]. BMC Pregnancy Childbirth. 2018;18(1):455. 10.1186/s12884-018-2087-4.
WANG L, WEN L, SHEN J, et al. The association between PM2.5 components and blood pressure changes in late pregnancy: A combined analysis of traditional and machine learning models[J/OL]. Environ Res. 2024;252:118827. 10.1016/j.envres.2024.118827.
