Data Source
This study selected eligible women who conceived through assisted reproductive technology (ART) from the National Vital Statistics System (NVSS) Participant Use File. The NVSS natality data are a retrospective dataset from the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention, capturing birth data from all 50 U.S. states and the District of Columbia.
Population Selection
ART women who had live-birth deliveries from 2015 to 2019 and available maternal demographic and clinical characteristics were included in this study. The exclusion criteria were: 1) pregnant women under 18 years old; 2) infant death after birth; 3) multiparous women; 4) missing data on prepregnancy BMI, history of PTB, parity history, cesarean history, birth interval, prepregnancy diabetes, prepregnancy hypertension, history of smoking before and during pregnancy, gestational diabetes, gestational hypertension, eclampsia, or fertility-enhancing drugs.
Outcomes
The primary outcome was gestational age, which was obtained from the birth certificate and determined by ultrasonography. We included only gestational ages in the range of 17-47 weeks. Outcomes were further divided into term births (TB: ≥ 37 weeks of gestation) and preterm births (PTB: < 37 weeks of gestation).
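The outcome definition above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds; the function name `classify_birth` and the use of whole weeks are assumptions, not part of the study's analysis code.

```python
from typing import Optional

MIN_WEEKS, MAX_WEEKS = 17, 47  # inclusion range stated in the text
TERM_THRESHOLD = 37            # TB: >= 37 weeks; PTB: < 37 weeks


def classify_birth(gestational_weeks: int) -> Optional[str]:
    """Return 'TB' or 'PTB', or None when the record is outside 17-47 weeks."""
    if not MIN_WEEKS <= gestational_weeks <= MAX_WEEKS:
        return None  # record would not enter the analysis
    return "TB" if gestational_weeks >= TERM_THRESHOLD else "PTB"
```

Returning `None` for out-of-range values keeps the inclusion rule and the TB/PTB split in one place, so a record cannot be labelled without also passing the range check.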
Development of the Prediction Model
We collected and classified the following demographic and medical variables: age (18-24, 25-34, 35-44, ≥ 45 years), race (white, black, American Indian or Alaska Native (AIAN), Asian, Pacific Islander, more than one race), prepregnancy BMI (underweight < 18.5, normal 18.5 - 24.9, overweight 25.0 - 29.9, obesity I 30.0 - 34.9, obesity II 35.0 - 39.9, obesity III ≥ 40.0), history of PTB, parity history, cesarean history, birth interval (no previous pregnancy, 0-3 months, 4-17 months, 18-35 months, 36-59 months, ≥ 60 months), prepregnancy diabetes, prepregnancy hypertension, history of smoking before and during pregnancy, gestational diabetes, gestational hypertension, eclampsia, and fertility-enhancing drugs.
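As an illustration of how the continuous variables map onto the categories listed above, the following Python sketch bins age, BMI, and birth interval. The function names are hypothetical. Two encoding choices are assumptions: BMI values between listed bounds (for example 24.95) are assigned by half-open intervals, and `None` stands for "no previous pregnancy".

```python
from typing import Optional


def age_group(age_years: int) -> str:
    """Map maternal age to the four age bands used in the text."""
    if age_years < 18:
        raise ValueError("women under 18 are excluded from the study")
    if age_years <= 24:
        return "18-24"
    if age_years <= 34:
        return "25-34"
    if age_years <= 44:
        return "35-44"
    return ">=45"


def bmi_category(bmi: float) -> str:
    """Map prepregnancy BMI to the six categories used in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    if bmi < 35.0:
        return "obesity I"
    if bmi < 40.0:
        return "obesity II"
    return "obesity III"


def birth_interval_group(months: Optional[int]) -> str:
    """Map birth interval in months to a category; None means no previous pregnancy."""
    if months is None:
        return "no previous pregnancy"
    if months <= 3:
        return "0-3 months"
    if months <= 17:
        return "4-17 months"
    if months <= 35:
        return "18-35 months"
    if months <= 59:
        return "36-59 months"
    return ">=60 months"
```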
The t-test and χ² test were used to compare continuous and categorical variables, respectively, between TB and PTB women. Univariable analysis was performed to identify variables associated with PTB. Predictors with P < 0.05 in univariable analysis were included as candidate variables for multivariable analysis. We then used Cox regression for multivariable analysis to develop the model. All statistical analyses were performed using SPSS version 21.0 and the R package rms (Regression Modeling Strategies).
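To make the screening rule concrete, here is a minimal Python sketch of a Pearson χ² test on a 2x2 table (binary predictor by TB/PTB), followed by the P < 0.05 filter. It is not the authors' code, since the analyses were run in SPSS and R. The function names are hypothetical, no continuity correction is applied, and the text does not state which univariable test drove the screening.

```python
import math
from typing import Sequence, Tuple

ALPHA = 0.05  # screening threshold stated in the text


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and P value for a 2x2 table (no correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    if 0 in row or 0 in col:
        raise ValueError("every row and column needs at least one observation")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def is_candidate(p_value: float, alpha: float = ALPHA) -> bool:
    """A predictor moves on to multivariable analysis when P < alpha."""
    return p_value < alpha
```

Predictors with more than two levels would need the general r x c form of the test, which this sketch does not cover.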
Evaluation of the Predictive Model
Before developing the nomogram, we assessed the predictive accuracy of the model. Predictive performance was evaluated with the c-index for discrimination, the Brier score for overall performance, and the calibration slope for calibration.
The c-index was used to evaluate discrimination, that is, the ability of the predictive model to distinguish between people who have experienced an event and those who have not. A c-index of 1 indicates perfect discrimination, whereas a value of 0.5 indicates discrimination no better than chance. The calibration slope measures the agreement between predicted and observed outcomes. A calibration curve lying on the 45° line represents ideal prediction, and the closer the calibration slope is to 1, the better the calibration. The Brier score evaluates overall performance by measuring the difference between observed and predicted values; the closer the value is to 0, the better the predictive power.
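The two simpler metrics can be illustrated for a binary outcome (PTB = 1, TB = 0) with the short Python sketch below. It assumes predicted probabilities are already available, and the function names are hypothetical. For a Cox model the c-index is usually computed on risk scores with censoring handled (Harrell's c); the binary version here is the simpler special case, which equals the area under the ROC curve.

```python
from typing import Sequence


def c_index(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher prediction."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    # Ties count as half-concordant; the O(n^2) loop is fine for a sketch only.
    score = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return score / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

The calibration slope is left out because it requires refitting a regression of the outcome on the model's linear predictor, which is beyond a standard-library sketch.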
Bootstrapping with 1000 resamples was used for internal validation of our model and to obtain bias-corrected estimates of predictive accuracy for the final model. [19] Finally, we used the receiver operating characteristic (ROC) curve and the calibration curve to evaluate the utility of our nomogram in the validation set.
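One common way to obtain a bias-corrected performance estimate from bootstrap resamples is the optimism correction, sketched below in Python. The text does not say which correction was applied, so this is an assumption about the procedure rather than a description of it. The `fit` and `metric` callables are placeholders for whatever model and performance measure are being validated.

```python
import random
from typing import Any, Callable, Sequence


def optimism_corrected(
    data: Sequence[Any],
    fit: Callable[[Sequence[Any]], Any],
    metric: Callable[[Any, Sequence[Any]], float],
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Apparent performance minus the average bootstrap optimism."""
    rng = random.Random(seed)
    # Apparent performance: model fitted and evaluated on the full data.
    apparent = metric(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(data) for _ in data]
        model = fit(sample)
        # Optimism: performance on the resample minus performance on the original data.
        optimism += metric(model, sample) - metric(model, data)
    return apparent - optimism / n_boot
```

Passing the model-fitting step as a callable means any variable selection can be repeated inside each resample, which is what makes the correction honest about overfitting.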
Creation of the Nomogram
A nomogram was developed as a graphical representation of our final model. [20] A points scale at the top of the nomogram assigns a score from 0 to 100 to each predictor. The predictor variables are shown below it, each on a scale reflecting its effect size, which shows the relative weight of each variable and allows points to be assigned to each significant clinical feature. The sum of the points across predictors and the corresponding predicted probability of preterm birth can be read from the bottom two lines.
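The 0 to 100 points scale is conventionally derived by rescaling the regression coefficients so that the predictor with the widest effect range spans the full 100 points. The Python sketch below illustrates that rescaling under this assumption. The function names and the nested-dictionary input format are hypothetical, and any coefficients passed in would come from the fitted model, which is not reproduced here.

```python
from typing import Dict

Coefs = Dict[str, Dict[str, float]]  # coefs[variable][level] = coefficient (reference level = 0)


def nomogram_points(coefs: Coefs) -> Coefs:
    """Rescale coefficients so the widest-ranging predictor spans 0-100 points."""
    ranges = {
        var: max(levels.values()) - min(levels.values())
        for var, levels in coefs.items()
    }
    max_range = max(ranges.values())
    if max_range == 0:
        raise ValueError("all coefficients are equal; no points scale can be built")
    return {
        var: {
            # Each variable's lowest-risk level is anchored at 0 points.
            level: 100.0 * (beta - min(levels.values())) / max_range
            for level, beta in levels.items()
        }
        for var, levels in coefs.items()
    }


def total_points(points: Coefs, patient: Dict[str, str]) -> float:
    """Sum the points for one patient's level on each predictor."""
    return sum(points[var][level] for var, level in patient.items())
```

Mapping the total points to a predicted probability additionally needs the model's intercept or baseline hazard, so that step is omitted.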