Study design and setting
This study was a secondary analysis of a database covering all singleton births between January 2002 and December 2018 at Dr. Sótero del Río Hospital in the South-East public health district of Santiago, Chile. This public hospital is the main hospital of the Southeast Metropolitan Health Service and serves 1.7 million people of medium and low socioeconomic status, equivalent to 1/3 of the population of the Metropolitan Region and almost 10% of the national population.
Data collection and variables
In the hospital's maternity ward, information on each pregnant woman admitted for delivery is entered in a standardized manner. This database contains the pregnant woman's history (e.g., sociodemographic, obstetric, and morbidity data), all data recorded upon admission (e.g., physical examination, exams, and symptoms), delivery information (e.g., type of delivery, gestational age, and complications), and conditions upon discharge.
Initially, 113,068 birth records with pregnancy and delivery information were extracted. The type of delivery was categorized into four groups: spontaneous vaginal delivery, forceps, elective CS, and EmCS. Records without a recorded type of delivery (n = 1,363) were excluded, as were twin deliveries (n = 1,437).
Initially, 65 variables were considered potential predictors. Variables with more than 40% missing data in the database were then eliminated, leaving 59 potential predictor variables for analysis. Subsequently, deliveries with transverse or breech presentation (n = 1,950) and deliveries to women with two or more previous cesarean sections (n = 889) were excluded because these are indications for elective CS in Chile. Observations with missing weight, height, age, blood pressure, or gestational age data were also eliminated (n = 11,523). Next, 400 records with implausible values were eliminated, for example, maternal age below 13 or above 51 years, maternal height below 130 cm or above 200 cm, maternal weight below 30 kg or above 180 kg, and body mass index (BMI) below 14 kg/m2 or above 60 kg/m2. Last, 181 deliveries at a gestational age of 22 weeks or less were eliminated (14). The final database used for the analysis included 83,936 birth records.
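The plausibility screen described above can be sketched as a simple range check. This is a minimal illustration, not the authors' code: the field names and the three toy records are hypothetical, and only the numeric limits come from the text. Values equal to a limit are kept, since the text excludes only values below or above it.

```python
# Plausibility limits quoted in the text; the field names are hypothetical.
PLAUSIBLE_RANGES = {
    "maternal_age_years": (13, 51),
    "height_cm": (130, 200),
    "weight_kg": (30, 180),
    "bmi_kg_m2": (14, 60),
}


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m2 from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def is_plausible(record):
    """Return True if every checked field lies inside its plausible range."""
    return all(
        low <= record[field] <= high
        for field, (low, high) in PLAUSIBLE_RANGES.items()
    )


# Toy records invented for illustration only.
records = [
    {"maternal_age_years": 27, "height_cm": 158, "weight_kg": 64},
    {"maternal_age_years": 12, "height_cm": 150, "weight_kg": 50},  # age below 13
    {"maternal_age_years": 30, "height_cm": 120, "weight_kg": 55},  # height below 130 cm
]
for r in records:
    r["bmi_kg_m2"] = bmi(r["weight_kg"], r["height_cm"])

kept = [r for r in records if is_plausible(r)]
print(len(kept))
```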
In total, 59 potential predictors of EmCS were studied (51 categorical and 8 numerical). Of these, 28 variables were related to the pregnancy period, for example, sociodemographic data (age and education), anthropometric data (pregestational weight, height, and BMI), lifestyle (smoking status and alcohol or illicit drug consumption), and comorbidities (e.g., preeclampsia and gestational diabetes mellitus [GDM]). The other 31 variables were recorded at delivery, such as vital signs at admission (systolic and diastolic blood pressure and axillary temperature), clinical evaluation at admission (onset of labor, contractions, uterine tone, and status of membranes), and variables related to the time at which the delivery occurred (day of the week [weekend vs. weekday] and shift [day or night]). Details can be found in Supplementary Materials 1 and 2.
Data analysis
To study the predictors of EmCS, the data for EmCS were compared with those for spontaneous deliveries and instrumental (forceps) deliveries. A descriptive analysis was performed for each of the 59 potential predictors, calculating the mean and standard deviation (SD) for the 8 numerical variables and proportions for the 51 categorical variables. Subsequently, 2 models were constructed to estimate the probability of having an EmCS as a function of the associated risk factors, each using a different set of potential predictor variables. The first model (PRED1) considered the 28 predelivery-related variables, and the second model (PRED2) included all 59 variables (see Supplementary Materials 1 and 2).
For each model, 5 machine-learning (ML) algorithms were applied: logistic regression, random forest, AdaBoost, XGBoost, and optimal classification tree (14). For all models, the dataset was divided into 2 sets: a training set with 70% of the data (n = 58,755) and a test set with the remaining 30% (n = 25,181).
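As a rough check of the partition sizes, a 70/30 random split of 83,936 records can be sketched with the standard library alone. The function name, the seed, and the use of a plain (unstratified) shuffle are assumptions; the text does not say how the partition was drawn.

```python
import random


def split_indices(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and cut them into training and test parts."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seed is arbitrary, for reproducibility
    n_train = round(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(83_936)
print(len(train_idx), len(test_idx))  # 58755 25181
```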
For the numerical predictors, such as age, height, BMI, and gestational age, we considered all feasible values without using pre-established categories. For example, for BMI, we considered all integer values between 14 and 60 kg/m2 instead of using the usual four categories (underweight, normal, overweight, and obese). Thus, the models could choose the split of each variable that maximized the predictive power for EmCS.
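To illustrate what choosing a split on an uncategorized numeric predictor means for a tree-based learner, the sketch below scans every observed value as a candidate threshold and keeps the one with the lowest weighted Gini impurity. The Gini criterion, the function names, and the toy BMI values and labels are all assumptions made for illustration; the text does not state which splitting criterion each algorithm used.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = EmCS, 0 = other)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    n = len(values)
    best = (None, float("inf"))
    # The largest value is skipped: splitting there leaves the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Toy data invented for illustration only.
bmi_values = [19, 22, 24, 27, 31, 35, 38, 41]
emcs = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_threshold(bmi_values, emcs)
print(threshold, impurity)
```

With pre-established BMI categories, only the category boundaries would be available as thresholds; leaving the variable uncategorized lets the learner test every observed value.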
Because the dataset was imbalanced with respect to the outcome variable (delivery route), with 14,773 (17.6%) EmCS and 69,163 (82.4%) vaginal and forceps deliveries, the synthetic minority oversampling technique (SMOTE) was used. SMOTE balances the class distribution by enlarging the minority class (EmCS in this case): rather than simply replicating existing cases, it generates new synthetic cases by interpolating between an existing minority case and one of its nearest minority-class neighbors (15).
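The interpolation step at the core of SMOTE can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: it handles numeric features only, picks base cases at random, and uses hypothetical names, a hypothetical k, and invented toy points. Categorical predictors would need a different treatment, which the text does not describe.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority points by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority-class points invented for illustration only.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Each synthetic point lies on the line segment between two real minority cases, so the new cases stay within the region already occupied by the minority class.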
Sensitivity, specificity, and accuracy were calculated to evaluate the models. Models with higher sensitivity were preferred because sensitivity identifies the algorithms that best detect an outcome with a low proportion, as EmCS is in this dataset. All statistical analyses were performed in Python, version 3.
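The three metrics follow directly from the confusion matrix. The sketch below uses hypothetical function names and invented toy labels, with 1 denoting EmCS and 0 denoting vaginal or forceps delivery.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def evaluate(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true EmCS that were detected
        "specificity": tn / (tn + fp),  # share of non-EmCS correctly ruled out
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Toy labels invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead under class imbalance: with the proportions reported for this dataset, a classifier that labeled every delivery as vaginal or forceps would reach 82.4% accuracy while having a sensitivity of 0.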
Ethics approval
The present investigation followed the ethical standards of the Declaration of Helsinki. We used anonymized secondary data for our analysis and did not present information that would allow the identification of the study subjects. Therefore, the consent of the participants was not required. The Ethics Committee of the South-East Metropolitan Health Area approved the use of this database.