Aim
We aim to develop multivariable logistic regression prediction models to estimate the risk of late-pregnancy stillbirth from 35 weeks gestation, using a national dataset of all births in Australia (2005-2015), to inform decision-making about the timing of birth for women who reside in Australia.
Study design
This is a protocol for a cross-sectional study using the total population of singleton births from 35 weeks gestation in Australia (2005-2015), derived from the National Perinatal Data Collection (NPDC) (1998-2015) (11, 20). The dataset includes 5,188 stillbirths among 3.1 million births, an estimated rate of 1.7 stillbirths per 1,000 births (11). Multiple pregnancies, births with congenital abnormalities, and births missing gestational age information will be excluded. A congenital abnormality is defined as a stillbirth classified as code 0100 “Congenital Abnormality” under the Perinatal Society of Australia and New Zealand (PSANZ) Perinatal Death Classification System (21). A completed Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist is available in the supplementary materials (Supplementary Table 1).
Sample size
To ensure that a robust prediction model can be developed for each gestational week from 35 weeks, sample size calculations recommended by Riley et al. for stillbirth as a binary outcome are provided to (B1) estimate the overall outcome proportion with sufficient precision, (B2) target a small mean absolute prediction error, (B3) target a shrinkage factor of 0.9, and (B4) target a small optimism of 0.05 in the apparent R2 (22). Based on these criteria, detailed below, the population derived from the NPDC is expected to be sufficient.
The Stata 16.0 command pmsampsize was used for criteria B1, B3, and B4, assuming an anticipated R2 of 0.02, a maximum of 25 parameters (candidate risk factors), and an overall stillbirth proportion of 0.0017, derived from the estimated rate of 1.7 stillbirths per 1,000 births in our study population (22, 23):
pmsampsize, type(b) rsquared(0.02) parameters(25) prevalence(0.0017)
This indicates that at least 24,811 births are required, corresponding to 43 stillbirths (at a prevalence of 0.0017) and 1.69 events per candidate predictor parameter (EPP).
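Criteria B1, B3, and B4 have closed-form expressions, so the pmsampsize calculation can be cross-checked outside Stata. The Python sketch below implements those formulas as we understand them from Riley et al., assuming the anticipated R2 is a Cox-Snell R2 and that B1 uses a 95% confidence interval half-width of 0.05; the function name and defaults are ours, and pmsampsize itself remains the reference implementation. With illustrative inputs (R2 = 0.288, 24 parameters, prevalence 0.174) the binding criterion is B4, at 662 participants.

```python
import math


def riley_binary_sample_size(r2_cs, n_params, prevalence,
                             shrinkage=0.9, optimism=0.05, margin=0.05):
    """Minimum development sample size for a binary outcome, criteria B1, B3, B4 (sketch).

    r2_cs      anticipated Cox-Snell R2 of the model (assumption)
    n_params   number of candidate predictor parameters
    prevalence anticipated overall outcome proportion
    """
    p = prevalence

    # B1: estimate the overall outcome proportion within +/- margin (95% CI)
    n_b1 = (1.96 / margin) ** 2 * p * (1 - p)

    # B3: expected uniform shrinkage factor of at least `shrinkage`
    n_b3 = n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

    # B4: small optimism in the apparent R2. This needs the maximum attainable
    # Cox-Snell R2, which depends only on the prevalence through the
    # null-model log-likelihood per birth.
    ln_null_per_birth = p * math.log(p) + (1 - p) * math.log(1 - p)
    max_r2_cs = 1 - math.exp(2 * ln_null_per_birth)
    s_b4 = r2_cs / (r2_cs + optimism * max_r2_cs)
    n_b4 = n_params / ((s_b4 - 1) * math.log(1 - r2_cs / s_b4))

    sizes = {"B1": math.ceil(n_b1), "B3": math.ceil(n_b3), "B4": math.ceil(n_b4)}
    sizes["minimum"] = max(sizes.values())  # the largest requirement is binding
    return sizes
```

To cross-check this protocol's calculation, call riley_binary_sample_size(0.02, 25, 0.0017) and compare each criterion with the pmsampsize output reported above.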
For criterion B2, we applied the mean absolute prediction error (MAPE) formula for the anticipated outcome proportion (0.0017) and 25 candidate predictor parameters. A target MAPE of 0.05 requires at least 92 participants in the development dataset, and a target MAPE of 0.02 requires 494 participants.
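Criterion B2 can be checked by solving the MAPE approximation for the sample size n. The sketch below assumes the approximation quoted by Riley et al. takes the form MAPE = exp(-0.508 + 0.259 ln(phi) + 0.504 ln(P) - 0.544 ln(n)), where phi is the outcome proportion and P the number of candidate predictor parameters; mape_sample_size is a hypothetical helper name.

```python
import math


def mape_sample_size(prevalence, n_params, target_mape):
    """Participants needed for a target mean absolute prediction error (sketch).

    Inverts the assumed approximation
        MAPE = exp(-0.508 + 0.259*ln(phi) + 0.504*ln(P) - 0.544*ln(n))
    for n, where phi is the outcome proportion and P the number of
    candidate predictor parameters.
    """
    log_n = (-0.508 + 0.259 * math.log(prevalence)
             + 0.504 * math.log(n_params) - math.log(target_mape)) / 0.544
    return math.exp(log_n)


for mape in (0.05, 0.02):
    n = mape_sample_size(0.0017, 25, mape)
    print(f"MAPE {mape}: {n:.1f} participants (nearest whole number: {round(n)})")
```

With phi = 0.0017 and P = 25 this gives about 91.7 participants at a MAPE of 0.05 and about 494.4 at a MAPE of 0.02. Rounding to the nearest whole participant gives the 92 and 494 reported above; rounding up, the more conservative convention for a minimum sample size, would give 92 and 495.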
Data source
All births with gestational age information from 35 weeks gestation in Australia (2005-2015) will be included. Data will be made available via the AIHW Maternal and Perinatal Health Unit. Further information on available data items and reporting can be found in the supplementary materials (Supplementary Table 2). The NPDC is a national, population-based, cross-sectional collection of data on all pregnancies and births, established in 1991 (24). All births in Australia's six states and two territories are reported to the NPDC: Queensland (QLD), New South Wales (NSW), the Australian Capital Territory (ACT), Victoria (VIC), South Australia (SA), Tasmania (TAS), Western Australia (WA), and the Northern Territory (NT) (Table 1). Perinatal data are collected for each birth in each state and territory, usually by midwives and other birth attendants (11). The data are collated by the relevant state or territory health department, and a standard de-identified extract is provided to the AIHW annually to form the NPDC (11). The PSANZ defines stillbirths in Australia as fetal deaths with a gestational age of at least 20 weeks or a birthweight of at least 400 grams, except in Victoria and Western Australia, where births are included if gestational age is at least 20 weeks or, if gestation is unknown, birthweight is at least 400 grams (11, 21).
Table 1. All births in Australia from 35 weeks gestation, 2005-2015.
Jurisdiction | Total (n) | Stillbirths (n) | Livebirths (n) | Stillbirth rate (per 1,000 births)
NSW | 1,021,491 | 1,758 | 1,019,289 | 1.7
VIC | 716,145 | 1,161 | 714,486 | 1.6
QLD | 645,416 | 1,087 | 643,888 | 1.7
WA | 336,532 | 517 | 335,808 | 1.5
SA | 209,873 | 320 | 209,351 | 1.5
TAS | 64,418 | 112 | 64,279 | 1.7
ACT | 61,751 | 138 | 61,580 | 2.2
NT | 40,723 | 95 | 40,592 | 2.3
Overall | 3,096,349 | 5,188 | 3,089,273 | 1.7
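Each rate in Table 1 is the number of stillbirths per 1,000 births in the Total (n) column. The Python check below transcribes Table 1 into a dictionary (TABLE_1 and summarise are illustrative names) and recomputes the jurisdiction and national rates and totals, which guards against transcription errors if the table is regenerated.

```python
# Counts transcribed from Table 1: jurisdiction -> (total births, stillbirths)
TABLE_1 = {
    "NSW": (1_021_491, 1_758),
    "VIC": (716_145, 1_161),
    "QLD": (645_416, 1_087),
    "WA": (336_532, 517),
    "SA": (209_873, 320),
    "TAS": (64_418, 112),
    "ACT": (61_751, 138),
    "NT": (40_723, 95),
}


def rate_per_1000(stillbirths, total_births):
    """Stillbirths per 1,000 births in the Total (n) column."""
    return 1000 * stillbirths / total_births


def summarise(table):
    """National totals plus rates rounded to one decimal place, as in Table 1."""
    rates = {j: round(rate_per_1000(sb, n), 1) for j, (n, sb) in table.items()}
    total = sum(n for n, _ in table.values())
    stillbirths = sum(sb for _, sb in table.values())
    rates["Overall"] = round(rate_per_1000(stillbirths, total), 1)
    return total, stillbirths, rates


print(summarise(TABLE_1))
```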
|
Model development
Established characteristics and conditions associated with an increased risk of stillbirth will be considered as candidate predictors (16, 25-27). The predictor selection process is illustrated in Figure 1. Reference group coding will be informed by the literature and existing reporting recommendations. Frequencies and percentages will be presented for categorical variables and for all missing data (handling of missing data is described below). For normally distributed continuous variables, the mean and standard deviation will be reported; for continuous variables with skewed distributions, the median and interquartile range (IQR) will be reported. Minimum and maximum values will be presented for all continuous variables. Where clinically appropriate and statistically justifiable, continuous predictors will be categorised according to published guidelines and recommendations (11, 28).
Univariable logistic regression models will first be developed for all gestations to explore individual prognostic factors for inclusion in a multivariable logistic regression model, where the outcome (stillbirth) is binary and the prognostic factors are either continuous or categorical. Before the final multivariable model is fitted, variance inflation factors (VIFs) will be calculated to identify collinearity, where a VIF <5 indicates low correlation, a VIF of 5-10 indicates high correlation, and a VIF above 10 indicates multicollinearity (29). Candidate predictors with a VIF ≥5 will be reviewed through clinical consultation to decide which of them to include in the final model. Backward stepwise elimination will then be applied to the multivariable logistic regression model to remove factors with p-values greater than 0.100, in line with Akaike's Information Criterion (AIC) (30). Finally, the risk prediction model will be fitted and fully validated for each gestational week from 35 weeks (seven models in total: 35, 36, 37, 38, 39, 40, and 41+ weeks).
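The VIF for predictor j is 1/(1 - R2_j), where R2_j comes from regressing predictor j on all other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The dependency-free Python sketch below uses that identity and flags predictors at the protocol's review threshold of VIF ≥5. It assumes predictors are already numerically coded (categorical factors as dummy columns, for which a generalized VIF may be preferable); in Stata the same quantity is available from estat vif after regress.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    if any(norm == 0 for norm in norms):
        raise ValueError("a predictor has zero variance")
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def flag_for_review(vifs, names, threshold=5.0):
    """Predictors at or above the clinical-review threshold (VIF >= 5)."""
    return [name for name, vif in zip(names, vifs) if vif >= threshold]
```

For two predictors with correlation r, both VIFs equal 1/(1 - r^2), so a correlation of 0.8 gives a VIF of about 2.8, below the review threshold.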
Missing data
Missing data for predictors are most likely to result from failed reporting for all births in specific years by particular jurisdictions (see Supplementary Table 2 for comments on missing data). Data-years with incomplete reporting of a candidate predictor may be excluded if missing data for that predictor exceed 15% of the total population (31, 32). To assess potential bias from missing data unrelated to reporting issues, a sensitivity analysis will explore best-case and worst-case scenarios by replacing missing values with 'best' and 'worst' values respectively (33). Multiple imputation will be considered for predictors with more than 15% missing values (32, 33). No births will be excluded because of missing candidate predictor data, except those missing gestational age information.
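The missing-data rules above can be sketched in a few lines of Python. The record layout (one dictionary per birth with a "year" key, None marking a missing value) and the example predictor are hypothetical; the 15% threshold is the protocol's. The best-case and worst-case copies would each be used to refit the model so that its performance can be compared with the primary analysis.

```python
def missing_share_by_year(records, predictor, year_key="year"):
    """Proportion of births in each data-year with a missing (None) value for predictor."""
    totals, missing = {}, {}
    for rec in records:
        year = rec[year_key]
        totals[year] = totals.get(year, 0) + 1
        missing[year] = missing.get(year, 0) + int(rec.get(predictor) is None)
    return {year: missing[year] / totals[year] for year in totals}


def years_exceeding(shares, threshold=0.15):
    """Data-years whose missing proportion exceeds the 15% threshold."""
    return sorted(year for year, share in shares.items() if share > threshold)


def best_worst_cases(values, best, worst):
    """Two completed copies of one predictor for the sensitivity analysis:
    missing values set to the lowest-risk ('best') and the highest-risk
    ('worst') category respectively."""
    best_case = [best if v is None else v for v in values]
    worst_case = [worst if v is None else v for v in values]
    return best_case, worst_case


# Hypothetical example: a smoking indicator with 'yes' as the higher-risk category
births = [
    {"year": 2005, "smoking": None},
    {"year": 2005, "smoking": "no"},
    {"year": 2006, "smoking": "yes"},
    {"year": 2006, "smoking": "no"},
]
shares = missing_share_by_year(births, "smoking")
print(shares, years_exceeding(shares))
```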
Validation
Final gestation-specific models will be subject to internal validation and temporal external validation. Population characteristics and performance measures will be reported for each individual model (34). Internal validation will be performed using bootstrapping with 1,000 repetitions (35), and summary stillbirth rates will be reported for the bootstrap samples. Final models will be externally validated using data from study years not used for model development (36).
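The bootstrap variant is not specified above; one common choice, assumed here, is Harrell's optimism correction: refit the entire modelling procedure (including predictor selection) in each bootstrap sample, measure how much better it performs in that sample than in the original data, and subtract the average optimism from the apparent performance. In the Python sketch below, fit and score are hypothetical callables supplied by the analyst (score might return the C statistic), and the default of 1,000 repetitions follows the protocol.

```python
import random
import statistics


def optimism_corrected(data, fit, score, n_boot=1000, seed=2015):
    """Optimism-corrected performance via the bootstrap (Harrell's approach, assumed).

    data  : list of records, one per birth
    fit   : hypothetical callable, list of records -> fitted model; it should
            repeat the whole modelling procedure, including predictor selection
    score : hypothetical callable, (model, records) -> performance, higher is better
    """
    rng = random.Random(seed)  # fixed seed so the validation is reproducible
    apparent = score(fit(data), data)
    optimism = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample births with replacement
        model = fit(sample)
        # performance in the bootstrap sample minus performance in the original data
        optimism.append(score(model, sample) - score(model, data))
    return apparent - statistics.mean(optimism)
```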
Model performance
Performance in the development and validation datasets will be assessed in terms of overall performance (R2), calibration, and discrimination, and clinical performance will be assessed through positive predictive value (PPV) and negative predictive value (NPV). PPV and NPV will be calculated at a fixed false-positive rate of 10% (37).
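A fixed false-positive rate means the risk cut-off is chosen from the distribution of predicted risk among live births, so that 10% of them are classified as high risk; PPV and NPV then follow from the resulting 2 × 2 table. The Python sketch below assumes that a birth screens positive when its predicted risk strictly exceeds the cut-off, so ties at the cut-off can make the achieved false-positive rate slightly lower than the target. At a stillbirth prevalence of 0.0017, PPV = sens × 0.0017 / (sens × 0.0017 + 0.10 × 0.9983) cannot exceed about 1.7% even with perfect sensitivity, which is worth keeping in mind when interpreting it.

```python
import math


def metrics_at_fixed_fpr(risks, outcomes, fpr=0.10):
    """Sensitivity, PPV and NPV when the cut-off is set so that at most
    `fpr` of non-events (live births) are classified as high risk."""
    if not 0 <= fpr < 1:
        raise ValueError("fpr must be in [0, 1)")
    non_event_risks = sorted((r for r, y in zip(risks, outcomes) if y == 0), reverse=True)
    allowed_fp = math.floor(fpr * len(non_event_risks))
    threshold = non_event_risks[allowed_fp]  # births with risk above this screen positive
    tp = fp = fn = tn = 0
    for r, y in zip(risks, outcomes):
        positive = r > threshold
        if y == 1 and positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "threshold": threshold,
        "fpr": fp / (fp + tn),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }
```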
Calibration characterizes agreement between predicted (expected) risk and observed risk and will be reported using a calibration plot (38). A calibration intercept of zero and a ratio of observed to expected events of one (O/E = 1) indicate ideal calibration (39). Calibration plots will include 95% confidence intervals to indicate the degree of agreement between observed outcomes and predictions.
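Both summary measures named above can be computed directly. O/E is the number of observed stillbirths divided by the sum of predicted risks, and the calibration intercept (calibration-in-the-large) is the intercept a in logit P(stillbirth) = a + logit(predicted risk) with the slope fixed at 1, found here by Newton-Raphson. The Python sketch also returns grouped points (mean predicted risk versus observed proportion) for a basic calibration plot; the default of 10 risk-ordered groups is an assumption, and the confidence intervals described above are not included.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def observed_expected_ratio(risks, outcomes):
    """O/E: observed stillbirths divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


def calibration_in_the_large(risks, outcomes, tol=1e-10, max_iter=50):
    """Intercept a in logit(P(y=1)) = a + logit(risk), slope fixed at 1.

    Solved by Newton-Raphson on the one-parameter log-likelihood;
    a = 0 means the mean predicted risk matches the observed proportion."""
    lp = [logit(r) for r in risks]
    a = 0.0
    for _ in range(max_iter):
        mu = [expit(a + x) for x in lp]
        gradient = sum(outcomes) - sum(mu)
        information = sum(m * (1 - m) for m in mu)
        step = gradient / information
        a += step
        if abs(step) < tol:
            break
    return a


def grouped_calibration(risks, outcomes, groups=10):
    """(mean predicted risk, observed proportion) for each risk-ordered group."""
    pairs = sorted(zip(risks, outcomes))
    size = len(pairs) / groups
    points = []
    for g in range(groups):
        chunk = pairs[round(g * size):round((g + 1) * size)]
        if chunk:
            points.append((sum(r for r, _ in chunk) / len(chunk),
                           sum(y for _, y in chunk) / len(chunk)))
    return points
```

For example, if every birth is given a predicted risk of 0.1 but one in four is a stillbirth, O/E is 2.5 and the intercept is logit(0.25) - logit(0.1) = ln 3, about 1.10, signalling systematic underprediction.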
Discrimination is the model's ability to distinguish stillbirths from live births and will be measured using the C statistic and the receiver operating characteristic (ROC) curve. A ROC curve assesses the performance of a binary classifier: it plots sensitivity (true positive rate) against 1 - specificity (false positive rate), with each point on the curve corresponding to a different cut-off for classifying a birth as positive (40). The area under the ROC curve (AUC), which equals the C statistic for a binary outcome, will further quantify the performance of each model. The AUC ranges from 0.0 to 1.0, where 0.5 indicates discrimination no better than chance (a 'coin flip'), 0.0 indicates perfectly inverted classification, and 1.0 indicates perfect discrimination (41). A non-parametric comparison of AUCs across the individual gestational age models will be performed using the Mann-Whitney U statistic (26).
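For a binary outcome the C statistic equals the Mann-Whitney U statistic for the predicted risks of stillbirths versus live births, divided by the number of stillbirth/live-birth pairs. The Python sketch below computes it from midranks, so tied predictions count as half a concordant pair. It covers a single model only; a formal test comparing AUCs between models (for example, Stata's roccomp) is not implemented here.

```python
def c_statistic(risks, outcomes):
    """C statistic (AUC) via the Mann-Whitney U statistic with midranks for ties."""
    n = len(risks)
    order = sorted(range(n), key=lambda i: risks[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied predicted risks
        while j + 1 < n and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_events = sum(outcomes)
    n_non_events = n - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one stillbirth and one live birth")
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    u = rank_sum - n_events * (n_events + 1) / 2  # Mann-Whitney U for events
    return u / (n_events * n_non_events)
```

For predicted risks [0.1, 0.4, 0.35, 0.8] with outcomes [0, 0, 1, 1], three of the four stillbirth/live-birth pairs are correctly ordered, so the C statistic is 0.75.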
In addition to calibration and discrimination, PPV and NPV will be reported to characterize clinical usefulness. A decision curve analysis will be considered to characterize potential decision thresholds (42).
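Decision curve analysis plots net benefit against the threshold probability pt at which a clinician would act (for example, by discussing earlier planned birth; this interpretation is illustrative). Net benefit is TP/n - (FP/n) × pt/(1 - pt), and it is compared with 'treat all', prevalence - (1 - prevalence) × pt/(1 - pt), and 'treat none', 0. A minimal Python sketch, assuming a birth is classified as high risk when its predicted risk is at least pt:

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of classifying births with risk >= threshold as high risk."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - fp / n * odds


def decision_curve(risks, outcomes, thresholds):
    """Rows of (threshold, model, treat all, treat none) net benefit."""
    prevalence = sum(outcomes) / len(outcomes)
    rows = []
    for t in thresholds:
        treat_all = prevalence - (1 - prevalence) * t / (1 - t)
        rows.append((t, net_benefit(risks, outcomes, t), treat_all, 0.0))
    return rows
```

Thresholds for stillbirth would sit far below 0.5 in practice, so the curve would typically be evaluated over a narrow range of low probabilities chosen with clinicians.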