Data
We obtained birth records for ZIP code catchment areas within four states across Central Appalachia, provided by departments of health in Kentucky (KY), Tennessee (TN), Virginia (VA), and West Virginia (WV) (Figure 1). This dataset is described in detail in (16); briefly, a total of 409,394 birth records were obtained from departments of health and geocoded to street address of reported maternal address. Due to missingness in street addresses with many records reporting only mailing addresses (rural route or P.O. box), particularly for earlier years of the dataset, this resulted in a final dataset comprising 194,084 births between 1990 and 2015 with street-level data (16). Yearly surface mining land cover areas were obtained from Marston and Kolivras (17) for the years 1986 through 2015 using 30 m resolution Landsat remote sensing imagery. This dataset uses remote sensing technology and satellite imagery to quantify changes in the extent of surface mining areas over the corresponding years. Specifically, for each year, barren land on which vegetation had been removed was identified, and pixels that were tied to other types of land disturbance, such as clear cutting of timber, were excluded. Remaining barren pixels lying within areas permitted for mining within the USGS-defined Appalachian coalfields were designated as places where active mining was likely taking place. Active surface mining was delineated from pre- and post-mining areas through classification of vegetation, where actively mined areas tend to be barren while post-mining areas tend to show some revegetation through reclamation efforts. Validation with randomly placed points on 1 m resolution aerial photography found an overall accuracy of 88%. Lastly, mined areas <40 acres in size were removed from the analysis, as the Office of Surface Mining Reclamation and Enforcement reported that economically viable mines are generally at least 40 acres in size (18).
Figure 1. Study area (red) and surface mining coal production (thousand tons) at the county level for 1993 (top) and 2015 (bottom). Coal production data for 2015 and 1993 were obtained from (19) and (20)
Airsheds for each of the active surface mine in each year of analysis were estimated using the HYbrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT4) atmospheric trajectory model (21), which uses meteorological data to compute atmospheric trajectories, particle dispersion, and air concentrations. Detailed methods using HYSPLIT4 to characterize airsheds of surface mines are described in further detail in McKnight et al. (22). Briefly, we modeled individual airsheds of each active surface mine in the study area. The results of this process were raster data sets comprised of frequency values relating to air movement from surface mines per gridded cell. To quantify each birth record’s total exposure from all mines, we summed the frequency values of all airsheds extracted at each birth record’s location for the cumulative amount of air from surface mines experienced at the maternal residential address. Lastly, watersheds were classified using 10-digit hydrologic unit codes (HUC10) within the United States Geological Survey’s Watershed Boundary Dataset, which represent the areal extent of surface water drainage using an aggregate collection of hydrologic unit data (23) and amount of active surface mining within watersheds for each year of the study period was calculated.
Analysis
We employed mediation analyses to explore the potential exposure pathways that could explain associations between living in close proximity to active surface mining activities and increased odds of preterm birth and low birthweight described in (16), accounting for pre-mining differences and additional individual-level covariates available on birth records. Mediation analyses are helpful in exploring the underlying mechanisms underpinning a known relationship between an exposure and outcome (24). Generally, mediation is suggested when four criteria are met, as outlined in Table 1 (25).
Table 1
Mediation criteria, adapted from (25)
Criteria Assessed
|
Statistical Test(s) Used
|
1
|
Independent variable significantly influences the mediator
|
Unadjusted regression model
|
Sobel test
|
2
|
Independent variable significantly influences the dependent variable (in absence of the mediator)
|
Adjusted regression model
|
3
|
Mediator significantly and uniquely influences the dependent variable
|
4
|
The effect of the independent variable on the dependent variable shrinks with addition of the mediator
|
|
Our independent variable of interest was the percent of land within a 5km radius of the maternal residential address that was designated as an active surface mine during the majority year of gestation. Our dependent variables of interest included 1) preterm birth (PTB), defined as birth before 37 weeks of gestation; 2) low birthweight (LBW), defined as birthweight less than 2,500 grams; and 3) term low birthweight (TLBW), defined as birth occurring at ≥ 37 weeks gestation and birthweight less than 2,500 grams. Using these criteria, we tested whether mediation between proximity to active surface mining activities during the year containing the majority of the pregnancy (majority gestation year) and these adverse birth outcomes occurred via the airshed pathway or the watershed pathway. We quantified these potential mediators (Figure 2), respectively, as: 1) the cumulative potential exposure to air pollutants via the airshed experienced at the maternal residential address during the majority gestation year, and 2) the percent of land experiencing active surface mining within the watershed of residency during the majority gestation year. The cumulative potential exposure to airborne pollutants via the airshed is outlined in further detail within Mcknight et al. (22); briefly, these values represent the cumulative frequency of air originating from active surface mines, as modeled via the HYSPLIT4 atmospheric trajectory model. Residential addresses associated with higher values of this airshed model have higher potential exposure to surface mining air pollution, including fine particulate matter.
Figure 2. Directed acyclic graph outlining potential mediators of the association between surface mining and adverse birth outcomes.
We firstly employed a Sobel test using the bda package in R software to quantify whether mediators significantly influenced the relationship between the independent and dependent variables (26, 27). We further performed a generalized linear model regression analysis using the stats package within the base R software. Covariates within our model included categorical variables for maternal age(‘18 - 35 years’,‘<18 years’,‘> 35 years’), highest education attained by the mother at the child’s birth (‘8th grade or less’, ‘9th - 12th grade (includes high school graduates)’, ‘Post high-school education (with or without degree)’), race (‘White’, ’Black’, ’Other’), ethnicity (‘Hispanic’, ‘Not Hispanic’), self-reported tobacco use during pregnancy (‘Yes’,’No’), sex of the child (‘Male’, ‘Female’), payment type for birth medical services (‘Medicaid’, ‘Private Insurance’, ‘Self-Pay’, ‘Other’), state (‘Kentucky’, ‘Tennessee’, ‘Virginia’, and ‘West Virginia’), and continuous percent of land within 5 km of maternal residence that was not actively experiencing mining activities during the majority gestation year, but would be subsequently mined in later years (referred to as ‘pre-mining’ activities), to account for any temporal baseline difference prior to active mining. Because mining activities tend to show spatial autocorrelation (e.g., active mining tends to move progressively across the landscape and is therefore closely correlated with pre-mining landcover), we further included an interaction term between the amount of pre-mined land and surface mining land within a 5 km buffer of maternal residence. Lastly, to allow for non-linear temporal trends observed within the data, we included a spline with 4 degrees of freedom in the year covariate using the splines package within R software. This model builds on the model detailed in (16), which can generally be specified as:
where logit(
yi) represents the odds of an adverse outcome (PTB, LBW, and TLBW) occurring for a given woman’s birth,
i; Mi represents the % of pre-mining land area within a 5km buffer of the maternal residence of individual
i; Ti represents the % of surface mining area within a 5km buffer of the maternal residence of individual
represents the interaction effect of having land within 5 km of the maternal residence of individual,
i, that is both likely actively being mined (active mining) and will subsequently be mined (pre-mining);
bs(t) represents a non-linear spline with 4 degrees of freedom to fit temporal trends in the data; and
represents the suite of fixed covariates
K as outlined above (
16).
To explore hypothesized mediation pathways, we tested three model types which iteratively built on each other (Figure 3) (27). Our base adjusted model (Model 0) included the suite of fixed effects described above, including the independent variable of interest (e.g., % surface mining activities in a 5 km buffer). Model 1 included the same suite of variables as Model 0, with the addition of the airshed mediator variable, while Model 2 further included both the airshed and watershed mediators. We report model performance metrics, including McFadden’s pseudo-R2 statistic, Akaike Information Criterion (AIC), and mean adjusted error (MAE) and root-mean squared error (RMSE) measures. AIC represents a measure of model predictive power as a trade-off with model complexity, with lower values generally representing models with better fit (28). MAE and RMSE represent model precision and bias, with values closer to zero representing better model fit, while pseudo-R2 values represent variance explained by the model, with values between 0.2 – 0.4 representing reasonable model fit across a range of applications (29–32). Lastly, we converted regression results to probabilities using the following equation:
Figure 3. Model diagrams outlining the iterative model building process. Iterative models were built for each dependent variable (e.g., PTB, LBW, and TLBW).