Study population
The present study was conducted using a sub-sample of 523 breast cancer cases and 523 matched controls from the XENAIR study (35), for whom biomarker measurements were available. XENAIR is a case-control study nested within the national E3N-Generations cohort that included 5,222 cases of invasive breast cancer and 5,222 matched controls followed from 1990 (baseline) to 2011 (17, 20). As described in our previous studies, controls were randomly selected by incidence density sampling from women who were free of breast cancer, and were matched to cases on age, date, menopausal status, residential area, and blood sample (35). The flowchart of participant selection is provided in Supplementary Fig. 1.
The E3N-Generations prospective study, an ongoing French cohort study, was established as an extension of the E3N cohort of women (Etude Epidémiologique auprès des femmes de la Mutuelle Générale de l’Education Nationale). It now also includes the E3N women’s children and their fathers and, in the future, will include their grandchildren.
The E3N cohort (Generation 1) was started in 1990 to investigate the key risk factors for cancer and chronic diseases among women (36). At recruitment (1990–1991), a total of 98,995 French women aged 40 to 65 years and insured with MGEN (a national health insurance scheme covering primarily teachers) were enrolled. Participants completed self-administered questionnaires that collected data on socio-demographic characteristics, lifestyle, reproductive factors, anthropometry, past medical history, and family history of cancer. The residential addresses of the cohort participants were collected at baseline and at each of the thirteen follow-up questionnaires. Self-reported cases were validated through the retrieval of medical records from treating physicians, with pathological confirmation received for 93% of cases. The study was approved by the French National Commission for Data Protection and Privacy (CNIL), and informed consent was obtained from each participant.
Pollutant exposure assessment
As previously described (14), long-term exposure to the three pollutants (NO2, BaP and PCB153) was estimated at the participants’ residential addresses using two models, chosen according to the availability of measurement and emission data for each pollutant over the study period (1990–2011).
BaP and PCB153 concentrations were estimated using the chemistry-transport model “CHIMERE”. This model, with a spatial resolution of 0.125° × 0.0625° (around 7 × 7 km), simulates pollutant transport from local to continental scales. It takes data such as emissions, meteorological fields, and boundary conditions as inputs and solves a set of equations reflecting the physicochemical processes that govern the evolution of concentrations (37). CHIMERE accounts for the main directly emitted particles, whether anthropogenic or natural, and models the concentrations of particles with aerodynamic diameters ranging from a few nanometers to 10 µm (37).
NO2 levels were estimated using a land use regression (LUR) model, a widely used approach to model and predict spatial variations in air pollution concentrations (38, 39). LUR models employ proximity measures, such as circular buffers of different sizes, to capture the geographical features that explain variability in monitored concentrations at specific locations (i.e. monitoring sites or addresses) (40, 41). In the present study, a LUR model with a spatial resolution of 50 × 50 m was developed using the average annual NO2 data for the period 2010 to 2012 (14). This “baseline” model further incorporated inputs from COPERNIC (a chemical transport model providing NO2 background concentrations across France) and localised variables related to road traffic and land use, available throughout the country (41, 42). The model was validated against measurements across France using a hold-out approach with independent monitoring sites. The LUR model was then retrospectively extrapolated back to 1990 using annual local trends derived from the CHIMERE model (43).
For each woman, annual mean concentrations of NO2, BaP and PCB153 were estimated at her geocoded residential addresses for each year from 1990 to 2011. For each pollutant, these annual means were then averaged from the year the woman entered the cohort until her index date (the date of breast cancer diagnosis for cases and the date of selection for controls).
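The averaging step described above can be illustrated with a minimal sketch. It assumes calendar-year granularity and an averaging window that includes both the entry year and the index year; the function name `mean_exposure` and the concentration values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def mean_exposure(annual_means, entry_year, index_year):
    """Average a pollutant's annual mean concentrations over the
    window entry_year..index_year (both ends included, an assumption).

    annual_means maps calendar year -> annual mean concentration at the
    woman's geocoded residential address for that year.
    """
    if entry_year > index_year:
        raise ValueError("entry_year must not be later than index_year")
    years = range(entry_year, index_year + 1)
    missing = [y for y in years if y not in annual_means]
    if missing:
        raise KeyError(f"no annual estimate for years: {missing}")
    return fmean(annual_means[y] for y in years)


# Hypothetical NO2 annual means (µg/m3) for one participant.
no2 = {1990: 30.0, 1991: 28.0, 1992: 26.0, 1993: 24.0}
average_no2 = mean_exposure(no2, entry_year=1990, index_year=1992)
```

Raising an error on a missing year, rather than silently averaging over fewer years, is a design choice of this sketch; the study's handling of gaps in address history is not described here.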
Metabolic health biomarker assays
The biomarkers investigated in this study were chosen based on their previously established individual associations with breast cancer risk and air pollutants (21, 22, 24, 27, 29, 30, 44). These included pre-diagnostic circulating levels of albumin (g/L), C-reactive protein (CRP) (mg/L), triglycerides (mmol/L), cholesterol (mmol/L), high-density lipoprotein cholesterol (HDL) (mmol/L), low-density lipoprotein cholesterol (LDL) (mmol/L), parathormone (PTH) (pg/mL), thyroid-stimulating hormone (mIU/L), prolactin (mIU/L), estradiol (pmol/L), testosterone (nmol/L), sex hormone-binding globulin (SHBG) (nmol/L) and progesterone (nmol/L).
Albumin and CRP were quantified by bromocresol green (BCG) analysis and immunoturbidimetric-high sensitivity analysis, respectively, using a Hitachi 911 analyzer (Roche Diagnostics, US) (45). Using a modular analyzer (Roche Diagnostics, US), triglycerides, cholesterol, HDL, and LDL were quantified employing enzyme immune-inhibition analysis (45). PTH, thyroid-stimulating hormone, prolactin, estradiol, testosterone, SHBG and progesterone were quantified by electrochemiluminescence immunoassay (ECLIA) method using the Elecsys analyzer (Roche Diagnostics, US) (45).
Statistical analysis
The main characteristics of the population and biomarker levels were described separately for cases and controls, using means, standard deviations (SDs), percentiles, and minimum and maximum values for continuous variables, and counts and percentages for qualitative variables. Pearson correlation coefficients were computed to assess correlations between biomarkers. The linearity of the pollutant-cancer and mediator-cancer associations was checked using restricted cubic splines with four degrees of freedom (46). Conditional logistic regression was used to calculate odds ratios (ORs) and their corresponding 95% confidence intervals (CIs) for the associations between exposure to each pollutant and the risk of breast cancer. Pollutants were modelled both as continuous variables (per one SD increase) and as categorical variables (quartiles). Linear regression was used to estimate the associations between each pollutant level and each biomarker of metabolic health, with adjustment for confounders. The effect of each biomarker on breast cancer risk (per one SD increase) was estimated using conditional logistic regression.
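The two exposure codings mentioned above (per one SD increase, and quartiles) can be sketched as follows. This is an illustration only: the text does not state which distribution the SD and the quartile cut-points were computed from, so the sketch simply uses all supplied values, and the helper names `per_sd` and `quartile_category` are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles, stdev


def per_sd(values):
    """Rescale values so that a one-unit change equals one sample SD.

    Centering is omitted because it does not change a regression slope.
    """
    sd = stdev(values)
    return [v / sd for v in values]


def quartile_category(values):
    """Assign each value to quartile 1..4 using cut-points computed from
    the same values (an assumption of this sketch)."""
    cuts = quantiles(values, n=4, method="inclusive")  # three cut-points
    # A value equal to a cut-point is placed in the lower category.
    return [bisect_left(cuts, v) + 1 for v in values]


# Hypothetical pollutant concentrations.
conc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
scaled = per_sd(conc)
groups = quartile_category(conc)
```

With the scaled variable entered in a logistic model, the exponentiated coefficient is read as the OR per one SD increase in exposure.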
A four-way decomposition mediation analysis was performed to assess whether the associations between atmospheric pollutants and breast cancer risk were mediated by the selected biomarkers (34). Data on n individuals were assumed to be independent and identically distributed observations of (C, X, M, Y), with Y the binary outcome of interest, X the exposure, M a continuous mediator measured after X but before Y, and C the pre-exposure confounders of the effects of (X, M) on Y (Fig. 1). The four-way decomposition assumes that, after adjustment for the potential confounders, there is no unmeasured confounding of the exposure-outcome or exposure-mediator relationships, and that no confounder of the mediator-outcome relationship is affected by the exposure (post-exposure confounders) (47). This approach makes it possible to estimate the controlled direct effect (CDE), the reference interaction effect (INTref), the mediated interaction effect (INTmed) and the pure indirect effect (PIE) (Fig. 1), assuming the following regression models:
$$\mathrm{logit}\left\{\Pr\left(Y=1\mid X=x,M=m,C=c\right)\right\}=\theta_{0}+\theta_{1}x+\theta_{2}m+\theta_{3}xm+\theta_{4}^{\prime}c\tag{1}$$
and
$$E\left[M\mid X=x,C=c\right]=\beta_{0}+\beta_{1}x+\beta_{2}^{\prime}c\tag{2}$$
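VanderWeele and Vansteelandt derived expressions for the TE, the CDE and the PIE, all on the risk ratio scale. In these expressions, x* and x denote the reference and comparison levels of the exposure, m* the level at which the mediator is fixed, and σ² the residual variance of the mediator model (2). The total effect (TE) is given by: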
$$RR_{c}^{TE}=\exp\left[\left\{\theta_{1}+\theta_{2}\beta_{1}+\theta_{3}\left(\beta_{0}+\beta_{1}x^{*}+\beta_{1}x+\beta_{2}^{\prime}c+\theta_{2}\sigma^{2}\right)\right\}\left(x-x^{*}\right)+\frac{1}{2}\theta_{3}^{2}\sigma^{2}\left(x^{2}-x^{*2}\right)\right]\tag{3}$$
The controlled direct effect is given by:
$$RR_{c}^{CDE}\left(m^{*}\right)=\exp\left[\left(\theta_{1}+\theta_{3}m^{*}\right)\left(x-x^{*}\right)\right]\tag{4}$$
The reference interaction is given by:
$$RR_{c}^{INT_{ref}}\left(m^{*}\right)=\int\left\{\frac{E\left[x,m,c\right]}{E\left[x^{*},m^{*},c\right]}-\frac{E\left[x^{*},m,c\right]}{E\left[x^{*},m^{*},c\right]}-\frac{E\left[x,m^{*},c\right]}{E\left[x^{*},m^{*},c\right]}+1\right\}dP\left(m\mid x^{*},c\right)\tag{5}$$
The mediated interaction is given by:
$$RR_{c}^{INT_{med}}=\int\left\{\frac{E\left[x,m,c\right]}{E\left[x^{*},m^{*},c\right]}-\frac{E\left[x^{*},m,c\right]}{E\left[x^{*},m^{*},c\right]}\right\}\left\{dP\left(m\mid x,c\right)-dP\left(m\mid x^{*},c\right)\right\}\tag{6}$$
The pure indirect effect is given by:
$$RR_{c}^{PIE}=\exp\left[\left(\theta_{2}\beta_{1}+\theta_{3}\beta_{1}x^{*}\right)\left(x-x^{*}\right)\right]\tag{7}$$
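As an illustration of how the closed-form expressions (4) and (7) turn fitted coefficients into effect estimates, the following minimal sketch evaluates the CDE and the PIE for a given exposure contrast. The function names and the coefficient values are hypothetical; in practice the θ and β coefficients would come from fitted models (1) and (2), and confidence intervals would require a further step that is not shown.

```python
import math


def rr_cde(theta1, theta3, m_star, x, x_star):
    """Controlled direct effect, equation (4): the mediator is held at
    m_star and the exposure is moved from x_star to x."""
    return math.exp((theta1 + theta3 * m_star) * (x - x_star))


def rr_pie(theta2, theta3, beta1, x, x_star):
    """Pure indirect effect, equation (7): the exposure is moved from
    x_star to x through its effect on the mediator (beta1)."""
    return math.exp((theta2 * beta1 + theta3 * beta1 * x_star) * (x - x_star))


# Hypothetical coefficients and exposure contrast (x_star = 25th
# percentile, x = 75th percentile, m_star = median of the mediator).
cde = rr_cde(theta1=0.10, theta3=0.02, m_star=1.5, x=30.0, x_star=20.0)
pie = rr_pie(theta2=0.05, theta3=0.02, beta1=0.01, x=30.0, x_star=20.0)
```

Two sanity checks follow directly from the formulas: when β1 = 0 the exposure does not move the mediator and the PIE equals 1, and when θ3 = 0 the CDE no longer depends on the chosen mediator level m*.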
In this study, the CDE corresponds to the effect of the pollutant on breast cancer risk in the absence of both mediation by the biomarker and interaction between the pollutant and the biomarker. The INTref corresponds to the effect of the pollutant on breast cancer risk due solely to the interaction between the pollutant and the biomarker. The INTmed corresponds to the effect of the pollutant on breast cancer risk due to both mediation by the biomarker and interaction between the pollutant and the biomarker. The PIE corresponds to the effect of the pollutant on breast cancer risk due solely to mediation by the biomarker.
The sum of these four effects (i.e. CDE, INTref, INTmed, PIE) equals the total effect (TE) of the pollutant on breast cancer risk. The proportion of each of the four effects is calculated relative to the TE; thus, the proportions sum to 1 (100%). Of note, in some situations, negative proportions and proportions exceeding 100% may be observed. A negative proportion indicates that the indirect effect is opposite in direction to the TE. In this case, the proportions of the other effects may exceed 100%. This scenario typically arises when the associations between exposure and biomarker, and between biomarker and outcome, are in opposite directions. Mediation analyses were conducted for biomarkers that have previously been shown to be significantly associated with breast cancer. The mediation analyses estimated effects for a change in pollutant level from the 25th to the 75th percentile, with each mediator fixed at its median level.
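The arithmetic behind negative proportions and proportions above 100% can be seen in a small sketch. It assumes the four components are expressed on a scale on which they add up to the total effect (for example, as excess relative risks); the function name and the numbers are hypothetical and are not results of this study.

```python
def attributable_proportions(components):
    """Return each component divided by the sum of all components.

    components maps an effect name to its value on an additive scale
    (an assumption of this sketch).
    """
    total = sum(components.values())
    if total == 0:
        raise ValueError("total effect is zero; proportions are undefined")
    return {name: value / total for name, value in components.items()}


# Hypothetical components in which the indirect effect opposes the total.
effects = {"CDE": 0.30, "INTref": 0.02, "INTmed": 0.01, "PIE": -0.08}
props = attributable_proportions(effects)
```

In this made-up example the total is 0.25, so the PIE accounts for a negative share of it and the CDE for more than 100%, while the four proportions still sum to 1.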
All multivariable models were adjusted for confounding factors identified using a directed acyclic graph (Supplementary Fig. 2), including body mass index, menopausal hormone replacement therapy use, urban/rural status at birth, urban/rural status at inclusion, alcohol drinking, breastfeeding, mammography before inclusion, oral contraceptive use, age at full-term pregnancy and parity, smoking status, and total physical activity.
Analyses were conducted using R software version 4.2.3, except for the mediation analyses, which were conducted using Stata 14.